Chemical Aspects of Synthetic Biology Luisi 2007

(Part 2 of 4)

Although our criterion of folding should be considered, at this point, an approximate one, 20% is a surprisingly high figure. It suggests that folding is indeed a general property, something that arises naturally, even for proteins of medium length.

The characterization of some of those folded proteins has begun, and, in Fig. 1, the circular dichroic properties in the far-UV region of two of them, labeled preliminarily as A and B, are shown (for the primary sequence, see the original reference [15]). It is apparent that, in both, a significant percentage of periodic structure, α-helix in particular, is present. Furthermore, and very interestingly, the globular folding is thermoreversible, indicating that it is under thermodynamic control.

In Fig. 2, the computed foldings [15b] of these two proteins are illustrated, according to the analysis carried out by Dr. Fabio Polticelli in Roma3. Although the computational method used (Rosetta) is the most reliable in the present-day literature, such three-dimensional drawings cannot yet be considered the definitive structures; for the actual structures, one must await NMR or X-ray data.

We now have about a dozen such computed structures, and, although one should wait before attempting generalizations, it appears rather safe at this point to state that folding and thermodynamic stability are not properties restricted to our extant proteins, and that, on the contrary, they appear to be rather common features of randomly created polypeptides.

On the basis of this, one is tempted to propose that 'our' proteins do not belong to a class of polypeptides with privileged physical properties. And, by inference, one could say that data of this kind, once confirmed by a larger number of cases, permit one to break a lance in favor of the scenario of contingency.

Of course, the NBPs may also have biotechnological importance, and may also be very interesting from the structural point of view: could they, for example, display novel catalytic and structural features that have not been observed in 'our' proteins? The answer to these questions must await much more data.

Fig. 1. CD spectra of two 'never born proteins'. Note the significant content of secondary structure and the reversibility of the folding with temperature. For a detailed description of these CD experiments, see [15b]. The predicted tertiary structure of the two proteins is shown in Fig. 2.

Fig. 2. The structure prediction of the first two 'never born proteins' characterized. The predicted structures appear to be qualitatively in agreement with the spectroscopic data of CD and fluorescence.

1.3. The Case of Never Born RNAs. There is an important addition to be made to the above synthetic-biology program: the synthesis of NBPs is automatically accompanied by the synthesis of the corresponding m-RNAs. This permits one to tackle the question of whether, and to what extent, such totally random RNAs fold. We have conducted an analysis of several randomly chosen 'never born' RNAs [15c][15d], and indeed found extensive folding, which we could partly sort into different classes by utilizing an ad hoc method of analysis, based on a nuclease enzyme (S1) coupled with a temperature gradient (the 'Foster assay'; see [15c]). Particularly interesting was the observation of an RNA structure which did not unfold at temperatures as high as 60°. This led us to the hypothesis that thermoresistant RNA structures may not be so rare.

Now, we are developing the work on NBPs and the corresponding RNAs in two directions. One is the characterization of the NBPs already made: we would like to obtain a large number of NBP structures so as to have a statistically significant sample. The other is to prepare another library of NBPs, this time with a length of 20 residues. In this case, we will mostly be looking for primitive forms of catalysis, which, given the small size, are not expected to be exceptional, but are relevant for the origin of life (see the following project). A length of 20 amino acid residues corresponds to an RNA length of 60 nucleotides (three nucleotides per codon), and this is a particularly interesting size, i.e., close to the sizes of most ribozymes.

2. Synthesis of Polypeptides under Simulated Prebiotic Evolution. – 2.1. The Status of the Matter. The previous Section addresses the question of the frequency of foldable chains in a library of totally random de novo polypeptides, whereby such chains have been obtained by modern molecular-biology techniques. In this respect, we were interested in the properties of such polypeptides, and not in the chemistry of their formation under prebiotic conditions.

Therefore, one of the main questions about the origin of macromolecules remains open: how have multiple copies of identical long specific chains been produced? Again, to say that the polypeptides came from long nucleic acids is not an answer, as the question would then simply be shifted to the etiology of specific polynucleotide sequences.

The synthesis of long homo-polypeptides or homo-polynucleotides, i.e., chains containing only one type of residue [16a–c][17], has been described, but this does not solve the problem. The problem is the synthesis of co-oligopeptides, i.e., chains containing different amino acid residues (or nucleotides). It is well known from the standard theory of copolymerization that the synthetic procedures valid for homopolymers are generally not applicable to a mixture of co-monomers, and that the monomer composition in the copolymer can be significantly different from that in the starting monomer mixture [9]. Furthermore, even if all amino acids present in the mixture were polymerized with the same probability (the case of an ideal copolymerization), one would obtain copolymers with a random distribution of residues along the chain, which is not what we want. A method that produces long copolymers with random composition is the one used by Fox and co-workers [18], which, aside from the problem of the lack of characterization of these compounds, also does not solve the problem of the synthesis of identical chains.
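The statement that copolymer composition can differ from the feed composition can be illustrated with the standard two-monomer terminal model of copolymerization (the Mayo-Lewis equation), which the text does not write out. The following is only a minimal sketch: the function name and the reactivity ratios `r1` and `r2` used in the example are hypothetical and are not measured values for any pair of NCA amino acids.

```python
def copolymer_fraction(f1: float, r1: float, r2: float) -> float:
    """Instantaneous mole fraction F1 of monomer 1 in the copolymer.

    Mayo-Lewis equation for a two-monomer terminal model:
        F1 = (r1*f1^2 + f1*f2) / (r1*f1^2 + 2*f1*f2 + r2*f2^2)
    f1 is the mole fraction of monomer 1 in the feed; r1 and r2 are the
    reactivity ratios (hypothetical values in this sketch).
    """
    if not 0.0 <= f1 <= 1.0:
        raise ValueError("f1 must lie between 0 and 1")
    if r1 <= 0.0 or r2 <= 0.0:
        raise ValueError("reactivity ratios must be positive")
    f2 = 1.0 - f1
    numerator = r1 * f1 * f1 + f1 * f2
    denominator = r1 * f1 * f1 + 2.0 * f1 * f2 + r2 * f2 * f2
    return numerator / denominator


# Equal reactivities (r1 = r2 = 1): the copolymer mirrors the feed.
same_as_feed = copolymer_fraction(0.5, r1=1.0, r2=1.0)

# Unequal reactivities (assumed values): a 1:1 feed gives a copolymer
# strongly enriched in monomer 1.
enriched = copolymer_fraction(0.5, r1=4.0, r2=0.25)
```

With the assumed ratios, an equimolar feed yields a chain in which monomer 1 is clearly over-represented, which is the qualitative point made above. Even in the first case, where composition matches the feed, the order of residues along the chain is still random.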

In fact, if one searches in the literature for the prebiotic syntheses (Merrifield method excluded) of relatively long co-oligopeptides (say at least 30 residues, so that they partly begin to assume a stable folding), one finds almost nothing. Some references are collected in [9].

The group of Auguste Commeyras has approached the problem of the prebiotic formation of peptides by using the condensation of N-carboxy anhydrides (NCA) [19][20], a method that, according to the authors, is prebiotic; but in this case, too, the critical goal of producing multiple identical copies of long (30 residues or more) co-oligopeptides could not yet be achieved.

How then can one conceive a co-polymerization scheme which produces, for example, lysozyme-like molecules? This question forms the basis of our next chemical synthetic-biology project.

2.2. The Underlying Model. We first need a working hypothesis for the formation of multiple copies of identical long co-oligopeptide chains. One such hypothesis is contained in our research project, conducted in collaboration with Peter Strazewski and Peter Goekjian at the University of Lyon, France. The basic idea is that such a chain elongation proceeds by successive fragment condensation of prebiotically formed short co-oligopeptides (i.e., by peptide-bond formation, the reverse reaction of peptide-bond hydrolysis), as indicated in [9]. In particular, the synthesis of short peptides is realized by the prebiotic NCA condensation. A key assumption is that, in this random library, some peptides may arise which possess proteolytic activity. Further, one assumes that fragment condensation may be induced by the catalytic action of such peptides.

How realistic are these two assumptions? It has already been reported that even simple peptides may be endowed with proteolytic activity. For example, His–Ser appears to be capable of cleaving peptide and nucleic acid bonds [21]; and even Gly–Gly [2] appears to possess some catalytic activity.

Thus, the idea that a random family of peptides containing Ser and His may possess proteolytic activity is not so unreasonable. In our case, we need, however, the reverse reaction, i.e., the synthesis of peptide bonds. Here again, it is known that, in principle, proteolytic enzymes are capable of inducing peptide-bond formation. Extensive review articles have been presented in the past by Jakubke et al. [23], and by others, including our own group [24]; and, within the field of the origin of life, scenarios of alternating dry and wet environments have been theoretically proposed as conditions for bond formation and chain elongation [20].

One may then consider starting from a prebiotic library of, say, decapeptides (this length is quite possible with the NCA method) and proceeding with fragment condensation induced by catalytically active peptides.

It should be taken into consideration that the random condensation of all partners of a medium-size library of co-oligopeptides would, after a few condensation steps, give rise to an astronomical number of longer chains. The selection of only one or very few chain configurations out of this random library is possible only in the presence of some stringent selection criteria. Which selection criteria might have been possible in the prebiotic scenario? Clearly, only those based upon chemo-physical properties and chemo-physical conditions.

Thus, we arrive at another key assumption of our working program: the selection is governed by the contingency of the environmental conditions, such as pH, solubility, temperature, salinity, etc. Contingently upon these conditions, the vast majority of the library structures may be eliminated (e.g., by lack of solubility, or due to aggregation), and only a few chain products may 'survive' in solution, then undergoing further elongation in solution. Thus, the selection criterion conceived in our work is one that is assumed to simulate natural chemical evolution, in particular a kind of survival of the best fit as governed by the interplay of contingent conditions (the actual pH or salinity or temperature operating at that moment of growth) and the actual physical properties of the candidate chains. Fig. 3, taken from [9], gives an illustration of this process.
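The combinatorial explosion and the pruning effect of selection can be put in numbers with a toy calculation. The sketch below assumes that every ordered pair of chains condenses in each round (so the number of candidates is squared and the chain length doubles) and models selection crudely as keeping one chain in `keep_one_in`. The function name and this uniform survival rule are illustrative assumptions only; in the scheme described here, survival depends on the physical properties of each individual chain.

```python
def library_growth(n_start, start_length, rounds, keep_one_in=1):
    """Return [(chain_length, number_of_candidates), ...] per round.

    Assumptions of this sketch (not taken from the original work):
    - each round condenses every ordered pair X-Y of the current chains;
    - selection keeps one chain out of every `keep_one_in` (integer).
    """
    if n_start < 1 or start_length < 1 or rounds < 0 or keep_one_in < 1:
        raise ValueError("arguments must be positive (rounds may be 0)")
    history = [(start_length, n_start)]
    count, length = n_start, start_length
    for _ in range(rounds):
        condensed = count * count        # all ordered pairwise condensations
        length *= 2                      # 10-mers -> 20-mers -> 40-mers ...
        count = condensed // keep_one_in  # crude stand-in for selection
        history.append((length, count))
    return history


# Hypothetical library of 100 decapeptides, three condensation rounds.
unselected = library_growth(100, 10, 3)
selected = library_growth(100, 10, 3, keep_one_in=100)
```

Without selection, the hypothetical 100 decapeptides would give 10^16 candidate 80-mers after three rounds; with the assumed one-in-a-hundred survival, the library stays at 100 candidates per round while the chains keep growing.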

We have started experiments on this project, and they have already been successful in obtaining a large series of relatively long NCA condensates starting from mixtures of amino acids. The problem is now to produce co-oligopeptides containing His and other catalytically important residues. As already mentioned, co-polymerization is not as easy as the polymerization of just one amino acid: starting the NCA condensation from a 1:1:1:1 mixture of four different NCA amino acids, the composition of the copolymer may bear no relation to the initial composition of the monomer mixture. One generally obtains a library of products varying both in composition and in primary sequence, and conditions must still be worked out that permit the synthesis of a definite family of co-oligopeptides with a specific sequence (which does not have to be pre-ordered).

Fig. 3. The fragment condensation scheme under simulated prebiotic environmental conditions. This illustration shows how the initial library of n decapeptides may contain some compounds endowed with catalytic activity (indicated by an asterisk), and how the mutual condensation of these n decapeptides ideally gives rise to n² 20-mers, of which only m are capable of 'surviving', being soluble in H2O under the given conditions. These, in turn, give rise to m² 40-residue-long peptides, of which a large number are insoluble under the given environmental conditions, and so on.

The next step, after having checked the reproducibility of the poly-condensation of NCAs, would be the search for enzymatic activity in the products. And the step following that would be the attempt at fragment condensation catalyzed by such peptides.

Is it then realistic to expect that, by this method, a sizable concentration of a given long co-oligopeptide would be synthesized? The answer appears to be positive on the basis of the procedure described in the next Section, which reports the fragment condensation of a 44-residue-long de novo protein, although based not on peptide catalysts but, preliminarily, only on peptide synthesis.

The whole research program is still in the initial phase, and most of the critical steps must still be worked out. It is a program of chemical synthetic biology, as we are simulating the chemistry that possibly occurred under prebiotic molecular evolution, thus reproducing a biological process. The chain elongation would proceed with a reduction of the chain candidates due to the environmental conditions, and eventually we would obtain a sizable amount of a given, although not a priori programmed, co-oligopeptide sequence.

2.3. A Preliminary Experimental Implementation. We decided to verify the validity of the theoretical scheme of the project described above by utilizing, instead of catalytic peptides, which are not yet at our disposal, an organic-chemistry fragment condensation based on the Merrifield solid-phase synthesis.

This was the work carried out in a Ph.D. program by Salvo Chessari with the assistance of Richard Thomas in my laboratory at the ETH-Zürich, and recently published [25].

First, two parent 40-residue peptides, P1 and P2, were designed randomly but with the constraint that the relative abundance of the 20 amino acids used in their construction maintained a 1:1:1 relationship.
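One way to picture such a constrained random design is sketched below. It assumes that the equal-abundance constraint means each of the 20 standard amino acids occurs exactly twice in a 40-residue chain; this reading, the function name, and the use of a simple shuffle are assumptions made for illustration and are not the authors' documented procedure.

```python
import random
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_balanced_peptide(length=40, rng=None):
    """Random sequence in which every amino acid occurs equally often.

    Assumption of this sketch: `length` is a multiple of 20, so that each
    residue type appears length // 20 times before shuffling.
    """
    if length <= 0 or length % len(AMINO_ACIDS) != 0:
        raise ValueError("length must be a positive multiple of 20")
    rng = rng or random.Random()
    residues = list(AMINO_ACIDS * (length // len(AMINO_ACIDS)))
    rng.shuffle(residues)  # random order, fixed composition
    return "".join(residues)


# Two hypothetical parent peptides; a seeded generator makes the run repeatable.
generator = random.Random(2007)
p1 = random_balanced_peptide(40, generator)
p2 = random_balanced_peptide(40, generator)
composition = Counter(p1)  # every amino acid is counted twice
```

Shuffling a fixed multiset keeps the composition exact while leaving the order of residues to chance, which is the only property this sketch is meant to show.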

A matrix, A·B, of 16 20-residue peptides was constructed by the systematic combination of two small libraries, A and B, each comprising four ten-residue peptide sequences (Fig. 4). The 16 20-residue sequences arrived at in this way were synthesized by the solid-phase method. The peptide products were subjected to selection on the basis of their solubility in H2O under well-defined conditions.

In Tris buffer in the pH range of 5.2–8.6, A1B3 and A3B3 were insoluble, whereas A2B3 was totally soluble, in contrast to prediction. The subsets (A·B)s that fulfilled the mentioned criterion of being soluble in H2O were then subjected to chain elongation by combination with a further small set of 20-residue sequences, C (Fig. 4), giving rise to the new library C·(A·B)s consisting of 16 peptides which are 40 residues long.
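The bookkeeping behind this combine, select, and elongate cycle can be sketched with labels alone. In the sketch below, the fragment labels follow the naming used in the text (A1B2, A1B2C1, and so on), but the set of surviving 20-mers and the number of C fragments are placeholders chosen for illustration; they are not the experimental results, and the helper names are hypothetical.

```python
from itertools import product


def combine(left, right):
    """All ordered combinations of a left fragment with a right fragment."""
    return [a + b for a, b in product(left, right)]


def select(library, is_soluble):
    """Keep only the members that pass the (user-supplied) solubility test."""
    return [peptide for peptide in library if is_soluble(peptide)]


library_a = ["A1", "A2", "A3", "A4"]   # four ten-residue fragments
library_b = ["B1", "B2", "B3", "B4"]   # four ten-residue fragments
matrix_ab = combine(library_a, library_b)   # 4 x 4 = 16 twenty-residue peptides

# Placeholder selection: these labels are NOT the published outcome.
hypothetical_soluble = {"A1B2", "A2B2", "A2B3"}
survivors = select(matrix_ab, lambda p: p in hypothetical_soluble)

# Elongation of the survivors with a further set C (size assumed here).
library_c = ["C1", "C2"]
elongated = combine(survivors, library_c)   # e.g. "A1B2C1", 40 residues each
```

Because selection is applied before each elongation step, the number of peptides that must be synthesized grows with the number of survivors rather than with the full product of all fragment libraries.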

None of the latter were soluble in aqueous buffer, but two of them, A1B2C1 and A2B2C1, turned out to be soluble in 6 M guanidinium chloride (GuCl). The addition of a polar N-terminal extension (DE) to them resulted in the 44-residue sequences DE-A1B2C1 and DE-A2B2C1. Of these two samples, only the latter was soluble in H2O. The whole sequence of this peptide is:

DDDE (polar extension) WARCFLYHQTQSWREIMYHS
