The Covalent Structure of Collagen


Eur. J. Biochem. 59, 113-118 (1975)


The Amino-Acid Sequence of α2-CB4 from Calf-Skin Collagen

Peter P. FIETZEK and Friedrich W. REXRODT
Max-Planck-Institut für Biochemie, Martinsried bei München

(Received May 2/July 28, 1975)

Sequencing of chymotrypsin-, trypsin-, collagenase- and hydroxylamine-derived peptides, using the automated Edman degradation procedure, yielded the complete amino acid sequence of α2-CB4 from calf skin collagen (321 residues). Together with the data from earlier work, an uninterrupted sequence in the helical region of the α2-chain from residues 1-393 is now known. Glycine is found in every third position of the peptide. Hydroxylation of proline and lysine occurs only in the Y-position of the triplet Gly-X-Y and is not complete in every position. Some residues, such as glutamic acid, leucine, phenylalanine and arginine, are distributed non-randomly between the X and Y-positions, and this non-random distribution is different in the α1 and α2-chains. Comparison of the N-terminal 393 residues from the helical region of the α1 and α2-chains revealed a nearly identical distribution of the charged polar residues arginine, lysine, glutamic and aspartic acids. The distribution of the triplet Gly-Pro-Hyp is similar in both chains. The remaining residues in the α2-chain exhibit a high degree of substitution when compared with those in the α1-chain. Approximately one in every two residues in both the X and Y-positions is substituted.

Type I collagen, which is the main protein constituent of skin, bone and tendon, consists of one α2-chain and two α1-chains. The amino acid sequence of the entire α1-chain is known from studies on collagen extracted from calf and rat skin [1]. The cyanogen bromide peptides of the α2-chain of calf skin [2] have been isolated and characterized and their order along the α2-chain determined (i.e. 1-0-4-2-3.5) [3,4]. The amino acid sequences of the three small cyanogen bromide peptides α2-CB0, α2-CB1 and α2-CB2, with 3, 12 and 30 residues respectively, are known [4,5]. Of the remaining two large peptides, α2-CB4 (321 residues) and α2-CB3.5 (568 residues), only 42 and 36 residues respectively have been sequenced from their N-terminal ends [6,7]. Treatment of native collagen with crude bacterial collagenase has yielded native fragments [8], which when denatured produced α2-chain fragments, four of which have been isolated (P. P. Fietzek and D. Breitkreutz, unpublished). Two of the fragments, α2(260) and α2(190), which cover large regions of α2-CB4 and α2-CB3.5, have been sequenced at their N-terminal ends (Fietzek and Breitkreutz, unpublished).

Abbreviations. CB-peptides, cyanogen-bromide-derived peptides; >PhNCS, phenylthiohydantoin derivative; Quadrol, N,N,N',N'-tetrakis(2-hydroxypropyl)-ethylenediamine.

Enzyme. Carboxypeptidase C (EC

In a preceding paper the isolation and purification of chymotrypsin-, trypsin- and hydroxylamine-derived peptides from α2-CB4 of calf skin collagen were described [10]. The present paper reports the sequencing of these peptides and the elucidation of the complete amino acid sequence of α2-CB4. Together with data published earlier, 393 consecutive residues from the N-terminal end of the helical portion of the α2-chain of calf skin collagen are now known.

The preparation of all peptides used for sequencing has been described in detail in a preceding paper [10].

Sequence analysis was performed automatically in a sequencer (model 890 from Beckman Instruments, Palo Alto, Calif., U.S.A.). The normal cup as well as the undercut cup was used. The degradation programs with Quadrol and with dimethylallylamine as buffer substances were supplied by Beckman Instruments (Palo Alto). Peptides were dissolved in water and introduced into the reaction cup with a Pasteur pipette, and a dry film was formed on the wall of the cup by alternate application of a nitrogen stream and vacuum. For each peptide, the amount used and the degradation program applied are given in Table 1. Before being subjected to sequential analysis, five peptides were reacted with Braunitzer's reagent 3 (3-isothiocyano-1,5-naphthalene disulfonic acid disodium salt from Pierce Chem.), in order to decrease their solubility in the organic solvents used. The coupling reaction with Braunitzer's reagent 3 was performed outside the cup, as previously described [1]. All reagents and solvents used were purchased from Beckman (Palo Alto, U.S.A.). Phenylthiohydantoin derivatives from Pierce Chemicals (U.S.A.) were used as standards.

The phenylthiohydantoin derivatives formed [12] were identified by thin-layer or gas-liquid chromatography or, after hydrolysis, as amino acids on an amino acid analyzer. Thin-layer chromatography was carried out on silica-gel plates (F254 from Merck AG, Darmstadt, Germany), using the H-system (ethylene chloride/acetic acid) of Edman [13]. For gas-liquid chromatography, columns containing 10% SP 400 (Beckman, U.S.A.) were used in either a Beckman GC-45 or a Hewlett-Packard gas chromatograph 5700 A, equipped with automatic sample application. Hydrolysis of phenylthiohydantoin derivatives to amino acids was carried out using 6 N HCl at 130 °C for 24 h under nitrogen. The hydrolysates were examined on an amino acid analyzer, model Multichrom from Beckman (Munich, Germany), or on a model D500 from Durrum (Palo Alto, U.S.A.).

Usually all samples from the sequencer were investigated by thin-layer and gas-liquid chromatography. Except for the discrimination between isoleucine and leucine, the phenylthiohydantoin derivatives of all apolar residues (Hyp, Pro, Gly, Ala, Val, Ile, Leu and Phe) were determined by thin-layer and gas-liquid chromatography. Since Ile>PhNCS and Leu>PhNCS were not separated in the gas-chromatographic system used and were only poorly resolved on thin-layer chromatography, hydrolysis to isoleucine or leucine followed by identification on the amino acid analyzer was always used. Serine and threonine were best identified on thin-layer chromatography when the samples were prepared immediately after liberation from the peptide. The charged polar residues, glutamic and aspartic acid and lysine, were identified on thin-layer chromatography. Asparagine and glutamine derivatives were not always well separated on thin-layer chromatography; they were therefore hydrolyzed and identified on the amino acid analyzer, which clearly distinguished between the resulting aspartic and glutamic acids. Hydroxylysine, histidine and arginine were routinely identified on the amino acid analyzer after hydrolysis of their phenylthiohydantoin derivatives.
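The identification routes in the paragraph above can be summarised as a small lookup table. The following Python sketch is only an illustrative restatement of that paragraph; the names `ROUTES` and `route` are hypothetical, and residues not mentioned in the text are deliberately left out.

```python
# Illustrative summary of the identification routes described in the text.
# The names ROUTES and route are hypothetical helpers, not part of the paper.
TLC = "thin-layer chromatography"
GLC = "gas-liquid chromatography"
AAA = "amino acid analyzer after hydrolysis"

ROUTES = {}
# Apolar residues other than Ile/Leu: thin-layer and gas-liquid chromatography.
for residue in ("Hyp", "Pro", "Gly", "Ala", "Val", "Phe"):
    ROUTES[residue] = (TLC, GLC)
# Ile and Leu were discriminated only after hydrolysis.
for residue in ("Ile", "Leu"):
    ROUTES[residue] = (AAA,)
# Ser, Thr and the charged residues Glu, Asp, Lys: thin-layer chromatography.
for residue in ("Ser", "Thr", "Glu", "Asp", "Lys"):
    ROUTES[residue] = (TLC,)
# Asn, Gln (hydrolyzed to Asp, Glu) and Hyl, His, Arg: amino acid analyzer.
for residue in ("Asn", "Gln", "Hyl", "His", "Arg"):
    ROUTES[residue] = (AAA,)


def route(residue):
    """Return the identification route(s) for a residue named in the text."""
    return ROUTES[residue]


print(route("Leu"))
print(route("Pro"))
```

A residue that the paragraph does not mention raises `KeyError`, which keeps the sketch from implying a route the text does not state.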

Peptide T17 (1 μmol) was treated with carboxypeptidase C (from Roth, Karlsruhe, Germany) as described earlier [14]. Peptide T20 was sequenced using a solid-phase sequenator; experimental details will appear later (E. Wachter et al., unpublished).

A list of all peptides used for sequence determination, together with their size, the number of degradation experiments performed, the amount used in micromoles, the degradation program applied and the positions determined within the peptide α2-CB4, is shown in Table 1. The same peptides, assembled in their order within α2-CB4, are depicted in Fig. 1, the length of the lines corresponding to the number of amino acid residues in each individual peptide. The filled blocks represent those parts of the peptides which were sequenced. A detailed display of the sequence determination is shown in Fig. 2.

The identities of the residues that occupy each position along the α2-CB4 peptide were determined by sequencing the peptides shown on the right of Fig. 2. The majority of positions were determined by the sequence data of two or more peptides (e.g. positions 121-123 from peptides C3-4, C4 and HA3-4). Where only one peptide was sequenced (e.g. positions 217-248 from peptide HA4), the peptide was prepared and at least two independent degradations were performed on different preparations. The last three residues (positions 319-321) were determined by sequencing peptide T20, using a solid-phase sequenator.
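The overlap argument above can be illustrated by counting, for any position, which sequenced ranges cover it. The Python sketch below uses only the position ranges that are legible in Table 1, plus positions 319-321 for peptide T20 as stated in the text. Peptides C2 and T5 are omitted because their end positions are not legible, and the names `SEQUENCED`, `peptides_covering` and `uncovered` are hypothetical.

```python
# Sequenced position ranges (inclusive) as legible in Table 1.
# Assumption: C2 and T5 are left out because their end positions are
# truncated in the table; T20 (319-321) is taken from the text.
SEQUENCED = {
    "α2-CB4": (1, 47),
    "C1-2": (1, 24),
    "HA1": (1, 19),
    "T4": (45, 54),
    "α2(260)-CBN": (52, 81),
    "T6": (70, 80),
    "C3-4": (81, 123),
    "C4": (93, 144),
    "HA3-4": (112, 151),
    "T11-12": (139, 150),
    "T11-12-13": (139, 192),
    "T12-13": (187, 216),
    "HA4": (208, 257),
    "C5-6": (249, 293),
    "α2(190)-CBN": (256, 285),
    "T16": (259, 282),
    "T17": (286, 301),
    "C6": (303, 318),
    "T20": (319, 321),
}


def peptides_covering(position):
    """Names of the peptides whose sequenced range includes the position."""
    return sorted(
        name
        for name, (start, end) in SEQUENCED.items()
        if start <= position <= end
    )


def uncovered(first=1, last=321):
    """Positions not reached by any automated degradation in SEQUENCED."""
    return [p for p in range(first, last + 1) if not peptides_covering(p)]


print(peptides_covering(121))  # ['C3-4', 'C4', 'HA3-4']
print(peptides_covering(230))  # ['HA4']
print(uncovered())             # [302]
```

With these ranges, positions 121-123 are covered by three peptides and positions 217-248 by HA4 alone, as the text states, and the only position left uncovered is 302.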

Position 302 could not be identified by automated sequence analysis, and peptide T17 was therefore digested with carboxypeptidase C. After 6, 12, 30 and 60 min there was a fast release of arginine followed by a release of leucine, and after 9 h a small amount of glycine was detected. These results indicated the sequence Gly-Leu-Arg.
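The reasoning behind this deduction is that a carboxypeptidase removes residues one at a time from the C-terminus, so the order of release, read backwards, gives the C-terminal sequence. The Python sketch below shows only that reversal step. It assumes, as a simplification, that the order of appearance reflects distance from the C-terminus alone, and the function name `c_terminal_sequence` is hypothetical.

```python
def c_terminal_sequence(release_order):
    """Convert an order of release (first released residue first) into the
    C-terminal sequence written in the usual N-to-C direction.

    Assumption: residues appear in the digest strictly in order of their
    distance from the C-terminus, which is a simplification of real kinetics.
    """
    return "-".join(reversed(release_order))


# Release order reported for peptide T17: arginine, then leucine, then glycine.
print(c_terminal_sequence(["Arg", "Leu", "Gly"]))  # Gly-Leu-Arg
```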

The amino acid sequence of α2-CB4 from calf skin collagen was determined by the automated Edman degradation procedure. Of the 321 residues, only one (leucine in position 302) could not be determined using this automated procedure. The identity of position 302 was deduced from the action of carboxypeptidase on peptide T17. It was further substantiated by the amino acid composition of peptide T17.

The amino acid sequence described here is in complete agreement with the amino acid composition of the chymotrypsin-, trypsin- and hydroxylamine-derived peptides (hydroxylamine being a reagent used to cleave Asn-Gly bonds), as described in the preceding paper [10]. This served as additional confirmation of the sequence reported here.


Table 1. Peptides used for sequential analyses
Programs 111070 and 072172B employed the Quadrol buffer system. Programs 032671 and 090872 employed the dimethylallylamine buffer system. BR3: peptides were coupled with Braunitzer's reagent 3 and then program no. 072172B was used. Numbers in parentheses refer to the quantity of the peptide used in μmol; the number before the parentheses indicates the number of times a peptide was sequenced

Peptide        Number of   Positions    Beckman sequencer program
               residues    sequenced    111070         072172B        032671    090872    BR3
α2-CB4         321         1-47         3 (0.25-0.3)   1 (0.3)        -         -         -
C1-2           80          1-24         -              -              2 (0.3)   -         -
HA1            87          1-19         -              1 (0.6)        -         -         -
C2             63          18- 6
T4             16          45-54
α2(260)-CBN    270         52-81        -              1 (0.3)        -         -         -
T5             9           61- 6        -              -              -         1 (1.9)   -
T6             15          70-80        -              -              1 (1.1)   -         1 (1.1)
C3-4           168         81-123       2 (0.4-0.5)    1 (0.7)        -         -         -
C4             156         93-144       -              2 (0.3-0.6)    -         -         -
HA3-4          210         112-151      -              2 (0.4)        -         -         -
T11-12         75          139-150      -              1 (0.4)        -         -         -
T11-12-13      93          139-192      -              2 (0.6-1.1)    -         -         -
T12-13         45          187-216      -              -
HA4            114         208-257      -              3 (0.3-0.9)    -
C5-6           73          249-293      -              -              1 (0.9)   2 (0.3)   -
T17            18          286-301      -              1 (1.2)        -         1 (1.2)   -
C6             19          303-318      -              -              -         2 (1.2)   -
α2(190)-CBN    6           256-285
T16            27          259-282      -              -              1 (0.6)   -         -

1 (1) 1 (2)
1 (1) - - - 1 (1)
- - - 1 (0.3) -

Fig. 1. Schematic representation of the α2-CB4 peptides which were used for sequencing. The filled blocks are those portions which were sequenced

The first 42 N-terminal residues from α2-CB4, determined by sequencing the intact peptide, were published earlier [6]. The sequence described here agrees with the earlier results except for position 30. This was reported to be a hydroxyproline residue when in fact it is threonine.

Hydroxylation of proline occurred only in the Y-position of the triplet Gly-X-Y. Two residues out of the 40 proline residues in the Y-position (numbers 54 and 63) were found to be completely unhydroxylated. Another two residues (in positions 51 and 96) were found to be hydroxylated to about 50%. In 18 instances 100% hydroxylation was found. In the remaining residues, the positions of which are indicated by a star in Fig. 4, the exact degree of hydroxylation could not be determined for experimental reasons (overlap, background) during the automated degradation procedure. However, in all instances the degree


1-27     Gly-Pro-Arg-Gly-Pro-Hyp-Gly-Ala-Ser-Gly-Ala-Hyp-Gly-Pro-Gln-Gly-Phe-Gln-Gly-Pro-Hy~-Gly-Glu-Hyp-Gly-Gl~-Hyp-

28-54    Gly-Gln-Thr-Gly-Pro-Ala-Gly-Ala-Arg-Gly-Pro-Hyp-Gly-Pro-f~~-Gly-Lys-Ala-Gly-Glu-Asp-Gly-His-Hyp-Gly-Lys-Pro-

55-81    Gly-Arg-Hyp-Gly-Glu-Arg-Gly-Val-Pro-Gly-Pro-Gln-Gly-Ala-Arg-Gly-Phe-Hyp-Gly-Thr-Hyp-Gly-Leu-Hyp-Gly-Phe-Hyl-

82-108   Gly-Ile-Arg-Gly-His-Asn-Gly-Leu-Asp-Gly-Leu-Thr-Gly-Gln-Hyp-Gly-Ala-Hyp-Gly-Val-Hyl-Gly-Glu-Hyp-Gly-Ala-Hyp-

109-135  Gly-Glu-Asn-Gly-Thr-Hyp-Gly-Gln-Hyl-Gly-Ala-Arg-Gly-Leu-Hyp-Gly-Glu-Arg-Gly-Arg-Val-Gly-Ala-Hyp-Gly-Pro-Ala-
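The triplet regularity stated in the text (glycine in every third position, hydroxylated residues only in the Y-position of Gly-X-Y) can be checked mechanically on a stretch of the sequence. The Python sketch below does this for residues 82-108 as transcribed from the sequence display above. The transcription is an assumption of this example, and the helper names `triplet_slot`, `glycine_in_every_gly_slot` and `hydroxylated_slots` are hypothetical.

```python
# Residues 82-108 as transcribed from the sequence display (an assumption of
# this sketch; the helper names below are hypothetical).
START = 82
RESIDUES = (
    "Gly-Ile-Arg-Gly-His-Asn-Gly-Leu-Asp-Gly-Leu-Thr-Gly-Gln-Hyp-"
    "Gly-Ala-Hyp-Gly-Val-Hyl-Gly-Glu-Hyp-Gly-Ala-Hyp"
).split("-")


def triplet_slot(position):
    """Slot of a 1-based position within the repeating Gly-X-Y triplet."""
    return ("Gly", "X", "Y")[(position - 1) % 3]


def glycine_in_every_gly_slot(residues, start):
    """True if every position falling in the Gly slot holds glycine."""
    return all(
        residue == "Gly"
        for offset, residue in enumerate(residues)
        if triplet_slot(start + offset) == "Gly"
    )


def hydroxylated_slots(residues, start):
    """Set of triplet slots occupied by hydroxyproline or hydroxylysine."""
    return {
        triplet_slot(start + offset)
        for offset, residue in enumerate(residues)
        if residue in ("Hyp", "Hyl")
    }


print(glycine_in_every_gly_slot(RESIDUES, START))  # True
print(hydroxylated_slots(RESIDUES, START))         # {'Y'}
```

Because position 1 of α2-CB4 is glycine, the slot of any position follows from its number alone, which is why the check needs only the starting position of the excerpt.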
