



BMC Genomics. 2012; 13(Suppl 4): S4.

Predict impact of single amino acid change upon protein structure

Christian Schaefer

1 TUM, Bioinformatics - I12, Informatik, Boltzmannstr. 3, 85748 Garching, Germany

2 TUM Graduate School of Information Science in Health (GSISH), Boltzmannstr. 11, 85748 Garching, Germany

Burkhard Rost

1 TUM, Bioinformatics - I12, Informatik, Boltzmannstr. 3, 85748 Garching, Germany

2 TUM Graduate School of Information Science in Health (GSISH), Boltzmannstr. 11, 85748 Garching, Germany

3 Institute for Advanced Study (IAS), TUM, Boltzmannstr. 3, 85748 Garching, Germany

4 New York Consortium on Membrane Protein Structure (NYCOMPS), TUM Bioinformatics, Boltzmannstr. 3, 85748 Garching, Germany

5 Department of Biochemistry and Molecular Biophysics, Columbia University, 701 West, 168th Street, New York, NY 10032, USA

Supplement

SNP-SIG 2011: Identification and annotation of SNPs in the context of structure, function and disease

Yana Bromberg and Emidio Capriotti

Conference

SNP-SIG 2011: Identification and annotation of SNPs in the context of structure, function and disease

2011 Jul 15

Vienna, Austria

Supplementary Materials

Additional file 1: Datasets of mutants with observed effects on function and stability. Archive of the two different mutant sets with observed effects along with predictions of their effect on local structure.


Abstract

Background

Amino acid point mutations (nsSNPs) may change protein structure and function. However, no method directly predicts the impact of mutations on structure. Here, we compare pairs of pentamers (five consecutive residues) that locally alter protein three-dimensional structure (3D, RMSD>0.4 Å) to those that do not alter structure (RMSD<0.2 Å). Mutations that alter structure locally can be distinguished from those that do not through a machine-learning (logistic regression) method.

Results

The method achieved a rather high overall performance (AUC>0.79, two-state accuracy >72%). This discriminative power was particularly unexpected given the enormous structural variability of pentamers. Mutants for which our method predicted a change of structure were also enriched in mutants that disrupt stability and function. However, while it distinguished change from no change in structure, the new method overall failed to distinguish between mutants with and without effect on stability or function.

Conclusions

Local structural change can be predicted. Future work will have to establish how useful this new perspective on predicting the effect of nsSNPs will be in combination with other methods.

Background

Protein structures very robust under sequence change

Evolution creates the specific protein landscape that we observe today. Mutations are random, but selection is the driving force that shapes the observable protein diversity by favoring those deviations that maintain or improve phenotype. This constrained sampling process explains the sequence diversity compatible with a given protein three-dimensional (3D) structure: 50-80% of all residues can be changed without altering structure significantly [1-3].

Local structure change can affect phenotype

Although many different sequences map to similar structures, point mutants can change structure dramatically [4-6]. Some of the intricate details of 3D structures are crucial for function. Therefore, such local conformational changes may impact protein function and may cause disease. Typically, this is more likely for structure changes connected to binding sites. For instance, the disruption of hydrophobic interactions, the introduction of charged residues into buried sites, or mutations that break beta-sheets often impact phenotype severely and raise the susceptibility for disease [7-9]. Using 83 X-ray mutant structures from 13 classes of proteins, an early work pioneered the prediction of local structural changes by expert rules operating on position-dependent rotamers [10]. It is unclear how well such an approach would cope with the protein diversity found in the current PDB [11]. Thus, we followed a different approach. We compiled a set of structurally superimposed pairs of protein fragments with identical sequence except for one central residue mismatch, and applied machine learning to predict structural change from sequence.

Methods

Primary pentamer data

We extracted 146,296 protein chains from X-ray structures in the Protein Data Bank (PDB, July 2010) [11]. Then we applied two techniques for redundancy reduction. The first set (dubbed "cdhit98") contained 24,890 chains; it resulted from clustering with CD-HIT [12] to a level at which no pair had over 98% pairwise sequence identity. The second set (dubbed "hval0") contained 3,767 chains; it resulted from filtering at HVAL>0 [2,3,13] (corresponding to ~20% maximal pairwise sequence identity for alignments over 250 residues). We chopped each chain in each set into all overlapping fragments of five consecutive residues (pentamers), removing: (1) pentamers with chain breaks (peptide bond length >2.5 Å, as defined in DSSP [14]), (2) pentamers with non-standard amino acids, and (3) all but the first set of atomic coordinates for residues with alternative locations. Each pentamer from the first set (cdhit98) was paired with each pentamer from the second set (hval0).
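The chopping step can be sketched as a sliding window with the two fragment-level filters. This is a minimal illustration, not the authors' pipeline: the residue record format (a dict with a one-letter code `aa` and backbone `n`/`c` coordinates in Å) is a hypothetical input, and filter (3) is assumed to have been applied upstream by keeping only the first alternative location while parsing.

```python
import math

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
MAX_PEPTIDE_BOND = 2.5  # Å; a C(i)-N(i+1) distance above this counts as a chain break

def pentamers(residues, k=5):
    """Yield (start_index, sequence) for every overlapping k-mer that passes the filters.

    residues: list of hypothetical records {"aa": str, "n": (x, y, z), "c": (x, y, z)}.
    """
    for start in range(len(residues) - k + 1):
        window = residues[start:start + k]
        # filter (2): skip fragments containing a non-standard amino acid
        if any(r["aa"] not in STANDARD_AA for r in window):
            continue
        # filter (1): skip fragments with a chain break between consecutive residues
        if any(math.dist(window[i]["c"], window[i + 1]["n"]) > MAX_PEPTIDE_BOND
               for i in range(k - 1)):
            continue
        yield start, "".join(r["aa"] for r in window)
```

Each surviving pentamer from one set would then be compared with each pentamer from the other set.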

We selected pairs of pentamers that differed only in the central amino acid and that originated from proteins with over 30% overall pairwise sequence identity. We also filtered out pairs for which either fragment was already contained in a much larger fragment that fulfilled the above criteria. This procedure yielded 35,533 pentamer pairs. For each pair, we calculated the root mean square deviation (RMSD) over all C-alpha atoms after optimal superposition of the two pentamer backbones (McLachlan algorithm [15] as implemented in ProFit [16]). To turn the continuous RMSD differences into a binary problem (mutant changes structure or not), we had to decide what constitutes a structural effect and what is neutral in that sense. Lacking a scientifically meaningful definition of structural change for pentamers, we chose thresholds that appeared reasonable given the observed distributions and that separated all pentamer pairs into roughly equal numbers of structurally neutral and non-neutral pairs. We defined RMSD values <0.2 Å as structurally neutral and values >0.4 Å as structurally non-neutral, i.e. as structural change; we ignored all pairs in between these two thresholds. These particular thresholds assigned 12,046 pentamer pairs to the class "structural change" and 13,675 to the class "neutral". For each such pair, we randomly designated one fragment as the wild type fragment and the central mismatch residue of the other fragment as the mutant amino acid.
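The superposition and labeling rule can be sketched as follows. This sketch uses the Kabsch SVD solution in place of McLachlan's algorithm as implemented in ProFit (a swapped-in technique; both find the rotation minimizing the least-squares deviation), and assumes the inputs are the five C-alpha coordinates of each fragment as NumPy-compatible (N, 3) arrays.

```python
import numpy as np

NEUTRAL_BELOW = 0.2  # Å
CHANGE_ABOVE = 0.4   # Å

def superposed_rmsd(p, q):
    """C-alpha RMSD between two (N, 3) arrays after optimal rigid superposition (Kabsch)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p - p.mean(axis=0)                  # remove translation
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)       # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))  # d = -1 would indicate a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q                    # rotate every point of p onto q
    return float(np.sqrt((diff ** 2).sum() / len(p)))

def label_pair(rmsd):
    """Map an RMSD to a class; pairs between the two thresholds are discarded (None)."""
    if rmsd < NEUTRAL_BELOW:
        return "neutral"
    if rmsd > CHANGE_ABOVE:
        return "change"
    return None
```

A pair whose RMSD falls between 0.2 Å and 0.4 Å returns `None` and would simply be dropped from the data set.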

Additional functional data

For comparison, we also used two data sets that had been used previously (Additional file 1). The first set comprised 12,461 functionally neutral and 35,585 functional effect mutants from 3,444 proteins [17,18]. The second consisted of 657 mutants with an effect on protein stability and 652 mutants without an effect on stability, from 47 proteins [19,20]. Mutations leading to a change in the Gibbs free energy (ΔΔG) < -1 kcal/mol or >1 kcal/mol were considered non-neutral (i.e. both stabilizing and destabilizing mutations were taken as evidence of change); all other mutations were treated as neutral (i.e. no effect).

Additional prediction methods

Various methods predict other aspects of the impact of amino acid changes, e.g. effects on protein function or stability. In particular, we applied SNAP [17] and I-Mutant3 [21] to test their discriminative power on our data sets. Both methods return raw numerical scores reflecting direction and reliability of the prediction. SNAP values range from -100 (neutral for function) to 100 (change of function). The distance of the actual prediction to the decision boundary (0) reflects the reliability of the prediction and the severity of the predicted effect (large distance = high reliability and severity [17]). I-Mutant3 predicts the ΔΔG value upon mutation. We applied the same decision cutoffs as mentioned above to define neutral and non-neutral.
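The decision rules above amount to two small mappings. In this sketch the function names are hypothetical, and treating a SNAP score of exactly 0 as neutral (and |ΔΔG| of exactly 1 kcal/mol as neutral) is an assumption about how ties are resolved.

```python
DDG_CUTOFF = 1.0  # kcal/mol

def stability_label(ddg):
    """Non-neutral if ΔΔG (measured or predicted by I-Mutant3) is below -1 or above +1 kcal/mol."""
    return "effect" if ddg < -DDG_CUTOFF or ddg > DDG_CUTOFF else "neutral"

def snap_label(score):
    """SNAP raw scores run from -100 (neutral) to 100 (effect); 0 is the decision boundary."""
    return "effect" if score > 0 else "neutral"

def snap_strength(score):
    """Distance from the boundary, read as a proxy for reliability and severity."""
    return abs(score)
```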

Prediction method: basics

We applied logistic regression to learn the structural change upon amino acid change. Logistic regression is a simple linear machine-learning method with essentially one hyperparameter to tune (the regularization strength); we used the implementation offered by the LIBLINEAR package (L2-regularized logistic regression, dual formulation) [22].
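To make the model concrete, here is a minimal NumPy sketch of L2-regularized logistic regression trained by plain batch gradient descent. It is a stand-in for illustration only: LIBLINEAR solves the dual problem with a different optimizer (in scikit-learn, `LogisticRegression(solver="liblinear", penalty="l2", dual=True)` wraps it). The learning rate, iteration count and unregularized bias are assumptions of this sketch.

```python
import numpy as np

def fit_logreg(X, y, C=1.0, lr=0.1, n_iter=2000):
    """Minimize 0.5*||w||^2 + C * sum(log(1 + exp(-y_i * (w.x_i + b)))), labels y in {-1, +1}.

    The bias b is not regularized here (an assumption; LIBLINEAR treats it differently).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        margin = y * (X @ w + b)
        # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m))
        g = -y / (1.0 + np.exp(margin))
        w -= lr * (w + C * (X.T @ g)) / n
        b -= lr * (C * g.sum()) / n
    return w, b

def predict_proba(X, w, b):
    """Probability that an instance is structurally non-neutral (class +1)."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(X, dtype=float) @ w + b)))
```

The output is a probability rather than a class label, which is what later allows sweeping decision thresholds.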

Many protein features may be relevant for the given prediction task. Our feature construction procedure followed a protocol established during the development of SNAP [17]. All features were derived from protein sequence alone and were extracted from PredictProtein [23], a wrapper that combines a large number of independent prediction methods. We used three conceptually different types of features: (1) global features describing the global characteristics of a protein, (2) local features describing one particular pentamer and its immediate sequence neighborhood, and (3) difference features that explicitly describe sequence-derived aspects by which wild type and mutant amino acid differ.

(1) Global features: We represented sequence length as four different values, each representing a length interval (1-60, 61-120, 121-180, 181-240 consecutive residues). The bin that contained the sequence length was set to 0.5, bins below were set to 1, and bins above to 0. Amino acid composition was encoded by 20 values representing relative frequencies of the standard amino acids. We predicted secondary structure and solvent accessibility using PROFphd [24,25]. Three values represented the relative content of residues in predicted helix, strand and loop conformation and, similarly, three values encoded the relative content of predicted buried, intermediate and exposed residues.
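The length and composition encodings can be sketched directly from the description. The text does not say how lengths above 240 residues are handled; here all four bins are then "below" the length and set to 1, which is an assumption.

```python
LENGTH_BINS = [(1, 60), (61, 120), (121, 180), (181, 240)]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def length_features(length):
    """Four values: 0.5 for the bin containing the length, 1 for bins below, 0 for bins above."""
    feats = []
    for lo, hi in LENGTH_BINS:
        if lo <= length <= hi:
            feats.append(0.5)
        elif hi < length:
            feats.append(1.0)  # also covers lengths > 240 (assumption)
        else:
            feats.append(0.0)
    return feats

def composition_features(seq):
    """Relative frequency of each of the 20 standard amino acids."""
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]
```

For example, a 100-residue chain is encoded as `[1.0, 0.5, 0.0, 0.0]`.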

(2) Local features: We used features that described the local sequence neighborhood of the amino acid change. We considered window lengths of 1 (position of change only), 5, 9, 13, 17 and 21 consecutive residues centered on the position of change. Values were normalized to the interval [0, 1]. The biochemical characteristics of an amino acid influence the local structural conformation. We considered six different structural and biochemical propensities: mass, volume [26], hydrophobicity [27], C-beta branching [28], helix breaker (only proline) and side chain charge. Evolutionary information contained in sequence profiles is a valuable source of knowledge about which amino acids are compatible with a specific region in the protein: while some residues are tolerated, others could disrupt structure. We used position-specific scoring matrices (PSSMs), relative amino acid frequencies and the information content per alignment position taken from PSI-BLAST [29] runs (options: -j 3 -b 3000 -e 1 -h 1e-3) against a sequence database consisting of UniProt [30] and PDB [11]. Sequences were redundancy-reduced to a level at which no protein pair had more than 80% sequence identity [12]. Furthermore, we used position-specific independent counts (PSIC [31]) and followed the protocol for sequence extraction and generation of multiple alignments described elsewhere [17]. In addition, we used the following predicted structural and functional features: secondary structure [32,33], solvent accessibility [24,25,32], protein flexibility [34], protein disorder [35-38], protein-protein interaction hotspots [39-41] and DNA-binding residues [42]. Most prediction methods used to generate features returned both a discrete prediction and a score reflecting the strength and reliability of the prediction. We incorporated both outputs in our feature set.
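The windowing can be sketched for the two propensities whose definitions are unambiguous in the text (C-beta branching and proline as helix breaker); mass, volume, hydrophobicity and charge would require numeric scales from the cited references and are omitted. Padding positions that fall outside the sequence with 0.0, and min-max scaling as the normalization to [0, 1], are assumptions of this sketch.

```python
BETA_BRANCHED = set("VIT")  # C-beta branched residues
HELIX_BREAKER = set("P")    # proline only, as in the text
WINDOW_LENGTHS = (1, 5, 9, 13, 17, 21)

def minmax(values):
    """Scale a property table to [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def beta_branched(aa):
    return 1.0 if aa in BETA_BRANCHED else 0.0

def helix_breaker(aa):
    return 1.0 if aa in HELIX_BREAKER else 0.0

def window_features(seq, pos, w, prop):
    """Per-position values of prop in a window of odd length w centered on pos."""
    half = w // 2
    return [prop(seq[i]) if 0 <= i < len(seq) else 0.0  # pad outside the chain
            for i in range(pos - half, pos + half + 1)]

def all_windows(seq, pos, prop):
    """Concatenate the values for every window length considered."""
    feats = []
    for w in WINDOW_LENGTHS:
        feats.extend(window_features(seq, pos, w, prop))
    return feats
```

Each (property, window length) combination then forms one candidate feature for the selection step.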
Two-state predictions (disorder, protein and DNA interaction) were encoded as two mutually exclusive combinations of 1 and 0, each representing the presence (1) or absence (0) of a state (e.g. disorder vs. no disorder). Three-state predictions (secondary structure states helix, strand and other; solvent accessibility states buried, intermediate and exposed) were handled similarly. Flexibility was predicted as a numerical value only. We considered the location of the site of change relative to a protein domain an important feature. For instance, a hydrophobic-to-polar substitution within the core of a domain may have a more severe impact on local structure than a change in a surface loop. We extracted relevant per-residue data from the protein family database Pfam-A [43] using the output from HMMER3 [44]. Of specific interest were whether the residue resided in a domain, the conservation of that position within the domain alignment, how well the residue fitted into the alignment position, and the posterior probability of that match.
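The mutually exclusive encoding of discrete predictions, combined with the raw score that most methods also return, can be sketched as below. The state codes (`"H"`, `"E"`, `"L"` and `"b"`, `"i"`, `"e"`) are hypothetical labels chosen for illustration.

```python
SEC_STRUCT = ("H", "E", "L")     # helix, strand, other (hypothetical codes)
ACCESSIBILITY = ("b", "i", "e")  # buried, intermediate, exposed (hypothetical codes)
BINARY = (True, False)           # e.g. disordered vs. not disordered

def one_hot(state, states):
    """Mutually exclusive 1/0 units: exactly one unit is set to 1."""
    if state not in states:
        raise ValueError(f"unknown state {state!r}; expected one of {states}")
    return [1 if s == state else 0 for s in states]

def encode_prediction(state, score, states):
    """Discrete prediction as one-hot units, followed by the method's raw score."""
    return one_hot(state, states) + [score]
```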

(3) Difference features: Of particular interest were features that captured the difference in characteristics between the two differing central amino acids in a pair of pentamers. We represented the difference in a particular property separately by its absolute value and its sign, encoded as 0 (negative) or 1 (positive). The following properties were encoded in this way: change in any of the six amino acid propensities, difference in conservation scores (PSSM, relative frequency, PSIC), change in IUPred predictions for both short and long disorder, and change in predicted secondary structure and solvent accessibility. For the latter two, we ran PROFphd on the raw sequence rather than on the sequence profile. Although this mode resulted in reduced prediction performance, it allowed us to observe an actual difference in the prediction outcome, which would otherwise have been masked by the use of sequence alignments.
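The absolute-value-plus-sign encoding can be sketched in a few lines. Two details are assumptions not fixed by the text: the difference is taken as mutant minus wild type, and a zero difference gets sign bit 0. The property names in the usage are hypothetical.

```python
def difference_features(wild_value, mutant_value):
    """Encode (mutant - wild type) as [|diff|, sign bit]; sign bit 1 = positive, 0 otherwise."""
    diff = mutant_value - wild_value
    return [abs(diff), 1 if diff > 0 else 0]

def difference_vector(wild, mutant, names):
    """Concatenate difference features for the named properties of two residues."""
    vec = []
    for name in names:
        vec.extend(difference_features(wild[name], mutant[name]))
    return vec
```

For instance, `difference_vector({"pssm": 2, "psic": 5}, {"pssm": 5, "psic": 1}, ["pssm", "psic"])` gives `[3, 1, 4, 0]`.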

Prediction method: feature selection

We restricted the training of our model to the most predictive sequence features. Toward this end, we considered one fifth of the pentamer pairs (2,243 structurally non-neutral, 2,882 neutral) and ensured that those pairs were derived from proteins without significant sequence similarity (EVAL>10⁻³) to any protein in the remaining four fifths of the data. Those 5,125 instances were further partitioned into ten subsets. Nine such sets were used to train a logistic regression model, and its performance was tested on the remaining set. We rotated ten times over all sets such that each instance served once for testing and otherwise for training, and guaranteed that no significant sequence similarity existed between training and test folds (EVAL>10⁻³). Before each new rotation, the set of features for training and testing the model was determined by the following iterative protocol. We started with one feature and established its predictive performance during one complete rotation as explained above. We did this for all global and difference features, as well as for every combination of local feature and window length. We measured feature performance by the average AUC (area under the receiver operating characteristic curve) over the ten test folds. The best performing feature was automatically included for the subsequent evaluation of the remaining features. We stopped this forward selection once the average AUC no longer increased by more than 0.001.
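The greedy forward selection can be sketched independently of the model: the caller supplies a `score` function that returns the mean AUC of a full ten-fold rotation for a given feature list. Starting from the random-level AUC of 0.5 is an assumption of this sketch, and the toy scorer in the usage note is invented purely to show the control flow.

```python
def forward_select(candidates, score, min_gain=1e-3):
    """Greedy forward feature selection.

    score(feature_list) -> mean AUC over one complete cross-validation rotation.
    Stops when the best remaining candidate improves the score by no more than min_gain.
    """
    selected, best = [], 0.5  # random-level AUC as the starting point (assumption)
    remaining = list(candidates)
    while remaining:
        trials = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(trials, key=lambda t: t[0])
        if top_score - best <= min_gain:
            break  # no further meaningful increase in average AUC
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best
```

With an additive toy scorer where one candidate adds only 0.0005 AUC, that candidate is never selected, mirroring the 0.001 stopping rule.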

Performance estimates

We assessed performance only on the test sets (as described above). Lacking a biological intuition for how to measure the success of our prediction method, we fell back on standard measures. Following the usual acronyms, TP (true positives) denotes pairs correctly predicted to change structure (positive) and FP (false positives) denotes neutral pairs predicted as change. Analogously, TN (true negatives) denotes correctly predicted neutral pairs (no change) and FN (false negatives) denotes structure-changing pairs incorrectly predicted as neutral. With these, we compiled ROC (Receiver Operating Characteristic) plots from the True Positive Rate (TPR) and the corresponding False Positive Rate (FPR), defined by:

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN} \tag{1}$$
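Sweeping the decision threshold over all predicted probabilities traces the ROC curve from equation (1), and the trapezoidal area under it gives the AUC (equivalently, the probability that a random structure-changing pair scores above a random neutral pair, ties counted as one half). This quadratic-time sketch assumes labels 1 = change and 0 = neutral, with both classes present.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points obtained by sweeping the threshold over all distinct scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))  # FPR = FP/(FP+TN), TPR = TP/(TP+FN)
    return points

def auc(points):
    """Trapezoidal area under the piecewise-linear ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

A random predictor gives an AUC near 0.5; perfect separation gives 1.0.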

The area under the ROC curve (AUC), averaged over ten rounds of training and testing, served as a single performance estimator. We also employed the overall two-state accuracy, i.e. the fraction of all pairs predicted correctly, often referred to as the Q2 measure. Finally, we monitored class-specific values for Accuracy_C, i.e. the accuracy for the class "structural change", Accuracy_N (accuracy for the class "neutral"), Coverage_C (coverage for the class "change") and Coverage_N (coverage for the class "neutral"), defined by:

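$$\mathrm{Accuracy}_C = \frac{TP}{TP + FP}, \qquad \mathrm{Accuracy}_N = \frac{TN}{TN + FN} \tag{2}$$

$$\mathrm{Coverage}_C = \frac{TP}{TP + FN}, \qquad \mathrm{Coverage}_N = \frac{TN}{TN + FP}$$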

Our logistic regression model yielded a probability for an instance to be structurally non-neutral rather than a discrete class label. By iterating over different probability thresholds, we sampled a ROC-like space of accuracy-coverage pairs for each of the two classes.
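The class-specific measures and the threshold sweep can be sketched as follows. Assumptions: labels are 1 = change and 0 = neutral, a pair is predicted as change when its probability exceeds the threshold (the change>0.5, neutral≤0.5 convention used in the Results), and undefined ratios are returned as NaN.

```python
def confusion(probs, labels, threshold=0.5):
    """Count TP, FP, TN, FN; predict change when prob > threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p > threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def _ratio(a, b):
    return a / b if b else float("nan")

def class_metrics(tp, fp, tn, fn):
    return {
        "Q2": _ratio(tp + tn, tp + fp + tn + fn),
        "AccuracyC": _ratio(tp, tp + fp),
        "CoverageC": _ratio(tp, tp + fn),
        "AccuracyN": _ratio(tn, tn + fn),
        "CoverageN": _ratio(tn, tn + fp),
    }

def sweep(probs, labels, thresholds):
    """Accuracy/coverage pairs for each class at each probability threshold."""
    return [(t, class_metrics(*confusion(probs, labels, t))) for t in thresholds]
```

Raising the threshold typically trades coverage of the change class for accuracy, which is the trade-off the accuracy-coverage plots display.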

Box plots

We presented distributions through box plots. The lower and upper box edges depict the first and third quartiles, respectively. The length of a box is the interquartile range of the distribution. The bold bar inside the box represents the median, while dashed lines extend to the most extreme data point that is no more than 1.5 times the interquartile range away from the upper or lower box edge. Note that by definition the box covers half of the distribution.
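The box plot statistics described above can be computed with the standard library. The quartile rule is an assumption: the plotting software is not named, so this sketch uses linear interpolation between order statistics (`statistics.quantiles` with `method="inclusive"`).

```python
import statistics

def box_stats(data):
    """Quartiles, median, IQR and 1.5*IQR whisker ends for one distribution."""
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # whiskers reach the most extreme data points still inside the fences
    lower_whisker = min(x for x in data if x >= lo_fence)
    upper_whisker = max(x for x in data if x <= hi_fence)
    return {"q1": q1, "median": med, "q3": q3, "iqr": iqr,
            "whiskers": (lower_whisker, upper_whisker)}
```

For ten AUC values per feature, as in the feature-selection figure, the box spans the middle five values.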

Results and discussion

Fitting parameters to observations easily ends in the trap of over-optimization [45]. We addressed this issue in two ways (Methods). Firstly, we carefully applied standard cross-validation techniques. This included setting aside pentamer pairs that were used only for feature selection, ensuring minimal sequence similarity between cross-validation sets, and avoiding over-sampling of the data set. Secondly, we evaluated the final method on completely different data sets.
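One way to keep sequence-similar proteins out of different cross-validation folds is to assign whole similarity clusters to folds. This sketch assumes the clusters are precomputed (for example by grouping proteins whose pairwise BLAST E-value is at most 10⁻³); the input format is hypothetical, and the greedy largest-cluster-first balancing is a design choice of this sketch, not the authors' documented procedure.

```python
from collections import defaultdict

def cluster_folds(pair_clusters, k=10):
    """Assign pentamer pairs to k folds so that each similarity cluster lands in one fold.

    pair_clusters: one (hypothetical) cluster id per pentamer pair.
    Largest clusters are placed first, each into the currently smallest fold.
    """
    members = defaultdict(list)
    for idx, cid in enumerate(pair_clusters):
        members[cid].append(idx)
    folds = [[] for _ in range(k)]
    for cid in sorted(members, key=lambda c: -len(members[c])):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[cid])
    return folds
```

Because no cluster is split, a test fold never contains pairs from proteins similar to those in the training folds, at the cost of slightly uneven fold sizes.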

Evolutionary and structural features most predictive

Our forward selection scheme (Methods) yielded the following features as most informative (Fig. 1): difference in PSIC between "native" and "mutant", predicted secondary structure (w=17), PSI-BLAST information for each residue (w=21), residue flexibility (w=21), difference in PSSM and predicted secondary structure between "native" and "mutant", HMMER scores for fitting amino acids into a Pfam domain alignment (w=13), predicted protein-protein interaction hotspots (w=13), and finally the amino acid volume (w=5). Due to the specific encoding of those properties (Methods), the overall feature space covered 147 numerical feature values.

Figure 1

Structural and evolutionary features most predictive. Input features ordered by their cumulative contribution to performance measured by AUC, i.e. the area under the ROC curve (AUC* indicates that these values refer to results for a subset of the full cross-validation set). Our forward feature selection scheme suggested that three features raised performance above 0.8: evolutionary information (PSIC [31] diff), predicted secondary structure (from PROFsec [32,33]) around the mutant (mutant position ± 8, i.e. 17 input units), and the PSI-BLAST information per residue for 21 consecutive residues. Six additional features increased performance only marginally, up to a mean AUC* of ~0.84: predicted flexibility (PROFbval, w=21), difference in both PSI-BLAST PSSM (PSSM diff) and predicted secondary structure scores (PROFsec diff), the fit of the change position into a Pfam domain (Pfam fit, w=13), scores for predicted protein-protein interaction hotspots (ISIS, w=13) and residue volumes (Volume, w=5). High variability in AUC* distributions (long box plots, strong overlap between box plots) indicates instability in the selected features.

Three features dominate, most features unstable

For the final assessment of our method, we applied full cross-validation. In this paragraph, however, the focus is on assessing the relative contribution of input features. Toward this end, we used only one fifth of the data, as one measure to avoid over-fitting. The numbers are, therefore, only relevant in a relative sense.

The success of the method was dominated by the first three features, as indicated by the steepest ascent in average AUC (Fig. 1, first three box plots and solid line). The very first property alone (difference in PSIC values between wild type and mutant residue) already gave an AUC of almost 0.72 (compared to the random value of 0.5). With the third feature (PSI-BLAST information per position, w=21), the discrimination reached an AUC of almost 0.82, close to the performance maximum. The inclusion of the last feature (residue volume) gave an AUC of ~0.84 (Fig. 1, last box plot). Thus, the most informative feature increased the AUC by about 0.2, the last six together by only one tenth of this.

The per-feature performance varied strongly in its AUC distributions (Fig. 1, long box plots). While this variance was most pronounced for the first feature (PSIC difference), the trend continued throughout the feature selection (the decrease in variability is easily explained by the decreasing performance gains). In the performance plateau regime, features were no longer distinguishable by the distributions of their ten AUC values (Fig. 1, nearly complete box plot overlap after the third feature). Nevertheless, we stopped the feature selection only when the AUC no longer improved by more than 10⁻³. This early stop was implemented as another safeguard against over-fitting.

Sequence-based prediction of structural impact successful

All performance measures reported in the following were compiled from a 10-fold cross-validation (Methods). The logistic regression model estimates the probability of structural change. Through a simple threshold, this probability yields a binary prediction (e.g. change>0.5, neutral≤0.5) with an overall two-state per-residue accuracy Q2>72%. We also established ROC curves and accuracy-coverage plots by dialing through the whole spectrum of probability values (Fig. 2A). The final model reached an overall AUC of ~0.8.

Figure 2

Good discrimination between pentamers with and without effect. All values refer to full cross-validation averages on data not used for feature optimization. Left panel (A): our best model (solid line) reached an AUC of 0.8, compared to random predictions (dashed line) with AUC=0.5. Right panel (B): predictions of an effect on structure (change) and predictions of no effect (neutral) reached similar levels in the accuracy vs. coverage plot (equation 2).

Both of the above measures assess overall performance without explicitly revealing per-class (change/neutral) levels. We therefore investigated pairs of coverage/accuracy values sampled at different probability thresholds. More than half of neutral and non-neutral predictions (52%) reached around 80% accuracy (Fig. 2B); at higher accuracy, the correct predictions were dominated by predictions of effect.

These results suggested that sequence suffices to predict the impact of point mutations upon structure through machine learning. This is particularly remarkable given that pentamer conformations depend crucially on their structural environment outside the windows that we considered as input features in our prediction method [46-48].

Structural effect predictions enriched in functional impact

Our explicit objective was to predict the impact of single point mutations upon local structure. The implicit objective was to also develop a new perspective that aids in predicting how mutations affect function. While it is clear that the subset of all mutations that locally change structure will be enriched in mutations that also affect function, the inverse is not true: mutations that do not change structure may or may not change function, i.e. they will not be enriched in "functionally neutral". If our prediction method captured important aspects of structural change, then at best its predictions of structural impact will be enriched in mutations with functional impact.

We tested this alternative perspective on performance in two ways. On the one hand, we used a data set distinguishing amino acid mutations (nsSNPs) that impact function from those that do not. On the other hand, we used a data set of mutants that do and do not affect protein stability. Two results stood out from this analysis. First, mutations predicted to affect structure were enriched in those that also affect function (Fig. 3, ascending dashed curve). Second, the enrichment increased with the severity of predicted structural change: from over 76% to over 81% at a probability >0.9 (Fig. 3). We observed a similar trend for the stability data: enrichment in predicted structural effect mutations was 8-13 percentage points above random (random: 50%, enrichment: 58%-63%, Fig. 3). Due to the smaller sample size, the stability enrichment was less significant than that for functional effect.
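The enrichment analysis reduces to a simple ratio per threshold: among mutants predicted to change structure, the fraction with an observed effect, compared with the background rate in the whole set. As a sanity check on the background, the function data set described in Methods gives 35,585 / (12,461 + 35,585) ≈ 0.74 and the stability set 657 / (657 + 652) ≈ 0.50. The function name and thresholds below are illustrative; observed effects are assumed to be coded 1 (effect) and 0 (neutral).

```python
def enrichment(probs, observed_effect, thresholds=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Background effect rate, plus (threshold, n_predicted, effect_fraction) rows.

    A mutant counts as predicted to change structure when prob > threshold.
    """
    background = sum(observed_effect) / len(observed_effect)
    rows = []
    for t in thresholds:
        hits = [e for p, e in zip(probs, observed_effect) if p > t]
        frac = sum(hits) / len(hits) if hits else float("nan")
        rows.append((t, len(hits), frac))
    return background, rows
```

Reporting the subset size alongside each fraction makes it visible when high thresholds leave too few mutants for a reliable estimate, which matters for the small stability set.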

Figure 3

Mutations predicted to affect structure often affect function. We investigated the subset of mutants predicted by our method to change structure, from uncertain predictions (>0.5) to very strong predictions (>0.9). Residues in this subset were found more often than expected to also have an observed effect in data sets other than those used to develop our method, namely on protein stability (solid line) and protein function (dashed line). This suggests that strong structural change upon amino acid change is associated with an increased likelihood of altering function or stability.

The above results strongly suggested that our method captured important information beyond its explicit training task. The enrichment over the background might not seem particularly strong (for function: background about 74% vs. 81% predicted; for stability: background 50% vs. 63% predicted). However, it remains unclear what to compare this enrichment with: some mutations affect structure but not function. So what would the enrichment become if we had complete experimental information correlating all possible assays for structure and function change? Does our method pick up a significant fraction of the possible signal? We have no means of answering this question. Nevertheless, our prediction method clearly captured a signal pointing in the expected direction: increasing severity of structural effect upon amino acid change is linked with an accumulation of mutants having an effect on protein function or stability. This achievement was truly "novel", and it provides information that seems orthogonal to what any other method could have provided.

Signal for the opposite: predicted functional impact more pronounced in structural change

In the previous paragraph, we established that our structure impact predictions capture some signal of functional change. What about the reverse, i.e. to what extent do methods that aim at predicting impact on function (e.g. SNAP [17]) and on stability (e.g. I-Mutant3 [21]) correctly capture the impact of mutations upon structure? First, we provided the "background" by applying our structural effect method (Fig. 4A+D; data for cross-validation). Both SNAP (Fig. 4B+E) and I-Mutant3 (Fig. 4C+F) failed to separate mutations with and without impact on structure. SNAP at least picked up some signal: very few mutations with impact on structure were predicted at scores corresponding to predictions of strong effect upon function. At the default probability threshold of 0.5, our method correctly predicted 69% of all effect pentamers (Fig. 4D, left dark blue bar) and 76% of all neutral pentamers (Fig. 4D, right light blue bar). The corresponding numbers were 39% functional effect among structural effect / 88% functionally neutral among neutral for SNAP (Fig. 4E), and 33% effect on stability among structural effect / 72% no effect on stability among neutral for I-Mutant3 (Fig. 4F).

Figure 4

Correlation between structure and function not picked up by other methods. We applied three prediction methods to our dataset of structural effect: (A, D) the new method introduced here, (B, E) SNAP [17] predicting impact on function, and (C, F) I-Mutant3 [21] predicting impact on stability. Lacking a better alternative, we chose the default threshold for each method (horizontal dashed lines) to distinguish neutral from effect. The method introduced here, which is specialized to separate structural effect from neutral, performs best at this task (A: little overlap between boxes; note: data in cross-validation mode of our method). Neither the SNAP (functional effect prediction) nor the I-Mutant3 (stability prediction) distributions capture the structure signal.

One conclusion from applying SNAP and I-Mutant3 to our data is that only our method succeeded at the task that we had set. One possible explanation is that our task is incorrectly formulated, i.e. that our data set of pentamers with and without local structural change is wrong. Imagine we had assigned labels to pentamers randomly. Then SNAP and I-Mutant3 would fail. If the labels had truly been random, our own method would fail, too. Now assume the labels are not random but biophysically meaningless (e.g. mutations to aromatic amino acids cause change, all others are neutral). If this assumption were fully true, our method would not have picked up a signal in the other data sets that we tested (Fig. 3). Furthermore, if our data set were complete nonsense, SNAP could not have picked up even a weak signal. The fact that I-Mutant3 does not pick up a signal may point to the difference between local changes (as targeted here) and global changes (as targeted by I-Mutant3).

All the above considerations support the view that our definition of local structural change captures an important characteristic of the response of proteins to amino acid changes, and that the method introduced here succeeds at solving the task that we posed.

Conclusions

How do point mutations change the life of a protein? Here, we introduced three new views toward tackling this question. Firstly, we introduced a different perspective on change: structural effect, by our definition, means that two protein fragments have a significant dissimilarity in backbone conformation. Secondly, we created a new dataset that allowed us to successfully train a machine-learning model with the objective of separating structurally neutral from non-neutral fragments. Thirdly, we established that both our method and our definition of structural change also capture, to some extent, the impact of change on protein function. It remains to be investigated in more detail how exactly the new method can aid in annotating the impact of amino acid changes and nsSNPs.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

CS carried out the data analysis and programming, and helped to draft the manuscript. BR conceived and supervised the project, and helped to draft the manuscript. All authors read and approved the final manuscript.

Supplementary Material

Additional file 1:

Datasets of mutants with observed effects on function and stability. Archive of the two different mutant sets with observed effects, along with predictions of their effect on local structure.

Acknowledgements

We thank Yana Bromberg (Rutgers), Marco Punta (Sanger) and Ulrich Mansmann (LMU Munich) for helpful discussions. Special thanks go to Laszlo Kajan, Guy Yachdav and Tim Karl (TUM Munich) for maintenance of our compute cluster and to Marlena Drabik (TUM Munich) for administrative support. Particular thanks to the anonymous reviewers and the two editors (Emidio Capriotti, University of the Balearic Islands, and Yana Bromberg, Rutgers) for their important help. Last, not least, thanks to all those who deposit their experimental data in public databases and to those who maintain these databases.

Funding: CS and BR were funded by the Alexander von Humboldt Foundation.

This article has been published as part of BMC Genomics Volume 13 Supplement 4, 2012: SNP-SIG 2011: Identification and annotation of SNPs in the context of structure, function and disease. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/13/S4.

References

  • Shakhnovich EI, Gutin AM. Influence of point mutations on protein structure: probability of a neutral mutation. Journal of theoretical biology. 1991;149(4):537–546.
  • Sander C, Schneider R. Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins. 1991;9(1):56–68.
  • Rost B. Twilight zone of protein sequence alignments. Protein engineering. 1999;12(2):85–94.
  • Eriksson AE, Baase WA, Zhang XJ, Heinz DW, Blaber M, Baldwin EP, Matthews BW. Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect. Science. 1992;255(5041):178–183.
  • Garcia-Seisdedos H, Ibarra-Molero B, Sanchez-Ruiz JM. How many ionizable groups can sit on a protein hydrophobic core? Proteins. 2011;80(1):1–7.
  • Xu J, Baase WA, Baldwin E, Matthews BW. The response of T4 lysozyme to large-to-small substitutions within the core and its relation to the hydrophobic effect. Protein science: a publication of the Protein Society. 1998;7(1):158–177.
  • Gong S, Blundell TL. Structural and functional restraints on the occurrence of single amino acid variations in human proteins. PLoS One. 2010;5(2):e9186.
  • Sunyaev S, Ramensky V, Bork P. Towards a structural basis of human non-synonymous single nucleotide polymorphisms. Trends Genet. 2000;16(5):198–200.
  • Wang Z, Moult J. SNPs, protein structure, and disease. Human mutation. 2001;17(4):263–270.
  • De Filippis V, Sander C, Vriend G. Predicting local structural changes that result from point mutations. Protein engineering. 1994;7(10):1203–1208.
  • Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Res. 2000;28(1):235–242.
  • Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658–1659.
  • Mika S, Rost B. UniqueProt: Creating representative protein sequence sets. Nucleic Acids Res. 2003;31(13):3789–3791.
  • Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22(12):2577–2637.
  • McLachlan A. Rapid comparison of protein structures. Acta Crystallographica Section A. 1982;38(6):871–873.
  • ProFit. http://www.bioinf.org.uk/software/profit/
  • Bromberg Y, Rost B. SNAP: predict effect of non-synonymous polymorphisms on function. Nucleic Acids Res. 2007;35(11):3823–3835.
  • Kawabata T, Ota M, Nishikawa K. The Protein Mutant Database. Nucleic Acids Res. 1999;27(1):355–357.
  • Capriotti E, Fariselli P, Casadio R. I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res. 2005;33(Web Server issue):W306–310.
  • Kumar MD, Bava KA, Gromiha MM, Prabakaran P, Kitajima K, Uedaira H, Sarai A. ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactions. Nucleic Acids Res. 2006;34(Database issue):D204–206.
  • Capriotti E, Fariselli P, Rossi I, Casadio R. A three-state prediction of single point mutations on protein stability changes. BMC bioinformatics. 2008;9(Suppl 2):S6.
  • Fan R-E, Chang K-W, Hsieh C-J, Wang X-R, Lin C-J. LIBLINEAR: A Library for Large Linear Classification. J Mach Learn Res. 2008;9:1871–1874.
  • Rost B, Yachdav G, Liu J. The PredictProtein server. Nucleic Acids Res. 2004;32(Web Server issue):W321–326.
  • Rost B. In: Methods in enzymology. Russell FD, editor. Vol. 266. Academic Press; 1996. PHD: Predicting one-dimensional protein structure by profile-based neural networks; pp. 525–539.
  • Rost B. How to Use Protein 1D Structure Predicted by PROFphd. The Proteomics Protocols Handbook. 2005. pp. 875–901.
  • Zamyatnin AA. Protein volume in solution. Progress in biophysics and molecular biology. 1972;24:107–123.
  • Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. Journal of molecular biology. 1982;157(1):105–132.
  • Betts MJ, Russell RB. Amino acid properties and consequences of substitutions. Bioinformatics for Geneticists. 2003;317.
  • Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–3402.
  • Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M. et al. The Universal Protein Resource (UniProt). Nucleic Acids Res. 2005;33(Database issue):D154–159.
  • Sunyaev SR, Eisenhaber F, Rodchenkov IV, Eisenhaber B, Tumanyan VG, Kuznetsov EN. PSIC: profile extraction from sequence alignments with position-specific counts of independent observations. Protein engineering. 1999;12(5):387–394.
  • Rost B, Sander C. Combining evolutionary information and neural networks to predict protein secondary structure. Proteins. 1994;19(1):55–72.
  • Rost B, Sander C. Prediction of protein secondary structure at better than 70% accuracy. Journal of molecular biology. 1993;232(2):584–599.
  • Schlessinger A, Yachdav G, Rost B. PROFbval: predict flexible and rigid residues in proteins. Bioinformatics. 2006;22(7):891–893.
  • Schlessinger A, Liu J, Rost B. Natively unstructured loops differ from other loops. PLoS computational biology. 2007;3(7):e140.
  • Schlessinger A, Punta M, Rost B. Natively unstructured regions in proteins identified from contact predictions. Bioinformatics. 2007;23(18):2376–2384.
  • Dosztanyi Z, Csizmok V, Tompa P, Simon I. IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics. 2005;21(16):3433–3434.
  • Schlessinger A, Punta M, Yachdav G, Kajan L, Rost B. Improved disorder prediction by combination of orthogonal approaches. PLoS One. 2009;4(2):e4433.
  • Ofran Y, Rost B. ISIS: interaction sites identified from sequence. Bioinformatics. 2007;23(2):e13–16.
  • Ofran Y, Rost B. Protein-protein interaction hotspots carved into sequences. PLoS computational biology. 2007;3(7):e119.
  • Ofran Y, Rost B. Analysing six types of protein-protein interfaces. Journal of molecular biology. 2003;325(2):377–387.
  • Ofran Y, Mysore V, Rost B. Prediction of DNA-binding residues from sequence. Bioinformatics. 2007;23(13):i347–353.
  • Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K. et al. The Pfam protein families database. Nucleic Acids Res. 2010;38(Database issue):D211–222.
  • Finn RD, Clements J, Eddy SR. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011;39(Web Server issue):W29–37.
  • Smialowski P, Frishman D, Kramer S. Pitfalls of supervised feature selection. Bioinformatics. 2010;26(3):440–443.
  • Kabsch W, Sander C. On the use of sequence homologies to predict protein structure: identical pentapeptides can have completely different conformations. Proceedings of the National Academy of Sciences of the United States of America. 1984;81(4):1075–1078.
  • Cerpa R, Cohen FE, Kuntz ID. Conformational switching in designed peptides: the helix/sheet transition. Folding & design. 1996;1(2):91–101.
  • Fliess A, Motro B, Unger R. Swaps in protein sequences. Proteins. 2002;48(2):377–387.

Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3395892/
