Potential useful information to review and add to the readthedocs page - from learning unit 5 #368
Map content in HPLC/MS
The quantitative information in an LC-MS map can be used in numerous applications. The spectrum ranges from additive measurements in analytical chemistry, over analysis of time series in expression experiments, to applications in clinical diagnostics, in which we want to find statistically significant markers for detecting certain disease states. All these applications have in common that we need to relate the same peptides in different measurements to each other. This is usually done under the assumption that the measured m/z and RT of a peptide stay roughly constant.
As with every laboratory experiment, this only holds true to a certain extent. In particular, the RT often shows large shifts and possibly distortions when different runs are compared, and the m/z dimension might also show (relatively smaller) distortions. This makes the assignment of similar peptides difficult, since the relative shift of two maps to each other is not known in advance. It is crucial to correct for those warps. Otherwise, it is hard or even impossible to find, for a peptide in the first map, the corresponding partner in the second map.
Goal of map alignment
The goal is to correct these shifts so that the same peptide appears at the same position in every map. Many applications are only possible if we achieve this goal, since only then do we know which features belong to each other. For example, when running replicate measurements of a sample (to assess the variance in quantitation), we need to group corresponding features together.
Often an affine alignment is sufficient. However, nonlinear distortions are possible. In that case one can compute a more accurate local alignment using LOESS regression. LOESS regression (often also called LOWESS) is a locally weighted polynomial regression based on a pre-defined window size. Only points within this window contribute to the local regression.
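The two alignment strategies above can be sketched in plain Python. This is a minimal illustration, not the implementation used by any particular tool: the function names `fit_affine` and `loess_predict`, the tricube weighting, the local linear (degree 1) fit, and the retention time pairs are all assumptions made for the example.

```python
def fit_affine(x, y):
    """Least-squares fit of y ~ a * x + b; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def loess_predict(x, y, x0, window):
    """Locally weighted linear fit evaluated at x0.

    Only points with |x - x0| < window contribute; they are weighted
    with the tricube kernel (an assumed, commonly used choice).
    """
    xs, ys, ws = [], [], []
    for xi, yi in zip(x, y):
        d = abs(xi - x0) / window
        if d < 1.0:
            xs.append(xi)
            ys.append(yi)
            ws.append((1.0 - d ** 3) ** 3)
    if not ws:
        raise ValueError("no points inside the window")
    sw = sum(ws)
    mx = sum(w * xi for w, xi in zip(ws, xs)) / sw
    my = sum(w * yi for w, yi in zip(ws, ys)) / sw
    sxx = sum(w * (xi - mx) ** 2 for w, xi in zip(ws, xs))
    sxy = sum(w * (xi - mx) * (yi - my) for w, xi, yi in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


# Hypothetical retention times (seconds) of the same peptides in two maps.
rt_map_a = [100.0 * i for i in range(1, 11)]
rt_map_b = [1.02 * rt + 15.0 for rt in rt_map_a]  # purely affine shift

a, b = fit_affine(rt_map_a, rt_map_b)
local = loess_predict(rt_map_a, rt_map_b, 450.0, window=250.0)
print(f"affine: rt_b = {a:.3f} * rt_a + {b:.1f}; local estimate at 450 s: {local:.1f}")
```

For a purely affine shift both approaches agree. The local fit only pays off when the distortion changes along the gradient, and its result then depends on the chosen window size.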
Stable isotope labeling by amino acids in cell culture (SILAC)
Label-free quantification
Label-free quantification is a method that aims to determine the relative amount of proteins in two or more biological samples. It may be based on precursor signal intensity or on spectral counting. The first method is useful when applied to high-precision mass spectra. In contrast, spectral counting simply counts the number of spectra identified for a given peptide in different biological samples and then integrates the results for all measured peptides of the protein(s) being quantified. The computational framework includes detecting peptides, matching the corresponding peptides across multiple maps, and selecting discriminatory peptides.
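Spectral counting as described above amounts to two counting steps. The sketch below is only an illustration: the sample names, protein accessions, peptide sequences, and the function name `spectral_counts` are hypothetical, and real workflows usually add normalization that is not shown here.

```python
from collections import Counter, defaultdict

# Hypothetical identifications: one (sample, protein, peptide) entry per
# identified spectrum.
identified_spectra = [
    ("control", "P1", "PEPTIDEK"),
    ("control", "P1", "PEPTIDEK"),
    ("control", "P1", "ANOTHERK"),
    ("control", "P2", "SAMPLER"),
    ("treated", "P1", "PEPTIDEK"),
    ("treated", "P2", "SAMPLER"),
    ("treated", "P2", "SAMPLER"),
    ("treated", "P2", "SAMPLER"),
]


def spectral_counts(spectra):
    """Count spectra per peptide, then sum the peptide counts per protein."""
    per_peptide = Counter(spectra)
    per_protein = defaultdict(int)
    for (sample, protein, _peptide), count in per_peptide.items():
        per_protein[(sample, protein)] += count
    return per_peptide, dict(per_protein)


per_peptide, per_protein = spectral_counts(identified_spectra)
for (sample, protein), count in sorted(per_protein.items()):
    print(sample, protein, count)
```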
Analysis strategy
Feature finding
Let us focus on quantification through the ion current in MS spectra. In this case, MS intensity follows the chromatographic concentration. Some properties of an MS map are given below:
Up to millions of points per spectrum
Tens of thousands of spectra per LC run
Huge 2D datasets of up to hundreds of GB per sample
Raw data: unmodified detector signal
Centroided data: peaks called on the MS level
We then apply feature finding to reduce the data complexity while keeping the features (peaks).
Isotope patterns
The monoisotopic peak is the mass peak corresponding to the monoisotopic mass of an analyte. It plays a central role in many mass spectrometry processing tasks.
Most elements have several naturally occurring isotopes, so we usually don’t observe isotopically pure molecules. Instead, we observe each possible version with a certain probability determined by the relative isotopic abundances.
Molecule species that differ in the number of neutrons are called isotopologues. Note that this implies that different isotopologues have different masses.
For example, for a single carbon atom there are two variants: it is either a carbon-12 or a carbon-13 isotope. Because the relative abundance of carbon-12 is 98.93% and that of carbon-13 is 1.07%, we observe carbon-12 with p=0.9893 and carbon-13 with p=0.0107. A single C thus has two isotopologues, and measuring it in a mass spectrometer gives rise to two peaks: the monoisotopic peak (carbon-12) and the carbon-13 peak (see below).
For C_n, each of the n carbon positions can hold either a carbon-12 or a carbon-13, resulting in 2^n configurations. Configurations with the same number of carbon-13 atoms have the same mass, so in the mass spectrum we observe n + 1 peaks corresponding to all n + 1 isotopologues.
Example: Isotope pattern of C1000. Note the bell shape (= isotopic envelope) of the pattern.
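The n + 1 peak intensities of a pure carbon molecule follow a binomial distribution over the number of carbon-13 atoms. The sketch below computes them with the 1.07% carbon-13 abundance quoted above. The function name `carbon_isotopologue_probs` is made up for this example, and a recurrence is used instead of the closed formula only to stay numerically stable for large n.

```python
P13C = 0.0107  # relative abundance of carbon-13, as quoted in the text


def carbon_isotopologue_probs(n, p13=P13C):
    """Probability of observing k carbon-13 atoms in C_n, for k = 0..n.

    Binomial distribution, built with the recurrence
    P(k) = P(k - 1) * (n - k + 1) / k * p13 / (1 - p13).
    """
    probs = [(1.0 - p13) ** n]
    ratio = p13 / (1.0 - p13)
    for k in range(1, n + 1):
        probs.append(probs[-1] * (n - k + 1) / k * ratio)
    return probs


single_carbon = carbon_isotopologue_probs(1)
c1000 = carbon_isotopologue_probs(1000)
most_abundant = c1000.index(max(c1000))
print("C1:", single_carbon)
print("C1000: number of peaks =", len(c1000), "most abundant at k =", most_abundant)
```

For C1000 the most abundant isotopologue is far from the monoisotopic one (k = 0), which is why the envelope looks bell-shaped rather than monotonically decreasing.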
Biomolecules contain more than one element. Peptides, for example, contain C, H, N, O, P and S, giving rise to more complex isotope patterns. Depending on factors such as instrument resolution, a modern mass spectrometer can resolve mass peaks stemming from different isotopes of different elements. One says that it can resolve the isotopic fine structure of an analyte. This is often required to distinguish small molecules, but it is less important for the analysis of peptides.
Example: The molecule CO has 6 different configurations. Two (for carbon-12 and carbon-13) times three (for oxygen-16, oxygen-17, oxygen-18) different peaks of the isotopic fine structure can be observed in theory. This means that our mass spectrometer must resolve mass-to-charge peaks of configurations with the same number of protons and neutrons. E.g. consider the two configurations 13C16O and 12C17O. Both have in total 14 protons and 15 neutrons, but their masses are slightly different: 28.9983 u and 28.9991 u.
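The six configurations of CO and their masses can be enumerated directly. The isotope masses below are standard literature values rounded to eight decimals and are not taken from the text. The final resolving power figure is only the rough ratio m/Δm, an illustrative estimate rather than an instrument specification.

```python
from itertools import product

# Isotope masses in u (assumed standard values, rounded).
CARBON = {"12C": 12.0, "13C": 13.00335484}
OXYGEN = {"16O": 15.99491462, "17O": 16.99913176, "18O": 17.99915961}

# All combinations of one carbon isotope with one oxygen isotope.
co_masses = {
    c_name + o_name: c_mass + o_mass
    for (c_name, c_mass), (o_name, o_mass) in product(CARBON.items(), OXYGEN.items())
}

for name, mass in sorted(co_masses.items(), key=lambda item: item[1]):
    print(f"{name}: {mass:.4f} u")

# The two configurations with the same nominal mass of 29.
delta = co_masses["12C17O"] - co_masses["13C16O"]
rough_resolving_power = co_masses["13C16O"] / delta
print(f"difference: {delta * 1000:.2f} mDa, m / delta_m is roughly {rough_resolving_power:.0f}")
```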
Example 2 (see Figure below): The big spectrum on the bottom shows the isotopic fine structure of the isotopic peak of insulin that is roughly 5 Da above the monoisotopic peak (*, upper left).
A mass resolving power, m/FWHM > 2,300,000 is required to resolve the two closely spaced species with five 13Cs vs. three 13Cs and one 18O, differing by only 2.5 mDa.
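The 2.5 mDa spacing can be checked from isotope mass differences: swapping two 13C atoms for one 18O keeps the nominal mass but changes the exact mass slightly. In the sketch below the isotope masses are assumed standard values, and the ion mass of about 5800 Da for insulin is an approximation introduced for this example, not a number from the text. The ratio m/Δm is used as a rough stand-in for the required m/FWHM.

```python
# Isotope masses in u (assumed standard values, rounded).
MASS_12C, MASS_13C = 12.0, 13.00335484
MASS_16O, MASS_18O = 15.99491462, 17.99915961

# Five 13C versus three 13C plus one 18O: the two species differ by
# exchanging two 13C substitutions for one 18O substitution.
two_13c_shift = 2 * (MASS_13C - MASS_12C)
one_18o_shift = MASS_18O - MASS_16O
delta_u = two_13c_shift - one_18o_shift

APPROX_INSULIN_MASS = 5800.0  # Da; rough, assumed value for illustration
resolving_power = APPROX_INSULIN_MASS / delta_u

print(f"mass difference: {delta_u * 1000:.2f} mDa")
print(f"m / delta_m at about 5.8 kDa: {resolving_power:,.0f}")
```

With these assumptions the estimate lands slightly above the 2,300,000 quoted above. The exact figure depends on the ion mass and on the peak separation criterion used.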
Averagine
Since the isotope pattern changes with the composition of the peptide, it is not known in advance which pattern should be fitted. If we want to determine the monoisotopic mass from the average mass, it is better to use the observed distribution of amino acids than to assume that all amino acids occur with the same probability. If we assume an average composition of an amino acid, we can estimate the elemental composition of the peptide. Such an average amino acid, also called ‘averagine’, can be derived statistically from protein databases (Senko et al., 1995): C4.94H7.76N1.36O1.48S0.04, with an average mass of 111.1254 Da.
[1] Senko, Michael W., Steven C. Beu, and Fred W. McLafferty. "Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions." Journal of the American Society for Mass Spectrometry 6.4 (1995): 229-233.
Based on averagine compositions, one can compute the isotope pattern for any given mass. To obtain a model molecular formula, the number of averagine units in a molecule is determined from the average molecular mass, and this number is then multiplied by the number of atoms of each element in an averagine residue. Because calculation of the theoretical isotopic profile requires integral numbers of atoms, the values obtained for C, N, O, and S are rounded to the nearest integer, and the final average molecular mass is corrected by adjusting the number of Hs. Rounding errors induced by the addition or subtraction of half a C, N, O, or S and numerous Hs do not shift the isotopic distribution by a significant amount.
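The procedure above can be written as a short function. This is a sketch under stated assumptions: the averagine composition is given with four decimals as published by Senko et al. (the text quotes it rounded to two decimals), the average atomic masses are rounded standard values, and the function names `averagine_formula` and `formula_mass` are made up for this example.

```python
# Averagine composition with four decimals (Senko et al., 1995).
AVERAGINE = {"C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417}
AVERAGINE_MASS = 111.1254  # average mass of one averagine unit in Da

# Average atomic masses in Da (assumed standard values, rounded).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def formula_mass(formula):
    """Average mass of a formula given as {element: atom count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def averagine_formula(average_mass):
    """Model formula for a molecule of the given average mass.

    C, N, O and S are rounded to the nearest integer; the number of
    hydrogens is then chosen so that the total mass matches the target.
    """
    units = average_mass / AVERAGINE_MASS
    formula = {element: round(units * AVERAGINE[element]) for element in "CNOS"}
    remaining = average_mass - formula_mass(formula)
    formula["H"] = max(0, round(remaining / ATOMIC_MASS["H"]))
    return units, formula


units, formula = averagine_formula(20000.0)
print(f"averagine units: {units:.2f}")
print("model formula:", formula)
print(f"model mass: {formula_mass(formula):.2f} Da")
```

Note that rounding to the nearest integer picks 8 sulfur atoms for a 20 kDa compound, whereas the worked example in the text rounds down to 7. According to the passage above, either choice changes the isotopic distribution only marginally once the hydrogens are adjusted.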
For example, the 20-kDa model compound would be composed of 179.98 averagine units and should therefore contain 7.5 sulfur atoms. The abundances for the isotopic peaks obtained when the number of sulfurs is rounded down to 7 (while adding 16 hydrogens) differ by less than 1% relative to the isotopic abundances