Data Processing
This section covers the data processing features in UniDec.
The m/z range can be set in two ways. The first is to manually enter a minimum and maximum in the boxes labeled min and max in the top right corner of the user interface and then click the Process Data button. This narrows the mass spectrum to the range that has been set. The other option is to hold down the left mouse button and drag across the mass spectrum, which zooms in on the selected region. You can then right-click to save the zoomed view as the new minimum and maximum of the m/z range. Right-clicking on the spectrum automatically processes the data, so there is no need to press the Process Data button. Clicking the full button (Version 4.2.1 and later) in the top right zooms all the way out to the full mass spectrum.
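As a rough illustration of what narrowing the m/z range does to the data, the sketch below crops a two-column array (m/z, intensity) to a chosen window. The function name `crop_mz` and the synthetic spectrum are assumptions made for this example and are not taken from UniDec itself.

```python
import numpy as np

def crop_mz(data, mz_min, mz_max):
    """Keep only the rows whose m/z (column 0) lies within [mz_min, mz_max]."""
    keep = (data[:, 0] >= mz_min) & (data[:, 0] <= mz_max)
    return data[keep]

# Synthetic spectrum: column 0 is m/z, column 1 is intensity
spectrum = np.array([[1000.0, 0.2], [2000.0, 1.0], [3000.0, 0.6],
                     [4000.0, 0.3], [5000.0, 0.1]])
cropped = crop_mz(spectrum, 2000, 4000)
print(cropped[:, 0])  # m/z values that survive the crop
```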
Once the data has been processed, by either clicking the Process Data button or right-clicking on the spectrum, UniDec creates a file with the processed mass spectrum. If you open the folder that originally held the data, you will see a new folder called _unidecfiles. It contains three files: an input file, a config file, and a raw data file. The input file holds the x and y coordinates of the processed spectrum, so you can use it for plotting in other programs. The raw data file is a copy of the raw data in UniDec. The config file contains the parameters that were set in UniDec for processing the data. The config file can be opened in UniDec (File > Load External Config File or drag and drop) for other data to ensure that each spectrum goes through the same data processing.
Under the standard data processing options, we can use the check box to turn on background subtraction. Remember that you need to click the Process Data button after turning on background subtraction to save the new parameters for processing the data. The background subtraction check box turns on a curved background with a setting of 100, which can be adjusted further in the Advanced Data Processing tab.
Under the Additional Data Processing Parameters tab, Gaussian smoothing is useful for smoothing out noise in a mass spectrum. Entering a value of 5 will smooth the data with a Gaussian that has a width of 5 data points. Specifically, the number specifies the sigma in the gaussian_filter SciPy function. This can be adjusted according to what is appropriate for the given spectrum. Clicking process data once an input value is set will apply the Gaussian smoothing.
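To make the meaning of the smoothing width concrete, here is a minimal NumPy-only sketch of Gaussian smoothing where sigma is given in data points. It is a stand-in written for this example rather than the SciPy gaussian_filter call itself, so edge handling and kernel truncation may differ slightly from what UniDec produces. The name `gaussian_smooth` is hypothetical.

```python
import numpy as np

def gaussian_smooth(intensity, sigma):
    """Smooth a 1D intensity array with a Gaussian of width sigma (in data points)."""
    values = np.asarray(intensity, dtype=float)
    if sigma <= 0:
        return values.copy()
    radius = int(4 * sigma + 0.5)          # truncate the kernel at about 4 sigma (assumption)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()                 # normalize so total intensity is preserved
    padded = np.pad(values, radius, mode="symmetric")  # mirror the edges
    return np.convolve(padded, kernel, mode="valid")

# A single sharp spike spreads into a Gaussian with sigma = 5 points
spike = np.zeros(101)
spike[50] = 1.0
smoothed = gaussian_smooth(spike, 5)
```

The smoothed array has the same length as the input, the peak stays at the same position, and its height drops because the intensity is spread over neighboring points.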
The Bin Every parameter sets the number or width of bins used for linearization, which is the last option in the Advanced Data Processing Features (see more info below). Note: if the bin size is set to 0, linearization is turned off.
In UniDec, there are three options for background subtraction: minimum, line, and curved. The default option from the checkbox is subtract curved 100.
In subtract minimum, the intensity of the lowest data point is subtracted from the whole spectrum. This shifts the whole spectrum down so that the lowest intensity data point has an intensity of 0. For example, if the minimum data point has a relative intensity of 0.1, then 0.1 is subtracted from the intensity of every data point in the spectrum. The minimum background subtraction is most useful for a constantly raised baseline where you want to drop all points by the same amount. The value in the box doesn’t matter for minimum; any value other than 0 will turn it on.
Subtract line averages the first and last n data points of the spectrum, draws a line between the two averages, and subtracts that line from the whole spectrum. The value in the box determines the number of data points that are averaged at the front and back end of the spectrum. If the value is set to 100, then the first 100 data points and the last 100 data points will be averaged. The line baseline subtraction is most useful for a linearly sloping baseline.
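The line subtraction described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not UniDec's own code: the function name `subtract_line` is hypothetical, and the line is assumed to run from the first m/z value to the last.

```python
import numpy as np

def subtract_line(data, n):
    """Subtract a straight line drawn between the mean of the first n and last n intensities."""
    out = data.astype(float).copy()
    front = out[:n, 1].mean()
    back = out[-n:, 1].mean()
    # Line through (first m/z, front) and (last m/z, back), evaluated at every m/z
    baseline = np.interp(out[:, 0], [out[0, 0], out[-1, 0]], [front, back])
    out[:, 1] -= baseline
    return out

# Synthetic spectrum: one peak sitting on a linearly rising baseline
mz = np.linspace(1000.0, 2000.0, 11)
intensity = 0.1 + 0.0002 * (mz - 1000.0)
intensity[5] += 1.0
flattened = subtract_line(np.column_stack([mz, intensity]), 2)
```

After subtraction the peak keeps its height of about 1 while the points at the edges of the spectrum sit close to zero.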
Subtract curved creates a smooth baseline based on local minima throughout the spectrum. The number in the box specifies how many points are included in generating each local minimum across the baseline of the spectrum. Using a smaller value will generate lots of local minima from only a few data points, resulting in a rougher baseline that closely matches the spectrum. Using a larger number will generate only a few local minima from many data points, resulting in a smoother baseline for subtraction. In other words, a smaller number will give a more dramatic baseline subtraction, and a larger number will give a less dramatic baseline subtraction. For example, a value of 10 will give a more dramatic baseline subtraction than a value of 100. The default value of 100 works well in many cases, but the value can be adjusted depending on whether you want more or less baseline subtraction. For more information, see Figure S-1 in Morgner and Robinson.
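The idea of a baseline built from local minima can be sketched as follows. This is a simplified illustration and not necessarily how UniDec implements it: the local minimum is taken over a window of n points on either side of each position, and a plain moving average is assumed as the smoothing step. The name `subtract_curved` is hypothetical.

```python
import numpy as np

def subtract_curved(data, n):
    """Subtract a smoothed local-minimum baseline; n is the half-width of the window in points."""
    out = data.astype(float).copy()
    y = out[:, 1].copy()
    # Local minimum within +/- n points of each position
    local_min = np.array([y[max(0, i - n): i + n + 1].min() for i in range(len(y))])
    # Smooth the local minima with a moving average of the same width (assumed smoothing step)
    padded = np.pad(local_min, n, mode="edge")
    kernel = np.ones(2 * n + 1) / (2 * n + 1)
    baseline = np.convolve(padded, kernel, mode="valid")
    out[:, 1] = y - baseline
    return out, baseline

# Synthetic spectrum: a narrow peak on top of a decaying curved background
x = np.arange(500.0)
background = 0.3 * np.exp(-x / 200.0)
peak = np.exp(-0.5 * ((x - 250.0) / 3.0) ** 2)
spectrum = np.column_stack([x + 1000.0, background + peak])

corrected_small, baseline_small = subtract_curved(spectrum, 10)
corrected_large, baseline_large = subtract_curved(spectrum, 100)
```

In this synthetic case the baseline from n = 10 hugs the background more closely than the one from n = 100, so it removes more intensity, which matches the behavior described above.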
Intensity thresholding is useful for removing lower intensity data points such as noise. Normally, UniDec normalizes the data so that the maximum intensity is 1. If the threshold is set to 0.1, any data point with a relative intensity below 10% of the most abundant peak will be removed from the spectrum. However, the normalization option can be turned off in UniDec, in which case the threshold will need to be adjusted to reflect the absolute signal intensity. Note: turning on Publication Mode of plotting will show everything normalized to 100%, but the spectrum will still be normalized to 1, not 100.
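The sketch below shows how the same cutoff is expressed with and without normalization. The helper names `normalize` and `intensity_threshold` and the example intensities are made up for illustration, and the sketch assumes that points below the threshold are dropped from the array.

```python
import numpy as np

def normalize(data):
    """Scale intensities (column 1) so that the maximum is 1."""
    out = data.astype(float).copy()
    out[:, 1] /= out[:, 1].max()
    return out

def intensity_threshold(data, threshold):
    """Keep only the points whose intensity is at least the threshold."""
    return data[data[:, 1] >= threshold]

raw = np.array([[1000.0, 40.0], [1001.0, 900.0], [1002.0, 5000.0],
                [1003.0, 450.0], [1004.0, 60.0]])

# With normalization on, 0.1 means 10% of the most intense point
kept_relative = intensity_threshold(normalize(raw), 0.1)
# With normalization off, the same cut must be given in absolute counts (10% of 5000)
kept_absolute = intensity_threshold(raw, 500.0)
```

Both calls keep the same two points (m/z 1001 and 1002); only the scale of the threshold differs.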
The adduct mass is set by default to the mass of hydrogen because in positive ionization mode we are usually analyzing [M+nH]n+ ions. If we were analyzing sodiated ions instead, the adduct mass would need to be switched to the mass of sodium. In negative ionization mode, the adduct mass has to be switched to a negative value to match the [M-nH]n- ions that are being analyzed. Thankfully, UniDec has a negative mode option in the Additional Deconvolution Parameters tab that automatically switches the adduct mass to negative.
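The role of the adduct mass is easiest to see in the relation between mass and m/z, m/z = (M + n × adduct) / n. The sketch below works through it for positive, sodiated, and negative mode. The constants are approximate values supplied for this example (a proton mass of about 1.00728 Da and a sodium cation mass of about 22.98922 Da), and the function names are hypothetical.

```python
PROTON_MASS = 1.007276     # Da, approximate value assumed for this example
SODIUM_ION_MASS = 22.989221  # Da, approximate Na+ mass assumed for this example

def mz_from_mass(mass, charge, adduct_mass=PROTON_MASS):
    """m/z of an ion carrying `charge` adducts: (M + n * adduct) / n."""
    return (mass + charge * adduct_mass) / charge

def mass_from_mz(mz, charge, adduct_mass=PROTON_MASS):
    """Neutral mass recovered from m/z and charge: n * m/z - n * adduct."""
    return charge * mz - charge * adduct_mass

mass = 100000.0  # Da
mz_protonated = mz_from_mass(mass, 20)                      # [M+20H]20+
mz_sodiated = mz_from_mass(mass, 20, SODIUM_ION_MASS)       # [M+20Na]20+
mz_negative = mz_from_mass(mass, 20, -PROTON_MASS)          # [M-20H]20-, negative adduct mass
```

Using the wrong adduct mass shifts every recovered mass by n times the difference in adduct mass, which is why the setting needs to match the ions actually being measured.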
Setting the ToF acceleration voltage can help correct for the detector efficiency between big ions and small ions in ToF mass spectra. See: Fraser, G. W. The ion detection efficiency of microchannel plates (MCPs). International Journal of Mass Spectrometry 2002, 215, 13-30. The specific form of the function is:
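import numpy as np
# data[:, 0] is m/z, data[:, 1] is intensity; va is the ToF acceleration voltage
eff = (1 - np.exp(-1620 * (va / data[:, 0]) ** 1.75))
data[:, 1] = data[:, 1] / eff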
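For readers who want to try the correction outside of UniDec, the following sketch wraps the same expression in a function and applies it to a small synthetic array. The function name `correct_detector_efficiency` and the value va = 10 are illustrative assumptions; the appropriate value and units for va should be taken from your instrument and from UniDec's setting.

```python
import numpy as np

def correct_detector_efficiency(data, va):
    """Divide intensities by the modeled detector efficiency at each m/z."""
    out = data.astype(float).copy()
    eff = 1 - np.exp(-1620 * (va / out[:, 0]) ** 1.75)
    out[:, 1] = out[:, 1] / eff
    return out, eff

# Three ions of equal measured intensity at increasing m/z
measured = np.array([[1000.0, 1.0], [5000.0, 1.0], [10000.0, 1.0]])
corrected, efficiency = correct_detector_efficiency(measured, va=10.0)
```

Because the modeled efficiency falls as m/z rises, the correction boosts high m/z signals more than low m/z signals.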
Data reduction removes lower intensity data points, similarly to the intensity threshold parameter (see above for more information). However, data reduction does not use relative intensity. Instead, it sorts the data from most intense to least intense and removes a fixed percentage from the bottom, which we control by changing the value in the box. With 10% data reduction, UniDec removes the least abundant 10% of all the data points. Therefore, processing the data with 10% data reduction is very different from setting a 10% intensity threshold. When choosing which to use, think about whether you want to specify a fixed intensity, which is intensity threshold, or a fixed percentage of the data to remove, which is data reduction. Note: Data reduction is commonly used for high-resolution Fourier-transform ion cyclotron resonance (FTICR) mass spectra.
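The difference between the two approaches shows up clearly on a small example. The sketch below is an illustration only: `reduce_data` is a hypothetical name, and the sketch assumes that the surviving points keep their original m/z order.

```python
import numpy as np

def reduce_data(data, percent):
    """Remove the least intense `percent` of the points, keeping m/z order."""
    n_remove = int(len(data) * percent / 100)
    order = np.argsort(data[:, 1])      # indices from least to most intense
    keep = np.sort(order[n_remove:])    # drop the weakest, restore original order
    return data[keep]

mz = np.arange(1000.0, 1010.0)
intensity = np.array([0.02, 0.05, 1.0, 0.08, 0.5, 0.03, 0.09, 0.3, 0.04, 0.07])
spectrum = np.column_stack([mz, intensity])

reduced = reduce_data(spectrum, 10)             # removes 1 of 10 points
thresholded = spectrum[spectrum[:, 1] >= 0.1]   # 10% threshold on normalized data
```

Here 10% data reduction removes only the single weakest point, while a 0.1 intensity threshold removes seven of the ten points.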
Beneath the Data reduction parameter, we see the normalization check box to normalize the data. By default, UniDec normalizes the data to 1 or 100% relative intensity on the y axis. Normalization can be turned off by leaving the check box empty, which will leave the y axis in absolute signal intensity. This is useful to compare absolute intensities between different mass spectra.
At the end of these parameters is a dropdown menu, which determines how the data will be linearized in UniDec. There are five options for linearization in UniDec: Linear, Linear Resolution, Nonlinear, Linear Interpolated, and Linear Interpolated Resolution. By default, linearization is turned off because the bin value is zero (see the Bin Every option above for more information).
When the data is linearized with Linear, UniDec places a data point every n m/z units, where n is the value set in the Bin Every parameter. For example, with a bin value of 5, UniDec linearizes the spectrum so that there is a data point every 5 m/z, and all the intensities around each 5 m/z step are summed into one data point. Linear Resolution is similar to Linear, but rather than sampling at a constant m/z step, it samples at a constant resolution. For example, if we set the bin value to 5, the first step will be 5 m/z, but each following step will be slightly larger to keep a constant resolution across the m/z range.
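The two binning schemes can be sketched by building bin edges in two different ways and summing the intensities that fall into each bin. This is an illustration under assumptions rather than UniDec's implementation: all function names are hypothetical, bin centers are used as the new m/z values, and constant resolution is modeled as a step that grows in proportion to m/z.

```python
import numpy as np

def constant_step_edges(mz_min, mz_max, bin_size):
    """Bin edges spaced by a fixed m/z step."""
    n_bins = int(np.floor((mz_max - mz_min) / bin_size)) + 1
    return mz_min + bin_size * np.arange(n_bins + 1)

def constant_resolution_edges(mz_min, mz_max, bin_size):
    """Bin edges whose spacing grows in proportion to m/z; the first step equals bin_size."""
    ratio = 1 + bin_size / mz_min
    edges = [mz_min]
    while edges[-1] <= mz_max:
        edges.append(edges[-1] * ratio)
    return np.array(edges)

def bin_sum(data, edges):
    """Sum the intensities that fall into each bin; report bin centers as m/z."""
    summed, _ = np.histogram(data[:, 0], bins=edges, weights=data[:, 1])
    centers = 0.5 * (edges[:-1] + edges[1:])
    return np.column_stack([centers, summed])

# 21 points from m/z 1000 to 1020, each with intensity 1
mz = np.arange(1000.0, 1021.0)
spectrum = np.column_stack([mz, np.ones_like(mz)])

linear = bin_sum(spectrum, constant_step_edges(1000.0, 1020.0, 5.0))
resolution_edges = constant_resolution_edges(1000.0, 1020.0, 5.0)
linear_resolution = bin_sum(spectrum, resolution_edges)
```

Both versions conserve the total intensity; the difference is that the constant resolution edges get progressively wider toward high m/z.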
Nonlinear works differently from linear. When a bin size is set for nonlinear, every n number of data points will be averaged into one data point in m/z. If we choose a bin value of 5, then the m/z values and intensities of every 5 data points will be averaged into 1 new data point throughout the mass spectrum. Nonlinear is the default because it is least disruptive to the data; it preserves the nonlinear spacing found in many mass analyzers and is effective at smoothing the data without broadening the peaks. Finally, it is a simple way to reduce the number of data points, which speeds up the algorithm in UniDec.
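Averaging every n consecutive points can be written compactly with a reshape. The sketch below is a minimal illustration; the name `nonlinear_bin` is hypothetical, and dropping leftover points at the end of the spectrum when the length is not a multiple of n is an assumption of this example.

```python
import numpy as np

def nonlinear_bin(data, n):
    """Average the m/z and intensity of every n consecutive points into one point."""
    usable = (len(data) // n) * n   # leftover points at the end are dropped (assumption)
    return data[:usable].reshape(-1, n, 2).mean(axis=1)

# Ten points collapse into two when n = 5
mz = np.arange(1.0, 11.0)
intensity = np.array([0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 2.0, 4.0, 2.0, 0.0])
binned = nonlinear_bin(np.column_stack([mz, intensity]), 5)
```

Because the new m/z values are averages of the original ones, whatever spacing the mass analyzer produced is carried through to the binned data.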
Linear Interpolated is similar to Linear in that it samples the data every n m/z units. However, rather than combining all the data points between sampled points, it interpolates the original data and generates a spectrum based on the interpolation. Interpolation should only be used if you are trying to oversample the data and is not recommended for normal use. In Linear, oversampling would result in many bins with intensities of zero, so the resulting mass spectrum would be distorted by sporadic zero-intensity data points throughout. With Linear Interpolated, the resulting spectrum more closely matches the original data because the interpolation predicts that there is intensity between data points. Linear Interpolated Resolution uses the same interpolation as Linear Interpolated but keeps all of the data points at a constant resolution.
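The effect of oversampling with and without interpolation can be seen on a three-point toy spectrum. This sketch uses NumPy's `np.interp` for linear interpolation and `np.histogram` for summed bins; the function name `linear_interpolated` and the data are made up for illustration, and the exact grid UniDec uses may differ.

```python
import numpy as np

def linear_interpolated(data, bin_size):
    """Resample onto an evenly spaced m/z grid by linear interpolation."""
    new_mz = np.arange(data[0, 0], data[-1, 0] + bin_size / 2, bin_size)
    new_intensity = np.interp(new_mz, data[:, 0], data[:, 1])
    return np.column_stack([new_mz, new_intensity])

# Original data sampled every 2 m/z, resampled every 0.5 m/z (oversampling)
spectrum = np.array([[1000.0, 0.2], [1002.0, 1.0], [1004.0, 0.4]])
interpolated = linear_interpolated(spectrum, 0.5)

# Summing into 0.5 m/z bins instead leaves most bins empty
edges = np.arange(1000.0, 1005.0, 0.5)
summed, _ = np.histogram(spectrum[:, 0], bins=edges, weights=spectrum[:, 1])
```

The interpolated spectrum has nonzero intensity at every grid point, whereas the summed version has only three occupied bins out of nine.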
Ultimately, nonlinear is recommended because it has the least potential for artifacts and distortion of the data. However, linear can also be useful to speed up deconvolution, especially with native quadrupole time-of-flight (QtoF) mass spectra where it is common to have broad peaks that are highly sampled.