[ADD] FileConverter FeatureXML to OMS #17

Open
wants to merge 4 commits into
base: idfile-integration

oliveralka

No description provided.

@oliveralka
Author

Should it be possible to convert legacy featureXML? Currently it is failing.

Error:
score type must have a name (as part of its CV term)
https://github.com/hendrikweisser/OpenMS/blob/idfileintegration/src/openms/source/METADATA/ID/IdentificationData.cpp#L200

I am using the OpenMS/src/tests/topp/AssayGeneratorMetabo_ams_input.featureXML as test case.

@oliveralka
Author

oliveralka commented Oct 5, 2021

Ok, I think the error mentioned above is specific to metabolomics .featureXML files.
It is caused by an unfilled ProteinIdentification, which is needed so that PeptideIdentification/PeptideHits can be stored with the features.

	<IdentificationRun id="PI_0" date="2021-10-04T18:07:27" search_engine="AccurateMassSearch" search_engine_version="2.6.0-pre-idf-ams-2021-10-04">
		<SearchParameters db="CustomDB" db_version="0.0" taxonomy="" mass_type="monoisotopic" charges="" enzyme="unknown_enzyme" missed_cleavages="0" precursor_peak_tolerance="5" precursor_peak_tolerance_ppm="true" peak_mass_tolerance="0" peak_mass_tolerance_ppm="false" >
		</SearchParameters>
		<ProteinIdentification score_type="" higher_score_better="true" significance_threshold="0">
			<UserParam type="stringList" name="spectra_data" value="[file://I:\OpenSWATH_Metabolomics_data\20181121_full_data\04_PestMixes_individually_Solvent_DDA_20-50/PestMix1_1ngSolventDDA20-50.wiff]"/>
		</ProteinIdentification>
	</IdentificationRun>

This is needed in the importIDs function:
https://github.com/hendrikweisser/OpenMS/blob/idfile-integration/src/openms/source/METADATA/ID/IdentificationDataConverter.cpp#L67

Setting a dummy score in the exportIDs function:
https://github.com/hendrikweisser/OpenMS/blob/idfile-integration/src/openms/source/METADATA/ID/IdentificationDataConverter.cpp#L543

Leads to further issues in the importIDs function, such as:

Progress of 'converting peptide identifications':
Warning: Trying to import PeptideHit without a sequence. This should not happen!
-- done [took 0.00 s (CPU), 0.00 s (Wall)] --
Error: Unexpected internal error (error inserting data:  Parameter count mismatch)
<Warning: Trying to import PeptideHit without a sequence. This should not happen!> occurred 12 times

IMHO it would be best to allow empty score types (at least in this case) - what do you think @hendrikweisser?

Edit:
Maybe this can also be fixed with the "correct" settings in the AccurateMassSearchEngine. I am setting the ProcessingSteps, so I am not sure why no score_type is attached to the ProteinIdentification.
https://github.com/OpenMS/OpenMS/blob/idf_ams/src/openms/source/ANALYSIS/ID/AccurateMassSearchEngine.cpp#L656

@hendrikweisser
Owner

Thanks for the investigation, @oliveralka!

> IMHO it would be best to allow empty score types (at least in this case) - what do you think @hendrikweisser?

I don't think that would really solve the problem. As you've seen, with a dummy score you just run into the next issue further on ("PeptideHit without a sequence").

I think we should improve how the empty ProteinIdentification (which is only needed for legacy technical reasons) is handled. If there are no ProteinHits and score_type is empty, no ScoreType should be created. (Caveat: I haven't looked at the code again to see how this is handled currently.)
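
For discussion, the rule proposed above could look roughly like the following. This is only a minimal, self-contained sketch and not the actual OpenMS code: `LegacyProteinIdentification`, `LegacyProteinHit`, `ScoreType`, `makeScoreType` and `importProteinScoreType` are hypothetical stand-ins invented for illustration, and the real `IdentificationDataConverter` may be structured quite differently. The only detail taken from this thread is the error message for an empty score type name.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for the legacy classes; these are NOT the OpenMS API.
struct LegacyProteinHit
{
  std::string accession;
};

struct LegacyProteinIdentification
{
  std::string score_type;              // empty in the metabolomics featureXML shown earlier
  bool higher_score_better = true;
  std::vector<LegacyProteinHit> hits;  // no hits for the placeholder run
};

struct ScoreType
{
  std::string name;
  bool higher_better = true;
};

// Mimics the check that produces the error quoted in this thread (assumed behaviour).
ScoreType makeScoreType(const std::string& name, bool higher_better)
{
  if (name.empty())
  {
    throw std::invalid_argument("score type must have a name (as part of its CV term)");
  }
  ScoreType st;
  st.name = name;
  st.higher_better = higher_better;
  return st;
}

// Proposed rule: a placeholder ProteinIdentification (no hits AND empty score_type)
// creates no ScoreType. Returns true and fills `out` only if a score type was created.
bool importProteinScoreType(const LegacyProteinIdentification& prot_id, ScoreType& out)
{
  if (prot_id.hits.empty() && prot_id.score_type.empty())
  {
    return false;  // skip silently: nothing to score
  }
  out = makeScoreType(prot_id.score_type, prot_id.higher_score_better);
  return true;
}
```

In this sketch a ProteinIdentification that has hits but an empty `score_type` still raises the error, so genuinely malformed input is not silently accepted. The sketch does not address the later "PeptideHit without a sequence" warning, which would need separate handling.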
