unexpected data in *_drugs.tsv #319

Open
@ymahlich

Three improve_drug_ids each resolve to two different canSMILES values (for two of them, the second value is NaN):

  • SMI_20644
  • SMI_9830
  • SMI_55606

SMI_20644

improve_drug_id canSMILES
SMI_20644 C1CN(P(=O)(OC1)NCCCl)CCCl
SMI_20644 NaN

SMI_9830

improve_drug_id canSMILES
SMI_9830 CC1C(C(CC(O1)OC2C(OC(CC2O)OC3C(OC(CC3O)OC4CCC5...
SMI_9830 NaN

SMI_55606

improve_drug_id canSMILES
SMI_55606 COCCOC1=C(C=C2C(=C1)C(=NC=N2)NC3=CC=CC(=C3)C#C...
SMI_55606 COCCOC1=C(C=C2C(=C1)C(=NC=N2)NC3=CC=CC(=C3)C
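A check like the one that surfaces these rows can be sketched as follows. This is a minimal illustration, not the script used for this report: it assumes the file is tab-separated with `improve_drug_id` and `canSMILES` columns, and the inline sample (including the `SMI_EXAMPLE` row) is a hypothetical stand-in for a real `*_drugs.tsv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for a *_drugs.tsv file; the real files have more rows and columns.
sample_tsv = (
    "improve_drug_id\tcanSMILES\n"
    "SMI_20644\tC1CN(P(=O)(OC1)NCCCl)CCCl\n"
    "SMI_20644\t\n"          # empty field, parsed as NaN
    "SMI_EXAMPLE\tCCO\n"     # made-up ID with a consistent value
    "SMI_EXAMPLE\tCCO\n"
)
drugs = pd.read_csv(io.StringIO(sample_tsv), sep="\t")


def ids_with_conflicting_smiles(df):
    """Return the (id, canSMILES) pairs for IDs that map to more than one value.

    NaN counts as a value here, so an ID with one valid string and one NaN is reported.
    """
    # Collapse exact repeats first; drop_duplicates treats NaN as equal to NaN.
    pairs = df[["improve_drug_id", "canSMILES"]].drop_duplicates()
    # Any ID still appearing more than once has at least two distinct values.
    conflicting = pairs.duplicated(subset="improve_drug_id", keep=False)
    return pairs[conflicting].sort_values("improve_drug_id")


conflicts = ids_with_conflicting_smiles(drugs)
print(conflicts)
```

Collapsing exact duplicates before looking for repeated IDs keeps drugs that simply appear in several rows with the same string out of the report.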

Three improve_drug_ids resolve to NaN canSMILES:

improve_drug_id canSMILES
SMI_56588 NaN
SMI_9830 NaN
SMI_20644 NaN

  • SMI_9830 and SMI_20644 overlap with the improve_drug_ids above that have two canSMILES values
  • SMI_56588 resolves only to NaN
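The split between IDs that have only NaN and IDs that have NaN alongside a valid string can be sketched like this. It is an illustrative snippet under the same assumptions (tab-separated input with `improve_drug_id` and `canSMILES` columns); the inline sample is a hypothetical stand-in for the real file. Note that `pandas.read_csv` by default parses both empty fields and the literal text `NaN` as missing values.

```python
import io

import pandas as pd

# Hypothetical stand-in for a *_drugs.tsv file.
sample_tsv = (
    "improve_drug_id\tcanSMILES\n"
    "SMI_56588\t\n"
    "SMI_20644\tC1CN(P(=O)(OC1)NCCCl)CCCl\n"
    "SMI_20644\t\n"
)
drugs = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

# IDs with at least one missing canSMILES entry.
ids_with_nan = set(drugs.loc[drugs["canSMILES"].isna(), "improve_drug_id"])
# IDs with at least one non-missing canSMILES entry.
ids_with_value = set(drugs.loc[drugs["canSMILES"].notna(), "improve_drug_id"])

# IDs that never resolve to a structure at all.
nan_only_ids = sorted(ids_with_nan - ids_with_value)
# IDs that have both a NaN row and a row with a structure.
nan_and_valid_ids = sorted(ids_with_nan & ids_with_value)

print("NaN only:", nan_only_ids)
print("NaN and valid:", nan_and_valid_ids)
```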

I will update this issue once I have figured out which dataset these entries come from.

Labels: invalid (This doesn't seem right)
