[GSoC] Project: Federated Phenotyping and Patient Representation Learning #23

Closed
@MarcioPorto

Description

Title: Federated Phenotyping and Patient Representation Learning Using SyferText
Mentor: Marcio Porto
Level: Intermediate

Description

In the clinical setting, the goal of electronic phenotyping is to identify whether someone might have a given medical condition based on their medical record. An enormous amount of medical data is stored as unstructured text, making it very time-consuming for clinicians to review that information manually. Applications of NLP in this domain can help with early condition diagnosis as well as with time and cost savings for health organizations. Furthermore, by leveraging medical records from multiple institutions using federated learning, we can train more accurate phenotyping models while preserving patient privacy and protecting health organizations from issues related to sharing this type of sensitive data.

GSoC Task

The goal of this task is to use SyferText and PySyft to train two models, as described in Two-stage Federated Phenotyping and Patient Representation Learning. Both models are trained in a federated setting that simulates a number of health organizations. The first model learns a vector representation from patient records that encodes the essential information about a patient. The second model takes this vector representation as input and performs the actual phenotyping. In essence, this constitutes a multi-task learning problem: the first model should produce a patient representation that is flexible enough to be reused for a variety of downstream tasks (in this case, phenotyping).
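To make the two-stage setup concrete, here is a minimal sketch of the training flow using federated averaging over simulated sites. It is written in plain NumPy and does not use the SyferText or PySyft APIs. The synthetic bag-of-words data, the auxiliary "code" targets used to train the encoder, the linear models, and all function names (`fed_avg`, `local_update`, `federated_fit`, `make_site`) are assumptions made for illustration and are not taken from the paper or from either library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def with_bias(X):
    # Append a constant column so the linear models can learn an intercept.
    return np.hstack([X, np.ones((len(X), 1))])

def fed_avg(site_params, site_sizes):
    # Federated averaging: weight each site's parameters by its sample count.
    total = float(sum(site_sizes))
    return sum(p * (n / total) for p, n in zip(site_params, site_sizes))

def local_update(W, X, Y, lr=0.5, epochs=5):
    # A few full-batch gradient steps on the logistic loss, run locally at one site.
    W = W.copy()
    for _ in range(epochs):
        W -= lr * X.T @ (sigmoid(X @ W) - Y) / len(X)
    return W

def federated_fit(sites, d_in, d_out, rounds=30):
    # Only model parameters leave a site; the raw (X, Y) data stays local.
    W = np.zeros((d_in, d_out))
    for _ in range(rounds):
        local = [local_update(W, X, Y) for X, Y in sites]
        W = fed_avg(local, [len(X) for X, _ in sites])
    return W

def make_site(rng, n, A):
    # Hypothetical synthetic site: binary bag-of-words features, auxiliary
    # codes derived from them, and a phenotype label derived from the codes.
    X = rng.integers(0, 2, size=(n, A.shape[0])).astype(float)
    codes = (X @ A > 0).astype(float)
    phenotype = (codes.sum(axis=1, keepdims=True) >= 2).astype(float)
    return X, codes, phenotype

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 4))
sites = [make_site(rng, n, A) for n in (200, 120, 80)]  # three simulated organizations

# Stage 1: federated training of the encoder on an auxiliary task (assumed here).
W_enc = federated_fit([(with_bias(X), codes) for X, codes, _ in sites], d_in=21, d_out=4)

def encode(X):
    # Patient representation: the encoder's 4-dimensional output.
    return sigmoid(with_bias(X) @ W_enc)

# Stage 2: federated training of the phenotyping classifier on the representation.
W_clf = federated_fit([(with_bias(encode(X)), y) for X, _, y in sites], d_in=5, d_out=1)

X_new, _, y_new = make_site(rng, 50, A)
pred = sigmoid(with_bias(encode(X_new)) @ W_clf) >= 0.5
print("held-out accuracy:", float((pred == y_new).mean()))
```

In the actual project, the linear models would be replaced by neural networks, and the simulated sites by PySyft workers holding the text that SyferText processes. The structure would stay the same: one federated run for the representation, and a second for the phenotyping model.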

An ideal final output of this project would be a blog post and/or Colab notebook that shows the wider community how to use SyferText and PySyft for this task.

Completing this project will require adding new features to SyferText, which will benefit future users of the library and will also show that the library is ready to tackle real-world use cases in the healthcare space and beyond.

Required Skills

  • Prior experience in NLP
  • Familiarity with Python
  • Experience using deep learning frameworks like PyTorch and TensorFlow
  • Familiarity with PySyft and privacy-preserving concepts such as federated learning

Application

If this task piques your interest, submit your GSoC application here, then join the #gsoc channel on the OpenMined Slack and let us know.

References

  • Two-stage Federated Phenotyping and Patient Representation Learning (link)
