Document reading tabular pedigree formats into sgkit #1012

Open
timothymillar opened this issue Jan 30, 2023 · 3 comments
Labels
documentation: Improvements or additions to documentation
IO: Issues related to reading and writing common third-party file formats

Comments

@timothymillar
Collaborator

We don't currently have any IO functionality for pedigree formats. These are usually tabular, but their layout can be quite variable. We should document how to read in some generic examples and add them to an sgkit-style dataset.

Basic workflow:

  • Read tabular format as pandas dataframe
  • Assign sample identifiers to the sample_id variable
  • Assign parental columns to the parent_id variable
  • Optionally set coords for the parents dim (['Father', 'Mother'], ['Sire', 'Dam'], etc.)
  • Use parent_indices to generate the parents array, and explain its 0-based indexing
  • Do something interesting with the result, such as calculating kinship
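
The workflow above could be sketched roughly as follows. This is a minimal illustration rather than sgkit's own API: it uses only pandas and NumPy, builds the index array by hand instead of calling parent_indices, and computes kinship with the textbook tabular method for diploids. The table contents, the column names (`id`, `sire`, `dam`), the `"."` missing-parent marker, the `-1` sentinel, and the `kinship` helper are all assumptions made for the example.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical pedigree table: one row per individual, "." marks an unknown parent.
PEDIGREE_TSV = (
    "id\tsire\tdam\n"
    "S1\t.\t.\n"
    "S2\t.\t.\n"
    "S3\tS1\tS2\n"
    "S4\tS1\tS2\n"
    "S5\tS3\tS4\n"
)

# Read everything as strings so identifiers are never coerced to numbers or NaN.
df = pd.read_csv(io.StringIO(PEDIGREE_TSV), sep="\t", dtype=str, keep_default_na=False)

sample_id = df["id"].to_numpy()              # shape (samples,)
parent_id = df[["sire", "dam"]].to_numpy()   # shape (samples, parents)
parents_coord = ["Sire", "Dam"]              # optional labels for the parents dimension

# Map each identifier to its 0-based row position; unknown parents become -1.
index_of = {s: i for i, s in enumerate(sample_id)}
parent = np.array(
    [[-1 if p == "." else index_of[p] for p in row] for row in parent_id]
)


def kinship(parent):
    """Tabular-method kinship for diploids.

    Assumes every parent appears in an earlier row than its offspring,
    and that unknown parents (-1) are unrelated founders.
    """
    n = len(parent)
    K = np.zeros((n, n))
    for i in range(n):
        p, q = parent[i]
        for j in range(i):
            kp = K[p, j] if p >= 0 else 0.0
            kq = K[q, j] if q >= 0 else 0.0
            K[i, j] = K[j, i] = 0.5 * (kp + kq)
        kpq = K[p, q] if (p >= 0 and q >= 0) else 0.0
        K[i, i] = 0.5 * (1.0 + kpq)
    return K


K = kinship(parent)
inbreeding = 2 * np.diag(K) - 1  # inbreeding coefficient from self-kinship
print(parent)
print(inbreeding)
```

In the index array, each row holds the row positions of that sample's parents, so `[0, 1]` for S3 means "the samples in rows 0 and 1". In an sgkit dataset, the `sample_id` and `parent_id` arrays built here would be assigned to the variables of the same names before generating the indices.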
@timothymillar added the documentation and IO labels on Jan 30, 2023
@timothymillar
Collaborator Author

@jeromekelleher also suggests adding a top-level "Pedigree statistics" section to the documentation: https://github.com/pystatgen/sgkit/issues/1025#issuecomment-1436755539

I was actually just looking for documentation this morning answering a question about computing inbreeding from simulations

Could you give some more detail on these simulations?

@jeromekelleher
Collaborator

Discussion here: tskit-dev/tskit#2711

@timothymillar
Collaborator Author

This is mostly covered by #1072, but I might leave it open for now, as I think there is room for more pedigree-specific documentation.
