A set of functions for pedigree analysis, designed for use with data from the GÉNÉO portal. Based on the functionality of GENLIB: see the article by Gauvin et al. (2015) <doi:10.1186/s12859-015-0581-5>.
The GENLIB reference manual and this README file are sufficient to learn how to use GeneaKit. In addition, documentation is available for all functions through the help() function, e.g. help(gen.phi).
- Easily port R code using GENLIB into Python code using GeneaKit;
- Integrate with Python libraries such as Pandas and NumPy;
- Provide speed and convenience;
- Present a modular structure for further development.
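
The porting goal can be illustrated with a small, hypothetical helper. Judging only from the examples later in this README (`gen.find.Min.Distance.MRCA` in R becomes `gen.find_Min_Distance_MRCA` in Python), it assumes that the first dot after `gen` is kept and that any later dots become underscores. The function `genlib_to_geneakit` is not part of GeneaKit; it is a sketch of that naming pattern, which may not hold for every function.

```python
def genlib_to_geneakit(r_name: str) -> str:
    """Translate a GENLIB (R) function name into its assumed GeneaKit spelling.

    Assumption: the first dot (after the `gen` prefix) is kept, since it
    becomes Python attribute access; any later dots become underscores.
    """
    prefix, sep, rest = r_name.partition(".")
    if not sep:
        return r_name  # no dot at all: nothing to translate
    return f"{prefix}.{rest.replace('.', '_')}"


# Names taken from the examples in this README.
for name in ("gen.phi", "gen.findMRCA", "gen.find.Min.Distance.MRCA"):
    print(f"{name} -> {genlib_to_geneakit(name)}")
```

Beyond names, the same examples show the usual R-to-Python adjustments: `<-` becomes `=`, `c(...)` becomes a list, and `TRUE` becomes `True`.
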
- Create a pedigree structure from a file or DataFrame;
- Output a pedigree as a DataFrame;
- Identify individuals in a pedigree, such as probands and founders;
- Extract a subpedigree from a pedigree;
- Describe a pedigree, such as the number of individuals and its completeness;
- Compute information about a pedigree, such as the pairwise kinship coefficients of probands and the genetic contributions of ancestors;
- (Eventually) Simulate information about pedigrees and individuals.
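
To make "probands" and "founders" concrete, here is a minimal pure-Python sketch that does not use GeneaKit. It assumes the usual definitions: a founder has both parents unknown, and a proband has no children in the pedigree. The toy pedigree is the ten-individual one used in the DataFrame example of this README; the helper names `founders` and `probands` are illustrative only.

```python
# Toy pedigree: individual -> (father, mother), with 0 meaning "unknown".
pedigree = {
    1: (0, 0), 2: (0, 0), 3: (0, 0), 4: (1, 2), 5: (1, 2),
    6: (0, 0), 7: (3, 4), 8: (3, 4), 9: (6, 5), 10: (6, 5),
}


def founders(ped):
    """Individuals whose father and mother are both unknown (assumed definition)."""
    return sorted(ind for ind, (f, m) in ped.items() if f == 0 and m == 0)


def probands(ped):
    """Individuals who never appear as a parent (assumed definition)."""
    parents = {p for pair in ped.values() for p in pair if p != 0}
    return sorted(ind for ind in ped if ind not in parents)


print(founders(pedigree))  # [1, 2, 3, 6]
print(probands(pedigree))  # [7, 8, 9, 10]
```
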
This software was tested with Python 3.10+, used mainly on Linux 5.14.0 x64 (with GCC 12) and developed with Python 3.13.7 on macOS 26.0 ARM64 (with Clang 17.0.0). It has not been tested on Windows, but in theory it should be compatible with Windows Subsystem for Linux (WSL). Outside WSL, OpenMP may cause the compilation to fail on Windows.
- Clone this repository, `cd` into it, then run `pip install .` inside a Python virtual environment. Alternatively, without cloning, run: `pip install https://github.com/Genopop/geneakit/archive/main.zip`. Both options install two packages, `geneakit` and `cgeneakit` (used by the former internally), and their dependencies.
- If OpenMP is found during installation, the `geneakit.phi()` function will run in parallel. If you use macOS, you may need to follow these instructions to enable OpenMP.
- On a MacBook Air M3, it took about four seconds for the remote `pip install` to complete the installation.
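
After installing, a quick way to check that both packages are visible to the active environment is sketched below. It only uses the standard library and assumes the importable module names match the package names given above (`geneakit` and `cgeneakit`); the helper `installed_modules` is hypothetical and not part of GeneaKit.

```python
from importlib.util import find_spec


def installed_modules(names=("geneakit", "cgeneakit")):
    """Map each module name to True if Python can locate it, else False."""
    return {name: find_spec(name) is not None for name in names}


# Report which of the two modules the current environment can find.
status = installed_modules()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If either module is reported as missing, the virtual environment used for `pip install` is probably not the one currently active.
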
- If the pedigree is loaded from a file, using `geneakit.genealogy("path/to/pedigree.csv")`, the file must start with a header line whose content is ignored (such as `ind father mother sex`). Each following line must describe one individual and contain, as digits and in this order: the individual's ID, the father's ID (`0` if unknown), the mother's ID (`0` if unknown), and the sex (`0` if unknown, `1` if male, `2` if female). Fields may be separated by any non-digit characters (tabs, spaces, commas, etc.).
- Three datasets come from the GENLIB source code: `geneakit.geneaJi`, `geneakit.genea140` and `geneakit.pop140`. They are included in the project for testing and practice. More information on these datasets is available in the GENLIB reference manual. They may be loaded using `geneakit.genealogy(geneakit.geneaJi)`, etc.
- You may also load the pedigree from a Pandas DataFrame, for instance:

```python
import geneakit as gen
import pandas as pd
inds    = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
fathers = [0, 0, 0, 1, 1, 0, 3, 3, 6, 6]
mothers = [0, 0, 0, 2, 2, 0, 4, 4, 5, 5]
sexes   = [1, 2, 1, 2, 2, 1, 2, 1, 1, 2]
df = pd.DataFrame({'ind': inds, 'father': fathers,
                   'mother': mothers, 'sex': sexes})
ped = gen.genealogy(df)
```

The function calls are almost verbatim copies of GENLIB's. For instance:

```r
# With GENLIB
library(GENLIB)
data(genea140)
ped <- gen.genealogy(genea140)
pro <- gen.pro(ped)
phi <- gen.phi(ped, pro=pro)
mean <- gen.phiMean(phi)
mrca <- gen.findMRCA(ped, c(802424, 868572))
dist <- gen.find.Min.Distance.MRCA(mrca)
out <- gen.genout(ped, sorted=TRUE)
```

```python
# With GeneaKit
import geneakit as gen
from geneakit import genea140
ped = gen.genealogy(genea140)
pro = gen.pro(ped)
phi = gen.phi(ped, pro=pro)
mean = gen.phiMean(phi)
mrca = gen.findMRCA(ped, [802424, 868572])
dist = gen.find_Min_Distance_MRCA(mrca)
out = gen.genout(ped, sorted=True)
```

After the virtual Python environment is activated (e.g. with `source venv/bin/activate`), run the following commands.

```python
from time import time
import geneakit as gen         # Import the package
from geneakit import genea140  # Locate the sample dataset
ped = gen.genealogy(genea140)  # Load the genealogy
pro = gen.pro(ped)             # Identify the probands
start = time()
phi = gen.phi(ped, pro=pro)    # Compute all pairwise kinship coefficients between the probands
mean = gen.phiMean(phi)        # Compute the mean kinship coefficient
end = time()
print(mean)
print(f"The computation took {end-start:.3f} seconds.")
```

The mean kinship coefficient should be 0.0011437357709631094.
On a MacBook Air M3, the computation took about 3 seconds. As a comparison, the equivalent computation in R takes about 3 minutes on the same computer.
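
For readers unfamiliar with the quantity being timed, the following pure-Python sketch computes kinship coefficients on the ten-individual toy pedigree from the DataFrame example, using the textbook recursive definition. It is not GeneaKit's implementation and makes no claim about how `gen.phi` works internally. It assumes that every parent has a smaller ID than its children, and that the mean is taken over distinct pairs of probands, which is an assumption about what `gen.phiMean` reports.

```python
from functools import lru_cache
from itertools import combinations

# Toy pedigree: individual -> (father, mother), with 0 meaning "unknown".
# Assumption: every parent has a smaller ID than its children.
PEDIGREE = {
    1: (0, 0), 2: (0, 0), 3: (0, 0), 4: (1, 2), 5: (1, 2),
    6: (0, 0), 7: (3, 4), 8: (3, 4), 9: (6, 5), 10: (6, 5),
}


@lru_cache(maxsize=None)
def kinship(i, j):
    """Kinship coefficient between i and j, by the textbook recursion."""
    if i == 0 or j == 0:
        return 0.0  # unknown individuals are treated as unrelated
    if i == j:
        father, mother = PEDIGREE[i]
        return 0.5 * (1.0 + kinship(father, mother))
    if i < j:
        i, j = j, i  # expand the larger ID, which cannot be an ancestor of the other
    father, mother = PEDIGREE[i]
    return 0.5 * (kinship(father, j) + kinship(mother, j))


def mean_kinship(probands):
    """Mean kinship over all distinct pairs of probands (assumed convention)."""
    pairs = list(combinations(probands, 2))
    return sum(kinship(a, b) for a, b in pairs) / len(pairs)


print(kinship(7, 8))                # full siblings: 0.25
print(kinship(7, 9))                # first cousins: 0.0625
print(mean_kinship([7, 8, 9, 10]))  # 0.125
```

A plain recursion like this is only practical for small pedigrees.
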
| Function | Description |
|---|---|
| `gen.graph` | Pedigree graphical tool |
| `gen.simuHaplo` | Gene dropping simulations - haplotypes |
| `gen.simuHaplo_convert` | Convert proband simulation results into sequence data given founder haplotypes |
| `gen.simuHaplo_IBD_compare` | Compare proband haplotypes for IBD sharing |
| `gen.simuHaplo_traceback` | Trace inheritance path for results from gene dropping simulation |
| `gen.simuProb` | Gene dropping simulations - Probabilities |
| `gen.simuSample` | Gene dropping simulations - Sample |
| `gen.simuSampleFreq` | Gene dropping simulations - Frequencies |
| `gen.simuSet` | Gene dropping simulations with specified transmission probabilities |