Perform standard correspondence analysis of two categorical variables (code module ca.py in the folder Methods/).
The code can be used to perform correspondence analysis on any dataset that can be transformed into a pandas DataFrame (see ca.py in the folder Methods/).
The method mcmca.py can be used for correspondence analysis of datasets that can be assumed to be generated by a Markov chain model.
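For orientation, standard correspondence analysis of a contingency table can be sketched with numpy and pandas as below. This is a minimal, generic sketch and not the code in ca.py: the function name `correspondence_analysis`, its arguments, and the toy counts are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd


def correspondence_analysis(table: pd.DataFrame, n_components: int = 2):
    """Generic CA sketch: SVD of the standardized residuals of a contingency table."""
    N = table.to_numpy(dtype=float)
    P = N / N.sum()              # correspondence matrix (relative frequencies)
    r = P.sum(axis=1)            # row masses
    c = P.sum(axis=0)            # column masses
    expected = np.outer(r, c)    # expected frequencies under independence
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    k = n_components
    # Principal coordinates of the row and column categories
    rows = (U[:, :k] * sigma[:k]) / np.sqrt(r)[:, None]
    cols = (Vt.T[:, :k] * sigma[:k]) / np.sqrt(c)[:, None]
    inertia = sigma ** 2         # principal inertias; their sum is chi-square / n
    names = [f"dim{i + 1}" for i in range(k)]
    return (
        pd.DataFrame(rows, index=table.index, columns=names),
        pd.DataFrame(cols, index=table.columns, columns=names),
        inertia,
    )


# Toy contingency table with made-up counts (illustration only)
toy = pd.DataFrame(
    [[20, 5, 3], [4, 18, 6], [2, 7, 25]],
    index=["text_A", "text_B", "text_C"],
    columns=["form_1", "form_2", "form_3"],
)
row_coords, col_coords, inertia = correspondence_analysis(toy)
```

Plotting `row_coords` and `col_coords` on the same axes gives the usual CA map; the share of each entry of `inertia` in the total indicates how much of the association each dimension explains.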
Project Ef5-4: "The evolution of Ancient Egyptian - Quantitative and Non-Quantitative Mathematical Linguistics".
Institutions: ZIB (Zuse Institute Berlin) & MATH+ (Berlin Mathematics Research Center).
Python version: 3.7 or later
packages: numpy, pandas, matplotlib (including matplotlib.pyplot and matplotlib.backends.backend_pdf), scipy (including scipy.stats), seaborn.
You can also install all of these with conda by creating a new environment from the spec file myPy3_spec.txt (for guidance, click here).
See official publication link here
DOI: https://doi.org/10.12752/8257
Licence: Open Source Apache 2.0
Helper.py: performs one correspondence analysis (in this specific project: text vs. grammatical form).
Please enter all the inputs by following the corresponding questions/descriptions.
implementation.py is required to obtain the CA figures.
implementation.py can be used to modify the default figure parameter settings. For further modifications, see the code in the folder Methods/.
If the dataset is already a contingency table, then the parameter isCont must be set to True and the table should be converted to a pandas DataFrame (see the example cHelper.py).
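If the data are not yet a contingency table, one can be built from raw observations of two categorical variables with `pandas.crosstab`. The sketch below uses made-up observations and column names (`text`, `form`) purely for illustration; how the resulting table and isCont are passed to the project code is not shown here, so the example stops at building the DataFrame.

```python
import pandas as pd

# Made-up raw observations: one row per (text, grammatical form) occurrence
raw = pd.DataFrame({
    "text": ["A", "A", "B", "B", "B", "C"],
    "form": ["x", "y", "x", "x", "y", "y"],
})

# Cross-tabulate the two categorical variables into a contingency table
contingency = pd.crosstab(raw["text"], raw["form"])
print(contingency)
```

The result is already a pandas DataFrame whose rows and columns are the categories of the two variables and whose cells are counts.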
Excel file. In our specific project, the data file contains a numerical coding of texts in Égyptien de Tradition; each data point is a ten-digit number encoding the grammatical structure of a sentence (files can be downloaded here).
You can also use your own Python function to clean your dataset instead of the function Cleaned_Data in implementation.py (line 9).
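A custom cleaning function could look like the sketch below. The signature of Cleaned_Data is not documented here, so the function name `my_clean`, the column names, and the rule "keep only ten-digit codes" are all assumptions; adapt them to whatever implementation.py actually expects.

```python
import pandas as pd


def my_clean(df: pd.DataFrame, code_col: str = "code") -> pd.DataFrame:
    """Hypothetical cleaner: keep only rows whose code is a ten-digit number."""
    out = df.dropna(subset=[code_col]).copy()
    # Normalise codes to strings so the digit check is uniform
    out[code_col] = out[code_col].astype(str).str.strip()
    mask = out[code_col].str.fullmatch(r"\d{10}")
    return out[mask].reset_index(drop=True)


# Made-up example data (illustration only)
raw = pd.DataFrame({
    "text": ["A", "A", "B", "C"],
    "code": ["1234567890", "12345", None, "0987654321"],
})
cleaned = my_clean(raw)
```

Note that if the codes are read from Excel as integers, any leading zeros are lost, so reading the column as text is the safer choice for a check of this kind.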
Figures/ folder is the default location of figure outputs.
Visualising the usual correspondence analysis results
Visualising the strength of the association between the variables
Identify similar clusters (similarity in the strength of the associations)
Identify similar clusters of variables (chi-square similarity)