In this 4-hour workshop, attendees will learn basic data processing skills using Python. You will learn how to import code from other modules and packages to take advantage of the existing Python ecosystem. After seeing how to access packages, we will explore popular data analysis packages. We will see how to use NumPy to perform operations on large data arrays and how to use Matplotlib to generate clear data visualisations. We will also scratch the surface of using pandas to store data in tables. Along the way, we will discuss how to approach new, unfamiliar packages and learn how to use them.
By the end of this workshop, you should be able to:
- Import code from existing modules and packages.
- Use NumPy to easily process multidimensional data.
- Use Matplotlib to generate different types of plots to visualise data.
- Use pandas to represent data stored in tables.
- Approach a new package and explore its documentation and examples.
- Basic knowledge of Python is required.
- Attendees must be comfortable using variables for simple data types, as well as collections. Attendees should also be comfortable with loops and control flow and be familiar with the basics of using functions in Python.
- To participate in the exercises, participants must either:
  - Have a local installation of Python and Jupyter notebooks. Microsoft Visual Studio Code with the Python extension installed can also be used to run the notebook.
  - Have a Google Account (to run in-browser as a Colab notebook).
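As a rough self-check of these prerequisites, the short sketch below uses only built-in Python. The sample values and the function name `mean_above` are made up for illustration and are not taken from the workshop materials. If every line reads comfortably, you likely have the background assumed here.

```python
# Hypothetical sample data: a list (collection) of simple values.
temperatures = [18.5, 21.0, 19.5, 25.0, 23.0]


def mean_above(values, threshold):
    """Return the mean of the values strictly above threshold, or None if there are none."""
    selected = []
    for value in values:          # loop over a collection
        if value > threshold:     # control flow
            selected.append(value)
    if not selected:
        return None
    return sum(selected) / len(selected)


warm_mean = mean_above(temperatures, 20.0)
print(warm_mean)
```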
This workshop is intended to be interactive. Before the workshop, please download the materials from this repository. You can download the materials as a ZIP file using the green button higher up on this page, or you can clone this repository by typing the following in a terminal:
git clone https://github.com/QLS-MiCM/DataProcessingInPython.git
To take full advantage of this interactive workshop, you must have access to a Python environment and Jupyter Lab.
You must also install the following packages:
- NumPy
- Matplotlib
- pandas
The required steps depend on how you installed Python:
- (Recommended) If you installed Miniconda, you can install all these packages by running the following on the command line:
conda install -c conda-forge jupyterlab numpy matplotlib pandas -y
- If you installed Python from the official website, you can install Jupyter and the required packages using pip by running the following on the command line:
pip install jupyterlab numpy matplotlib pandas
- If you installed Anaconda, you already have everything you need installed.
For more details on installing Jupyter Lab, see https://jupyter.org/install.
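Once the installation has finished, you may want to confirm that the packages are visible to the Python interpreter you plan to use. The following is a minimal sketch, not part of the workshop materials: the helper name `find_missing` is hypothetical, and it only checks that each package can be located, not that it works correctly.

```python
import importlib.util

# Import names of the packages listed above.
REQUIRED = ["numpy", "matplotlib", "pandas", "jupyterlab"]


def find_missing(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

If anything is reported as missing, re-run the install command that matches how you installed Python, making sure you are in the same environment.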
Once you have Jupyter installed, open the DataProcessingInPython folder on your computer and launch Jupyter Lab by typing:
jupyter lab
Then you can open the Jupyter notebook files in the Exercises/scripts and Exercises/solutions folders.
If you don't want to install anything locally, you can open the workshop materials using Google Colab:
- Student version (with blank fields): https://colab.research.google.com/github/QLS-MiCM/DataProcessingInPython/blob/main/Exercises/scripts/DataProcessingPython.ipynb
- Solution version (filled out): https://colab.research.google.com/github/QLS-MiCM/DataProcessingInPython/blob/main/Exercises/solutions/DataProcessingPython.ipynb
⚠ Warning: To configure for Google Colab, make sure to set using_colab = True in the first code cell and run that cell to download all the data files.
For a more detailed outline, see Outline/Outline.md.
- Module 1 -- Modules and Packages
- Module 2 -- Introduction to NumPy Arrays
- Module 3 -- Visualising Data with Matplotlib
- Module 4 -- Intro to Tabular Data with Pandas
- Module 5 -- A Brief Guide to Exploring the Unknown
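To give a flavour of Modules 1 and 5, here is a small sketch that relies only on the Python standard library, so it runs before any extra packages are installed. It is not taken from the workshop notebook; the choice of `math`, `statistics` and `collections` is purely illustrative. It shows three common import styles, then a few ways to poke at an unfamiliar module.

```python
import inspect
import math                      # import a whole module
import statistics as stats       # import a module under a shorter alias
from collections import Counter  # import a single name from a module

print(math.sqrt(16.0))
print(stats.mean([1, 2, 3, 4]))
print(Counter("banana").most_common(1))

# Exploring something unfamiliar: list its public names,
# then read the signature and docstring of one function.
public_names = [name for name in dir(stats) if not name.startswith("_")]
print(public_names[:5])
print(inspect.signature(stats.median))
print(inspect.getdoc(stats.median).splitlines()[0])
```

The same habits (`dir`, `help`, signatures and docstrings) carry over to third-party packages, alongside their online documentation.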
In developing this workshop, I relied largely on the documentation of the projects discussed, including NumPy, Matplotlib, pandas, conda and pip, as well as the official Python documentation. I've provided links to these projects in the interactive Jupyter notebook. I've also referenced a few other useful tutorials throughout the notebook.
This workshop would also not have been possible without the professors and others who helped me on my Python journey.
This workshop builds on its previous iterations (offered as Intermediate Python) and on my Intro to Python workshop, which can be found at the following repositories:
- Intro to Python:
- Intermediate Python:
Colab badge created using https://shields.io.
Some cool Markdown tricks can be found at https://www.markdownguide.org/hacks/.
Workshop created as part of the McGill Initiative in Computational Medicine.
For more information about the QLS-MiCM, visit: https://www.mcgill.ca/micm/.
The contents of this repository are licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.