first proper commit
zktuong committed Jun 30, 2023
1 parent 3fb170a commit 86298f3
Showing 9 changed files with 244 additions and 5 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -158,3 +158,4 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.DS_Store
40 changes: 37 additions & 3 deletions README.md
@@ -1,4 +1,38 @@
# template_basic_python
Basic template repository for python classroom
# tuong_group_basic_python
Basic template repository for the Python classroom for Tuong group students.

This is just to set up the folders for the classrooms
This repository holds basic scripts and exercises on everyday Python usage to prepare you for advanced single-cell analysis. The idea is that it contains simple functions and objectives covering tasks you will use all the time in Python. However, I have deliberately written mistakes into the code in the `learn` folder, and it is your task to find the mistakes and suggest solutions. Good luck, and I hope you learn something from this!

## lists

The first learning plan is on `lists`.

A list is a data structure in Python that is a mutable, or changeable, ordered sequence of elements. Each element or value inside a list is called an item. Just as strings are defined as characters between quotes, lists are defined by placing values between square brackets `[]`.
https://www.w3schools.com/python/python_lists.asp
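
As a minimal sketch of the list basics above (the cell-type names are purely illustrative), this snippet creates a list, reads items by position and then changes the list in place:

```python
# A list is an ordered, mutable sequence defined with square brackets.
cell_types = ["B cell", "T cell", "NK cell"]

# Items are accessed by position, counting from 0; negative indices count from the end.
first = cell_types[0]
last = cell_types[-1]

# Lists are mutable: items can be replaced or added in place.
cell_types[2] = "Monocyte"
cell_types.append("Plasma cell")

print(first)            # B cell
print(last)             # NK cell
print(cell_types)       # ['B cell', 'T cell', 'Monocyte', 'Plasma cell']
print(len(cell_types))  # 4
```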

The objective of this learning plan is to learn the simplest way to work with data in Python: storing items in a `list`. There are many ways to manipulate a list, so this is just an introduction. We use `lists` all the time in single-cell analysis, for example to insert new metadata or to collect the items we want to plot. Imagine you have 80,000 cells and you want to colour them by both treatment status and cell type. How would you combine this information so that each cell gets a label such as "treated_B cell" or "untreated_B cell"? You can achieve this by creating a `list` that holds this information!
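
One way to build such combined labels, sketched here with small hypothetical `treatment` and `cell_type` lists standing in for real per-cell metadata, is to pair the two lists with the built-in `zip()` and join each pair in a list comprehension:

```python
# Hypothetical per-cell metadata: one entry per cell, in the same cell order.
treatment = ["treated", "untreated", "treated", "untreated"]
cell_type = ["B cell", "B cell", "T cell", "T cell"]

# zip() silently stops at the shorter list, so check both lists describe the same cells.
assert len(treatment) == len(cell_type)

# zip() pairs the i-th treatment with the i-th cell type, and the
# list comprehension joins each pair with an underscore.
combined = [f"{status}_{ct}" for status, ct in zip(treatment, cell_type)]

print(combined)
# ['treated_B cell', 'untreated_B cell', 'treated_T cell', 'untreated_T cell']
```

This assumes both lists are in the same cell order; `zip()` does not check that for you, which is why the sketch at least compares their lengths first.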

There are 4 simple functions in `learn/learn_list.py`, but the tests in `tests/test_list.py` fail for each of them. Can you fix them so that the tests pass?

## dictionary

The second learning plan is on `dictionary`.

Dictionaries store data values in key:value pairs. They are defined by placing these pairs between curly braces `{}`.
https://www.w3schools.com/python/python_dictionaries.asp
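
A minimal sketch of the key:value idea, using illustrative fruit-to-colour pairs rather than anything from the exercises:

```python
# A dictionary maps unique keys to values and is defined with curly braces {}.
colours = {"apple": "red", "banana": "yellow"}

# Values are looked up by key rather than by position.
print(colours["banana"])  # yellow

# Assigning to a key adds a new pair (or replaces the value if the key already exists).
colours["leek"] = "green"
print(colours)  # {'apple': 'red', 'banana': 'yellow', 'leek': 'green'}

# .get() returns a fallback instead of raising KeyError when a key is missing.
print(colours.get("cheese", "unknown"))  # unknown

# Keys must be unique: if a key is repeated in a literal, only the last value is kept.
repeated = {"1": "apple", "1": "pie"}
print(repeated)  # {'1': 'pie'}
```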

The objective of this part of the learning plan is to learn how to use dictionaries to match and change values to suit our needs, which is very relevant for single-cell analysis.
Imagine you have identified 10 clusters but want to give each of them a biologically meaningful name. How would you go about changing the individual names? You can achieve this using `dictionaries`!
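
As a rough sketch, assuming the cluster IDs are stored as strings and using made-up cluster names, renaming comes down to looking up each ID in a dictionary:

```python
# Hypothetical cluster IDs (stored as strings), one per cell.
clusters = ["0", "2", "1", "0", "2"]

# Hypothetical lookup table from cluster ID to a biologically meaningful name.
cluster_names = {"0": "B cell", "1": "T cell", "2": "NK cell"}

# Look up each cell's cluster ID in the dictionary, one key at a time.
renamed = [cluster_names[c] for c in clusters]

print(renamed)
# ['B cell', 'NK cell', 'T cell', 'B cell', 'NK cell']
```

Any ID missing from the dictionary raises a `KeyError`; if you would rather keep unknown IDs unchanged, `cluster_names.get(c, c)` returns the original ID as a fallback.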

There are 2 functions in `learn/learn_dict.py`, but the tests in `tests/test_dict.py` fail for each of them. Can you fix them so that the tests pass?

## dataframes

The third learning plan is on `dataframe`. We use the popular `pandas` package to interact with dataframes.

A `pandas` `DataFrame` is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.
https://www.w3schools.com/python/pandas/pandas_dataframes.asp#:~:text=What%20is%20a%20DataFrame%3F,table%20with%20rows%20and%20columns.
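
A minimal sketch of building a `DataFrame` from a dictionary of columns; the `cell_id`, `organ` and `n_genes` columns are hypothetical:

```python
import pandas as pd

# Each dictionary key becomes a column name and each list holds that
# column's values, one per row (all lists must have the same length).
df = pd.DataFrame(
    {
        "cell_id": ["c1", "c2", "c3"],
        "organ": ["spleen", "liver", "spleen"],
        "n_genes": [1200, 950, 1430],
    }
)

print(df.shape)             # (3, 3): 3 rows and 3 columns
print(df.columns.tolist())  # ['cell_id', 'organ', 'n_genes']
print(df["organ"])          # selecting one column returns a pandas Series
print(df.loc[1])            # selecting one row by its index label
```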

The objective of this part of the learning plan is to learn how to slice a `DataFrame` object so that we end up with only the information we want. This is central to how we handle single-cell data. Imagine you have 10,000 cells but want to subset to just the 2,000 that come from the spleen rather than any other organ. How would you go about doing this? You can achieve this by slicing the `DataFrame` so that it only contains the relevant cells!
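
A minimal sketch of this kind of slicing with a boolean mask, assuming a hypothetical `organ` column in a small made-up cell table:

```python
import pandas as pd

# Hypothetical cell metadata: one row per cell.
cells = pd.DataFrame(
    {
        "cell_id": ["c1", "c2", "c3", "c4"],
        "organ": ["spleen", "liver", "spleen", "blood"],
    }
)

# Comparing a column with a single value gives a boolean Series (one True/False
# per row); indexing the DataFrame with it keeps only the rows marked True.
is_spleen = cells["organ"] == "spleen"
spleen_only = cells[is_spleen]

# To keep rows matching any of several values, build the mask with .isin().
spleen_or_blood = cells[cells["organ"].isin(["spleen", "blood"])]

print(is_spleen.tolist())                   # [True, False, True, False]
print(spleen_only["cell_id"].tolist())      # ['c1', 'c3']
print(spleen_or_blood["cell_id"].tolist())  # ['c1', 'c3', 'c4']
```

Note that the slice keeps the original row labels (0 and 2 for `spleen_only`); call `.reset_index(drop=True)` if you want them renumbered from 0.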

There are 2 functions in `learn/learn_pandas.py`, but the tests in `tests/test_pandas.py` fail for each of them. Can you fix them so that the tests pass?
30 changes: 30 additions & 0 deletions learn/learn_dict.py
@@ -0,0 +1,30 @@
ORIGINAL_LIST = ["0", "1", "2", "3", "4"]


def convert_all() -> list:
    """
    This function wants to convert the numbers (stored as strings) to new labels.
    Returns
    -------
    list
        A list containing the new labels.
    """
    convert_dict = {"0": "apple", "1": "boy", "2": "pie", "3": "cheese", "4": "banana"}
    new_list = convert_dict[ORIGINAL_LIST]
    return new_list


def convert_wrong() -> list:
    """
    This function wants to convert the numbers (stored as strings) to new labels, but something goes wrong.
    Can you spot the mistake(s)?
    Returns
    -------
    list
        A list containing the new labels.
    """
    convert_dict = {"1": "apple", "1": "pie", "2": "pie", "3": "cheese", "4": "banana"}
    new_list = [convert_dict[x] for x in ORIGINAL_LIST]
    return new_list
54 changes: 54 additions & 0 deletions learn/learn_list.py
@@ -0,0 +1,54 @@
LIST1 = ["1", "2", "3", "4"]
LIST2 = ["apple", "boy", "pie", "cheese"]
LIST3 = ["banana", "boy", "leek", "apple"]


def concat_lists() -> list:
    """
    This function is trying to concatenate the two lists together.
    Can you tell what this current code is doing?
    Returns
    -------
    list
        A new list that is the combination of 2 lists.
    """
    return LIST1 ^ LIST2


def merging_contents_of_two_lists() -> list:
    """
    This function is trying to create a new list whose items are
    "1_apple", "2_boy" and so on.
    Can you tell why this is not working?
    Returns
    -------
    list
        A new list with outputs being "1_apple", "2_boy" and so on.
    """
    return [x + "_" + y for x, y in (LIST1 + LIST2)]


def unique_items() -> list:
    """
    Extracting unique elements in list.
    Returns
    -------
    list
        A list containing only unique elements.
    """
    return LIST2 + LIST3


def unique_list_format() -> list:
    """
    Extracting unique elements in list and output as list.
    Returns
    -------
    list
        A list containing only unique elements.
    """
    return set(LIST2 + LIST3)
41 changes: 41 additions & 0 deletions learn/learn_pandas.py
@@ -0,0 +1,41 @@
import pandas as pd


def slice_dataframe_keep() -> pd.DataFrame:
    """
    This function wants to slice the dataframe to keep just the rows with row_num 1 and 2.
    Returns
    -------
    pd.DataFrame
        A sliced dataframe with only 2 specific rows.
    """
    original_df = pd.DataFrame(
        {
            "row_num": ["0", "1", "2", "3"],
            "label1": ["banana", "boy", "apple", "apple"],
            "label2": ["apple", "pie", "pie", "cheese"],
        }
    )
    new_df = original_df[original_df["row_num"] == ["1", "2"]]
    return new_df


def slice_dataframe_exclude() -> pd.DataFrame:
    """
    This function wants to exclude certain rows from the dataframe.
    Returns
    -------
    pd.DataFrame
        A sliced dataframe without 2 specific rows.
    """
    original_df = pd.DataFrame(
        {
            "row_num": ["0", "1", "2", "3"],
            "label1": ["banana", "boy", "apple", "apple"],
            "label2": ["apple", "pie", "pie", "cheese"],
        }
    )
    new_df = original_df[original_df["label1"].isin(["pie"])]
    return new_df
2 changes: 0 additions & 2 deletions requirements.txt
@@ -1,3 +1 @@
pandas
numpy
seaborn
21 changes: 21 additions & 0 deletions tests/test_dict.py
@@ -0,0 +1,21 @@
from learn import learn_dict


def test_convert_all():
    assert learn_dict.convert_all() == [
        "apple",
        "boy",
        "pie",
        "cheese",
        "banana",
    ]


def test_convert_wrong():
    assert learn_dict.convert_wrong() == [
        "apple",
        "apple",
        "pie",
        "pie",
        "banana",
    ]
45 changes: 45 additions & 0 deletions tests/test_list.py
@@ -0,0 +1,45 @@
from learn import learn_list


def test_concat_list():
    assert learn_list.concat_lists() == [
        "1",
        "2",
        "3",
        "4",
        "apple",
        "boy",
        "pie",
        "cheese",
    ]


def test_merging_contents_of_two_lists():
    assert learn_list.merging_contents_of_two_lists() == [
        "1_apple",
        "2_boy",
        "3_pie",
        "4_cheese",
    ]


def test_unique_items():
    assert learn_list.unique_items() == {
        "leek",
        "apple",
        "pie",
        "banana",
        "boy",
        "cheese",
    }


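def test_unique_list_format():
    out = learn_list.unique_list_format()
    # sets have no guaranteed order, so check the type and compare sorted items
    assert isinstance(out, list)
    assert sorted(out) == [
        "apple",
        "banana",
        "boy",
        "cheese",
        "leek",
        "pie",
    ]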
15 changes: 15 additions & 0 deletions tests/test_pandas.py
@@ -0,0 +1,15 @@
from learn import learn_pandas


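def test_slice_dataframe_keep():
    out_df = learn_pandas.slice_dataframe_keep()
    # compare column values as plain lists; `assert` on a pandas Series is ambiguous
    assert out_df.row_num.tolist() == ["1", "2"]
    assert out_df.label1.tolist() == ["boy", "apple"]
    assert out_df.label2.tolist() == ["pie", "pie"]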


def test_slice_dataframe_exclude():
    out_df = learn_pandas.slice_dataframe_exclude()
    assert out_df.row_num.tolist() == ["0", "3"]
    assert out_df.label1.tolist() == ["banana", "apple"]
    assert out_df.label2.tolist() == ["apple", "cheese"]
