first proper commit

tuonglab · Jun 30, 2023 · 86298f3
1 parent 3fb170a
commit 86298f3
Showing 9 changed files with 244 additions and 5 deletions.
diff --git a/.gitignore b/.gitignore
@@ -158,3 +158,4 @@ cython_debug/
 #  and can be added to the global gitignore or merged into this file.  For a more nuclear
 #  option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
+.DS_Store
diff --git a/README.md b/README.md
@@ -1,4 +1,38 @@
-# template_basic_python
-Basic template repository for python classroom
+# tuong_group_basic_python
+Basic template repository for python classroom for Tuong group students.
 
-This is just to set up the folders for the classrooms
+This repository holds the basic scripts/questions for basic Python usage to prepare you for advanced single-cell analysis. It contains simple functions and objectives for everyday Python usage. However, I have written mistakes into the code in the `learn` folder, and your task is to find the mistakes and suggest solutions. Good luck, and I hope you learn something from this!
+
+## lists
+
+The first learning plan is on `lists`.
+
+A list is a data structure in Python that is a mutable, or changeable, ordered sequence of elements. Each element or value that is inside of a list is called an item. Just as strings are defined as characters between quotes, lists are defined by having values between square brackets `[]`
+https://www.w3schools.com/python/python_lists.asp
+
+The objective for this learning plan is to learn the simplest way to interact with Python, which is storing items/elements in a `list`. There are many ways to manipulate a list, so this is just an introduction. We use `lists` all the time in single-cell analysis, for example to insert new metadata or to form lists of items we want to plot. Imagine you have 80,000 cells and you want to colour them by treatment status and also by cell type. How would you construct the combined treatment status + cell type label for each cell, e.g. "treated_B cell", "untreated_B cell"? You can achieve this by creating a `list` that holds this information!
+
+There are 4 simple functions written in `learn/learn_list.py` but the tests are failing for each of them. Can you fix them up so that the tests succeed?
+
+## dictionary
+
+The second learning plan is on `dictionary`.
+
+Dictionaries are used to store data values in key:value pairs. Dictionaries are defined by having key:value pairs between curly braces `{}`.
+https://www.w3schools.com/python/python_dictionaries.asp
+
+The objective of this part of the learning plan is to learn how to use dictionaries for matching/changing values to suit our needs, which is very relevant for single-cell analysis.
+Imagine you have 10 clusters that you identified but you want to give each of them a biologically meaningful name. How would you go about changing the individual names? You can achieve this using `dictionaries`!
+
+There are 2 functions written in `learn/learn_dict.py` but the tests are failing for each of them. Can you fix them up so that the tests succeed?
+
+## dataframes
+
+The third learning plan is on `dataframe`. We use the popular `pandas` package to interact with dataframes.
+
+A `pandas` `DataFrame` is a 2 dimensional data structure, like a 2 dimensional array, or a table with rows and columns.
+https://www.w3schools.com/python/pandas/pandas_dataframes.asp#:~:text=What%20is%20a%20DataFrame%3F,table%20with%20rows%20and%20columns.
+
+The objective of this part of the learning plan is to learn how to slice the `DataFrame` object so that we end up with the information that we want. This is very important for how we deal with single-cell data. Imagine that you have 10,000 cells but you only want the 2,000 that come from the spleen and not from any other organ. How would you go about doing this? You can achieve this by slicing the `DataFrame` to only contain the relevant cells!
+
+There are 2 functions written in `learn/learn_pandas.py` but the tests are failing for each of them. Can you fix them up so that the tests succeed?
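
The two scenarios the README describes for lists and dictionaries (combining treatment status with cell type, and renaming clusters) can be sketched as follows. This is only a minimal illustration under assumed data: the variable names and values are hypothetical and are not taken from the repository.

```python
# Hypothetical per-cell annotations; the values are illustrative only.
treatment = ["treated", "untreated", "treated"]
cell_type = ["B cell", "B cell", "T cell"]

# Pair the two lists position by position and join each pair into one label.
combined = [f"{t}_{c}" for t, c in zip(treatment, cell_type)]

# Hypothetical cluster names: map each cluster id to a meaningful label.
cluster_names = {"0": "B cell", "1": "T cell", "2": "NK cell"}
clusters = ["0", "2", "1", "0"]

# Look up every cluster id in the dictionary, one item at a time.
renamed = [cluster_names[c] for c in clusters]

print(combined)
print(renamed)
```

Both results are plain lists with one entry per cell, so they could be stored as a new metadata column.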
diff --git a/learn/learn_dict.py b/learn/learn_dict.py
@@ -0,0 +1,30 @@
+ORIGINAL_LIST = ["0", "1", "2", "3", "4"]
+
+
+def convert_all() -> list:
+    """
+    This function wants to convert the numbers as a string to new labels.
+
+    Returns
+    -------
+    list
+        A list containing the new labels.
+    """
+    convert_dict = {"0": "apple", "1": "boy", "2": "pie", "3": "cheese", "4": "banana"}
+    new_list = convert_dict[ORIGINAL_LIST]
+    return new_list
+
+
+def convert_wrong() -> list:
+    """
+    This function wants to convert the numbers as a string to new labels but something goes wrong.
+    Can you spot the mistake(s)?
+
+    Returns
+    -------
+    list
+        A list containing the new labels.
+    """
+    convert_dict = {"1": "apple", "1": "pie", "2": "pie", "3": "cheese", "4": "banana"}
+    new_list = [convert_dict[x] for x in ORIGINAL_LIST]
+    return new_list
diff --git a/learn/learn_list.py b/learn/learn_list.py
@@ -0,0 +1,54 @@
+LIST1 = ["1", "2", "3", "4"]
+LIST2 = ["apple", "boy", "pie", "cheese"]
+LIST3 = ["banana", "boy", "leek", "apple"]
+
+
+def concat_lists() -> list:
+    """
+    This function is trying to concatenate the two lists together.
+    Can you tell what this current code is doing?
+
+    Returns
+    -------
+    list
+        A new list that is the combination of 2 lists.
+    """
+    return LIST1 ^ LIST2
+
+
+def merging_contents_of_two_lists() -> list:
+    """
+    This function is trying to create a new list whose items are
+    "1_apple", "2_boy" and so on.
+    Can you tell why this is not working?
+
+    Returns
+    -------
+    list
+        A new list with outputs being "1_apple", "2_boy" and so on.
+    """
+    return [x + "_" + y for x, y in (LIST1 + LIST2)]
+
+
+def unique_items() -> list:
+    """
+    Extracting unique elements in list.
+
+    Returns
+    -------
+    list
+        A list containing only unique elements.
+    """
+    return LIST2 + LIST3
+
+
+def unique_list_format() -> list:
+    """
+    Extracting unique elements in list and output as list.
+
+    Returns
+    -------
+    list
+        A list containing only unique elements.
+    """
+    return set(LIST2 + LIST3)
diff --git a/learn/learn_pandas.py b/learn/learn_pandas.py
@@ -0,0 +1,41 @@
+import pandas as pd
+
+
+def slice_dataframe_keep() -> pd.DataFrame:
+    """
+    This function wants to slice the dataframe to just row_num 1 and 2.
+
+    Returns
+    -------
+    pd.DataFrame
+        A sliced dataframe with only 2 specific rows.
+    """
+    original_df = pd.DataFrame(
+        {
+            "row_num": ["0", "1", "2", "3"],
+            "label1": ["banana", "boy", "apple", "apple"],
+            "label2": ["apple", "pie", "pie", "cheese"],
+        }
+    )
+    new_df = original_df[original_df["row_num"] == ["1", "2"]]
+    return new_df
+
+
+def slice_dataframe_exclude() -> pd.DataFrame:
+    """
+    This function wants to exclude certain rows from the dataframe.
+
+    Returns
+    -------
+    pd.DataFrame
+        A sliced dataframe without 2 specific rows.
+    """
+    original_df = pd.DataFrame(
+        {
+            "row_num": ["0", "1", "2", "3"],
+            "label1": ["banana", "boy", "apple", "apple"],
+            "label2": ["apple", "pie", "pie", "cheese"],
+        }
+    )
+    new_df = original_df[original_df["label1"].isin(["pie"])]
+    return new_df
diff --git a/requirements.txt b/requirements.txt
@@ -1,3 +1 @@
 pandas
-numpy
-seaborn
diff --git a/tests/test_dict.py b/tests/test_dict.py
@@ -0,0 +1,21 @@
+from learn import learn_dict
+
+
+def test_convert_all():
+    assert learn_dict.convert_all() == [
+        "apple",
+        "boy",
+        "pie",
+        "cheese",
+        "banana",
+    ]
+
+
+def test_convert_wrong():
+    assert learn_dict.convert_wrong() == [
+        "apple",
+        "apple",
+        "pie",
+        "pie",
+        "banana",
+    ]
diff --git a/tests/test_list.py b/tests/test_list.py
@@ -0,0 +1,45 @@
+from learn import learn_list
+
+
+def test_concat_list():
+    assert learn_list.concat_lists() == [
+        "1",
+        "2",
+        "3",
+        "4",
+        "apple",
+        "boy",
+        "pie",
+        "cheese",
+    ]
+
+
+def test_merging_contents_of_two_lists():
+    assert learn_list.merging_contents_of_two_lists() == [
+        "1_apple",
+        "2_boy",
+        "3_pie",
+        "4_cheese",
+    ]
+
+
+def test_unique_items():
+    assert learn_list.unique_items() == {
+        "leek",
+        "apple",
+        "pie",
+        "banana",
+        "boy",
+        "cheese",
+    }
+
+
+def test_unique_list_format():
+    assert learn_list.unique_list_format() == [
+        "banana",
+        "boy",
+        "cheese",
+        "leek",
+        "apple",
+        "pie",
+    ]
diff --git a/tests/test_pandas.py b/tests/test_pandas.py
@@ -0,0 +1,15 @@
+from learn import learn_pandas
+
+
+def test_slice_dataframe_keep():
+    out_df = learn_pandas.slice_dataframe_keep()
+    assert out_df.row_num.tolist() == ["1", "2"]
+    assert out_df.label1.tolist() == ["boy", "apple"]
+    assert out_df.label2.tolist() == ["pie", "pie"]
+
+
+def test_slice_dataframe_exclude():
+    out_df = learn_pandas.slice_dataframe_exclude()
+    assert out_df.row_num.tolist() == ["0", "3"]
+    assert out_df.label1.tolist() == ["banana", "apple"]
+    assert out_df.label2.tolist() == ["apple", "cheese"]
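
A note on asserting against `DataFrame` columns: comparing a pandas `Series` to a list with `==` gives one True/False per row rather than a single boolean, and `assert` on that result raises a `ValueError`. The sketch below illustrates this on a hypothetical table (the column names and values are assumptions, not data from this repository) and shows one possible way around it, converting the column with `tolist()` first.

```python
import pandas as pd

# Hypothetical cell metadata; the column names and values are illustrative only.
cells = pd.DataFrame(
    {
        "cell_id": ["c0", "c1", "c2", "c3"],
        "organ": ["spleen", "liver", "spleen", "lung"],
    }
)

# Keep only the rows whose organ is in the list of wanted organs.
spleen_cells = cells[cells["organ"].isin(["spleen"])]

# Comparing a column to a list gives one True/False per row, not a single bool.
elementwise = spleen_cells["cell_id"] == ["c0", "c2"]

# Asserting on that Series directly is ambiguous and raises ValueError.
try:
    assert elementwise
    ambiguous = False
except ValueError:
    ambiguous = True

# Converting to a plain list first gives an ordinary list comparison.
cell_ids = spleen_cells["cell_id"].tolist()
assert cell_ids == ["c0", "c2"]
```

pandas also ships comparison helpers in `pandas.testing` (for example `assert_series_equal`), which may be a better fit when the index and dtype should be checked as well.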