generated from carpentries/workbench-template-md
-
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathEpisode1.Rmd
276 lines (197 loc) · 12.6 KB
/
Episode1.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
---
title: 'Write Readable Code'
teaching: 15
exercises: 2
---
:::::::::::::::::::::::::::::::::::::: questions
- What is readable code?
- Why invest effort in writing readable code?
::::::::::::::::::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::: objectives
- Define readable code
- What is the difference between readable and unreadable code
- What to think when writing code for people to read
::::::::::::::::::::::::::::::::::::::::::::::::
## Introduction
The expression _readable code_ might sound strange at first. Code is meant to be written and executed, right? Why should it be readable? After all, code is for a machine to perform, and a machine does not “read” as we do. So why bother?
The untold truth is that whenever someone wants to add a new feature or modify an existing one, they need to understand how the existing code works first, so they need to read the code to see how it works. That person might be another developer or collaborator, or it might be you months later. If a code is messy, unstructured and without documentation, reading and understanding it becomes more complicated and time-consuming. On the other hand, well-written code reads like a well-written novel: easy to follow and easy to understand.
So, writing readable code from scratch reduces the need for extensive explanations later.
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor
Ask the learners about what they think about the following code. The point is to make them read it and tell how easy/difficult it is. The function visualizes a spiral, where the radius of each point increases linearly from the centre outward, and the points are positioned according to their respective angles in the given input array.
To guide the discussion, use some of the following questions
What do you think this code does?
- Can you use it?
- What do r, x, and y represent in this code? Are these names intuitive?
What to point out:
- There is no information about the input parameter: should it be a list? A single value?
- The names of the function, variables are poorly chosen.
- The function does not handle the output well. It creates points in polar coordinates, but it does not return them. It shows the plot without returning it or allowing for extra steps.
d
To run the function (if necessary):
```python
t = np.linspace(0,4*np.pi, 1000)
f(t)
```
**Suggestion**: Take note of the main points you might need for the following discussions.
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::: discussion
## Discussion: What do you think about the following code?
Without additional context, can you easily understand what the function is intended to do?
```python
import numpy as np
import matplotlib.pyplot as plt
def f(t):
r = np.linspace(0,1,len(t))
x = r * np.cos(t)
y = r * np.sin(t)
plt.plot(x,y)
plt.show()
```
::::::::::::::::::::::::::::::::
## What is **readable code**?
A readable code is a code that clearly shows what it does and why without much effort.
To make our code readable, we should focus on
- **Clarity**: Everything in the code, from variable names to functions, is straightforward and unambiguous.
- **Simplicity**: It should be as simple as possible.
- **Structure**: The code should be well organised into sections (e.g. functions, modules).
- **Minimal redundancy**: Reducing code duplications makes the code more efficient and easier to modify.
- **Documentation**: Writing clear code is the first step, but it does not always explain the "why" behind it. In-line comments or external documentation might explain the reasoning and intent.
::::::::::::::::::::::::::::::::::::: discussion
## Discussion: What would you do to improve the following code?
```python
import numpy as np
import matplotlib.pyplot as plt
def f(t):
r = np.linspace(0,1,len(t))
x = r * np.cos(t)
y = r * np.sin(t)
plt.plot(x,y)
plt.show()
```
::::::::::::::::::::::::::::::::
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor
**Require Live coding**
The aim of the above discussion is to modify and improve the code together, focusing on the points identified in the previous discussion.
The simpler solution might be just to focus on the names and add some comments.
```python
#Plot a spiral in polar coordianate
#for a given list of theta values
def plot_spiral(theta):
# Creates an array of evenly spaced radius values from 0 to 1.
# The final array has the same number of element than theta
radius = np.linspace(0,1,len(theta))
# Converts polar coordinates to Cartesian
x_coordinate = radius * np.cos(theta)
y_coordinate = radius * np.sin(theta)
plt.plot(x_coordinate,y_coordinate)
plt.show()
```
The point is to show that simple modification can improve code readability.
A better version can be:
```python
import numpy as np
import matplotlib.pyplot as plt
# Calculate polar coordinate for an array of angles.
def calculate_polarcoordinate(theta): #Meaningful names: changed names of the function and input argument
# Creates an array of evenly spaced radius values from 0 to 1.
# The final array has the same number of element than theta
radius = np.linspace(0,1,len(theta))
# Converts polar coordinates to Cartesian
x_coordinate = radius * np.cos(theta)
y_coordinate = radius * np.sin(theta)
return x_coordinate, y_coordinate #Better output handling
#plot a spiral in polar coordinate
def visualise_spiral_polar_coordinates(x_coordinate,y_coordinate): #split data and visualisation
plt.plot(x_coordinate, y_coordinate)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Spiral in Polar Coordinates')
plt.show()
```
**Focus on the change**:
- Better variable and function names make the code self-explanatory
- Separation of Concern: The original function has been split in two so that it is clear what happens
- Let them focus on the fact that the `radius` definition depends on two magic numbers, 0 and 1.
- This example can be used to introduce the DRY and SoC principles.
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
## Why does structured code improve readability?
Imagine reading a document like the one below
::::::::::::::::::::::::::::::::::::::callout
##
This is a document that has no punctuation or structure it just keeps going on and on without any breaks or organization the words run together endlessly making it very difficult to understand what the author is trying to communicate the text flows continuously without any separation between ideas or concepts it becomes increasingly hard to follow the train of thought as the words keep flowing together in an endless stream of characters without any visual breaks or formatting to help guide the reader through the content
The text is a continuous, unbroken sequence of words. It might contain the most deep considerations or be the most interesting document, but reading it is hard.
The challenge is to find specific information, where one idea starts or ends. To modify this wall of words is even more complex: how can you be sure that changes will not affect another part?
::::::::::::::::::::::::::::::::::::::
This unstructured text is how unstructured code appears. Just as this text lacks punctuation, paragraphs, and formatting, unstructured code lacks modular organization, clear separation of functions, and logical flow control that makes structured code readable and maintainable.
Several coding design principles can help create structured code. These principles provide guidelines on what to consider and avoid when creating code. Among all, two shine for simplicity and effectiveness:
- **Separation of Concern** (**SoC**): This design principle separates a program into distinct sections. Each section (e.g., function or module) does one thing only, and it should be able to work independently. SoC helps create a clear code structure, improving the code readability and reusability. Indeed, each component can be reused across different parts of the code and projects.
- **DRY**: DRY stands for "Don't Repeat Yourself". The statement is clear: Each piece of code should only appear once. Code duplications are nightmares. Every time something needs to change, the changes must be applied in different parts of the codebase. When common logic is extracted into reusable components, any changes will only happen in one place, improving the maintainability of the code.
Let's see an example of how applying these two principles improves the code.
::::::::::::::::::::::::::::::::::::: discussion
## From unstructured to structured code
The code below generate two datasets of points and apply linear regression.
```python
import numpy as np
import matplotlib.pyplot as plt
# Dataset 1
X = np.array([1, 2, 3, 4, 5])
y = 2 * X + 1 + np.random.normal(0, 0.5, len(X))
# Calculate coefficients using np.linalg
A = np.vstack([X, np.ones(len(X))]).T
coefficients = np.linalg.lstsq(A, y, rcond=None)[0]
m, b = coefficients
# Dataset 2
X1 = np.array([1, 2, 3, 4, 5])
y1 = 1.5 * X1 + 0.5 + np.random.normal(0, 0.5, len(X))
A1 = np.vstack([X1, np.ones(len(X1))]).T
coefficients1 = np.linalg.lstsq(A1, y1, rcond=None)[0]
m1, b1 = coefficients1
# Verify all calculations give same result
print("Case 1: Slope: = ",round(m,4),"intercept =",round(b,4))
print("Case 2: Slope: = ",round(m1,4),"intercept =",round(b1,4))
```
The code works as expected and appears straightforward but violates both principles. Let us see why.
**DRY Principle violations**
The code that creates the two arrays `X` and `X1` is the same; this means that if we need to change one array, we need to apply the changes twice. Similarly, the code that calculates the slope and intercepts is repeated for both datasets.
**SoC Principle violations**
Although the operations appear sequentially, the code has no clear structure. They depend on the previous one, meaning that changing one part of the code will change everything else, so code blocks are not independent. Also, the concerns are mixed together with no clear separation.
These violations generate a lot of problems: what if you need to reuse the code? What if you need to apply to a different dataset? What if you need to repeat the same calculation for 100 datasets? Would any of these modifications be easy?
Below, a different version of the code that tries to correct these violations is presented.
```python
import numpy as np
import matplotlib.pyplot as plt
# Generate Y data using linear equation y=mx+b with Gaussian noise
def generate_data(x_values, slope, intercept):
y_values = slope * x_values + intercept + np.random.normal(0, 0.5, len(x_values))
return y_values
# Get linear regression coefficient
def fit_linear_regression(x_values, y_values):
A = np.vstack([X, np.ones(len(x_values))]).T
coefficients = np.linalg.lstsq(A, y_values, rcond=None)[0]
return
# Usage
base_X = np.array([1, 2, 3, 4, 5])
case1_y = generate_data(base_X, 2.0, 1.0)
case2_y = generate_data(base_X, 1.5, 0.5)
m, b = fit_linear_regression(base_X, case1_y)
verify_coefficients(m, b)
m1, b1 = fit_linear_regression(base_X, case2_y)
verify_coefficients(m1, b1)
```
This new version uses a single data generation function for both cases and a single linear regression function that can be applied to both datasets. These two functions fix all the DRY violations. At the same time, they also solve the SoC violations: each new function has one scope and clearly separates the data generations and model fittings.
This improved structure makes the code more maintainable, testable, and easier to extend while eliminating redundant calculations and mixed responsibilities.
:::::::::::::::::::::::::::::::::::::::
## Tips
By writing code for people to read, you improve the maintainability and usability of your code. This ensures that others (and your future self) can easily understand, modify, and extend the code.
Writing readable code requires a bit more effort, but there are a few tips that make it simpler like:
- **Be kind with your future self** you will thereby also be kind to others who come to your code.
- **Use meaningful variable names**.
- **Break down complex functions into smaller functions**.
- **Add comments to explain the "why" behind your code**.
::::::::::::::::::::::::::::::::::::: keypoints
- **Clarity**: Make sure your code is easy to understand.
- **Simplicity**: Keep the code simple and avoid unnecessary complexity.
- **Structure**: Organize your code well (e.g., use functions, classes, and modules appropriately).
- **Minimal Redundancy**: Avoid code duplication to improve efficiency.
- **Documentation**: Add comments to clarify the logic and intent behind your code.
::::::::::::::::::::::::::::::::::::::::::::::::