
Commit edb14a0

Update in README
1 parent 4c97c3e commit edb14a0

2 files changed: +86 -18 lines changed

Dockerfile

+2-1
@@ -5,7 +5,8 @@ RUN apk update
 RUN apk add postgresql
 RUN chown postgres:postgres /run/postgresql/
 # Install requirements
-RUN pip install -r requirements.txt
+COPY ./requirements.txt /tmp
+RUN pip install -r /tmp/requirements.txt
 # For psycopg2
 RUN apk add --virtual postgresql-deps libpq-dev
 # Create directories

README.md

+84-17
@@ -1,17 +1,48 @@
-# DataEngineering-Workshop1
-### Workshop 1 Agenda
-**Prerequisites**
+# Data Engineering Workshop
 
-Linux Machine
-Docker
-Python 3.10
-PostgreSQL 13
-Beautifulsoup
-urllib2
-requests
-git
+A one-day workshop on Docker, web scraping, regular expressions, PostgreSQL, and Git.
 
-1. **Introduction to Docker.**
+## Prerequisites
+
+##### Any Linux machine/VM with the following packages installed
+- Python 3.6 or above
+- [docker-ce](https://docs.docker.com/engine/install/ubuntu/)
+- [docker-compose](https://docs.docker.com/compose/install/)
+- pip3
+- git (any recent version)
+- PostgreSQL 13
+- psycopg2
+- bs4
+- urllib2
+- requests
+
+##### GitHub account
+- Create an account on [GitHub](https://github.com/join) (if you don't already have one)
+- Fork [this](https://github.com/UniCourt/DataEngineering-Workshop1) repository and then clone it to your machine
+- You can refer to [this](https://docs.github.com/en/get-started/quickstart/fork-a-repo) guide to understand how to fork and clone
+
+## What will you learn by the end of this workshop?
+- How to build a Docker image and use it.
+- How to scrape a website using urllib/requests and Beautiful Soup.
+- Regular expressions and how to work with them.
+- The key features of PostgreSQL.
+- How to dockerize your project.
+
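Since regular expressions are one of the topics listed above, a minimal Python sketch of the kind of pattern work involved (the sample text and patterns here are illustrative, not from the workshop material):

```python
import re

# Illustrative sample text (not from the workshop material)
log_line = "2022-01-15 ERROR disk full on /dev/sda1"

# Capture the date with a grouped pattern
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", log_line)
print(match.group(0))   # 2022-01-15
print(match.group(1))   # 2022

# Find every fully capitalised word
print(re.findall(r"\b[A-Z]+\b", log_line))   # ['ERROR']
```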
+## Schedule
+| Time          | Topics
+| ------------- | ------
+| 09:00 - 11:00 | [`Introduction to Docker`](#Introduction-to-Docker)
+| 11:00 - 01:00 | [`Introduction to Web Scraping`](#Introduction-to-Webscrapping)
+| 01:00 - 02:00 | `Break`
+| 02:00 - 03:00 | [`Introduction to PostgreSQL`](#Introduction-to-PostgreSQL)
+| 03:30 - 04:00 | `Dockerizing a project`
+| 04:00 - 04:30 | [`Introduction to GitHub`](#Introduction-to-Github)
+| 04:30 - 05:00 | `Q & A and wrapping up`
+
+
+## Workshop 1 Agenda
+
+1. ### Introduction to Docker
 
 - Building the Docker image for Worker using python:3.10.2-alpine3.15
 
@@ -33,9 +64,9 @@
 
 2) Go to the directory where you created the Dockerfile
 
-Docker build -t Simple_python
+docker build ./ -t simple_python
 
-2. **Introduction to Webscrapping.**
+2. ### Introduction to Web Scraping
 - **Beautiful Soup**
 - *Introduction*
 Beautiful Soup is a Python package that lets us pull data out of HTML and XML documents.
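The description above can be made concrete with a short, self-contained sketch (the HTML snippet, tag names, and class names here are invented for illustration):

```python
from bs4 import BeautifulSoup

# Invented HTML snippet for illustration
html = """
<html><body>
  <h1>Workshop</h1>
  <ul>
    <li class="topic">Docker</li>
    <li class="topic">PostgreSQL</li>
  </ul>
</body></html>
"""

# Parse the document, then pull data out of it by tag and class
soup = BeautifulSoup(html, "html.parser")
print(soup.h1.text)   # Workshop
topics = [li.text for li in soup.find_all("li", class_="topic")]
print(topics)         # ['Docker', 'PostgreSQL']
```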
@@ -165,7 +196,7 @@
 
 
 
-3. **Introduction to PostgreSQL.**
+3. ### Introduction to PostgreSQL
 - **Key features of PostgreSQL**
 - Free to download
 - Enforces data integrity
@@ -245,7 +276,7 @@
 Go to the directory where you created the Dockerfile
 docker build ./ -t simple_python
 
-4. **Introduction to Github.**
+4. ### Introduction to GitHub
 - **Setting up GitHub**
 
 Make a repository in GitHub
@@ -290,6 +321,42 @@
 
 The git config command is used initially to configure user.name and user.email. This specifies which email address and username will be used from a local repository.
 
-5. **Workshop 1 Home Work.**
+5. ### Web scraping with Docker
+   - Create a new Dockerfile.
+
+     FROM python:3.10.2-alpine3.15
+     # Create directories
+     RUN mkdir -p /root/workspace/src
+     COPY ./web_scraping_sample.py /root/workspace/src
+     # Switch to project directory
+     WORKDIR /root/workspace/src
+
+   - Create a docker-compose file.
+
+     version: "3"
+     services:
+       python_service:
+         build:
+           context: ./
+           dockerfile: Dockerfile
+         image: workshop1
+         container_name: workshop_python_container
+         stdin_open: true   # docker attach container_id
+         tty: true
+         ports:
+           - "8000:8000"
+         volumes:
+           - .:/app
+   - Get the containers up.
+
+     docker-compose up -d
+
+   - Log in to the container.
+
+     docker exec -it workshop_python_container sh
+   - Run the web scraping script inside the container.
+
+     python web_scraping_sample.py
 
+6. ### Workshop 1 Home Work
 A PR should be raised in which data is scraped from the [Lorem Ipsum generator](https://www.lipsum.com/) website and each section of that page is saved in the database.
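A starting point for the homework might look like the sketch below. The `extract_sections` helper and the h2/p structure are assumptions; inspect the real lipsum.com markup in the browser and adjust the selectors, then persist the rows with psycopg2.

```python
from bs4 import BeautifulSoup

def extract_sections(html):
    """Pair each heading with the paragraph that follows it.

    The h2/p structure is an assumption; check the real page markup
    and adjust the tags/selectors accordingly.
    """
    soup = BeautifulSoup(html, "html.parser")
    sections = []
    for heading in soup.find_all("h2"):
        paragraph = heading.find_next_sibling("p")
        sections.append((heading.text, paragraph.text if paragraph else ""))
    return sections

# Small inline sample standing in for the downloaded page; for the homework,
# fetch the real page first, e.g. html = requests.get("https://www.lipsum.com/").text
sample = "<h2>What is Lorem Ipsum?</h2><p>Lorem Ipsum is dummy text.</p>"
print(extract_sections(sample))   # [('What is Lorem Ipsum?', 'Lorem Ipsum is dummy text.')]
```

Each (title, body) pair returned could then be inserted into a PostgreSQL table with psycopg2.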
