Commit 82c9ad1

committed
first commit
0 parents  commit 82c9ad1

8 files changed: +756 -0 lines changed

.gitignore

+131
@@ -0,0 +1,131 @@
+
+# Created by https://www.gitignore.io/api/macos,pycharm,jupyternotebook,visualstudiocode
+# Edit at https://www.gitignore.io/?templates=macos,pycharm,jupyternotebook,visualstudiocode
+
+### JupyterNotebook ###
+.ipynb_checkpoints
+*/.ipynb_checkpoints/*
+
+# Remove previous ipynb_checkpoints
+# git rm -r .ipynb_checkpoints/
+#
+
+### macOS ###
+# General
+.DS_Store
+.AppleDouble
+.LSOverride
+
+# Icon must end with two \r
+Icon
+
+# Thumbnails
+._*
+
+# Files that might appear in the root of a volume
+.DocumentRevisions-V100
+.fseventsd
+.Spotlight-V100
+.TemporaryItems
+.Trashes
+.VolumeIcon.icns
+.com.apple.timemachine.donotpresent
+
+# Directories potentially created on remote AFP share
+.AppleDB
+.AppleDesktop
+Network Trash Folder
+Temporary Items
+.apdisk
+
+### PyCharm ###
+# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and WebStorm
+# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
+
+# User-specific stuff
+.idea/**/workspace.xml
+.idea/**/tasks.xml
+.idea/**/usage.statistics.xml
+.idea/**/dictionaries
+.idea/**/shelf
+
+# Generated files
+.idea/**/contentModel.xml
+
+# Sensitive or high-churn files
+.idea/**/dataSources/
+.idea/**/dataSources.ids
+.idea/**/dataSources.local.xml
+.idea/**/sqlDataSources.xml
+.idea/**/dynamic.xml
+.idea/**/uiDesigner.xml
+.idea/**/dbnavigator.xml
+
+# Gradle
+.idea/**/gradle.xml
+.idea/**/libraries
+
+# Gradle and Maven with auto-import
+# When using Gradle or Maven with auto-import, you should exclude module files,
+# since they will be recreated, and may cause churn. Uncomment if using
+# auto-import.
+# .idea/modules.xml
+# .idea/*.iml
+# .idea/modules
+
+# CMake
+cmake-build-*/
+
+# Mongo Explorer plugin
+.idea/**/mongoSettings.xml
+
+# File-based project format
+*.iws
+
+# IntelliJ
+out/
+
+# mpeltonen/sbt-idea plugin
+.idea_modules/
+
+# JIRA plugin
+atlassian-ide-plugin.xml
+
+# Cursive Clojure plugin
+.idea/replstate.xml
+
+# Crashlytics plugin (for Android Studio and IntelliJ)
+com_crashlytics_export_strings.xml
+crashlytics.properties
+crashlytics-build.properties
+fabric.properties
+
+# Editor-based Rest Client
+.idea/httpRequests
+
+# Android studio 3.1+ serialized cache file
+.idea/caches/build_file_checksums.ser
+
+### PyCharm Patch ###
+# Comment Reason: https://github.com/joeblau/gitignore.io/issues/186#issuecomment-215987721
+
+# *.iml
+# modules.xml
+# .idea/misc.xml
+# *.ipr
+
+# Sonarlint plugin
+.idea/sonarlint
+
+### VisualStudioCode ###
+.vscode/*
+!.vscode/settings.json
+!.vscode/tasks.json
+!.vscode/launch.json
+!.vscode/extensions.json
+
+### VisualStudioCode Patch ###
+# Ignore all local history of files
+.history
+
+# End of https://www.gitignore.io/api/macos,pycharm,jupyternotebook,visualstudiocode

README.md

+45
@@ -0,0 +1,45 @@
+<img src="https://bit.ly/2VnXWr2" alt="Ironhack Logo" width="100"/>
+
+# Lab | String Operations and Bag of Words
+
+## Introduction
+
+In this lab, we will learn how to manipulate strings. There are two challenges: 1) to practice manipulating strings, and 2) to use string manipulation techniques to create a Bag of Words (BoW). BoW is an essential technique in Natural Language Processing.
+
+### Getting Started
+
+In your Terminal, navigate into the `your-code` directory of this lab, which contains `challenge-1.ipynb`, `challenge-2.ipynb`, `doc1.txt`, `doc2.txt`, and `doc3.txt`. Start Jupyter Notebook by executing `jupyter notebook`. A web page should open automatically; if it does not, go to [http://localhost:8888](http://localhost:8888). Then click the link to each ipynb file to complete the challenges.
+
+## Deliverables
+
+`challenge-1.ipynb` and `challenge-2.ipynb` with your responses.
+
+## Submission
+
+Upon completion, add your deliverables to git. Then commit your changes and push your branch to the remote.
+
+## Resources
+
+* [The `re` Library](https://docs.python.org/3/library/re.html)
+
+* [F-strings](https://www.python.org/dev/peps/pep-0498/)
+
+* [Regular Expressions](https://developers.google.com/edu/python/regular-expressions)
+
+* [Python Input and Output (how to read file content)](https://docs.python.org/3/tutorial/inputoutput.html)
+
+* [How to Remove Punctuation in Python String](https://www.quora.com/How-do-I-remove-punctuation-from-a-Python-string)
+
+* [Convert String to Lowercase in Python](https://docs.python.org/3/library/stdtypes.html#str.lower)
+
+* [Break Python String into Array](https://docs.python.org/3/library/stdtypes.html#str.split)
+
+* [What is Text Corpus?](https://en.wikipedia.org/wiki/Text_corpus)
+
+* [A Gentle Introduction to the Bag-of-Words Model](https://machinelearningmastery.com/gentle-introduction-bag-words-model/)
+
+## Additional Reading
+
+If you are a research-type person, you will find [this article](http://rstb.royalsocietypublishing.org/content/royptb/366/1567/1101.full.pdf) interesting. Scientists used techniques based on BoW to calculate the frequency of words used across 17 world languages. They found a consistent pattern in the frequency of words used in human languages. Some mad scientists even [want to use this technique to analyze dolphin language](http://grantome.com/grant/NSF/PHY-1530544) because they believe they can build corpora based on the sounds dolphins make, correlate the dolphin language corpora with human language corpora, and potentially understand what dolphins are saying. :astonished: :astonished: :astonished:
+
+Data analytics is now entering almost every discipline and profession. You will want to reflect on how you will apply your data analytics skills to the fields you are familiar with -- in creative ways. There are tons of fun secrets waiting for you to discover with data analytics.
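
For readers skimming this commit, here is a minimal sketch of what the README means by building a Bag of Words with string manipulation. It is not the lab's solution: the function names `tokenize` and `bag_of_words` and the three sample sentences are hypothetical (the contents of `doc1.txt`, `doc2.txt`, and `doc3.txt` are not shown in this commit), and it assumes the vectors hold raw term counts rather than presence flags.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens, dropping punctuation."""
    # Hypothetical helper: keeps runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(docs):
    """Return (vocabulary, vectors) for a list of document strings.

    The vocabulary lists each distinct term in order of first appearance.
    Each vector holds the count of every vocabulary term in one document.
    """
    tokens_per_doc = [tokenize(doc) for doc in docs]

    vocabulary = []
    seen = set()
    for tokens in tokens_per_doc:
        for token in tokens:
            if token not in seen:
                seen.add(token)
                vocabulary.append(token)

    vectors = [
        [tokens.count(term) for term in vocabulary]
        for tokens in tokens_per_doc
    ]
    return vocabulary, vectors


# Placeholder sentences standing in for the lab's text files (assumption).
docs = [
    "Ironhack is cool.",
    "I love Ironhack.",
    "I am a student at Ironhack.",
]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)
for vector in vectors:
    print(vector)
```

Keeping the vocabulary in first-appearance order makes the vectors easy to check by eye; sorting it alphabetically would work just as well, provided the same order is used for every document.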
