Commit 8048ea2

Omar authored and committed
First commit
0 parents  commit 8048ea2

24 files changed, with 3364 additions and 0 deletions

.gitignore

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+exploration
+.env
+data_dumps
+data
+*.png
+*.csv
+sync_db.sh
+.DS_Store
+Makefile
+*.pyc
+__pycache__
+poetry.lock
+containers/scrapper/benchhmark
+demo_nomic.py

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 Omar Mohammed
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
+# HackerNews analysis - How to use it?
+
+Running the scraping script is very simple. You just need to run the following command:
+
+```shell
+cd ./scrapper
+poetry run python main.py
+```
+
+The analysis is spread across multiple scripts.
+
+```shell
+cd ./analysis
+poetry run python core_analysis.py # Main visualization
+poetry run python thematic_extraction.py # Use GPT-4 to extract the topics of interest
+poetry run python thematic_analysis.py # Analyze the topics extracted and visualize them
+```

analysis/.env_template

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
+OPENAI_API_KEY=
+HUGGINGFACE_AUTH_BEARER_TOKEN=
+MONGO_DB_URL=
+OPENAI_ORG_ID=
+DB_NAME=
+NOMIC_AI_KEY=
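
The template names six variables, but this part of the commit does not show the code that reads them. The following is a minimal sketch, assuming the analysis scripts take these values from the process environment. The helper `load_settings` and its fail-fast behaviour are hypothetical and not taken from the repository.

```python
import os
from typing import Mapping

# Variable names copied from analysis/.env_template.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "HUGGINGFACE_AUTH_BEARER_TOKEN",
    "MONGO_DB_URL",
    "OPENAI_ORG_ID",
    "DB_NAME",
    "NOMIC_AI_KEY",
)


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the template's variables, raising if any is unset or empty.

    Hypothetical helper: the repository may load its configuration differently.
    """
    # Treat both absent keys and empty strings as missing, since the
    # template ships every key with an empty value.
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED_KEYS}
```

Filling in a copy of the template as `.env` (a name the `.gitignore` above already excludes) and exporting its values before running the scripts would presumably satisfy a check like this.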