Threads.net emotional Scraper

Description

This repository contains a Python script that demonstrates how to use the Threads.net scraper to collect product data and product search information. The script utilizes the Threads.net API through the Scrapfly service. The results are saved in the ./results/ directory.

Prerequisites

Before running the script, make sure you have the Scrapfly API key. Set the environment variable $SCRAPFLY_KEY with your Scrapfly API key:

Linux/Mac

export SCRAPFLY_KEY="your key from https://scrapfly.io/dashboard"

Windows

$env:SCRAPFLY_KEY="your key from https://scrapfly.io/dashboard"

Next, here are the dependencies you need to install using Poetry:

Project Dependencies

pip install poetry scrapfly-sdk pysentimiento openpyxl pandas pysentimiento

Project Dependencies (Run in the project directory)
```
poetry install
```

Make sure to run these commands in the Poetry-created virtual environment for your project. If the code uses any custom library, make sure to add it using poetry add.

Also, ensure that you have Python installed in your environment; in my case, I am using python-3.11.6.

Usage

Clone the repository to your local machine.
Set the Scrapfly API key as described in the prerequisites.
Run the script:
```
poetry run python run.py
```

Script Details

`run.py`

This script initiates the Threads.net scraper and saves the results in the ./results/ directory. It includes a list of example URLs for scraping. You can add more URLs as needed.

`addons/main.py`

This script processes JSON files, performs data extraction, transformation, sentiment analysis, and stores the final data in Excel format.

`addons/data_extraction.py`

This module provides functions to extract data from JSON files.

`addons/data_transformation.py`

This module transforms data, adding additional columns with default values.

`addons/sentiment_analysis.py`

This module analyzes the sentiment of the text using the PySentimiento library.

`addons/data_storage.py`

This module handles the storage of the final data in Excel format in the ./xlsx/ directory.

Instructions

Run poetry run python run.py to collect data from Threads.net. Make sure you have added the URLs previously in the urls variable within the run.py file.
Find the final results in the xlsx/Final.xlsx file.

GPT/gpy.py

This Python script leverages the OpenAI GPT-3.5 Turbo model for sentiment analysis. It takes a CSV file containing text data, processes each entry through the OpenAI API, and appends the predicted sentiment to the dataset. This step is optional since the emotion analysis is already done by the PySentimiento library.

Prerequisites

Before running the script, ensure you have the necessary environment variables set in a .env file:

OPENAI_API_KEY=your_openai_api_key

Also, make sure you have the required Python packages installed:

pip install requests python-dotenv openai

Usage

Add your OpenAI API key to the .env file.
Prepare a CSV file (Final.csv) with a column named 'Texto' containing the text data.
Run the script:

python gpy.py

Notes

The script assumes the input CSV file has a 'Texto' column for sentiment analysis.
Predicted sentiments are appended to the 'Concepto asociado' column.

Name		Name	Last commit message	Last commit date
Latest commit History 125 Commits
GPT		GPT
addons		addons
csv		csv
notebook		notebook
results		results
xlsx		xlsx
LICENSE		LICENSE
README.md		README.md
filtered_data.json		filtered_data.json
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
run.py		run.py
test.py		test.py
threads.py		threads.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Threads.net emotional Scraper

Description

Prerequisites

Linux/Mac

Windows

Usage

Script Details

`run.py`

`addons/main.py`

`addons/data_extraction.py`

`addons/data_transformation.py`

`addons/sentiment_analysis.py`

`addons/data_storage.py`

Instructions

GPT/gpy.py

Prerequisites

Usage

Notes

About

Releases

Packages

Languages

License

Riddikuluz/Threads.net-emotional-Scraper

Folders and files

Latest commit

History

Repository files navigation

Threads.net emotional Scraper

Description

Prerequisites

Linux/Mac

Windows

Usage

Script Details

run.py

addons/main.py

addons/data_extraction.py

addons/data_transformation.py

addons/sentiment_analysis.py

addons/data_storage.py

Instructions

GPT/gpy.py

Prerequisites

Usage

Notes

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

`run.py`

`addons/main.py`

`addons/data_extraction.py`

`addons/data_transformation.py`

`addons/sentiment_analysis.py`

`addons/data_storage.py`

Packages