README: Keyword Finder Script

This README explains how to set up, configure, and run the Keyword Finder script. The script reads URLs from a CSV file, searches each page's HTML content for the given keywords, and extracts the matching sentences along with their surrounding context.

Prerequisites

Ensure you have the following installed on your system:

Python (Version 3.8 or higher)
pip (Python's package manager)
Git (for cloning the repository)

Setup Instructions

1. Clone the Repository

To get started, clone the repository to your local machine:

git clone https://github.com/Madaocv/KristiyanY.git
cd KristiyanY

2. Create and Activate a Virtual Environment (Optional)

It is recommended to use a virtual environment to avoid conflicts with other Python packages:

On macOS/Linux:

python3 -m venv venv
source venv/bin/activate

On Windows:

python -m venv venv
venv\Scripts\activate

3. Install Required Dependencies

Install all necessary Python packages using pip:

pip install -r requirements.txt

Usage

Command-Line Arguments

The script accepts the following arguments:

--input_path: The path to the input CSV file containing URLs.
--input_keywords: A list of keywords to search for, passed as a stringified Python list.
--output_file: The path where the results file will be saved. Ensure this includes a valid file name and extension (e.g., .xlsx).
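
Because --input_keywords arrives as a stringified Python list, it has to be converted back into a real list before use. The following is a minimal sketch of how that could be done with argparse and ast.literal_eval; it is not taken from main.py, and the helper names parse_keyword_list and build_parser are hypothetical.

```python
import argparse
import ast
from typing import List


def parse_keyword_list(raw: str) -> List[str]:
    """Convert a stringified Python list such as '["a", "b"]' into a list of str."""
    try:
        # literal_eval only evaluates Python literals, so it is safer than eval().
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError) as exc:
        raise argparse.ArgumentTypeError(f"not a valid Python list: {raw!r}") from exc
    if not isinstance(value, list) or not all(isinstance(k, str) for k in value):
        raise argparse.ArgumentTypeError("expected a list of strings")
    # Strip stray whitespace and drop empty entries.
    return [k.strip() for k in value if k.strip()]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering the three arguments documented above."""
    parser = argparse.ArgumentParser(description="Keyword Finder (illustrative sketch)")
    parser.add_argument("--input_path", required=True,
                        help="Path to the input CSV file containing URLs")
    parser.add_argument("--input_keywords", required=True, type=parse_keyword_list,
                        help="Keywords as a stringified Python list")
    parser.add_argument("--output_file", required=True,
                        help="Path of the results file, including name and extension")
    return parser


# Example: parse an explicit argument list instead of sys.argv.
example_args = build_parser().parse_args([
    "--input_path", "urls.csv",
    "--input_keywords", '["online store" , "website builder"]',
    "--output_file", "data/results.xlsx",
])
```

Wrapping the list in single quotes on the command line, as in the example command, keeps the shell from interpreting the inner double quotes.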

Example Command

python main.py --input_path="/Downloads/Template 2 - Sheet6.csv" --input_keywords='["art portfolio" , "website ideas" , "idea for a website" , "website design" , "mobile-friendly design" , "restaurant website" , "website for a restaurant" , "online store" , "website builder"]' --output_file="data/Initial_test_05_12_v6.xlsx"

Key Features

Keyword Detection:
- Extracts sentences containing specified keywords.
- Identifies whether a keyword is wrapped in <a> tags and adds this information to the output.
Contextual Sentence Extraction:
- Extracts one sentence before and after each match (when available).
Output File:
- Results are saved in an Excel file with columns for the keyword, matching sentences, context, and additional metadata.
Automatic Directory Creation:
- If the specified output directory does not exist, it will be created automatically.
Usage tricks:
- Combine different output file names and keyword sets to produce separate result files for each run.
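
To illustrate the contextual sentence extraction described above, here is a rough sketch that splits text into sentences and returns each match with its neighbours. It assumes a naive punctuation-based sentence splitter and case-insensitive matching, which may differ from what the script actually does; the function names and dictionary keys are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Naive sentence boundary: whitespace that follows '.', '!' or '?'.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split plain text into sentences, dropping empty fragments."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def find_keyword_context(text: str, keywords: List[str]) -> List[Dict[str, Optional[str]]]:
    """Return one row per (sentence, keyword) match with the adjacent sentences."""
    sentences = split_sentences(text)
    rows: List[Dict[str, Optional[str]]] = []
    for i, sentence in enumerate(sentences):
        lowered = sentence.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                rows.append({
                    "keyword": keyword,
                    "sentence": sentence,
                    # Neighbours are None at the start or end of the text.
                    "previous": sentences[i - 1] if i > 0 else None,
                    "next": sentences[i + 1] if i + 1 < len(sentences) else None,
                })
    return rows


sample = "We build sites. Every online store needs one. Contact us today."
matches = find_keyword_context(sample, ["online store"])
```

A splitter this simple will break on abbreviations such as "e.g."; a dedicated sentence tokenizer would be more robust if accuracy matters.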

Output File Structure

The output Excel file contains the following columns:

1 Keyword inside URL: Boolean (True/False) indicating whether a keyword appears in the URL.
1.1 Keyword in URL: The keyword found in the URL.
Response Status Code: The HTTP status code for each URL.
Keyword in text: Keywords found in the page content.
Link inside sentence: Boolean indicating if the keyword was wrapped in an <a> tag.
Sentence: Sentences containing the keywords.
Sentence -1: The sentence preceding the match.
Sentence +1: The sentence following the match.
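
As an illustration of how the "Link inside sentence" value could be derived, the sketch below collects the text of every <a> element with the standard-library HTML parser and checks whether a keyword appears in any of them. This is an assumption about the approach, not the script's actual implementation; AnchorTextCollector and keyword_in_link are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List


class AnchorTextCollector(HTMLParser):
    """Collect the visible text found inside <a>...</a> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0                  # > 0 while inside an <a> element
        self._buffer: List[str] = []     # text pieces of the current anchor
        self.anchor_texts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Join the pieces and normalise internal whitespace.
                self.anchor_texts.append(" ".join("".join(self._buffer).split()))
                self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def keyword_in_link(html: str, keyword: str) -> bool:
    """Return True if the keyword occurs (case-insensitively) inside any <a> text."""
    collector = AnchorTextCollector()
    collector.feed(html)
    collector.close()
    needle = keyword.lower()
    return any(needle in text.lower() for text in collector.anchor_texts)


page = '<p>Try our <a href="/builder">website builder</a> for your online store.</p>'
linked = keyword_in_link(page, "website builder")
not_linked = keyword_in_link(page, "online store")
```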

Troubleshooting

File Not Found:
- Ensure the input file path is correct and accessible.
- Use absolute paths if running the script from a different directory.
Output File Errors:
- Make sure the --output_file argument includes a valid file name (e.g., results.xlsx).
- Ensure you have write permissions for the specified directory.
Dependency Issues:
- Run pip install -r requirements.txt to ensure all dependencies are installed.
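
The path checks above can also be run before starting a long job. The following sketch assumes a hypothetical helper, prepare_paths, that verifies the input file exists, confirms the output path carries a file extension, and creates the output directory if it is missing. It mirrors the behaviour described in this README but is not copied from the script.

```python
from pathlib import Path


def prepare_paths(input_path: str, output_file: str) -> Path:
    """Validate the input and output paths and return the resolved output path."""
    source = Path(input_path).expanduser().resolve()
    if not source.is_file():
        raise FileNotFoundError(f"Input file not found: {source}")

    target = Path(output_file).expanduser().resolve()
    if not target.suffix:
        # A bare directory or extension-less name is not a valid results file.
        raise ValueError(f"--output_file needs a file name with an extension: {target}")

    # Create the output directory (and any missing parents) if needed.
    target.parent.mkdir(parents=True, exist_ok=True)
    return target
```

Resolving both paths to absolute form up front also sidesteps the relative-path issue that arises when the script is launched from another directory.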

Notes

If using relative paths, ensure the script is run from the directory containing the main.py file.
Use Python 3.8 or higher for compatibility.

Contributing

Contributions are welcome! Feel free to submit issues or pull requests on the repository.

License

This project is licensed under the MIT License.

python main.py \
--input_path="Template2-Sheet72.csv" \
--output_file="data/5_2025_exclude_true.xlsx" \
--input_keywords='["art portfolio" , "website ideas" , "idea for a website" , "website design" , "mobile-friendly design" , "restaurant website" , "website for a restaurant" , "online store" , "website builder"]' \
--title_keywords='["art portfolio" , "website ideas" , "idea for a website" , "website design" , "mobile-friendly design" , "restaurant website" , "website for a restaurant" , "online store" , "website builder"]' \
--exclude_h_and_true=True
