GitHub - Muhammed-Ragab/speeches

Web Scraping Using Python and Selenium:
    Objective: Extract speeches from a website using Selenium for browser automation.
    Process: Automate web browsing, navigate to the website, interact with elements, and extract speech content from the HTML.
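
A minimal sketch of the scraping step, not the notebook's actual code. The URL and CSS selector are passed in by the caller because the README does not name the target site or its markup; `extract_speech_texts` and `scrape_with_chrome` are hypothetical helper names.

```python
def extract_speech_texts(driver, url, css_selector):
    """Load `url` and return the visible text of each element matching `css_selector`."""
    driver.get(url)
    # "css selector" is the string value of Selenium's By.CSS_SELECTOR locator.
    elements = driver.find_elements("css selector", css_selector)
    texts = []
    for element in elements:
        text = element.text.strip()
        if text:  # skip elements with no visible text
            texts.append(text)
    return texts


def scrape_with_chrome(url, css_selector):
    """Run the extraction in a real Chrome session (needs selenium and Chrome installed)."""
    # Imported lazily so extract_speech_texts can be exercised without a browser.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        return extract_speech_texts(driver, url, css_selector)
    finally:
        driver.quit()  # always release the browser, even if extraction fails
```

Keeping the driver as a parameter lets the extraction logic be checked with a stand-in object. Pages that load content dynamically would likely also need an explicit wait before the elements are read.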

Cleaning Data Using Custom Stopwords List:
    Objective: Clean speech text by removing common and irrelevant words.
    Process: Create a custom stopwords list, tokenize the speech content, and filter out stopwords.
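
The cleaning step could look roughly like this. The stopword set below is a hypothetical placeholder, since the README does not list the words the notebook removes, and the regex tokenizer is an assumption rather than the notebook's tokenizer.

```python
import re

# Hypothetical custom list; replace with the words that are irrelevant for your corpus.
CUSTOM_STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "we", "our"}


def tokenize(text):
    """Lowercase the text and split it into word tokens, keeping inner apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def remove_stopwords(text, stopwords=CUSTOM_STOPWORDS):
    """Return the tokens of `text` that are not in the stopword set."""
    return [token for token in tokenize(text) if token not in stopwords]


cleaned = remove_stopwords("We honor the history of our nation.")
print(cleaned)
```

Using a set makes each membership check constant time, which matters when the list grows to several hundred words.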

Extracting Metadata (Title, Date, Place):
    Objective: Extract additional information (title, date, place) from the speech content.
    Process: Use regular expressions to identify and extract title, date, and place details.
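
As an illustration only, the sketch below assumes each speech starts with labelled header lines (`Title:`, `Date:`, `Place:`) and that dates look like `12 March 2015`. The real page layout is not described in the README, so the patterns would need adjusting to the actual text.

```python
import re

# Assumed header layout; each pattern captures the value after its label on one line.
PATTERNS = {
    "title": re.compile(r"^Title:[ \t]*(.+)$", re.MULTILINE),
    "date": re.compile(r"^Date:[ \t]*(\d{1,2} \w+ \d{4})[ \t]*$", re.MULTILINE),
    "place": re.compile(r"^Place:[ \t]*(.+)$", re.MULTILINE),
}


def extract_metadata(text):
    """Return a dict with title, date, and place; a field is None when no match is found."""
    metadata = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        metadata[field] = match.group(1).strip() if match else None
    return metadata


sample = "Title: Address to the Nation\nDate: 12 March 2015\nPlace: Cairo\n\nFellow citizens..."
print(extract_metadata(sample))
```

Returning `None` for a missing field, instead of raising, lets later steps decide how to handle speeches with incomplete headers.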

Categorizing Speeches by Year or Keyword:
    Objective: Organize speeches for analysis, either by year or by specific keywords.
    Process: Group speeches by year when a date is available, and assign tags or labels based on keywords of interest.
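
Both kinds of categorization can be sketched as follows. The record fields (`title`, `year`) and the keyword-to-tag mapping are hypothetical; the README does not specify how the notebook stores speeches or which keywords it uses.

```python
from collections import defaultdict


def group_by_year(speeches):
    """Map each year to the titles of the speeches given that year.

    Speeches without a "year" key are grouped under None.
    """
    groups = defaultdict(list)
    for speech in speeches:
        groups[speech.get("year")].append(speech["title"])
    return dict(groups)


def tag_by_keywords(text, keyword_tags):
    """Return the sorted tags whose keywords appear in `text` (case-insensitive substring match)."""
    lowered = text.lower()
    return sorted(
        tag
        for tag, keywords in keyword_tags.items()
        if any(keyword in lowered for keyword in keywords)
    )


speeches = [
    {"title": "A", "year": 2014},
    {"title": "B", "year": 2015},
    {"title": "C", "year": 2014},
]
tags = {"economy": ["economy", "investment"], "security": ["security", "terrorism"]}
print(group_by_year(speeches))
print(tag_by_keywords("New investment projects will grow the economy.", tags))
```

Substring matching is the simplest option but also matches inside longer words; matching against the tokenized, stopword-filtered text would be stricter.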

Overall Workflow Summary:
    Automate web scraping to download speeches.
    Clean text data by removing stopwords.
    Extract metadata (title, date, place) using regular expressions.
    Categorize speeches by year or keyword for efficient organization and analysis.

Repository files:
    Presidential speeches.ipynb
    README.md
