Muhammed-Ragab/speeches

Web Scraping Using Python and Selenium:
    Objective: Extract speeches from a website using Selenium for browser automation.
    Process: Automate web browsing, navigate to the website, interact with elements, and extract speech content from the HTML.
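
A minimal sketch of the scraping step, assuming each speech lives on its own page and its paragraphs match a CSS selector. The selector `div.speech p`, the function names, and any URLs passed in are hypothetical placeholders, not taken from the repository; adjust them to the actual site.

```python
def fetch_speech_paragraphs(driver, url, selector="div.speech p"):
    """Open `url` and return the non-empty text of every element matching `selector`.

    `driver` is a Selenium WebDriver. The default selector is a hypothetical
    placeholder for wherever the target site keeps the speech body.
    """
    driver.get(url)
    # "css selector" is the string value behind selenium's By.CSS_SELECTOR.
    elements = driver.find_elements("css selector", selector)
    texts = (element.text.strip() for element in elements)
    return [text for text in texts if text]


def scrape_speeches(urls, selector="div.speech p"):
    """Download several speeches with one Chrome session (needs Chrome installed)."""
    # Imported here so the helper above can be exercised without a browser.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        return {
            url: "\n".join(fetch_speech_paragraphs(driver, url, selector))
            for url in urls
        }
    finally:
        # Always close the browser, even if a page fails to load.
        driver.quit()
```

Calling `scrape_speeches([...])` with a list of speech URLs would return a dictionary mapping each URL to its extracted text. Pages that load content with JavaScript may also need an explicit wait before the elements are read.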

Cleaning Data Using Custom Stopwords List:
    Objective: Clean speech text by removing common and irrelevant words.
    Process: Create a custom stopwords list, tokenize the speech content, and filter out stopwords.
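
The cleaning step might look like the following sketch. The stopword set is a short illustrative sample rather than the project's actual list, and the regex tokenizer is an assumption that only handles lowercase ASCII words.

```python
import re

# Illustrative sample only; the project's real custom list is not shown here.
CUSTOM_STOPWORDS = {
    "a", "an", "and", "the", "of", "to", "in", "is", "that",
    "we", "our", "applause",
}

# A word, optionally followed by an apostrophe suffix (e.g. "don't").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def remove_stopwords(text, stopwords=CUSTOM_STOPWORDS):
    """Return the tokens of `text` that are not in `stopwords`."""
    return [token for token in tokenize(text) if token not in stopwords]
```

Keeping the stopwords in a set makes each membership check constant time, which matters once the list and the corpus grow.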

Extracting Metadata (Title, Date, Place):
    Objective: Extract additional information (title, date, place) from the speech content.
    Process: Use regular expressions to identify and extract title, date, and place details.
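
As a sketch of the regex step, the example below assumes each speech begins with labelled lines such as `Title: ...` and `Place: ...`, and that the date is written as day, English month name, year (for example `12 March 2019`). That layout is an assumption made for illustration; the patterns would need adapting to the real text.

```python
import re
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]

# Hypothetical header layout: "Title: ..." and "Place: ..." on their own lines.
TITLE_RE = re.compile(r"^Title:\s*(.+)$", re.MULTILINE)
PLACE_RE = re.compile(r"^Place:\s*(.+)$", re.MULTILINE)
# First date of the form "12 March 2019" anywhere in the text.
DATE_RE = re.compile(r"\b(\d{1,2})\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b")


def _first_group(pattern, text):
    """Return the stripped first capture group, or None if there is no match."""
    match = pattern.search(text)
    return match.group(1).strip() if match else None


def extract_metadata(text):
    """Return a dict with 'title', 'date' and 'place'; missing fields are None."""
    speech_date = None
    match = DATE_RE.search(text)
    if match:
        day, month_name, year = match.groups()
        try:
            speech_date = date(int(year), MONTHS.index(month_name) + 1, int(day))
        except ValueError:
            # Matched the pattern but is not a real calendar date (e.g. 31 February).
            speech_date = None
    return {
        "title": _first_group(TITLE_RE, text),
        "date": speech_date,
        "place": _first_group(PLACE_RE, text),
    }
```

Returning `None` for a missing field, instead of raising, lets later steps decide how to treat speeches with incomplete metadata.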

Categorizing Speeches by Year or by Keyword:
    Objective: Organize speeches for easier analysis, either by year or by specific keywords.
    Process: Group speeches by year when a date is available, and assign tags or labels based on keywords of interest.
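
One way to sketch the categorization step is shown below. It assumes each speech is a dict with a `title`, an optional `date` (a `datetime.date`), and a list of cleaned `tokens`. The tag names, keyword sets, and sample speeches are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical tags and trigger words; replace with the keywords of interest.
KEYWORD_TAGS = {
    "economy": {"economy", "investment", "jobs"},
    "education": {"schools", "students", "universities"},
}


def group_by_year(speeches):
    """Map each year to the titles delivered in it; undated speeches go under 'undated'."""
    groups = defaultdict(list)
    for speech in speeches:
        speech_date = speech.get("date")
        key = speech_date.year if speech_date else "undated"
        groups[key].append(speech["title"])
    return dict(groups)


def tag_by_keywords(tokens, keyword_tags=KEYWORD_TAGS):
    """Return, sorted, every tag whose keyword set shares a word with `tokens`."""
    token_set = set(tokens)
    return sorted(tag for tag, words in keyword_tags.items() if token_set & words)


# Made-up sample data for illustration.
SAMPLE_SPEECHES = [
    {"title": "Budget Address", "date": date(2019, 3, 12),
     "tokens": ["investment", "jobs", "growth"]},
    {"title": "Graduation Remarks", "date": None,
     "tokens": ["students", "future", "jobs"]},
]
```

With the sample data, `group_by_year(SAMPLE_SPEECHES)` puts the first title under `2019` and the second under `"undated"`, and a speech can carry more than one tag when its tokens touch several keyword sets.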

Overall Workflow Summary:
    Automate web scraping to download speeches.
    Clean text data by removing stopwords.
    Extract metadata (title, date, place) using regular expressions.
    Categorize speeches by year or keyword for efficient organization and analysis.
