Web Scraping Using Python and Selenium:
Objective: Extract speeches from a website using Selenium for browser automation.
Process: Automate web browsing, navigate to the website, interact with elements, and extract speech content from the HTML.
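The scraping step above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the URL, the CSS selector `div.speech-body p`, and the helper names `make_driver`, `extract_speech_text`, and `main` are all assumptions chosen for the example, and the real site's markup will differ.

```python
def make_driver():
    """Start a headless Chrome session (needs selenium and a local Chrome)."""
    # Imported lazily so extract_speech_text can be used with any driver-like object.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    driver.implicitly_wait(10)  # wait up to 10 s for elements to appear
    return driver


def extract_speech_text(driver, url, selector="div.speech-body p"):
    """Open `url` and return the text of all elements matching `selector`.

    The default selector is hypothetical; inspect the target page to find
    the element that actually wraps the speech paragraphs.
    """
    driver.get(url)
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    paragraphs = driver.find_elements("css selector", selector)
    texts = (p.text.strip() for p in paragraphs)
    return "\n".join(t for t in texts if t)


def main():
    # Placeholder URL, not the site used by this project.
    driver = make_driver()
    try:
        print(extract_speech_text(driver, "https://example.com/speeches/1"))
    finally:
        driver.quit()
```

Calling `main()` runs the sketch end to end; `driver.quit()` sits in a `finally` block so the browser closes even if extraction fails.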
Cleaning Data Using a Custom Stopwords List:
Objective: Clean speech text by removing common and irrelevant words.
Process: Create a custom stopwords list, tokenize the speech content, and filter out stopwords.
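One possible shape for the cleaning step, using only the standard library. The stopword entries and the names `CUSTOM_STOPWORDS`, `tokenize`, and `remove_stopwords` are illustrative assumptions; the project's own list is not shown in this README.

```python
import re

# Hypothetical custom list: common function words plus transcript noise.
CUSTOM_STOPWORDS = {
    "the", "a", "an", "and", "of", "to", "in", "is", "that",
    "we", "our", "applause",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (keeps contractions)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def remove_stopwords(text, stopwords=CUSTOM_STOPWORDS):
    """Return the tokens of `text` that are not in `stopwords`."""
    return [token for token in tokenize(text) if token not in stopwords]
```

For example, `remove_stopwords("We stand in the future of our nation. (Applause)")` returns `['stand', 'future', 'nation']`.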
Extracting Metadata (Title, Date, Place):
Objective: Extract additional information (title, date, place) from the speech content.
Process: Use regular expressions to identify and extract title, date, and place details.
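A sketch of regex-based metadata extraction. It assumes a layout that may not match the real speeches: the first non-empty line is the title, the date is written like "12 March 2019", and the place follows a "Place:" label. The patterns and the function name `extract_metadata` are hypothetical.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
# Assumed date format: day, English month name, four-digit year.
DATE_RE = re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")
# Assumed place format: a line starting with "Place:".
PLACE_RE = re.compile(r"^Place:\s*(.+)$", re.MULTILINE)


def extract_metadata(text):
    """Return a dict with title, date, and place; missing fields are None."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    title = lines[0] if lines else None

    date_match = DATE_RE.search(text)
    place_match = PLACE_RE.search(text)
    return {
        "title": title,
        "date": date_match.group(0) if date_match else None,
        "place": place_match.group(1).strip() if place_match else None,
    }
```

Returning `None` for a missing field, rather than raising, lets a batch run continue over speeches whose headers do not follow the assumed layout.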
Categorizing Speeches by Year or Keyword:
Objective: Organize speeches for easier analysis, either by year or by specific keywords.
Process: Group speeches by year when a date is available, and assign tags or labels based on keywords of interest.
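The categorization step might look something like this. The record shape (dicts with "title" and "date" keys), the tag vocabulary in `KEYWORD_TAGS`, and both function names are assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical tags mapped to the keywords that trigger them.
KEYWORD_TAGS = {
    "economy": {"economy", "jobs", "investment"},
    "security": {"security", "defense", "terrorism"},
}


def group_by_year(speeches):
    """Group speech titles by the four-digit year found in each date string."""
    groups = defaultdict(list)
    for speech in speeches:
        match = re.search(r"\b(\d{4})\b", speech.get("date") or "")
        year = match.group(1) if match else "unknown"
        groups[year].append(speech["title"])
    return dict(groups)


def tag_by_keywords(text, keyword_tags=KEYWORD_TAGS):
    """Return the sorted tags whose keywords appear in `text`."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(tag for tag, keywords in keyword_tags.items() if words & keywords)
```

Speeches without a recognizable year land in an "unknown" group instead of being dropped, so they can be reviewed by hand later.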
Overall Workflow Summary:
Automate web scraping to download speeches.
Clean text data by removing stopwords.
Extract metadata (title, date, place) using regular expressions.
Categorize speeches by year or keyword for efficient organization and analysis.
Muhammed-Ragab/speeches