Web scraper for consolidating emails from various UT Austin departments. The emails are collected using the Beautiful Soup and Selenium Python libraries.
- Implementing the Selenium Edge Driver
- Extracting emails from all the pages (where it can be done without selenium)
- Figuring out the In-N-Out Selenium navigation method
- Extracting emails from most of the Liberal Arts Directories
- Adding all the emails to the drive
- Figuring out how to automate the email sending process (?)
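For the pages that can be scraped without Selenium, the email-extraction step can be sketched with a plain regular expression over the fetched HTML. This is an illustrative sketch, not the repo's exact code; the pattern and helper name are assumptions:

```python
import re

# Basic email pattern; real directory pages may need extra special-case
# keywords, as the messier faculty pages described below show.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(html: str) -> list[str]:
    """Return unique emails found in a page's HTML, in first-seen order."""
    seen: list[str] = []
    for match in EMAIL_RE.findall(html):
        if match not in seen:
            seen.append(match)
    return seen

page = '<li><a href="mailto:jdoe@austin.utexas.edu">Jane Doe</a></li>'
print(extract_emails(page))  # ['jdoe@austin.utexas.edu']
```

The same function works on text pulled out of a Beautiful Soup tree or out of Selenium's `page_source`, which is why the regex approach covers both paths.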
- Cockrell School of Engineering
- Jackson School of Geosciences
- College of Fine Arts
- Texas School of Law
- School of Nursing
- Steve Hicks School of Social Work
- College of Natural Sciences
- School of Architecture
- School of Information
- School of Education
- College of Pharmacy
- McCombs School of Business
- Dell Medical School
- LBJ School of Public Affairs
- Moody College of Communication
- African Studies
- Air & Space Force Science
- American Studies
- Anthropology
- Asian Studies
- Classics
- Economics
- English
- French and Italian
- Geography & Environment
- Germanic Studies
- Government
- History
- Linguistics
- Mexican American and Latina/o Studies
- Middle Eastern Studies
- Military Science
- Naval Science
- Philosophy
- Psychology
- Religious Studies
- Rhetoric and Writing
- Slavic & Eurasian Studies
- Sociology
- Spanish & Portuguese
- College of Education
- McCombs School of Business
- College of Pharmacy
- Dell Medical School
- LBJ School of Public Affairs
- Moody College of Communication
- School of Architecture
- School of Information
- Cockrell School of Engineering
- College of Fine Arts
- College of Natural Sciences
- Jackson School of Geosciences
- Graduate School Staff
- School of Law
- School of Nursing
- Steve Hicks School of Social Work
- The College of Liberal Arts
The College of Liberal Arts encapsulates multiple departments (Anthropology, History, Linguistics, etc.). Each individual department can be parsed without Selenium; however, visiting each faculty page in a timely manner is best done with Selenium.
The actual code for the faculty pages that REQUIRE Selenium is fairly messy, since there are few uniform naming conventions across directories, so a lot of special keywords (particularly in the form of regular expressions) had to be used in those cases. As for the files: Driver.py is essentially the 'main' file here, handling most of our web driver operations. HTMLParser.py contains a helper object that makes the code easier to follow, while CleanFile.py cleans up the files we've parsed, i.e., removing duplicate emails and clearing empty space.
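As an illustration of the kind of cleanup CleanFile.py handles, here is a minimal sketch; the function name and exact rules are assumptions, not the repo's actual API:

```python
def clean_lines(lines: list[str]) -> list[str]:
    """Drop blank lines and duplicate emails (case-insensitive), keeping
    the first occurrence of each address in its original order."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for line in lines:
        entry = line.strip()
        key = entry.lower()
        if entry and key not in seen:
            seen.add(key)
            cleaned.append(entry)
    return cleaned

raw = ["jdoe@utexas.edu", "", "JDOE@utexas.edu", "asmith@utexas.edu"]
print(clean_lines(raw))  # ['jdoe@utexas.edu', 'asmith@utexas.edu']
```

Deduplicating case-insensitively matters here because directory pages are inconsistent about capitalization, while the mail system treats the addresses as the same.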
- Using Google Takeout to extract emails with the proper tag
- Learning the mbox format's structure
- Writing a parser that streams through the mbox file
- Extract names and emails
- Writing extracted results to a `.csv` file
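The steps above can be sketched with Python's standard library: `mailbox` iterates through the mbox file message by message, `email.utils.parseaddr` splits a From header into name and address, and `csv` writes the results. Paths, the function name, and the column names below are illustrative, not the project's actual interface:

```python
import csv
import mailbox
from email.utils import parseaddr

def mbox_to_csv(mbox_path: str, csv_path: str) -> int:
    """Walk an mbox file and write one (name, email) row per message.

    Returns the number of rows written (header excluded).
    """
    count = 0
    box = mailbox.mbox(mbox_path)
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["name", "email"])
        for message in box:
            # parseaddr("Jane Doe <jdoe@utexas.edu>") -> ("Jane Doe", "jdoe@utexas.edu")
            name, addr = parseaddr(message.get("From", ""))
            if addr:
                writer.writerow([name, addr])
                count += 1
    return count
```

Note that `mailbox.mbox` reads messages on demand rather than loading the whole archive, which keeps memory use manageable for a large Takeout export.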