Supporting Subsectors #1

Open
atxtechbro opened this issue Aug 12, 2021 · 0 comments
Labels
good first issue Good for newcomers

Comments

@atxtechbro
Owner

As mentioned in the README, there are certain high-population sectors such as "10001": "Accounting & Legal" which have in excess of 10,000 companies.

We don't want the scraper to miss any companies. We therefore need a variable URL structure, defined so that no single query ever has to return more than 10,000 records.

https://www.glassdoor.com/seo/ajax/ugcSearch.htm?minRating=0&maxRating=5&numPerPage=100&pageRequested=99&domain=glassdoor.com&surgeHiring=false&sectorIds=10014
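To make the URL structure easier to reason about, here is a minimal sketch that splits the example URL above into its query parameters using only Python's standard library. The helper name `query_params` is made up for illustration and is not part of the existing scraper.

```python
from urllib.parse import parse_qsl, urlsplit

# The example URL from this issue, unchanged.
EXAMPLE_URL = (
    "https://www.glassdoor.com/seo/ajax/ugcSearch.htm"
    "?minRating=0&maxRating=5&numPerPage=100&pageRequested=99"
    "&domain=glassdoor.com&surgeHiring=false&sectorIds=10014"
)


def query_params(url):
    """Return the query-string parameters of `url` as a dict of strings."""
    # Hypothetical helper: parse_qsl yields (key, value) pairs in URL order.
    return dict(parse_qsl(urlsplit(url).query))


params = query_params(EXAMPLE_URL)
for key, value in params.items():
    print(f"{key} = {value}")
```

All values come back as strings, so `numPerPage` and `pageRequested` need an `int()` conversion before any arithmetic.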

I think filtering by industry ID is the most robust way to achieve this. If we can find the key-value pairs for all the industry IDs (a.k.a. subsectors) on Glassdoor, we can append an industry ID parameter to the URL above and add code to increment it and reset it when needed.
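A rough sketch of how the increment-and-reset idea could look, not a tested implementation. Assumptions: the endpoint accepts `industryIds` alongside `sectorIds` (the name is taken from the parameter list in this issue, but the combination is unverified), page numbering starts at 1, and the function names and sample industry IDs are placeholders rather than real Glassdoor values.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.glassdoor.com/seo/ajax/ugcSearch.htm"
MAX_RECORDS = 10_000   # per-query limit discussed in this issue
NUM_PER_PAGE = 100     # same page size as the example URL


def build_url(sector_id, industry_id, page):
    """Build one search URL for a sector, industry (subsector) and page."""
    params = {
        "minRating": 0,
        "maxRating": 5,
        "numPerPage": NUM_PER_PAGE,
        "pageRequested": page,
        "domain": "glassdoor.com",
        "surgeHiring": "false",
        "sectorIds": sector_id,
        "industryIds": industry_id,  # assumption: accepted by this endpoint
    }
    return f"{BASE_URL}?{urlencode(params)}"


def iter_urls(sector_id, industry_ids, pages_per_industry):
    """Yield URLs industry by industry, restarting the page count each time."""
    # Never request more pages than fit under the record limit.
    max_pages = MAX_RECORDS // NUM_PER_PAGE
    for industry_id in industry_ids:
        # The page counter resets to 1 for every new industry ID.
        for page in range(1, min(pages_per_industry, max_pages) + 1):
            yield build_url(sector_id, industry_id, page)


# Placeholder industry IDs (1 and 2), purely for illustration.
for url in iter_urls(10001, [1, 2], pages_per_industry=2):
    print(url)
```

In a real run, `pages_per_industry` would come from the result count of the first response for each industry rather than being a fixed number.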

If anyone wants to help, please reach out to me so I can help smooth the process. This shouldn't be too hard; I'm marking it as #firstissuefriendly to give another developer the chance to contribute.

I am about to start a new project but will come back to accomplish this eventually.

numPerPage
maxRating
sectorIds
locationId
industryIds

@atxtechbro atxtechbro added the good first issue Good for newcomers label Aug 12, 2021