As mentioned in the README, certain high-population sectors, such as "10001": "Accounting & Legal", contain more than 10,000 companies.
We want every company to be collected by our scraper, so we need to build the URL dynamically and define it so that no single query ever returns more than 10,000 records.
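One way to state that constraint in code: a query bucket (a whole sector, or a finer subdivision of it) is safe only if its total company count fits within the cap. This is a minimal sketch that assumes we already know the per-bucket counts. The function names and the count inputs are hypothetical, and the 100-per-page figure is taken from `numPerPage=100` in the example URL.

```python
import math

# Assumed cap: no single query should cover more than 10,000 records.
MAX_RECORDS = 10_000
PER_PAGE = 100  # matches numPerPage=100 in the example URL


def pages_needed(total: int, per_page: int = PER_PAGE) -> int:
    """Number of pages required to page through `total` records."""
    return math.ceil(total / per_page)


def plan_buckets(sector_id: str, sector_total: int,
                 industry_totals: dict[str, int]) -> list[tuple[str, str, int]]:
    """Return (param_name, id, pages) tuples describing what to query.

    If the whole sector fits under the cap, query it directly; otherwise
    fall back to one bucket per industry (hypothetical counts).
    """
    if sector_total <= MAX_RECORDS:
        return [("sectorIds", sector_id, pages_needed(sector_total))]
    plan = []
    for industry_id, total in industry_totals.items():
        if total > MAX_RECORDS:
            # Still too large: this bucket would need a further split.
            raise ValueError(f"industry {industry_id} exceeds the cap: {total}")
        plan.append(("industryIds", industry_id, pages_needed(total)))
    return plan
```

Under these assumptions, a 10,000-company bucket maps to exactly 100 pages of 100 results. Anything larger has to be split before the scraper requests it.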
https://www.glassdoor.com/seo/ajax/ugcSearch.htm?minRating=0&maxRating=5&numPerPage=100&pageRequested=99&domain=glassdoor.com&surgeHiring=false&sectorIds=10014
I think filtering by industryID would be the most robust way to achieve this. If we can find the key/value pairs of all the industryIDs (a.k.a. subsectors) on Glassdoor, we can simply append the ID to the URL above and add code to increment it and reset it when needed.
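A minimal sketch of that increment-and-reset loop follows. It assumes the endpoint accepts an `industryIds` query parameter in place of `sectorIds` and that pages are numbered from 1. The parameter name comes from the list of query parameters in this issue, and the sketch is untested against Glassdoor. The industry IDs passed in are placeholders until the real key/value pairs are found, and `build_url`/`iter_urls` are hypothetical helpers.

```python
from typing import Iterable, Iterator
from urllib.parse import urlencode

BASE_URL = "https://www.glassdoor.com/seo/ajax/ugcSearch.htm"


def build_url(industry_id: str, page: int) -> str:
    """Build one search URL; `industryIds` is an assumed parameter."""
    params = {
        "minRating": 0,
        "maxRating": 5,
        "numPerPage": 100,
        "pageRequested": page,
        "domain": "glassdoor.com",
        "surgeHiring": "false",
        "industryIds": industry_id,  # assumption: replaces sectorIds
    }
    return f"{BASE_URL}?{urlencode(params)}"


def iter_urls(industry_ids: Iterable[str], max_pages: int = 100) -> Iterator[str]:
    """Yield URLs industry by industry, resetting the page counter each time."""
    for industry_id in industry_ids:
        # The page counter restarts at 1 for every new industry ID.
        for page in range(1, max_pages + 1):
            yield build_url(industry_id, page)
```

In practice the scraper would likely stop paging an industry as soon as a page comes back empty, rather than always requesting `max_pages` pages.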
If anyone wants to help, please feel free to reach out to me so I can help smooth the process. This shouldn't be too bad; I just wanted to mark this as #firstissuefriendly to give another developer the chance to contribute.
I am about to start a new project but will come back to this eventually.
Relevant query parameters:

- `numPerPage`
- `maxRating`
- `sectorIds`
- `locationId`
- `industryIds`