Parse cache data from a different page #29
Do you have any clue how to retrieve the …

The log page provides a link to the listing using the …

It is also possible to fetch the GUID using the …
The question is whether it will be faster to make two lightweight requests or one heavy one. I would suggest adding GUID parsing to the …
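The two-request idea above hinges on first scraping a GUID out of a page and then using it to load the lightweight "print page". A minimal sketch of the first step, assuming a GUID-bearing link in the HTML (the markup below is an invented sample for illustration, not the actual geocaching.com page structure):

```python
import re

# Matches a geocaching.com-style GUID (lowercase hex, 8-4-4-4-12 groups)
# inside a "guid=" query parameter.
GUID_RE = re.compile(
    r"guid=([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)

def parse_guid(html):
    """Return the first GUID found in the page, or None if absent."""
    match = GUID_RE.search(html)
    return match.group(1) if match else None

# Invented snippet resembling a log-page link back to the cache listing:
sample = '<a href="/seek/cache_details.aspx?guid=182a3463-e46e-4401-8697-3ad3ac2a1a42">Listing</a>'
print(parse_guid(sample))  # 182a3463-e46e-4401-8697-3ad3ac2a1a42
```

In practice the real parser would run against the fetched log or listing page; the regex approach just avoids pulling in a full HTML parser for a single attribute.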
That sounds reasonable to me. I would like to give it a try and will report on the performance comparison as soon as I have a first implementation.
So here are some numbers... I used the timeit module to profile the call to … Scenario 1 (…) … So it seems that two lightweight calls are faster than one heavy one, although it's not a factor of 2. Should we go for scenario 2 and rely on two requests, or stick with scenario 1 and a single request?
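The benchmark details above were lost in the scrape, but the shape of a `timeit` comparison like the one described would look roughly like this. The two functions here are placeholders standing in for the real network-bound calls (one heavy page load versus two light ones):

```python
import timeit

def one_heavy_request():
    # Placeholder for a single large page fetch + parse.
    return sum(range(2000))

def two_light_requests():
    # Placeholder for two small fetches (log page, then print page).
    return sum(range(500)) + sum(range(500))

# Each strategy is repeated many times and the total wall time reported.
heavy = timeit.timeit(one_heavy_request, number=1000)
light = timeit.timeit(two_light_requests, number=1000)
print(f"heavy: {heavy:.4f}s  light: {light:.4f}s")
```

For real HTTP calls a much smaller `number` would be used, and per-request latency (not CPU time) would dominate the result.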
Nice! I think we should do some refactoring before replacing the original … But for now, the best you can do is create a pull request for a separate … Then I would do the refactoring on my own, because it may be a little more complex.
Following discussion with @twlare from #75: First of all, there are some gotchas regarding the completeness of loaded attributes. The mentioned refactoring may help, as it would let end-users control which attributes matter to them, so missing ones wouldn't be a problem. It may also be worth checking whether anything has changed on the cache "print page" (new or removed attributes). Please feel free to continue working on #74, but it will need rebasing onto the current master. So in summary, what is left to do: check the status of the code (does it still work? any new/removed attributes?), rebase, possibly refactor, and, most importantly, switch the primary algorithm behind …
Resurrecting the thread with an email received from Dave: I have one possible suggestion: when I have scraped in the past, I have found that the most reliable way to get the cache info is to request the GPX file, which can be obtained through a very simple POST request.

I don't know if it works for non-premium members, but it is very fast and contains almost everything you want, including (usually) about 10 logs. It could speed up the get_cache() function a lot.
Use this URL: http://www.geocaching.com/seek/cdpf.aspx?guid=182a3463-e46e-4401-8697-3ad3ac2a1a42&lc=10 to parse geocache data (a possible 2x speedup).
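Given a GUID, the print-page URL from this issue can be assembled with a small helper. This is a sketch based only on the URL above; judging from it, `lc` appears to limit the number of logs included, but that interpretation is an assumption:

```python
from urllib.parse import urlencode

PRINT_PAGE = "http://www.geocaching.com/seek/cdpf.aspx"

def print_page_url(guid, log_count=10):
    """Build the cache "print page" URL; lc seems to cap the log count."""
    return PRINT_PAGE + "?" + urlencode({"guid": guid, "lc": log_count})

url = print_page_url("182a3463-e46e-4401-8697-3ad3ac2a1a42")
print(url)
# http://www.geocaching.com/seek/cdpf.aspx?guid=182a3463-e46e-4401-8697-3ad3ac2a1a42&lc=10
```

The resulting page would then be fetched and parsed in place of (or in addition to) the full listing, which is where the claimed ~2x speedup would come from.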