
Parse cache data from a different page #29

Open
tomasbedrich opened this issue Jul 22, 2015 · 8 comments

@tomasbedrich
Owner

Use this URL to parse geocache data (possible 2× speedup): http://www.geocaching.com/seek/cdpf.aspx?guid=182a3463-e46e-4401-8697-3ad3ac2a1a42&lc=10
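A minimal sketch of fetching and parsing that print page (requests + BeautifulSoup; in practice an authenticated session is probably required, and the selector below is an illustrative assumption, not taken from the live page):

import requests
from bs4 import BeautifulSoup

# lc= controls how many logs the print page includes
response = requests.get(
    "http://www.geocaching.com/seek/cdpf.aspx",
    params={"guid": "182a3463-e46e-4401-8697-3ad3ac2a1a42", "lc": 10},
)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")

# Illustrative only -- the real selectors have to be read off the page source.
title = soup.find("h2")
print(title.get_text(strip=True) if title else "cache title element not found")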

@weinshec
Collaborator

weinshec commented Sep 1, 2016

Do you have any clue how to retrieve the GUID of a cache in the first place, without loading the usual details page? It wouldn't make sense if one had to parse the details page first 😆

@FriedrichFroebel
Collaborator

The log page provides a link to the listing which contains the GUID of the cache: https://www.geocaching.com/seek/log.aspx?wp=GC3RPVZ
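A minimal sketch of pulling the GUID out of that log page (regex-based, assuming the listing link still carries a guid= query parameter; the page most likely needs an authenticated session):

import re
import requests

def guid_from_log_page(session, wp):
    # The log page links back to the listing via a guid= parameter.
    response = session.get(
        "https://www.geocaching.com/seek/log.aspx", params={"wp": wp}
    )
    response.raise_for_status()
    match = re.search(
        r"guid=([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})",
        response.text,
        re.IGNORECASE,
    )
    return match.group(1) if match else None

print(guid_from_log_page(requests.Session(), "GC3RPVZ"))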

@tomasbedrich
Owner Author

It is also possible to fetch the GUID using the load_quick() method:

GET https://tiles01.geocaching.com/map.details?i=GC4HTZW
{  
   "status":"success",
   "data":[  
      {  
         "name":"Zmijovec (Amorphophallus)",
         "gc":"GC3RPVZ",
         "g":"182a3463-e46e-4401-8697-3ad3ac2a1a42",
         "available":true,
         "archived":false,
         "subrOnly":false,
         "li":false,
         "fp":"115",
         "difficulty":{  
            "text":4.5,
            "value":"4_5"
         },
         "terrain":{  
            "text":1.0,
            "value":"1"
         },
         "hidden":"11/18/2012",
         "container":{  
            "text":"Regular",
            "value":"regular.gif"
         },
         "type":{  
            "text":"Traditional Cache",
            "value":2
         },
         "owner":{  
            "text":"Lindbergh007",
            "value":"850af23d-fc83-4d3c-b93a-9e5d6ae359c9"
         }
      }
   ]
}
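For illustration, extracting the GUID from that response could look like this (plain requests here; inside pycaching the call would go through the logged-in session, and the endpoint may expect the usual browser headers):

import requests

def guid_from_map_details(gc_code):
    response = requests.get(
        "https://tiles01.geocaching.com/map.details", params={"i": gc_code}
    )
    response.raise_for_status()
    payload = response.json()
    if payload.get("status") != "success" or not payload.get("data"):
        raise ValueError("unexpected map.details response")
    # The "g" field of the first data entry is the cache GUID.
    return payload["data"][0]["g"]

print(guid_from_map_details("GC3RPVZ"))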

The question is whether it will be faster to make two lightweight requests or one heavy one.

I would suggest adding GUID parsing to load_quick() and creating a temporary load_by_guid() method which would populate some basic Cache info using the print page. Then we could measure which is faster.
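A rough sketch of how such a temporary method could hang together (method and helper names here are assumptions based on this thread, not the actual pycaching API):

class Cache:
    # ...existing pycaching Cache attributes and methods...

    def load_by_guid(self):
        # Make sure the GUID is known; load_quick() can provide it cheaply.
        if not getattr(self, "guid", None):
            self.load_quick()
        # Fetch the lightweight print page instead of the full details page.
        # _fetch_print_page() is a placeholder for whatever request helper
        # the implementation ends up using.
        soup = self._fetch_print_page(self.guid, logs=10)
        # Populate basic attributes (selector illustrative only).
        self.name = soup.find("h2").get_text(strip=True)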

@weinshec
Collaborator

weinshec commented Sep 4, 2016

That sounds reasonable to me. I would like to give it a try and will report on the performance comparison as soon as I have a first implementation.

@weinshec
Collaborator

weinshec commented Sep 9, 2016

So here are some numbers... I used the timeit module to profile a call to Cache.load() and compared it with a call to Cache.load_quick() parsing the GUID, followed by Cache.load_by_guid() requesting the print page. The results are mean times over 100 calls each.

Scenario 1 (load()): 1.28 seconds
Scenario 2 (load_quick() + load_by_guid()): 0.85 seconds

So it seems that two lightweight calls are faster than one heavy one, although not by a factor of 2. Should we go for scenario 2 and rely on two requests, or stick with scenario 1 and a single request?
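For reference, a minimal sketch of the kind of timeit comparison described above (assumes a logged-in pycaching session; load_by_guid() is the method proposed in this thread, not an existing API):

import timeit
import pycaching

geocaching = pycaching.login("username", "password")

def scenario_1():
    geocaching.get_cache("GC3RPVZ").load()

def scenario_2():
    cache = geocaching.get_cache("GC3RPVZ")
    cache.load_quick()
    cache.load_by_guid()

for label, fn in [("load()", scenario_1),
                  ("load_quick() + load_by_guid()", scenario_2)]:
    mean = timeit.timeit(fn, number=100) / 100
    print("{}: {:.2f} s".format(label, mean))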

@tomasbedrich
Owner Author

tomasbedrich commented Sep 9, 2016

Nice! I think we should do some refactoring before replacing the original load() method. The big picture is to have multiple load_by_xxx() methods which would actually fill the data, and a lightweight load() method which would decide which one to use.
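Roughly something like this (a sketch only; the method names besides load() and load_by_guid() are illustrative):

class Cache:
    def load(self):
        # Lightweight dispatcher: pick the cheapest loader that can
        # fill the requested attributes.
        if getattr(self, "guid", None):
            self.load_by_guid()       # print page, cheap
        else:
            self.load_by_details()    # classic details page, heavier

    def load_by_guid(self):
        ...  # scrape the print page (cdpf.aspx)

    def load_by_details(self):
        ...  # scrape the regular cache details page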

But for now, the best you can do is to create a pull request for a separate load_by_guid() method which would first check for the presence of a GUID (possibly calling load_quick() if needed) and then scrape as many cache details as possible.

Then I would do the refactoring on my own, because it may be a little more complex.

@tomasbedrich
Owner Author

Following discussion with @twlare from #75:

First of all, there are some gotchas regarding the completeness of loaded attributes. The mentioned refactoring may help, as it would allow end users to control which attributes matter to them, so missing ones wouldn't hurt. It may also be good to check whether anything has changed on the cache "print page" (new/removed attributes there).

Please feel free to continue working on #74, but it will need some rebasing onto the current master.

So in summary, what is left to do: check the status of the code (does it still work? any new/removed attributes?), rebase, maybe refactor, and, most importantly, switch the primary algorithm behind the load() method (which must stay backwards-compatible: the same API and the same set of loaded attributes).

@tomasbedrich
Owner Author

Resurrecting the thread with an email received from Dave:


I have one possible suggestion: when I have scraped in the past I have found that the most reliable way to get the cache info is to request the gpx file, which can be obtained through a very simple POST request.

s = requests.Session()
cookies = [s.cookies.set(c['name'], c['value']) for c in request_cookies_browser]
URL = 'https://www.geocaching.com/geocache/GC9EFK2_colorful-pairs'
params = {'__EVENTTARGET': '', '__EVENTARGUMENT': '',
          'ctl00$ContentBody$lnkGPXDownload': 'GPX+file'}
response = s.post(URL, params)
GPX = response.text

I don't know if it works for non-premium members, but it is very fast and contains almost everything you want, including (usually) about 10 logs. It could speed up the get_cache() function a lot.
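If the GPX route were taken, reading basic fields out of the returned document needs only the standard library; a minimal sketch (the namespace URIs are assumptions based on typical geocaching.com GPX 1.0 exports):

import xml.etree.ElementTree as ET

NS = {
    "gpx": "http://www.topografix.com/GPX/1/0",
    "groundspeak": "http://www.groundspeak.com/cache/1/0/1",
}

root = ET.fromstring(GPX)  # GPX text from the snippet above
for wpt in root.findall("gpx:wpt", NS):
    code = wpt.findtext("gpx:name", namespaces=NS)  # GC code
    cache = wpt.find("groundspeak:cache", NS)
    name = cache.findtext("groundspeak:name", namespaces=NS) if cache is not None else None
    print(code, name, wpt.get("lat"), wpt.get("lon"))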
