add howto for developers of new parser #2

azrdev · 2013-08-11T22:12:12Z

I built a parser similar to the ones in this repo, and using your pyopenmensa. How would I (or anyone else) continue now, send a pull request and wait? If a new parser appears here, how will its output get into the openmensa database? (Yes, that last question is not really your concern, but I didn't find any answer to it, and I suppose you did)

Might be a good idea to also write this up in the readme for anybody else :-)

mswart · 2013-08-26T18:41:01Z

Sorry for the late response; I had hoped to find the time to add the needed documentation to http://openmensa.org and http://doc.openmensa.org before answering you. A short version for now:

OpenMensa relies on parsers maintained by the community to provide the canteen menu data. The data is fetched via HTTP from specific URLs that must return a standardized XML structure (http://doc.openmensa.org/feed/v2/ - the example is better than the textual description). It does not matter how this XML is created. I used Python for this task, but any other language works as well.
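Since it does not matter how the XML is produced, a feed can be built with nothing but the standard library. The following is a minimal sketch, not the way this repository does it: `build_feed` and its input layout are made up for illustration, and the element names (`canteen`, `day`, `category`, `meal`, `name`, `price`, `closed`) and the namespace reflect my reading of the v2 feed example, so check them against http://doc.openmensa.org/feed/v2/ before relying on them.

```python
import xml.etree.ElementTree as ET

# Namespace as I understand it from the v2 feed documentation (assumption).
NS = "http://openmensa.org/open-mensa-v2"


def build_feed(days):
    """Build a feed string.

    `days` maps an ISO date to {category name: [(meal name, {role: price})]}.
    An empty mapping for a date marks that day as closed.
    """
    root = ET.Element("openmensa", {"version": "2.0", "xmlns": NS})
    canteen = ET.SubElement(root, "canteen")
    for date, categories in sorted(days.items()):
        day = ET.SubElement(canteen, "day", {"date": date})
        if not categories:
            ET.SubElement(day, "closed")
            continue
        for category_name, meals in categories.items():
            category = ET.SubElement(day, "category", {"name": category_name})
            for meal_name, prices in meals:
                meal = ET.SubElement(category, "meal")
                ET.SubElement(meal, "name").text = meal_name
                for role, price in prices.items():
                    # Prices are written with two decimals, e.g. 1.80
                    ET.SubElement(meal, "price", {"role": role}).text = f"{price:.2f}"
    return ET.tostring(root, encoding="unicode")


feed = build_feed({
    "2013-08-26": {"Hauptgericht": [("Nudeln mit Tomatensauce", {"student": 1.8})]},
    "2013-08-27": {},  # closed day
})
```

The resulting string is what the main URL would have to return; pyopenmensa wraps this kind of boilerplate for the parsers in this repository.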

If you provide an email address in your OpenMensa profile, you get developer status. This allows you to add new parsers ("Meine Mensa" menu item in your profile). Adding a parser means entering some meta information (name, city, address and the URL the data should be fetched from).

One comment on the URL: a parser needs a main URL that provides all data. At 8 am OpenMensa merges the new data with the data from the previous days. To handle menu changes on the day itself, a parser can specify an additional today URL. After the main URL has been merged, this today URL is fetched every hour to provide updates for the current day (hence "today URL"). A short example day and the requests from OpenMensa:

08:00: http://example.org/data.xml -> OpenMensa requests all data
09:00: http://example.org/today.xml -> updates for the current day
10:00: http://example.org/today.xml -> updates for the current day
11:00: http://example.org/today.xml -> updates for the current day
12:00: http://example.org/today.xml -> updates for the current day
13:00: http://example.org/today.xml -> updates for the current day
14:00: http://example.org/today.xml -> updates for the current day
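The same schedule can be written down as a small function. This is only a sketch of the timetable above: `daily_requests` is a made-up helper, the 8 am start and 14:00 end are taken from the example day, and it assumes the main fetch succeeds on the first attempt.

```python
def daily_requests(main_url, today_url=None, first_hour=8, last_hour=14):
    """Return the (hour, url) requests OpenMensa would make on one day.

    Assumes the main fetch succeeds at `first_hour`; the hour bounds come
    from the example schedule and are not an official specification.
    """
    requests = [(first_hour, main_url)]
    if today_url is not None:
        # One update request per hour after the main fetch.
        requests += [(hour, today_url) for hour in range(first_hour + 1, last_hour + 1)]
    return requests


for hour, url in daily_requests("http://example.org/data.xml",
                                "http://example.org/today.xml"):
    print(f"{hour:02d}:00: {url}")
```

Leaving out `today_url` yields only the single 08:00 request for the main URL.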

As a developer of OpenMensa I wanted to fill the database with parsers for some major cities. I develop these parsers in this repository. They are served at http://omfeeds.devtation.de/$CITY_NAME/$CANTEEN_NAME.xml
I had not added a README because merging all parsers was not the main purpose of this repository. More importantly, I wanted to document my parsers and use the issue tracker. To reduce the redundancy between the parsers I created a Python library (pyopenmensa - https://github.com/mswart/pyopenmensa) that handles the common task of creating the XML feed.

azrdev · 2013-08-26T23:34:53Z

First, thanks for your feedback! I updated the PR accordingly, and would be very glad if you integrated it into your collection of parsers already hosted, as you proposed.
Be aware that I have to wait for an explicitly "closed" day before I can handle the HTML they provide in such a case, so the parser might break the first time that occurs on the webpage. Just drop me a note if that happens and I miss it.

Second, you've done a good write-up above. Until you consider it done and publish it, I think the most important information for potential contributors is:

Additional parsers probably need to run somewhere else (than the OpenMensa infrastructure). If you create an account at openmensa.org and supply an email address, you can enter the URL of your parser's output, which will then be added to the database.

Third, a note on your API: If I hosted a parser myself, the today URL would be optional, but as I'm integrating with your code, it isn't, correct?
Please add to your documentation what will happen if no today-URL is specified: will the main URL be called every hour, or only daily?
Then, am I right there is no way for the parser to supply the name of the canteens? As you pointed out, the 'canteens' dict in config.py only contains url components, and as the Feed Builder is missing it, there seems to be no facility to supply a readable name of either the whole mensa or its canteens.

Last, a style note: I'm quite new to python, so perhaps this is idiomatic, but I found your frequent use of **catchall parameters quite confusing, especially in the last line of config.py

So long

mswart · 2013-08-27T20:44:02Z

If no today URL is provided, OpenMensa stops fetching after the first successful round (usually right after 8 am). But the today URL can be the same as the normal URL, and in this case OpenMensa queries the URL hourly.

The following algorithm illustrates the OpenMensa behavior:

from datetime import datetime
from time import sleep

# fetch_main_url(), fetch_today_url() and today_url_defined() stand in for OpenMensa internals
def fetch_single_canteen():  # called for every canteen at midnight
    sleep(8 * 3600)  # wait until 8 am before the first fetch
    # every day starts with the main fetch; retry hourly on errors:
    fetched = False
    while not fetched:
        fetched = fetch_main_url()
        sleep(3600)
    # if a today URL is provided, look hourly for updates:
    while today_url_defined() and datetime.now().hour < 15:
        fetch_today_url()
        sleep(3600)

The URL handling code in this repo (wsgihandler.py) supports a normal URL and a today URL for every canteen. In both cases the parse function of the referenced parser is called with a flag indicating which URL was used. So a parser can behave differently in the two cases (but does not have to). Additionally, there is no need to specify the today URL in OpenMensa - in this case OpenMensa does not know that the parser supports today URLs and does not use them.
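To illustrate the idea of one parse function serving both URLs, here is a minimal WSGI sketch. It is not the code from wsgihandler.py: the `CANTEENS` registry, the `parse_example` function, the `today` keyword and the URL layout (`/<canteen>.xml` and `/<canteen>/today.xml`) are all assumptions made for this example.

```python
def parse_example(url, today=False):
    """Hypothetical parser: returns feed XML; `today` tells which URL was requested."""
    scope = "today only" if today else "all days"
    return f"<!-- feed for {url}, {scope} -->"


# Hypothetical registry: canteen name -> (parse function, source URL)
CANTEENS = {"example": (parse_example, "http://example.org/menu")}


def application(environ, start_response):
    """Dispatch /<canteen>.xml (main URL) and /<canteen>/today.xml (today URL)."""
    path = environ.get("PATH_INFO", "").strip("/")
    if path.endswith("/today.xml"):
        name, today = path[:-len("/today.xml")], True
    elif path.endswith(".xml"):
        name, today = path[:-len(".xml")], False
    else:
        name, today = None, False
    if name not in CANTEENS:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown canteen"]
    parse, url = CANTEENS[name]
    # The same parse function handles both URLs; only the flag differs.
    body = parse(url, today=today).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/xml; charset=utf-8")])
    return [body]
```

A parser that ignores the flag simply returns the full feed for both URLs, which matches the "can behave differently, but does not have to" rule.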

At the moment there is no easy way to get a canteen identifier. I wanted to specify the differences between the canteens inside config.py. Do you need this possibility?

The **catchall usage is not the most intuitive Python feature, but it allows more generic parameter handling while keeping Python's built-in parameter checks, such as errors on unknown additional parameters, intact. In config.py it is needed to be able to pass arbitrary parameters to the parse functions (I think I need or needed every variant for at least one parser).
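A small sketch of that pattern, with made-up names (`parse_canteen`, `register` and the options are not taken from config.py): the registering code forwards whatever keyword arguments it receives, and Python itself still rejects a parameter the parse function does not know.

```python
def parse_canteen(url, today=False, *, encoding="utf-8", skip_weekends=True):
    """Hypothetical parse function with parser-specific options."""
    return {"url": url, "today": today,
            "encoding": encoding, "skip_weekends": skip_weekends}


def register(parse, url, **options):
    """Bind arbitrary keyword options to a parse function without inspecting them."""
    def run(today=False):
        # The options are passed through unchanged; the parse function's own
        # signature decides which names are valid.
        return parse(url, today=today, **options)
    return run


mensa = register(parse_canteen, "http://example.org/menu", encoding="latin-1")
print(mensa(today=True))

broken = register(parse_canteen, "http://example.org/menu", encodng="latin-1")
try:
    broken()
except TypeError as error:
    # The misspelled option is reported by Python's normal argument checking.
    print("rejected:", error)
```

The generic `register` never needs to know which options a particular parser accepts, which is the point of the catch-all parameters.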

mswart · 2014-09-25T16:59:35Z

@azrdev I have extended the README. I am interested in your feedback - personally, I am unsure whether all the needed information is included and what the best structure is.

azrdev · 2014-09-25T17:20:31Z

@mswart It's long ago, so I don't remember every detail… didn't look at it in the last months. :-)
But looks good!
IMHO you should add a word on how the provider-canteen relationship is exported to the website and the app, i.e. whether people searching for $city would get a list grouped by provider, or only a flat list of all canteens (in which case their names should include the city name again, which might duplicate the provider name) - I think that's one point where I was confused.

You might want to do a spell check, too ;-)

klemens · 2019-11-13T21:33:13Z

I guess we can close this ticket. 😉

mswart added a commit that referenced this issue Sep 25, 2014

Extend README / documentation - #2

b431ffb

j-maas added the enhancement label Jan 10, 2018

klemens closed this as completed Nov 13, 2019
