add howto for developers of new parser #2

Closed
azrdev opened this issue Aug 11, 2013 · 6 comments

Collaborator

azrdev commented Aug 11, 2013

I built a parser similar to the ones in this repo, and using your pyopenmensa. How would I (or anyone else) continue now, send a pull request and wait? If a new parser appears here, how will its output get into the openmensa database? (Yes, that last question is not really your concern, but I didn't find any answer to it, and I suppose you did)

Might be a good idea to also write this up in the readme for anybody else :-)

Owner

mswart commented Aug 26, 2013

Sorry for the late response; I had hoped to find the time to add the needed documentation to http://openmensa.org and http://doc.openmensa.org before answering you. A short version for now:

OpenMensa relies on parsers maintained by the community to provide the canteen menu data. The data is fetched via HTTP from specific URLs that must return a standardized XML structure (http://doc.openmensa.org/feed/v2/ - the example there is better than the textual description). It does not matter how this XML is created. I used Python for this task, but any other language works just as well.
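
To make the feed format a bit more concrete, here is a minimal sketch that builds such a document with nothing but the Python standard library. The element names (`openmensa`, `canteen`, `day`, `category`, `meal`, `name`, `price`, `closed`) and the namespace are assumptions based on the v2 example linked above and should be checked against it. The function `build_feed`, its input shape and the sample menu are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the v2 feed format; verify against the linked documentation.
NS = "http://openmensa.org/open-mensa-v2"


def build_feed(days):
    """Build a feed from {date: None | {category: [(meal_name, {role: price})]}}.

    A value of None marks the day as closed.
    """
    root = ET.Element("openmensa", {"version": "2.0", "xmlns": NS})
    canteen = ET.SubElement(root, "canteen")
    for date, categories in sorted(days.items()):
        day = ET.SubElement(canteen, "day", {"date": date})
        if categories is None:
            ET.SubElement(day, "closed")
            continue
        for category_name, meals in categories.items():
            category = ET.SubElement(day, "category", {"name": category_name})
            for meal_name, prices in meals:
                meal = ET.SubElement(category, "meal")
                ET.SubElement(meal, "name").text = meal_name
                for role, price in prices.items():
                    # Prices are written with two decimals, e.g. "2.50".
                    ET.SubElement(meal, "price", {"role": role}).text = f"{price:.2f}"
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data: one regular day and one closed day.
feed = build_feed({
    "2013-08-26": {"Hauptgerichte": [("Nudeln mit Tomatensauce", {"student": 2.5})]},
    "2013-08-27": None,
})
print(feed)
```

Whatever produces the XML, the only contract with OpenMensa is the document served at the URL, so a sketch like this could sit behind any web server or be written to a static file.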

If you provide an email address in your OpenMensa profile, you get developer status. This allows you to add new parsers (the "Meine Mensa" menu item in your profile). Adding a parser means entering some meta information (name, city, address and the URL the data should be fetched from).

One comment on the URL: a parser needs a main URL that provides all data. At 8 am OpenMensa merges the new data with the data from the previous days. To be able to handle menu changes on the day itself, a parser can specify an additional today URL. After the main URL has been merged, this today URL is fetched every hour to provide updates for the current day (hence the name today URL). A short example day and the requests from OpenMensa:

As a developer of OpenMensa I wanted to fill the database with parsers for some major cities. I develop these parsers in this repository. They are served at http://omfeeds.devtation.de/$CITY_NAME/$CANTEEN_NAME.xml
I had not added a README because it was not the main purpose of this repository to collect all parsers. More importantly, I wanted to document my parsers and use the issue tracker. To reduce the redundancy between the parsers I created a Python library (pyopenmensa - https://github.com/mswart/pyopenmensa) to handle the common task of creating the XML feed.
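
As a small illustration of the URL scheme mentioned above, the placeholders can be filled in with `string.Template`. The city and canteen names below are invented; only the URL pattern itself comes from this comment.

```python
from string import Template

# URL pattern as given above; $CITY_NAME and $CANTEEN_NAME are the placeholders.
FEED_URL = Template("http://omfeeds.devtation.de/$CITY_NAME/$CANTEEN_NAME.xml")


def feed_url(city_name, canteen_name):
    """Return the feed URL for one canteen."""
    return FEED_URL.substitute(CITY_NAME=city_name, CANTEEN_NAME=canteen_name)


# Hypothetical names, for illustration only.
print(feed_url("examplecity", "examplecanteen"))
```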

Collaborator Author

azrdev commented Aug 26, 2013

First, thanks for your feedback! I updated the PR accordingly, and would be very glad if you integrated it into your collection of parsers already hosted, as you proposed.
Be aware that I have to wait for an explicitly "closed" day before I can handle the HTML they provide in such a case, so the parser might break the first time that occurs on the webpage. Just drop me a note if that happens and I miss it.

Second, you've done a good write-up above. Until you consider it done and publish it, I think the most important information for possible contributors is:

Additional parsers probably need to run somewhere else (than the OpenMensa infrastructure). If you create an account at openmensa.org and supply an email address, you can enter the URL of your parser's output, which will then be added to the database.

Third, a note on your API: if I hosted a parser myself, the today URL would be optional, but as I'm integrating with your code, it isn't, correct?
Please add to your documentation what happens if no today URL is specified: will the main URL be called every hour, or only daily?
Then, am I right that there is no way for the parser to supply the names of the canteens? As you pointed out, the 'canteens' dict in config.py only contains URL components, and as the Feed Builder is missing it, there seems to be no facility to supply a readable name for either the whole mensa or its canteens.

Last, a style note: I'm quite new to Python, so perhaps this is idiomatic, but I found your frequent use of **catchall parameters quite confusing, especially in the last line of config.py

So long

Owner

mswart commented Aug 27, 2013

If no today URL is provided, OpenMensa stops fetching after the first successful round (usually right after 8 am). But the today URL can be the same as the normal URL, and in this case OpenMensa queries that URL hourly.

The following algorithm illustrates the OpenMensa behavior:

from datetime import datetime
from time import sleep

# fetch_main_url, today_url_defined and fetch_today_url stand for OpenMensa internals
def fetch_single_canteen():  # called for every canteen at midnight
    sleep(8 * 3600)  # wait until 8 am before the first fetch
    # every day starts with the main fetch; retry hourly on errors:
    fetched_successfully = False
    while not fetched_successfully:
        fetched_successfully = fetch_main_url()
        sleep(3600)
    # if a today URL is defined, look for updates hourly until 3 pm:
    while today_url_defined() and datetime.now().hour < 15:
        fetch_today_url()
        sleep(3600)
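
Played through for a single day, the loop above yields a request schedule like the one this sketch prints. It is an idealized simulation that steps through whole hours instead of sleeping; the function `simulate_day` and its parameters are invented for illustration, and `main_succeeds_at` is a hypothetical knob for the first hour at which the main fetch works.

```python
def simulate_day(main_succeeds_at=8, has_today_url=True, last_hour=15):
    """Return the (hour, url_kind) requests OpenMensa would make on one day."""
    requests = []
    hour = 8  # fetching starts at 8 am
    # Main fetch, retried hourly until it succeeds (or the day ends).
    while hour < 24:
        requests.append((hour, "main"))
        succeeded = hour >= main_succeeds_at
        hour += 1  # mirrors the sleep(3600) after every attempt
        if succeeded:
            break
    # Hourly today-URL fetches while the clock is still before last_hour.
    while has_today_url and hour < last_hour:
        requests.append((hour, "today"))
        hour += 1
    return requests


for hour, kind in simulate_day():
    print(f"{hour:02d}:00  {kind}")
```

Calling `simulate_day(has_today_url=False)` returns only the single main request, which matches the statement that fetching stops after the first round when no today URL is set.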

The URL handling code in this repo (wsgihandler.py) supports a normal URL and a today URL for every canteen. In both cases the parse function of the referenced parser is called with a flag indicating which URL was used. So a parser can behave differently in the two cases (but does not have to). Additionally, there is no need to specify the today URL in OpenMensa - in that case OpenMensa does not know that the parser supports today URLs and does not use them.
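
A parse function that reacts to such a flag could look roughly like the following sketch. The name `parse`, its signature and the shape of `menus_by_date` are assumptions made for this example; the actual interface expected by wsgihandler.py is not shown in this thread.

```python
import datetime


def parse(menus_by_date, today=False, current_date=None):
    """Return the days a feed should contain.

    menus_by_date: {datetime.date: [meal names]}, standing in for scraped data.
    today: True when the request came in via the today URL.
    current_date: overridable for testing; defaults to the real current date.
    """
    if not today:
        return dict(menus_by_date)  # main URL: everything that is known
    current_date = current_date or datetime.date.today()
    # today URL: only the current day, which keeps the hourly response small
    return {day: meals for day, meals in menus_by_date.items() if day == current_date}


# Hypothetical scraped data for two days.
menus = {
    datetime.date(2013, 8, 26): ["Nudeln"],
    datetime.date(2013, 8, 27): ["Suppe"],
}
print(parse(menus))
print(parse(menus, today=True, current_date=datetime.date(2013, 8, 27)))
```

A parser that has nothing to gain from the distinction can simply ignore the flag and return the full data in both cases.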

At the moment there is no easy way to get a canteen identifier. I wanted to specify the differences between the canteens inside config.py. Do you need this feature?

The **catchall usage is not the most intuitive Python feature, but it allows more generic parameter handling while keeping Python's built-in parameter checks, such as errors on unknown additional parameters, intact. In config.py it is needed to be able to pass arbitrary parameters to the parse functions (I think I need or needed every variant for at least one parser).
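
The general pattern can be sketched as follows. This is not the code from config.py; `parse_url` and `configure` are invented names that only illustrate how `**` parameters forward arbitrary arguments while Python still rejects unknown ones.

```python
def parse_url(url, today=False, prefix=""):
    """Stand-in for a parser's entry point (hypothetical signature)."""
    return f"{prefix}{url} (today={today})"


def configure(handler, **fixed_args):
    """Bind per-canteen arguments now; accept request-time arguments later."""
    def call(**request_args):
        # Both dicts are unpacked into a normal call, so Python still rejects
        # names the handler does not declare.
        return handler(**fixed_args, **request_args)
    return call


canteen = configure(parse_url, url="http://example.org/menu", prefix="[demo] ")
print(canteen(today=True))

try:
    canteen(tody=True)  # typo in the parameter name
except TypeError as error:
    print("rejected:", error)
```

The configuration code never has to list the parameters of each parser, yet a misspelled or unsupported argument still fails loudly at call time.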

mswart added a commit that referenced this issue Sep 25, 2014

Owner

mswart commented Sep 25, 2014

@azrdev I have extended the README. I am interested in your feedback - personally I am unsure whether all the needed information is included and what the best structure is.

Collaborator Author

azrdev commented Sep 25, 2014

@mswart It's long ago, so I don't remember every detail… didn't look at it in the last months. :-)
But looks good!
IMHO you should add a word on how the provider-canteen relationship is exported to the website and the app, i.e. whether people searching for $city get a list grouped by provider, or just all the canteens (in which case their names should include the city name again, which might duplicate the provider name) - I think that's one point where I was confused.

You might want to do a spell check, too ;-)

Collaborator

klemens commented Nov 13, 2019

I guess we can close this ticket. 😉

klemens closed this as completed Nov 13, 2019