@gc - I quite like this idea. This project is all about providing easily parseable OSRS-related data, and it makes logical sense to add news posts to the data already available. Thanks for providing the schema example and the JS code example, and most importantly the rate-limiting information, which is exceptionally useful. As a side note, it might be better to source the raw text from the OSRS Wiki to avoid such intense rate limiting. I would have to compare the raw-text coverage of each source to ensure the full data is available. I am also not sure whether the OSRS Wiki has all news posts or just the weekly update posts. I will have to investigate.
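One way to do that coverage comparison is to diff the set of posts from the official archive against the set found on the wiki. The TypeScript sketch below assumes posts from both sources can be matched on date plus title, since neither source is known to share an ID. `PostKey`, `normalizeKey`, `findMissing` and `coverage` are all hypothetical names.

```typescript
// Hypothetical identifier for a news post. Matching on date + title is an
// assumption: the two sources are not known to share a common ID.
interface PostKey {
  title: string;
  date: string; // ISO date, e.g. "2019-06-13"
}

// Normalize a key so small formatting differences (case, repeated spaces)
// between the two sources are not counted as missing posts.
function normalizeKey(post: PostKey): string {
  return `${post.date}|${post.title.trim().toLowerCase().replace(/\s+/g, " ")}`;
}

// Posts present in `reference` (e.g. the official archive) that have no
// match in `candidate` (e.g. the OSRS Wiki).
function findMissing(reference: PostKey[], candidate: PostKey[]): PostKey[] {
  const available = new Set(candidate.map(normalizeKey));
  return reference.filter((post) => !available.has(normalizeKey(post)));
}

// Fraction of reference posts that the candidate source also covers.
function coverage(reference: PostKey[], candidate: PostKey[]): number {
  if (reference.length === 0) return 1;
  return 1 - findMissing(reference, candidate).length / reference.length;
}
```

Listing the missing posts, rather than only a percentage, shows whether gaps are limited to particular kinds of posts, for example non-weekly-update posts.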
Unfortunately, with the large number of things on the development list, it might be a while before I can start looking into this. I have some free time for development at the moment, but fine-tuning the item database and adding the quest and monster databases will probably take precedence.
I suggest adding news articles to osrsbox.
Here is my TS typing (for the schema) that I'm using:
I include month/year/day for convenience, but they can be calculated from the timestamp, so I'm not sure whether they should be included in the data. Some articles have no images, and some have no content.

As for the article text itself, I'm not sure whether it should be included in osrsbox. It could work like the items: the resource containing every news item in one file wouldn't include the text, but fetching a single article or month would.
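A minimal TypeScript sketch of such a schema could look like the block below. Every field name (`title`, `url`, `timestamp`, `image`, `content`) is a hypothetical stand-in rather than the author's actual typing, and the timestamp is assumed to be Unix epoch milliseconds. The sketch covers both points above. Year, month, and day are derived on demand instead of stored. A combined archive drops the body text, and per-month groups keep it.

```typescript
// Hypothetical article shape; field names are illustrative only.
interface NewsArticle {
  title: string;
  url: string;
  timestamp: number; // assumed: Unix epoch milliseconds
  image?: string; // some articles have no image
  content?: string; // some articles have no content
}

interface DateParts {
  year: number;
  month: number; // 1-12
  day: number;
}

// Derive year/month/day from the timestamp instead of storing them.
// UTC is used so the result does not depend on the local time zone.
function dateParts(timestamp: number): DateParts {
  const d = new Date(timestamp);
  return { year: d.getUTCFullYear(), month: d.getUTCMonth() + 1, day: d.getUTCDate() };
}

// Entry for the all-in-one archive: every field except the body text.
type NewsSummary = Omit<NewsArticle, "content">;

function toSummary({ content, ...rest }: NewsArticle): NewsSummary {
  return rest; // `content` is intentionally dropped
}

// Per-month resources keep the full text, keyed as "YYYY-MM".
function groupByMonth(articles: NewsArticle[]): Map<string, NewsArticle[]> {
  const groups = new Map<string, NewsArticle[]>();
  for (const article of articles) {
    const { year, month } = dateParts(article.timestamp);
    const key = `${year}-${String(month).padStart(2, "0")}`;
    const bucket = groups.get(key) ?? [];
    bucket.push(article);
    groups.set(key, bucket);
  }
  return groups;
}
```

Deriving the date parts in UTC avoids off-by-one-day results for articles posted near midnight. Whether the official site's dates are actually UTC is an assumption to check against the scraped data.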
If it's helpful, here is:
my code for scraping them: https://github.com/gc/oldschooljs/blob/master/src/lib/Structures/News.ts
my scraped data: https://github.com/gc/oldschooljs/blob/master/src/data/news/news_archive.json
Also, regarding scraping: the site seems to rate-limit you after around 40 page visits per hour, and the limit is lifted within about an hour (maybe less). Keep that in mind for your big initial scrape of all the old articles.
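At roughly 40 pages per hour, a scraper needs to leave about 3600 / 40 = 90 seconds between page requests. The sketch below is a minimal sequential throttle in TypeScript. The 40-per-hour figure comes from the observation above, not from any documented limit. `throttled` and `MIN_INTERVAL_MS` are hypothetical names, and the task function stands in for whatever actually fetches a page.

```typescript
// Observed (unofficial) budget: about 40 page visits per hour.
const REQUESTS_PER_HOUR = 40;
const MIN_INTERVAL_MS = (60 * 60 * 1000) / REQUESTS_PER_HOUR; // 90,000 ms = 90 s

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run tasks one at a time, starting each at least `intervalMs` after the
// previous one started, so the scrape stays under the observed limit.
async function throttled<T, R>(
  items: T[],
  task: (item: T) => Promise<R>,
  intervalMs: number = MIN_INTERVAL_MS,
): Promise<R[]> {
  const results: R[] = [];
  let lastStart = -Infinity; // no wait before the first task
  for (const item of items) {
    const wait = lastStart + intervalMs - Date.now();
    if (wait > 0) await sleep(wait);
    lastStart = Date.now();
    results.push(await task(item));
  }
  return results;
}
```

At 90 seconds per page, a full archive scrape takes about 1.5 minutes per page, so a long backfill is better run unattended. A slightly longer interval leaves some margin, because the real limit is only approximate.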