News Articles #127

Open
gc opened this issue Jul 9, 2019 · 1 comment
Labels: data, help wanted

Comments

gc commented Jul 9, 2019

I suggest adding news articles.

Here is the TypeScript type I'm using for the schema:

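```ts
export interface NewsItem {
	title: string;
	link: string;
	image?: string; // some articles have no image
	category: string;
	// month/year/day are convenience fields; they can be derived from timestamp
	month: number;
	year: number;
	day: number;
	timestamp: number;
}
```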

I include month, year, and day for convenience, but they can be calculated from the timestamp, so I'm not sure whether they should be included in the data. Some articles have no image, and some have no content.
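If the convenience fields were dropped from the data, consumers could rebuild them from the timestamp. The sketch below is a minimal illustration, assuming `timestamp` is milliseconds since the Unix epoch (what `Date.prototype.getTime()` returns), that dates are read in UTC, and that `month` is 1-based; none of these are stated in the schema above, and `deriveDateParts` is a hypothetical helper.

```ts
// Calendar fields that the schema stores alongside `timestamp`.
interface DateParts {
  day: number;
  month: number; // assumption: 1-12, unlike the 0-based Date#getUTCMonth
  year: number;
}

// Hypothetical helper: rebuild day/month/year from a timestamp.
// Assumes `timestamp` is milliseconds since the Unix epoch, read in UTC.
function deriveDateParts(timestamp: number): DateParts {
  const date = new Date(timestamp);
  return {
    day: date.getUTCDate(),
    month: date.getUTCMonth() + 1, // shift the 0-based month to 1-based
    year: date.getUTCFullYear(),
  };
}

// Example: 9 July 2019, 00:00 UTC.
const parts = deriveDateParts(Date.UTC(2019, 6, 9));
console.log(parts); // { day: 9, month: 7, year: 2019 }
```

Reading the date in UTC avoids the same timestamp producing a different `day` depending on the scraper's local time zone; if the source site publishes dates in another zone, that choice would need revisiting.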

As for the article text itself, I'm not sure whether it should be included in osrsbox. It could work similarly to the items: the single file containing every news item wouldn't include the text, but fetching one specific article (or one month of articles) would.
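The split described above could look roughly like this minimal sketch. It assumes a hypothetical `content` field holding the article text (the schema above has no such field) and groups full articles into per-month buckets keyed as `YYYY-MM`; `NewsArticle`, `toSummary`, and `groupByMonth` are illustrative names, not part of osrsbox or oldschooljs.

```ts
// Schema from the issue.
interface NewsItem {
  title: string;
  link: string;
  image?: string;
  category: string;
  month: number;
  year: number;
  day: number;
  timestamp: number;
}

// Hypothetical: a full article is a NewsItem plus its text.
interface NewsArticle extends NewsItem {
  content?: string; // some articles have no content
}

// Strip the text for the "every news item in one file" resource.
function toSummary(article: NewsArticle): NewsItem {
  const { content: _text, ...summary } = article;
  return summary;
}

// Group full articles (text included) into per-month resources, keyed "YYYY-MM".
function groupByMonth(articles: NewsArticle[]): Record<string, NewsArticle[]> {
  const byMonth: Record<string, NewsArticle[]> = {};
  for (const article of articles) {
    const mm = article.month < 10 ? `0${article.month}` : `${article.month}`;
    const key = `${article.year}-${mm}`;
    if (!byMonth[key]) byMonth[key] = [];
    byMonth[key].push(article);
  }
  return byMonth;
}
```

The summary file would then be `articles.map(toSummary)`, and each entry of `groupByMonth(articles)` could be written out as its own file.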

In case it's helpful, here is:

my code for scraping them: https://github.com/gc/oldschooljs/blob/master/src/lib/Structures/News.ts

my scraped data: https://github.com/gc/oldschooljs/blob/master/src/data/news/news_archive.json

Also, regarding scraping: the site seems to rate-limit you after around 40 page visits per hour, and the limit is lifted within about an hour (maybe less). That's worth keeping in mind for the big initial scrape of all the old articles.
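Given that observed limit (roughly 40 page visits per hour, an estimate from experience rather than a documented figure), a scraper could throttle itself with a sliding-window limiter. This is a hypothetical sketch, not code from the linked scraper: `SlidingWindowLimiter` takes the current time as a parameter so it can be tested without waiting, and reports how long to pause before the next request is allowed.

```ts
// Hypothetical sliding-window limiter: at most `maxRequests` in any `windowMs`.
class SlidingWindowLimiter {
  private readonly maxRequests: number;
  private readonly windowMs: number;
  private readonly sent: number[] = []; // times (ms) of requests still in the window

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // Milliseconds to wait before another request is allowed at time `now`.
  waitTime(now: number): number {
    // Forget requests that have left the window.
    while (this.sent.length > 0 && now - this.sent[0] >= this.windowMs) {
      this.sent.shift();
    }
    if (this.sent.length < this.maxRequests) return 0;
    // Full window: wait until the oldest request expires.
    return this.sent[0] + this.windowMs - now;
  }

  // Record a request made at time `now`.
  record(now: number): void {
    this.sent.push(now);
  }
}

// Assumed budget: ~40 page visits per hour, as estimated in the issue.
const HOUR_MS = 60 * 60 * 1000;
const limiter = new SlidingWindowLimiter(40, HOUR_MS);

// Call sequentially (await each one); concurrent calls could overshoot the budget.
async function politeFetch(url: string): Promise<string> {
  const delay = limiter.waitTime(Date.now());
  if (delay > 0) await new Promise((resolve) => setTimeout(resolve, delay));
  limiter.record(Date.now());
  const res = await fetch(url);
  return res.text();
}
```

A sliding window lets the first 40 requests go out immediately and then paces later ones; a fixed 90-second gap between requests would be a simpler, slower alternative that never bursts.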

osrsbox (Owner) commented Jul 9, 2019

@gc, I quite like this idea. This project is all about providing easily parseable OSRS-related data, and it makes logical sense to add news posts to the data currently available. Thanks for providing the schema example and the TypeScript scraping code, and most importantly the rate-limiting information, which is exceptionally useful. As a side note, it might be better to source the raw text from the OSRS Wiki to avoid such intense rate limiting. I would have to compare the raw text coverage of each source to ensure the full data is available. I am also not sure whether the OSRS Wiki has all news posts or just the weekly update news posts, so I will have to investigate.
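Comparing coverage between the two sources could start with a simple keyed set difference. A minimal sketch, assuming both the official news archive and the OSRS Wiki have already been scraped into records with a title, a `YYYY-MM-DD` date string, and optional text. Matching on a lower-cased, whitespace-collapsed title plus date is an assumption, since the two sites may word titles differently; `Post`, `postKey`, and `compareCoverage` are hypothetical names.

```ts
// Hypothetical minimal record produced by scraping either source.
interface Post {
  title: string;
  date: string; // "YYYY-MM-DD"
  text?: string; // raw article text, if the source provides it
}

interface CoverageReport {
  missingFromWiki: Post[]; // on the official site, no matching wiki post
  wikiWithoutText: Post[]; // matched on the wiki, but no raw text there
}

// Assumed matching key: publication date plus normalised title.
function postKey(post: Post): string {
  return `${post.date}|${post.title.trim().toLowerCase().replace(/\s+/g, " ")}`;
}

function compareCoverage(official: Post[], wiki: Post[]): CoverageReport {
  const wikiByKey: Record<string, Post> = {};
  for (const post of wiki) wikiByKey[postKey(post)] = post;

  const report: CoverageReport = { missingFromWiki: [], wikiWithoutText: [] };
  for (const post of official) {
    const match = wikiByKey[postKey(post)];
    if (match === undefined) {
      report.missingFromWiki.push(post);
    } else if (!match.text) {
      report.wikiWithoutText.push(match);
    }
  }
  return report;
}
```

Both lists being empty would suggest the wiki alone could supply the full data; in practice, fuzzy title matching may be needed if many posts show up as missing.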

Unfortunately, with so many things on the development list, it might be a while before I can start looking into this. I have some free time for development at the moment, but fine-tuning the item database and adding the quest and monster databases will probably take precedence.

osrsbox added the data and help wanted labels on Jan 1, 2020