.stream(cb) method #99
Comments
Cool, I like this idea - I'm just not sure how useful this would be. It would produce unexpected results if I tried to run a |
Well, as far as I know, most people are currently using e.g. request to |
Right, but how would you actually run queries on a half-parsed DOM? The only use case I could see is if you're looking for something specific, ex. |
You misunderstood me: The idea was to parse data while the user is still waiting for the next chunk to arrive. This way, the DOM will be available immediately after the download of the page is complete. Running queries isn't hard, though: I solved it yesterday with |
I've been thinking about this more and more lately. It would be awesome to select queries as they come through. Right now I'm thinking the API could be:

    var $ = cheerio.stream('http://google.com');
    $.on('.logo', function($) {
      console.log($.html());
    })

@fb55 do you think this is feasible? |
Irrespective of the streaming functionality, it would be great if cheerio provided a way to create a "DOM" from a URL. As @fb55 stated, this is no doubt a very common use case. |
Looking back at my example, I kind of think adding URL fetching functionality is a bit leaky (do we then support headers, which request methods, etc.?). It would be nice to add a streaming interface though, as @fb55 did with cornet. Perhaps more along the lines of:

    var $ = cheerio.stream();
    minreq.get("http://github.com/fb55").pipe($)
    $.on(...) |
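To make the shape of that interface concrete, here is a rough sketch that uses only Node's built-in stream module. It is not cheerio's or cornet's API: the `TagStream` class and the `"tag:<name>"` event names are made up for illustration, and it matches bare tag names with a regular expression rather than real selectors such as `.logo`, which would need a proper HTML parser.

```javascript
const { Writable, Readable } = require("stream");

// Hypothetical streaming matcher: emits "tag:<name>" as soon as an opening
// tag has fully arrived, even if the tag was split across two chunks.
class TagStream extends Writable {
  constructor() {
    super();
    this.pending = ""; // unparsed input carried over between chunks
  }

  _write(chunk, encoding, done) {
    // toString() is fine for ASCII input; a real implementation would
    // decode multi-byte characters with a StringDecoder.
    this.pending += chunk.toString();
    const tagPattern = /<([a-zA-Z][\w-]*)[^<>]*>/g;
    let match;
    let consumed = 0;
    while ((match = tagPattern.exec(this.pending)) !== null) {
      this.emit("tag:" + match[1].toLowerCase(), match[0]);
      consumed = tagPattern.lastIndex;
    }
    // Keep only a trailing, not yet matched "<..." for the next chunk.
    const lastOpen = this.pending.lastIndexOf("<");
    this.pending = lastOpen >= consumed ? this.pending.slice(lastOpen) : "";
    done();
  }
}

// Usage: the source is piped in, and handlers fire while data is arriving.
const $ = new TagStream();
$.on("tag:img", (rawTag) => console.log("saw", rawTag));
Readable.from(["<div><im", "g src='logo.png'></div>"]).pipe($);
```

The point of the sketch is only the control flow: the consumer subscribes first, the source is piped in afterwards, and each handler receives a single match at a time rather than an array of results.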
@matthewmueller First of all, Secondly, cheerio would have to wait until the entire DOM is present, as it calls the method with an array of results (cornet only passes a single element at a time). That would stop people from getting confused, with the benefit of the pauses between IO being used for actual work. Finally, the implementation of this should be pretty straight-forward, probably as complex as cornet (which has 30 LOC). |
Closing in favour of #2051. |
Just as an idea: The parser could do much more if it actually received a stream of data. This would allow the DOM to be built while IO is still happening, which would speed up initial loading (and more work could be done inside of DomHandler).
There is already a WritableStream.js file shipped with htmlparser2 (it's accessible via require("htmlparser2").WritableStream) that pretty much solves all problems. The implementation of the cheerio method could look like this:
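As a rough illustration of the idea (not the snippet from the original post), the following sketch uses only Node's built-in stream module. The names `streamInto` and `toyParser` are hypothetical, and the toy parser merely counts `<` characters; a real version would presumably feed an actual incremental HTML parser instead.

```javascript
const { Writable } = require("stream");

// Hypothetical helper: wraps any incremental parser that exposes
// write(text) and end() in a Writable stream, and reports the outcome
// through a Node-style callback once the input is complete.
function streamInto(parser, cb) {
  const sink = new Writable({
    write(chunk, encoding, done) {
      try {
        // Parsing happens here, while later chunks are still in flight.
        // toString() is fine for ASCII; real code would use a StringDecoder.
        parser.write(chunk.toString());
        done();
      } catch (err) {
        done(err);
      }
    },
  });
  sink.once("finish", () => cb(null, parser.end()));
  sink.once("error", (err) => cb(err));
  return sink;
}

// Toy stand-in for a real HTML parser (an assumption for this sketch).
const toyParser = {
  count: 0,
  write(text) {
    this.count += text.split("<").length - 1;
  },
  end() {
    return this.count;
  },
};

const out = streamInto(toyParser, (err, result) => {
  if (err) throw err;
  console.log("input complete, parser result:", result);
});
out.write("<html><bo");
out.end("dy></body></html>");
```

Because the returned object is an ordinary Writable, any readable source (an HTTP response, a file stream) could be piped into it, and the callback fires once, right after the last chunk has been parsed.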