Html extractor #2072

Open
happysalada opened this issue Nov 14, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@happysalada

Describe the problem you are trying to solve
Extract data from an HTML page. Lots of older sites with valuable data don't have an API. Extracting data from HTML with a regex is possible but very inconvenient.

Describe the solution you'd like
An HTML extractor with an API similar to CSS selectors.
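
To make the request more concrete, here is a rough Python sketch of what selector-style extraction could look like. It is only an illustration built on the standard library's `html.parser`, not a proposal for the actual implementation. The names `SelectorExtractor` and `extract` are hypothetical, and the sketch assumes only four simple selector forms (`tag`, `.class`, `#id`, `tag.class`), which is far less than real CSS selectors cover.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked as open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class SelectorExtractor(HTMLParser):
    """Collect the text of every element matching one simple selector.

    Hypothetical helper: supports only "tag", ".class", "#id", "tag.class".
    """

    def __init__(self, selector):
        super().__init__()
        self.want_id = None
        if selector.startswith("#"):
            self.want_id = selector[1:]
            self.want_tag = self.want_class = ""
        else:
            self.want_tag, _, self.want_class = selector.partition(".")
        self.open_tags = []  # names of currently open elements
        self.active = []     # (depth, text parts) for open matched elements
        self.results = []    # text parts per match, in document order

    def _matches(self, tag, attrs):
        if self.want_id is not None:
            return attrs.get("id") == self.want_id
        if self.want_tag and tag != self.want_tag:
            return False
        classes = (attrs.get("class") or "").split()
        return not self.want_class or self.want_class in classes

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.open_tags.append(tag)
        if self._matches(tag, dict(attrs)):
            parts = []
            self.results.append(parts)
            self.active.append((len(self.open_tags), parts))

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag, ignore it
        while self.open_tags.pop() != tag:
            pass  # implicitly close unclosed children
        depth = len(self.open_tags)
        # Drop matches whose element has just been closed.
        self.active = [(d, p) for d, p in self.active if d <= depth]

    def handle_data(self, data):
        for _, parts in self.active:
            parts.append(data)


def extract(html, selector):
    """Return the stripped text of each element matching the selector."""
    parser = SelectorExtractor(selector)
    parser.feed(html)
    parser.close()
    return ["".join(parts).strip() for parts in parser.results]
```

With this sketch, something like `extract(page, "td.price")` would return the text of every `<td class="price">` cell, which is the kind of lookup that is painful to express as a regex.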

Notes

If this is an implementation of an RFC provide a URL
to the RFC this enhancement implements.

If this is a major enhancement or contribution an RFC may be required. It is ok to submit an enhancement
first and our core team will assist with major contributions. In general, major contributions should be
discussed with the community before submission.

@happysalada happysalada added the enhancement New feature or request label Nov 14, 2022
@Licenser
Member

This is quite an interesting idea, I like it! It goes a bit further and might be worth an RFC, as there are some extra things to consider. Once we have an HTML extractor, we will need a structural representation of the extracted data. That leads to an HTML codec that both decodes HTML into this structure and encodes this structure back into an HTML page (which could be super cool, to be honest).
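
As a rough illustration of the decode/encode idea, here is a minimal Python sketch using only the standard library. The structure it uses (a dict with `tag`, `attrs`, and `children` keys, with plain strings for text) and the function names `decode` and `encode` are assumptions made for this example, not a proposed format. The round trip is also lossy for things like comments, doctypes, and original whitespace inside tags.

```python
from html import escape
from html.parser import HTMLParser

# Elements without a closing tag; they never become the current parent.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _TreeBuilder(HTMLParser):
    """Build a nested dict/list structure from HTML (hypothetical shape)."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": None, "attrs": {}, "children": []}
        self.stack = [self.root]  # chain of currently open elements

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest open element with this name, if any.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1]["children"].append(data)


def decode(html):
    """Decode an HTML string into a list of nodes (dicts) and text (str)."""
    builder = _TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root["children"]


def encode(nodes):
    """Encode a list of nodes and text back into an HTML string."""
    out = []
    for node in nodes:
        if isinstance(node, str):
            out.append(escape(node, quote=False))
            continue
        attrs = "".join(
            f" {k}" if v is None else f' {k}="{escape(v)}"'
            for k, v in node["attrs"].items()
        )
        out.append(f"<{node['tag']}{attrs}>")
        if node["tag"] not in VOID_TAGS:
            out.append(encode(node["children"]))
            out.append(f"</{node['tag']}>")
    return "".join(out)
```

An RFC would still need to settle the open questions this sketch skips, such as how malformed markup, comments, and character encodings map into the structure.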

@happysalada how do you feel about throwing an RFC up on the topic?

@happysalada
Author

Let me try to carve some time for this.

@Licenser
Member

Awesome, thanks!

2 participants