Html extractor #2072

happysalada · 2022-11-14T15:09:27Z

Describe the problem you are trying to solve
Extract data from an HTML page. Many older sites with valuable data don't have an API. Extracting data from HTML with a regex is possible but very inconvenient.

Describe the solution you'd like
An HTML extractor with an API similar to CSS selectors.
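
To illustrate the kind of API meant here, below is a minimal sketch in Python using only the standard library. It is not an existing extractor from this project: the names `extract`, `SelectorExtractor`, and `parse_selector` are hypothetical, and only a tiny subset of selectors is assumed (`tag`, `.class`, `tag.class`, `#id`, `tag#id`). Matches nested inside another match are folded into the outer match's text.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def parse_selector(selector):
    """Split a simple selector such as 'td.price' or '#main' into (tag, id, class)."""
    tag, elem_id, cls = selector, None, None
    if "#" in tag:
        tag, elem_id = tag.split("#", 1)
    elif "." in tag:
        tag, cls = tag.split(".", 1)
    return tag or None, elem_id, cls


class SelectorExtractor(HTMLParser):
    """Collects the text content of every element matching one simple selector."""

    def __init__(self, selector):
        super().__init__()
        self.tag, self.elem_id, self.cls = parse_selector(selector)
        self.depth = 0       # > 0 while inside a matching element
        self.current = []    # text chunks of the element being captured
        self.results = []

    def _matches(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        return ((self.tag is None or self.tag == tag)
                and (self.elem_id is None or attrs.get("id") == self.elem_id)
                and (self.cls is None or self.cls in classes))

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return           # no text content and no closing tag to wait for
        if self.depth:
            self.depth += 1  # nested element inside a match
        elif self._matches(tag, attrs):
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract(html, selector):
    """Return the text of all elements in `html` that match `selector`."""
    parser = SelectorExtractor(selector)
    parser.feed(html)
    parser.close()
    return parser.results
```

With something like this, `extract(page, "td.price")` would replace a hand-written regex over the raw markup. A real extractor would need full selector syntax (descendants, attribute selectors) and more tolerant handling of malformed HTML.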


Licenser · 2022-11-15T09:48:19Z

This is quite an interesting idea, I like it! It goes a bit further and might be worth an RFC, as there are some extra things to consider. When we have an HTML extractor, we will need a structural representation of the data once it's extracted. That leads to an HTML codec that both decodes HTML into this structure and encodes this structure into an HTML page (which could be super cool, to be honest).
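
As a rough illustration of what such a codec round trip could look like, here is a minimal Python sketch using only the standard library. The structure is an assumption made for this example (each element is a dict with `tag`, `attrs`, and `children`, and text is a plain string), and `decode` and `encode` are hypothetical names, not an existing codec in this project.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have a closing tag or children.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _TreeBuilder(HTMLParser):
    """Builds a nested dict/list structure from a stream of HTML tokens."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#root", "attrs": {}, "children": []}
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray closers.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1]["children"].append(data)


def decode(html):
    """Decode an HTML string into a list of nodes (dicts) and text (strings)."""
    builder = _TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root["children"]


def encode(nodes):
    """Encode a list of nodes, as produced by `decode`, back into HTML."""
    out = []
    for node in nodes:
        if isinstance(node, str):
            out.append(escape(node, quote=False))
            continue
        attrs = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in node["attrs"].items()
        )
        out.append(f"<{node['tag']}{attrs}>")
        if node["tag"] not in VOID_TAGS:
            out.append(encode(node["children"]))
            out.append(f"</{node['tag']}>")
    return "".join(out)
```

For simple, well-formed input, `encode(decode(page))` reproduces the page. Comments, doctypes, and the raw contents of `script` and `style` elements are not handled in this sketch; those are among the extra things an RFC would have to settle.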

@happysalada how do you feel about throwing an RFC up on the topic?

happysalada · 2022-11-15T12:20:26Z

Let me try to carve out some time for this.

Licenser · 2022-11-15T13:53:12Z

Awesome, thanks!

happysalada added the enhancement label (New feature or request) on Nov 14, 2022
