As a reminder, sqlite-html
is still young, so breaking changes should be expected while sqlite-html
is in a pre-v1 stage.
sqlite-html
information- Query HTML elements using CSS selectors
- html_each(document, selector)
- html_extract(document, selector)
- html_text(document, selector)
- html_count(document, selector)
- Safely generating HTML elements
- html(document)
- html_element(tag, attributes, child1, ...)
- HTML attributes
- html_attribute_get(document, selector, attribute)
- html_attribute_has(document, selector, attribute)
- Misc. HTML utilities
- html_escape(text)
- html_unescape(text)
- html_trim(text)
- html_table(document)
A table function with the following schema:
CREATE TABLE html_each(
html TEXT, -- HTML of the extracted element
text TEXT, -- textContent of the HTML element
document TEXT hidden, -- input HTML document
selector TEXT hidden -- input CSS selector
);
The html
column contains the matching element's HTML representation.
The text
column contains the matching element's textContent representation, similar to the JavaScript DOM API's .textContent
or the html_text
function in this library.
sqlite> select * from html_each('<ul>
<li>Alpha</li>
<li>Bravo</li>
<li>Charlie</li>
<li>Delta</li>', 'li')
Extracts the first matching element from document
using the given CSS selector
, and returns the full HTML representation of that element.
select html_extract('<p> Hello, <b class=x>world!</b> </p>', 'b');
-- '<b class="x">world!</b>'
Extracts the first matching element from document
using the given CSS selector
, and returns the text representation of that element, Similar to the Node.textContent
property in the JavaScript DOM API.
Examples:
select html_text('<p> hello <a href="https://google.com">dog</a></a>', 'a');
-- "dog"
For the given document
, count the number of matching elements from selector
and return that number.
select html_count('<div> <p>a</p> <p>b</p> <p>c</p> </div>', 'p');
-- 3
Verifies and "cleans" (quotes attributes) the given document as HTML.
Also sets the return subtype to the HTML magic number, for use in other funcs like html_element to designate something as "HTML".
select html('<a> foo'); -- "<a> foo</a>"
-- returns specific subtype to denote that it's HTML
select subtype('alex'); -- 0
select subtype(html('alex')); -- 221
-- Tag only
select html_element('a'); -- "<a></a>"
select html_element('br'); -- "<br/>"
-- attributes passed in as JSON
select html_element('img', json_object('src', './a.png', 'width', 200)); -- '<img src="./a.png" width="200.000000"/> '
-- Children can be text or HTML
select html_element('p', null, "text node"); -- '<p>text node</p>'
select html_element('p', null, "<b>Still a text node</b>"); -- '<p><b>Still a text node</b></p>'
select html_element('p', null, html_element('b', null, 'Bolded!')); -- '<p><b>Bolded!</b></p>'
select html_element('p', null, html('<b>Also bolded</b>')); -- '<p><b>Also bolded</b></p>'
select html_element('p', null,
"multiple ",
html("<b>children"),
" works ",
html_element("span", null, "just fine")
); -- '<p>multiple <b>children</b>works <span>just fine</span></p>'
Get the value of the "name" attribute from the element found in document, using selector
Alias: html_attr_get
select html_attr_get('<p> <a href="./about"> About<a/> </p>', 'a', 'href'); -- './about'
select html_attr_get('<p> <a href="./about"> About<a/> </p>', 'a', 'rel'); -- NULL
Returns 1 or 0, if the "name" attribute from the element found in document, using selector, exists.
Alias: html_attr_has
select html_attr_has('<p> <a href="./about"> About<a/> </p>', 'a', 'href'); -- 1
select html_attr_has('<p> <a href="./about"> About<a/> </p>', 'a', 'rel'); -- 0
Returns an HTML escaped version of the given content.
select html_escape('<a>');
-- "<a>"
select html_unescape('<a');
-- "<a"
Trims whitespace around contents
. Useful since many results of html_text
will have newlines/spaces that aren't useful.
select html_trim(" asdf ");
-- "asdf"
select html_trim( html_text("<p> empty space </p>", "p") );
-- "empty space"
Prepend the string "<table>"
before contents
.
While seemingly unnecessary, it's useful when dealing with HTML table <tr>
rows. The string <tr>example <a>foo</a> </tr>
isn't valid HTML, so trying to extract "foo"
won't work by itself. So wrapping html_table()
before extracting from a tr
element will allow for sqlite-html
to parse correctly.
select html_table('xyz');
-- "<table>xyz"
-- Try removing the "html_table()" calls and see blank results
select
html_text(html_table(rows.html), 'td:nth-child(1)') as name,
html_text(html_table(rows.html), 'td:nth-child(2)') as age
from html_each('<table>
<tr> <td>Alex</td> <td>1</td> </tr>
<tr> <td>Brian</td> <td>2</td> </tr>
<tr> <td>Craig</td> <td>3</td> </tr>
</table>', 'tr') as rows;
/*
┌───────┬─────┐
│ name │ age │
├───────┼─────┤
│ Alex │ 1 │
│ Brian │ 2 │
│ Craig │ 3 │
└───────┴─────┘
*/
Returns the version string of the sqlite-html
library, modeled after sqlite_version()
.
Examples:
sqlite> select html_version();
v0.0.0
Returns debug information of the sqlite-html
library, including the version string. Subject to change.
Examples:
sqlite> select html_debug();
Version: v0.0.0
Commit: 0cd144a880b47f4a57a5c7f8ceb96eb9dc821508
Runtime: go1.17 darwin/amd64
Date: 2021-11-17T17:06:12Z-0800