Skip to content
  • Rate limit · GitHub

    Access has been restricted

    You have triggered a rate limit.

    Please wait a few minutes before you try again;
    in some cases this may take up to an hour.

  • Notifications You must be signed in to change notification settings
  • Fork 0

HTML parser used by django-components written in Rust

License

Notifications You must be signed in to change notification settings

django-components/djc-core-html-parser

Repository files navigation

djc-core-html-parser

HTML parser used by django-components. Written in Rust, exposed as a Python package with maturin.

This implementation was found to be 40-50x faster than our Python implementation, taking ~90ms to parse 5 MB of HTML.

Installation

pip install djc-core-html-parser

Usage

from djc_core_html_parser import set_html_attributes

html = '<div><p>Hello</p></div>'
result, _ = set_html_attributes(
  html,
  # Add attributes to the root elements
  root_attributes=['data-root-id'],
  # Add attributes to all elements
  all_attributes=['data-v-123'],
)

To save ourselves from re-parsing the HTML, set_html_attributes returns not just the transformed HTML, but also a dictionary as the second item.

This dictionary contains a record of which HTML attributes were written to which elemenents.

To populate this dictionary, you need set watch_on_attribute to an attribute name.

Then, during the HTML transformation, we check each element for this attribute. And if the element HAS this attribute, we:

  1. Get the value of said attribute
  2. Record the attributes that were added to the element, using the value of the watched attribute as the key.
from djc_core_html_parser import set_html_attributes

html = """
  <div data-watch-id="123">
    <p data-watch-id="456">
      Hello
    </p>
  </div>
"""

result, captured = set_html_attributes(
  html,
  # Add attributes to the root elements
  root_attributes=['data-root-id'],
  # Add attributes to all elements
  all_attributes=['data-djc-tag'],
  # Watch for this attribute on elements
  watch_on_attribute='data-watch-id',
)

print(captured)
# {
#   '123': ['data-root-id', 'data-djc-tag'],
#   '456': ['data-djc-tag'],
# }

Development

  1. Setup python env

    python -m venv .venv
  2. Install dependencies

    pip install -r requirements-dev.txt

    The dev requirements also include maturin which is used packaging a Rust project as Python package.

  3. Install Rust

    See https://www.rust-lang.org/tools/install

  4. Run Rust tests

    cargo test
  5. Build the Python package

    maturin develop

    To build the production-optimized package, use maturin develop --release.

  6. Run Python tests

    pytest

    NOTE: When running Python tests, you need to run maturin develop first.

Deployment

Deployment is done automatically via GitHub Actions.

To publish a new version of the package, you need to:

  1. Bump the version in pyproject.toml and Cargo.toml
  2. Open a PR and merge it to main.
  3. Create a new tag on the main branch with the new version number (e.g. v1.0.0), or create a new release in the GitHub UI.

About

HTML parser used by django-components written in Rust

Resources

License

Code of conduct

Rate limit · GitHub

Access has been restricted

You have triggered a rate limit.

Please wait a few minutes before you try again;
in some cases this may take up to an hour.

Stars

Watchers

Forks

Sponsor this project

Rate limit · GitHub

Access has been restricted

You have triggered a rate limit.

Please wait a few minutes before you try again;
in some cases this may take up to an hour.

Packages

No packages published