
feat: add HTML parser initial implementation #1

Merged (6 commits) on Jan 24, 2025

@JuroOravec (Contributor) commented Jan 23, 2025

Here is the Rust implementation of the HTML parser 🎉

@@ -0,0 +1,179 @@
# This file is autogenerated by maturin v1.8.1

@JuroOravec (Contributor Author), Jan 23, 2025:

This is the CI job for publishing the package, as generated by maturin. I have yet to see whether it works. It requires PYPI_API_TOKEN to be set as a repository secret.

@JuroOravec (Contributor Author):

@EmilStenstrom Regarding the PYPI_API_TOKEN, the open question is how to manage this package on PyPI, e.g. whether to release it under my account or yours.

Either way, this package should have a separate PYPI_API_TOKEN from the one we use for django-components, for extra security.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at emil@emilstenstrom.se. All

@JuroOravec (Contributor Author):

I copied the project meta files, like the code of conduct, so this one contains your contact info, @EmilStenstrom. Let me know if that's OK, or if I should put mine or someone else's instead, or remove it.

@EmilStenstrom:

That's OK.

@@ -0,0 +1,96 @@
[build-system]
requires = ["maturin>=1.8,<2.0"]
build-backend = "maturin"

@JuroOravec (Contributor Author):

This is a regular pyproject.toml (based on the one for django-components).

One key difference, though, is that the build backend is set to maturin.

keywords = ["django", "components", "html"]
readme = "README.md"
authors = [
{name = "Juro Oravec", email = "juraj.oravec.josefson@gmail.com"},

@JuroOravec (Contributor Author):

I've put myself and my email as the author.

src/lib.rs (outdated)
/// A Python module implemented in Rust for high-performance HTML transformation.
#[pymodule]
fn djc_core_html_parser(_py: Python, m: &PyModule) -> PyResult<()> {
m.add_function(wrap_pyfunction!(html_parser::transform_html, m)?)?;

@JuroOravec (Contributor Author):

This is how, using pyo3, we register a function within a Python module.

So when one imports the package in Python, the function is available as set_html_attributes (the outdated diff above still registers it under its earlier name, transform_html):

from djc_core_html_parser import set_html_attributes 

@@ -0,0 +1,34 @@
from typing import List, Dict, Optional

def set_html_attributes(

@JuroOravec (Contributor Author):

I don't know if this __init__.pyi is also needed for the published package, but at least locally, the Python language server was not picking up the generated Python module in the test file. So I added this stub so that the test file is correctly typed.

@@ -0,0 +1,148 @@
# This same set of tests is also found in django-components, to ensure that

@JuroOravec (Contributor Author):

This package includes both Rust and Python tests. The Rust tests are for development and for making sure the parser itself works as expected, while these Python tests ensure that the generated Python module works as expected too.

#[pyo3(
text_signature = "(html, root_attributes, all_attributes, *, check_end_names=False, watch_on_attribute=None)"
)]
pub fn set_html_attributes(

@JuroOravec (Contributor Author):

This is the entrypoint. I renamed it to set_html_attributes so that it has the same name as the entrypoint of the pure Python implementation.

}

/// Add attributes to a HTML start tag (e.g. `<div>`) based on the configuration
fn add_attributes(

@JuroOravec (Contributor Author):

This function is the equivalent of the on_tag callback defined in the pure Python implementation.
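For readers new to the Rust side, here is a minimal, std-only sketch of what adding attributes to a single start tag boils down to. It is not this PR's add_attributes: the function name add_attributes_to_start_tag and the attribute names in the usage are hypothetical, and it assumes it receives one complete, well-formed start tag whose attributes are already formatted as strings.

```rust
/// Hypothetical helper (not the PR's actual `add_attributes`): insert the
/// already-formatted `attrs` into one raw start tag.
/// Assumes `tag` is a complete start tag such as `<div class="a">` or `<img />`.
fn add_attributes_to_start_tag(tag: &str, attrs: &[&str]) -> String {
    if attrs.is_empty() {
        return tag.to_string();
    }
    // Drop the final `>`, then detect a self-closing `/` just before it.
    let without_gt = tag.strip_suffix('>').unwrap_or(tag);
    let (head, closing) = match without_gt.strip_suffix('/') {
        Some(head) => (head.trim_end(), " />"),
        None => (without_gt.trim_end(), ">"),
    };
    // Rebuild the tag: original name and attributes, new attributes, closing.
    let mut out = String::from(head);
    for attr in attrs {
        out.push(' ');
        out.push_str(attr);
    }
    out.push_str(closing);
    out
}
```

For example, `add_attributes_to_start_tag("<img />", &["data-root", "data-all"])` yields `<img data-root data-all />`, keeping the self-closing form intact.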

let mut depth: i32 = 0;

// Read the HTML event by event
loop {

@JuroOravec (Contributor Author):

And here is where we process the HTML as a stream of tokens ("events").
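To illustrate the event-loop idea together with the depth counter, here is a deliberately tiny, std-only sketch. The Event enum, tokenize, and walk_start_tags are hypothetical names, not this PR's code. The tokenizer assumes no comments, doctypes, or `>` inside attribute values, and treats a fixed list of void elements (plus an explicit `/>`) as not opening a new nesting level. Start tags seen at depth 0 are the root elements, i.e. the place where root-only attributes would be added.

```rust
/// One token ("event") from a deliberately tiny, hypothetical HTML tokenizer.
#[derive(Debug, PartialEq)]
enum Event<'a> {
    Text(&'a str),
    Start { name: &'a str, self_closing: bool },
    End(&'a str),
}

/// Elements that never have a closing tag, so they must not increase depth.
const VOID_ELEMENTS: &[&str] = &[
    "area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source",
    "track", "wbr",
];

/// Split `html` into events. Assumes no comments, doctypes, or `>` inside
/// attribute values; a real tokenizer has to handle all of those.
fn tokenize(html: &str) -> Vec<Event<'_>> {
    let mut events = Vec::new();
    let mut rest = html;
    while !rest.is_empty() {
        if !rest.starts_with('<') {
            // Text runs up to the next `<` (or to the end of the input).
            let end = rest.find('<').unwrap_or(rest.len());
            events.push(Event::Text(&rest[..end]));
            rest = &rest[end..];
            continue;
        }
        let Some(close) = rest.find('>') else {
            // Unterminated tag: treat the remainder as text.
            events.push(Event::Text(rest));
            break;
        };
        let inner = rest[1..close].trim();
        if let Some(name) = inner.strip_prefix('/') {
            events.push(Event::End(name.trim()));
        } else {
            // The tag name ends at the first whitespace or `/`.
            let name_end = inner
                .find(|c: char| c.is_whitespace() || c == '/')
                .unwrap_or(inner.len());
            let name = &inner[..name_end];
            let self_closing = inner.ends_with('/')
                || VOID_ELEMENTS.iter().any(|v| v.eq_ignore_ascii_case(name));
            events.push(Event::Start { name, self_closing });
        }
        rest = &rest[close + 1..];
    }
    events
}

/// Walk the events, calling `on_start(name, depth)` for every start tag.
/// Depth 0 marks a root-level element.
fn walk_start_tags(html: &str, mut on_start: impl FnMut(&str, i32)) {
    let mut depth: i32 = 0;
    for event in tokenize(html) {
        match event {
            Event::Text(_) => {}
            Event::Start { name, self_closing } => {
                on_start(name, depth);
                // Only tags that will later be closed open a new level.
                if !self_closing {
                    depth += 1;
                }
            }
            Event::End(_) => depth -= 1,
        }
    }
}
```

With `<div><span>a</span><br></div><p/>`, the callback sees div and p at depth 0 and span and br at depth 1. Everything this sketch skips (comments, script contents, malformed markup) is exactly what a proper tokenizer takes care of.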

}

#[cfg(test)]
mod tests {

@JuroOravec (Contributor Author):

And in Rust it's standard to write unit tests in the same file as the implementation. The #[cfg(test)] attribute is conditional compilation: the tests module is compiled only when building in test mode (e.g. with cargo test), so it is left out of the regular build, and the #[test] functions inside it are what cargo test runs.
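A minimal, self-contained example of that layout. The is_void_element function is a hypothetical stand-in for real parser logic; the point is the #[cfg(test)] module living next to the code it tests.

```rust
/// Hypothetical helper standing in for real parser logic.
pub fn is_void_element(name: &str) -> bool {
    matches!(
        name.to_ascii_lowercase().as_str(),
        "br" | "hr" | "img" | "input" | "link" | "meta"
    )
}

// Compiled only when the crate is built in test mode (e.g. by `cargo test`);
// a normal `cargo build` leaves this module out entirely.
#[cfg(test)]
mod tests {
    // Bring the parent module's items (including private ones) into scope.
    use super::*;

    // `#[test]` registers this function with the harness that `cargo test` runs.
    #[test]
    fn recognizes_void_elements() {
        assert!(is_void_element("br"));
        assert!(is_void_element("IMG"));
        assert!(!is_void_element("div"));
    }
}
```

Because the test module is a child of the implementation module, `use super::*` also gives it access to private functions, which is one reason this in-file layout is the convention for unit tests.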

@JuroOravec marked this pull request as ready for review on January 24, 2025 09:54

@EmilStenstrom left a comment:

Looks good to me! Impressed by this whole thing, well done! :)

@JuroOravec merged commit 42e41c5 into main on Jan 24, 2025
12 checks passed
@JuroOravec deleted the jo-feat-html-parser branch on January 24, 2025 22:02
@JuroOravec (Contributor Author):

@EmilStenstrom It lives!

Lol, but it's crazy: maturin builds 103 different platform releases. See here.

And here's the pipeline if interested.

Btw, I've used my PyPI account to upload it, and I've set the PYPI_API_TOKEN secret for this repo.

@EmilStenstrom:

I’ve requested a PyPI org that we can share access to and will get back to you once we're accepted. Let’s move both projects there.
