feat: add HTML parser initial implementation #1
@@ -0,0 +1,179 @@
# This file is autogenerated by maturin v1.8.1
This is the CI job for publishing the package, as provided by maturin. I have yet to see if this will work. It requires `PYPI_API_TOKEN` to be set as a secret.
@EmilStenstrom Regarding the `PYPI_API_TOKEN`, the question remains how to manage this package on PyPI, e.g. whether to release it under my account or yours. Either way, this package should have a separate `PYPI_API_TOKEN` from the one we use for django-components, for extra security.
## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at emil@emilstenstrom.se. All
I copied the project meta files, like the code of conduct, so this one contains your contact info @EmilStenstrom. Let me know if that's OK, or whether I should use mine or someone else's, or remove it.
That's OK.
@@ -0,0 +1,96 @@
[build-system]
requires = ["maturin>=1.8,<2.0"]
build-backend = "maturin"
This is a regular `pyproject.toml` (based on the one for django-components). One key difference, though, is that the build system is set to maturin.
keywords = ["django", "components", "html"]
readme = "README.md"
authors = [
    {name = "Juro Oravec", email = "juraj.oravec.josefson@gmail.com"},
I've listed myself and my email as the author.
src/lib.rs
Outdated
/// A Python module implemented in Rust for high-performance HTML transformation.
#[pymodule]
fn djc_core_html_parser(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(html_parser::transform_html, m)?)?;
This is how, using pyo3, we register a function within a Python module.
This means that when one imports the package in Python, they can access `set_html_attributes` as:
from djc_core_html_parser import set_html_attributes
@@ -0,0 +1,34 @@
from typing import List, Dict, Optional

def set_html_attributes(
I don't know if this `__init__.pyi` is also necessary for the published package, but at least locally, the Python language server was not picking up the generated Python module in the test file. So I added this stub so that the test file is correctly typed.
@@ -0,0 +1,148 @@
# This same set of tests is also found in django-components, to ensure that
This package includes both Rust and Python tests. The Rust tests are for development and for making sure things work as expected, while these Python tests ensure that the generated Python module works as expected too.
#[pyo3(
    text_signature = "(html, root_attributes, all_attributes, *, check_end_names=False, watch_on_attribute=None)"
)]
pub fn set_html_attributes(
This is the entrypoint. I renamed it to `set_html_attributes`, so it shares the same name as the entrypoint for the pure Python implementation.
}

/// Add attributes to a HTML start tag (e.g. `<div>`) based on the configuration
fn add_attributes(
This function is the equivalent of the `on_tag` callback defined in the pure Python implementation.
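To make the idea of adding attributes to a start tag concrete, here is a minimal sketch using plain string handling. The name `add_attributes_sketch` and its signature are hypothetical and are not the PR's actual `add_attributes`; the sketch assumes the input is a single well-formed start tag and that the attributes are valueless names.

```rust
/// Hypothetical helper (not the PR's `add_attributes`): append valueless
/// attributes to a single start tag such as `<div>` or `<br />`.
/// Assumes `tag` is a well-formed start tag ending in `>` or `/>`.
fn add_attributes_sketch(tag: &str, attrs: &[&str]) -> String {
    // Split off the closing delimiter so attributes can be inserted before it.
    let (body, close) = if let Some(stripped) = tag.strip_suffix("/>") {
        (stripped.trim_end(), " />")
    } else if let Some(stripped) = tag.strip_suffix('>') {
        (stripped, ">")
    } else {
        // Not a start tag this sketch understands; return it unchanged.
        return tag.to_string();
    };

    let mut out = String::from(body);
    for attr in attrs {
        out.push(' ');
        out.push_str(attr);
    }
    out.push_str(close);
    out
}
```

For example, `add_attributes_sketch("<div>", &["data-a"])` yields `<div data-a>`. A real implementation would also have to deal with existing attributes, quoting, and malformed input, which this sketch ignores.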
let mut depth: i32 = 0;

// Read the HTML event by event
loop {
And here is where we process the HTML as a stream of tokens ("events")
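To illustrate processing HTML as a stream of events while tracking nesting depth, here is a minimal sketch. The `Event` enum and the `root_tag_names` function are hypothetical stand-ins, not the parser or event types the PR uses, and the events are supplied by hand rather than produced by a real tokenizer. It uses a `for` loop over a slice instead of the `loop` in the diff.

```rust
/// Hypothetical event type; a real tokenizer would produce these from HTML text.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Start(String), // opening tag, e.g. `<div>`
    End(String),   // closing tag, e.g. `</div>`
    Empty(String), // self-closing tag, e.g. `<br />`
    Text(String),  // text between tags
}

/// Walk the events once and return the names of top-level (root) elements,
/// i.e. tags encountered while `depth` is 0.
fn root_tag_names(events: &[Event]) -> Vec<String> {
    let mut depth: i32 = 0;
    let mut roots = Vec::new();

    for event in events {
        match event {
            Event::Start(name) => {
                if depth == 0 {
                    roots.push(name.clone());
                }
                depth += 1;
            }
            Event::End(_) => depth -= 1,
            Event::Empty(name) => {
                // Self-closing tags do not change the depth.
                if depth == 0 {
                    roots.push(name.clone());
                }
            }
            Event::Text(_) => {}
        }
    }
    roots
}
```

Knowing the depth at each start tag is what would let a transformer treat root elements differently from nested ones, which is presumably why the function keeps separate `root_attributes` and `all_attributes` parameters.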
}

#[cfg(test)]
mod tests {
And in Rust it's standard to write the tests in the same file as the implementation. The `#[cfg(test)]` attribute is a conditional-compilation flag that tells the Rust compiler that the contents of `tests` should 1. be excluded from regular builds, and 2. be compiled only when running `cargo test`, which then runs the functions marked `#[test]`.
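A minimal sketch of this layout, with a made-up function that is unrelated to the PR's code: the implementation and its tests live in one file, and the `tests` module is only compiled in test mode.

```rust
/// Count how many times `needle` occurs in `haystack`.
/// (Hypothetical example function, unrelated to the PR's code.)
fn count_occurrences(haystack: &str, needle: &str) -> usize {
    haystack.matches(needle).count()
}

// Compiled only when building in test mode (e.g. `cargo test`);
// regular builds such as `cargo build` skip this module entirely.
#[cfg(test)]
mod tests {
    use super::*;

    // `#[test]` marks a function for the test runner to execute.
    #[test]
    fn counts_start_tags() {
        assert_eq!(count_occurrences("<div><div></div></div>", "<div>"), 2);
    }

    #[test]
    fn empty_input_has_no_matches() {
        assert_eq!(count_occurrences("", "<div>"), 0);
    }
}
```

Because `tests` is a child module, `use super::*;` gives it access to private items such as `count_occurrences`, which is one reason the same-file convention is convenient.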
Looks good to me! Impressed by this whole thing, well done! :)
Lol but it's crazy, maturin builds 103 different platform releases. See here. And here's the pipeline if interested. Btw I've used my PyPI account to upload it, and I've set the
I’ve requested a PyPI org that we can share access to; I'll get back to you when we get accepted. Let’s move both projects there.
Here is the Rust implementation of the HTML parser 🎉