diff --git a/README.md b/README.md
index 21aa907..52fe84c 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,34 @@
 
 Bindings for lexborisov's [myhtml](https://github.com/lexborisov/myhtml).
 
+## Thoughts
+
+I need a fast HTML-parsing library in Erlang/Elixir.
+So falling back to C, and to myhtml in particular, is a natural move.
+
+But Erlang interoperability is a tricky minefield.
+This increase in parsing speed does not come for free.
+
+The current implementation can be considered a proof of concept.
+The myhtml code is called as a dirty NIF and executed **inside the Erlang VM**.
+This completely gives up the safety of the Erlang VM. I am not saying that myhtml is unsafe, but
+the slightest segfault brings down the whole Erlang VM.
+So I consider this mode of operation unsafe and **not recommended for production use**.
+
+The other option, which is on my roadmap, is to call into a C-Node:
+a separate OS process that receives calls from Erlang and sends the results back to the calling process.
+
+So, to recap, I want a **fast** and **safe** HTML-parsing library for Erlang/Elixir.
+
+Not quite there yet.
+
 ## Status
 
 Currently under development.
+* [x] Parse an HTML document into a tree
+* [ ] Expose node-retrieval functions
+* [ ] Investigate safety and calling options
+  * [x] Call as dirty NIF
+  * [ ] Call as C-Node
+
 
diff --git a/src/myhtmlex.c b/src/myhtmlex.c
index ea0ee14..fef2a8c 100644
--- a/src/myhtmlex.c
+++ b/src/myhtmlex.c
@@ -354,9 +354,9 @@ unload(ErlNifEnv *env, void *priv)
 
 static ErlNifFunc funcs[] =
 {
-  {"decode", 1, nif_decode},
-  {"open", 1, nif_open},
-  {"decode_tree", 1, nif_decode_tree}
+  {"decode", 1, nif_decode, ERL_NIF_DIRTY_JOB_CPU_BOUND},
+  {"open", 1, nif_open, ERL_NIF_DIRTY_JOB_CPU_BOUND},
+  {"decode_tree", 1, nif_decode_tree, ERL_NIF_DIRTY_JOB_CPU_BOUND}
 };
 
 ERL_NIF_INIT(Elixir.Myhtmlex.Decoder, funcs, &load, &reload, &upgrade, &unload)
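The README's case for a C-Node rests on process isolation. A dirty NIF still runs inside the Erlang VM's own OS process, so a segfault in myhtml kills the whole VM, whereas a crash in a separate OS process does not. The C sketch below is *not* a C-Node: it omits the Erlang distribution protocol that a real C-Node speaks through the `ei` library. It only illustrates the isolation property, assuming a POSIX system. `run_isolated`, `crashing_job` and `healthy_job` are hypothetical names chosen for this illustration. The risky work runs in a forked child, and the parent survives and simply observes how the child ended.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a buggy native parser: it dies with
 * SIGSEGV, just as an invalid memory access would. */
static void crashing_job(void)
{
    raise(SIGSEGV);
}

/* Hypothetical stand-in for a parse that completes normally. */
static void healthy_job(void)
{
}

/* Runs `job` in a forked child process so that a crash cannot take
 * down the caller. Returns 0 if the child exited cleanly, the number
 * of the signal that killed it (e.g. SIGSEGV), or -1 on any other
 * outcome (fork/waitpid failure or a non-zero exit status). */
static int run_isolated(void (*job)(void))
{
    assert(job != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;                    /* could not create the child */

    if (pid == 0) {
        /* Child: restore the default crash behaviour in case the parent
         * installed its own SIGSEGV handler, then do the risky work. */
        signal(SIGSEGV, SIG_DFL);
        job();
        _exit(EXIT_SUCCESS);          /* skip the parent's atexit handlers */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    if (WIFSIGNALED(status))
        return WTERMSIG(status);      /* only the child died */
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;
    return -1;
}
```

On Linux or macOS, `run_isolated(crashing_job)` should return `SIGSEGV` while the calling process keeps running, and `run_isolated(healthy_job)` should return `0`. The trade-off the README hints at shows up here too: a real C-Node would presumably pay for this safety by encoding the input and the resulting tree as messages across the process boundary, instead of building terms directly in the VM as the dirty NIF does.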