Description
Describe the bug
The blingfire sentence tokenizer is only available in Python right now, but there is a "quite easy" option to bring it to TypeScript via WASM.
To have this written down somewhere, here are the steps I followed to get it running.
- clone blingfire repo
- follow https://github.com/microsoft/BlingFire/blob/master/wasm/readme.md
- change the Makefile to run:
em++ ../blingfiretools/blingfiretokdll/blingfiretokdll.cpp ../blingfiretools/blingfiretokdll/*.cxx ../blingfireclient.library/src/*.cpp -s WASM=1 -s EXPORTED_FUNCTIONS="[_GetBlingFireTokVersion, _TextToSentences, _TextToWords, _TextToIds, _SetModel, _FreeModel, _WordHyphenationWithModel, _malloc, _free]" -s "EXPORTED_RUNTIME_METHODS=['lengthBytesUTF8', 'stackAlloc', 'stringToUTF8', 'UTF8ToString', 'cwrap']" -s ALLOW_MEMORY_GROWTH=1 -s DISABLE_EXCEPTION_CATCHING=0 -I ../blingfireclient.library/inc/ -I ../blingfirecompile.library/inc/ -DHAVE_ICONV_LIB -DHAVE_NO_SPECSTRINGS -D_VERBOSE -DBLING_FIRE_NOAP -DBLING_FIRE_NOWINDOWS -DNDEBUG -O3 -s MODULARIZE=1 -s EXPORT_ES6 --std=c++11 -o blingfire.js
(this adds -s MODULARIZE=1 and -s EXPORT_ES6, and fixes the malloc/free exports)
- copy blingfire.js + blingfire.wasm to livekit :)
- get blingfire_wrapper.js and adapt how it loads the module:
import createModule from './blingfire.js';
const Module = await createModule()
- use the module wrapper:
import { TextToSentences } from './blingfire_wrapper.js';
console.log('TextToSentences', TextToSentences('This is a sentence. And another one.'));
Relevant log output
No response
Describe your environment
linux
Minimal reproducible example
No response
Additional information
No response
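For anyone who wants to see what the wrapper step boils down to, here is a rough sketch of calling the exported `_TextToSentences` directly through the Emscripten module. It is not the code from blingfire_wrapper.js. The `BlingFireModule` interface and the `textToSentences` helper are hypothetical names, the output buffer size (twice the input plus one) is an assumed upper bound, and it assumes the sentences come back as one newline-separated UTF-8 string.

```typescript
// Hypothetical minimal view of the module returned by createModule().
// Member names follow the EXPORTED_FUNCTIONS / EXPORTED_RUNTIME_METHODS
// flags in the build command above.
interface BlingFireModule {
  lengthBytesUTF8(s: string): number;
  stringToUTF8(s: string, ptr: number, maxBytes: number): void;
  UTF8ToString(ptr: number): string;
  _malloc(size: number): number;
  _free(ptr: number): void;
  _TextToSentences(inPtr: number, inLen: number, outPtr: number, outCap: number): number;
}

// Returns the sentences found in `text`, or null for empty input or
// when the tokenizer reports a failure.
function textToSentences(mod: BlingFireModule, text: string): string[] | null {
  const len = mod.lengthBytesUTF8(text);
  if (len === 0) return null;

  const inPtr = mod._malloc(len + 1); // +1 for the trailing NUL byte
  const outCap = len * 2 + 1;         // assumed upper bound for the output
  const outPtr = mod._malloc(outCap);
  try {
    mod.stringToUTF8(text, inPtr, len + 1);
    const written = mod._TextToSentences(inPtr, len, outPtr, outCap);
    if (written < 0 || written > outCap) return null;
    // Assumption: one sentence per line in the returned string.
    return mod.UTF8ToString(outPtr).split('\n').filter((s) => s.length > 0);
  } finally {
    // Always hand both buffers back to the WASM heap.
    mod._free(inPtr);
    mod._free(outPtr);
  }
}
```

With the real build this would be used as `textToSentences(await createModule(), 'This is a sentence. And another one.')`. Passing the module in as an argument rather than importing it at the top keeps the helper easy to test with a stub.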