ServiceNow completed its acquisition of Element AI on January 8, 2021. All references to Element AI in the materials that are part of this project should refer to ServiceNow.
A fast classification and tagging tool using byte-level n-gram embeddings
Please read our [paper] for details on byteSteady.
@article{zhang2021bytesteady,
title={byteSteady: Fast Classification Using Byte-Level n-Gram Embeddings},
author={Zhang, Xiang and Drouin, Alexandre and Li, Raymond},
journal={arXiv preprint arXiv:2106.13302},
year={2021}
}
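To make the phrase "byte-level n-gram embeddings" concrete, the following is a minimal sketch of how a byte sequence might be turned into embedding-table indices by hashing each n-gram. It is not taken from the byteSteady source: the function names `Fnv1a` and `NgramBuckets`, the parameters `n` and `num_buckets`, and the use of FNV-1a (chosen here only to keep the example dependency-free; byteSteady itself depends on CityHash) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// FNV-1a over a byte range. This is a stand-in hash for illustration only;
// it is NOT the hash byteSteady uses.
inline std::uint64_t Fnv1a(std::string_view bytes) {
  std::uint64_t hash = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
  for (unsigned char c : bytes) {
    hash ^= c;
    hash *= 1099511628211ULL;  // FNV-1a 64-bit prime
  }
  return hash;
}

// Returns one bucket index per byte-level n-gram of `input`, in order.
// `n` (n-gram length) and `num_buckets` (embedding table size) are
// hypothetical parameters of this sketch.
inline std::vector<std::size_t> NgramBuckets(std::string_view input,
                                             std::size_t n,
                                             std::size_t num_buckets) {
  assert(n > 0 && num_buckets > 0);
  std::vector<std::size_t> buckets;
  if (input.size() < n) return buckets;  // no complete n-gram fits
  buckets.reserve(input.size() - n + 1);
  for (std::size_t i = 0; i + n <= input.size(); ++i) {
    // Slide a window of n bytes and hash it into the table range.
    buckets.push_back(Fnv1a(input.substr(i, n)) % num_buckets);
  }
  return buckets;
}
```

For example, `NgramBuckets("ACGT", 2, 1024)` yields three indices, one each for the byte 2-grams `AC`, `CG`, and `GT`. In a classifier of this kind, the embedding vectors stored at those indices would typically be averaged and passed to a linear model; see the paper for what byteSteady actually does.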
- GNU/Linux (`byteswap.h` from glibc used in CityHash)
- C++17 compiler (`::std::variant`, `::std::filesystem`, etc)
- Thunder (tensor math)
- Google googletest (unit tests)
- Google gflags (command-line option parsing)
- Google glog (logging and error handling)
Make sure all the dependencies are installed, and then simply run `make`.
There will be a few outputs:
- `bytesteady/bytesteady`: the byteSteady executable
- `bytesteady/libbytesteady.so`: the byteSteady dynamic library
- `bytesteady/*_test`: unit tests of different modules for byteSteady
byteSteady is built with Google gflags to support command-line flag parsing. The definition of all available flags can be found in `bytesteady/flags.cpp`. You can also query these flags by
$ bytesteady/bytesteady -helpon bytesteady/flags
The `-helpon` option is provided by Google gflags to show help only for flags defined in a given source file. For full help information, including flags from other parts of the program (such as Google glog), simply use `-help`.
The gene classification dataset used in the paper can be downloaded at https://zenodo.org/record/5181235#.YWc4bG3MJb8.