PyJASS
JASS uses SWIG to generate bindings for languages other than C++. The bindings have been tested for Python, which was the first language requested.
There are two ways to install pyJASS automatically: with Anaconda or with pip.
Anaconda ensures that all of JASS's dependencies are met, both non-Python (CMake, doxygen, SWIG) and Python (wheels), so that pyJASS just works. To install via Anaconda, first clone the git repo and then run the following commands:
conda env create -f env.yml
conda activate pyjass
Alternatively, you can install via pip. However, pip does not install non-Python dependencies (such as CMake and SWIG) for you, and it is your responsibility to ensure that they are installed. The install does check and inform you if any of the dependencies are not met:
pip install --user pyjass
Compile the build; it should create pyjass.py and _pyjass.so. Both of these files are needed for the Python interface. Then run python3 setup.py install, and setuptools will perform the installation by moving the required files to the Python packages directory.
You can then start Python and import pyjass thus:
python3
import pyjass
Create a JASS anytime object, load an index, and call methods on that object:
index = pyjass.anytime()
index.load_index(2)
index.get_document_count()
Then the following API methods are available (with prototypes given in C++):
JASS_ERROR load_index(size_t index_version);
JASS_ERROR load_index(size_t index_version, bool verbose);
uint32_t get_document_count(void);
std::string get_encoding_scheme_name(void);
int32_t get_encoding_scheme_d(void);
JASS_ERROR load_oracle_scores(std::string filename);
JASS_ERROR set_postings_to_process_proportion(double percent);
JASS_ERROR set_postings_to_process_proportion_minimum(double percent);
JASS_ERROR set_postings_to_process_relative(double percent);
JASS_ERROR set_postings_to_process(size_t count);
JASS_ERROR set_postings_to_process_minimum(size_t count);
uint32_t get_postings_to_process(void);
uint32_t get_max_top_k(void);
JASS_ERROR set_top_k(size_t k);
uint32_t get_top_k(void);
JASS_ERROR set_accumulator_width(size_t width);
JASS_ERROR use_ascii_parser(void);
JASS_ERROR use_query_parser(void);
JASS_anytime_result search(const std::string &query);
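Most of the setters above return a JASS_ERROR, so it can be useful to check each return value rather than ignore it. The following is a minimal sketch, not part of pyJASS: `configure` is a hypothetical helper, and it assumes that SWIG hands the enum back to Python as a plain integer, with 0 meaning success as in the JASS_ERROR listing in this document.

```python
JASS_ERROR_OK = 0  # value taken from the JASS_ERROR enum in this document


def configure(index, top_k=10, postings_proportion=None):
    """Apply search settings to an already-loaded index, raising on failure.

    `index` is expected to be a pyjass.anytime() object; any object exposing
    the same method names works. This helper is hypothetical.
    """
    steps = [("set_top_k", top_k)]
    if postings_proportion is not None:
        steps.append(("set_postings_to_process_proportion", postings_proportion))

    for method_name, value in steps:
        status = getattr(index, method_name)(value)
        # Assumption: the returned JASS_ERROR converts to int, and 0 is success.
        if int(status) != JASS_ERROR_OK:
            raise RuntimeError(
                f"{method_name}({value!r}) failed with JASS_ERROR {int(status)}"
            )
    return index
```

With a loaded index this might be called as `configure(index, top_k=20)`. The values are passed to the setters unchanged.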
To search, call the search method:
got = index.search("Something")
A search returns a structure containing the following members (given in C++):
class JASS_anytime_result
{
public:
std::string query_id;
std::string query;
std::string results_list;
size_t postings_processed;
size_t search_time_in_ns;
};
Here results_list is a string in TREC format, with each result separated by a '\n'.
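Because results_list arrives as one string, it usually needs splitting before use. The sketch below assumes the conventional six-column TREC run layout (query id, the literal Q0, document id, rank, score, run tag); this document does not confirm the exact columns JASS emits, so check a real result before relying on it. `TrecResult` and `parse_trec_results` are hypothetical names.

```python
from typing import List, NamedTuple


class TrecResult(NamedTuple):
    query_id: str
    doc_id: str
    rank: int
    score: float
    run_tag: str


def parse_trec_results(results_list: str) -> List[TrecResult]:
    """Split a newline-separated TREC run listing into typed records."""
    parsed = []
    for line in results_list.split("\n"):
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank lines and lines without exactly six columns
        query_id, _q0, doc_id, rank, score, run_tag = fields
        parsed.append(TrecResult(query_id, doc_id, int(rank), float(score), run_tag))
    return parsed
```

For example, `parse_trec_results(got.results_list)` would give a list whose entries expose `doc_id`, `rank` and `score` as attributes.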
JASS_ERROR is an enum:
enum JASS_ERROR
{
JASS_ERROR_NO_INDEX = -1, ///< The index must be loaded before this operation can occur
JASS_ERROR_OK = 0, ///< Completed successfully without error
JASS_ERROR_BAD_INDEX_VERSION, ///< The index version number specified is not supported
JASS_ERROR_FAIL, ///< An exception occurred - probably not caused by JASS (might be a C++ RTL exception)
JASS_ERROR_TOO_MANY_DOCUMENTS,///< Index cannot be loaded because it contains too many documents
JASS_ERROR_TOO_LARGE, ///< top-k is larger than the system-wide maximum top-k value (or the accumulator width is too large)
};
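On the Python side it can help to turn a returned code back into a readable name. The sketch below mirrors the enum above as a Python IntEnum; the numeric values follow from C++ enumeration rules (each unnumbered member is one more than the previous). `JassError` and `describe` are hypothetical names, and the sketch assumes that the SWIG wrapper returns the code as an integer.

```python
from enum import IntEnum


class JassError(IntEnum):
    """Python mirror of the C++ JASS_ERROR enum (hypothetical helper)."""
    NO_INDEX = -1            # the index must be loaded first
    OK = 0                   # completed successfully
    BAD_INDEX_VERSION = 1    # unsupported index version number
    FAIL = 2                 # an exception occurred
    TOO_MANY_DOCUMENTS = 3   # index contains too many documents
    TOO_LARGE = 4            # top-k or accumulator width too large


def describe(status) -> str:
    """Return the enum member name for a status code, or UNKNOWN(n)."""
    try:
        return JassError(int(status)).name
    except ValueError:
        return f"UNKNOWN({int(status)})"
```

A call such as `describe(index.load_index(2))` would then give a name like `OK` or `BAD_INDEX_VERSION` instead of a bare number.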
A full example of a Python program that loads an index (from the current directory), does a search, and prints the results:
import pyjass
index = pyjass.anytime()
ok = index.load_index(2)
print("Compressed using:", index.get_encoding_scheme_name())
print("D-ness:", index.get_encoding_scheme_d())
print("Documents:", index.get_document_count())
print("")
results = index.search("one")
print("ID:", results.query_id)
print("query:", results.query)
print("Postings Processed:", results.postings_processed)
print("Time (ns):", results.search_time_in_ns)
print("Results:")
print(results.results_list)