Commit

REST API branch (microsoft#160)
Code for server that hosts a built index for search via REST API. Tested on Ubuntu. Windows is not well supported.

* Brute force initial commit, still missing some important details in our readme

* Working build, though it's unclear if it continues to work as a service.  I'm going to rebase over main, ensure we have a working build, and start working on some testing.

* WIP: Compilation works but segfault occurs when handling post request

* WIP: Still not working

* Still not working, but now that I have a working debugger I backed out all the debug statements

* Committing a more configurable test scaffold for easier debugging

* Adding some files to attempt to get a windows build working.  It does not work as of this commit.

* fixing the cleanup when running only the client.

* removing commented code

* Moving the reader object

It needs to persist for the life of the webserver not just the initialization.

* fixed unused return warning

* reverting a change that didn't work on windows

* adding a verbose setting to debug a CI build

* testing build without formatting code.

* removing a glob for a nonexistent path

* initializing variable at declaration

* removing debug output from build

* fixing a typo

* apply language formatting to the restapi code

* adding more tests for the web services

The tests are in client python code.

* fixing is_ascending check

* simplifying cmake min requirement

* adding newline to the end of the file.

* removing commented out code

* testing out a versioning issue

* testing an install dependency fix

* Moving const value to the common header file

Also removing some dead comments.

* removing commented out code

* adding boost program options

Also changing the wrapper to take std::string instead of char*.

* adding boost program options to in memory server

* adding boost program args to multiserver

Also fixed an issue with the help interpreter.

* updating to use the current command line

* Updating command line instructions for boost args

* fixing typo in docs

* adding boost program args to the client test

* adding copyright lines

* adding copyright lines

* exposing the distance metrics

* ensure that k < Ls

* testing a push with the python tests

* adding an env variable

* adding flag to build RESTAPI for non-windows

* adding release build flag

* Adding the cpprest-dev dependency to install list

* renaming tests/python/tests to tests/python/restapi

* preparing to add python tests

The tests are not turned on yet.

* removing python test running from the CI build

It was failing because the server could not start. We will add it back at
some point, but for now it will be a developer's responsibility to check.

* I removed one too many things

I still need the cpprest library at link time.

* fixing data_type issue

Co-authored-by: Dax Pryce <daxpryce@microsoft.com>
Co-authored-by: Bryan Tower <brtower@microsoft.com>
3 people authored Dec 23, 2022
1 parent 8a54127 commit 6690b52
Showing 26 changed files with 1,746 additions and 21 deletions.
3 changes: 1 addition & 2 deletions .github/workflows/pr-test.yml
@@ -19,8 +19,7 @@ jobs:
uses: actions/checkout@v2
with:
submodules: true

- - name: Install deps
+ - name: Install dependencies
if: runner.os != 'Windows'
run: |
if [ "${{ matrix.os }}" != "ubuntu-18.04" ]; then
4 changes: 2 additions & 2 deletions .github/workflows/push-test.yml
@@ -8,10 +8,10 @@ jobs:
uses: actions/checkout@v2
- name: Install deps
run: |
- sudo apt install cmake g++ libaio-dev libgoogle-perftools-dev libunwind-dev clang-format libboost-dev libboost-program-options-dev libmkl-full-dev
+ sudo apt install cmake g++ libaio-dev libgoogle-perftools-dev libunwind-dev clang-format libboost-dev libboost-program-options-dev libmkl-full-dev libcpprest-dev
- name: Clang Format Check
run: |
- mkdir build && cd build && cmake ..
+ mkdir build && cd build && cmake -DRESTAPI=True -DCMAKE_BUILD_TYPE=Release ..
make checkformat
- name: build
run: |
4 changes: 4 additions & 0 deletions .gitignore
@@ -357,3 +357,7 @@ MigrationBackup/
cscope*

build/
+ .idea/
+ cmake-build-debug/
+
+ tests/python/venv
16 changes: 15 additions & 1 deletion CMakeLists.txt
@@ -39,6 +39,11 @@ if (MSVC)
message(STATUS "Invoking nuget to download Boost, OpenMP and MKL dependencies...")
configure_file(${PROJECT_SOURCE_DIR}/windows/packages.config.in ${DISKANN_MSVC_PACKAGES_CONFIG})
exec_program(${NUGET_EXE} ARGS install \"${DISKANN_MSVC_PACKAGES_CONFIG}\" -ExcludeVersion -OutputDirectory \"${DISKANN_MSVC_PACKAGES}\")
+ if (RESTAPI)
+ set(DISKANN_MSVC_RESTAPI_PACKAGES_CONFIG ${CMAKE_BINARY_DIR}/restapi/packages.config)
+ configure_file(${PROJECT_SOURCE_DIR}/windows/packages_restapi.config.in ${DISKANN_MSVC_RESTAPI_PACKAGES_CONFIG})
+ exec_program(${NUGET_EXE} ARGS install \"${DISKANN_MSVC_RESTAPI_PACKAGES_CONFIG}\" -ExcludeVersion -OutputDirectory \"${DISKANN_MSVC_PACKAGES}\")
+ endif()
message(STATUS "Finished setting up nuget dependencies")
endif()

@@ -243,5 +248,14 @@ if (MSVC)
"msbuild.exe ${PROJECT_NAME}.sln /m /nologo /t:Build /p:Configuration=\"Release\" /property:Platform=\"x64\"\n")
endif()

- include(clang-format.cmake)
+ if (RESTAPI)
+ if (MSVC)
+ set(DISKANN_CPPRESTSDK "${DISKANN_MSVC_PACKAGES}/cpprestsdk.v142/build/native")
+ # expected path for apt packaged intel mkl installs
+ link_libraries("${DISKANN_CPPRESTSDK}/x64/lib/cpprest142_2_10.lib")
+ include_directories("${DISKANN_CPPRESTSDK}/include")
+ endif()
+ add_subdirectory(tests/restapi)
+ endif()

+ include(clang-format.cmake)
19 changes: 19 additions & 0 deletions include/restapi/common.h
@@ -0,0 +1,19 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT license.

#pragma once

#include <cpprest/base_uri.h>
#include <restapi/search_wrapper.h>

namespace diskann {
// Constants
static const std::string VECTOR_KEY = "query", K_KEY = "k",
INDICES_KEY = "indices", DISTANCES_KEY = "distances",
TAGS_KEY = "tags", QUERY_ID_KEY = "query_id",
ERROR_MESSAGE_KEY = "error", L_KEY = "Ls",
TIME_TAKEN_KEY = "time_taken_in_us",
PARTITION_KEY = "partition", UNKNOWN_ERROR = "unknown_error";
const unsigned int DEFAULT_L = 100;

} // namespace diskann
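
The constants above name the JSON fields of a request ("k", "Ls") and give a fallback search-list size (DEFAULT_L). A minimal sketch of how a handler might resolve those two fields follows. It assumes the body has already been parsed into a flat map of integer fields, and it follows the commit message's wording ("ensure that k < Ls") by rejecting a request where k is not strictly below Ls. The helper resolve_params and the SearchParams struct are hypothetical and are not part of this commit; whether the real server rejects or adjusts such a request is not visible in this part of the diff.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Key names and default value copied from include/restapi/common.h.
static const std::string K_KEY = "k", L_KEY = "Ls";
const unsigned int DEFAULT_L = 100;

// Hypothetical container for the two resolved search parameters.
struct SearchParams {
  unsigned int k;
  unsigned int Ls;
};

// Hypothetical helper (an assumption, not code from the commit): reads "k"
// and "Ls" from an already-parsed flat map, falls back to DEFAULT_L when
// "Ls" is absent, and rejects requests where k is not strictly below Ls.
SearchParams resolve_params(
    const std::map<std::string, unsigned int>& fields) {
  auto k_it = fields.find(K_KEY);
  if (k_it == fields.end() || k_it->second == 0) {
    throw std::invalid_argument("missing or zero '" + K_KEY + "'");
  }
  auto l_it = fields.find(L_KEY);
  const unsigned int Ls = (l_it == fields.end()) ? DEFAULT_L : l_it->second;
  if (k_it->second >= Ls) {
    throw std::invalid_argument("'" + K_KEY + "' must be smaller than '" +
                                L_KEY + "'");
  }
  return SearchParams{k_it->second, Ls};
}
```

Rejecting with an exception keeps the sketch short; a handler could equally map the failure onto the "error" field defined by ERROR_MESSAGE_KEY.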
138 changes: 138 additions & 0 deletions include/restapi/search_wrapper.h
@@ -0,0 +1,138 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT license.

#pragma once

#include <string>
#include <vector>
#include <stdexcept>

#include <index.h>
#include <pq_flash_index.h>

namespace diskann {
class SearchResult {
public:
SearchResult(unsigned int K, unsigned int elapsed_time_in_ms,
const unsigned* const indices, const float* const distances,
const std::string* const tags = nullptr,
const unsigned* const partitions = nullptr);

const std::vector<unsigned int>& get_indices() const {
return _indices;
}
const std::vector<float>& get_distances() const {
return _distances;
}
bool tags_enabled() const {
return _tags_enabled;
}
const std::vector<std::string>& get_tags() const {
return _tags;
}
bool partitions_enabled() const {
return _partitions_enabled;
}
const std::vector<unsigned>& get_partitions() const {
return _partitions;
}
unsigned get_time() const {
return _search_time_in_ms;
}

private:
unsigned int _K;
unsigned int _search_time_in_ms;
std::vector<unsigned int> _indices;
std::vector<float> _distances;

bool _tags_enabled;
std::vector<std::string> _tags;

bool _partitions_enabled;
std::vector<unsigned> _partitions;
};

class SearchNotImplementedException : public std::logic_error {
private:
std::string _errormsg;

public:
SearchNotImplementedException(const char* type)
: std::logic_error("Not Implemented") {
_errormsg = "Search with data type ";
_errormsg += std::string(type);
_errormsg += " not implemented : ";
_errormsg += __FUNCTION__;
}

virtual const char* what() const throw() {
return _errormsg.c_str();
}
};

class BaseSearch {
public:
BaseSearch(const std::string& tagsFile = nullptr);
virtual SearchResult search(const float* query,
const unsigned int dimensions,
const unsigned int K, const unsigned int Ls) {
throw SearchNotImplementedException("float");
}
virtual SearchResult search(const int8_t* query,
const unsigned int dimensions,
const unsigned int K, const unsigned int Ls) {
throw SearchNotImplementedException("int8_t");
}

virtual SearchResult search(const uint8_t* query,
const unsigned int dimensions,
const unsigned int K, const unsigned int Ls) {
throw SearchNotImplementedException("uint8_t");
}

void lookup_tags(const unsigned K, const unsigned* indices,
std::string* ret_tags);

protected:
bool _tags_enabled;
std::vector<std::string> _tags_str;
};

template<typename T>
class InMemorySearch : public BaseSearch {
public:
InMemorySearch(
const std::string& baseFile,
const std::string& indexFile,
const std::string& tagsFile,
Metric m,
uint32_t num_threads,
uint32_t search_l
);
virtual ~InMemorySearch();

SearchResult search(const T* query, const unsigned int dimensions,
const unsigned int K, const unsigned int Ls);

private:
unsigned int _dimensions, _numPoints;
std::unique_ptr<diskann::Index<T>> _index;
};

template<typename T>
class PQFlashSearch : public BaseSearch {
public:
PQFlashSearch(const std::string & indexPrefix, const unsigned num_nodes_to_cache,
const unsigned num_threads, const std::string& tagsFile, Metric m);
virtual ~PQFlashSearch();

SearchResult search(const T* query, const unsigned int dimensions,
const unsigned int K, const unsigned int Ls);

private:
unsigned int _dimensions, _numPoints;
std::unique_ptr<diskann::PQFlashIndex<T>> _index;
std::shared_ptr<AlignedFileReader> reader;
};
} // namespace diskann
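
The header above uses one virtual search overload per element type in BaseSearch, each throwing by default, and lets each templated subclass override only the overload that matches its T. The pattern can be sketched in isolation as below. All names (MiniBaseSearch, MiniTypedSearch, NotImplemented, is_implemented) are hypothetical stand-ins, and the search body is a placeholder that just returns the dimension count.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for SearchNotImplementedException.
struct NotImplemented : std::logic_error {
  explicit NotImplemented(const std::string& type)
      : std::logic_error("Search with data type " + type +
                         " not implemented") {}
};

// Hypothetical stand-in for BaseSearch: every overload throws by default.
class MiniBaseSearch {
 public:
  virtual ~MiniBaseSearch() = default;
  virtual unsigned search(const float*, unsigned) {
    throw NotImplemented("float");
  }
  virtual unsigned search(const int8_t*, unsigned) {
    throw NotImplemented("int8_t");
  }
  virtual unsigned search(const uint8_t*, unsigned) {
    throw NotImplemented("uint8_t");
  }
};

// Hypothetical stand-in for InMemorySearch<T> / PQFlashSearch<T>.
template <typename T>
class MiniTypedSearch : public MiniBaseSearch {
 public:
  // Keep the base overloads visible; the override below would otherwise
  // hide them when calling through the derived type.
  using MiniBaseSearch::search;
  // Overrides only the overload whose pointer type matches T.
  unsigned search(const T*, unsigned dimensions) override {
    return dimensions;  // placeholder result
  }
};

// Probes whether a searcher implements the overload for element type Q.
template <typename Q>
bool is_implemented(MiniBaseSearch& searcher) {
  Q query[1] = {};
  try {
    searcher.search(query, 1u);
    return true;
  } catch (const NotImplemented&) {
    return false;
  }
}
```

This lets a server hold searchers of different element types behind one base pointer, at the cost of a runtime error rather than a compile-time one when the types do not match. One thing worth checking in the header above: `const std::string& tagsFile = nullptr` would construct a std::string from a null pointer if the default were ever used, which is undefined behavior; an empty-string default is a common alternative.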
48 changes: 48 additions & 0 deletions include/restapi/server.h
@@ -0,0 +1,48 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT license.

#pragma once

#include <restapi/common.h>
#include <cpprest/http_listener.h>

namespace diskann {
class Server {
public:
Server(web::uri& url,
std::vector<std::unique_ptr<diskann::BaseSearch>>& multi_searcher,
const std::string& typestring);
virtual ~Server();

pplx::task<void> open();
pplx::task<void> close();

protected:
template<class T>
void handle_post(web::http::http_request message);

template<typename T>
web::json::value toJsonArray(
const std::vector<T>& v,
std::function<web::json::value(const T&)> valConverter);
web::json::value prepareResponse(const int64_t& queryId, const int k);

template<class T>
void parseJson(const utility::string_t& body, int& k, int64_t& queryId,
T*& queryVector, unsigned int& dimensions, unsigned& Ls);

web::json::value idsToJsonArray(const diskann::SearchResult& result);
web::json::value distancesToJsonArray(const diskann::SearchResult& result);
web::json::value tagsToJsonArray(const diskann::SearchResult& result);
web::json::value partitionsToJsonArray(const diskann::SearchResult& result);

SearchResult aggregate_results(
const unsigned K, const std::vector<diskann::SearchResult>& results);

private:
bool _isDebug;
std::unique_ptr<web::http::experimental::listener::http_listener> _listener;
const bool _multi_search;
std::vector<std::unique_ptr<diskann::BaseSearch>> _multi_searcher;
};
} // namespace diskann
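
Server::aggregate_results is declared above, but its body is not part of this section of the diff. A plausible reading, given that the server holds several searchers and each SearchResult can carry partition ids, is a top-K merge across partitions. The sketch below shows that technique under stated assumptions: smaller distance means a better match, and hits are flattened into a hypothetical Hit struct instead of the parallel vectors that SearchResult stores. The function name merge_top_k is likewise hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flattened hit; the real SearchResult keeps parallel vectors
// of indices, distances, tags and partitions.
struct Hit {
  unsigned id;
  float distance;
  unsigned partition;
};

// Assumed behavior (not confirmed by the diff): pool the hits from every
// partition and keep the K with the smallest distance, in ascending order.
std::vector<Hit> merge_top_k(
    const std::vector<std::vector<Hit>>& per_partition, std::size_t K) {
  std::vector<Hit> all;
  for (const auto& hits : per_partition) {
    all.insert(all.end(), hits.begin(), hits.end());
  }
  const std::size_t keep = std::min(K, all.size());
  // partial_sort orders only the first `keep` elements, which is all we need.
  std::partial_sort(
      all.begin(), all.begin() + keep, all.end(),
      [](const Hit& a, const Hit& b) { return a.distance < b.distance; });
  all.resize(keep);
  return all;
}
```

For a metric where larger scores are better, the comparator would need to be flipped; the commit message's "fixing is_ascending check" suggests the real code handles ordering explicitly, though the details are not shown here.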
10 changes: 4 additions & 6 deletions include/utils.h
@@ -596,12 +596,10 @@ namespace diskann {
#else
strerror_r(errno, buff, 1024);
#endif
- diskann::cerr << std::string("Failed to open file") + filename +
- " for write because " + buff
- << std::endl;
- throw diskann::ANNException(std::string("Failed to open file ") +
- filename + " for write because: " + buff,
- -1);
+ std::string error_message = std::string("Failed to open file") + filename +
+ " for write because " + buff;
+ diskann::cerr << error_message << std::endl;
+ throw diskann::ANNException(error_message, -1);
}
}

5 changes: 4 additions & 1 deletion src/CMakeLists.txt
@@ -11,7 +11,10 @@ else()
linux_aligned_file_reader.cpp math_utils.cpp natural_number_map.cpp
natural_number_set.cpp memory_mapper.cpp partition.cpp pq.cpp
pq_flash_index.cpp scratch.cpp logger.cpp utils.cpp)
+ if (RESTAPI)
+ list(APPEND CPP_SOURCES restapi/search_wrapper.cpp restapi/server.cpp)
+ endif()
add_library(${PROJECT_NAME} ${CPP_SOURCES})
add_library(${PROJECT_NAME}_s STATIC ${CPP_SOURCES})
endif()
- install()
+ install()
4 changes: 0 additions & 4 deletions src/linux_aligned_file_reader.cpp
@@ -196,9 +196,5 @@ void LinuxAlignedFileReader::read(std::vector<AlignedRead> &read_reqs,
diskann::cout << "Async currently not supported in linux." << std::endl;
}
assert(this->file_desc != -1);
- //#pragma omp critical
- // std::cout << "thread: " << std::this_thread::get_id() << ", crtx: " <<
- // ctx
- //<< "\n";
execute_io(ctx, this->file_desc, read_reqs);
}
10 changes: 5 additions & 5 deletions src/pq_flash_index.cpp
@@ -57,14 +57,14 @@ namespace diskann {

template<typename T>
PQFlashIndex<T>::PQFlashIndex(std::shared_ptr<AlignedFileReader> &fileReader,
- diskann::Metric m)
+ diskann::Metric m)
: reader(fileReader), metric(m) {
if (m == diskann::Metric::COSINE || m == diskann::Metric::INNER_PRODUCT) {
if (std::is_floating_point<T>::value) {
diskann::cout << "Cosine metric chosen for (normalized) float data."
"Changing distance to L2 to boost accuracy."
<< std::endl;
- m = diskann::Metric::L2;
+ metric = diskann::Metric::L2;
} else {
diskann::cerr << "WARNING: Cannot normalize integral data types."
<< " This may result in erroneous results or poor recall."
@@ -73,8 +73,8 @@ namespace diskann {
}
}

- this->dist_cmp.reset(diskann::get_distance_function<T>(m));
- this->dist_cmp_float.reset(diskann::get_distance_function<float>(m));
+ this->dist_cmp.reset(diskann::get_distance_function<T>(metric));
+ this->dist_cmp_float.reset(diskann::get_distance_function<float>(metric));
}

template<typename T>
@@ -466,7 +466,7 @@ namespace diskann {
this->disk_index_file = disk_index_file;

if (pq_file_num_centroids != 256) {
- diskann::cout << "Error. Number of PQ centroids is not 256. Exitting."
+ diskann::cout << "Error. Number of PQ centroids is not 256. Exiting."
<< std::endl;
return -1;
}
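
The PQFlashIndex constructor change earlier in this file swaps writes and reads of the parameter m for the member metric. Before the change, the member was initialised from m and only the local parameter was then switched to L2, so the stored metric and the metric passed to get_distance_function could disagree. A minimal sketch of that difference, using hypothetical stand-in types (BeforeFix, AfterFix, and a local Metric enum) and mirroring only the floating-point branch:

```cpp
#include <cassert>

// Hypothetical stand-in for diskann::Metric.
enum class Metric { L2, COSINE, INNER_PRODUCT };

// Mirrors the pre-change constructor: the member is set in the initializer
// list, then only the local parameter is reassigned.
struct BeforeFix {
  Metric metric;
  Metric used_for_distance;
  explicit BeforeFix(Metric m) : metric(m) {
    if (m == Metric::COSINE || m == Metric::INNER_PRODUCT) {
      m = Metric::L2;  // changes the parameter, not the member
    }
    used_for_distance = m;
  }
};

// Mirrors the post-change constructor: the member itself is updated and
// then used to pick the distance function.
struct AfterFix {
  Metric metric;
  Metric used_for_distance;
  explicit AfterFix(Metric m) : metric(m) {
    if (m == Metric::COSINE || m == Metric::INNER_PRODUCT) {
      metric = Metric::L2;  // changes the member
    }
    used_for_distance = metric;
  }
};
```

With COSINE as input, BeforeFix ends up with metric == COSINE but an L2 distance function, while AfterFix reports L2 in both places. Any later code that reads the member presumably benefits from the two being consistent.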