Phalanx

Phalanx is a cloud-native distributed search engine written in Go, built on top of Bluge, that provides endpoints through gRPC and a traditional RESTful API.
Phalanx handles cluster formation with hashicorp/memberlist and manages index metadata on etcd, so it is easy to bring up a fault-tolerant cluster.
Metrics for system operation can also be exposed in Prometheus exposition format, so monitoring with Prometheus can be set up immediately.
Phalanx uses object storage for the storage layer and is responsible only for the computation layer, such as indexing and retrieval. Therefore, scaling is easy: you can simply add new nodes to the cluster.
Currently, it is an alpha version and only supports MinIO as the storage layer, but in the future it will support Amazon S3, Google Cloud Storage, and Azure Blob Storage.

Build

Build Phalanx as follows:

% git clone https://github.com/mosuka/phalanx.git
% cd phalanx
% make build

Binary

After a successful build, you can find the binary like so:

% ls ./bin
phalanx

Start Phalanx on a local machine using a local file system

Phalanx can be started on a local machine using the local file system as a metastore. The following command starts Phalanx with a local metastore URI:

% ./bin/phalanx --index-metastore-uri=file:///tmp/phalanx/metastore

A metastore is a place where various information about an index is stored.
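
Since the metastore above is backed by the local file system, you can inspect what it holds directly once an index has been created, for example:

% ls -R /tmp/phalanx/metastore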

Create index on local file system

If you started Phalanx with the local file system, you can create an index with the following command:

% curl -XPUT -H 'Content-type: application/json' http://localhost:8000/v1/indexes/wikipedia_en --data-binary @./examples/create_index_wikipedia_en_local.json

In create_index_wikipedia_en_local.json, used in the above command, the URIs of the local file system are specified in index_uri and lock_uri. index_mapping defines what kind of fields the index has, and num_shards specifies how many shards the index will be divided into.
The above command will create an index named wikipedia_en.
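
For reference, a request body has roughly the following shape. The top-level field names come from the description above; the URI values, the simplified index_mapping, and the shard count are illustrative only, so see ./examples/create_index_wikipedia_en_local.json for the actual definition.

{
  "index_uri": "file:///tmp/phalanx/indexes/wikipedia_en",
  "lock_uri": "file:///tmp/phalanx/locks/wikipedia_en",
  "index_mapping": {
    "title": {
      "type": "text"
    }
  },
  "num_shards": 10
}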

Start Phalanx on local machine with MinIO and etcd

To try out more of Phalanx's functionality, let's start it with MinIO and etcd. This repository includes a docker-compose.yml file, which lets you easily launch Phalanx, MinIO, and etcd on Docker.

% docker-compose up

Once the containers have started, you can check the MinIO and etcd data in your browser.
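
You can verify that the Phalanx, MinIO, and etcd containers are all running with:

% docker-compose ps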

Create index with MinIO and etcd

If you started Phalanx with MinIO and etcd, create an index with the following command:

% curl -XPUT -H 'Content-type: application/json' http://localhost:8000/v1/indexes/example_en --data-binary @./examples/create_index_example_en.json

In create_index_example_en.json, used in the above command, index_uri is a MinIO URI and lock_uri is an etcd URI. This means that the index will be created in MinIO, and locks for the index will be created in etcd. Phalanx uses etcd as a distributed lock manager.
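
Conceptually, the request body has the same shape as in the local example, but with a MinIO URI for index_uri and an etcd URI for lock_uri. The scheme names and placeholders below are assumptions for illustration; refer to ./examples/create_index_example_en.json for the exact format.

{
  "index_uri": "minio://<bucket>/example_en",
  "lock_uri": "etcd://<key-prefix>/example_en",
  "index_mapping": {
    "title": {
      "type": "text"
    }
  },
  "num_shards": 6
}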

Health check

These endpoints should be used for Phalanx health checks.

Liveness check

If Phalanx is running properly, it will return HTTP status 200.

% curl -XGET http://localhost:8000/livez | jq .
{
  "state":"alive"
}

Readiness check

If Phalanx is ready to accept traffic, it will return HTTP status 200.

% curl -XGET http://localhost:8000/readyz | jq .
{
  "state":"ready"
}

Note that this endpoint is not yet fully implemented.

Metrics exposition

This endpoint returns Phalanx metrics in Prometheus exposition format.

% curl -XGET http://localhost:8000/metrics
# HELP phalanx_grpc_server_handled_total Total number of RPCs completed on the server, regardless of success or failure.
# TYPE phalanx_grpc_server_handled_total counter
phalanx_grpc_server_handled_total{grpc_code="Aborted",grpc_method="AddDocuments",grpc_service="index.Index",grpc_type="unary"} 0
phalanx_grpc_server_handled_total{grpc_code="Aborted",grpc_method="Cluster",grpc_service="index.Index",grpc_type="unary"} 0
...
phalanx_grpc_server_started_total{grpc_method="Metrics",grpc_service="index.Index",grpc_type="unary"} 1
phalanx_grpc_server_started_total{grpc_method="ReadinessCheck",grpc_service="index.Index",grpc_type="unary"} 0
phalanx_grpc_server_started_total{grpc_method="Search",grpc_service="index.Index",grpc_type="unary"} 0
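
Since the full output is long, you can filter it for a specific metric family, for example:

% curl -XGET http://localhost:8000/metrics | grep phalanx_grpc_server_handled_total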

Cluster status

This endpoint returns the latest cluster status.

  • nodes: Lists the nodes that have joined the cluster.
  • indexes: Lists the indexes served by the cluster.
  • indexer_assignment: Lists which node is responsible for each shard of each index.
  • searcher_assignment: Lists which nodes are responsible for each shard of each index.

% curl -XGET http://localhost:8000/cluster | jq .
{
  "indexer_assignment": {
    "wikipedia_en": {
      "shard-73iAEf8K": "node-duIMwfjn",
      "shard-CRzZVi2b": "node-duIMwfjn",
      "shard-Wh7VO5Lp": "node-duIMwfjn",
      "shard-YazeIhze": "node-duIMwfjn",
      "shard-cXyt4esz": "node-duIMwfjn",
      "shard-hUM3HWQW": "node-duIMwfjn",
      "shard-jH3sTtc7": "node-duIMwfjn",
      "shard-viI2Dm3V": "node-duIMwfjn",
      "shard-y1tMwCEP": "node-duIMwfjn",
      "shard-y7VRCIlU": "node-duIMwfjn"
    }
  },
  "indexes": {
    "wikipedia_en": {
      "index_lock_uri": "",
      "index_uri": "file:///tmp/phalanx/indexes/wikipedia_en",
      "shards": {
        "shard-73iAEf8K": {
          "shard_lock_uri": "",
          "shard_uri": "file:///tmp/phalanx/indexes/wikipedia_en/shard-73iAEf8K"
        },
        "shard-CRzZVi2b": {
          "shard_lock_uri": "",
          "shard_uri": "file:///tmp/phalanx/indexes/wikipedia_en/shard-CRzZVi2b"
        },
        "shard-Wh7VO5Lp": {
          "shard_lock_uri": "",
          "shard_uri": "file:///tmp/phalanx/indexes/wikipedia_en/shard-Wh7VO5Lp"
        },
        "shard-YazeIhze": {
          "shard_lock_uri": "",
          "shard_uri": "file:///tmp/phalanx/indexes/wikipedia_en/shard-YazeIhze"
        },
        "shard-cXyt4esz": {
          "shard_lock_uri": "",
          "shard_uri": "file:///tmp/phalanx/indexes/wikipedia_en/shard-cXyt4esz"
        },
        "shard-hUM3HWQW": {
          "shard_lock_uri": "",
          "shard_uri": "file:///tmp/phalanx/indexes/wikipedia_en/shard-hUM3HWQW"
        },
        "shard-jH3sTtc7": {
          "shard_lock_uri": "",
          "shard_uri": "file:///tmp/phalanx/indexes/wikipedia_en/shard-jH3sTtc7"
        },
        "shard-viI2Dm3V": {
          "shard_lock_uri": "",
          "shard_uri": "file:///tmp/phalanx/indexes/wikipedia_en/shard-viI2Dm3V"
        },
        "shard-y1tMwCEP": {
          "shard_lock_uri": "",
          "shard_uri": "file:///tmp/phalanx/indexes/wikipedia_en/shard-y1tMwCEP"
        },
        "shard-y7VRCIlU": {
          "shard_lock_uri": "",
          "shard_uri": "file:///tmp/phalanx/indexes/wikipedia_en/shard-y7VRCIlU"
        }
      }
    }
  },
  "nodes": {
    "node-duIMwfjn": {
      "addr": "0.0.0.0",
      "meta": {
        "grpc_port": 5000,
        "http_port": 8000,
        "roles": [
          "indexer",
          "searcher"
        ]
      },
      "port": 3000,
      "state": "alive"
    }
  },
  "searcher_assignment": {
    "wikipedia_en": {
      "shard-73iAEf8K": [
        "node-duIMwfjn"
      ],
      "shard-CRzZVi2b": [
        "node-duIMwfjn"
      ],
      "shard-Wh7VO5Lp": [
        "node-duIMwfjn"
      ],
      "shard-YazeIhze": [
        "node-duIMwfjn"
      ],
      "shard-cXyt4esz": [
        "node-duIMwfjn"
      ],
      "shard-hUM3HWQW": [
        "node-duIMwfjn"
      ],
      "shard-jH3sTtc7": [
        "node-duIMwfjn"
      ],
      "shard-viI2Dm3V": [
        "node-duIMwfjn"
      ],
      "shard-y1tMwCEP": [
        "node-duIMwfjn"
      ],
      "shard-y7VRCIlU": [
        "node-duIMwfjn"
      ]
    }
  }
}

Add / Update documents

The following command converts the example Wikipedia data into NDJSON with ./bin/phalanx_docs.sh and adds the documents to the wikipedia_en index:

% ./bin/phalanx_docs.sh -i id ./testdata/enwiki-20211201-pages-articles-multistream-1000.jsonl | curl -XPUT -H 'Content-type: application/x-ndjson' http://localhost:8000/v1/indexes/wikipedia_en/documents --data-binary @-

Delete documents

The following command extracts the document IDs from the example data with jq and deletes those documents from the wikipedia_en index:

% jq -r '.id' ./testdata/enwiki-20211201-pages-articles-multistream-1000.jsonl | curl -XDELETE -H 'Content-type: text/plain' http://localhost:8000/v1/indexes/wikipedia_en/documents --data-binary @-
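
For example, assuming the endpoint accepts one document ID per line as plain text (as in the bulk command above), a single document can be deleted like this:

% echo '1316' | curl -XDELETE -H 'Content-type: text/plain' http://localhost:8000/v1/indexes/wikipedia_en/documents --data-binary @-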

Search

The following command searches the wikipedia_en index using the request defined in ./examples/search_with_aggregation.json:

% curl -XPOST -H 'Content-type: text/plain' http://localhost:8000/v1/indexes/wikipedia_en/_search --data-binary @./examples/search_with_aggregation.json | jq .
{
  "aggregations": {
    "text_terms": {
      "also": 56,
      "external": 57,
      "from": 58,
      "its": 56,
      "links": 57,
      "new": 57,
      "one": 57,
      "part": 56,
      "search": 59,
      "were": 57
    }
  },
  "documents": [
    {
      "_id": "1316",
      "_score": 4.16359472994851,
      "_timestamp": "2022-01-05T02:59:20Z",
      "id": 1316,
      "title": "Annales school"
    },
    {
      "_id": "1164",
      "_score": 3.926891595709891,
      "_timestamp": "2022-01-05T02:59:20Z",
      "id": 1164,
      "title": "Artificial intelligence"
    },
    {
      "_id": "1397",
      "_score": 3.518318285824467,
      "_timestamp": "2022-01-05T02:59:20Z",
      "id": 1397,
      "title": "AOL"
    },
    {
      "_id": "775",
      "_score": 3.4539237042117312,
      "_timestamp": "2022-01-05T02:59:20Z",
      "id": 775,
      "title": "Algorithm"
    },
    {
      "_id": "1902",
      "_score": 3.340805165149435,
      "_timestamp": "2022-01-05T02:59:20Z",
      "id": 1902,
      "title": "American Airlines Flight 77"
    }
  ],
  "hits": 59,
  "index_name": "wikipedia_en"
}

Delete index

The following command deletes the index with the specified name (here, wikipedia_en). It removes both the index data in the storage layer and the index metadata in the metastore.

% curl -XDELETE http://localhost:8000/v1/indexes/wikipedia_en

Docker container

Build Docker container image

You can build the Docker container image like so:

% make docker-build

Pull Docker container image from docker.io

You can also use the Docker container image already published on docker.io like so:

% docker pull mosuka/phalanx:latest

See https://hub.docker.com/r/mosuka/phalanx/tags/

Start on Docker

You can run a Phalanx node on Docker as follows:

% docker run --rm --name phalanx-node1 \
    -p 2000:2000 \
    -p 5000:5000 \
    -p 8000:8000 \
    mosuka/phalanx:latest start \
      --host=0.0.0.0 \
      --bind-port=2000 \
      --grpc-port=5000 \
      --http-port=8000 \
      --roles=indexer,searcher \
      --index-metastore-uri=file:///tmp/phalanx/metadata
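
Once the container is up, you can check it from the host just like a locally started node, for example via the liveness endpoint:

% curl -XGET http://localhost:8000/livez | jq .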