Add docs (#61)
mosuka authored Jan 7, 2022
1 parent 9c5039a commit 61dd627
Showing 276 changed files with 53,940 additions and 331 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -13,3 +13,7 @@ dist/

cover.out
cover.html

# gitbook
_book/
node_modules/
16 changes: 12 additions & 4 deletions Makefile
@@ -65,7 +65,7 @@ fmt: show-env
$(GO) fmt $(PACKAGES)

.PHONY: mock
mock: show-env
mock:
@echo ">> generating mocks"
mockgen -source=./metastore/storage.go -destination=./mock/metastore/storage.go

@@ -76,7 +76,7 @@ test: show-env
$(GO) test -v -tags="$(BUILD_TAGS)" $(PACKAGES)

.PHONY: clean
clean: show-env
clean:
@echo ">> cleaning repository"
$(GO) clean

@@ -85,6 +85,14 @@ build: show-env
@echo ">> building binaries"
$(GO) build -tags="$(BUILD_TAGS)" $(LDFLAGS) -o bin/phalanx

.PHONY: docs
docs:
@echo ">> building document"
gitbook install
gitbook build
rm -rf docs
mv _book docs

.PHONY: tag
tag: show-env
@echo ">> tagging github"
@@ -108,10 +116,10 @@ docker-push: show-env
docker push $(DOCKER_REPOSITORY)/phalanx:$(VERSION)

.PHONY: docker-clean
docker-clean: show-env
docker-clean:
docker rmi -f $(shell docker images --filter "dangling=true" -q --no-trunc)

.PHONY: cert
cert: show-env
cert:
@echo ">> generating certification"
openssl req -x509 -nodes -newkey rsa:4096 -keyout ./examples/phalanx-key.pem -out ./examples/phalanx-cert.pem -days 365 -subj '/CN=localhost'
328 changes: 1 addition & 327 deletions README.md
@@ -6,7 +6,7 @@ Metrics for system operation can also be output in Prometheus exposition format,
Phalanx uses object storage for the storage layer, so it is only responsible for the computation layer, such as indexing and retrieval processes. Therefore, scaling is easy: you can simply add new nodes to the cluster.
Currently, it is an alpha version and only supports [MinIO](https://min.io/) as the storage layer, but in the future it will support [Amazon S3](https://aws.amazon.com/s3/), [Google Cloud Storage](https://cloud.google.com/storage), and [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/).

![phalanx_architecture](https://user-images.githubusercontent.com/970948/148490562-71283c1d-1f33-42c9-9aac-0ed0a09c29b8.png)
![phalanx_architecture](./docs_md/architecture.png)


## Build
@@ -30,332 +30,6 @@ phalanx
```


## Start Phalanx on a local machine using a local file system

Phalanx can be started on a local machine using the local file system as a metastore. The following command starts it with the metastore URI pointing at a local directory:

```
% ./bin/phalanx --index-metastore-uri=file:///tmp/phalanx/metastore
```

A metastore is a place where various information about an index is stored.

### Create index on local file system

If you have started Phalanx using the local file system, you can create an index with the following command.

```
% curl -XPUT -H 'Content-type: application/json' http://localhost:8000/v1/indexes/wikipedia_en --data-binary @./examples/create_index_wikipedia_en_local.json
```

In `create_index_wikipedia_en_local.json` used in the above command, the URI of the local file system is specified in `index_uri`.
`index_mapping` defines what kind of fields the index has. `num_shards` specifies how many shards the index will be divided into.
The above command will create an index named `wikipedia_en`.
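For reference, a minimal sketch of what such a request body might look like. The `index_uri` and `num_shards` values match the cluster status output shown later; the structure of `index_mapping` shown here is an illustrative assumption, and the actual schema is in `./examples/create_index_wikipedia_en_local.json`.

```json
{
  "index_uri": "file:///tmp/phalanx/indexes/wikipedia_en",
  "index_mapping": {
    "title": {
      "type": "text",
      "options": {
        "index": true,
        "store": true
      }
    }
  },
  "num_shards": 10
}
```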


## Start Phalanx on local machine with MinIO and etcd

To experience Phalanx functionality, let's start Phalanx with MinIO and etcd.
This repository has a docker-compose.yml file. With it, you can easily launch Phalanx, MinIO and etcd on Docker.

```
% docker-compose up
```

Once the containers have started, you can check the MinIO and etcd data in your browser at the following URLs.

- MinIO
http://localhost:9001/dashboard

- ETCD Keeper
http://localhost:8080/etcdkeeper/

### Create index with MinIO and etcd

If you have started Phalanx using MinIO and etcd, create the index with the following command.

```
% curl -XPUT -H 'Content-type: application/json' http://localhost:8000/v1/indexes/wikipedia_en --data-binary @./examples/create_index_wikipedia_en.json
```

In the `create_index_wikipedia_en.json` used in the above command, `index_uri` is a MinIO URI and `lock_uri` is an etcd URI. This means that indexes will be created in MinIO, and locks for those indexes will be created in etcd. Phalanx uses etcd as a distributed lock manager.
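A hedged sketch of the relevant part of that request body is shown below. The `minio://` and `etcd://` URI schemes and the bucket/key names are assumptions for illustration only; see `./examples/create_index_wikipedia_en.json` for the actual values. The mapping and shard settings follow the same form as the local file system example above.

```json
{
  "index_uri": "minio://phalanx-indexes/wikipedia_en",
  "lock_uri": "etcd://phalanx-locks/wikipedia_en"
}
```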


## Health check

These endpoints should be used for Phalanx health checks.

### Liveness check

If Phalanx is running properly, it will return HTTP status 200.

```
% curl -XGET http://localhost:8000/livez | jq .
```

```json
{
"state":"alive"
}
```

### Readiness check

If Phalanx is ready to accept traffic, it will return HTTP status 200.

```
% curl -XGET http://localhost:8000/readyz | jq .
```

```json
{
"state":"ready"
}
```

But this endpoint is not yet fully implemented.


## Metrics exposition

This endpoint returns Phalanx metrics in Prometheus exposition format.

```
% curl -XGET http://localhost:8000/metrics
```

```text
# HELP phalanx_grpc_server_handled_total Total number of RPCs completed on the server, regardless of success or failure.
# TYPE phalanx_grpc_server_handled_total counter
phalanx_grpc_server_handled_total{grpc_code="Aborted",grpc_method="AddDocuments",grpc_service="index.Index",grpc_type="unary"} 0
phalanx_grpc_server_handled_total{grpc_code="Aborted",grpc_method="Cluster",grpc_service="index.Index",grpc_type="unary"} 0
...
phalanx_grpc_server_started_total{grpc_method="Metrics",grpc_service="index.Index",grpc_type="unary"} 1
phalanx_grpc_server_started_total{grpc_method="ReadinessCheck",grpc_service="index.Index",grpc_type="unary"} 0
phalanx_grpc_server_started_total{grpc_method="Search",grpc_service="index.Index",grpc_type="unary"} 0
```


## Cluster status

This endpoint returns the latest cluster status.
- `nodes`: Lists the nodes participating in the cluster.
- `indexes`: Lists the indexes served by the cluster.
- `indexer_assignment`: Lists which node is responsible for each shard of each index.
- `searcher_assignment`: Lists which nodes are responsible for each shard of each index.

```
% curl -XGET http://localhost:8000/cluster | jq .
```

```json
{
"indexer_assignment": {
"wikipedia_en": {
"shard-73iAEf8K": "node-duIMwfjn",
"shard-CRzZVi2b": "node-duIMwfjn",
"shard-Wh7VO5Lp": "node-duIMwfjn",
"shard-YazeIhze": "node-duIMwfjn",
"shard-cXyt4esz": "node-duIMwfjn",
"shard-hUM3HWQW": "node-duIMwfjn",
"shard-jH3sTtc7": "node-duIMwfjn",
"shard-viI2Dm3V": "node-duIMwfjn",
"shard-y1tMwCEP": "node-duIMwfjn",
"shard-y7VRCIlU": "node-duIMwfjn"
}
},
"indexes": {
"wikipedia_en": {
"index_lock_uri": "",
"index_uri": "file:///tmp/phalanx/indexes/wikipedia_en",
"shards": {
"shard-73iAEf8K": {
"shard_lock_uri": "",
"shard_uri": "file:///tmp/phalanx/indexes/wikipedia_en/shard-73iAEf8K"
},
"shard-CRzZVi2b": {
"shard_lock_uri": "",
"shard_uri": "file:///tmp/phalanx/indexes/wikipedia_en/shard-CRzZVi2b"
},
"shard-Wh7VO5Lp": {
"shard_lock_uri": "",
"shard_uri": "file:///tmp/phalanx/indexes/wikipedia_en/shard-Wh7VO5Lp"
},
"shard-YazeIhze": {
"shard_lock_uri": "",
"shard_uri": "file:///tmp/phalanx/indexes/wikipedia_en/shard-YazeIhze"
},
"shard-cXyt4esz": {
"shard_lock_uri": "",
"shard_uri": "file:///tmp/phalanx/indexes/wikipedia_en/shard-cXyt4esz"
},
"shard-hUM3HWQW": {
"shard_lock_uri": "",
"shard_uri": "file:///tmp/phalanx/indexes/wikipedia_en/shard-hUM3HWQW"
},
"shard-jH3sTtc7": {
"shard_lock_uri": "",
"shard_uri": "file:///tmp/phalanx/indexes/wikipedia_en/shard-jH3sTtc7"
},
"shard-viI2Dm3V": {
"shard_lock_uri": "",
"shard_uri": "file:///tmp/phalanx/indexes/wikipedia_en/shard-viI2Dm3V"
},
"shard-y1tMwCEP": {
"shard_lock_uri": "",
"shard_uri": "file:///tmp/phalanx/indexes/wikipedia_en/shard-y1tMwCEP"
},
"shard-y7VRCIlU": {
"shard_lock_uri": "",
"shard_uri": "file:///tmp/phalanx/indexes/wikipedia_en/shard-y7VRCIlU"
}
}
}
},
"nodes": {
"node-duIMwfjn": {
"addr": "0.0.0.0",
"meta": {
"grpc_port": 5000,
"http_port": 8000,
"roles": [
"indexer",
"searcher"
]
},
"port": 3000,
"state": "alive"
}
},
"searcher_assignment": {
"wikipedia_en": {
"shard-73iAEf8K": [
"node-duIMwfjn"
],
"shard-CRzZVi2b": [
"node-duIMwfjn"
],
"shard-Wh7VO5Lp": [
"node-duIMwfjn"
],
"shard-YazeIhze": [
"node-duIMwfjn"
],
"shard-cXyt4esz": [
"node-duIMwfjn"
],
"shard-hUM3HWQW": [
"node-duIMwfjn"
],
"shard-jH3sTtc7": [
"node-duIMwfjn"
],
"shard-viI2Dm3V": [
"node-duIMwfjn"
],
"shard-y1tMwCEP": [
"node-duIMwfjn"
],
"shard-y7VRCIlU": [
"node-duIMwfjn"
]
}
}
}
```


## Add / Update documents

```
% ./bin/phalanx_docs.sh -i id ./testdata/enwiki-20211201-pages-articles-multistream-1000.jsonl | curl -XPUT -H 'Content-type: application/x-ndjson' http://localhost:8000/v1/indexes/wikipedia_en/documents --data-binary @-
```
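The input file is newline-delimited JSON, one document per line, and the `-i id` flag tells `phalanx_docs.sh` which field to use as the document ID. A hypothetical two-line excerpt of such input is shown below; the `text` field is an assumption, while the `id` and `title` values correspond to documents that appear in the search example later.

```
{"id": 775, "title": "Algorithm", "text": "..."}
{"id": 1164, "title": "Artificial intelligence", "text": "..."}
```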


## Delete documents

```
% jq -r '.id' ./testdata/enwiki-20211201-pages-articles-multistream-1000.jsonl | curl -XDELETE -H 'Content-type: text/plain' http://localhost:8000/v1/indexes/wikipedia_en/documents --data-binary @-
```


## Search

```
% curl -XPOST -H 'Content-type: text/plain' http://localhost:8000/v1/indexes/wikipedia_en/_search --data-binary @./examples/search_with_aggregation.json | jq .
```

```json
{
"aggregations": {
"timestamp_date_range": {
"last_year": 59,
"this_year": 0,
"year_before_last": 0
}
},
"documents": [
{
"fields": {
"id": 1316,
"title": "Annales school"
},
"id": "1316",
"score": 4.202233015754667,
"timestamp": 1641387370964624100
},
{
"fields": {
"id": 1164,
"title": "Artificial intelligence"
},
"id": "1164",
"score": 3.684979417225831,
"timestamp": 1641387370944337200
},
{
"fields": {
"id": 1397,
"title": "AOL"
},
"id": "1397",
"score": 3.616048285209088,
"timestamp": 1641387370954038800
},
{
"fields": {
"id": 775,
"title": "Algorithm"
},
"id": "775",
"score": 3.429643674018485,
"timestamp": 1641387370942956300
},
{
"fields": {
"id": 1361,
"title": "Anagram"
},
"id": "1361",
"score": 3.097368070553906,
"timestamp": 1641387370953257000
}
],
"hits": 59,
"index_name": "wikipedia_en"
}
```


## Delete index

The following command deletes the index with the specified name, `wikipedia_en`. It removes both the index files on the object storage and the index metadata in the metastore.

```
% curl -XDELETE http://localhost:8000/v1/indexes/wikipedia_en
```


## Docker container

### Build Docker container image
2 changes: 2 additions & 0 deletions SUMMARY.md
@@ -0,0 +1,2 @@
* [Introduction](./README.md)
* [Get started](./docs_md/getting_started.md)
