Fixes #4242: The Pinecone APOC implementation is misleading (#4250)
* Fixes #4242: The Pinecone APOC implementation is misleading

* Changes review pinecone.adoc
vga91 authored Dec 2, 2024
1 parent dcba786 commit b24d774
Showing 6 changed files with 89 additions and 82 deletions.
@@ -1,24 +1,33 @@

= Pinecone

[NOTE]
====
In Pinecone a collection is a static and non-queryable copy of an index,
therefore, unlike other vector dbs, the Pinecone procedures work on indexes instead of collections.
However, the vectordb procedures to handle CRUD operations on collections are usually named `apoc.ml.<vdbname>.createCollection` and `apoc.ml.<vdbname>.deleteCollection`,
so to be consistent, the Pinecone index procedures are named `apoc.ml.pinecone.createCollection` and `apoc.ml.pinecone.deleteCollection`.
====

Here is a list of all available Pinecone procedures:

[opts=header, cols="1, 3"]
|===
| name | description
| apoc.vectordb.pinecone.info(hostOrKey, collection, $config) | Get information about the specified existing collection or throws a 404 error if it does not exist
| apoc.vectordb.pinecone.info(hostOrKey, index, $config) | Get information about the specified existing index or throws a 404 error if it does not exist
| apoc.vectordb.pinecone.createCollection(hostOrKey, index, similarity, size, $config) |
Creates an index, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`.
The default endpoint is `<hostOrKey param>/indexes`.
| apoc.vectordb.pinecone.deleteCollection(hostOrKey, index, $config) |
Deletes an index with the name specified in the 2nd parameter.
The default endpoint is `<hostOrKey param>/indexes/<collection param>`.
The default endpoint is `<hostOrKey param>/indexes/<index param>`.
| apoc.vectordb.pinecone.upsert(hostOrKey, index, vectors, $config) |
Upserts, in the index with the name specified in the 2nd parameter, the vectors [{id: 'id', vector: '<vectorDb>', medatada: '<metadata>'}].
The default endpoint is `<hostOrKey param>/vectors/upsert`.
| apoc.vectordb.pinecone.delete(hostOrKey, index, ids, $config) |
Delete the vectors with the specified `ids`.
The default endpoint is `<hostOrKey param>/indexes/<collection param>`.
The default endpoint is `<hostOrKey param>/indexes/<index param>`.
| apoc.vectordb.pinecone.get(hostOrKey, index, ids, $config) |
Get the vectors with the specified `ids`.
The default endpoint is `<hostOrKey param>/vectors/fetch`.
@@ -35,15 +44,6 @@ Here is a list of all available Pinecone procedures:

where the 1st parameter can be a key defined by the apoc config `apoc.pinecone.<key>.host=myHost`.
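
As an illustration only, assuming a hypothetical key named `mykey` and a placeholder host URL (neither comes from the Pinecone dashboard or from this change), such an entry in `apoc.conf` might look like this:

[source,properties]
----
# hypothetical key "mykey"; replace the value with the host shown in the Pinecone dashboard
apoc.pinecone.mykey.host=https://my-index-host.example
----

With an entry like this in place, passing `'mykey'` as the 1st parameter would stand in for the full host name.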

[NOTE]
====
The procedures create/drop/handle an index, instead of a collection like the other vectordb procedures,
since in Pinecone a collection is a static and non-queryable copy of an index.
Anyway, the create / delete index procedures are named `.createCollection` and `.deleteCollection` to be consistent with the other.
====


The default `hostOrKey` is `"https://api.pinecone.io"`,
therefore in general can be null with the `createCollection` and `deleteCollection` procedures,
and equal to the host name, with the other ones, that is, the one indicated in the Pinecone dashboard:
@@ -55,10 +55,10 @@ image::pinecone-index.png[width=800]

The following example assume we want to create and manage an index called `test-index`.

.Get collection info (it leverages https://docs.pinecone.io/reference/api/control-plane/describe_collection[this API])
.Get index info (it leverages https://docs.pinecone.io/guides/indexes/view-index-information[this API])
[source,cypher]
----
CALL apoc.vectordb.pinecone.info(hostOrKey, 'test-collection', {<optional config>})
CALL apoc.vectordb.pinecone.info(hostOrKey, 'test-index', {<optional config>})
----

.Example results
@@ -67,7 +67,7 @@ CALL apoc.vectordb.pinecone.info(hostOrKey, 'test-collection', {<optional config
| value
| { "dimension": 3,
"environment": "us-east1-gcp",
"name": "tiny-collection",
"name": "tiny-index",
"size": 3126700,
"status": "Ready",
"vector_count": 99
@@ -262,7 +262,7 @@ It is possible to execute vector db procedures together with the xref::ml/rag.ad

[source,cypher]
----
CALL apoc.vectordb.pinecone.getAndUpdate($host, $collection, [<id1>, <id2>], $conf) YIELD node, metadata, id, vector
CALL apoc.vectordb.pinecone.getAndUpdate($host, $index, [<id1>, <id2>], $conf) YIELD node, metadata, id, vector
WITH collect(node) as paths
CALL apoc.ml.rag(paths, $attributes, $question, $confPrompt) YIELD value
RETURN value
74 changes: 37 additions & 37 deletions extended/src/main/java/apoc/vectordb/Pinecone.java
@@ -43,12 +43,12 @@ public class Pinecone {
public URLAccessChecker urlAccessChecker;

@Procedure("apoc.vectordb.pinecone.info")
@Description("apoc.vectordb.pinecone.info(hostOrKey, collection, $configuration) - Get information about the specified existing collection or throws an error if it does not exist")
@Description("apoc.vectordb.pinecone.info(hostOrKey, index, $configuration) - Get information about the specified existing index or throws an error if it does not exist")
public Stream<MapResult> getInfo(@Name("hostOrKey") String hostOrKey,
@Name("collection") String collection,
@Name("index") String index,
@Name(value = "configuration", defaultValue = "{}") Map<String, Object> configuration) throws Exception {
String url = "%s/collections/%s";
Map<String, Object> config = getVectorDbInfo(hostOrKey, collection, configuration, url);
String url = "%s/indexes/%s";
Map<String, Object> config = getVectorDbInfo(hostOrKey, index, configuration, url);

methodAndPayloadNull(config);

@@ -59,18 +59,18 @@ public Stream<MapResult> getInfo(@Name("hostOrKey") String hostOrKey,
}

@Procedure("apoc.vectordb.pinecone.createCollection")
@Description("apoc.vectordb.pinecone.createCollection(hostOrKey, collection, similarity, size, $configuration) - Creates a collection, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`")
@Description("apoc.vectordb.pinecone.createCollection(hostOrKey, index, similarity, size, $configuration) - Creates a index, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`")
public Stream<MapResult> createCollection(@Name("hostOrKey") String hostOrKey,
@Name("collection") String collection,
@Name("index") String index,
@Name("similarity") String similarity,
@Name("size") Long size,
@Name(value = "configuration", defaultValue = "{}") Map<String, Object> configuration) throws Exception {
String url = "%s/indexes";
Map<String, Object> config = getVectorDbInfo(hostOrKey, collection, configuration, url);
Map<String, Object> config = getVectorDbInfo(hostOrKey, index, configuration, url);
config.putIfAbsent(METHOD_KEY, "POST");

Map<String, Object> additionalBodies = Map.of(
"name", collection,
"name", index,
"dimension", size,
"metric", similarity
);
@@ -81,14 +81,14 @@ public Stream<MapResult> createCollection(@Name("hostOrKey") String hostOrKey,
}

@Procedure("apoc.vectordb.pinecone.deleteCollection")
@Description("apoc.vectordb.pinecone.deleteCollection(hostOrKey, collection, $configuration) - Deletes a collection with the name specified in the 2nd parameter")
@Description("apoc.vectordb.pinecone.deleteCollection(hostOrKey, index, $configuration) - Deletes a index with the name specified in the 2nd parameter")
public Stream<MapResult> deleteCollection(
@Name("hostOrKey") String hostOrKey,
@Name("collection") String collection,
@Name("index") String index,
@Name(value = "configuration", defaultValue = "{}") Map<String, Object> configuration) throws Exception {

String url = "%s/indexes/%s";
Map<String, Object> config = getVectorDbInfo(hostOrKey, collection, configuration, url);
Map<String, Object> config = getVectorDbInfo(hostOrKey, index, configuration, url);
config.putIfAbsent(METHOD_KEY, "DELETE");

RestAPIConfig restAPIConfig = new RestAPIConfig(config);
@@ -98,16 +98,16 @@ public Stream<MapResult> deleteCollection(
}

@Procedure("apoc.vectordb.pinecone.upsert")
@Description("apoc.vectordb.pinecone.upsert(hostOrKey, collection, vectors, $configuration) - Upserts, in the collection with the name specified in the 2nd parameter, the vectors [{id: 'id', vector: '<vectorDb>', medatada: '<metadata>'}]")
@Description("apoc.vectordb.pinecone.upsert(hostOrKey, index, vectors, $configuration) - Upserts, in the index with the name specified in the 2nd parameter, the vectors [{id: 'id', vector: '<vectorDb>', medatada: '<metadata>'}]")
public Stream<MapResult> upsert(
@Name("hostOrKey") String hostOrKey,
@Name("collection") String collection,
@Name("index") String index,
@Name("vectors") List<Map<String, Object>> vectors,
@Name(value = "configuration", defaultValue = "{}") Map<String, Object> configuration) throws Exception {

String url = "%s/vectors/upsert";

Map<String, Object> config = getVectorDbInfo(hostOrKey, collection, configuration, url);
Map<String, Object> config = getVectorDbInfo(hostOrKey, index, configuration, url);
config.putIfAbsent(METHOD_KEY, "POST");

vectors = vectors.stream()
@@ -126,15 +126,15 @@ public Stream<MapResult> upsert(
}

@Procedure("apoc.vectordb.pinecone.delete")
@Description("apoc.vectordb.pinecone.delete(hostOrKey, collection, ids, $configuration) - Delete the vectors with the specified `ids`")
@Description("apoc.vectordb.pinecone.delete(hostOrKey, index, ids, $configuration) - Delete the vectors with the specified `ids`")
public Stream<MapResult> delete(
@Name("hostOrKey") String hostOrKey,
@Name("collection") String collection,
@Name("index") String index,
@Name("vectors") List<Object> ids,
@Name(value = "configuration", defaultValue = "{}") Map<String, Object> configuration) throws Exception {

String url = "%s/vectors/delete";
Map<String, Object> config = getVectorDbInfo(hostOrKey, collection, configuration, url);
Map<String, Object> config = getVectorDbInfo(hostOrKey, index, configuration, url);
config.putIfAbsent(METHOD_KEY, "POST");

Map<String, Object> additionalBodies = Map.of("ids", ids);
@@ -145,29 +145,29 @@ public Stream<MapResult> delete(
}

@Procedure(value = "apoc.vectordb.pinecone.get")
@Description("apoc.vectordb.pinecone.get(hostOrKey, collection, ids, $configuration) - Get the vectors with the specified `ids`")
@Description("apoc.vectordb.pinecone.get(hostOrKey, index, ids, $configuration) - Get the vectors with the specified `ids`")
public Stream<VectorDbUtil.EmbeddingResult> get(@Name("hostOrKey") String hostOrKey,
@Name("collection") String collection,
@Name("index") String index,
@Name("ids") List<Object> ids,
@Name(value = "configuration", defaultValue = "{}") Map<String, Object> configuration) throws Exception {
setReadOnlyMappingMode(configuration);
return getCommon(hostOrKey, collection, ids, configuration);
return getCommon(hostOrKey, index, ids, configuration);
}

@Procedure(value = "apoc.vectordb.pinecone.getAndUpdate", mode = Mode.WRITE)
@Description("apoc.vectordb.pinecone.getAndUpdate(hostOrKey, collection, ids, $configuration) - Get the vectors with the specified `ids`")
@Description("apoc.vectordb.pinecone.getAndUpdate(hostOrKey, index, ids, $configuration) - Get the vectors with the specified `ids`")
public Stream<VectorDbUtil.EmbeddingResult> getAndUpdate(@Name("hostOrKey") String hostOrKey,
@Name("collection") String collection,
@Name("index") String index,
@Name("ids") List<Object> ids,
@Name(value = "configuration", defaultValue = "{}") Map<String, Object> configuration) throws Exception {
return getCommon(hostOrKey, collection, ids, configuration);
return getCommon(hostOrKey, index, ids, configuration);
}

private Stream<VectorDbUtil.EmbeddingResult> getCommon(String hostOrKey, String collection, List<Object> ids, Map<String, Object> configuration) throws Exception {
private Stream<VectorDbUtil.EmbeddingResult> getCommon(String hostOrKey, String index, List<Object> ids, Map<String, Object> configuration) throws Exception {
String url = "%s/vectors/fetch";
Map<String, Object> config = getVectorDbInfo(hostOrKey, collection, configuration, url);
Map<String, Object> config = getVectorDbInfo(hostOrKey, index, configuration, url);

VectorEmbeddingConfig conf = DB_HANDLER.getEmbedding().fromGet(config, procedureCallContext, ids, collection);
VectorEmbeddingConfig conf = DB_HANDLER.getEmbedding().fromGet(config, procedureCallContext, ids, index);

return getEmbeddingResultStream(conf, procedureCallContext, urlAccessChecker, tx,
v -> {
@@ -178,33 +178,33 @@ private Stream<VectorDbUtil.EmbeddingResult> getCommon(String hostOrKey, String
}

@Procedure(value = "apoc.vectordb.pinecone.query")
@Description("apoc.vectordb.pinecone.query(hostOrKey, collection, vector, filter, limit, $configuration) - Retrieve closest vectors the the defined `vector`, `limit` of results, in the collection with the name specified in the 2nd parameter")
@Description("apoc.vectordb.pinecone.query(hostOrKey, index, vector, filter, limit, $configuration) - Retrieve closest vectors the the defined `vector`, `limit` of results, in the index with the name specified in the 2nd parameter")
public Stream<VectorDbUtil.EmbeddingResult> query(@Name("hostOrKey") String hostOrKey,
@Name("collection") String collection,
@Name("index") String index,
@Name(value = "vector", defaultValue = "[]") List<Double> vector,
@Name(value = "filter", defaultValue = "{}") Map<String, Object> filter,
@Name(value = "limit", defaultValue = "10") long limit,
@Name(value = "configuration", defaultValue = "{}") Map<String, Object> configuration) throws Exception {
setReadOnlyMappingMode(configuration);
return queryCommon(hostOrKey, collection, vector, filter, limit, configuration);
return queryCommon(hostOrKey, index, vector, filter, limit, configuration);
}

@Procedure(value = "apoc.vectordb.pinecone.queryAndUpdate", mode = Mode.WRITE)
@Description("apoc.vectordb.pinecone.queryAndUpdate(hostOrKey, collection, vector, filter, limit, $configuration) - Retrieve closest vectors the the defined `vector`, `limit` of results, in the collection with the name specified in the 2nd parameter")
@Description("apoc.vectordb.pinecone.queryAndUpdate(hostOrKey, index, vector, filter, limit, $configuration) - Retrieve closest vectors the the defined `vector`, `limit` of results, in the index with the name specified in the 2nd parameter")
public Stream<VectorDbUtil.EmbeddingResult> queryAndUpdate(@Name("hostOrKey") String hostOrKey,
@Name("collection") String collection,
@Name("index") String index,
@Name(value = "vector", defaultValue = "[]") List<Double> vector,
@Name(value = "filter", defaultValue = "{}") Map<String, Object> filter,
@Name(value = "limit", defaultValue = "10") long limit,
@Name(value = "configuration", defaultValue = "{}") Map<String, Object> configuration) throws Exception {
return queryCommon(hostOrKey, collection, vector, filter, limit, configuration);
return queryCommon(hostOrKey, index, vector, filter, limit, configuration);
}

private Stream<VectorDbUtil.EmbeddingResult> queryCommon(String hostOrKey, String collection, List<Double> vector, Map<String, Object> filter, long limit, Map<String, Object> configuration) throws Exception {
private Stream<VectorDbUtil.EmbeddingResult> queryCommon(String hostOrKey, String index, List<Double> vector, Map<String, Object> filter, long limit, Map<String, Object> configuration) throws Exception {
String url = "%s/query";
Map<String, Object> config = getVectorDbInfo(hostOrKey, collection, configuration, url);
Map<String, Object> config = getVectorDbInfo(hostOrKey, index, configuration, url);

VectorEmbeddingConfig conf = DB_HANDLER.getEmbedding().fromQuery(config, procedureCallContext, vector, filter, limit, collection);
VectorEmbeddingConfig conf = DB_HANDLER.getEmbedding().fromQuery(config, procedureCallContext, vector, filter, limit, index);

return getEmbeddingResultStream(conf, procedureCallContext, urlAccessChecker, tx,
v -> {
@@ -215,7 +215,7 @@ private Stream<VectorDbUtil.EmbeddingResult> queryCommon(String hostOrKey, Strin
}

private Map<String, Object> getVectorDbInfo(
String hostOrKey, String collection, Map<String, Object> configuration, String templateUrl) {
return getCommonVectorDbInfo(hostOrKey, collection, configuration, templateUrl, DB_HANDLER);
String hostOrKey, String index, Map<String, Object> configuration, String templateUrl) {
return getCommonVectorDbInfo(hostOrKey, index, configuration, templateUrl, DB_HANDLER);
}
}
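
The procedures above each pass a URL template such as `%s/indexes/%s` to `getVectorDbInfo`. The following is a minimal, standalone sketch of how such templates could expand into endpoints. It is not part of this commit. It assumes the template is filled with `String.format` and that a null `hostOrKey` falls back to the documented default `https://api.pinecone.io`. The class `PineconeEndpointSketch`, the method `resolve`, and the host `https://my-index-host` are hypothetical names used only for illustration.

```java
public class PineconeEndpointSketch {
    // Default control-plane host, as stated in the documentation.
    static final String DEFAULT_HOST = "https://api.pinecone.io";

    // Hypothetical helper: fills a template such as "%s/indexes/%s" with host and index name.
    // String.format ignores surplus arguments, so templates with a single %s also work.
    static String resolve(String template, String hostOrKey, String index) {
        String host = (hostOrKey == null) ? DEFAULT_HOST : hostOrKey;
        return String.format(template, host, index);
    }

    public static void main(String[] args) {
        // info / deleteCollection style template
        System.out.println(resolve("%s/indexes/%s", null, "test-index"));
        // createCollection style template (the index name goes in the request body instead)
        System.out.println(resolve("%s/indexes", null, "test-index"));
        // upsert style template against an index host (placeholder host name)
        System.out.println(resolve("%s/vectors/upsert", "https://my-index-host", "test-index"));
    }
}
```

Under these assumptions, the index-level procedures (`info`, `createCollection`, `deleteCollection`) target the `/indexes` endpoints, while the vector-level procedures use the index host directly and do not place the index name in the URL.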
4 changes: 2 additions & 2 deletions extended/src/main/java/apoc/vectordb/PineconeHandler.java
@@ -53,7 +53,7 @@ static class PineconeEmbeddingHandler implements VectorEmbeddingHandler {
* that makes the request to respond 200 OK, but returns an empty result
*/
@Override
public <T> VectorEmbeddingConfig fromGet(Map<String, Object> config, ProcedureCallContext procedureCallContext, List<T> ids, String collection) {
public <T> VectorEmbeddingConfig fromGet(Map<String, Object> config, ProcedureCallContext procedureCallContext, List<T> ids, String index) {
List<String> fields = procedureCallContext.outputFields().toList();

config.put(BODY_KEY, null);
@@ -74,7 +74,7 @@ public <T> VectorEmbeddingConfig fromGet(Map<String, Object> config, ProcedureCa
}

@Override
public VectorEmbeddingConfig fromQuery(Map<String, Object> config, ProcedureCallContext procedureCallContext, List<Double> vector, Object filter, long limit, String collection) {
public VectorEmbeddingConfig fromQuery(Map<String, Object> config, ProcedureCallContext procedureCallContext, List<Double> vector, Object filter, long limit, String index) {
List<String> fields = procedureCallContext.outputFields().toList();

Map<String, Object> additionalBodies = map("vector", vector,