pgvector_chem

Distance metrics from the chemoinformatics domain for the Postgres pgvector extension. Provides:

  • Tanimoto/Jaccard-Index for dichotomous values

Installation

git clone https://github.com/leotaku/pgvector_chem
cd pgvector_chem
make CC=cc && make install

Usage

Setup

This extension builds on pgvector, so first install pgvector and enable it.

CREATE EXTENSION vector;

Then enable vector_chem (do this once in each database where you want to use it).

CREATE EXTENSION vector_chem;

Querying

The extension defines the <^> operator, which computes the Tanimoto distance between two vectors of dichotomous values.

SELECT '[1., 0., 1.]' <^> '[1., 0., 1.]';
-- ?column?
-- ----------
--       0
-- (1 row)

The function tanimoto_distance is also defined and computes the same result as the <^> operator.

SELECT tanimoto_distance('[1., 0., 1.]', '[1., 0., 0.]');
--  tanimoto_distance
-- -------------------
--                0.5
-- (1 row)
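To make the two results above easier to check by hand, here is a minimal Python sketch of the standard set-based Tanimoto distance, 1 - |A ∩ B| / |A ∪ B|. It is not the extension's C implementation. The helper name `tanimoto_distance_reference` is hypothetical, and treating two all-zero vectors as identical is an assumption, since this README does not say how the extension handles that case.

```python
from typing import Sequence


def tanimoto_distance_reference(a: Sequence[float], b: Sequence[float]) -> float:
    """Tanimoto (Jaccard) distance between two equal-length 0/1 vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    # Positions set in both vectors (intersection) and in at least one (union).
    both = sum(1 for x, y in zip(a, b) if x != 0 and y != 0)
    either = sum(1 for x, y in zip(a, b) if x != 0 or y != 0)
    if either == 0:
        # Assumption: two all-zero vectors count as identical (distance 0).
        return 0.0
    return 1.0 - both / either


print(tanimoto_distance_reference([1, 0, 1], [1, 0, 1]))  # 0.0
print(tanimoto_distance_reference([1, 0, 1], [1, 0, 0]))  # 0.5
```

In the second call the vectors share one set position out of two set in either vector, so the distance is 1 - 1/2 = 0.5, matching the SQL output above.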

FAQ

Should I use this library?

Unless you need to support the Tanimoto/Jaccard-Index for historical reasons, you are probably better served by the cosine distance (<=>) built into pgvector.

Cosine distance has been shown to carry essentially the same information as Tanimoto/Jaccard in a wide range of applications [1].
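As a toy illustration of that claim (not evidence for it), the sketch below ranks a few binary vectors against a query by both metrics. All names and data are made up for the example, and both functions are plain Python stand-ins rather than the pgvector or pgvector_chem implementations. On this small data set the two rankings agree even though the distance values differ; that is not guaranteed for arbitrary data.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def tanimoto_dist(a: Vector, b: Vector) -> float:
    # Set-based Tanimoto distance for 0/1 vectors (at least one bit must be set).
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - both / either


def cosine_dist(a: Vector, b: Vector) -> float:
    # Cosine distance; undefined for all-zero vectors, which the toy data avoids.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical fingerprints, chosen so that no two candidates tie.
QUERY: Vector = [1, 1, 0, 0]
CANDIDATES: Dict[str, Vector] = {
    "a": [1, 1, 0, 0],
    "b": [1, 1, 1, 0],
    "c": [1, 0, 0, 0],
    "d": [0, 0, 1, 1],
}


def rank_by(metric: Callable[[Vector, Vector], float]) -> List[str]:
    # Candidate names ordered from nearest to farthest under the given metric.
    return sorted(CANDIDATES, key=lambda name: metric(QUERY, CANDIDATES[name]))


print(rank_by(tanimoto_dist))
print(rank_by(cosine_dist))
```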

Is the provided Tanimoto/Jaccard-Index distance usable for continuous values?

Not at this time.

Is approximate nearest neighbor search supported?

Not at this time.

License

pgvector_chem is free and open source software distributed under the terms of The PostgreSQL License.

Substantial parts of the extension have been adapted from the existing pgvector extension. The copyright assignments of the pgvector authors should therefore also be observed, and the original pgvector license is included in this code distribution.