Distance metrics from chemoinformatics for the Postgres pgvector extension. Provides:
- Tanimoto/Jaccard-Index for dichotomous values
git clone https://github.com/leotaku/pgvector_chem
cd pgvector_chem
make CC=cc && make install
To use the functionality provided by this extension, first install and enable the pgvector extension.
CREATE EXTENSION vector;
Enable the extension (do this once in each database where you want to use it).
CREATE EXTENSION vector_chem;
We define the operator <^> for computing the Tanimoto distance between two vectors of dichotomous values.
SELECT '[1., 0., 1.]' <^> '[1., 0., 1.]';
-- ?column?
-- ----------
-- 0
-- (1 row)
The function tanimoto_distance is also defined and computes the same result as the <^> operator.
SELECT tanimoto_distance('[1., 0., 1.]', '[1., 0., 0.]');
-- tanimoto_distance
-- -------------------
-- 0.5
-- (1 row)
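The two results above can be checked by hand. The following is a minimal Python sketch, assuming the usual definition of Tanimoto distance for binary vectors (one minus the number of positions set in both vectors divided by the number of positions set in either). It is an illustration only, not the extension's C implementation; in particular, the handling of two all-zero vectors is an assumption made here and may differ from what the extension does.

```python
def tanimoto_distance(a, b):
    """Tanimoto (Jaccard) distance between two equal-length binary vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Treat any non-zero entry as a set bit.
    both = sum(1 for x, y in zip(a, b) if x != 0 and y != 0)
    either = sum(1 for x, y in zip(a, b) if x != 0 or y != 0)
    if either == 0:
        # Assumption: two all-zero vectors count as identical in this sketch.
        return 0.0
    return 1.0 - both / either


# Same inputs as the SQL examples above.
print(tanimoto_distance([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]))  # 0.0
print(tanimoto_distance([1.0, 0.0, 1.0], [1.0, 0.0, 0.0]))  # 0.5
```

In the second call one position is set in both vectors and two positions are set in at least one, so the distance is 1 - 1/2 = 0.5.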
Unless you need to support the Tanimoto/Jaccard-Index for historical reasons, you are probably better served by the cosine distance (<=>) built into pgvector.
Cosine distance has been shown to carry essentially the same information as Tanimoto/Jaccard in a wide range of applications.1
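To make the comparison concrete, here is a small Python sketch that ranks a few made-up binary fingerprints by both distances. The vectors and the names query, candidates, by_tanimoto and by_cosine are hypothetical, and the distance functions assume the standard textbook definitions rather than the extension's or pgvector's actual code. A toy example like this only illustrates the idea; the two orderings are not guaranteed to agree on arbitrary data.

```python
import math


def tanimoto_distance(a, b):
    # 1 - |A and B| / |A or B| for binary vectors.
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption: two all-zero vectors count as identical in this sketch.
    return 1.0 - both / either if either else 0.0


def cosine_distance(a, b):
    # 1 - (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


# Hypothetical binary fingerprints, for illustration only.
query = [1, 1, 0, 1]
candidates = {
    "c1": [1, 1, 0, 1],
    "c2": [1, 1, 0, 0],
    "c3": [0, 0, 1, 1],
    "c4": [0, 0, 1, 0],
}


def rank_by(distance):
    # Candidate names ordered from nearest to farthest.
    return sorted(candidates, key=lambda name: distance(query, candidates[name]))


by_tanimoto = rank_by(tanimoto_distance)
by_cosine = rank_by(cosine_distance)
print(by_tanimoto)
print(by_cosine)
```

For these four candidates both distances produce the same nearest-to-farthest order, even though the numeric values differ.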
Not at this time.
Not at this time.
pgvector_chem is free and open source software distributed under the terms of The PostgreSQL License.
Substantial parts of the extension have been adapted from the existing pgvector extension.
As such, the copyright assignments of the pgvector authors should also be observed, and the original pgvector license has been included in this code distribution.