pgvector_chem

Distance metrics used in chemoinformatics domains for Postgres pgvector. Provides:

  • Tanimoto/Jaccard-Index for dichotomous values

Installation

git clone https://github.com/leotaku/pgvector_chem
cd pgvector_chem
make CC=cc && make install

Usage

Setup

This extension builds on pgvector, so first install and enable the pgvector extension.

CREATE EXTENSION vector;

Then enable vector_chem itself (do this once in each database where you want to use it).

CREATE EXTENSION vector_chem;

Querying

We define the operator <^> for computing the Tanimoto distance between two vectors of dichotomous values.

SELECT '[1., 0., 1.]' <^> '[1., 0., 1.]';
-- ?column?
-- ----------
--       0
-- (1 row)

The function tanimoto_distance is also defined and computes the same result as the <^> operator.

SELECT tanimoto_distance('[1., 0., 1.]', '[1., 0., 0.]');
--  tanimoto_distance
-- -------------------
--                0.5
-- (1 row)
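For readers who want to check results outside the database, the two queries above can be reproduced with a small reference sketch. It assumes the standard definition for binary vectors, distance = 1 - |A ∩ B| / |A ∪ B|, where A and B are the sets of positions holding a non-zero value. The Python function below is illustrative only and is not part of the extension; in particular, how the extension treats two all-zero vectors is not documented here, so the value returned in that case is a convention picked for this sketch.

```python
from typing import Sequence


def tanimoto_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Tanimoto (Jaccard) distance between two equal-length 0/1 vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    # Treat any non-zero component as "bit set" (assumption of this sketch).
    both = sum(1 for x, y in zip(a, b) if x != 0 and y != 0)
    either = sum(1 for x, y in zip(a, b) if x != 0 or y != 0)
    if either == 0:
        # Two all-zero vectors: the ratio is undefined. Returning 0.0 is a
        # convention chosen here, not documented behaviour of the extension.
        return 0.0
    return 1.0 - both / either


print(tanimoto_distance([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]))  # 0.0
print(tanimoto_distance([1.0, 0.0, 1.0], [1.0, 0.0, 0.0]))  # 0.5
```

In the second call the vectors share one set position out of two set in either, giving 1 - 1/2 = 0.5, which matches the SQL output above.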

FAQ

Should I use this library?

Unless you need to support the Tanimoto/Jaccard-Index for historical reasons, you are probably better served by the cosine distance (<=>) built into pgvector.

Cosine distance has been shown to carry essentially the same information as Tanimoto/Jaccard in a wide range of applications.[1]
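As a toy illustration of that relationship (not a proof, and not taken from the cited work), the following sketch ranks a few hand-picked binary vectors against a query using both distances. The vectors, names, and helper functions are made up for this example. On this data the two metrics produce different distance values but the same ordering.

```python
import math


def tanimoto_distance(a, b):
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - both / either  # assumes at least one position is set


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)  # assumes neither vector is all zeros


# Hypothetical fingerprints, chosen only for illustration.
query = [1, 1, 0, 1, 0]
candidates = {
    "a": [1, 1, 0, 1, 0],
    "b": [1, 1, 0, 0, 0],
    "c": [0, 0, 1, 0, 1],
    "d": [1, 0, 1, 0, 0],
}

by_tanimoto = sorted(candidates, key=lambda k: tanimoto_distance(query, candidates[k]))
by_cosine = sorted(candidates, key=lambda k: cosine_distance(query, candidates[k]))

print(by_tanimoto)  # ['a', 'b', 'd', 'c']
print(by_cosine)    # ['a', 'b', 'd', 'c']
```

Identical rankings on four vectors say nothing about large or unusual datasets, so it is worth running a similar comparison on your own data before switching metrics.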

Is the provided Tanimoto/Jaccard-Index distance usable for continuous values?

Not at this time.

Is approximate nearest neighbor search supported?

Not at this time.

License

pgvector_chem is free and open source software distributed under the terms of The PostgreSQL License.

Substantial parts of the extension have been adapted from the existing pgvector extension. The copyright notices of the pgvector authors should therefore also be observed, and the original pgvector license is included in this code distribution.
