Nearest neighbor search for Rails and Postgres
Add this line to your application’s Gemfile:
gem "neighbor"
Neighbor supports two extensions: cube and vector. cube ships with Postgres, while vector supports approximate nearest neighbor search.
For cube, run:
rails generate neighbor:cube
rails db:migrate
For vector, install pgvector and run:
rails generate neighbor:vector
rails db:migrate
Create a migration
class AddNeighborVectorToItems < ActiveRecord::Migration[7.0]
  def change
    add_column :items, :embedding, :cube
    # or
    add_column :items, :embedding, :vector, limit: 3 # dimensions
  end
end
Add to your model
class Item < ApplicationRecord
  has_neighbors :embedding
end
Update the vectors
item.update(embedding: [1.0, 1.2, 0.5])
Get the nearest neighbors to a record
item.nearest_neighbors(:embedding, distance: "euclidean").first(5)
Get the nearest neighbors to a vector
Item.nearest_neighbors(:embedding, [0.9, 1.3, 1.1], distance: "euclidean").first(5)
Supported values for the distance option are:
- euclidean
- cosine
- taxicab (cube only)
- chebyshev (cube only)
- inner_product (vector only)
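For intuition about what each of these names measures, here is a plain-Ruby sketch of the five functions. It is for illustration only: Neighbor computes distances inside Postgres, and the `DistanceSketch` module and its method names are hypothetical, not part of the gem.

```ruby
# Plain-Ruby reference versions of the five distance names (illustration only).
module DistanceSketch
  module_function

  # Straight-line distance
  def euclidean(a, b)
    Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
  end

  # Sum of absolute coordinate differences
  def taxicab(a, b)
    a.zip(b).sum { |x, y| (x - y).abs }
  end

  # Largest absolute coordinate difference
  def chebyshev(a, b)
    a.zip(b).map { |x, y| (x - y).abs }.max
  end

  # Dot product: a similarity, so larger means closer
  def inner_product(a, b)
    a.zip(b).sum { |x, y| x * y }
  end

  # One minus the cosine of the angle between the vectors
  def cosine(a, b)
    norms = Math.sqrt(inner_product(a, a)) * Math.sqrt(inner_product(b, b))
    1 - inner_product(a, b) / norms
  end
end

a = [1.0, 2.0, 2.0]
b = [4.0, 6.0, 2.0]
DistanceSketch.euclidean(a, b)     # => 5.0
DistanceSketch.taxicab(a, b)       # => 7.0
DistanceSketch.chebyshev(a, b)     # => 4.0
DistanceSketch.inner_product(a, b) # => 20.0
```

Note that inner product differs from the other four: it grows as vectors become more alike, so "nearest" means the largest value rather than the smallest.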
For cosine distance with cube, vectors must be normalized before being stored.
class Item < ApplicationRecord
  has_neighbors :embedding, normalize: true
end
For inner product with cube, see this example.
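A likely reason for the normalization requirement is that the cube extension offers Euclidean, taxicab, and Chebyshev operators but no cosine one. For unit-length vectors, squared Euclidean distance equals twice the cosine distance, so ordering by one gives the same ranking as ordering by the other. The sketch below checks this in plain Ruby with made-up vectors; the helper names are hypothetical and this is not Neighbor's internal code.

```ruby
# Scale a vector to unit length (what "normalized" means here).
def normalize(v)
  norm = Math.sqrt(v.sum { |x| x * x })
  raise ArgumentError, "cannot normalize a zero vector" if norm.zero?
  v.map { |x| x / norm }
end

def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  1 - dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

query = [0.9, 1.3, 1.1]
items = { "a" => [1.0, 1.2, 0.5], "b" => [3.0, 0.1, 0.2], "c" => [2.0, 2.9, 2.3] }

# Ranking by true cosine distance
by_cosine = items.keys.sort_by { |k| cosine_distance(items[k], query) }
# Ranking by Euclidean distance after normalizing both sides
by_unit_euclidean = items.keys.sort_by { |k| euclidean(normalize(items[k]), normalize(query)) }
# Ranking by Euclidean distance on the raw, unnormalized vectors
by_raw_euclidean = items.keys.sort_by { |k| euclidean(items[k], query) }

by_cosine         # => ["c", "a", "b"]
by_unit_euclidean # => ["c", "a", "b"]
by_raw_euclidean  # => ["a", "c", "b"]
```

Without normalization, the long vector "c" is pushed down the Euclidean ranking even though it points in almost the same direction as the query.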
Records returned from nearest_neighbors will have a neighbor_distance attribute
nearest_item = item.nearest_neighbors(:embedding, distance: "euclidean").first
nearest_item.neighbor_distance
The cube data type can have up to 100 dimensions by default. See the Postgres docs for how to increase this. The vector data type can have up to 16,000 dimensions, and vectors with up to 2,000 dimensions can be indexed.
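The limits above can be checked in application code before choosing a column type. This is a minimal sketch: the constants only restate the numbers quoted in this README, and `storable?` and `indexable_vector?` are hypothetical helpers, not part of Neighbor.

```ruby
# Limits quoted above. The cube limit is the Postgres default and can be raised.
MAX_DIMENSIONS = { cube: 100, vector: 16_000 }.freeze
MAX_INDEXED_VECTOR_DIMENSIONS = 2_000

# Hypothetical helper: can a column of this type hold the embedding?
def storable?(type, embedding)
  embedding.size <= MAX_DIMENSIONS.fetch(type)
end

# Hypothetical helper: is a vector of this size still small enough to index?
def indexable_vector?(embedding)
  embedding.size <= MAX_INDEXED_VECTOR_DIMENSIONS
end

embedding = Array.new(1536, 0.0)  # placeholder 1536-dimension embedding
storable?(:cube, embedding)       # => false (exceeds the default cube limit)
storable?(:vector, embedding)     # => true
indexable_vector?(embedding)      # => true
```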
For cube, it’s a good idea to specify the number of dimensions to ensure all records have the same number.
class Item < ApplicationRecord
  has_neighbors :embedding, dimensions: 3
end
For vector, add an approximate index to speed up queries. Create a migration with:
class AddIndexToItemsNeighborVector < ActiveRecord::Migration[7.0]
  def change
    add_index :items, :embedding, using: :ivfflat, opclass: :vector_l2_ops
    # or with pgvector 0.5.0+
    add_index :items, :embedding, using: :hnsw, opclass: :vector_l2_ops
  end
end
Use :vector_cosine_ops for cosine distance and :vector_ip_ops for inner product.
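Because the operator class of the index should match the distance used in queries, it can help to keep the mapping in one place. The sketch below restates the three operator classes named above as a lookup; `VECTOR_OPCLASSES` and `vector_index_options` are hypothetical names introduced for illustration, not part of Neighbor or pgvector.

```ruby
# Operator classes named above, keyed by the distance passed to nearest_neighbors.
VECTOR_OPCLASSES = {
  "euclidean" => :vector_l2_ops,
  "cosine" => :vector_cosine_ops,
  "inner_product" => :vector_ip_ops
}.freeze

# Hypothetical helper: builds the keyword options for add_index.
def vector_index_options(distance, using: :hnsw)
  opclass = VECTOR_OPCLASSES.fetch(distance) do
    raise ArgumentError, "no vector operator class for #{distance.inspect}"
  end
  { using: using, opclass: opclass }
end

options = vector_index_options("cosine")
options[:opclass] # => :vector_cosine_ops
```

In a migration, the result could be splatted into the call, for example `add_index :items, :embedding, **vector_index_options("cosine")`.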
Set the number of probes with IVFFlat
Item.connection.execute("SET ivfflat.probes = 3")
Or the size of the dynamic candidate list with HNSW
Item.connection.execute("SET hnsw.ef_search = 100")
Generate a model
rails generate model Article content:text embedding:vector{1536}
rails db:migrate
And add has_neighbors
class Article < ApplicationRecord
  has_neighbors :embedding
end
Create a method to call the embeddings API
require "json"
require "net/http"
# Posts the input strings to the embeddings API and returns one embedding per string
def fetch_embeddings(input)
  url = "https://api.openai.com/v1/embeddings"
  headers = {
    "Authorization" => "Bearer #{ENV.fetch("OPENAI_API_KEY")}",
    "Content-Type" => "application/json"
  }
  data = {
    input: input,
    model: "text-embedding-ada-002"
  }
  response = Net::HTTP.post(URI(url), data.to_json, headers)
  JSON.parse(response.body)["data"].map { |v| v["embedding"] }
end
Pass your input
input = [
  "The dog is barking",
  "The cat is purring",
  "The bear is growling"
]
embeddings = fetch_embeddings(input)
Store the embeddings
articles = []
input.zip(embeddings) do |content, embedding|
  articles << {content: content, embedding: embedding}
end
Article.insert_all!(articles) # use create! for Active Record < 6
And get similar articles
article = Article.first
article.nearest_neighbors(:embedding, distance: "inner_product").first(5).map(&:content)
See the complete code
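To see what the query above computes conceptually, here is an exact, in-memory version of the same search. It is a sketch under stated assumptions: the three-dimensional vectors are made-up stand-ins for real embeddings, and `most_similar` is a hypothetical helper, not part of Neighbor. The toy vectors are unit length, as OpenAI documents its embeddings to be, which is why inner product and cosine distance give the same ranking here.

```ruby
# Made-up 3-dimensional stand-ins for embeddings (real ones have 1536 dimensions).
docs = {
  "The dog is barking" => [0.8, 0.6, 0.0],
  "The cat is purring" => [0.6, 0.8, 0.0],
  "The bear is growling" => [0.0, 0.6, 0.8]
}

def inner_product(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# Exact search: score every other document and keep the k highest inner products.
def most_similar(docs, content, k)
  query = docs.fetch(content)
  docs.reject { |other, _| other == content }
      .max_by(k) { |_, embedding| inner_product(embedding, query) }
      .map(&:first)
end

most_similar(docs, "The dog is barking", 2)
# => ["The cat is purring", "The bear is growling"]
```

This scan touches every row, which is what an approximate index on the vector column is meant to avoid for large tables.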
You can use Neighbor for online item-based recommendations with Disco. We’ll use MovieLens data for this example.
Generate a model
rails generate model Movie name:string factors:cube
rails db:migrate
And add has_neighbors
class Movie < ApplicationRecord
  has_neighbors :factors, dimensions: 20, normalize: true
end
Fit the recommender
data = Disco.load_movielens
recommender = Disco::Recommender.new(factors: 20)
recommender.fit(data)
Store the item factors
movies = []
recommender.item_ids.each do |item_id|
  movies << {name: item_id, factors: recommender.item_factors(item_id)}
end
Movie.insert_all!(movies) # use create! for Active Record < 6
And get similar movies
movie = Movie.find_by(name: "Star Wars (1977)")
movie.nearest_neighbors(:factors, distance: "cosine").first(5).map(&:name)
See the complete code for cube and vector
The nearest_neighbors method accepts three additional options: order, limit, and threshold.
movie = Movie.find_by(name: "Star Wars (1977)")
# Order all results by the neighbor_distance column in descending order
movie.nearest_neighbors(:factors, distance: "inner_product", order: { neighbor_distance: :desc })
movie = Movie.find_by(name: "Star Wars (1977)")
# Limit the results to 3 records
movie.nearest_neighbors(:factors, distance: "inner_product", limit: 3)
movie = Movie.find_by(name: "Star Wars (1977)")
# Only return records where the neighbor_distance is greater than or equal to 0.9
movie.nearest_neighbors(:factors, distance: "inner_product", threshold: { gte: 0.9 })
All options can be used at the same time or separately.
movie = Movie.find_by(name: "Star Wars (1977)")
# Only return 5 records where the neighbor_distance is greater than or equal to 0.9 in descending order
movie.nearest_neighbors(
  :factors,
  distance: "inner_product",
  limit: 5,
  threshold: { gte: 0.9 },
  order: { neighbor_distance: :desc }
)
The distance option has been moved from has_neighbors to nearest_neighbors, and there is no longer a default. If you use cosine distance, set:
class Item < ApplicationRecord
  has_neighbors normalize: true
end
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/neighbor.git
cd neighbor
bundle install
createdb neighbor_test
# cube
bundle exec rake test
# vector
EXT=vector bundle exec rake test