Description
I have a sparse vector, the result of applying sklearn's TfidfVectorizer:
<Compressed Sparse Row sparse matrix of dtype 'float64'
with 4 stored elements and shape (1, 157541)>
Coords Values
(0, 5051) 0.35521903059198523
(0, 14956) 0.5566306658037382
(0, 45152) 0.7328483894186835
(0, 60738) 0.1640578566196061
which I want to copy into a table with a sparsevec column. As far as I understand from the documentation, the correct way to do this is the following:
with cur.copy(
    "COPY my_table FROM STDIN WITH (FORMAT BINARY)"
) as copy:
    copy.set_types(["sparsevec"])
    copy.write_row((SparseVector(the_sparse_vector),))
but this produces an error:
psycopg.errors.DataException: sparsevec indices must not contain duplicates
I've investigated a bit and found this line, which uses value.coords[0] (not value.coords[1]) even for two-dimensional input. Is this a bug? What should I do?
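A possible workaround, as a minimal sketch: take the column indices from the last coordinate axis and build the index-to-value mapping by hand, then pass it to the dict form of the constructor, `SparseVector(elements, dimensions)` (the same form that the printed repr uses). The helper name `single_row_to_elements` is hypothetical and not part of pgvector. The sample values are copied from the output in this issue so the snippet runs without scipy or pgvector installed.

```python
def single_row_to_elements(coords, data, shape):
    """Return ({column index: value}, dimensions) for a 1-D or single-row 2-D COO layout.

    Hypothetical helper for illustration; not part of pgvector or scipy.
    """
    if len(shape) not in (1, 2):
        raise ValueError("expected 1-D or 2-D input")
    if len(shape) == 2 and shape[0] != 1:
        raise ValueError("expected exactly one row")
    # The last axis holds the column indices, for both 1-D and (1, N) input.
    column_indices = coords[-1]
    elements = {int(i): float(v) for i, v in zip(column_indices, data)}
    return elements, int(shape[-1])


# Values copied from the output shown in this issue.
coords = ([0, 0, 0, 0], [5051, 14956, 45152, 60738])
data = [0.35521903059198523, 0.5566306658037382,
        0.7328483894186835, 0.1640578566196061]

elements, dimensions = single_row_to_elements(coords, data, (1, 157541))
print(elements, dimensions)

# With scipy and pgvector installed, the helper would be fed from the COO matrix:
#   coo = the_sparse_vector.tocoo()
#   elements, dimensions = single_row_to_elements(coo.coords, coo.data, coo.shape)
#   copy.write_row((SparseVector(elements, dimensions),))
```

This sidesteps the sparse-matrix code path entirely, so it should not depend on which coordinate axis the library reads.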
Additional information about the example:
- The code
print(the_sparse_vector)
the_sparse_vector = the_sparse_vector.tocoo()
print(the_sparse_vector.ndim, the_sparse_vector.shape)
print(the_sparse_vector.coords)
print(the_sparse_vector.data)
print(SparseVector(the_sparse_vector))
outputs:
<Compressed Sparse Row sparse matrix of dtype 'float64'
with 4 stored elements and shape (1, 157541)>
Coords Values
(0, 5051) 0.35521903059198523
(0, 14956) 0.5566306658037382
(0, 45152) 0.7328483894186835
(0, 60738) 0.1640578566196061
2 (1, 157541)
(array([0, 0, 0, 0], dtype=int32), array([ 5051, 14956, 45152, 60738], dtype=int32))
[0.35521903 0.55663067 0.73284839 0.16405786]
SparseVector({0: 0.1640578566196061}, 157541)
- I have these package versions installed:
psycopg 3.2.6
psycopg-binary 3.2.6
pgvector 0.4.0
scipy 1.15.2
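The printed `SparseVector({0: 0.1640578566196061}, 157541)` is consistent with the row axis being used as the index list: every row index is 0, so a dict built from those indices keeps only the last value, and the raw index list `[0, 0, 0, 0]` would explain the duplicate-indices error. The following sketch reproduces that effect with plain Python, using the numbers from the output above. It illustrates the suspected cause and is not the library's actual code.

```python
# Coordinates and values copied from the output above.
row_indices = [0, 0, 0, 0]
column_indices = [5051, 14956, 45152, 60738]
values = [0.35521903059198523, 0.5566306658037382,
          0.7328483894186835, 0.1640578566196061]

# Keying by the row axis: all keys are 0, so later values overwrite earlier ones.
collapsed = dict(zip(row_indices, values))
print(collapsed)  # {0: 0.1640578566196061}

# Keying by the column axis keeps all four stored elements.
expected = dict(zip(column_indices, values))
print(len(expected))  # 4
```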