Sense2vec Similarity Question #42

@dbl001

Why do 'flies|VERB' and 'flies|NOUN' have a similarity of 1.0?
I'm running sense2vec in an Anaconda environment with Python 3.6 on OS X 10.11.6:

$ python --version
Python 3.6.3 :: Anaconda custom (64-bit)
$ sputnik --name sense2vec --repository-url http://index.spacy.io install reddit_vectors
Downloading...
Downloaded 560.90MB 100.00% 2.15MB/s eta 0s              
archive.gz checksum/md5 OK
INFO:sputnik.pool:install reddit_vectors-1.1.0
$ conda list spacy
# packages in environment at /Users/davidlaxer/anaconda/envs/spacy:
#
spacy                     2.0.4                    py36_0    conda-forge
spacy                     0.101.0                   <pip>
$ conda list sense2vec
# packages in environment at /Users/davidlaxer/anaconda/envs/spacy:
#
sense2vec                 0.6.0                     <pip>
$ conda list thinc
# packages in environment at /Users/davidlaxer/anaconda/envs/spacy:
#
thinc                     6.10.0                   py36_0    conda-forge
thinc                     5.0.8                     <pip>

Here's my example:

import sense2vec
model = sense2vec.load()
freq, query_vector1 = model["flies|NOUN"]
model.most_similar(query_vector1, n=5)
(['flies|NOUN', 'gnats|NOUN', 'snakes|NOUN', 'birds|NOUN',  'grasshoppers|NOUN'],
 <MemoryView of 'ndarray' at 0x1af394c540>)

freq, query_vector2 = model["flies|VERB"]
model.most_similar(query_vector2, n=5)

(['flies|VERB', 'flys|VERB', 'flying|VERB', 'jumps|VERB', 'swoops|VERB'],
 <MemoryView of 'ndarray' at 0x1af394c6e8>)
In [42]: model.data.similarity(query_vector1, query_vector1)
1.0
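
One thing worth checking in the call above: it passes `query_vector1` for both arguments, so it compares the 'flies|NOUN' vector with itself rather than with `query_vector2`. Assuming `similarity` computes cosine similarity (an assumption, not confirmed in this thread), any nonzero vector compared with itself gives 1.0. The sketch below illustrates this in plain Python. The helper name `cosine_similarity` and the toy vectors are hypothetical stand-ins, not the real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 4-dimensional vectors standing in for query_vector1 and query_vector2.
v1 = [0.2, -0.5, 0.1, 0.7]
v2 = [0.3, 0.4, -0.6, 0.1]

same = cosine_similarity(v1, v1)       # a vector against itself
different = cosine_similarity(v1, v2)  # two distinct vectors

print(same, different)
```

Comparing `query_vector1` against `query_vector2` would be the call that actually measures how close the two senses are.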

For comparison, here is the same pair in a model I trained with gensim:

In [40]: new_model = gensim.models.Word2Vec.load('/Users/davidlaxer/LSTM-Sentiment-Analysis/corpus_output_256.txt')
In [41]: new_model.similarity('flies|NOUN', 'flies|VERB')
0.9954307438574328
In [43]: new_model.wv.vocab["flies|VERB"].index
5895
In [44]: new_model.wv.vocab["flies|NOUN"].index
7349
In [45]: new_model.wv["flies|VERB"]
array([ 0.15279259,  0.04471067,  0.0923325 , -0.07349139,  0.04180749,
     -0.71864516,  0.08252977, -0.02405624,  0.28384277,  0.01706951,
     -0.15931296, -0.21216595, -0.0352594 ,  0.13597694,  0.07868216,
     -0.15907238, -0.30132023,  0.01954124,  0.22636545, -0.19983807,
     -0.03842518,  0.49959993, -0.18679027, -0.16045345,  0.05813084,
      0.12905809,  0.1305625 ,  0.42689237,  0.19311258, -0.1002808 ,
      0.07427863, -0.19840011,  0.42542475, -0.32158205,  0.15129171,
     -0.32177079, -0.04034998, -0.05301504,  0.38441092, -0.31020632,
      0.42528978, -0.26249531, -0.25648555,  0.16558036,  0.28656447,
     -0.11909373,  0.09208378, -0.08886475, -0.40061441,  0.02873728,
      0.07275984, -0.05674595, -0.09471942, -0.01308586, -0.2777423 ,
     -0.05253473, -0.00179329, -0.15887854,  0.31784746, -0.00895729,
      0.50658983,  0.09232203,  0.16289137, -0.20241632, -0.01240843,
      0.20972176,  0.065593  ,  0.40676439, -0.16795945,  0.08079262,
      0.27334401,  0.16058736, -0.15362383, -0.13958427,  0.17041191,
     -0.08574789, -0.20200305,  0.16288304,  0.11220794,  0.44721738,
     -0.14058201,  0.13652138, -0.0134679 ,  0.20938247,  0.34156594,
      0.21730828, -0.19907214,  0.02451441,  0.12492239,  0.08635994,
     -0.29003018,  0.01458945,  0.02637799,  0.10671763, -0.17983682,
      0.01115436, -0.02827467,  0.13415532,  0.4656623 , -0.34222263,
      0.44238791, -0.29407004, -0.16681372,  0.04466435, -0.21825369,
     -0.09138768,  0.02407285, -0.57841706, -0.19544049, -0.07518575,
      0.36430466, -0.13164517, -0.01708322,  0.11068137,  0.2811991 ,
      0.02544841,  0.10672008,  0.06147943,  0.09167367, -0.71296901,
      0.04190712, -0.47360554, -0.01762259,  0.0359503 , -0.24351278,
     -0.01718491, -0.04033662,  0.03032484, -0.33736056, -0.13555804,
      0.02156358, -0.50073934, -0.0706998 ,  0.41698509, -0.23886077,
     -0.06120266, -0.0681426 ,  0.15182504,  0.13283113, -0.05899575,
     -0.11477304, -0.18594885, -0.17855589,  0.31381837,  0.25157636,
      0.41943148,  0.05070408, -0.03173119, -0.04240219, -0.25305411,
     -0.36856946,  0.20292452,  0.10858628,  0.17122397,  0.01447193,
     -0.47961271, -0.45739996,  0.17185016, -0.03916142, -0.04544915,
      0.34947339,  0.04178765,  0.37088165,  0.14284173,  0.03443905,
      0.30170318,  0.05259432, -0.22402297,  0.05495254, -0.46103877,
     -0.22059456, -0.27414244,  0.55484813,  0.1569699 ,  0.35761088,
      0.08712664,  0.23313828, -0.25803107, -0.03343969, -0.14713305,
     -0.0611255 ,  0.17435439, -0.01603068,  0.00526717, -0.08379596,
     -0.08644171, -0.12666632,  0.12955435,  0.48045933, -0.17596652,
     -0.29505005,  0.60152525, -0.01975689,  0.02343576,  0.17027852,
     -0.06638149, -0.10826188, -0.41277543, -0.12114278, -0.01596882,
      0.02660148,  0.22383556, -0.030263  , -0.0768819 , -0.32506746,
     -0.15082234, -0.16559191, -0.08502773, -0.01570902, -0.22921689,
      0.19637343, -0.4993245 ,  0.19670881,  0.17284806,  0.10345648,
      0.45276237, -0.12255403,  0.18032061,  0.05677452,  0.09869532,
     -0.23536956, -0.22449525,  0.51938456,  0.24111946,  0.26022053,
     -0.18190917, -0.01768251,  0.00435291,  0.05820792, -0.46525213,
      0.17490779,  0.15250422, -0.1760795 ,  0.14194083,  0.09954269,
     -0.89346975, -0.11642933,  0.0944154 ,  0.2134015 , -0.01955901,
     -0.02899018,  0.07254739, -0.03995875,  0.39499217, -0.05394226,
     -0.07821836, -0.29973337, -0.11607374, -0.01082127,  0.36769736,
      0.04288069, -0.0461933 ,  0.00675509,  0.25210902, -0.21784271,
     -0.18479778], dtype=float32)
In [46]: new_model.wv["flies|NOUN"]
array([ 0.1304135 ,  0.05724983,  0.06886293, -0.03062466,  0.01640639,
     -0.53799176,  0.10968599, -0.02839088,  0.18814373,  0.00147691,
     -0.11227507, -0.14502132, -0.03685957,  0.06422875,  0.07289967,
     -0.10437401, -0.23557086,  0.00153201,  0.17661473, -0.12828164,
     -0.02789859,  0.35942602, -0.1580196 , -0.13264264,  0.03343309,
      0.10922851,  0.1102568 ,  0.29480889,  0.14417146, -0.07892705,
      0.06608826, -0.14885685,  0.32329369, -0.23263605,  0.11967299,
     -0.23964159, -0.02619613,  0.00930338,  0.31111386, -0.22507732,
      0.32475442, -0.19287167, -0.19306417,  0.10722513,  0.2237518 ,
     -0.06828826,  0.07246322, -0.06233693, -0.31375739,  0.01069155,
      0.04457425, -0.00323939, -0.05079295, -0.02164256, -0.22060572,
     -0.03816675,  0.00503534, -0.10069088,  0.24429323,  0.02505454,
      0.38344654,  0.09145252,  0.11439045, -0.10801487, -0.01075712,
      0.16894275,  0.04799445,  0.3149668 , -0.13885498,  0.02068597,
      0.17856079,  0.11587915, -0.11973458, -0.0896498 ,  0.11993878,
     -0.06647626, -0.15219077,  0.10705566,  0.07842658,  0.31101131,
     -0.12788543,  0.09909476,  0.00878725,  0.1618593 ,  0.22566552,
      0.1297064 , -0.14370884,  0.02069237,  0.08489513,  0.0567583 ,
     -0.21860926,  0.01057386,  0.03844477,  0.06213358, -0.12877114,
      0.02327059, -0.00917741,  0.11733869,  0.35853127, -0.25572705,
      0.30879059, -0.20568153, -0.12405248,  0.03546307, -0.18377842,
     -0.06700096,  0.00626029, -0.42848313, -0.13129929, -0.04215423,
      0.26977378, -0.07725398,  0.01177794,  0.05952175,  0.21516307,
      0.01055368,  0.06727242,  0.05038245,  0.06739338, -0.53844106,
      0.02834721, -0.33890292, -0.02644366,  0.03540507, -0.16382404,
     -0.01353777, -0.02502321,  0.00226415, -0.24348356, -0.12502551,
      0.01489578, -0.37660655, -0.05798845,  0.28748602, -0.18512824,
     -0.06250153, -0.06967189,  0.14023623,  0.09628384, -0.09925015,
     -0.07317897, -0.14045765, -0.14597888,  0.24456802,  0.173549  ,
      0.3357946 ,  0.0424754 ,  0.00723427, -0.02120454, -0.14892557,
     -0.26496273,  0.14844348,  0.06555442,  0.11951103,  0.03691757,
     -0.36404395, -0.32292312,  0.09412326, -0.06377046, -0.02561374,
      0.24361259,  0.02616721,  0.29151902,  0.1178301 ,  0.03284379,
      0.20218852,  0.0337379 , -0.14703217,  0.02869225, -0.31447497,
     -0.15038867, -0.23353554,  0.41700551,  0.11959957,  0.26917797,
      0.04590914,  0.16029988, -0.18795538, -0.01343729, -0.10532234,
     -0.02617499,  0.12019841,  0.00673278, -0.0070972 , -0.03176219,
     -0.07582191, -0.07277017,  0.09928112,  0.36159652, -0.14404564,
     -0.21233276,  0.46463615,  0.01645906,  0.01815237,  0.12149289,
     -0.07040837, -0.06278557, -0.29605272, -0.07451538,  0.00487611,
      0.00313085,  0.13640559, -0.02045129, -0.05790693, -0.22582445,
     -0.10382047, -0.13318184, -0.05160375,  0.01498237, -0.15075362,
      0.14116266, -0.36445442,  0.1420894 ,  0.11182524,  0.10055254,
      0.33450282, -0.08930281,  0.15410167,  0.03961684,  0.06431124,
     -0.15608449, -0.1599745 ,  0.3780185 ,  0.18073064,  0.2190931 ,
     -0.16039631, -0.03769958, -0.00069833,  0.06914425, -0.33746576,
      0.11075038,  0.11626988, -0.12498619,  0.07928085,  0.0636186 ,
     -0.6352759 , -0.10650127,  0.03810085,  0.14585988, -0.01552053,
     -0.01488287,  0.04300846, -0.00500007,  0.26444513, -0.03629581,
     -0.04127173, -0.23304868, -0.08911316,  0.0029219 ,  0.27401808,
      0.00279731, -0.04162024,  0.00214672,  0.15316918, -0.14298579,
     -0.15343791], dtype=float32)

