Status: Closed
Labels: usage (General usage)
Description
Why do 'flies|VERB' and 'flies|NOUN' have a similarity of 1.0?
I'm running sense2vec under Anaconda with Python 3.6 on OS X 10.11.6:
$ python --version
Python 3.6.3 :: Anaconda custom (64-bit)
$ sputnik --name sense2vec --repository-url http://index.spacy.io install reddit_vectors
Downloading...
Downloaded 560.90MB 100.00% 2.15MB/s eta 0s
archive.gz checksum/md5 OK
INFO:sputnik.pool:install reddit_vectors-1.1.0
$ conda list spacy
# packages in environment at /Users/davidlaxer/anaconda/envs/spacy:
#
spacy 2.0.4 py36_0 conda-forge
spacy 0.101.0 <pip>
$ conda list sense2vec
# packages in environment at /Users/davidlaxer/anaconda/envs/spacy:
#
sense2vec 0.6.0 <pip>
$ conda list thinc
# packages in environment at /Users/davidlaxer/anaconda/envs/spacy:
#
thinc 6.10.0 py36_0 conda-forge
thinc 5.0.8 <pip>
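The listings above show two copies each of spaCy (2.0.4 from conda-forge and 0.101.0 from pip) and thinc (6.10.0 and 5.0.8) in the same environment. Here is a minimal standard-library sketch that shows which copy Python actually imports. The helper name `where_is` is hypothetical, and the script only reports file locations. It does not check compatibility.

```python
import importlib.util

def where_is(module_name):
    """Return the file a top-level module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None

# Print the location of each package that matters for this issue.
for name in ("spacy", "thinc", "sense2vec"):
    print(name, "->", where_is(name))
```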
Here's my example:
import sense2vec
model = sense2vec.load()
freq, query_vector1 = model["flies|NOUN"]
model.most_similar(query_vector1, n=5)
(['flies|NOUN', 'gnats|NOUN', 'snakes|NOUN', 'birds|NOUN', 'grasshoppers|NOUN'],
<MemoryView of 'ndarray' at 0x1af394c540>)
freq, query_vector2 = model["flies|VERB"]
model.most_similar(query_vector2, n=5)
(['flies|VERB', 'flys|VERB', 'flying|VERB', 'jumps|VERB', 'swoops|VERB'],
<MemoryView of 'ndarray' at 0x1af394c6e8>)
In [42]: model.data.similarity(query_vector1, query_vector1)
1.0
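The call above passes `query_vector1` as both arguments, so the 1.0 it returns is the NOUN vector compared with itself. The cosine similarity of any nonzero vector with itself is 1.0 (up to float rounding), whatever sense it belongs to. Below is a minimal NumPy sketch. It assumes `model.data.similarity` computes cosine similarity, which is not confirmed here. The helper `cosine_similarity` is hypothetical, and the two toy vectors stand in for the real sense vectors.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (hypothetical helper)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins; with sense2vec these would be query_vector1 ("flies|NOUN")
# and query_vector2 ("flies|VERB").
noun = np.array([0.2, -0.5, 0.1, 0.7])
verb = np.array([0.6, 0.1, -0.4, 0.2])

self_sim = cosine_similarity(noun, noun)   # 1.0 for any nonzero vector
cross_sim = cosine_similarity(noun, verb)  # the NOUN vs VERB comparison
print(round(self_sim, 6), round(cross_sim, 6))
```

With the sense2vec model loaded as in the question, the NOUN vs VERB comparison would be `model.data.similarity(query_vector1, query_vector2)`.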
From a gensim Word2Vec model I trained myself:
In [40]: new_model = gensim.models.Word2Vec.load('/Users/davidlaxer/LSTM-Sentiment-Analysis/corpus_output_256.txt')
In [41]: new_model.similarity('flies|NOUN', 'flies|VERB')
0.9954307438574328
In [43]: new_model.wv.vocab["flies|VERB"].index
5895
In [44]: new_model.wv.vocab["flies|NOUN"].index
7349
In [45]: new_model.wv["flies|VERB"]
array([ 0.15279259, 0.04471067, 0.0923325 , -0.07349139, 0.04180749,
-0.71864516, 0.08252977, -0.02405624, 0.28384277, 0.01706951,
-0.15931296, -0.21216595, -0.0352594 , 0.13597694, 0.07868216,
-0.15907238, -0.30132023, 0.01954124, 0.22636545, -0.19983807,
-0.03842518, 0.49959993, -0.18679027, -0.16045345, 0.05813084,
0.12905809, 0.1305625 , 0.42689237, 0.19311258, -0.1002808 ,
0.07427863, -0.19840011, 0.42542475, -0.32158205, 0.15129171,
-0.32177079, -0.04034998, -0.05301504, 0.38441092, -0.31020632,
0.42528978, -0.26249531, -0.25648555, 0.16558036, 0.28656447,
-0.11909373, 0.09208378, -0.08886475, -0.40061441, 0.02873728,
0.07275984, -0.05674595, -0.09471942, -0.01308586, -0.2777423 ,
-0.05253473, -0.00179329, -0.15887854, 0.31784746, -0.00895729,
0.50658983, 0.09232203, 0.16289137, -0.20241632, -0.01240843,
0.20972176, 0.065593 , 0.40676439, -0.16795945, 0.08079262,
0.27334401, 0.16058736, -0.15362383, -0.13958427, 0.17041191,
-0.08574789, -0.20200305, 0.16288304, 0.11220794, 0.44721738,
-0.14058201, 0.13652138, -0.0134679 , 0.20938247, 0.34156594,
0.21730828, -0.19907214, 0.02451441, 0.12492239, 0.08635994,
-0.29003018, 0.01458945, 0.02637799, 0.10671763, -0.17983682,
0.01115436, -0.02827467, 0.13415532, 0.4656623 , -0.34222263,
0.44238791, -0.29407004, -0.16681372, 0.04466435, -0.21825369,
-0.09138768, 0.02407285, -0.57841706, -0.19544049, -0.07518575,
0.36430466, -0.13164517, -0.01708322, 0.11068137, 0.2811991 ,
0.02544841, 0.10672008, 0.06147943, 0.09167367, -0.71296901,
0.04190712, -0.47360554, -0.01762259, 0.0359503 , -0.24351278,
-0.01718491, -0.04033662, 0.03032484, -0.33736056, -0.13555804,
0.02156358, -0.50073934, -0.0706998 , 0.41698509, -0.23886077,
-0.06120266, -0.0681426 , 0.15182504, 0.13283113, -0.05899575,
-0.11477304, -0.18594885, -0.17855589, 0.31381837, 0.25157636,
0.41943148, 0.05070408, -0.03173119, -0.04240219, -0.25305411,
-0.36856946, 0.20292452, 0.10858628, 0.17122397, 0.01447193,
-0.47961271, -0.45739996, 0.17185016, -0.03916142, -0.04544915,
0.34947339, 0.04178765, 0.37088165, 0.14284173, 0.03443905,
0.30170318, 0.05259432, -0.22402297, 0.05495254, -0.46103877,
-0.22059456, -0.27414244, 0.55484813, 0.1569699 , 0.35761088,
0.08712664, 0.23313828, -0.25803107, -0.03343969, -0.14713305,
-0.0611255 , 0.17435439, -0.01603068, 0.00526717, -0.08379596,
-0.08644171, -0.12666632, 0.12955435, 0.48045933, -0.17596652,
-0.29505005, 0.60152525, -0.01975689, 0.02343576, 0.17027852,
-0.06638149, -0.10826188, -0.41277543, -0.12114278, -0.01596882,
0.02660148, 0.22383556, -0.030263 , -0.0768819 , -0.32506746,
-0.15082234, -0.16559191, -0.08502773, -0.01570902, -0.22921689,
0.19637343, -0.4993245 , 0.19670881, 0.17284806, 0.10345648,
0.45276237, -0.12255403, 0.18032061, 0.05677452, 0.09869532,
-0.23536956, -0.22449525, 0.51938456, 0.24111946, 0.26022053,
-0.18190917, -0.01768251, 0.00435291, 0.05820792, -0.46525213,
0.17490779, 0.15250422, -0.1760795 , 0.14194083, 0.09954269,
-0.89346975, -0.11642933, 0.0944154 , 0.2134015 , -0.01955901,
-0.02899018, 0.07254739, -0.03995875, 0.39499217, -0.05394226,
-0.07821836, -0.29973337, -0.11607374, -0.01082127, 0.36769736,
0.04288069, -0.0461933 , 0.00675509, 0.25210902, -0.21784271,
-0.18479778], dtype=float32)
In [46]: new_model.wv["flies|NOUN"]
array([ 0.1304135 , 0.05724983, 0.06886293, -0.03062466, 0.01640639,
-0.53799176, 0.10968599, -0.02839088, 0.18814373, 0.00147691,
-0.11227507, -0.14502132, -0.03685957, 0.06422875, 0.07289967,
-0.10437401, -0.23557086, 0.00153201, 0.17661473, -0.12828164,
-0.02789859, 0.35942602, -0.1580196 , -0.13264264, 0.03343309,
0.10922851, 0.1102568 , 0.29480889, 0.14417146, -0.07892705,
0.06608826, -0.14885685, 0.32329369, -0.23263605, 0.11967299,
-0.23964159, -0.02619613, 0.00930338, 0.31111386, -0.22507732,
0.32475442, -0.19287167, -0.19306417, 0.10722513, 0.2237518 ,
-0.06828826, 0.07246322, -0.06233693, -0.31375739, 0.01069155,
0.04457425, -0.00323939, -0.05079295, -0.02164256, -0.22060572,
-0.03816675, 0.00503534, -0.10069088, 0.24429323, 0.02505454,
0.38344654, 0.09145252, 0.11439045, -0.10801487, -0.01075712,
0.16894275, 0.04799445, 0.3149668 , -0.13885498, 0.02068597,
0.17856079, 0.11587915, -0.11973458, -0.0896498 , 0.11993878,
-0.06647626, -0.15219077, 0.10705566, 0.07842658, 0.31101131,
-0.12788543, 0.09909476, 0.00878725, 0.1618593 , 0.22566552,
0.1297064 , -0.14370884, 0.02069237, 0.08489513, 0.0567583 ,
-0.21860926, 0.01057386, 0.03844477, 0.06213358, -0.12877114,
0.02327059, -0.00917741, 0.11733869, 0.35853127, -0.25572705,
0.30879059, -0.20568153, -0.12405248, 0.03546307, -0.18377842,
-0.06700096, 0.00626029, -0.42848313, -0.13129929, -0.04215423,
0.26977378, -0.07725398, 0.01177794, 0.05952175, 0.21516307,
0.01055368, 0.06727242, 0.05038245, 0.06739338, -0.53844106,
0.02834721, -0.33890292, -0.02644366, 0.03540507, -0.16382404,
-0.01353777, -0.02502321, 0.00226415, -0.24348356, -0.12502551,
0.01489578, -0.37660655, -0.05798845, 0.28748602, -0.18512824,
-0.06250153, -0.06967189, 0.14023623, 0.09628384, -0.09925015,
-0.07317897, -0.14045765, -0.14597888, 0.24456802, 0.173549 ,
0.3357946 , 0.0424754 , 0.00723427, -0.02120454, -0.14892557,
-0.26496273, 0.14844348, 0.06555442, 0.11951103, 0.03691757,
-0.36404395, -0.32292312, 0.09412326, -0.06377046, -0.02561374,
0.24361259, 0.02616721, 0.29151902, 0.1178301 , 0.03284379,
0.20218852, 0.0337379 , -0.14703217, 0.02869225, -0.31447497,
-0.15038867, -0.23353554, 0.41700551, 0.11959957, 0.26917797,
0.04590914, 0.16029988, -0.18795538, -0.01343729, -0.10532234,
-0.02617499, 0.12019841, 0.00673278, -0.0070972 , -0.03176219,
-0.07582191, -0.07277017, 0.09928112, 0.36159652, -0.14404564,
-0.21233276, 0.46463615, 0.01645906, 0.01815237, 0.12149289,
-0.07040837, -0.06278557, -0.29605272, -0.07451538, 0.00487611,
0.00313085, 0.13640559, -0.02045129, -0.05790693, -0.22582445,
-0.10382047, -0.13318184, -0.05160375, 0.01498237, -0.15075362,
0.14116266, -0.36445442, 0.1420894 , 0.11182524, 0.10055254,
0.33450282, -0.08930281, 0.15410167, 0.03961684, 0.06431124,
-0.15608449, -0.1599745 , 0.3780185 , 0.18073064, 0.2190931 ,
-0.16039631, -0.03769958, -0.00069833, 0.06914425, -0.33746576,
0.11075038, 0.11626988, -0.12498619, 0.07928085, 0.0636186 ,
-0.6352759 , -0.10650127, 0.03810085, 0.14585988, -0.01552053,
-0.01488287, 0.04300846, -0.00500007, 0.26444513, -0.03629581,
-0.04127173, -0.23304868, -0.08911316, 0.0029219 , 0.27401808,
0.00279731, -0.04162024, 0.00214672, 0.15316918, -0.14298579,
-0.15343791], dtype=float32)
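The printed components suggest that the `flies|NOUN` vector is roughly a 0.7× scaled copy of the `flies|VERB` vector. For example, -0.7186 vs -0.5380 and -0.8935 vs -0.6353. Cosine similarity ignores vector length, so two nearly parallel vectors score close to 1.0 even when their norms differ. Here is a minimal sketch for checking this. It assumes the gensim model is loaded as `new_model` as above. The helper `compare_senses` is hypothetical, and the toy vectors are synthetic, not taken from the model.

```python
import numpy as np

def compare_senses(v1, v2):
    """Return (cosine similarity, norm ratio |v2|/|v1|) for two vectors (hypothetical helper)."""
    v1 = np.asarray(v1, dtype=np.float64)
    v2 = np.asarray(v2, dtype=np.float64)
    n1, n2 = np.linalg.norm(v1), np.linalg.norm(v2)
    cos = float(np.dot(v1, v2) / (n1 * n2))
    return cos, float(n2 / n1)

# With the trained model (not run here):
#   cos, ratio = compare_senses(new_model.wv["flies|VERB"], new_model.wv["flies|NOUN"])
# Synthetic illustration: v2 is v1 scaled by 0.7 plus a small perturbation.
rng = np.random.default_rng(0)
v1 = rng.normal(size=256)
v2 = 0.7 * v1 + 0.01 * rng.normal(size=256)
cos, ratio = compare_senses(v1, v2)
print(f"cosine={cos:.4f} norm ratio={ratio:.3f}")
```

A cosine near 1.0 combined with a norm ratio well below 1.0 would point to two vectors that differ mainly in length, not in direction.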