Finding The Distance Between 'doctag' And 'infer_vector' With Gensim Doc2vec?
Using Gensim's Doc2Vec how would I find the distance between a Doctag and an infer_vector()? Many thanks
Solution 1:
Doctag is the internal name for the keys to doc-vectors. The result of an infer_vector() operation is a vector. So as you've literally asked, these aren't comparable.
You could ask a model for a known doc-vector, by its doc-tag key that was supplied during training, via model.docvecs[doctag]. That would be comparable to the result of an infer_vector() call.
With two vectors in hand, you can use scipy routines to calculate various kinds of distance. For example:
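from scipy.spatial.distance import cosine as cosine_distance

# doc-vector learned during training, looked up by its doc-tag key
vec_by_doctag = model.docvecs["doc0007"]
# vector inferred for a new, already-tokenized text
vec_by_inference = model.infer_vector(['a', 'cat', 'was', 'in', 'a', 'hat'])
dist = cosine_distance(vec_by_doctag, vec_by_inference)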
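If it helps to see what that number means: cosine distance is 1 minus the cosine of the angle between the two vectors. Here is a minimal pure-Python sketch of the calculation. The short lists are made-up stand-ins for the real doc-vector and inferred vector, not output from any model.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical stand-ins for model.docvecs["doc0007"] and model.infer_vector(...)
vec_by_doctag = [1.0, 0.0, 2.0]
vec_by_inference = [2.0, 0.0, 4.0]   # same direction, different length
orthogonal = [0.0, 3.0, 0.0]         # at a right angle to vec_by_doctag

same_direction = cosine_distance(vec_by_doctag, vec_by_inference)
right_angle = cosine_distance(vec_by_doctag, orthogonal)
```

Vectors pointing the same way give a distance near 0 regardless of their lengths, and vectors at right angles give a distance near 1.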
You can also look at how gensim's Doc2VecKeyedVectors computes similarity/distance between vectors that are known (by their doctag key names) inside a model, in its similarity() and distance() functions.
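The same idea can be adapted to compare a known doctag against an arbitrary vector. A rough sketch, assuming the usual approach of scaling both vectors to unit length, taking the dot product as the similarity, and using 1 minus that as the distance (I believe this is roughly what those methods do, but check the source for your gensim version). The dict doc_vectors and both helper names are hypothetical stand-ins, not gensim's API.

```python
import math

def unit(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_to_vector(doc_vectors, doctag, vec):
    """Cosine similarity between a stored doc-vector and any other vector."""
    known = unit(doc_vectors[doctag])
    other = unit(vec)
    return sum(a * b for a, b in zip(known, other))

def distance_to_vector(doc_vectors, doctag, vec):
    """Distance defined as 1 - similarity."""
    return 1.0 - similarity_to_vector(doc_vectors, doctag, vec)

# Hypothetical stand-in for a trained model's doc-vectors, keyed by doctag
doc_vectors = {"doc0007": [3.0, 4.0], "doc0008": [-3.0, -4.0]}
inferred = [6.0, 8.0]  # stand-in for an infer_vector() result

near = distance_to_vector(doc_vectors, "doc0007", inferred)
far = distance_to_vector(doc_vectors, "doc0008", inferred)
```

With a real model you would pass the doc-vector lookup and the infer_vector() result in place of the made-up values.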