
Similarity and Distance Metrics
sim_metrics.Rd
Metrics for measuring relationships between vector embeddings.
Usage
dot_prod(x, y)
cos_sim(x, y)
euc_dist(x, y)
minkowski_dist(x, y, p = 1)
anchored_sim(x, pos, neg)
Arguments
- x
a numeric vector
- y
a numeric vector the same length as x
- p
the order of the norm used to compute the Minkowski distance; defaults to 1 (the Manhattan distance)
- pos, neg
a pair of numeric vectors the same length as x; the positive and negative ends of the anchored vector
Details
dot_prod gives the dot product. cos_sim gives the cosine similarity (i.e.
the dot product of two normalized vectors). euc_dist gives the Euclidean
distance. minkowski_dist gives the Minkowski distance of order p, which
reduces to the Manhattan distance when p = 1 and to the Euclidean distance
when p = 2. anchored_sim gives the position of x on the spectrum between two
anchor points, where vectors aligned with pos are given a score of 1 and those
aligned with neg are given a score of 0. For more on anchored vectors, see
Data Science for Psychology: Natural Language, Chapter 20.
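As a rough illustration of these definitions, the base R sketch below re-implements each metric by hand. The `*_sketch` names are hypothetical and are not the package's functions. The anchored similarity in particular is an assumed formulation (the projection of x - neg onto pos - neg, scaled so that pos scores 1 and neg scores 0); it is consistent with the description above but may differ from the actual implementation.

```r
# Hypothetical re-implementations for illustration only (not the package code)
dot_prod_sketch <- function(x, y) sum(x * y)

cos_sim_sketch <- function(x, y) {
  # dot product of the two vectors after scaling each to unit length
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

euc_dist_sketch <- function(x, y) sqrt(sum((x - y)^2))

minkowski_dist_sketch <- function(x, y, p = 1) sum(abs(x - y)^p)^(1 / p)

anchored_sim_sketch <- function(x, pos, neg) {
  # ASSUMPTION: scaled projection onto the neg -> pos axis
  anchor <- pos - neg
  sum((x - neg) * anchor) / sum(anchor^2)
}

x <- c(1, 2, 3)
y <- c(4, 5, 6)

dot_prod_sketch(x, y)                # 32
minkowski_dist_sketch(x, y)          # 9 (Manhattan distance, p = 1)
minkowski_dist_sketch(x, y, p = 2)   # equals euc_dist_sketch(x, y)

pos <- c(1, 0, 0)
neg <- c(0, 1, 0)
anchored_sim_sketch(pos, pos, neg)   # 1
anchored_sim_sketch(neg, pos, neg)   # 0
```

Under this assumed formulation, a vector halfway between the anchors scores 0.5, and scores outside the 0 to 1 range are possible for vectors that lie beyond either anchor.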
Note that, for a given set of values of x, anchored_sim(x, pos, neg) will
be perfectly correlated with dot_prod(x, pos - neg).
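The correlation claim can be checked numerically. The sketch below uses a hypothetical stand-in, `anchored_sim_sketch`, which assumes the anchored similarity is the scaled projection of x - neg onto pos - neg; this is an assumption made for illustration, not the package's code. Under that assumption the score is a linear function of the dot product with pos - neg, with a positive slope, so the correlation across any set of vectors should be 1 up to floating-point error.

```r
# ASSUMPTION: hypothetical stand-in for anchored_sim (scaled projection)
anchored_sim_sketch <- function(x, pos, neg) {
  anchor <- pos - neg
  sum((x - neg) * anchor) / sum(anchor^2)
}

set.seed(1)
pos <- rnorm(5)                        # arbitrary positive anchor
neg <- rnorm(5)                        # arbitrary negative anchor
X <- matrix(rnorm(100 * 5), ncol = 5)  # 100 random vectors, one per row

# anchored score for each row of X
scores <- apply(X, 1, anchored_sim_sketch, pos = pos, neg = neg)

# dot product of each row with the anchored vector pos - neg
dots <- as.vector(X %*% (pos - neg))

cor(scores, dots)
```

In practice this means the two quantities rank a set of vectors identically; the anchored score only shifts and rescales the dot product so that the anchors fall at 0 and 1.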