Similarity and Distance Metrics
sim_metrics.Rd
Metrics for measuring relationships between vector embeddings.
Usage
dot_prod(x, y)
cos_sim(x, y)
euc_dist(x, y)
minkowski_dist(x, y, p = 1)
anchored_sim(x, pos, neg)
Arguments
- x
a numeric vector
- y
a numeric vector the same length as x
- p
p-norm used to compute the Minkowski distance
- pos, neg
a pair of numeric vectors the same length as x; the positive and negative ends of the anchored vector
Details
dot_prod
gives the dot product. cos_sim
gives the cosine similarity (i.e.
the dot product of two normalized vectors). euc_dist
gives the Euclidean
distance. anchored_sim
gives the position of x
on the spectrum between two
anchor points, where vectors aligned with pos
are given a score of 1 and those
aligned with neg
are given a score of 0. For more on anchored vectors, see
Data Science for Psychology: Natural Language, Chapter 20.
Note that, for a given set of values of x
, anchored_sim(x, pos, neg)
will
be perfectly correlated with dot_prod(x, pos - neg)
.