
Embedding (machine learning)

Embedding in machine learning refers to a representation learning technique that maps complex, high-dimensional data into a lower-dimensional space of numerical vectors.[1]

Technique

The term also denotes the resulting representation itself, in which meaningful patterns or relationships in the original data are preserved. As a technique, embedding learns these vectors from data such as words, images, or user interactions, in contrast to manually designed representations such as one-hot encoding.[2] This process reduces complexity and captures key features without requiring prior knowledge of the domain.
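
For illustration, the following minimal sketch in Python with NumPy contrasts a one-hot encoding with a learned embedding table; the vocabulary, dimensionality, and values are hypothetical, and in practice the table entries are fitted to data rather than drawn at random.

    import numpy as np

    vocabulary = ["cat", "dog", "car"]                 # toy vocabulary (hypothetical)
    index = {word: i for i, word in enumerate(vocabulary)}

    # One-hot encoding: a manually designed, sparse representation.
    # The vector length equals the vocabulary size, and no similarity
    # between different words is expressed.
    def one_hot(word):
        vec = np.zeros(len(vocabulary))
        vec[index[word]] = 1.0
        return vec

    # Learned embedding: a dense table of low-dimensional vectors.
    # Real systems fit these values during training (e.g. with Word2Vec
    # or a neural network); random values stand in for them here.
    embedding_dim = 4
    embedding_table = np.random.normal(size=(len(vocabulary), embedding_dim))

    def embed(word):
        return embedding_table[index[word]]

    print(one_hot("cat"))   # sparse vector, one dimension per word
    print(embed("cat"))     # dense 4-dimensional vector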

Similarity

In natural language processing, words or concepts may be represented as feature vectors, where similar concepts are mapped to nearby vectors. The resulting embeddings vary by type, including word embeddings for text (e.g., Word2Vec), image embeddings for visual data, and knowledge graph embeddings for knowledge graphs, each tailored to tasks like NLP, computer vision, or recommendation systems.[3] This dual role enhances model efficiency and accuracy by automating feature extraction and revealing latent similarities across diverse applications.
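
As a concrete sketch, a Word2Vec-style word embedding can be trained and queried with the gensim library; the toy corpus and hyperparameters below are placeholders, and a corpus this small will not produce meaningful neighbours.

    from gensim.models import Word2Vec

    # Tiny tokenized corpus (illustrative only; real models are trained
    # on large text collections).
    sentences = [
        ["the", "cat", "sat", "on", "the", "mat"],
        ["the", "dog", "sat", "on", "the", "rug"],
        ["a", "cat", "chased", "a", "dog"],
    ]

    # Map each word to a 16-dimensional vector.
    model = Word2Vec(sentences, vector_size=16, window=3, min_count=1, epochs=50)

    vector = model.wv["cat"]                   # the embedding for "cat"
    neighbours = model.wv.most_similar("cat")  # words with the nearest vectors
    print(vector.shape)                        # (16,)
    print(neighbours[:3])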

To compare two embeddings, a similarity measure can be applied to their vectors to quantify how similar the represented concepts are. If the vectors are normalized to a magnitude of 1, the measures become equivalent: the dot product equals the cosine similarity, and the squared Euclidean distance equals 2 − 2 × (cosine similarity), so all three induce the same similarity ranking.[4]

Similarity Measures
Name | Meaning | Formula | Formula (Scalar) | Similarity Correlation
Euclidean distance | Distance between the ends of the vectors | ‖a − b‖ | √(Σᵢ (aᵢ − bᵢ)²) | Negative correlation
Cosine similarity | Cosine of the angle between the vectors | (a · b) / (‖a‖ ‖b‖) | cos(θ) | Positive correlation
Dot product | Cosine similarity multiplied by the lengths of both vectors | a · b | Σᵢ aᵢbᵢ = ‖a‖ ‖b‖ cos(θ) | Positive correlation

Cosine similarity disregards the magnitude of the vectors when determining similarity, so it is less biased towards items that appear very frequently in the training data; the dot product incorporates magnitude, so it tends to favour more popular items.[4] In high-dimensional spaces, distances between vectors tend to concentrate around similar values, so Euclidean distance becomes a less reliable discriminator for large embedding vectors.[5]
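
These relationships can be checked numerically. The following sketch, using NumPy with arbitrary example vectors, computes all three measures and verifies that they coincide once the vectors are normalized to unit length:

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])   # example embeddings (values are arbitrary)
    b = np.array([2.0, 1.0, 4.0])

    euclidean = np.linalg.norm(a - b)                            # smaller means more similar
    cosine = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))   # larger means more similar
    dot = a @ b                                                  # larger means more similar

    # After normalization, the dot product equals the cosine similarity and the
    # squared Euclidean distance equals 2 - 2 * cosine, so the three measures
    # rank pairs of embeddings identically.
    a_hat = a / np.linalg.norm(a)
    b_hat = b / np.linalg.norm(b)
    assert np.isclose(a_hat @ b_hat, cosine)
    assert np.isclose(np.linalg.norm(a_hat - b_hat) ** 2, 2 - 2 * cosine)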

See also

References

  1. ^ Bengio, Yoshua; Ducharme, Réjean; Vincent, Pascal (2003). "A Neural Probabilistic Language Model". Journal of Machine Learning Research. 3: 1137–1155.
  2. ^ Mikolov, Tomas; Chen, Kai; Corrado, Greg; Dean, Jeffrey (2013). Efficient Estimation of Word Representations in Vector Space. International Conference on Learning Representations (ICLR).
  3. ^ "What are Embedding in Machine Learning?". GeeksforGeeks. 2024-02-15. Retrieved 2025-02-28.
  4. ^ a b "Measuring similarity from embeddings". Google Machine Learning Education. Retrieved 21 September 2025.
  5. ^ Krantz, Tom; Jonker, Alexandra. "What is cosine similarity?". IBM Think. Retrieved 21 September 2025.