Knowledge-graph embeddings

How to learn vector representations of entities and relations so that link prediction becomes a geometric operation. Covers TransE, DistMult, ComplEx, and RotatE — and why the scoring function determines which relation patterns a model can express.

Reviewed · 4 min read

One-line definition

Knowledge-graph embeddings map entities and relations of a graph of (head, relation, tail) triples into a continuous vector space, with a scoring function that ranks true triples above false ones — turning link prediction into a geometric / algebraic operation.

Why it matters

Knowledge graphs (entities like titles, people, genres, products linked by typed relations) power recommendation, search, and question answering. Embedding them lets you predict missing links (“which genre is this new title?”), compute entity similarity, and inject structured side-information into recsys and RAG. The interview angle is sharp: the choice of scoring function determines which relation patterns (symmetry, antisymmetry, inversion, composition) the model can represent — a clean test of representational reasoning.

The task: knowledge-graph completion

A KG is a set of triples $(h, r, t)$, e.g. (Inception, directed_by, Nolan). Graphs are radically incomplete, so the goal is link prediction: score candidate triples and rank the true tail (or head) highly. Models are trained with a margin / ranking loss against negative samples (corrupt $h$ or $t$) and evaluated with Mean Reciprocal Rank (MRR) and Hits@k.
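The training recipe above (corrupt one side of a true triple, then push the true triple's score above the corrupted one by a margin) can be sketched as follows. This is a minimal illustration assuming NumPy; the names `corrupt` and `margin_ranking_loss`, the toy integer ids, and the margin of 1.0 are all assumptions made for the example, and the sketch does not filter out corruptions that happen to be true triples.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(triples, num_entities, rng):
    """Replace either the head or the tail of each triple with a random entity."""
    corrupted = triples.copy()
    replace_head = rng.random(len(triples)) < 0.5
    random_entities = rng.integers(0, num_entities, size=len(triples))
    corrupted[replace_head, 0] = random_entities[replace_head]
    corrupted[~replace_head, 2] = random_entities[~replace_head]
    return corrupted

def margin_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Hinge loss: zero once each true triple outscores its negative by `margin`."""
    return float(np.maximum(0.0, margin - pos_scores + neg_scores).mean())

# Toy triples stored as (head_id, relation_id, tail_id) rows.
triples = np.array([[0, 0, 1], [2, 0, 3], [4, 1, 0]])
negatives = corrupt(triples, num_entities=5, rng=rng)

# Hypothetical scores, as any of the scoring functions would produce them.
loss_satisfied = margin_ranking_loss(np.array([2.0]), np.array([0.0]))
loss_violated = margin_ranking_loss(np.array([0.0]), np.array([0.0]))
```

The relation column is never corrupted here, so each negative differs from its source triple in at most one entity slot.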

The four models to know

TransE — translation

Model the relation as a translation in embedding space:

$$f(h, r, t) = -\lVert h + r - t \rVert$$

so a true triple satisfies $h + r \approx t$. Simple, scalable, intuitive. Limitation: it cannot model symmetric relations (would force $r \approx 0$) or one-to-many / many-to-one relations (many valid tails collapse to one point).
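The symmetry limitation is easy to check numerically. Below is a minimal sketch assuming NumPy and an L2 norm (the norm choice, the function name `transe_score`, and the toy vectors are assumptions for illustration). The score is the negative distance between the translated head `h + r` and the tail `t`.

```python
import numpy as np

def transe_score(h, r, t):
    """Negative L2 distance between the translated head and the tail."""
    return -np.linalg.norm(h + r - t)

h = np.array([1.0, 0.0])
r = np.array([0.5, 0.5])
t = h + r  # a tail that fits the translation exactly

forward = transe_score(h, r, t)   # perfect fit: distance zero
backward = transe_score(t, r, h)  # reversed triple: off by 2r

# Only a zero translation scores both directions identically,
# which is why a symmetric relation pushes r toward 0.
zero = np.zeros(2)
sym_forward = transe_score(h, zero, t)
sym_backward = transe_score(t, zero, h)
```

With a non-zero `r`, the reversed triple is penalized by the length of `2r`, so both directions cannot be true at once.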

DistMult — bilinear diagonal
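$$f(h, r, t) = \langle h, r, t \rangle = \sum_i h_i \, r_i \, t_i$$

Efficient, captures pairwise feature interactions. Limitation: the score is symmetric in $h$ and $t$, so it cannot distinguish $(h, r, t)$ from $(t, r, h)$, which makes it useless for antisymmetric relations like parent_of.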
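A short numerical check of that symmetry, assuming NumPy and random toy embeddings (the function name `distmult_score` and the dimension 8 are illustrative choices). The score is the three-way element-wise product of head, relation, and tail, summed over dimensions.

```python
import numpy as np

def distmult_score(h, r, t):
    """Sum over dimensions of the element-wise product h * r * t."""
    return float(np.sum(h * r * t))

rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 8))  # three random 8-dimensional vectors

forward = distmult_score(h, r, t)
backward = distmult_score(t, r, h)  # equal to `forward` up to rounding
```

Because multiplication commutes, swapping head and tail cannot change the score, whatever values the relation vector takes.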

ComplEx — complex bilinear

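Move embeddings into $\mathbb{C}^d$ and use the Hermitian product:

$$f(h, r, t) = \operatorname{Re}\big(\langle h, r, \bar{t} \rangle\big) = \operatorname{Re}\Big(\sum_i h_i \, r_i \, \bar{t}_i\Big)$$

The complex conjugate breaks the symmetry, so ComplEx handles symmetric and antisymmetric relations. It is a strict generalization of DistMult, which is the special case where all imaginary parts are zero.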
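To see how the conjugate buys both behaviors, here is a minimal sketch assuming NumPy complex arrays (the name `complex_score` and the random toy vectors are assumptions). The score is the real part of the sum of head times relation times conjugated tail. A purely real relation vector gives a symmetric score, and a purely imaginary one gives a score that flips sign when head and tail are swapped.

```python
import numpy as np

def complex_score(h, r, t):
    """Real part of the sum of h * r * conj(t)."""
    return float(np.real(np.sum(h * r * np.conj(t))))

rng = np.random.default_rng(0)
h = rng.normal(size=4) + 1j * rng.normal(size=4)
t = rng.normal(size=4) + 1j * rng.normal(size=4)

r_real = rng.normal(size=4).astype(complex)  # purely real relation
r_imag = 1j * rng.normal(size=4)             # purely imaginary relation

sym_forward = complex_score(h, r_real, t)
sym_backward = complex_score(t, r_real, h)    # same score: symmetric

anti_forward = complex_score(h, r_imag, t)
anti_backward = complex_score(t, r_imag, h)   # opposite sign: antisymmetric
```

A general relation vector mixes real and imaginary parts, so training can place each relation anywhere between these two extremes.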

RotatE — rotation in complex space
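$$f(h, r, t) = -\lVert h \circ r - t \rVert, \qquad |r_i| = 1$$

Each relation is an element-wise rotation (unit-modulus complex multiply), so a true triple satisfies $t \approx h \circ r$. Rotations compose and invert, so RotatE can express symmetry, antisymmetry, inversion, and composition, making it the most expressive of the four on relation patterns.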
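The pattern claims follow from basic properties of unit-modulus complex numbers, which the sketch below checks numerically. It assumes NumPy; the helper names `rotation` and `rotate_score`, the L2 norm, and the random toy vectors are illustrative assumptions. A relation is a vector of phases, and applying it multiplies each head coordinate by a point on the unit circle.

```python
import numpy as np

def rotation(phases):
    """Unit-modulus complex vector built from per-dimension phase angles."""
    return np.exp(1j * np.asarray(phases))

def rotate_score(h, r, t):
    """Negative distance between the rotated head and the tail."""
    return -np.linalg.norm(h * r - t)

rng = np.random.default_rng(0)
h = rng.normal(size=4) + 1j * rng.normal(size=4)
r1 = rotation(rng.uniform(0, 2 * np.pi, size=4))
r2 = rotation(rng.uniform(0, 2 * np.pi, size=4))

# Composition: applying r1 then r2 equals the single rotation r1 * r2.
mid = h * r1
t = mid * r2
composed = rotate_score(h, r1 * r2, t)

# Inversion: the conjugate rotation undoes r1.
inverted = rotate_score(mid, np.conj(r1), h)

# Symmetry: phases of 0 or pi give r * r = 1, so both directions fit.
r_sym = rotation([0.0, np.pi, np.pi, 0.0])
t_sym = h * r_sym
sym_forward = rotate_score(h, r_sym, t_sym)
sym_backward = rotate_score(t_sym, r_sym, h)
```

All four scores come out at zero up to floating-point error, meaning each triple is fit exactly.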

Which patterns each model expresses

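| Model    | Space            | Symmetry | Antisymmetry | Inversion | Composition |
|----------|------------------|----------|--------------|-----------|-------------|
| TransE   | $\mathbb{R}^d$   | ✗        | ✓            | ✓         | ✓           |
| DistMult | $\mathbb{R}^d$   | ✓        | ✗            | ✗         | ✗           |
| ComplEx  | $\mathbb{C}^d$   | ✓        | ✓            | ✓         | ✗           |
| RotatE   | $\mathbb{C}^d$   | ✓        | ✓            | ✓         | ✓           |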

This table is the interview answer: pick the model by which relation patterns your graph contains.

Where this fits in recsys / RAG

  • Recsys side-information: embed a catalog KG (titles, actors, genres) and concatenate entity embeddings with user/item collaborative-filtering vectors to fight cold-start and add semantics.
  • Beyond shallow embeddings: R-GCN and other relational GNNs generalize these scoring functions with message passing; node2vec / metapath2vec learn embeddings from random walks.
  • KG + LLM: structured triples ground LLM answers and constrain RAG retrieval.
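The side-information idea in the first bullet can be sketched as a simple feature concatenation. This is only an illustration assuming NumPy: the random arrays stand in for trained collaborative-filtering factors and trained KG entity embeddings, and the names, dimensions, and the choice to L2-normalize each block before concatenating are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
num_items, cf_dim, kg_dim = 5, 8, 4

# Stand-ins for trained vectors (random here, purely for illustration).
cf_item_vectors = rng.normal(size=(num_items, cf_dim))
kg_entity_vectors = rng.normal(size=(num_items, kg_dim))

# A cold-start item has no interactions, so its CF vector carries no signal.
cold_item = 4
cf_item_vectors[cold_item] = 0.0

def l2_normalize(x):
    """Scale each row to unit length, leaving all-zero rows at zero."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)

# Concatenate the two views; the KG block still describes the cold item.
item_features = np.concatenate(
    [l2_normalize(cf_item_vectors), l2_normalize(kg_entity_vectors)], axis=1
)
```

The cold-start item ends up with a zero collaborative block but a fully populated KG block, which is what lets a downstream model still place it near semantically related items.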

What an interviewer expects you to say

  1. Frame the task as link prediction over (h, r, t) triples, trained with negative sampling and a ranking loss, evaluated with MRR / Hits@k.
  2. Give TransE ($h + r \approx t$) and immediately name its failure on symmetric and 1-to-many relations.
  3. Explain that DistMult is symmetric (can’t do antisymmetry), ComplEx fixes it via complex conjugation, and RotatE models relations as rotations to also capture composition.
  4. Tie model choice to relation patterns in the data.
  5. Bonus: connect to GNNs (R-GCN) and to recsys cold-start / RAG grounding.

Common confusions

  • “More dimensions is the main lever.” The scoring function’s inductive bias matters more than dimensionality — DistMult literally cannot represent antisymmetry at any width.
  • “TransE handles any relation.” It breaks on symmetric and one-to-many / many-to-one relations by construction.
  • “These are just word embeddings.” They jointly embed entities and typed relations with a relation-specific operator, not a single similarity space.
  • “Link prediction is classification.” It’s a ranking problem over corrupted negatives; accuracy is the wrong metric, MRR/Hits@k are standard.
  • “KG embeddings replaced GNNs.” They’re the shallow end; relational GNNs add message passing and usually win when neighborhood structure is rich.
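Since the ranking metrics come up repeatedly, here is a minimal sketch of how MRR and Hits@k are computed from per-query candidate scores. It assumes NumPy; the function names, the toy score matrix, and the optimistic tie handling (ties do not hurt the rank) are assumptions, and the common "filtered" setting that removes other known true triples from the candidates is left out.

```python
import numpy as np

def rank_of_true(scores, true_index):
    """1-based rank of the true candidate, where a higher score is better."""
    return int(np.sum(scores > scores[true_index])) + 1

def mrr(ranks):
    """Mean of the reciprocal ranks."""
    return float(np.mean(1.0 / np.asarray(ranks, dtype=float)))

def hits_at_k(ranks, k):
    """Fraction of queries whose true candidate is ranked in the top k."""
    return float(np.mean(np.asarray(ranks) <= k))

# Each row holds the scores of all candidate tails for one test query.
scores = np.array([
    [0.9, 0.1, 0.3, 0.2],  # true tail is index 0
    [0.2, 0.8, 0.5, 0.1],  # true tail is index 2
    [0.4, 0.3, 0.2, 0.1],  # true tail is index 3
])
true_tails = [0, 2, 3]

ranks = [rank_of_true(s, i) for s, i in zip(scores, true_tails)]
mean_reciprocal_rank = mrr(ranks)
hits_1 = hits_at_k(ranks, 1)
hits_3 = hits_at_k(ranks, 3)
```

Here the ranks are 1, 2, and 4, so MRR is (1 + 1/2 + 1/4) / 3, Hits@1 is 1/3, and Hits@3 is 2/3.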

Related: Graph neural networks, Word embeddings, Negative sampling strategies, Matrix factorization for recsys, Content-based filtering.