Handling high cardinality categorical variables with embeddings

This talk highlights the applications and pitfalls of using categorical embeddings (learned with deep learning or matrix factorization), both in general and as inputs to gradient boosting models, which is a form of transfer learning. Because word vectors are already commonplace, the talk focuses only on general categorical variables with high cardinality. It mixes some theory with practice in R and draws on my experience of generating such embeddings.
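To make "embeddings from matrix factorization" concrete, here is a minimal sketch. It is not the speaker's method and it is written in Python rather than R. It assumes one simple recipe: count how often each level of a high-cardinality variable co-occurs with the levels of another categorical variable, factorize that count matrix with a truncated SVD, and use each level's row as a dense feature vector for a gradient boosting model. The function names `cooccurrence_embeddings` and `encode`, the zero-vector fallback for unseen levels, and the district/property-type toy data are all hypothetical illustrations.

```python
import numpy as np

def cooccurrence_embeddings(cat_a, cat_b, k=2):
    """Embed each level of cat_a by a truncated SVD of its co-occurrence
    profile with cat_b (an illustrative matrix-factorization recipe)."""
    levels_a, idx_a = np.unique(cat_a, return_inverse=True)
    levels_b, idx_b = np.unique(cat_b, return_inverse=True)
    # Count matrix: rows are levels of cat_a, columns are levels of cat_b.
    counts = np.zeros((len(levels_a), len(levels_b)))
    np.add.at(counts, (idx_a.ravel(), idx_b.ravel()), 1.0)
    # Row-normalise so frequent levels do not dominate, then dampen with log1p.
    profiles = np.log1p(counts / counts.sum(axis=1, keepdims=True))
    u, s, _vt = np.linalg.svd(profiles, full_matrices=False)
    k = min(k, len(s))
    # Keep the top-k components, scaled by their singular values.
    vectors = u[:, :k] * s[:k]
    return levels_a, vectors

def encode(values, levels_a, vectors):
    """Map raw category values to embedding rows for use as model features."""
    lookup = {str(level): i for i, level in enumerate(levels_a)}
    out = np.zeros((len(values), vectors.shape[1]))
    for row, value in enumerate(values):
        # Levels unseen at fit time keep an all-zero vector (one simple fallback).
        if value in lookup:
            out[row] = vectors[lookup[value]]
    return out

# Hypothetical toy data: district IDs co-occurring with property types.
districts = ["d1", "d1", "d2", "d2", "d3", "d3"]
types = ["flat", "flat", "house", "house", "flat", "house"]
levels, vecs = cooccurrence_embeddings(districts, types, k=2)
X = encode(["d1", "d3", "d9"], levels, vecs)
print(X.shape)
```

The resulting columns of `X` would be appended to the other features of a boosting model in place of one-hot or ID-encoded columns. One pitfall applies here as in the talk's setting: levels that are rare or absent when the embeddings are built get unreliable or placeholder vectors.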

Burghard Tamás
Data scientist, Ingatlan.com Zrt.

Tamás originally studied computer science and mathematics at the University of Szeged (Hungary) but ended up as an economist. Fifteen years later, he boosted his career with a degree in Business Analytics from CEU (Hungary/USA). He joined ingatlan.com in 2019 as a data scientist, where he focuses on machine learning and evangelizing data science.
