Also, the Embeddings module of Google's Machine Learning Crash Course (Foundational courses) and Meet AI's multitool: Vector embeddings by Dale Markowitz are great materials for learning more about embeddings.
LLM text embedding business use cases
With the embeddings API, you can apply embeddings, combined with LLM capabilities, to a variety of text processing tasks, such as:
LLM-enabled Semantic Search: Text embeddings can represent the meaning and intent of both a user's query and the documents in the same embedding space, so vector search technology can quickly find documents whose meaning matches the query's intent. The model generates text embeddings that capture the subtle nuances of each sentence and paragraph in a document.
LLM-enabled Text Classification: LLM text embeddings can be used to classify text with a deep understanding of different contexts, without any training or fine-tuning (so-called zero-shot learning). Earlier language models could not do this without task-specific training.
LLM-enabled Recommendation: Text embeddings can serve as strong features for training recommendation models such as the Two-Tower model, which learns the relationship between query and candidate embeddings. The result is a next-generation user experience with semantic product recommendations.
LLM-enabled Clustering, Anomaly Detection, Sentiment Analysis, and more: these tasks can also benefit from the same LLM-level semantic understanding.
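Semantic search and zero-shot classification both come down to comparing embedding vectors. The sketch below assumes the embeddings have already been fetched from the API: the tiny 3-dimensional vectors and document texts are hypothetical stand-ins for real 768-dimensional embeddings, and the brute-force scan stands in for a real vector search service. For classification it uses one common approach (embed a short description of each label and pick the nearest one), which is an assumption on our part rather than a method described in the post.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_vec, doc_vecs, k=1):
    """Return indices of the k documents most similar to the query (brute-force scan)."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

def zero_shot_classify(text_vec, label_vecs):
    """Pick the label whose description embedding is closest to the text embedding."""
    return max(label_vecs, key=lambda label: cosine_similarity(text_vec, label_vecs[label]))

# Hypothetical toy embeddings standing in for 768-dimensional API output.
doc_texts = [
    "How do I reverse a list in Python?",
    "Best hiking trails near Denver",
    "Sorting a dict by value in Python",
]
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.7, 0.5, 0.1]]

query_vec = [0.85, 0.2, 0.05]  # hypothetical embedding of "python list tricks"
hits = semantic_search(query_vec, doc_vecs, k=2)
print([doc_texts[i] for i in hits])  # the two Python questions

# Hypothetical embeddings of short label descriptions.
label_vecs = {"programming": [1.0, 0.1, 0.0], "outdoors": [0.0, 0.1, 1.0]}
print(zero_shot_classify([0.1, 0.2, 0.95], label_vecs))  # outdoors
```

At the scale of millions of documents, the vectors would typically live in an approximate nearest neighbor index instead of being scanned one by one; the scoring idea stays the same.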
Sorting 8 million texts at “librarian-level” precision
Vertex AI Embeddings for Text has an embedding space with 768 dimensions. As explained in the video above, this space is a huge map of a wide variety of the world's texts, organized by meaning. For each input text, the model finds its location (embedding) on the map.
The API accepts up to 3,072 input tokens, so it can digest the overall meaning of a long text, or even programming code, and represent it as a single embedding. It is like having a librarian with knowledge of a wide variety of industries read through millions of texts carefully and sort them into millions of nano-categories that distinguish even subtle differences in nuance.
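The "nano-categories" are a metaphor for neighborhoods in the embedding space; the model itself only returns vectors. If you want explicit groups, one common downstream step is k-means clustering over the embeddings. The following is a minimal sketch of Lloyd's algorithm in plain Python, assuming you choose the number of clusters `k` yourself; the 2-dimensional vectors are hypothetical stand-ins for real embeddings.

```python
import math
import random

def kmeans(vectors, k, iterations=20, seed=0):
    """Cluster vectors with Lloyd's algorithm; return one cluster index per vector."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid (Euclidean distance).
        assignments = [
            min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments

# Hypothetical toy embeddings: two Python questions and two hiking questions.
vectors = [[0.9, 0.1], [0.95, 0.05], [0.1, 0.9], [0.05, 0.95]]
print(kmeans(vectors, k=2))
```

Euclidean distance is used here for simplicity; if the embeddings are unit-normalized, it ranks neighbors the same way cosine similarity does.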
By visualizing the embedding space, you can observe how the model sorts texts with "librarian-level" precision. Nomic AI provides a platform called Atlas for storing, visualizing, and interacting with embedding spaces at scale through a smooth UI, and they worked with Google to visualize the embedding space of 8 million Stack Overflow questions. You can explore the space in your browser on this page, courtesy of Nomic AI, zooming in and out down to individual data points.
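Drawing a map like this requires squeezing 768 dimensions down to two. Atlas uses its own layout algorithms for that; as a much simpler stand-in for illustration, principal component analysis (PCA) can project embeddings onto 2-D coordinates for plotting. This sketch computes the top two components with power iteration in plain Python. It is not what Atlas does, and for millions of vectors you would use an optimized numerical library instead; the 3-D input points are hypothetical.

```python
import math
import random

def pca_2d(vectors, iterations=100, seed=0):
    """Project vectors onto their top two principal components (power iteration + deflation)."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(col) / n for col in zip(*vectors)]
    centered = [[x - m for x, m in zip(v, mean)] for v in vectors]
    # d x d covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        vec = [rng.random() for _ in range(d)]
        for _ in range(iterations):
            # Repeated multiplication by cov converges to its dominant eigenvector.
            nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in nxt)) or 1.0
            vec = [x / norm for x in nxt]
        eigval = sum(vec[i] * cov[i][j] * vec[j] for i in range(d) for j in range(d))
        components.append(vec)
        # Deflation: subtract the found component so the next pass finds the second one.
        cov = [[cov[i][j] - eigval * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    # 2-D coordinates: projection of each centered vector onto the two components.
    return [tuple(sum(c * x for c, x in zip(comp, v)) for comp in components) for v in centered]

# Hypothetical toy embeddings forming two groups.
points = [[1.0, 0.0, 0.0], [1.1, 0.0, 0.1], [0.0, 0.0, 1.0], [0.1, 0.0, 1.1]]
for x, y in pca_2d(points):
    print(f"{x:+.3f} {y:+.3f}")
```

Points that are close in the original space stay close on the 2-D plot, which is what lets a map like Atlas's show related questions side by side.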