This document explains how to create text embeddings with Vertex AI's text embeddings API and how to use them in retrieval and vector search workflows. The service produces dense vectors that capture meaning rather than direct word matches, which makes them useful for semantic search, question answering, and similarity ranking. Vectors are normalized, so cosine similarity, dot product, and Euclidean distance all produce the same similarity rankings.
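A small self-contained sketch makes the ranking claim concrete: for unit-length vectors, cosine similarity equals the dot product, and squared Euclidean distance is 2 − 2·dot, so all three metrics order results identically. The vectors below are toy values, not real API output.

```python
import math

def normalize(v):
    """Scale a vector to unit length, as the embeddings API does."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy "embeddings" (hypothetical values, not real model output).
query = normalize([0.2, 0.9, 0.1])
docs = {
    "a": normalize([0.1, 1.0, 0.0]),
    "b": normalize([0.9, 0.1, 0.2]),
    "c": normalize([0.3, 0.7, 0.5]),
}

# For unit vectors: cosine == dot, and ||q - d||^2 = 2 - 2 * dot(q, d),
# so ranking by descending dot product matches ranking by ascending distance.
by_dot = sorted(docs, key=lambda k: dot(query, docs[k]), reverse=True)
by_dist = sorted(docs, key=lambda k: euclidean(query, docs[k]))
assert by_dot == by_dist
```

This is why the choice of distance metric in a downstream vector index is largely a matter of convenience when the embeddings are normalized.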
Vertex AI supports several embedding models. The flagship model is `gemini-embedding-001`, which produces up to 3072-dimensional vectors and is designed for state-of-the-art performance across English, multilingual, and code tasks. Two smaller models, `text-embedding-005` and `text-multilingual-embedding-002`, produce up to 768-dimensional vectors and specialize in English and multilingual tasks respectively. Note that `gemini-embedding-001` supports only one input instance per request. Also note the service banner: starting April 29, 2025, the Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have no prior usage of those models, including new projects.
The API enforces request limits to protect reliability and performance. Each call can include up to 250 input texts, and the overall input is capped at 20,000 tokens; exceeding either limit returns a 400 error. Individual input texts are limited to 2048 tokens and are silently truncated by default, although you can disable silent truncation by setting `autoTruncate` to false, in which case over-length inputs fail instead. Developers can also reduce storage and compute costs by setting `output_dimensionality` to produce smaller embedding vectors; smaller vectors often retain much of the retrieval quality while saving space.
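The limits above translate directly into client-side batching and request construction. The sketch below splits a corpus into requests of at most 250 texts and builds a predict-style request body; the `autoTruncate` and `outputDimensionality` parameter names are assumed from the REST API, and `build_request` is a hypothetical helper, not part of any SDK.

```python
def batch(texts, size=250):
    """Split inputs so each request stays within the 250-text limit."""
    return [texts[i:i + size] for i in range(0, len(texts), size)]

def build_request(texts, auto_truncate=True, output_dimensionality=None):
    """Build one request body (field names assumed from the REST predict API)."""
    params = {"autoTruncate": auto_truncate}
    if output_dimensionality is not None:
        params["outputDimensionality"] = output_dimensionality
    return {"instances": [{"content": t} for t in texts], "parameters": params}

corpus = [f"document {i}" for i in range(600)]
requests = [
    build_request(chunk, auto_truncate=False, output_dimensionality=256)
    for chunk in batch(corpus)
]
# 600 texts split into 3 requests of at most 250 instances each.
```

Note that batching keeps each request under the 250-text cap but does not guard the 20,000-token total; long documents may need smaller batches.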
Practical integration is demonstrated with the Python GenAI SDK, including example environment variables and a sample `embed_content` call that requests embeddings for multiple strings along with optional metadata such as title and task type. After generating embeddings, you can persist them in a vector database such as Vertex AI Vector Search for low-latency retrieval as your dataset grows. The documentation also links to deeper resources, including model reference pages, supported languages, rate limits, batch prediction guides, tuning tips, and the research behind the embedding models.
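A minimal sketch of such a call, assuming the `google-genai` SDK is installed and the environment is already authenticated against a Vertex AI project (the default values and comments here are assumptions, not verbatim documentation):

```python
def embed_texts(texts, task_type="RETRIEVAL_DOCUMENT", dims=768):
    """Sketch of an embed_content call with the google-genai SDK.

    Assumes the SDK is installed and configured for Vertex AI, e.g. via the
    GOOGLE_GENAI_USE_VERTEXAI / GOOGLE_CLOUD_PROJECT / GOOGLE_CLOUD_LOCATION
    environment variables.
    """
    from google import genai          # deferred: requires the google-genai package
    from google.genai import types

    client = genai.Client()           # picks up configuration from the environment
    response = client.models.embed_content(
        model="text-embedding-005",   # accepts multiple texts per request
        contents=texts,
        config=types.EmbedContentConfig(
            task_type=task_type,               # e.g. RETRIEVAL_DOCUMENT, RETRIEVAL_QUERY
            output_dimensionality=dims,        # smaller vectors cut storage cost
        ),
    )
    return [e.values for e in response.embeddings]
```

The returned lists of floats can then be upserted into a vector store such as Vertex AI Vector Search, keyed by document ID, for low-latency retrieval.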
