Get text embeddings on Vertex AI

Use Vertex AI's text embeddings API to generate dense vector representations for semantic search, retrieval, and other AI tasks; the API supports `gemini-embedding-001` and smaller embedding models.

This document explains how to create text embeddings with Vertex AI's text embeddings API and how to use them in retrieval and vector search workflows. The service produces dense vectors that capture meaning rather than direct word matches, which makes them useful for semantic search, question answering, and similarity ranking. The vectors are normalized to unit length, so cosine similarity, dot product, and Euclidean distance all produce the same similarity rankings.
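Why unit-length normalization makes the metric choice irrelevant for ranking can be checked directly. A minimal pure-Python sketch, using made-up toy vectors as stand-ins for real API embeddings:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

# Toy "embeddings" (real vectors come from the API and are already unit-norm).
query = normalize([0.2, 0.9, 0.1])
docs = {"a": normalize([0.1, 0.8, 0.2]), "b": normalize([0.9, 0.1, 0.0])}

# All three metrics yield the same ordering on unit-norm vectors.
by_dot = sorted(docs, key=lambda k: dot(query, docs[k]), reverse=True)
by_cos = sorted(docs, key=lambda k: cosine(query, docs[k]), reverse=True)
by_dist = sorted(docs, key=lambda k: euclidean(query, docs[k]))  # smaller = closer
assert by_dot == by_cos == by_dist
```

In practice this means you can pick whichever metric your vector store computes fastest (usually dot product) without changing retrieval results.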

Vertex AI supports several embedding models. The flagship model is `gemini-embedding-001`, which produces up to 3072-dimensional vectors and is designed for state-of-the-art performance across English, multilingual, and code tasks. Two smaller models, `text-embedding-005` and `text-multilingual-embedding-002`, produce up to 768-dimensional vectors and specialize in English and multilingual tasks, respectively. Note that `gemini-embedding-001` accepts only one input text per request. Also note the service banner: starting April 29, 2025, Gemini 1.5 Pro and Gemini 1.5 Flash models are not available to projects with no prior usage of those models, including new projects.

The API enforces request limits to protect reliability and performance. Each call can include up to 250 input texts, and the total input is capped at 20,000 tokens; exceeding that cap returns a 400 error. Individual input texts are limited to 2,048 tokens and are silently truncated by default, although you can disable silent truncation by setting `autoTruncate` to false. You can also reduce storage and compute costs by setting `output_dimensionality` to produce smaller embedding vectors; smaller vectors often retain much of their utility while saving space.
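These per-request caps can be respected client-side by batching inputs before calling the API. A hypothetical helper sketch (the name `batch_texts` and the pre-computed token counts are assumptions; real counts would come from a tokenizer):

```python
def batch_texts(texts, token_counts, max_texts=250, max_tokens=20_000):
    """Greedily group texts into batches that respect the API's per-request
    caps: up to 250 inputs and 20,000 total input tokens per call.

    token_counts[i] is an externally computed token count for texts[i]
    (assumed here; obtain it from a tokenizer in practice). A single text
    over max_tokens still becomes its own batch and would be rejected or
    truncated by the API, so validate lengths separately.
    """
    batches, current, current_tokens = [], [], 0
    for text, tokens in zip(texts, token_counts):
        if current and (len(current) >= max_texts
                        or current_tokens + tokens > max_tokens):
            batches.append(current)
            current, current_tokens = [], 0
        current.append(text)
        current_tokens += tokens
    if current:
        batches.append(current)
    return batches

# Three 9,000-token texts: the first two fit under 20,000 together,
# the third starts a new batch.
batches = batch_texts(["t1", "t2", "t3"], [9000, 9000, 9000])
```

Each resulting batch can then be sent as one embeddings request; remember that `gemini-embedding-001` takes a single input per request, so batching in this form applies to the smaller models.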

Practical integration is demonstrated with the Gen AI SDK for Python, including example environment variables and a sample `embed_content` call that requests embeddings for multiple strings along with optional metadata such as title and task type. After generating embeddings, you can persist them in a vector database such as Vertex AI Vector Search for low-latency retrieval as your dataset grows. The documentation also links to deeper resources, including model reference pages, supported languages, rate limits, batch prediction guides, tuning tips, and the research behind the embedding models.
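Before the dataset grows enough to warrant a managed service like Vertex AI Vector Search, a small corpus of stored embeddings can be served with brute-force search. A minimal sketch, using toy 2-D vectors where real API embeddings would be high-dimensional:

```python
from heapq import nlargest

def top_k(query_vec, index, k=3):
    """Brute-force nearest neighbors by dot product. With unit-norm
    embeddings this matches cosine-similarity ranking. Fine for small
    corpora; swap in a vector database as the dataset grows."""
    return nlargest(
        k, index,
        key=lambda item: sum(q * d for q, d in zip(query_vec, item[1])),
    )

# Toy index of (doc_id, embedding) pairs; real entries would hold
# vectors returned by the embeddings API.
index = [
    ("doc-1", [0.1, 0.9]),
    ("doc-2", [0.9, 0.1]),
    ("doc-3", [0.7, 0.7]),
]
hits = top_k([1.0, 0.0], index, k=2)
```

Exact brute-force scan is O(corpus size) per query; the point of moving to a dedicated vector store is approximate nearest-neighbor indexing that keeps latency low at scale.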

Saudi AI startup launches Arabic LLM

Misraj AI unveiled Kawn, an Arabic large language model, at AWS re:Invent and launched Workforces, a platform for creating and managing AI agents for enterprises and public institutions.

Introducing Mistral 3: open AI models

Mistral 3 is a family of open, multimodal, and multilingual AI models that includes three Ministral edge models and a sparse Mistral Large 3 trained with 41B active and 675B total parameters, released under the Apache 2.0 license.

NVIDIA and Mistral AI partner to accelerate new family of open models

NVIDIA and Mistral AI announced a partnership to optimize the Mistral 3 family of open-source multilingual, multimodal models across NVIDIA supercomputing and edge platforms. The collaboration highlights Mistral Large 3, a mixture-of-experts model designed to improve efficiency and accuracy for enterprise AI deployments starting Tuesday, Dec. 2.
