| 1 | +--- |
| 2 | +title: Quickstart - Vector Search with Java |
| 3 | +description: Learn how to use vector search in Azure DocumentDB with Java. Store and query vector data efficiently in your applications. |
| 4 | +author: diberry |
| 5 | +ms.author: diberry |
| 6 | +ms.reviewer: khelanmodi |
| 7 | +ms.devlang: java |
| 8 | +ms.topic: quickstart-sdk |
| 9 | +ms.date: 02/12/2026 |
| 10 | +ai-usage: ai-assisted |
| 11 | +ms.custom: |
| 12 | + - devx-track-java |
| 13 | + - devx-track-java-ai |
| 14 | + - devx-track-data-ai |
| 15 | +# CustomerIntent: As a developer, I want to learn how to use vector search in Java applications with Azure DocumentDB. |
| 16 | +--- |
| 17 | + |
| 18 | +# Quickstart: Vector search with Java in Azure DocumentDB |
| 19 | + |
| 20 | +Learn to use vector search in Azure DocumentDB with the Java MongoDB driver to store and query vector data efficiently. |
| 21 | + |
| 22 | +This quickstart provides a guided tour of key vector search techniques using a [Java sample app](https://github.com/Azure-Samples/documentdb-samples/tree/main/ai/vector-search-java) on GitHub. |
| 23 | + |
| 24 | +The app uses a sample hotel dataset in a JSON file with pre-calculated vectors from the `text-embedding-3-small` model, though you can also generate the vectors yourself. The hotel data includes hotel names, locations, descriptions, and vector embeddings. |
| 25 | + |
| 26 | +## Prerequisites |
| 27 | + |
| 28 | +[!INCLUDE[Prerequisites - Vector Search Quickstart](includes/prerequisite-quickstart-vector-search-model.md)] |
| 29 | + |
| 30 | +- [Java 21](/java/openjdk/download) or later |
| 31 | + |
| 32 | +- [Maven 3.6](https://maven.apache.org/download.cgi) or later |
| 33 | + |
| 34 | + |
| 35 | +## Create data file with vectors |
| 36 | + |
| 37 | +1. Create a new data directory for the hotels data file: |
| 38 | + |
| 39 | + ```bash |
| 40 | + mkdir data |
| 41 | + ``` |
| 42 | + |
| 43 | +1. Copy the `Hotels_Vector.json` [raw data file with vectors](https://raw.githubusercontent.com/Azure-Samples/documentdb-samples/refs/heads/main/ai/data/Hotels_Vector.json) to your `data` directory. |
| 44 | + |
| 45 | +## Create a Java project |
| 46 | + |
| 47 | +1. Create a new sibling directory for your project, at the same level as the data directory, and open it in Visual Studio Code: |
| 48 | + |
| 49 | + ```bash |
| 50 | + mkdir vector-search-quickstart |
| 51 | + mkdir vector-search-quickstart/src |
| 52 | + code vector-search-quickstart |
| 53 | + ``` |
| 54 | + |
| 55 | +1. Create a `pom.xml` file in the project root with the following content: |
| 56 | + |
| 57 | + :::code language="xml" source="~/documentdb-samples/ai/vector-search-java/pom.xml" ::: |
| 58 | + |
| 59 | + The app uses the following Maven dependencies specified in the `pom.xml`: |
| 60 | + |
| 61 | + - [`mongodb-driver-sync`](https://mvnrepository.com/artifact/org.mongodb/mongodb-driver-sync): Official MongoDB Java driver for database connectivity and operations |
| 62 | + - [`azure-identity`](https://mvnrepository.com/artifact/com.azure/azure-identity): Azure Identity library for passwordless authentication with Microsoft Entra ID |
| 63 | + - [`azure-ai-openai`](https://mvnrepository.com/artifact/com.azure/azure-ai-openai): Azure OpenAI client library to communicate with AI models and create vector embeddings |
| 64 | + - [`jackson-databind`](https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-databind): JSON serialization and deserialization library |
| 65 | + - [`slf4j-nop`](https://mvnrepository.com/artifact/org.slf4j/slf4j-nop): No-operation SLF4J binding to suppress logging output from the MongoDB driver |
| 66 | + |
| 67 | + |
| 68 | +1. Create a `.env` file in your project root for environment variables: |
| 69 | + |
| 70 | + ```ini |
| 71 | + # Azure OpenAI Embedding Settings |
| 72 | + AZURE_OPENAI_EMBEDDING_MODEL=text-embedding-3-small |
| 73 | + AZURE_OPENAI_EMBEDDING_API_VERSION=2023-05-15 |
| 74 | + AZURE_OPENAI_EMBEDDING_ENDPOINT= |
| 75 | + EMBEDDING_SIZE_BATCH=16 |
| 76 | + |
| 77 | + # Azure DocumentDB configuration |
| 78 | + MONGO_CLUSTER_NAME= |
| 79 | + |
| 80 | + # Data file |
| 81 | + DATA_FILE_WITH_VECTORS=../data/Hotels_Vector.json |
| 82 | + EMBEDDED_FIELD=DescriptionVector |
| 83 | + EMBEDDING_DIMENSIONS=1536 |
| 84 | + LOAD_SIZE_BATCH=50 |
| 85 | + ``` |
| 86 | + |
| 87 | + Replace the placeholder values in the `.env` file with your own information: |
| 88 | + - `AZURE_OPENAI_EMBEDDING_ENDPOINT`: Your Azure OpenAI resource endpoint URL. |
| 89 | + - `MONGO_CLUSTER_NAME`: Your Azure DocumentDB resource name. |
| 90 | + |
| 91 | +1. Load the environment variables: |
| 92 | + |
| 93 | + ```bash |
| 94 | + set -a && source .env && set +a |
| 95 | + ``` |
| 96 | + |
| 97 | +1. The project structure should look like this: |
| 98 | + |
| 99 | + ```plaintext |
| 100 | + data |
| 101 | + └── Hotels_Vector.json |
| 102 | + vector-search-quickstart |
| 103 | + ├── .env |
| 104 | + ├── pom.xml |
| 105 | + └── src |
| 106 | + ``` |
| 107 | + |
| 108 | +## Add code for vector search |
| 109 | + |
| 110 | +#### [DiskANN](#tab/tab-diskann) |
| 111 | + |
| 112 | +Create a `DiskAnn.java` file in the `src` directory and paste in the following code: |
| 113 | + |
| 114 | +:::code language="java" source="~/documentdb-samples/ai/vector-search-java/src/main/java/com/azure/documentdb/samples/DiskAnn.java" ::: |
| 115 | + |
| 116 | +#### [IVF](#tab/tab-ivf) |
| 117 | + |
| 118 | +Create an `IVF.java` file in the `src` directory and paste in the following code: |
| 119 | + |
| 120 | +:::code language="java" source="~/documentdb-samples/ai/vector-search-java/src/main/java/com/azure/documentdb/samples/IVF.java" ::: |
| 121 | + |
| 122 | +#### [HNSW](#tab/tab-hnsw) |
| 123 | + |
| 124 | +Create an `HNSW.java` file in the `src` directory and paste in the following code: |
| 125 | + |
| 126 | +:::code language="java" source="~/documentdb-samples/ai/vector-search-java/src/main/java/com/azure/documentdb/samples/HNSW.java" ::: |
| 127 | + |
| 128 | +--- |
| 129 | + |
| 130 | +This code performs the following tasks: |
| 131 | + |
| 132 | +- Creates a passwordless connection to Azure DocumentDB using `DefaultAzureCredential` and the MongoDB OIDC mechanism |
| 133 | +- Creates an Azure OpenAI client for generating embeddings |
| 134 | +- Drops and recreates the collection, then loads hotel data from the JSON file in batches |
| 135 | +- Creates standard indexes and a vector index with algorithm-specific options |
| 136 | +- Generates an embedding for a sample query and runs an aggregation search pipeline |
| 137 | +- Prints the top five matching hotels with similarity scores |
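
To make the index-creation and search steps above more concrete, the following is a minimal sketch of the command and pipeline shapes, built as plain `java.util` maps so it runs without any driver on the classpath. It assumes the `cosmosSearch` index and `$search` stage syntax used by Azure DocumentDB vector search. The class name `VectorSearchShapes`, the collection name `hotels`, the index name, and the `maxDegree` and `lBuild` values are illustrative assumptions, not values taken from the sample app, which may structure this differently.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: builds the command and pipeline shapes as plain maps. */
final class VectorSearchShapes {

    /** Shape of a createIndexes command that defines a DiskANN vector index. */
    static Map<String, Object> diskAnnIndexCommand(String collection, String field, int dimensions) {
        Map<String, Object> options = new LinkedHashMap<>();
        options.put("kind", "vector-diskann");   // IVF and HNSW use a different kind and options
        options.put("dimensions", dimensions);   // must match the embedding model output size
        options.put("similarity", "COS");        // cosine similarity
        options.put("maxDegree", 32);            // assumed tuning value
        options.put("lBuild", 50);               // assumed tuning value

        Map<String, Object> index = new LinkedHashMap<>();
        index.put("name", "vectorIndex_" + field); // hypothetical index name
        index.put("key", Map.of(field, "cosmosSearch"));
        index.put("cosmosSearchOptions", options);

        Map<String, Object> command = new LinkedHashMap<>();
        command.put("createIndexes", collection);
        command.put("indexes", List.of(index));
        return command;
    }

    /** Shape of an aggregation pipeline that returns the k nearest documents with scores. */
    static List<Map<String, Object>> searchPipeline(List<Double> queryVector, String field, int k) {
        Map<String, Object> cosmosSearch = new LinkedHashMap<>();
        cosmosSearch.put("vector", queryVector); // embedding of the query text
        cosmosSearch.put("path", field);         // field that stores document embeddings
        cosmosSearch.put("k", k);                // number of matches to return

        Map<String, Object> searchStage = Map.of("$search", Map.of("cosmosSearch", cosmosSearch));

        Map<String, Object> projection = new LinkedHashMap<>();
        projection.put("score", Map.of("$meta", "searchScore"));
        projection.put("document", "$$ROOT");
        Map<String, Object> projectStage = Map.of("$project", projection);

        return List.of(searchStage, projectStage);
    }

    public static void main(String[] args) {
        System.out.println(diskAnnIndexCommand("hotels", "DescriptionVector", 1536));
        System.out.println(searchPipeline(List.of(0.1, 0.2, 0.3), "DescriptionVector", 5));
    }
}
```

With the MongoDB Java driver, maps like these would typically be converted to `Document` objects and passed to `runCommand` and `aggregate`. The field name and dimension count here mirror the `EMBEDDED_FIELD` and `EMBEDDING_DIMENSIONS` settings in the `.env` file.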
| 138 | + |
| 139 | +## Authenticate to Azure |
| 140 | + |
| 141 | +Sign in to Azure before you run the application so it can access Azure resources securely. |
| 142 | + |
| 143 | +> [!NOTE] |
| 144 | +> Ensure your signed-in identity has the required data plane roles on both the Azure DocumentDB account and the Azure OpenAI resource. |
| 145 | + |
| 146 | +```bash |
| 147 | +az login |
| 148 | +``` |
| 149 | + |
| 150 | +## Build the application |
| 151 | + |
| 152 | +Compile the application: |
| 153 | + |
| 154 | +```bash |
| 155 | +mvn clean compile |
| 156 | +``` |
| 157 | + |
| 158 | +#### [DiskANN](#tab/tab-diskann) |
| 159 | + |
| 160 | +Run DiskANN (Disk-based Approximate Nearest Neighbor) search: |
| 161 | + |
| 162 | +```bash |
| 163 | +mvn exec:java -Dexec.mainClass="com.azure.documentdb.samples.DiskAnn" |
| 164 | +``` |
| 165 | + |
| 166 | +DiskANN is optimized for large datasets that don't fit in memory. It uses efficient disk-based storage and offers a good balance of speed and accuracy. |
| 167 | + |
| 168 | +Example output: |
| 169 | + |
| 170 | +:::code language="output" source="~/documentdb-samples/ai/vector-search-java/output/diskann.txt" ::: |
| 171 | + |
| 172 | +#### [IVF](#tab/tab-ivf) |
| 173 | + |
| 174 | +Run IVF (Inverted File) search: |
| 175 | + |
| 176 | +```bash |
| 177 | +mvn exec:java -Dexec.mainClass="com.azure.documentdb.samples.IVF" |
| 178 | +``` |
| 179 | + |
| 180 | +IVF clusters vectors by similarity and provides fast search through cluster centroids. It offers configurable accuracy vs speed trade-offs for large vector datasets. |
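
The centroid idea can be illustrated with a toy, in-memory sketch. This is not how Azure DocumentDB implements IVF. It assumes fixed, caller-supplied centroids (a real index learns them from the data), nonzero vectors, and cosine similarity. The class `ToyIvf` and its `nProbes` parameter are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy inverted-file index: vectors are bucketed under their most similar centroid. */
final class ToyIvf {
    private final double[][] centroids;
    private final List<List<double[]>> clusters = new ArrayList<>();

    ToyIvf(double[][] centroids) {
        this.centroids = centroids;
        for (int i = 0; i < centroids.length; i++) {
            clusters.add(new ArrayList<>());
        }
    }

    /** Cosine similarity of two nonzero vectors of equal length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the ids of the n centroids most similar to v, best first. */
    List<Integer> nearestCentroids(double[] v, int n) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < centroids.length; i++) {
            ids.add(i);
        }
        ids.sort(Comparator.comparingDouble((Integer i) -> -cosine(v, centroids[i])));
        return ids.subList(0, Math.min(n, ids.size()));
    }

    /** Stores v in the cluster of its single most similar centroid. */
    void add(double[] v) {
        clusters.get(nearestCentroids(v, 1).get(0)).add(v);
    }

    /** Scans only the nProbes closest clusters, then ranks those candidates. */
    List<double[]> search(double[] query, int nProbes, int k) {
        List<double[]> candidates = new ArrayList<>();
        for (int clusterId : nearestCentroids(query, nProbes)) {
            candidates.addAll(clusters.get(clusterId));
        }
        candidates.sort(Comparator.comparingDouble((double[] v) -> -cosine(query, v)));
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }
}
```

Probing fewer clusters means fewer comparisons but can miss neighbors stored in clusters that were skipped. Probing more clusters recovers them at a higher cost, which is the accuracy-versus-speed trade-off described above.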
| 181 | + |
| 182 | +Example output: |
| 183 | + |
| 184 | +:::code language="output" source="~/documentdb-samples/ai/vector-search-java/output/ivf.txt" ::: |
| 185 | + |
| 186 | +#### [HNSW](#tab/tab-hnsw) |
| 187 | + |
| 188 | +Run HNSW (Hierarchical Navigable Small World) search: |
| 189 | + |
| 190 | +```bash |
| 191 | +mvn exec:java -Dexec.mainClass="com.azure.documentdb.samples.HNSW" |
| 192 | +``` |
| 193 | + |
| 194 | +HNSW provides excellent search performance with high recall rates using a hierarchical graph structure, making it suitable for real-time applications. |
| 195 | + |
| 196 | +Example output: |
| 197 | + |
| 198 | +:::code language="output" source="~/documentdb-samples/ai/vector-search-java/output/hnsw.txt" ::: |
| 199 | + |
| 200 | +--- |
| 201 | + |
| 202 | +## View and manage data in Visual Studio Code |
| 203 | + |
| 204 | +1. Install the [DocumentDB extension](https://marketplace.visualstudio.com/items?itemName=ms-azuretools.vscode-documentdb) and [Extension Pack for Java](https://marketplace.visualstudio.com/items?itemName=vscjava.vscode-java-pack) in Visual Studio Code. |
| 205 | +1. Connect to your Azure DocumentDB account using the DocumentDB extension. |
| 206 | +1. View the data and indexes in the Hotels database. |
| 207 | + |
| 208 | + :::image type="content" source="./media/quickstart-nodejs-vector-search/visual-studio-code-documentdb.png" lightbox="./media/quickstart-nodejs-vector-search/visual-studio-code-documentdb.png" alt-text="Screenshot of DocumentDB extension showing the DocumentDB collection."::: |
| 209 | + |
| 210 | +## Clean up resources |
| 211 | + |
| 212 | +Delete the resource group, Azure DocumentDB cluster, and Azure OpenAI resource when you no longer need them to avoid unnecessary costs. |
| 213 | + |
| 214 | +## Related content |
| 215 | + |
| 216 | +- [Vector store in Azure DocumentDB](vector-search.md) |
| 217 | +- [Support for geospatial queries](geospatial-support.md) |