Skip to main content

Create vectors from your data

In order to search your data, you will need to create embeddings for your documents. Embeddings are vectors that represent your documents in a vector space. The vector space is a multi-dimensional space where each dimension represents a feature of your documents.

Many services offer pre-trained models that you can use to create embeddings for your documents.

Using OpenAI​

OpenAI is a company that develops AI models for natural language processing. They offer a free API that you can use to create embeddings for your documents. The API is called OpenAI's Embedding API.

To get some embeddings using their API, you need to create an account and get an API key.

Once you have an API key, you can use the following code to get embeddings from a text field.

const getTermEmbeddings = async (text) => {
const url = "https://api.openai.com/v1/embeddings";

// Call OpenAI API to get the embeddings.
let response = await fetch(url, {
method: "POST",
headers: {
Authorization: `Bearer ${OPENAI_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
input: text,
model: "text-embedding-ada-002",
})
}).then((res) => res.json());

return response.data[0].embedding;
};

You can see the code in action in the server/embeddings/openai.mjs file in your CodeSandbox.

info

You would need to run this function on the fields you want to vectorize for each of your documents. For your convenience, the sample_mflix.embedded_movies collection already has the embeddings created.

To search your data, you will need to also vectorize your query. You can use the same function to vectorize your query as well.

Once you have a vector, you will be able to use Vector Search to find the most similar documents to your query.

Configuring the application​

Remember those two environment variables we created in the previous chapter? Now it's time to use them.

The first one is EMBEDDINGS_SOURCE. It tells the application where to get the embeddings from. You can set it to either openai or serverlessEndpoint.

If you already have an OpenAI API key, you can set the value to openai. You can then set the EMBEDDING_KEY variable to your API key.

If you don't have an OpenAI API key, you can use the serverlessEndpoint value. This will tell the application to use the serverless function we created for you.

tip

You should probably use EMBEDDINGS_SOURCE=serverlessEndpoint for now. This will allow you to get started with the application without having to create an OpenAI account.