Card Creation

Since Vault is an abstraction above Qdrant, it has multiple preprocessing steps before a card has an embedding created for it and is added to the database. On this page, we will look at how Vault parses and processes cards.

Step 1: Parsing

Oftentimes, users don't create their cards by hand but rather upload a Word doc with a list of cards. Vault supports parsing of Word documents, and will automatically convert the Word doc to HTML, extract the text from the document and create cards for each paragraph. However, if a user creates a card directly from the UI, this step is skipped.

Step 2: Collision Detection

After parsing, Vault checks if the card already exists in the database. If it does, it will be inserted into the Postgres database without creating a Qdrant record as an embedding with the meaning of the card already exists within Qdrant.

Check 1: Text-based collision detection

The first check is text-based collision detection using Postgres trigrams. If Postgres returns a similarity score above 0.9, the card is considered a duplicate. This is to ensure that we aren't wasting OpenAI credits on cards that are almost exactly in the database. If it does, the content and HTML formatting of the card will be inserted into Postgres without creating a Qdrant record.

Check 2: Embedding-based collision detection

The second check is embedding-based collision detection. If the card passes the first check, we check if its meaning is in the database by generating an embedding for it. We then query Qdrant to check for similar vectors, and if a vector is found that has a similarity score above 0.95 or 0.92 if the content is less than 200 words, the card is considered a duplicate. If it is a duplicate, the content and HTML formatting of the card will be inserted into Postgres without creating a Qdrant record.

Step 3: Insertion into Qdrant

If the card passes both checks, it will be inserted into Qdrant, using the embedding we generated during the checks, and the content and HTML formatting of the card will be inserted into Postgres.