Jenny
Qdrant DevRel, ML/NLP/math nerd with yapping skills
Templates by Jenny
Building a RAG chatbot for movie recommendations with Qdrant and OpenAI
Create a recommendation tool without hallucinations, based on RAG with the Qdrant vector database. This example builds movie recommendations on the IMDB top-1000 dataset. You can tell the chatbot your wishes and your "big no's", for example: "A movie about wizards but not Harry Potter", and get top-3 recommendations.

How it works (a video with the full design process is available)
- Upload the IMDB-1000 dataset to the Qdrant vector store, embedding movie descriptions with OpenAI;
- Set up an AI agent with a chat. This agent calls a workflow tool to get movie recommendations based on the request written in the chat;
- Create a workflow which calls Qdrant's Recommendation API to retrieve the top-3 movie recommendations based on your positive and negative examples (see the sketch below).

Set Up Steps
- You'll need to create a free-tier Qdrant cluster (Qdrant can also be run locally; it's open source) and set up API credentials;
- You'll need OpenAI credentials;
- You'll need GitHub credentials & to upload the IMDB Kaggle dataset to your GitHub.
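For orientation, here is a minimal Python sketch of what the recommendation step does outside n8n, assuming the movie descriptions were embedded with OpenAI's text-embedding-3-small and stored in a collection named imdb_1000_movies (the collection and payload field names are illustrative assumptions, not the template's exact values):

```python
# Minimal sketch of the Qdrant Recommendation API call behind the workflow tool.
from openai import OpenAI
from qdrant_client import QdrantClient, models

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
qdrant = QdrantClient(url="https://YOUR-CLUSTER.qdrant.io", api_key="YOUR_QDRANT_API_KEY")

def embed(text: str) -> list[float]:
    # Must match the embedding model used when indexing the movie descriptions.
    return openai_client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding

# "A movie about wizards but not Harry Potter"
positive = embed("a movie about wizards")
negative = embed("Harry Potter")

hits = qdrant.query_points(
    collection_name="imdb_1000_movies",  # assumed collection name
    query=models.RecommendQuery(
        recommend=models.RecommendInput(positive=[positive], negative=[negative])
    ),
    limit=3,
    with_payload=True,
).points

for hit in hits:
    print(hit.payload.get("Series_Title"), hit.score)  # assumed payload field
```

In the template, the AI agent passes the chat request to a sub-workflow that performs an equivalent call and returns the three titles to the chat.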
Vector database as a big data analysis tool for AI agents [1/3 anomaly][1/2 KNN]
Vector Database as a Big Data Analysis Tool for AI Agents

Workflows from the webinar "Build production-ready AI Agents with Qdrant and n8n". This series of workflows shows how to build big data analysis tools for production-ready AI agents with the help of vector databases. These pipelines are adaptable to any dataset of images, and hence to many production use cases:
- Uploading (image) datasets to Qdrant;
- Setting up meta-variables for anomaly detection in Qdrant;
- Anomaly detection tool;
- KNN classifier tool.

For anomaly detection
This is the first pipeline, which uploads an image dataset to Qdrant. The second pipeline sets up the cluster (class) centres & cluster (class) threshold scores needed for anomaly detection. The third is the anomaly detection tool, which takes any image as input and uses all the preparatory work done with Qdrant to detect whether the image is anomalous with respect to the uploaded dataset.

For KNN (k nearest neighbours) classification
This is the first pipeline, which uploads an image dataset to Qdrant. The second is the KNN classifier tool, which takes any image as input and classifies it against the dataset uploaded to Qdrant.

To recreate both
You'll have to upload the crops and lands datasets from Kaggle to your own Google Cloud Storage bucket, and re-create the APIs/connections to Qdrant Cloud (you can use a Free Tier cluster), the Voyage AI API & Google Cloud Storage.

[This workflow] Batch Uploading Images Dataset to Qdrant
This template imports dataset images from Google Cloud Storage, creates Voyage AI embeddings for them in batches, and uploads them to Qdrant, also in batches. In this particular template, we work with the crops dataset; however, uploading the lands dataset is analogous, and in general the template is adaptable to any dataset consisting of image URLs (as the following pipelines are). First, check for an existing Qdrant collection to use; otherwise, create it here. When creating the collection, we also create a payload index, which is required for a particular type of Qdrant request used later. Next, import all (dataset) images from Google Cloud Storage but keep only the non-tomato-related ones (for anomaly detection testing). Then create (per-batch) embeddings for all imported images using the Voyage AI multimodal embeddings API. Finally, upload the resulting embeddings and image descriptors to Qdrant via batch upload. A sketch of these steps outside n8n is shown below.
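A minimal Python sketch of the same steps, assuming an illustrative collection name (agricultural_crops), payload field (crop_name), placeholder URLs/credentials, and the Voyage AI multimodal embeddings REST endpoint:

```python
# Minimal sketch: create the collection + payload index, embed a batch of image URLs
# with Voyage AI, and upsert the batch to Qdrant. All names are illustrative assumptions.
import os
import requests
from qdrant_client import QdrantClient, models

qdrant = QdrantClient(url="https://YOUR-CLUSTER.qdrant.io", api_key="YOUR_QDRANT_API_KEY")
COLLECTION = "agricultural_crops"  # assumed collection name

# 1. Create the collection (if missing) plus a keyword payload index on the class label,
#    which later payload-filtered requests rely on.
if not qdrant.collection_exists(COLLECTION):
    qdrant.create_collection(
        COLLECTION,
        # voyage-multimodal-3 embeddings are 1024-dimensional
        vectors_config=models.VectorParams(size=1024, distance=models.Distance.COSINE),
    )
    qdrant.create_payload_index(
        COLLECTION,
        field_name="crop_name",
        field_schema=models.PayloadSchemaType.KEYWORD,
    )

# 2. Embed a batch of image URLs with the Voyage AI multimodal embeddings API.
def embed_images(urls: list[str]) -> list[list[float]]:
    resp = requests.post(
        "https://api.voyageai.com/v1/multimodalembeddings",
        headers={"Authorization": f"Bearer {os.environ['VOYAGE_API_KEY']}"},
        json={
            "model": "voyage-multimodal-3",
            "inputs": [{"content": [{"type": "image_url", "image_url": u}]} for u in urls],
        },
    )
    resp.raise_for_status()
    return [item["embedding"] for item in resp.json()["data"]]

# 3. Upsert one batch of embeddings + image descriptors to Qdrant.
batch = ["https://storage.googleapis.com/YOUR_BUCKET/crops/almond/almond001.jpg"]  # example URL
vectors = embed_images(batch)
qdrant.upsert(
    COLLECTION,
    points=[
        models.PointStruct(id=i, vector=v, payload={"image_url": u, "crop_name": "almond"})
        for i, (u, v) in enumerate(zip(batch, vectors))
    ],
)
```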
Vector database as a big data analysis tool for AI agents [3/3 - anomaly]
Vector Database as a Big Data Analysis Tool for AI Agents

Workflows from the webinar "Build production-ready AI Agents with Qdrant and n8n". This series of workflows shows how to build big data analysis tools for production-ready AI agents with the help of vector databases. These pipelines are adaptable to any dataset of images, and hence to many production use cases:
- Uploading (image) datasets to Qdrant;
- Setting up meta-variables for anomaly detection in Qdrant;
- Anomaly detection tool;
- KNN classifier tool.

For anomaly detection
The first pipeline uploads an image dataset to Qdrant. The second pipeline sets up the cluster (class) centres & cluster (class) threshold scores needed for anomaly detection. This is the third pipeline: the anomaly detection tool, which takes any image as input and uses all the preparatory work done with Qdrant to detect whether the image is anomalous with respect to the uploaded dataset.

For KNN (k nearest neighbours) classification
The first pipeline uploads an image dataset to Qdrant. The second is the KNN classifier tool, which takes any image as input and classifies it against the dataset uploaded to Qdrant.

To recreate both
You'll have to upload the crops and lands datasets from Kaggle to your own Google Cloud Storage bucket, and re-create the APIs/connections to Qdrant Cloud (you can use a Free Tier cluster), the Voyage AI API & Google Cloud Storage.

[This workflow] Anomaly Detection Tool
This is the tool that can be used directly for detecting anomalous images (crops). It takes (any) image URL as input and returns a text message saying whether whatever the image depicts is anomalous with respect to the crop dataset stored in Qdrant. An image URL is received via the Execute Workflow Trigger* and used to generate an embedding vector with the Voyage AI Embeddings API. The returned vector is used to query the Qdrant collection and determine whether the given crop is known, by comparing it to the threshold scores of each image class (crop type). If the image scores lower than all thresholds, it is considered an anomaly for the dataset. A sketch of this check outside n8n is shown below.
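A minimal Python sketch of the check, assuming the preparatory pipeline stored one point per crop class in a crops_centres collection (the class centroid as the vector, the class threshold in a score_threshold payload field); all names are illustrative assumptions:

```python
# Minimal sketch of the anomaly check against per-class centres and thresholds in Qdrant.
import os
import requests
from qdrant_client import QdrantClient

qdrant = QdrantClient(url="https://YOUR-CLUSTER.qdrant.io", api_key="YOUR_QDRANT_API_KEY")

def embed_image(url: str) -> list[float]:
    # Voyage AI multimodal embedding for a single image URL.
    resp = requests.post(
        "https://api.voyageai.com/v1/multimodalembeddings",
        headers={"Authorization": f"Bearer {os.environ['VOYAGE_API_KEY']}"},
        json={"model": "voyage-multimodal-3",
              "inputs": [{"content": [{"type": "image_url", "image_url": url}]}]},
    )
    resp.raise_for_status()
    return resp.json()["data"][0]["embedding"]

def is_anomaly(image_url: str) -> bool:
    vector = embed_image(image_url)
    # Score the image against every stored class centre (cosine similarity).
    centres = qdrant.query_points(
        collection_name="crops_centres",  # assumed collection of class centres
        query=vector,
        limit=100,                        # enough to cover all crop classes
        with_payload=True,
    ).points
    # Known crop: similarity to at least one centre reaches that class's threshold.
    return not any(c.score >= c.payload["score_threshold"] for c in centres)

print(is_anomaly("https://example.com/tomato.jpg"))  # tomatoes were held out, so likely True
```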
Index legal documents for hybrid search with Qdrant, OpenAI & BM25
Index Legal Dataset to Qdrant for Hybrid Retrieval

*This pipeline is the first part of "Hybrid Search with Qdrant & n8n, Legal AI". The second part, "Hybrid Search with Qdrant & n8n, Legal AI: Retrieval", covers retrieval and simple evaluation.*

Overview
This pipeline transforms a Q&A legal corpus from Hugging Face (isaacus) into vector representations and indexes them to Qdrant, providing the foundation for running Hybrid Search, which combines:
- Dense vectors (embeddings) for semantic similarity search;
- Sparse vectors for keyword-based exact search.
After running this pipeline, you will have a Qdrant collection with your legal dataset ready for hybrid retrieval on BM25 and dense embeddings: either mxbai-embed-large-v1 or text-embedding-3-small. A sketch of the resulting collection layout is shown below.

Options for Embedding Inference
This pipeline gives you two approaches for generating dense vectors:
- Using Qdrant Cloud Inference, with conversion to vectors handled directly in Qdrant;
- Using an external provider, e.g. OpenAI, for generating embeddings.

Prerequisites
- A cluster on Qdrant Cloud: a paid cluster in the US region if you want to use Qdrant Cloud Inference, or a Free Tier cluster if using an external provider (here OpenAI);
- Qdrant cluster credentials: you'll be guided on how to obtain both the URL and API_KEY from the Qdrant Cloud UI when setting up your cluster;
- An OpenAI API key (if you're not using Qdrant Cloud Inference).

P.S. To ask Qdrant-related retrieval questions, join the Qdrant Discord. Star the Qdrant n8n community node repo <3
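For orientation, a minimal Python sketch of the target collection layout and a sample upsert for the external-provider path (OpenAI dense embeddings plus BM25 sparse vectors via fastembed); the collection name, vector names, and example chunk are illustrative assumptions, and the Cloud Inference path differs:

```python
# Minimal sketch: a hybrid collection with a named dense vector and a BM25 sparse vector,
# plus an upsert of one example chunk. Names and the sample text are assumptions.
from fastembed import SparseTextEmbedding
from openai import OpenAI
from qdrant_client import QdrantClient, models

qdrant = QdrantClient(url="https://YOUR-CLUSTER.qdrant.io", api_key="YOUR_QDRANT_API_KEY")
openai_client = OpenAI()
bm25 = SparseTextEmbedding(model_name="Qdrant/bm25")
COLLECTION = "legal_qa_hybrid"  # assumed collection name

qdrant.create_collection(
    COLLECTION,
    # text-embedding-3-small produces 1536-dimensional dense vectors
    vectors_config={"dense": models.VectorParams(size=1536, distance=models.Distance.COSINE)},
    # IDF modifier so Qdrant completes the BM25 weighting server-side
    sparse_vectors_config={"bm25": models.SparseVectorParams(modifier=models.Modifier.IDF)},
)

chunks = ["The statute of limitations for breach of contract is ..."]  # example text chunks
dense = [d.embedding for d in openai_client.embeddings.create(
    model="text-embedding-3-small", input=chunks).data]
sparse = list(bm25.embed(chunks))

qdrant.upsert(
    COLLECTION,
    points=[
        models.PointStruct(
            id=i,
            vector={
                "dense": dense[i],
                "bm25": models.SparseVector(indices=sparse[i].indices.tolist(),
                                            values=sparse[i].values.tolist()),
            },
            payload={"text": chunks[i]},
        )
        for i in range(len(chunks))
    ],
)
```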
Vector database as a big data analysis tool for AI agents [2/3 - anomaly]
Vector Database as a Big Data Analysis Tool for AI Agents

Workflows from the webinar "Build production-ready AI Agents with Qdrant and n8n". This series of workflows shows how to build big data analysis tools for production-ready AI agents with the help of vector databases. These pipelines are adaptable to any dataset of images, and hence to many production use cases:
- Uploading (image) datasets to Qdrant;
- Setting up meta-variables for anomaly detection in Qdrant;
- Anomaly detection tool;
- KNN classifier tool.

For anomaly detection
The first pipeline uploads an image dataset to Qdrant. This is the second pipeline, which sets up the cluster (class) centres & cluster (class) threshold scores needed for anomaly detection. The third is the anomaly detection tool, which takes any image as input and uses all the preparatory work done with Qdrant to detect whether the image is anomalous with respect to the uploaded dataset.

For KNN (k nearest neighbours) classification
The first pipeline uploads an image dataset to Qdrant. The second is the KNN classifier tool, which takes any image as input and classifies it against the dataset uploaded to Qdrant.

To recreate both
You'll have to upload the crops and lands datasets from Kaggle to your own Google Cloud Storage bucket, and re-create the APIs/connections to Qdrant Cloud (you can use a Free Tier cluster), the Voyage AI API & Google Cloud Storage.

[This workflow] Setting Up Cluster (Class) Centres & Cluster (Class) Threshold Scores for Anomaly Detection
This is a preparatory workflow that sets cluster centres and cluster threshold scores so that anomalies can later be detected against these thresholds. Here we use two approaches to set up the centres: the "distance matrix approach" and the "multimodal embedding model approach".
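As one simplified illustration (not necessarily identical to either webinar approach), the sketch below derives a class centre and threshold directly from vectors already stored in Qdrant: average the class's vectors into a centroid, score every class member against that centroid, and keep a low percentile of those scores as the threshold. Collection and field names, and the percentile choice, are assumptions:

```python
# Minimal sketch of computing one class centre + threshold from points in Qdrant
# and storing them in a separate "centres" collection used by the anomaly tool.
import numpy as np
from qdrant_client import QdrantClient, models

qdrant = QdrantClient(url="https://YOUR-CLUSTER.qdrant.io", api_key="YOUR_QDRANT_API_KEY")
SOURCE, CENTRES = "agricultural_crops", "crops_centres"  # assumed collection names

def store_class_centre(crop_name: str, point_id: int) -> None:
    # 1. Pull all vectors of one class via a payload-filtered scroll.
    points, _ = qdrant.scroll(
        SOURCE,
        scroll_filter=models.Filter(must=[
            models.FieldCondition(key="crop_name", match=models.MatchValue(value=crop_name))
        ]),
        limit=10_000,
        with_vectors=True,
    )
    vectors = np.array([p.vector for p in points])

    # 2. Class centre = normalised mean vector; cosine similarity of each member to it.
    centre = vectors.mean(axis=0)
    centre /= np.linalg.norm(centre)
    sims = (vectors / np.linalg.norm(vectors, axis=1, keepdims=True)) @ centre

    # 3. Threshold = low percentile of in-class similarities (tune to your data).
    threshold = float(np.percentile(sims, 5))

    qdrant.upsert(CENTRES, points=[models.PointStruct(
        id=point_id,
        vector=centre.tolist(),
        payload={"crop_name": crop_name, "score_threshold": threshold},
    )])
```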
Evaluate hybrid search for legal question-answering using Qdrant & BM25/mxbai
Evaluate Hybrid Search on Legal Dataset

*This is the second part of "Hybrid Search with Qdrant & n8n, Legal AI." The first part, "Indexing", covers preparing and uploading the dataset to Qdrant.*

Overview
This pipeline demonstrates how to perform Hybrid Search on a Qdrant collection using questions and text chunks (containing answers) from the LegalQAEval dataset (isaacus). On a small subset of questions, it shows:
- How to set up hybrid retrieval in Qdrant with BM25-based keyword retrieval, mxbai-embed-large-v1 semantic retrieval, and Reciprocal Rank Fusion (RRF), a simple zero-shot fusion of the two searches;
- How to run a basic evaluation: calculate hits@1, the percentage of evaluation questions for which the top-1 retrieved text chunk contains the correct answer.
After running this pipeline, you will have a quality estimate of a simple hybrid retrieval setup. From there, you can reuse Qdrant's Query Points node to build a legal RAG chatbot. A sketch of the query and evaluation logic is shown below.

Embedding Inference
By default, this pipeline uses Qdrant Cloud Inference to convert questions to embeddings. You can also use an external embedding provider (e.g. OpenAI); in that case, minimally update the pipeline, similar to the adjustments shown in Part 1: Indexing.

Prerequisites
- The completed Part 1 pipeline, "Hybrid Search with Qdrant & n8n, Legal AI: Indexing", and the collection created in it;
- All the requirements of the Part 1 pipeline.

Hybrid Search
The example here is a basic hybrid query. You can extend/enhance it with:
- Reranking strategies;
- Different fusion techniques;
- Score boosting based on metadata;
- ...
More details: Hybrid Queries in Qdrant.

P.S. To ask Qdrant-related retrieval questions, join the Qdrant Discord. Star the Qdrant n8n community node repo <3
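For orientation, a minimal Python sketch of the hybrid query with RRF fusion plus a hits@1 check, using the external-provider path (OpenAI + fastembed BM25) rather than Cloud Inference; the collection and vector names, payload field, and evaluation pair are illustrative assumptions:

```python
# Minimal sketch: RRF fusion of a dense prefetch and a BM25 prefetch, then hits@1.
from fastembed import SparseTextEmbedding
from openai import OpenAI
from qdrant_client import QdrantClient, models

qdrant = QdrantClient(url="https://YOUR-CLUSTER.qdrant.io", api_key="YOUR_QDRANT_API_KEY")
openai_client = OpenAI()
bm25 = SparseTextEmbedding(model_name="Qdrant/bm25")
COLLECTION = "legal_qa_hybrid"  # assumed collection name from the indexing part

def hybrid_top1(question: str):
    dense = openai_client.embeddings.create(
        model="text-embedding-3-small", input=question).data[0].embedding
    sparse = next(bm25.query_embed(question))
    return qdrant.query_points(
        COLLECTION,
        prefetch=[
            models.Prefetch(query=dense, using="dense", limit=20),
            models.Prefetch(
                query=models.SparseVector(indices=sparse.indices.tolist(),
                                          values=sparse.values.tolist()),
                using="bm25", limit=20),
        ],
        query=models.FusionQuery(fusion=models.Fusion.RRF),  # Reciprocal Rank Fusion
        limit=1,
        with_payload=True,
    ).points[0]

# hits@1: share of questions whose top-1 chunk contains the ground-truth answer span.
eval_set = [("What is the limitation period for breach of contract?", "six years")]  # example pair
hits = sum(answer.lower() in hybrid_top1(q).payload["text"].lower() for q, answer in eval_set)
print(f"hits@1 = {hits / len(eval_set):.2f}")
```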
Vector database as a big data analysis tool for AI agents [2/2 KNN]
Vector Database as a Big Data Analysis Tool for AI Agents

Workflows from the webinar "Build production-ready AI Agents with Qdrant and n8n". This series of workflows shows how to build big data analysis tools for production-ready AI agents with the help of vector databases. These pipelines are adaptable to any dataset of images, and hence to many production use cases:
- Uploading (image) datasets to Qdrant;
- Setting up meta-variables for anomaly detection in Qdrant;
- Anomaly detection tool;
- KNN classifier tool.

For anomaly detection
The first pipeline uploads an image dataset to Qdrant. The second pipeline sets up the cluster (class) centres & cluster (class) threshold scores needed for anomaly detection. The third is the anomaly detection tool, which takes any image as input and uses all the preparatory work done with Qdrant to detect whether the image is anomalous with respect to the uploaded dataset.

For KNN (k nearest neighbours) classification
The first pipeline uploads an image dataset to Qdrant. This pipeline is the KNN classifier tool, which takes any image as input and classifies it against the dataset uploaded to Qdrant.

To recreate both
You'll have to upload the crops and lands datasets from Kaggle to your own Google Cloud Storage bucket, and re-create the APIs/connections to Qdrant Cloud (you can use a Free Tier cluster), the Voyage AI API & Google Cloud Storage.

[This workflow] KNN classification tool
This tool takes any image URL as input and returns the class of the object in the image, based on the dataset (lands) uploaded to Qdrant. An image URL is received via the Execute Workflow Trigger* and sent to the Voyage AI Multimodal Embeddings API to fetch its embedding. The image's embedding vector is then used to query Qdrant, which returns a set of X similar images with pre-labeled classes. Majority voting is done over the classes of the neighbouring images. A loop resolves scenarios where the majority vote is tied: the number of neighbours to retrieve is increased and the vote is repeated. When the loop finally resolves, the identified class is returned to the calling workflow. A sketch of this logic outside n8n is shown below.
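A minimal Python sketch of the KNN classification logic, assuming an illustrative collection name (land_use), payload field (land_class), and the Voyage AI multimodal embeddings REST endpoint:

```python
# Minimal sketch: embed the image, fetch k nearest labelled neighbours from Qdrant,
# majority-vote their classes, and widen k when the vote is tied.
import os
import requests
from collections import Counter
from qdrant_client import QdrantClient

qdrant = QdrantClient(url="https://YOUR-CLUSTER.qdrant.io", api_key="YOUR_QDRANT_API_KEY")
COLLECTION = "land_use"  # assumed collection name for the lands dataset

def embed_image(url: str) -> list[float]:
    resp = requests.post(
        "https://api.voyageai.com/v1/multimodalembeddings",
        headers={"Authorization": f"Bearer {os.environ['VOYAGE_API_KEY']}"},
        json={"model": "voyage-multimodal-3",
              "inputs": [{"content": [{"type": "image_url", "image_url": url}]}]},
    )
    resp.raise_for_status()
    return resp.json()["data"][0]["embedding"]

def classify(image_url: str, k: int = 5, max_k: int = 25) -> str:
    vector = embed_image(image_url)
    while True:
        neighbours = qdrant.query_points(
            COLLECTION, query=vector, limit=k, with_payload=True
        ).points
        votes = Counter(n.payload["land_class"] for n in neighbours).most_common(2)
        # Unique winner, or stop widening once max_k is reached and accept the leader.
        if len(votes) == 1 or votes[0][1] != votes[1][1] or k >= max_k:
            return votes[0][0]
        k += 2  # tie: retrieve more neighbours and vote again

print(classify("https://example.com/satellite_tile.jpg"))
```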