My solution for the "Agentic Arena Community Contest" (RAG, Qdrant, Mistral OCR)
This workflow is my personal solution for the Agentic Arena Community Contest, where the goal is to build a Retrieval-Augmented Generation (RAG) AI agent capable of answering questions based on a provided PDF knowledge base.
Key Advantages
- End-to-End RAG Implementation: Fully automates the ingestion, processing, and retrieval of knowledge from PDFs into a vector database.
- Accuracy through Multi-Layered Retrieval: Combines embeddings, Qdrant search, and Cohere reranking to ensure the agent retrieves the most relevant policy information.
- Robust Evaluation System: Includes an automated correctness evaluation pipeline with GPT-4.1 as a judge, ensuring transparent scoring and continuous improvement.
- Citation-Driven Compliance: The AI agent is instructed to provide citations for every answer, making it suitable for high-stakes use cases like policy compliance.
- Scalability and Modularity: Easily integrates with different data sources (Google Drive, APIs, other storage systems) and can be extended to new use cases.
- Seamless Collaboration with Google Sheets: Both the evaluation set and the results live in Google Sheets, enabling easy monitoring, iteration, and reporting.
- Cloud and Self-Hosted Flexibility: Works with self-hosted Qdrant on Hetzner, Mistral Cloud for OCR, and the OpenAI/Cohere APIs, combining local control with powerful cloud AI services.
How it Works
- Knowledge Base Ingestion (the "Setup" execution):
  - When started manually, the workflow first clears the existing Qdrant vector database collection.
  - It then searches a specified Google Drive folder for PDF files. For each PDF found, it:
    - Uploads the file to the Mistral AI API.
    - Processes the PDF with Mistral's OCR service to extract the text as structured markdown.
    - Splits the text into manageable chunks.
    - Generates embeddings for each chunk using an OpenAI embedding model.
    - Stores the embeddings in the Qdrant vector store, creating a searchable knowledge base.
  - A Python sketch of this ingestion pipeline follows below.
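Outside n8n, the same ingestion pipeline can be expressed in a few dozen lines. The sketch below is illustrative, not the workflow itself: it assumes the official `mistralai`, `openai`, and `qdrant-client` Python SDKs, and the chunk size, embedding model, and Qdrant URL are placeholder choices (only the collection name `agentic-arena` comes from this workflow).

```python
# Sketch of the ingestion pipeline: OCR a PDF with Mistral, chunk the
# markdown, embed with OpenAI, and upsert into Qdrant.
# Model names, chunk sizes, and the Qdrant URL are illustrative assumptions.
import os
from mistralai import Mistral
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

mistral = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
qdrant = QdrantClient(url="http://localhost:6333")  # your self-hosted instance

def ingest_pdf(path: str, collection: str = "agentic-arena") -> None:
    # 1. Upload the PDF and run Mistral OCR to get markdown text.
    with open(path, "rb") as f:
        uploaded = mistral.files.upload(
            file={"file_name": os.path.basename(path), "content": f.read()},
            purpose="ocr",
        )
    signed = mistral.files.get_signed_url(file_id=uploaded.id)
    ocr = mistral.ocr.process(
        model="mistral-ocr-latest",
        document={"type": "document_url", "document_url": signed.url},
    )
    markdown = "\n\n".join(page.markdown for page in ocr.pages)

    # 2. Split the text into overlapping chunks (sizes are arbitrary here).
    size, overlap = 1000, 200
    chunks = [markdown[i : i + size] for i in range(0, len(markdown), size - overlap)]

    # 3. Embed each chunk and upsert the vectors into Qdrant.
    vectors = openai_client.embeddings.create(
        model="text-embedding-3-small", input=chunks
    )
    points = [
        PointStruct(id=i, vector=e.embedding, payload={"text": c, "source": path})
        for i, (e, c) in enumerate(zip(vectors.data, chunks))
    ]
    qdrant.upsert(collection_name=collection, points=points)
```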
- Agent Evaluation (the "Testing" execution):
  - The workflow is triggered by an evaluation Google Sheet containing questions and correct answers.
  - For each question, the core AI Agent is activated. This agent:
    - Uses the RAG tool to search the pre-populated Qdrant vector store for relevant information from the PDFs.
    - Employs a Cohere reranker to refine the search results for the highest-quality context.
    - Uses a GPT-4.1 model to generate an answer based strictly on the retrieved context.
  - The agent's answer is then passed to an "LLM as a Judge" (another GPT-4.1 instance), which compares it to the ground-truth answer from the evaluation sheet.
  - The judge provides a detailed score (1-5) based on factual correctness and citation accuracy.
  - Finally, both the agent's answer and the correctness score are saved back to a Google Sheet for review.
  - A sketch of this query-and-judge path follows below.
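The query-and-judge path can be sketched the same way. This is only an illustration of the embed, search, rerank, answer, and judge sequence under assumed model names (`text-embedding-3-small`, `rerank-english-v3.0`, `gpt-4.1`) and simplified prompts; the workflow's actual prompts and node settings live in the n8n JSON.

```python
# Sketch of the query path: embed the question, search Qdrant, rerank with
# Cohere, answer with GPT-4.1, then score with an LLM judge. Prompts are
# simplified stand-ins for the workflow's real system prompts.
import os
import cohere
from openai import OpenAI
from qdrant_client import QdrantClient

openai_client = OpenAI()
co = cohere.Client(os.environ["COHERE_API_KEY"])
qdrant = QdrantClient(url="http://localhost:6333")

def answer_and_judge(question: str, ground_truth: str) -> tuple[str, str]:
    # Vector search for candidate chunks.
    qvec = openai_client.embeddings.create(
        model="text-embedding-3-small", input=[question]
    ).data[0].embedding
    hits = qdrant.search(collection_name="agentic-arena", query_vector=qvec, limit=20)
    docs = [h.payload["text"] for h in hits]

    # Rerank the candidates and keep the best few as context.
    reranked = co.rerank(model="rerank-english-v3.0", query=question,
                         documents=docs, top_n=4)
    context = "\n\n".join(docs[r.index] for r in reranked.results)

    # Answer strictly from the retrieved context, with citations.
    answer = openai_client.chat.completions.create(
        model="gpt-4.1",
        messages=[
            {"role": "system", "content": "Answer only from the provided context "
                                          "and cite the source of every claim."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    ).choices[0].message.content

    # LLM-as-a-judge: compare against the ground truth on a 1-5 scale.
    score = openai_client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content":
            f"Rate the answer 1-5 for factual correctness and citation accuracy.\n"
            f"Question: {question}\nGround truth: {ground_truth}\nAnswer: {answer}"}],
    ).choices[0].message.content
    return answer, score
```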
Set up Steps
To implement this solution, you need to configure the following components and credentials:
- Configure Core AI Services:
  - OpenAI API credentials: Required for the main AI agent, the judge LLM, and generating embeddings.
  - Mistral AI API credentials: Necessary for the OCR service that processes PDF files.
  - Cohere API credentials: Used by the reranker node that improves retrieval quality.
  - Google credentials: Set up OAuth for Google Sheets (to read questions and save results) and Google Drive (to access the PDF source files).
- Set up the Vector Database (Qdrant):
  - This workflow uses a self-hosted Qdrant instance. You must deploy and configure your own Qdrant server.
  - Update the Qdrant Vector Store and RAG nodes with the correct API endpoint URL and credentials for your Qdrant instance.
  - Ensure the collection name (`agentic-arena`) is created or matches your setup; a collection-creation sketch follows this setup list.
- Connect Data Sources:
  - PDF Source: In the "Search PDFs" node, update the `folderId` parameter to point to your own Google Drive folder containing the contest PDFs.
  - Evaluation Sheet: In the "Eval Set" node, update the `documentId` to point to your own copy of the evaluation Google Sheet containing the test questions and answers.
  - Results Sheet: In the "Save Eval" node, update the `documentId` to point to the Google Sheet where you want to save the evaluation results.
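If you need to create the `agentic-arena` collection yourself, a minimal `qdrant-client` sketch follows. The vector size of 1536 is an assumption matching OpenAI's text-embedding-3-small; it must agree with whichever embedding model the workflow actually uses.

```python
# Create the collection used by this workflow on a self-hosted Qdrant.
# The URL, API key, and vector size (1536, matching OpenAI's
# text-embedding-3-small) are assumptions to adapt to your setup.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

qdrant = QdrantClient(url="http://your-qdrant-host:6333", api_key="...")

if not qdrant.collection_exists("agentic-arena"):
    qdrant.create_collection(
        collection_name="agentic-arena",
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )
```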
Need help customizing?
Contact me for consulting and support, or add me on LinkedIn.
n8n Workflow: Agentic RAG with Qdrant, Mistral, and OCR for Community Contest
This n8n workflow demonstrates an advanced Retrieval-Augmented Generation (RAG) system designed for the Agentic Arena Community Contest. It uses Qdrant for vector storage, OpenAI models for embeddings and language generation, and Mistral's OCR service to extract text from documents, providing a robust solution for querying information from the PDF knowledge base.
What it does
This workflow orchestrates a sophisticated RAG pipeline, enabling intelligent querying of document content.
- Trigger: The workflow can be initiated manually or as a sub-workflow call, making it flexible for integration into larger systems or for standalone testing.
- Document Loading: Documents are pulled from Google Drive; their text is extracted by Mistral's OCR service and then passed through a "Default Data Loader" for downstream processing.
- Text Splitting: The extracted document text is then split into manageable chunks using a "Character Text Splitter" to prepare it for embedding.
- Embedding Generation: "Embeddings OpenAI" is used to convert these text chunks into vector embeddings, which capture the semantic meaning of the text.
- Vector Storage: These embeddings are stored and managed in a "Qdrant Vector Store," a high-performance vector database, enabling efficient similarity searches.
- Query Processing (AI Agent): An "AI Agent" node handles incoming queries, using a "Simple Memory" to maintain conversational context.
- Language Model: An "OpenAI Chat Model" (GPT-4.1) is used by the AI Agent to generate responses grounded in the retrieved information.
- Reranking: A "Reranker Cohere" node re-ranks the retrieved documents, improving the relevance of the context passed to the language model.
- Evaluation: "Evaluation Trigger" and "Evaluation" nodes are included for testing and measuring the performance of the RAG system.
- HTTP Request: "HTTP Request" nodes call the Mistral API to upload files and run OCR.
- Data Manipulation: "Edit Fields (Set)" and "Code" nodes are used for data transformation and custom logic within the workflow.
- Looping: A "Loop Over Items (Split in Batches)" node processes multiple documents or queries in batches.
- Conditional Logic: A "Filter" node enables conditional branching based on specific criteria.
- Delay: A "Wait" node introduces pauses between batches, useful for rate limiting or waiting on external processes; a rough Python analogue is sketched after this list.
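The Loop Over Items and Wait combination amounts to client-side rate limiting. A rough Python analogue, with an arbitrary batch size and delay:

```python
# Rough equivalent of "Loop Over Items (Split in Batches)" + "Wait":
# process items in small batches with a pause to respect API rate limits.
# Batch size and delay are arbitrary illustrative defaults.
import time
from typing import Callable

def process_in_batches(items: list, handler: Callable, batch_size: int = 5,
                       delay_seconds: float = 2.0) -> list:
    results = []
    for start in range(0, len(items), batch_size):
        batch = items[start : start + batch_size]
        results.extend(handler(item) for item in batch)
        if start + batch_size < len(items):
            time.sleep(delay_seconds)  # pause between batches, not after the last
    return results
```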
Prerequisites/Requirements
To run this workflow, you will need:
- n8n Instance: A running n8n instance.
- OpenAI API Key: For the "Embeddings OpenAI" and "OpenAI Chat Model" nodes (embeddings, the agent, and the judge).
- Qdrant Instance: Access to a Qdrant vector database instance.
- Google Drive Account: If documents are sourced from Google Drive.
- Cohere API Key: If the "Reranker Cohere" node is actively used.
- Mistral API Key: Text extraction from PDFs is handled by Mistral's OCR service, called from the "HTTP Request" nodes, so Mistral credentials are required.
- Custom API Endpoints: Depending on the "HTTP Request" node's configuration, additional API keys or access to custom services might be required.
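Before importing the workflow, it can save time to verify each credential independently. A minimal connectivity check, assuming the official Python SDKs and a placeholder Qdrant URL:

```python
# Quick sanity check that each external service this workflow depends on
# is reachable. URLs and keys are placeholders to replace with your own.
import os
import cohere
from mistralai import Mistral
from openai import OpenAI
from qdrant_client import QdrantClient

OpenAI().models.list()                                         # OpenAI key
Mistral(api_key=os.environ["MISTRAL_API_KEY"]).models.list()   # Mistral key
QdrantClient(url="http://localhost:6333").get_collections()    # Qdrant reachable
cohere.Client(os.environ["COHERE_API_KEY"]).rerank(            # Cohere key
    model="rerank-english-v3.0", query="ping", documents=["pong"], top_n=1
)
print("All services reachable.")
```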
Setup/Usage
- Import the Workflow: Import the provided JSON into your n8n instance.
- Configure Credentials:
- Set up your OpenAI API key credentials for the "Embeddings OpenAI" and "OpenAI Chat Model" nodes.
- Configure your Qdrant credentials for the "Qdrant Vector Store" node.
- If using Google Drive, set up your Google Drive credentials.
- If using Cohere, configure your Cohere API key.
- Set up your Mistral API credentials for the OCR calls made by the "HTTP Request" nodes.
- Customize Nodes:
- Adjust the "Google Drive" node to point to your specific document folders or files.
- Modify the "Character Text Splitter" parameters (e.g., chunk size, overlap) as needed for your documents.
- Configure the "Qdrant Vector Store" with your desired collection name and other settings.
- Review and adjust the "AI Agent" and "OpenAI Chat Model" settings, including the model name (e.g., gpt-4.1), temperature, and system prompts.
- If used, configure the "Reranker Cohere" node according to your needs.
- Examine the "HTTP Request" and "Code" nodes for any custom logic or external API calls that require specific configuration.
- Activate and Test:
- Activate the workflow.
- Use the "Manual Trigger" to test the workflow with sample inputs, or integrate it as a sub-workflow into your main application.
- Utilize the "Evaluation Trigger" and "Evaluation" nodes to assess the RAG system's performance.
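To prototype chunk size and overlap before touching the "Character Text Splitter" node, a standalone splitter of the same shape can help; the defaults below are arbitrary and not taken from the workflow.

```python
# Minimal stand-in for the "Character Text Splitter": fixed-size chunks
# with overlap, so retrieval does not lose context at chunk boundaries.
# The default sizes are illustrative, not the workflow's settings.
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]

# Example: a 2,500-character document with the defaults yields four chunks
# starting at offsets 0, 800, 1600, and 2400.
```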