Build an All-Source Knowledge Assistant with Claude, RAG, Perplexity, and Drive
📜 Detailed n8n Workflow Description
Main Flow
The workflow operates through a three-step process that handles incoming chat messages with intelligent tool orchestration:
- Message Trigger: The "When chat message received" node triggers whenever a user message arrives and passes it directly to the "Knowledge Agent" for processing.
- Agent Orchestration: The "Knowledge Agent" serves as the central orchestrator, registering a comprehensive toolkit of capabilities:
  - LLM Processing: Uses the "Anthropic Chat Model" with the claude-sonnet-4-20250514 model to craft final responses
  - Memory Management: Implements "Postgres Chat Memory" to save and recall conversation context across sessions
  - Reasoning Engine: Incorporates a "Think" tool to force internal chain-of-thought processing before taking any action
  - Semantic Search: Leverages the "General knowledge" vector store with OpenAI embeddings (1536-dimensional) and Cohere reranking for intelligent content retrieval
  - Structured Queries: Provides the "structured data" Postgres tool for executing queries on relational database tables
  - Drive Integration: Includes the "search about any doc in google drive" functionality to locate specific file IDs
  - File Processing: Connects to the "Read File From GDrive" sub-workflow for fetching and processing various file formats
  - External Intelligence: Offers "Message a model in Perplexity" for accessing up-to-the-minute web information when internal knowledge proves insufficient
- Response Generation: After invoking the "Think" process, the agent selects appropriate tools based on the query, integrates results from multiple sources, and returns a comprehensive Markdown-formatted answer to the user.
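For readers who prefer code to canvas, here is a minimal sketch of the same orchestration pattern expressed directly against the Anthropic Messages API. The tool names and input schemas are illustrative assumptions for the sketch, not the workflow's actual node definitions.

```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// One turn of the agent loop: Claude either answers directly or asks for a tool.
// The tool names and schemas below are illustrative assumptions for this sketch,
// not the workflow's actual n8n node definitions.
const response = await anthropic.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  tools: [
    {
      name: "general_knowledge_search",
      description: "Semantic search over the internal vector store.",
      input_schema: { type: "object", properties: { query: { type: "string" } }, required: ["query"] },
    },
    {
      name: "read_file_from_gdrive",
      description: "Fetch and extract the contents of a Google Drive file by ID.",
      input_schema: { type: "object", properties: { fileId: { type: "string" } }, required: ["fileId"] },
    },
    {
      name: "perplexity_search",
      description: "Ask Perplexity for up-to-date public information.",
      input_schema: { type: "object", properties: { question: { type: "string" } }, required: ["question"] },
    },
  ],
  messages: [{ role: "user", content: "Summarise our onboarding guide and flag anything outdated." }],
});

// Inspect whether the model chose to call a tool before answering.
for (const block of response.content) {
  if (block.type === "tool_use") {
    console.log(`Requested tool: ${block.name}`, block.input);
  }
}
```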
Persistent Context Management
The workflow maintains conversation continuity through Postgres Chat Memory, which automatically logs every user-agent exchange. This ensures long-term context retention without requiring manual intervention, allowing for sophisticated multi-turn conversations that build upon previous interactions.
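As an illustration of what such a memory might look like under the hood, the sketch below stores one JSON message per row, keyed by session. The table and column names (n8n_chat_histories, session_id, message) are assumptions chosen to mirror a typical n8n chat-memory schema; check your own database for the actual names.

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the standard PG* environment variables

// Table and column names below are assumptions mirroring an n8n-style chat memory
// (one JSON message per row, keyed by session); adjust to your actual schema.
export async function appendMessage(sessionId: string, role: "human" | "ai", text: string) {
  await pool.query(
    `INSERT INTO n8n_chat_histories (session_id, message) VALUES ($1, $2)`,
    [sessionId, JSON.stringify({ type: role, data: { content: text } })]
  );
}

export async function loadHistory(sessionId: string, limit = 20) {
  const { rows } = await pool.query(
    `SELECT message FROM n8n_chat_histories
     WHERE session_id = $1
     ORDER BY id DESC
     LIMIT $2`,
    [sessionId, limit]
  );
  return rows.reverse().map((r) => r.message); // oldest first, ready to prepend to the prompt
}
```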
Semantic Retrieval Pipeline
The semantic search system operates through a sophisticated two-stage process:
- Embedding Generation: "Embeddings OpenAI" converts textual content into high-dimensional vector representations
- Relevance Reranking: "Reranker Cohere" reorders search hits to prioritize the most contextually relevant results
- Knowledge Integration: Processed results feed into the "General knowledge" vector store, providing the agent with relevant internal knowledge snippets for enhanced response accuracy
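The sketch below shows the same two-stage shape in plain code: embed the query, fetch candidates, then rerank. The vectorStoreSearch function is a hypothetical placeholder for whatever store backs the workflow, and the endpoints and model names are the public OpenAI and Cohere REST APIs, which you should verify against your own configuration.

```typescript
// Two-stage retrieval sketch: embed the query, pull candidates from the vector
// store, then let Cohere rerank them. vectorStoreSearch() is a placeholder.
async function retrieve(
  query: string,
  vectorStoreSearch: (vector: number[], k: number) => Promise<string[]>
): Promise<string[]> {
  // 1. Embed the query (text-embedding-3-small returns 1536-dimensional vectors).
  const embRes = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
    body: JSON.stringify({ model: "text-embedding-3-small", input: query }),
  });
  const queryVector: number[] = (await embRes.json()).data[0].embedding;

  // 2. Pull a generous candidate set from the vector store.
  const candidates = await vectorStoreSearch(queryVector, 25);

  // 3. Rerank the candidates so only the most relevant chunks reach the agent.
  const rerankRes = await fetch("https://api.cohere.com/v2/rerank", {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${process.env.COHERE_API_KEY}` },
    body: JSON.stringify({ model: "rerank-v3.5", query, documents: candidates, top_n: 5 }),
  });
  const { results } = await rerankRes.json();
  return results.map((r: { index: number }) => candidates[r.index]);
}
```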
Google Drive File Processing
The file reading capability handles multiple formats through a structured sub-workflow:
- Workflow Initiation: The agent calls "Read File From GDrive" with the selected fileId parameter
- Sub-workflow Activation: The "When Executed by Another Workflow" node activates the dedicated file processing sub-workflow
- Operation Validation: The "Operation" node confirms the request type is readFile
- File Retrieval: The "Download File1" node retrieves the binary file data from Google Drive
- Format-Specific Processing: The "FileType" node branches processing based on MIME type:
  - PDF Files: Route through "Extract from PDF" → "Get PDF Response" to extract plain text content
  - CSV Files: Process via "Extract from CSV" → "Get CSV Response" to obtain comma-delimited text data
  - Image Files: Analyze using "Analyse Image" with GPT-4o-mini to generate visual descriptions
  - Audio/Video Files: Transcribe using "Transcribe Audio" with Whisper for text transcript generation
- Content Integration: The extracted text content returns to the "Knowledge Agent", which seamlessly weaves it into the final response
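The branching performed by the "FileType" node can be summarised in a few lines. The helper functions below are hypothetical placeholders standing in for the extraction branches described above, not real library calls.

```typescript
// Sketch of the routing the "FileType" switch performs inside the sub-workflow.
type Extracted = { fileId: string; kind: string; text: string };

async function processDriveFile(fileId: string, mimeType: string, data: Buffer): Promise<Extracted> {
  if (mimeType === "application/pdf") {
    return { fileId, kind: "pdf", text: await extractPdfText(data) };         // "Extract from PDF" branch
  }
  if (mimeType === "text/csv") {
    return { fileId, kind: "csv", text: data.toString("utf8") };              // "Extract from CSV" branch
  }
  if (mimeType.startsWith("image/")) {
    return { fileId, kind: "image", text: await describeImage(data) };        // vision-model description
  }
  if (mimeType.startsWith("audio/") || mimeType.startsWith("video/")) {
    return { fileId, kind: "transcript", text: await transcribeAudio(data) }; // speech-to-text transcript
  }
  throw new Error(`Unsupported MIME type: ${mimeType}`);
}

// Placeholder signatures only; the real work happens in the sub-workflow's nodes.
declare function extractPdfText(data: Buffer): Promise<string>;
declare function describeImage(data: Buffer): Promise<string>;
declare function transcribeAudio(data: Buffer): Promise<string>;
```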
External Search Capability
When internal knowledge sources prove insufficient, the workflow can pull current public information through the "Message a model in Perplexity" tool, keeping responses accurate and up to date.
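Perplexity exposes an OpenAI-compatible chat completions endpoint, so the external search step can be sketched as a single HTTP call. The model name here is an assumption; consult the current Perplexity documentation for the models available to your account.

```typescript
// Minimal external-search sketch against Perplexity's chat completions endpoint.
// The "sonar" model name is an assumption for this sketch.
async function askPerplexity(question: string): Promise<string> {
  const res = await fetch("https://api.perplexity.ai/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${process.env.PERPLEXITY_API_KEY}` },
    body: JSON.stringify({
      model: "sonar",
      messages: [
        { role: "system", content: "Answer concisely and cite sources." },
        { role: "user", content: question },
      ],
    }),
  });
  const body = await res.json();
  return body.choices[0].message.content;
}
```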
Design Highlights
The workflow architecture incorporates several key design principles that enhance reliability and reusability:
- Forced Reasoning: The mandatory "Think" step significantly reduces hallucinations and prevents tool misuse by requiring deliberate consideration before action
- Template Flexibility: The design is intentionally generic: organizations can replace the [your company] placeholders with their own company name and plug in their own credentials for immediate deployment
- Documentation Integration: Sticky notes throughout the canvas serve as inline documentation for workflow creators and maintainers, providing context without affecting runtime performance
System Benefits
With this comprehensive architecture, the assistant delivers powerful capabilities including long-term memory retention, semantic knowledge retrieval, multi-format file processing, and contextually rich responses tailored specifically for users at [your company]. The system balances sophisticated AI capabilities with practical business requirements, creating a robust foundation for enterprise-grade conversational AI deployment.
n8n All-Source Knowledge Assistant with Claude, RAG, Perplexity, and Drive
This n8n workflow demonstrates a sophisticated AI agent capable of answering questions by leveraging multiple knowledge sources and advanced AI capabilities. It combines retrieval-augmented generation (RAG) with a large language model (LLM) to provide comprehensive and contextually relevant responses.
Description
This workflow acts as an "All-Source Knowledge Assistant." It's designed to respond to chat messages by first attempting to retrieve relevant information from a vector store (populated from Google Drive documents), and then using an AI agent (powered by Anthropic's Claude) to synthesize an answer. It includes tools for thinking, calling other n8n workflows (potentially for Perplexity integration, though not explicitly defined in this JSON), and interacting with a Model Context Protocol (MCP) client.
What it does
- Triggers on Chat Message: The workflow starts when a chat message is received.
- Initial Setup (Edit Fields): Prepares the input for the AI agent, likely setting up initial context or variables.
- AI Agent Execution: An AI agent, configured with an Anthropic Chat Model (Claude) and a Postgres Chat Memory, takes over.
- Tools Available: The agent has access to several tools:
  - Think Tool: Allows the agent to reason and plan its next steps.
  - Call n8n Workflow Tool: Enables the agent to execute other n8n workflows, which could be used for external API calls (e.g., Perplexity for search, as hinted by the directory name).
  - MCP Client Tool: Facilitates interaction with a Model Context Protocol server, potentially for advanced context management or multi-agent communication.
  - Vector Store Integration: The agent utilizes a Supabase Vector Store for Retrieval-Augmented Generation (RAG).
    - Document Loading: Documents are loaded from Google Drive using the "Default Data Loader" and "Extract from File" nodes.
    - Text Splitting: The "Recursive Character Text Splitter" prepares documents for embedding (a simplified chunking sketch follows this list).
    - Embeddings: OpenAI Embeddings convert document chunks into vector representations for the Supabase Vector Store.
    - Reranking: A Cohere Reranker improves the relevance of retrieved documents before they are passed to the LLM.
- Conditional Logic (Switch): A "Switch" node is present, suggesting conditional branching based on the AI agent's output or other workflow logic.
- Sub-Workflow Trigger: The "Execute Workflow Trigger" node indicates that this workflow can be called as a sub-workflow by another n8n workflow.
- Manual Execution: The workflow can also be triggered manually for testing or ad-hoc use.
- MCP Server Trigger: An "MCP Server Trigger" is included, suggesting this workflow can also act as a server in a Model Context Protocol setup, receiving requests from MCP clients.
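The chunking strategy mentioned under Text Splitting can be approximated with a small recursive splitter like the one below. The chunk size, overlap, and separator order are illustrative defaults rather than the node's exact settings.

```typescript
// Simplified recursive character splitting: try the coarsest separator first
// and fall back to finer ones only when a piece is still too long.
function recursiveSplit(
  text: string,
  chunkSize = 1000,
  overlap = 200,
  separators = ["\n\n", "\n", " ", ""]
): string[] {
  if (text.length <= chunkSize) return [text];
  const [sep, ...rest] = separators;
  const pieces = sep === "" ? Array.from(text) : text.split(sep);

  const chunks: string[] = [];
  let current = "";
  for (const piece of pieces) {
    const candidate = current ? current + sep + piece : piece;
    if (candidate.length <= chunkSize) {
      current = candidate;
    } else {
      if (current) chunks.push(current);
      if (piece.length > chunkSize && rest.length > 0) {
        // A single piece is still too long: recurse with a finer separator.
        chunks.push(...recursiveSplit(piece, chunkSize, overlap, rest));
        current = "";
      } else {
        current = piece;
      }
    }
  }
  if (current) chunks.push(current);

  // Re-introduce a small overlap between consecutive chunks for retrieval context.
  return chunks.map((c, i) => (i === 0 ? c : chunks[i - 1].slice(-overlap) + c));
}
```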
Prerequisites/Requirements
- n8n Instance: A running n8n instance.
- Anthropic API Key: For the Anthropic Chat Model (Claude).
- OpenAI API Key: For OpenAI Embeddings.
- Supabase Account: For the Supabase Vector Store. This requires a Supabase Project URL and API Key.
- PostgreSQL Database: For the Postgres Chat Memory. Requires connection details (host, port, database, user, password).
- Google Drive Account: With credentials configured in n8n to access documents.
- Cohere API Key: For the Cohere Reranker.
- Model Context Protocol (MCP) Setup: If utilizing the MCP Client/Server tools, a compatible MCP environment is needed.
Setup/Usage
- Import the Workflow: Import the provided JSON into your n8n instance.
- Configure Credentials:
  - Set up credentials for Anthropic, OpenAI, Supabase, PostgreSQL, Google Drive, and Cohere within n8n.
- Configure Nodes:
  - Google Drive: Specify the files or folders containing your knowledge base documents.
  - Supabase Vector Store: Configure the table name and other connection details for your Supabase instance.
  - AI Agent: Ensure the Anthropic Chat Model and Postgres Chat Memory are correctly linked to their respective credentials.
  - Call n8n Workflow Tool: If you intend to use external tools such as Perplexity, create separate n8n workflows for those functionalities and link them here.
  - MCP Client/Server: Configure the MCP client and server nodes if you are integrating with an MCP setup.
- Activate the Workflow: Enable the workflow to start listening for chat messages.
- Interact: Send chat messages to the configured chat trigger to interact with your knowledge assistant.
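To sanity-check the deployment, you can post a message straight to the chat trigger's webhook. The URL below is a placeholder, and the payload shape (sessionId plus chatInput) is an assumption based on what the n8n chat widget typically sends, so verify both against your "When chat message received" node.

```typescript
// Quick smoke test against the chat trigger's webhook.
const WEBHOOK_URL = "https://your-n8n-host/webhook/<chat-trigger-id>/chat"; // placeholder URL

const res = await fetch(WEBHOOK_URL, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    sessionId: "demo-session-1", // keeps Postgres Chat Memory scoped to one conversation
    chatInput: "What does our refund policy say about annual plans?",
  }),
});

console.log(await res.json()); // the agent's Markdown-formatted answer
```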
Related Templates
- Auto-reply & create Linear tickets from Gmail with GPT-5, gotoHuman & human review
- Document RAG & chat agent: Google Drive to Qdrant with Mistral OCR
- AI website scraper & company intelligence