
Document Q&A system with OpenAI GPT, Pinecone Vector DB & Google Drive integration

Mohan Gopal
2/3/2026

This workflow contains community nodes that are only compatible with the self-hosted version of n8n.

🤖 AI-Powered Document QA System using Webhook, Pinecone + OpenAI + n8n

This project demonstrates how to build a Retrieval-Augmented Generation (RAG) system in n8n and expose it as a simple question-answering service through a Webhook connected to a user interface built with Lovable:

🧾 Downloads PDF documents from Google Drive (contracts, user manuals, HR policy documents, etc.)

📚 Converts them into vector embeddings using OpenAI

🔍 Stores and searches them in Pinecone Vector DB

💬 Allows natural language querying of contracts using AI Agents

📂 Flow 1: Document Loading & RAG Setup

This flow automates:

Reading documents from a Google Drive folder

Vectorizing using text-embedding-3-small

Uploading vectors into Pinecone for later semantic search

🧱 Workflow Structure

A[Manual Trigger] --> B[Google Drive Search]
B --> C[Google Drive Download]
C --> D[Pinecone Vector Store]
D --> E[Default Data Loader]
E --> F[Recursive Character Text Splitter]
E --> G[OpenAI Embedding]

🪜 Steps

Manual Trigger: Kickstarts the workflow on demand for loading new documents.

Google Drive Search & Download

Node: Google Drive (Search: file/folder)

Downloads PDF documents

Apply Recursive Text Splitter: Breaks long documents into overlapping chunks

Settings: Chunk Size: 1000, Chunk Overlap: 100

OpenAI Embedding

Model: text-embedding-3-small (used to create the document vectors)

Pinecone Vector Store

Host: your Pinecone host URL, Index: your index name, Batch Size: 200

Pinecone Settings:

Type: Dense, Region: us-east-1, Mode: Insert Documents
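
To make the steps above concrete, here is a minimal Python sketch of the same pipeline outside n8n: split, embed, upsert. It assumes the OpenAI and Pinecone Python clients with API keys in environment variables, pre-extracted text from the PDF, and the index name package1536 from the Final Notes below; the naive fixed-window splitter only approximates the recursive splitter node.

```python
# Minimal sketch of Flow 1 outside n8n: split, embed, upsert.
# Assumes OPENAI_API_KEY and PINECONE_API_KEY are set in the environment
# and that the PDF text has already been extracted to a plain-text file.
from openai import OpenAI
from pinecone import Pinecone

CHUNK_SIZE, CHUNK_OVERLAP = 1000, 100  # settings from the text splitter node

def split_text(text: str) -> list[str]:
    """Naive fixed-window splitter with overlap (the n8n node splits recursively)."""
    step = CHUNK_SIZE - CHUNK_OVERLAP
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), step)]

client = OpenAI()
index = Pinecone().Index("package1536")  # index name from the Final Notes

chunks = split_text(open("contract.txt").read())  # hypothetical input file

resp = client.embeddings.create(model="text-embedding-3-small", input=chunks)
vectors = [
    {"id": f"contract-{i}", "values": e.embedding, "metadata": {"text": chunk}}
    for i, (e, chunk) in enumerate(zip(resp.data, chunks))
]

for i in range(0, len(vectors), 200):  # Batch Size: 200, as on the Pinecone node
    index.upsert(vectors=vectors[i:i + 200])
```

Storing the chunk text as metadata is what lets the query flow hand readable context back to the chat model later.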

💬 Flow 2: Chat-Based Q&A Agent

This flow enables chat-style querying of stored documents using OpenAI-powered agents with vector memory.

🧱 Workflow Diagram

A[Webhook (chat message)] --> B[AI Agent]
B --> C[OpenAI Chat Model]
B --> D[Simple Memory]
B --> E[Answer with Vector Store]
E --> F[Pinecone Vector Store]
F --> G[Embeddings OpenAI]

🪜 Components

Chat (Trigger): Receives incoming chat queries

AI Agent Node

Handles query flow using:

Chat Model: OpenAI GPT

Memory: Simple Memory

Tool: Question Answer with Vector Store

Pinecone Vector Store: Connected to the same index and embedding model as Flow 1

Embeddings: Ensures document chunks are retrievable using vector similarity

Response Node: Returns final AI response to user via webhook
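
Conceptually, the "Answer with Vector Store" tool performs embed-retrieve-generate at question time. A minimal sketch, assuming the same Python clients as above; gpt-4o-mini is a stand-in model name, since the workflow only says "OpenAI GPT":

```python
# Minimal sketch of the Q&A tool at query time: embed the question,
# retrieve similar chunks from Pinecone, then answer from that context.
from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()
index = Pinecone().Index("package1536")

question = "What is the termination notice period in the vendor contract?"

# Embed the question with the same model used at ingestion time.
q_vec = client.embeddings.create(
    model="text-embedding-3-small", input=[question]
).data[0].embedding

# Retrieve the top matching chunks by vector similarity.
hits = index.query(vector=q_vec, top_k=5, include_metadata=True)
context = "\n\n".join(m.metadata["text"] for m in hits.matches)

# Ask the chat model to answer strictly from the retrieved context.
answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(answer.choices[0].message.content)
```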

🌐 Flow 3: UI-Based Query with Lovable

This flow uses a web UI built with Lovable so users can query contracts directly from a form interface.

📥 Webhook Setup for Lovable

Webhook Node

Method: POST
URL: your n8n webhook URL
Response: returned via the 'Respond to Webhook' node

🧱 Workflow Logic

A[Webhook (Lovable Form)] --> B[AI Agent]
B --> C[OpenAI Chat Model]
B --> D[Simple Memory]
B --> E[Answer with Vector Store]
E --> F[Pinecone Vector Store]
F --> G[Embeddings OpenAI]
B --> H[Respond to Webhook]

💡 Lovable UI

Users can submit:

Full Name
Email
Department
Freeform Query (any natural-language question)

The form data is sent to n8n via the webhook, which responds with an answer drawn from the contract content.
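
For illustration, the Lovable form's submission could look like the following POST; the webhook URL and field names are hypothetical and must match how the form and Webhook node are actually configured:

```python
# Hypothetical form submission from the Lovable UI to the n8n webhook.
# URL and field names are placeholders; match them to your Webhook node.
import requests

WEBHOOK_URL = "https://your-n8n-host/webhook/contract-qa"  # placeholder

payload = {
    "fullName": "Jane Doe",
    "email": "jane.doe@example.com",
    "department": "Legal",
    "query": "Which contracts expire in the next 90 days?",
}

resp = requests.post(WEBHOOK_URL, json=payload, timeout=60)
print(resp.json())  # answer produced by the 'Respond to Webhook' node
```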

🔍 Use Cases

Contract Querying for Legal/HR teams

Procurement & Vendor Agreement QA

Customer Support Automation (based on terms)

RAG Systems for private document knowledge

⚙️ Tools & Tech Stack

n8n (workflow orchestration) · OpenAI (GPT chat model + text-embedding-3-small embeddings) · Pinecone Vector DB · Google Drive (document source) · Lovable (web UI)

📌 Final Notes

Pinecone Index: package1536

Dimension: 1536

Chunk Size: 1000, Overlap: 100

Embedding Model: text-embedding-3-small

Feel free to fork the workflow or request the full JSON export. Looking forward to your suggestions and improvements!

n8n Document QA System with OpenAI GPT, Pinecone Vector DB, and Google Drive Integration

This n8n workflow creates a powerful question-answering system that leverages OpenAI's GPT models, Pinecone as a vector database, and Google Drive for document ingestion. It allows you to ask questions about documents stored in Google Drive and receive intelligent, context-aware answers.

What it does

This workflow automates the following steps:

  1. Triggers on Chat Message or Manual Execution: The system can be initiated either by receiving a chat message (e.g., from a chat application integrated with n8n) or by manually executing the workflow.
  2. Loads Documents from Google Drive: It connects to Google Drive to fetch documents. These documents will serve as the knowledge base for the AI agent.
  3. Splits Documents into Chunks: The fetched documents are processed by a "Recursive Character Text Splitter" to break them down into smaller, manageable chunks. This is crucial for efficient embedding and retrieval (see the splitter sketch after this list).
  4. Generates Embeddings with OpenAI: Each document chunk is then converted into a numerical vector (embedding) using OpenAI's Embeddings API. These embeddings capture the semantic meaning of the text.
  5. Stores Embeddings in Pinecone Vector Store: The generated embeddings, along with their corresponding text chunks, are stored in a Pinecone Vector Database. Pinecone enables fast and accurate similarity searches.
  6. Initializes AI Agent with Chat Model and Memory: An AI Agent is set up using an OpenAI Chat Model (GPT) and a simple buffer memory to maintain conversational context.
  7. Configures Vector Store Question Answer Tool: The AI Agent is equipped with a tool that allows it to query the Pinecone Vector Store. This tool enables the agent to find relevant document chunks based on a user's question.
  8. Answers Questions using AI Agent: When a question is received, the AI Agent uses the configured tool to search the Pinecone Vector Store for relevant document sections. It then uses the OpenAI Chat Model to formulate a coherent answer based on the retrieved information.
  9. Responds to the Webhook: The final answer generated by the AI Agent is sent back as a response to the initial webhook trigger.
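
The n8n LangChain nodes are built on LangChain, so step 3's chunking behaves like LangChain's RecursiveCharacterTextSplitter. A minimal sketch using the chunk settings from this workflow (the input file name is hypothetical):

```python
# Illustrative chunking with LangChain's recursive splitter, which the
# n8n text-splitter node is based on; the input file is hypothetical.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(open("hr_policy.txt").read())
print(f"{len(chunks)} chunks; first chunk starts: {chunks[0][:80]!r}")
```

The 100-character overlap ensures that sentences cut at a chunk boundary still appear intact in at least one chunk, which keeps retrieval from missing answers that straddle two chunks.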

Prerequisites/Requirements

To use this workflow, you will need:

  • n8n Instance: A running n8n instance (self-hosted or cloud).
  • OpenAI API Key: For generating text embeddings and powering the chat model (GPT).
  • Pinecone Account and API Key: For storing and retrieving vector embeddings. You'll need to create an index in Pinecone (see the index-creation sketch after this list).
  • Google Drive Account: With access to the documents you wish to use as a knowledge base.
  • n8n Langchain Nodes: Ensure you have the @n8n/n8n-nodes-langchain package installed in your n8n instance, as this workflow heavily relies on its nodes.
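
If the index does not exist yet, it can be created with the Pinecone Python client. This sketch assumes a serverless dense index on AWS; the dimension and region come from this workflow's settings, while the cosine metric is an assumption:

```python
# Creating the Pinecone index used by this workflow, assuming a serverless
# dense index on AWS; dimension and region come from the workflow settings.
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone()  # reads PINECONE_API_KEY from the environment
pc.create_index(
    name="package1536",
    dimension=1536,  # matches text-embedding-3-small output size
    metric="cosine",  # assumption; pick the metric your queries expect
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
```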

Setup/Usage

  1. Import the Workflow: Download the JSON definition and import it into your n8n instance.
  2. Configure Credentials:
    • OpenAI: Set up an OpenAI credential with your API key.
    • Pinecone: Configure a Pinecone credential with your API key and environment details.
    • Google Drive: Set up a Google Drive OAuth2 credential to allow n8n to access your documents.
  3. Configure Nodes:
    • Google Drive Node (ID: 58): Specify the folder or files you want to load from Google Drive.
    • Pinecone Vector Store Node (ID: 1230): Configure your Pinecone index name and namespace (if applicable).
    • AI Agent Node (ID: 1119): Ensure the OpenAI Chat Model and Simple Memory nodes are correctly connected as sub-nodes.
    • Vector Store Question Answer Tool Node (ID: 1269): Verify it's correctly linked to the Pinecone Vector Store.
  4. Activate the Workflow: Once all credentials and nodes are configured, activate the workflow.
  5. Trigger the Workflow:
    • Webhook (ID: 47): Use the provided webhook URL to send questions (e.g., via a POST request with your question in the body).
    • Chat Trigger (ID: 1247): If you've integrated a chat platform, send a message to trigger the workflow.
    • Manual Trigger (ID: 838): Click "Execute Workflow" in the n8n editor to run it manually for testing.

The workflow will then process your query, consult your Google Drive documents via Pinecone, and return an answer through the webhook response or chat interface.
