PDF proposal knowledge base with S3, OpenAI GPT-4o & Qdrant RAG agent

Joe Swink

2/3/2026

This template has a two-part setup:

Ingest PDF files from S3, extract text, chunk, embed with OpenAI embeddings, and index into a Qdrant collection with metadata.
Provide a chat entry point that uses an Agent with OpenAI to retrieve from the same Qdrant collection as a tool and answer proposal knowledge questions.

What it does

Lists objects in an S3 bucket, loops through keys, downloads each file, and extracts text from PDFs.
Chunks text and loads it into Qdrant with metadata for retrieval.
Exposes a chat trigger wired to an Agent using an OpenAI chat model.
Adds a Qdrant node in "retrieve as tool" mode so the Agent can ground answers in the indexed corpus.
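
The ingestion half of this list can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the template's node configuration: the listing is a hard-coded sample standing in for the S3 list step, and the names `select_pdf_keys` and `build_payload`, the bucket `my-bucket`, and the `source` and `chunk_index` metadata fields are hypothetical choices made for the example.

```python
from typing import Dict, List


def select_pdf_keys(keys: List[str]) -> List[str]:
    """Keep only object keys that end in .pdf (case-insensitive), in listing order."""
    return [k for k in keys if k.lower().endswith(".pdf")]


def build_payload(bucket: str, key: str, chunk_index: int, text: str) -> Dict[str, object]:
    """Pair a text chunk with metadata so an answer can be traced back to its file."""
    return {
        "text": text,
        "metadata": {
            "source": f"s3://{bucket}/{key}",
            "chunk_index": chunk_index,
        },
    }


# Hypothetical listing, standing in for the output of the S3 list step.
listing = ["proposals/acme.pdf", "proposals/notes.txt", "proposals/Beta.PDF"]
pdf_keys = select_pdf_keys(listing)

# Hypothetical chunk text; in the workflow this comes from PDF extraction.
payload = build_payload("my-bucket", pdf_keys[0], 0, "Scope of work")
print(pdf_keys)
print(payload["metadata"]["source"])
```

Storing the source key alongside each chunk is what lets a retrieved passage be attributed to a specific proposal later on.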

Why it is useful

Simple pattern for building a proposal knowledge base, or any other knowledge base, from PDFs stored in S3.
End-to-end path from ingestion to retrieval-augmented answers.
Easy to swap models or collections, and to extend with more tools.

Setup notes

Attach your own AWS credentials to the two S3 nodes and set your bucket name.
Attach your Qdrant credentials to both Qdrant nodes and set your collection.
Attach your OpenAI credentials to the embedding and chat nodes.
The sanitized template uses placeholders for bucket and collection names.

PDF Proposal Knowledge Base with S3, OpenAI GPT-4o, and Qdrant RAG Agent

This n8n workflow builds a knowledge base from PDF documents stored in an AWS S3 bucket. It uses OpenAI embeddings and a Qdrant vector store for retrieval-augmented generation (RAG). Users can then query the system through a chat interface and receive answers grounded in the content of the ingested PDFs.

What it does

This workflow automates the process of ingesting PDF documents into a RAG-powered knowledge base and enabling interactive querying:

Manual Trigger: The ingestion flow is started manually from the n8n editor.
AWS S3 File Listing: It connects to an AWS S3 bucket and lists all PDF files within a specified directory.
Loop Over PDF Files: Each listed PDF file is processed one at a time in a loop.
Extract Content from PDF: For each PDF, the workflow extracts its textual content.
Load Document Data: The extracted text is then loaded as a document for further processing.
Split Text into Chunks: The document text is split into smaller, manageable chunks using a Recursive Character Text Splitter. This is crucial for efficient embedding and retrieval.
Generate Embeddings: OpenAI's embedding model (e.g., text-embedding-ada-002) is used to convert each text chunk into a numerical vector (embedding).
Store in Qdrant Vector Store: These embeddings, along with their corresponding text chunks, are stored in a Qdrant vector database, creating the searchable knowledge base.
Chat Trigger (Optional/Separate Flow): Once the knowledge base is built, a separate "Chat Trigger" can be used to initiate a conversational interaction.
AI Agent for RAG: An AI Agent (configured with an OpenAI Chat Model like GPT-4o and connected to the Qdrant Vector Store) processes user queries. It retrieves relevant information from Qdrant based on the query's embedding and then uses the OpenAI Chat Model to generate a coherent and informed response.
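
To make the chunking step more concrete, here is a simplified sketch of recursive character splitting. It is an illustration of the idea only, not the code behind n8n's Recursive Character Text Splitter node: the function name `recursive_split`, the separator order, and the 40-character chunk size are assumptions, and chunk overlap is left out to keep the example short.

```python
from typing import List, Sequence


def recursive_split(text: str, chunk_size: int,
                    separators: Sequence[str] = ("\n\n", "\n", " ", "")) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    The coarsest separator is tried first; finer separators are used only for
    pieces that are still too long. Overlap between chunks is omitted.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []

    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # the piece still fits in the open chunk
            continue
        if current:
            chunks.append(current)  # close the open chunk
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: recurse with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


# Hypothetical extracted text with three short paragraphs.
sample = ("Pricing: fixed fee.\n\n"
          "Timeline: six weeks from kickoff.\n\n"
          "Team: two engineers.")
chunks = recursive_split(sample, chunk_size=40)
for chunk in chunks:
    print(repr(chunk))
```

With this sample, each paragraph fits within the limit but no two fit together, so the splitter keeps paragraph boundaries intact. Only text without usable separators falls back to a hard cut.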

Prerequisites/Requirements

To use this workflow, you will need:

n8n Instance: A running n8n instance.
AWS S3 Account: An AWS account with an S3 bucket containing your PDF documents. You'll need AWS credentials (Access Key ID and Secret Access Key) configured in n8n.
OpenAI API Key: An OpenAI API key with access to embedding models (e.g., text-embedding-ada-002) and chat models (e.g., gpt-4o). This needs to be configured as an OpenAI credential in n8n.
Qdrant Instance: A running Qdrant instance (either self-hosted or cloud-based). You'll need the Qdrant host URL and API key (if applicable) configured as a Qdrant credential in n8n.
Langchain Nodes: Ensure the @n8n/n8n-nodes-langchain package is installed in your n8n instance.
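
One detail worth checking before ingestion is that the Qdrant collection's vector size matches the embedding model's output size. The sketch below only builds the URL and JSON body for Qdrant's REST collection-creation call (`PUT /collections/{collection_name}`) and sends nothing. The base URL `http://localhost:6333`, the collection name `proposals`, the helper name `collection_request`, and the choice of cosine distance are assumptions for illustration; the template itself may rely on the Qdrant node to create the collection.

```python
import json
from typing import Dict, Tuple

# Default output sizes of common OpenAI embedding models.
EMBEDDING_DIMS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def collection_request(base_url: str, collection: str, model: str) -> Tuple[str, Dict]:
    """Return the URL and JSON body for creating a collection sized for `model`."""
    size = EMBEDDING_DIMS[model]
    url = f"{base_url.rstrip('/')}/collections/{collection}"
    # Cosine distance is an assumption; pick whatever your retrieval setup expects.
    body = {"vectors": {"size": size, "distance": "Cosine"}}
    return url, body


# Hypothetical host and collection name.
url, body = collection_request("http://localhost:6333", "proposals",
                               "text-embedding-ada-002")
print("PUT", url)
print(json.dumps(body))
```

If you later swap the embedding model for one with a different output size, the existing collection has to be recreated and the documents re-ingested, because stored vectors and query vectors must have the same dimension.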

Setup/Usage

Import the Workflow: Download the provided JSON and import it into your n8n instance.
Configure Credentials:
- AWS S3: Create or update an AWS credential in n8n with your Access Key ID and Secret Access Key.
- OpenAI: Create or update an OpenAI credential in n8n with your API Key.
- Qdrant: Create or update a Qdrant credential in n8n with your host URL and API key.
Configure AWS S3 Node (ID: 307):
- Specify your S3 Bucket Name.
- Set the Operation to List and Resource to File.
- Provide the Folder Path where your PDF documents are located.
Configure Qdrant Vector Store Node (ID: 1248):
- Specify your Qdrant Collection Name. This is where your document embeddings will be stored.
Configure OpenAI Chat Model Node (ID: 1153):
- Select your desired OpenAI Model (e.g., gpt-4o).
Configure AI Agent Node (ID: 1119):
- Ensure the Language Model is set to the "OpenAI Chat Model" node and the Vector Store is set to the "Qdrant Vector Store" node.
Run the Ingestion Flow:
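- Activation is not required for a manual run; activate the workflow only if the Chat Trigger should be reachable outside the editor.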
- Click "Execute Workflow" on the "Manual Trigger" node (ID: 838) to start the PDF ingestion process. This will read your PDFs, create embeddings, and store them in Qdrant.
Interact via Chat (after ingestion):
- Once ingestion is complete, you can use the "Chat Trigger" node (ID: 1247) to send messages to the AI Agent. The agent will then use the Qdrant knowledge base to answer your questions.
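
Conceptually, the retrieval the agent performs when it calls the Qdrant tool looks like the sketch below. This is a toy illustration, not what Qdrant or the n8n Agent node runs internally: the three-dimensional vectors stand in for real embeddings, and the names `cosine` and `top_k` and the prompt wording are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float],
          index: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[str]:
    """Return the texts of the k entries most similar to the query vector."""
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy vectors standing in for embeddings of indexed chunks.
index = [
    ("Pricing is a fixed fee.", [0.9, 0.1, 0.0]),
    ("Delivery takes six weeks.", [0.1, 0.9, 0.0]),
    ("The team has two engineers.", [0.0, 0.2, 0.9]),
]
query_vector = [0.8, 0.2, 0.0]  # pretend embedding of a pricing question

context = top_k(query_vector, index, k=2)
prompt = "Answer using only this context:\n" + "\n".join(context)
print(prompt)
```

The retrieved chunks are passed to the chat model as context, which is how the agent's answers stay tied to the indexed proposals rather than the model's general knowledge.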
