
Process documents & build semantic search with OpenAI, Gemini & Qdrant

By Jez
2/3/2026

🎯 Overview

This n8n workflow automates ingesting documents from two sources (Google Drive and web form uploads) into a Qdrant vector database for semantic search. It handles batch processing, document analysis, embedding generation, and vector storage, with error handling and execution tracking along the way.

🚀 Key Features

  • Dual Input Sources: Accepts files from both Google Drive folders and web form uploads
  • Batch Processing: Processes files one at a time to prevent memory issues and ensure reliability
  • AI-Powered Analysis: Uses Google Gemini to extract metadata and understand document context
  • Vector Embeddings: Generates OpenAI embeddings for semantic search capabilities
  • Automated Cleanup: Optionally deletes processed files from Google Drive (configurable)
  • Loop Processing: Handles multiple files efficiently with Split In Batches nodes
  • Interactive Chat Interface: Built-in chatbot for testing semantic search queries against indexed documents

📋 Use Cases

  • Knowledge Base Creation: Build searchable document repositories for organizations
  • Document Compliance: Process and index legal/regulatory documents (like Fair Work documents)
  • Content Management: Automatically categorize and store uploaded documents
  • Research Libraries: Create semantic search capabilities for research papers or reports
  • Customer Support: Enable instant answers to policy and documentation questions via chat interface

🔧 Workflow Components

Input Methods

  1. Google Drive Integration

    • Monitors a specific folder for new files
    • Processes existing files in batch mode
    • Supports automatic file conversion to PDF
  2. Web Form Upload

    • Public-facing form for document submission
    • Accepts PDF, DOCX, DOC, and CSV files
    • Processes multiple file uploads in a single submission
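
If you want to reject unsupported uploads before they reach the processing pipeline, a check on the file name is one option. A minimal sketch, assuming you match on the extension rather than the MIME type; the function name is hypothetical and this check is not claimed to be part of the published workflow:

```python
from pathlib import PurePath

# Extensions the web form is described as accepting.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".doc", ".csv"}


def is_supported_upload(file_name: str) -> bool:
    """Return True if the file name ends in an accepted extension (case-insensitive)."""
    return PurePath(file_name).suffix.lower() in ALLOWED_EXTENSIONS


uploads = ["award-summary.PDF", "notes.txt", "pay-rates.csv"]
accepted = [name for name in uploads if is_supported_upload(name)]
print(accepted)  # ['award-summary.PDF', 'pay-rates.csv']
```

Extensions can be wrong or missing, so checking the MIME type reported for the upload as well would be stricter.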

Processing Pipeline

  1. File Splitting: Separates multiple uploads into individual items
  2. Document Analysis: Google Gemini analyzes each document to extract its context and metadata
  3. Text Extraction: Converts documents to plain text
  4. Embedding Generation: Creates vector embeddings via OpenAI
  5. Vector Storage: Inserts documents with embeddings into Qdrant
  6. Loop Control: Manages batch processing with proper state handling

Key Nodes

  • Split In Batches: Processes files one at a time with reset: false to maintain state
  • Google Gemini: Analyzes documents for context and metadata
  • LangChain Vector Store: Handles Qdrant insertion with embeddings
  • HTTP Request: Direct API calls for custom operations
  • Chat Interface: Interactive chatbot for testing vector search queries
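
When an HTTP Request node talks to Qdrant directly, inserting vectors corresponds to Qdrant's upsert-points call (`PUT /collections/<collection>/points`) with a JSON body that lists points. A minimal sketch of building that body, assuming the chunks and their embeddings are already available; the payload keys (`content`, `metadata`) and the UUIDv5 ID scheme are illustrative choices, not taken from the workflow:

```python
import uuid


def build_upsert_body(file_id: str, chunks: list[str], vectors: list[list[float]]) -> dict:
    """Build a Qdrant upsert-points body with one point per text chunk.

    IDs are UUIDv5 values derived from the file ID and chunk index, so
    re-ingesting the same file overwrites its earlier points instead of
    duplicating them.
    """
    if len(chunks) != len(vectors):
        raise ValueError("each chunk needs exactly one embedding vector")
    points = []
    for index, (text, vector) in enumerate(zip(chunks, vectors)):
        point_id = uuid.uuid5(uuid.NAMESPACE_URL, f"{file_id}:{index}")
        points.append({
            "id": str(point_id),
            "vector": vector,
            # Hypothetical payload layout; match whatever your retrieval side reads.
            "payload": {"content": text, "metadata": {"file_id": file_id, "chunk": index}},
        })
    return {"points": points}


body = build_upsert_body("drive-file-123", ["First chunk", "Second chunk"], [[0.1, 0.2], [0.3, 0.4]])
```

Deterministic IDs make re-runs idempotent for unchanged files, but if a file is re-ingested with fewer chunks, its old higher-index points would remain, so a delete-by-filter step may still be needed.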

๐Ÿ› ๏ธ Technical Implementation

Batch Processing Logic

The workflow loops over the files with a Split In Batches node:

  • Split In Batches with batchSize: 1 ensures single-file processing
  • reset: false maintains loop state across iterations
  • Loop continues until all files are processed
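
To make the reset: false behavior concrete, here is a small Python model of the loop. It is a conceptual sketch of the state handling described above, not n8n's implementation; the class and method names are hypothetical.

```python
from collections import deque
from typing import Any, Iterable, Optional


class BatchLoop:
    """Conceptual model of a Split In Batches node with reset: false.

    The first call stores the incoming items; later calls (the loop-back)
    ignore new input and hand out the next batch from the stored queue.
    Returning None stands for the node's "done" branch.
    """

    def __init__(self, batch_size: int = 1) -> None:
        self.batch_size = batch_size
        self._queue: Optional[deque] = None

    def run(self, items: Iterable[Any]) -> Optional[list]:
        if self._queue is None:  # first entry: capture the full item list
            self._queue = deque(items)
        if not self._queue:
            return None  # nothing left: take the "done" branch
        take = min(self.batch_size, len(self._queue))
        return [self._queue.popleft() for _ in range(take)]


loop = BatchLoop(batch_size=1)
processed = []
batch = loop.run(["a.pdf", "b.docx", "c.csv"])
while batch is not None:
    processed.extend(batch)  # analyze, embed, insert (and delete) would happen here
    batch = loop.run([])     # loop back; the stored queue is kept, not reset
print(processed)  # ['a.pdf', 'b.docx', 'c.csv']
```

With reset enabled instead, each loop-back would re-initialize the queue from its input, which is why keeping state matters for this pattern.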

Error Handling

  • Nodes use continueOnFail where appropriate, so a single failing file does not halt the whole run
  • Execution logs are preserved for debugging
  • File deletion only occurs after successful insertion

Data Flow

Form Upload → Split Files → Batch Loop → Analyze → Insert → Loop Back
Google Drive → List Files → Batch Loop → Download → Analyze → Insert → Delete → Loop Back

📊 Performance Considerations

  • Processing Time: ~20-30 seconds per file
  • Batch Size: Set to 1 for reliability (configurable)
  • Memory Usage: Optimized for files under 10MB
  • API Costs: Uses OpenAI embeddings (text-embedding-3-large model)
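
Because files are processed one at a time, total runtime grows linearly with the number of files. Using the ~20-30 seconds per file quoted above (and ignoring API rate limits or retries), 100 files take roughly 2,000 to 3,000 seconds, or about 33 to 50 minutes. The function name below is hypothetical:

```python
def estimate_minutes(file_count: int, low_s: float = 20, high_s: float = 30) -> tuple[float, float]:
    """Rough total runtime range, in minutes, for one-file-at-a-time processing."""
    return round(file_count * low_s / 60, 1), round(file_count * high_s / 60, 1)


print(estimate_minutes(100))  # (33.3, 50.0)
```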

๐Ÿ” Required Credentials

  1. Google Drive OAuth2: For file access and management
  2. OpenAI API: For embedding generation
  3. Qdrant API: For vector database operations
  4. Google Gemini API: For document analysis

💡 Implementation Tips

  1. Start Small: Test with a few files before processing large batches
  2. Monitor Costs: Track OpenAI API usage for embedding generation
  3. Backup First: Consider archiving instead of deleting processed files
  4. Check Collections: Ensure Qdrant collection exists before running
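
For the Check Collections tip, one way to prepare the collection ahead of time is Qdrant's REST API: `GET /collections/<name>` returns 404 when the collection is missing, and `PUT /collections/<name>` creates it. A minimal standard-library sketch, assuming a Qdrant URL, an API key sent in the `api-key` header, and cosine distance. The vector size must match the embedding model, so the sketch maps the two OpenAI models mentioned in this template to their default dimensions. Function names are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

# Default output sizes of the two OpenAI embedding models named in this template.
EMBEDDING_DIMENSIONS = {"text-embedding-3-large": 3072, "text-embedding-3-small": 1536}


def _request(url: str, api_key: str, method: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build a JSON request authenticated with Qdrant's api-key header."""
    data = json.dumps(body).encode() if body is not None else None
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def create_collection_request(base_url: str, api_key: str, collection: str,
                              model: str) -> urllib.request.Request:
    """PUT /collections/{name} with a vector size matching the embedding model."""
    body = {"vectors": {"size": EMBEDDING_DIMENSIONS[model], "distance": "Cosine"}}
    return _request(f"{base_url}/collections/{collection}", api_key, "PUT", body)


def ensure_collection(base_url: str, api_key: str, collection: str, model: str) -> bool:
    """Create the collection if GET returns 404. Returns True if it was created."""
    try:
        with urllib.request.urlopen(_request(f"{base_url}/collections/{collection}", api_key, "GET")):
            return False  # collection already exists
    except urllib.error.HTTPError as err:
        if err.code != 404:
            raise
    with urllib.request.urlopen(create_collection_request(base_url, api_key, collection, model)):
        return True
```

For example, `ensure_collection("http://localhost:6333", "YOUR_API_KEY", "documents", "text-embedding-3-large")` would create a 3072-dimension collection if it did not exist (the collection name is a placeholder). It does not check the vector size of an existing collection, so after switching embedding models the collection still has to be recreated by hand.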

🎨 Customization Options

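  • Change Embedding Model: Switch to text-embedding-3-small for cost savings; its vectors have 1536 dimensions instead of the 3072 produced by text-embedding-3-large, so the Qdrant collection must be recreated with the matching size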
  • Adjust Chunk Size: Modify text splitting parameters for different document types
  • Add Metadata: Extend the Gemini prompt to extract specific fields
  • Archive vs Delete: Replace delete operation with move to "processed" folder
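
When the Gemini prompt is extended to return extra fields, asking for a JSON object and parsing it defensively keeps the metadata predictable. A minimal sketch, assuming the prompt requests JSON and that the reply may arrive wrapped in a Markdown code fence; the field names are hypothetical examples:

```python
import json

FENCE = "`" * 3  # built this way to avoid a literal fence inside this example

# Hypothetical fields a customized Gemini prompt might ask for.
EXPECTED_FIELDS = ("title", "document_type", "effective_date")


def parse_metadata(model_output: str) -> dict:
    """Extract a JSON object from an LLM reply and keep only the expected fields.

    Tolerates replies wrapped in a Markdown code fence; missing fields become None.
    Raises json.JSONDecodeError if the reply is not valid JSON.
    """
    text = model_output.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (which may carry a "json" tag) and the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)
    return {field: data.get(field) for field in EXPECTED_FIELDS}


reply = FENCE + 'json\n{"title": "Pay Guide", "document_type": "award", "extra": 1}\n' + FENCE
print(parse_metadata(reply))
# {'title': 'Pay Guide', 'document_type': 'award', 'effective_date': None}
```

Keeping a fixed set of keys means every stored document carries the same metadata shape, which makes filtering in Qdrant simpler.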

📈 Real-World Application

This workflow was developed to process business documents and legal agreements, making them searchable through semantic queries. It's particularly useful for organizations dealing with large volumes of regulatory documentation that need to be quickly accessible and searchable.

Chat Interface Testing

The integrated chatbot interface allows users to:

  • Query processed documents using natural language
  • Test semantic search capabilities in real-time
  • Verify document indexing and retrieval accuracy
  • Ask questions about specific topics (e.g., "What are the pay rates for junior employees?")
  • Get instant AI-powered responses based on the indexed content
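
Behind each chat query, the question is embedded (with the same model used for the documents) and compared against the stored vectors. The sketch below shows that ranking idea with brute-force cosine similarity over toy 3-dimensional vectors; it is an illustration only, since Qdrant uses an approximate index at scale and real embeddings have thousands of dimensions. The point IDs and vectors are made up.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], points: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Brute-force ranking of stored vectors by similarity to the query vector."""
    scored = [(point_id, cosine(query, vector)) for point_id, vector in points.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy 3-dimensional vectors standing in for real embeddings.
points = {
    "junior-pay-rates": [0.9, 0.1, 0.0],
    "leave-policy": [0.1, 0.9, 0.0],
    "overtime": [0.6, 0.2, 0.3],
}
query = [1.0, 0.0, 0.0]
print([point_id for point_id, _ in top_k(query, points)])  # ['junior-pay-rates', 'overtime']
```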

🌟 Benefits

  • Automation: Eliminates manual document processing
  • Scalability: Handles individual files or bulk uploads
  • Intelligence: AI-powered understanding of document content
  • Flexibility: Multiple input sources and processing options
  • Reliability: Robust error handling and state management

๐Ÿ‘จโ€๐Ÿ’ป About the Creator

Jeremy Dawes is the CEO of Jezweb, specializing in AI and automation deployment solutions. This workflow represents practical, production-ready automation that solves real business challenges while maintaining simplicity and reliability.

๐Ÿ“ Notes

  • The workflow intelligently handles the n8n form upload pattern where multiple files create a single item with multiple binary properties (Files_0, Files_1, etc.)
  • The Split In Batches pattern with reset: false is crucial for proper loop execution
  • Direct API integration provides more control than pure LangChain implementations
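
The form-upload pattern in the first note (one item carrying Files_0, Files_1, and so on) can be reshaped into one item per file. A plain-Python sketch of that reshaping, using dictionaries shaped like n8n items (`json` and `binary` keys); it is not the workflow's own node code, reading and returning items inside an actual n8n Code node works differently, and the output key `data` and the `source_property` field are assumptions chosen for illustration:

```python
import re

# Matches the binary property names described above: Files_0, Files_1, ...
FILE_KEY = re.compile(r"Files_(\d+)")


def split_form_item(item: dict) -> list[dict]:
    """Turn one form item holding several files into one item per file.

    Files are ordered numerically (Files_2 before Files_10). Each new item keeps
    the form's JSON fields, records which property the file came from, and
    stores the file under a single binary key, "data".
    """
    numbered = []
    for key, value in item.get("binary", {}).items():
        match = FILE_KEY.fullmatch(key)
        if match:
            numbered.append((int(match.group(1)), key, value))
    numbered.sort(key=lambda entry: entry[0])
    return [
        {"json": {**item.get("json", {}), "source_property": key}, "binary": {"data": value}}
        for _, key, value in numbered
    ]


form_item = {
    "json": {"submittedBy": "hr@example.com"},
    "binary": {"Files_1": {"fileName": "b.docx"}, "Files_0": {"fileName": "a.pdf"}},
}
print([i["binary"]["data"]["fileName"] for i in split_form_item(form_item)])  # ['a.pdf', 'b.docx']
```

Sorting on the numeric suffix rather than the string avoids Files_10 being processed before Files_2.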



This workflow demonstrates practical automation that bridges document management with modern AI capabilities, creating intelligent document processing systems that scale with your needs.

n8n Workflow: Document Processing for Semantic Search with OpenAI/Gemini and Qdrant

This n8n workflow automates the process of extracting text from documents in Google Drive, splitting the text into manageable chunks, generating embeddings using OpenAI, and storing these embeddings in a Qdrant vector database. It also includes an AI agent for conversational interaction with the stored documents, leveraging either OpenAI or Google Gemini.

What it does

This workflow streamlines the creation of a semantic search index from your documents, enabling powerful AI-driven querying.

  1. Triggers on document upload: The workflow can be initiated manually, via an n8n form submission, or automatically when a new file is uploaded to a specified Google Drive folder.
  2. Downloads documents from Google Drive: Retrieves the newly added or specified document from Google Drive.
  3. Loads document content: Extracts the text content from the downloaded document.
  4. Splits text into chunks: Divides the document's text into smaller segments using a Recursive Character Text Splitter, which breaks text at paragraph, line, and word boundaries where possible.
  5. Generates embeddings: Uses OpenAI's embedding models to create vector representations (embeddings) for each text chunk.
  6. Stores embeddings in Qdrant: Indexes the generated embeddings and their associated text chunks into a Qdrant vector database.
  7. Enables AI-powered chat: An integrated AI agent, configurable with either Google Gemini or OpenAI, allows for conversational queries against the Qdrant vector store. This agent can retrieve relevant document chunks based on semantic similarity.
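
The chunking step can be illustrated with a simplified recursive character splitter. This sketch shows the general idea only and is not the implementation used by n8n or LangChain: it measures length in characters, omits chunk overlap, and its default chunk size is an arbitrary assumption.

```python
def recursive_split(text: str, chunk_size: int = 200,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ", "")) -> list[str]:
    """Simplified recursive character splitting, without chunk overlap.

    Separators are tried in order (paragraphs, lines, words). Pieces that are
    still longer than chunk_size are split again with the next separator, and
    neighboring small pieces are merged back together up to chunk_size.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut into fixed-size character windows.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, chunk_size, rest)
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("alpha beta gamma delta", chunk_size=11))  # ['alpha beta', 'gamma delta']
```

Smaller chunks give more precise matches but less context per result; larger ones do the opposite, which is the trade-off behind the chunk size customization option.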

Prerequisites/Requirements

To use this workflow, you will need:

  • n8n instance: A running n8n instance (cloud or self-hosted).
  • Google Drive Account: For storing and triggering on document uploads.
  • OpenAI API Key: For generating text embeddings and potentially for the AI chat model.
  • Google Gemini API Key: (Optional) If you choose to use the Google Gemini Chat Model.
  • Qdrant Instance: A running Qdrant vector database instance (local or cloud).
  • n8n LangChain Nodes: Ensure you have the @n8n/n8n-nodes-langchain package installed in your n8n instance.

Setup/Usage

  1. Import the workflow: Download the JSON definition and import it into your n8n instance.
  2. Configure Credentials:
    • Set up a Google Drive credential for the "Google Drive Trigger" and "Google Drive" nodes.
    • Set up an OpenAI API credential for the "Embeddings OpenAI" node and potentially for the "AI Agent" node.
    • Set up a Google Gemini API credential (if using) for the "Google Gemini Chat Model" node.
    • Configure the Qdrant Vector Store node with your Qdrant instance details (host, API key if applicable, and collection name).
  3. Configure Trigger Nodes:
    • Google Drive Trigger: Specify the Google Drive folder to monitor for new files.
    • On form submission: (Optional) If using the form trigger, customize the form fields as needed.
    • When clicking ‘Execute workflow’: (Optional) Use this for manual testing.
    • When chat message received: (Optional) If integrating with a chat interface, configure this trigger.
  4. Configure AI Agent:
    • Select your preferred Language Model (LLM) for the "AI Agent" (e.g., "Google Gemini Chat Model" or an OpenAI chat model).
    • Ensure the "Qdrant Vector Store" is connected as a tool for the AI Agent.
  5. Activate the workflow: Once configured, activate the workflow to start processing documents.

This workflow provides a robust foundation for building powerful semantic search capabilities over your document repositories.
