Mistral OCR, GPT-4o-mini, and Google Sheets integration for enhanced data processing
Process multiple invoices automatically using Mistral's dedicated OCR model—at approximately $0.002 per page. Upload batches of PDF, PNG, or JPG invoices through a simple form, extract structured financial data with AI, validate results with confidence scoring, and save everything to Google Sheets.
What this workflow does
- Accepts multiple invoice uploads via n8n Form Trigger
- Processes files in batch with rate limiting
- Converts each file to base64 and sends to Mistral OCR API
- Extracts 9 standard fields using GPT-4o-mini Information Extractor
- Validates data and assigns confidence scores (high/medium/low)
- Saves all results to Google Sheets with status tracking
Fields extracted
Invoice Number, Date, Vendor Name, Tax ID, Subtotal, Tax Rate, Tax Amount, Total Amount, Currency
Use cases
- Accountants processing client invoices in bulk
- Small businesses digitizing paper receipts
- Bookkeepers automating repetitive data entry
- Finance teams building searchable invoice databases
Setup requirements
- Mistral API Key (console.mistral.ai) — HTTP Header Auth credential
- OpenAI API Key (platform.openai.com)
- Google Sheets OAuth connection
- Google Sheet with 15 columns (template in workflow notes)
n8n Workflow: Mistral OCR, GPT-4o Mini, and Google Sheets Integration for Enhanced Data Processing
This n8n workflow automates the process of extracting structured information from text using a Large Language Model (LLM) and then storing that information in a Google Sheet. It's designed to streamline data entry and analysis by leveraging AI for intelligent data extraction.
What it does
This workflow simplifies and automates the following steps:
- Triggers on Form Submission: The workflow starts when an n8n form is submitted, likely containing text data to be processed.
- Extracts Information with AI: It uses an "Information Extractor" node (likely powered by an LLM like Mistral or GPT-4o Mini, based on the directory name) to parse the submitted text and extract structured data based on a defined schema.
- Processes with OpenAI Chat Model: The extracted information is then further processed or refined using an OpenAI Chat Model (e.g., GPT-4o Mini), allowing for more nuanced understanding or transformation of the data.
- Loops for Batch Processing: The workflow includes a "Loop Over Items" node, indicating it can handle multiple items or batches of data, ensuring efficient processing.
- Adds Extracted Data to Google Sheets: Finally, the structured and processed data is appended as new rows to a specified Google Sheet, providing a centralized and organized repository for the extracted information.
- Includes Delay for Rate Limiting: A "Wait" node is integrated, likely to manage API rate limits or introduce pauses between processing batches.
- Custom Logic with Code: A "Code" node allows for custom JavaScript logic to be executed at a specific point in the workflow, offering flexibility for data manipulation or conditional processing.
Prerequisites/Requirements
To use this workflow, you will need:
- n8n Instance: A running instance of n8n.
- Google Sheets Account: Access to a Google Sheets spreadsheet where the extracted data will be stored.
- OpenAI API Key: An API key for OpenAI to use the Chat Model.
- Langchain Credentials: Appropriate credentials for the Langchain nodes (Information Extractor, OpenAI Chat Model) within n8n.
- n8n Form: A configured n8n Form Trigger to initiate the workflow.
Setup/Usage
- Import the Workflow: Download the JSON provided and import it into your n8n instance.
- Configure Credentials:
- Set up your Google Sheets credentials.
- Configure your OpenAI API key within the "OpenAI Chat Model" node.
- Ensure all Langchain-related credentials are correctly set up.
- Configure Google Sheets Node: Specify the Spreadsheet ID and Sheet Name where you want the data to be added.
- Configure Form Trigger: Customize the n8n form fields to match the input you expect for your data extraction.
- Configure Information Extractor: Define the schema or instructions for the "Information Extractor" node to guide the AI on what data to extract from the input text.
- Customize Code Node (Optional): If needed, modify the JavaScript code in the "Code" node to perform any specific data transformations or logic.
- Activate the Workflow: Once configured, activate the workflow. It will now run automatically upon form submissions.
Related Templates
Generate song lyrics and music from text prompts using OpenAI and Fal.ai Minimax
Spark your creativity instantly in any chat—turn a simple prompt like "heartbreak ballad" into original, full-length lyrics and a professional AI-generated music track, all without leaving your conversation. 📋 What This Template Does This chat-triggered workflow harnesses AI to generate detailed, genre-matched song lyrics (at least 600 characters) from user messages, then queues them for music synthesis via Fal.ai's minimax-music model. It polls asynchronously until the track is ready, delivering lyrics and audio URL back in chat. Crafts original, structured lyrics with verses, choruses, and bridges using OpenAI Submits to Fal.ai for melody, instrumentation, and vocals aligned to the style Handles long-running generations with smart looping and status checks Returns complete song package (lyrics + audio link) for seamless sharing 🔧 Prerequisites n8n account (self-hosted or cloud with chat integration enabled) OpenAI account with API access for GPT models Fal.ai account for AI music generation 🔑 Required Credentials OpenAI API Setup Go to platform.openai.com → API keys (sidebar) Click "Create new secret key" → Name it (e.g., "n8n Songwriter") Copy the key and add to n8n as "OpenAI API" credential type Test by sending a simple chat completion request Fal.ai HTTP Header Auth Setup Sign up at fal.ai → Dashboard → API Keys Generate a new API key → Copy it In n8n, create "HTTP Header Auth" credential: Name="Fal.ai", Header Name="Authorization", Header Value="Key [Your API Key]" Test with a simple GET to their queue endpoint (e.g., /status) ⚙️ Configuration Steps Import the workflow JSON into your n8n instance Assign OpenAI API credentials to the "OpenAI Chat Model" node Assign Fal.ai HTTP Header Auth to the "Generate Music Track", "Check Generation Status", and "Fetch Final Result" nodes Activate the workflow—chat trigger will appear in your n8n chat interface Test by messaging: "Create an upbeat pop song about road trips" 🎯 Use Cases Content Creators: YouTubers generating custom jingles for videos on the fly, streamlining production from idea to audio export Educators: Music teachers using chat prompts to create era-specific folk tunes for classroom discussions, fostering interactive learning Gift Personalization: Friends crafting anniversary R&B tracks from shared memories via quick chats, delivering emotional audio surprises Artist Brainstorming: Songwriters prototyping hip-hop beats in real-time during sessions, accelerating collaboration and iteration ⚠️ Troubleshooting Invalid JSON from AI Agent: Ensure the system prompt stresses valid JSON; test the agent standalone with a sample query Music Generation Fails (401/403): Verify Fal.ai API key has minimax-music access; check usage quotas in dashboard Status Polling Loops Indefinitely: Bump wait time to 45-60s for complex tracks; inspect fal.ai queue logs for bottlenecks Lyrics Under 600 Characters: Tweak agent prompt to enforce fuller structures like [V1][C][V2][B][C]; verify output length in executions
Auto-reply & create Linear tickets from Gmail with GPT-5, gotoHuman & human review
This workflow automatically classifies every new email from your linked mailbox, drafts a personalized reply, and creates Linear tickets for bugs or feature requests. It uses a human-in-the-loop with gotoHuman and continuously improves itself by learning from approved examples. How it works The workflow triggers on every new email from your linked mailbox. Self-learning Email Classifier: an AI model categorizes the email into defined categories (e.g., Bug Report, Feature Request, Sales Opportunity, etc.). It fetches previously approved classification examples from gotoHuman to refine decisions. Self-learning Email Writer: the AI drafts a reply to the email. It learns over time by using previously approved replies from gotoHuman, with per-classification context to tailor tone and style (e.g., different style for sales vs. bug reports). Human Review in gotoHuman: review the classification and the drafted reply. Drafts can be edited or retried. Approved values are used to train the self-learning agents. Send approved Reply: the approved response is sent as a reply to the email thread. Create ticket: if the classification is Bug or Feature Request, a ticket is created by another AI agent in Linear. Human Review in gotoHuman: How to set up Most importantly, install the gotoHuman node before importing this template! (Just add the node to a blank canvas before importing) Set up credentials for gotoHuman, OpenAI, your email provider (e.g. Gmail), and Linear. In gotoHuman, select and create the pre-built review template "Support email agent" or import the ID: 6fzuCJlFYJtlu9mGYcVT. Select this template in the gotoHuman node. In the "gotoHuman: Fetch approved examples" http nodes you need to add your formId. It is the ID of the review template that you just created/imported in gotoHuman. Requirements gotoHuman (human supervision, memory for self-learning) OpenAI (classification, drafting) Gmail or your preferred email provider (for email trigger+replies) Linear (ticketing) How to customize Expand or refine the categories used by the classifier. Update the prompt to reflect your own taxonomy. Filter fetched training data from gotoHuman by reviewer so the writer adapts to their personalized tone and preferences. Add more context to the AI email writer (calendar events, FAQs, product docs) to improve reply quality.
Ai website scraper & company intelligence
AI Website Scraper & Company Intelligence Description This workflow automates the process of transforming any website URL into a structured, intelligent company profile. It's triggered by a form, allowing a user to submit a website and choose between a "basic" or "deep" scrape. The workflow extracts key information (mission, services, contacts, SEO keywords), stores it in a structured Supabase database, and archives a full JSON backup to Google Drive. It also features a secondary AI agent that automatically finds and saves competitors for each company, building a rich, interconnected database of company intelligence. --- Quick Implementation Steps Import the Workflow: Import the provided JSON file into your n8n instance. Install Custom Community Node: You must install the community node from: https://www.npmjs.com/package/n8n-nodes-crawl-and-scrape FIRECRAWL N8N Documentation https://docs.firecrawl.dev/developer-guides/workflow-automation/n8n Install Additional Nodes: n8n-nodes-crawl-and-scrape and n8n-nodes-mcp fire crawl mcp . Set up Credentials: Create credentials in n8n for FIRE CRAWL API,Supabase, Mistral AI, and Google Drive. Configure API Key (CRITICAL): Open the Web Search tool node. Go to Parameters → Headers and replace the hardcoded Tavily AI API key with your own. Configure Supabase Nodes: Assign your Supabase credential to all Supabase nodes. Ensure table names (e.g., companies, competitors) match your schema. Configure Google Drive Nodes: Assign your Google Drive credential to the Google Drive2 and save to Google Drive1 nodes. Select the correct Folder ID. Activate Workflow: Turn on the workflow and open the Webhook URL in the “On form submission” node to access the form. --- What It Does Form Trigger Captures user input: “Website URL” and “Scraping Type” (basic or deep). Scraping Router A Switch node routes the flow: Deep Scraping → AI-based MCP Firecrawler agent. Basic Scraping → Crawlee node. Deep Scraping (Firecrawl AI Agent) Uses Firecrawl and Tavily Web Search. Extracts a detailed JSON profile: mission, services, contacts, SEO keywords, etc. Basic Scraping (Crawlee) Uses Crawl and Scrape node to collect raw text. A Mistral-based AI extractor structures the data into JSON. Data Storage Stores structured data in Supabase tables (companies, company_basicprofiles). Archives a full JSON backup to Google Drive. Automated Competitor Analysis Runs after a deep scrape. Uses Tavily web search to find competitors (e.g., from Crunchbase). Saves competitor data to Supabase, linked by company_id. --- Who's It For Sales & Marketing Teams: Enrich leads with deep company info. Market Researchers: Build structured, searchable company databases. B2B Data Providers: Automate company intelligence collection. Developers: Use as a base for RAG or enrichment pipelines. --- Requirements n8n instance (self-hosted or cloud) Supabase Account: With tables like companies, competitors, social_links, etc. Mistral AI API Key Google Drive Credentials Tavily AI API Key (Optional) Custom Nodes: n8n-nodes-crawl-and-scrape --- How It Works Flow Summary Form Trigger: Captures “Website URL” and “Scraping Type”. Switch Node: deep → MCP Firecrawler (AI Agent). basic → Crawl and Scrape node. Scraping & Extraction: Deep path: Firecrawler → JSON structure. Basic path: Crawlee → Mistral extractor → JSON. Storage: Save JSON to Supabase. Archive in Google Drive. Competitor Analysis (Deep Only): Finds competitors via Tavily. Saves to Supabase competitors table. End: Finishes with a No Operation node. --- How To Set Up Import workflow JSON. Install community nodes (especially n8n-nodes-crawl-and-scrape from npm). Configure credentials (Supabase, Mistral AI, Google Drive). Add your Tavily API key. Connect Supabase and Drive nodes properly. Fix disconnected “basic” path if needed. Activate workflow. Test via the webhook form URL. --- How To Customize Change LLMs: Swap Mistral for OpenAI or Claude. Edit Scraper Prompts: Modify system prompts in AI agent nodes. Change Extraction Schema: Update JSON Schema in extractor nodes. Fix Relational Tables: Add Items node before Supabase inserts for arrays (social links, keywords). Enhance Automation: Add email/slack notifications, or replace form trigger with a Google Sheets trigger. --- Add-ons Automated Trigger: Run on new sheet rows. Notifications: Email or Slack alerts after completion. RAG Integration: Use the Supabase database as a chatbot knowledge source. --- Use Case Examples Sales Lead Enrichment: Instantly get company + competitor data from a URL. Market Research: Collect and compare companies in a niche. B2B Database Creation: Build a proprietary company dataset. --- WORKFLOW IMAGE --- Troubleshooting Guide | Issue | Possible Cause | Solution | |-------|----------------|-----------| | Form Trigger 404 | Workflow not active | Activate the workflow | | Web Search Tool fails | Missing Tavily API key | Replace the placeholder key | | FIRECRAWLER / find competitor fails | Missing MCP node | Install n8n-nodes-mcp | | Basic scrape does nothing | Switch node path disconnected | Reconnect “basic” output | | Supabase node error | Wrong table/column names | Match schema exactly | --- Need Help or More Workflows? Want to customize this workflow for your business or integrate it with your existing tools? Our team at Digital Biz Tech can tailor it precisely to your use case from automation logic to AI-powered enhancements. Contact: shilpa.raju@digitalbiz.tech For more such offerings, visit us: https://www.digitalbiz.tech ---