Extract embedded images from Google Drive documents with VLM Run agent


2/3/2026


🧾 Image Extraction Pipeline (Google Drive + VLM Run + n8n)

⚙️ What This Workflow Does

This workflow automates the process of extracting images from uploaded documents in Google Drive using the VLM Run Execute Agent, then downloads and saves those extracted images into a designated Drive folder.

🧩 Requirements

Google Drive OAuth2 credentials
VLM Run API credentials with Execute Agent access
A reachable n8n Webhook URL (e.g., /image-extract-via-agent)

⚡Quick Setup

Configure Google Drive OAuth2 credentials, then create an upload folder and a separate folder for saving extracted images.
Install the verified VLM Run node by searching for VLM Run in the node list, then click Install. Once installed, you can start using it in your workflows.
Add VLM Run API credentials for document parsing.

⚙️ How It Works

Monitor Uploads – The workflow watches a specific Google Drive folder for new file uploads (e.g., receipts, reports, or PDFs).
Download File – When a file is created, it’s automatically downloaded in binary form.
Extract Images (VLM Run) – The file is sent to the VLM Run Execute Agent, which analyzes the document and extracts image URLs via its callback.
Receive Image Links (Webhook) – The workflow’s Webhook node listens for the agent’s response containing extracted image URLs.
Split & Download – The Split Out node processes each extracted link, and the HTTP Request node downloads each image.
Save Image – Finally, each image is uploaded to your chosen Google Drive folder for storage or further processing.
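
The Split & Download step above can be sketched outside n8n as follows. This is a minimal illustration, not the workflow's actual node code: the function names (`split_out`, `filename_from_url`, `download_all`) are hypothetical, and the payload shape assumes an `image_urls` list like the sample output shown in this description.

```python
import posixpath
from typing import Callable, Dict, List, Tuple
from urllib.parse import urlparse
from urllib.request import urlopen


def split_out(payload: Dict[str, List[str]], field: str = "image_urls") -> List[Dict[str, str]]:
    """Mimic a Split Out step: one item holding a list becomes one item per URL."""
    return [{"url": url} for url in payload.get(field, [])]


def filename_from_url(url: str, fallback: str = "image.bin") -> str:
    """Use the last path segment of the URL as the file name (fallback is an assumption)."""
    name = posixpath.basename(urlparse(url).path)
    return name or fallback


def http_get(url: str) -> bytes:
    """Download one image; stands in for the HTTP Request node."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def download_all(
    payload: Dict[str, List[str]],
    fetch: Callable[[str], bytes] = http_get,
) -> List[Tuple[str, bytes]]:
    """Return (file name, content) pairs, one per extracted image URL."""
    return [
        (filename_from_url(item["url"]), fetch(item["url"]))
        for item in split_out(payload)
    ]
```

Passing `fetch` as a parameter keeps the splitting logic testable without network access; each resulting pair corresponds to one file that the final step would upload to Drive.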

💡Why Use This Workflow

Manual image extraction from PDFs and scanned files is repetitive and error-prone. This pipeline automates it using VLM Run, a vision-language AI service that:

Understands document layout and structure
Handles multi-page and mixed-content files
Extracts accurate image data with minimal setup. For example, the output contains URLs to the extracted images:

{
  "image_urls": [
    "https://vlm.run/api/files/img1.jpg",
    "https://vlm.run/api/files/img2.jpg"
  ]
}

Works with both images and PDFs
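
Before the links are split and downloaded, it can help to check that the webhook body really has the shape shown above. The sketch below assumes the body is a JSON object with an `image_urls` list, as in the sample; the function name `parse_image_urls` is hypothetical, and the actual agent response may carry additional fields.

```python
import json
from typing import List
from urllib.parse import urlparse


def parse_image_urls(body: str) -> List[str]:
    """Parse a webhook body and return only well-formed http(s) image URLs."""
    data = json.loads(body)
    urls = data.get("image_urls") if isinstance(data, dict) else None
    if not isinstance(urls, list):
        raise ValueError("expected a JSON object with an 'image_urls' list")
    valid = []
    for url in urls:
        # Skip entries that are not strings or do not use http/https.
        if isinstance(url, str) and urlparse(url).scheme in ("http", "https"):
            valid.append(url)
    return valid
```

Rejecting malformed payloads early avoids sending empty or invalid items to the download step.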

🧠 Perfect For

Extracting photos or receipts from multi-page PDFs
Archiving embedded images from reports or invoices
Preparing image datasets for labeling or ML model training

🛠️ How to Customize

You can extend this workflow by:
Adding naming conventions or folder structures based on upload type
Integrating Slack/Email notifications when extraction completes
Including metadata logging (file name, timestamp, source) into Google Sheets or a database
Chaining with classification or OCR workflows using VLM Run’s other agents
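
As one possible take on the naming and metadata ideas above, the sketch below derives a file name from the source document plus a running index, and builds a log row with the three fields mentioned (file name, timestamp, source). The naming pattern, the `.jpg` default extension, and both function names are assumptions for illustration only.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Dict, Optional
from urllib.parse import urlparse


def extracted_image_name(source_file: str, index: int, image_url: str) -> str:
    """Build a name like 'invoice-2024_img001.jpg' from the source document and image URL."""
    stem = PurePosixPath(source_file).stem
    # Assumption: fall back to .jpg when the URL path has no extension.
    suffix = PurePosixPath(urlparse(image_url).path).suffix or ".jpg"
    return f"{stem}_img{index:03d}{suffix}"


def metadata_row(source_file: str, image_name: str, when: Optional[datetime] = None) -> Dict[str, str]:
    """Build one row suitable for appending to a sheet or database table."""
    when = when or datetime.now(timezone.utc)
    return {
        "file_name": image_name,
        "timestamp": when.isoformat(),
        "source": source_file,
    }
```

For example, `extracted_image_name("invoice-2024.pdf", 1, "https://vlm.run/api/files/img1.jpg")` returns `"invoice-2024_img001.jpg"`.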

⚠️ Community Node Disclaimer

This workflow uses community nodes (VLM Run) that may need additional permissions and custom setup.

Extract Embedded Images from Google Drive Documents with VLM Run Agent

This n8n workflow automates the process of extracting embedded images from newly added or modified Google Drive documents. It leverages a VLM (Vision-Language Model) Run Agent to process the document and identify images, then downloads these images.

What it does

Triggers on Google Drive Changes: The workflow starts whenever a new file is added or an existing file is modified in a specified Google Drive folder.
Retrieves File Content: It fetches the content of the changed Google Drive file.
Sends to VLM Run Agent: The file content is then sent to an external API (VLM Run Agent) via an HTTP Request. This agent is responsible for processing the document and identifying embedded images.
Splits Output: The Split Out node takes the array of identified images or data in the VLM Run Agent's response and splits it into individual items for further processing.
Downloads Images (Implicit): The workflow JSON does not spell out this step, but after the split you would typically download each image, using an HTTP Request node if the VLM agent provides direct image URLs or a Google Drive node otherwise.

Prerequisites/Requirements

n8n Instance: A running instance of n8n.
Google Drive Account: A Google account with access to Google Drive.
Google Drive Credential: An n8n credential configured for your Google Drive account (OAuth 2.0).
VLM Run Agent API Endpoint: An external API endpoint for a Vision-Language Model (VLM) Run Agent that can process document content and extract image information.
VLM Run Agent API Key/Authentication: Appropriate authentication (e.g., API key, bearer token) for the VLM Run Agent API.

Setup/Usage

Import the workflow: Import the provided JSON into your n8n instance.
Configure Google Drive Trigger:
- Select your Google Drive credential.
- Specify the Google Drive folder you want to monitor for new or modified documents.
Configure Google Drive Node:
- Ensure it uses the same Google Drive credential.
- Verify the operation is set to "Get" to retrieve the file content.
Configure HTTP Request Node ("HTTP Request"):
- Set the URL to your VLM Run Agent's API endpoint.
- Configure the Method (e.g., POST).
- Set up Headers for authentication (e.g., Authorization: Bearer YOUR_VLM_API_KEY).
- Configure the Body to send the Google Drive file content to the VLM Run Agent. You might need to adjust the data format based on your VLM agent's requirements (e.g., sending the binary data, a base64 encoded string, or a public URL if the VLM agent supports it).
Review Split Out Node: This node is designed to handle the output from the VLM Run Agent. You might need to configure the "Field to Split Out" property to correctly parse the array of image data returned by your VLM agent.
Add Subsequent Nodes (Optional but Recommended): After the Split Out node, you will likely want to add nodes to:
- Download the extracted images (e.g., another HTTP Request if the VLM provides image URLs, or a Google Drive upload node if you want to save them back to Drive).
- Process the image metadata.
- Send notifications (e.g., Slack, Email) about the extracted images.
Activate the workflow: Once configured, activate the workflow to start monitoring your Google Drive folder.
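
The HTTP Request configuration described in the setup steps (POST method, bearer-token header, file content in the body) can be sketched as below. This only builds the request and does not send it. The body field names (`file_name`, `file_base64`, `callback_url`) and the function name are hypothetical and are not taken from the VLM Run API; adjust them to whatever format your agent endpoint expects.

```python
import base64
import json
from urllib.request import Request


def build_agent_request(
    endpoint: str,
    api_key: str,
    file_name: str,
    file_bytes: bytes,
    callback_url: str,
) -> Request:
    """Build a POST request carrying the document as a base64 string."""
    body = {
        # Hypothetical field names; the real agent API may differ.
        "file_name": file_name,
        "file_base64": base64.b64encode(file_bytes).decode("ascii"),
        "callback_url": callback_url,
    }
    return Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Base64 is used here because it fits in a JSON body; if the agent accepts raw binary or a public file URL instead, the body would change accordingly while the headers stay the same.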
