Back to Catalog

Easy image captioning with Gemini 1.5 Pro

JimleukJimleuk
13894 views
2/3/2026
Official Page

This n8n workflow demonstrates how to automate image captioning tasks using Gemini 1.5 Pro - a multimodal LLM which can accept and analyse images. This is a really simple example of how easy it is to build and leverage powerful AI models in your repetitive tasks.

How it works

  • For this demo, we'll import a public image from a popular stock photography website, Pexel.com, into our workflow using the HTTP request node.
  • With multimodal LLMs, there is little do preprocess other than ensuring the image dimensions fit within the LLMs accepted limits. Though not essential, we'll resize the image using the Edit image node to achieve fast processing.
  • The image is used as an input to the basic LLM node by defining a "user message" entry with the binary (data) type.
  • The LLM node has the Gemini 1.5 Pro language model attached and we'll prompt it to generate a caption title and text appropriate for the image it sees.
  • Once generated, the generated caption text is positioning over the original image to complete the task. We can calculate the positioning relative to the amount of characters produced using the code node.

An example of the combined image and caption can be found here: https://res.cloudinary.com/daglih2g8/image/upload/f_auto,q_auto/v1/n8n-workflows/l5xbb4ze4wyxwwefqmnc

Requirements

  • Google Gemini API Key.
  • Access to Google Drive.

Customising the workflow

  • Not using Google Gemini? n8n's basic LLM node supports the standard syntax for image content for models that support it - try using GPT4o, Claude or LLava (via Ollama).

  • Google Drive is only used for demonstration purposes. Feel free to swap this out for other triggers such as webhooks to fit your use case.

n8n Workflow: Easy Image Captioning with Gemini 1.5 Pro

This n8n workflow simplifies the process of generating descriptive captions for images using Google's Gemini 1.5 Pro large language model. It allows you to upload an image, and in return, receive a structured JSON output containing a generated caption.

What it does

This workflow performs the following steps:

  1. Manual Trigger: Initiates the workflow upon manual execution.
  2. Edit Image: (Currently unused in the provided JSON, but available for image manipulation if needed). This node can blur, add borders, crop, composite, draw, get info, rotate, resize, shear, or add text to an image.
  3. HTTP Request: Sends an HTTP request to an external service (likely an image hosting or processing service) to retrieve an image.
  4. Code: Processes the image data received from the HTTP Request node, likely converting it into a format suitable for the Gemini model.
  5. Google Gemini Chat Model: Utilizes the Google Gemini 1.5 Pro model to analyze the provided image and generate a descriptive caption.
  6. Basic LLM Chain: (Currently unused in the provided JSON, but available for chaining multiple LLM operations). This node facilitates complex interactions with large language models.
  7. Structured Output Parser: Parses the output from the Gemini model, ensuring the generated caption is structured into a consistent JSON format.
  8. Merge: (Currently unused in the provided JSON, but available for combining data from multiple branches). This node can join or concatenate data streams.

Prerequisites/Requirements

  • n8n Instance: A running n8n instance (cloud or self-hosted).
  • Google Gemini API Key: An API key for accessing the Google Gemini Pro 1.5 model. This will need to be configured as a credential within n8n for the "Google Gemini Chat Model" node.
  • Image URL: An accessible URL for the image you want to caption. This will be provided to the "HTTP Request" node.

Setup/Usage

  1. Import the Workflow:
    • Download the provided JSON file.
    • In your n8n instance, click "Workflows" in the left sidebar.
    • Click "New" -> "Import from JSON" and upload the downloaded file.
  2. Configure Credentials:
    • Locate the "Google Gemini Chat Model" node.
    • Click on the "Credential" field and either select an existing Google Gemini API credential or create a new one by providing your API key.
  3. Configure HTTP Request:
    • Open the "HTTP Request" node.
    • Update the "URL" field with the URL of the image you want to caption.
  4. Execute the Workflow:
    • Click the "Execute Workflow" button (the play icon) on the "Manual Trigger" node.
    • The workflow will run, and the final output (the generated caption in JSON format) will be available in the output of the "Structured Output Parser" node.

Related Templates

Create, update, and get a person from Copper

This workflow allows you to create, update, and get a person from Copper. Copper node: This node will create a new person in Copper. Copper1 node: This node will update the information of the person that we created using the previous node. Copper2 node: This node will retrieve the information of the person that we created earlier.

Harshil AgrawalBy Harshil Agrawal
603

Competitor intelligence agent: SERP monitoring + summary with Thordata + OpenAI

Who this is for? This workflow is designed for: Marketing analysts, SEO specialists, and content strategists who want automated intelligence on their online competitors. Growth teams that need quick insights from SERP (Search Engine Results Pages) without manual data scraping. Agencies managing multiple clients’ SEO presence and tracking competitive positioning in real-time. What problem is this workflow solving? Manual competitor research is time-consuming, fragmented, and often lacks actionable insights. This workflow automates the entire process by: Fetching SERP results from multiple search engines (Google, Bing, Yandex, DuckDuckGo) using Thordata’s Scraper API. Using OpenAI GPT-4.1-mini to analyze, summarize, and extract keyword opportunities, topic clusters, and competitor weaknesses. Producing structured, JSON-based insights ready for dashboards or reports. Essentially, it transforms raw SERP data into strategic marketing intelligence — saving hours of research time. What this workflow does Here’s a step-by-step overview of how the workflow operates: Step 1: Manual Trigger Initiates the process on demand when you click “Execute Workflow.” Step 2: Set the Input Query The “Set Input Fields” node defines your search query, such as: > “Top SEO strategies for e-commerce in 2025” Step 3: Multi-Engine SERP Fetching Four HTTP request tools send the query to Thordata Scraper API to retrieve results from: Google Bing Yandex DuckDuckGo Each uses Bearer Authentication configured via “Thordata SERP Bearer Auth Account.” Step 4: AI Agent Processing The LangChain AI Agent orchestrates the data flow, combining inputs and preparing them for structured analysis. Step 5: SEO Analysis The SEO Analyst node (powered by GPT-4.1-mini) parses SERP results into a structured schema, extracting: Competitor domains Page titles & content types Ranking positions Keyword overlaps Traffic share estimations Strengths and weaknesses Step 6: Summarization The Summarize the content node distills complex data into a concise executive summary using GPT-4.1-mini. Step 7: Keyword & Topic Extraction The Keyword and Topic Analysis node extracts: Primary and secondary keywords Topic clusters and content gaps SEO strength scores Competitor insights Step 8: Output Formatting The Structured Output Parser ensures results are clean, validated JSON objects for further integration (e.g., Google Sheets, Notion, or dashboards). Setup Prerequisites n8n Cloud or Self-Hosted instance Thordata Scraper API Key (for SERP data retrieval) OpenAI API Key (for GPT-based reasoning) Setup Steps Add Credentials Go to Credentials → Add New → HTTP Bearer Auth* → Paste your Thordata API token. Add OpenAI API Credentials* for the GPT model. Import the Workflow Copy the provided JSON or upload it into your n8n instance. Set Input In the “Set the Input Fields” node, replace the example query with your desired topic, e.g.: “Google Search for Top SEO strategies for e-commerce in 2025” Execute Click “Execute Workflow” to run the analysis. How to customize this workflow to your needs Modify Search Query Change the search_query variable in the Set Node to any target keyword or topic. Change AI Model In the OpenAI Chat Model nodes, you can switch from gpt-4.1-mini to another model for better quality or lower cost. Extend Analysis Edit the JSON schema in the “Information Extractor” nodes to include: Sentiment analysis of top pages SERP volatility metrics Content freshness indicators Export Results Connect the output to: Google Sheets / Airtable for analytics Notion / Slack for team reporting Webhook / Database for automated storage Summary This workflow creates an AI-powered Competitor Intelligence System inside n8n by blending: Real-time SERP scraping (Thordata) Automated AI reasoning (OpenAI GPT-4.1-mini) Structured data extraction (LangChain Information Extractors)

Ranjan DailataBy Ranjan Dailata
632

Client review collection & sentiment analysis with HighLevel, GPT-4o, Gmail & Slack

📘 Description: This automation streamlines client review collection and sentiment summarization for Techdome using HighLevel CRM, Azure OpenAI GPT-4o, Gmail, Slack, and Google Sheets. It starts by pulling recently won deals from HighLevel, then generates and sends AI-written HTML review request emails with built-in Google Review and feedback form links. After waiting 24 hours, it fetches the client’s reply thread, summarizes the sentiment using GPT-4o, and posts a clean update to Slack for team visibility. Any failures—API errors, empty responses, or data validation issues—are logged automatically to Google Sheets for full transparency and QA. The result: a fully hands-free Client Appreciation + Feedback Intelligence Loop, improving brand perception and internal responsiveness. ⚙️ What This Workflow Does (Step-by-Step) ▶️ When Clicking ‘Execute Workflow’ (Manual Trigger) Allows on-demand execution or scheduled testing of the workflow. Initiates the fetch for all newly “Won” deals from HighLevel CRM. 🏆 Fetch All Won Deals from HighLevel Retrieves all opportunities labeled “won” in HighLevel, gathering essential client details such as name, email, and deal information to personalize outgoing emails. 🔍 Validate Deal Fetch Success (IF Node) Checks each record for a valid id field. ✅ True Path: Moves ahead to generate AI email content. ❌ False Path: Logs the event to Google Sheets under the error log sheet. 🧠 Configure GPT-4o Model (Azure OpenAI) Initializes the GPT-4o engine that powers all language-generation tasks in this workflow—ensuring precise tone, correct formatting, and safe structured HTML output. 💌 Generate Personalized Review Request Email (AI Agent) Uses GPT-4o to create a tailored, HTML-formatted email thanking the client for their business and requesting feedback. Includes two clickable CTA buttons: ⭐ Google Review Link: 📝 Internal Feedback Form: Google Form link for in-depth feedback Each email maintains Techdome’s friendly, brand-consistent voice with clean inline CSS styling. 📨 Send Review Request Email to Client (Gmail Node) Automatically sends the AI-generated email to the client’s registered address through Gmail. Ensures timely post-service communication without manual follow-ups. ⏳ Wait for 24 Hours Before Next Action Pauses the workflow for 24 hours to give clients time to read and respond to the review request. 📥 Retrieve Email Thread for Response (Gmail Node) After the waiting period, fetches the Gmail thread associated with the initial email to capture client replies or feedback messages. 🧠 Configure GPT-4o Model (Summarization Engine) Prepares another GPT-4o instance specialized for summarizing client replies into concise, sentiment-aware Slack messages. 💬 Summarize Client Feedback (AI Agent) Analyzes the Gmail thread and produces a short Slack-formatted summary using this structure: 🎉 New Client Review Received!Client: <Name> Feedback: <Message snippet> Sentiment: Positive / Neutral / Negative Focuses on tone clarity and quick readability for internal teams. 📢 Announce Review Summary in Slack Posts the AI-generated summary in a designated Slack channel, keeping success and support teams instantly informed of client sentiments and feedback trends. 📊 Log Errors in Google Sheets Appends all failures—including fetch issues, missing fields, or parsing errors—to the Google Sheets “error log sheet,” maintaining workflow reliability and accountability. 🧩 Prerequisites HighLevel CRM OAuth credentials (to fetch deals) Azure OpenAI GPT-4o access (for AI-driven writing and summarization) Gmail API connection (for sending & reading threads) Slack API integration (for posting summaries) Google Sheets access (for error logging) 💡 Key Benefits ✅ Automates personalized review outreach after project completion ✅ Waits intelligently before analyzing responses ✅ Uses GPT-4o to summarize client sentiment in human tone ✅ Sends instant Slack updates for real-time visibility ✅ Keeps audit logs of all errors for debugging 👥 Perfect For Client Success and Account Management Teams Agencies using HighLevel CRM for project delivery Teams aiming to collect consistent client feedback and reviews Businesses wanting AI-assisted sentiment insights in Slack

Rahul JoshiBy Rahul Joshi
159