
WhatsApp AI agent that understands text, image, and audio

Tharwat Mohamed
2/3/2026

Overview

This template is a multimodal WhatsApp assistant that understands text, images, and audio, aggregates media inputs, and returns intelligent replies using Google Gemini. It can fetch knowledge from Google Docs, log conversations into Google Sheets, and respond via WhatsApp, all orchestrated inside n8n.

Features

Multimodal input handling: Receives images and audio from WhatsApp, analyzes them, and sends contextual responses.

Audio transcription: Converts voice messages to text for analysis.

Image analysis: Extracts objects/labels/text from images to inform responses.

Knowledge fetch: Pulls relevant documents from Google Docs for richer answers.

Conversation logging: Appends each interaction to Google Sheets for auditing or analytics.

Modular design: Clear subflows for media reception, transcription, image analysis, aggregation, and the AI Agent.

Ready for customization: Swap models, change providers, or extend with a vector store.

Requirements

n8n instance (self-hosted or n8n.cloud) with public webhook access

Google Cloud project with these APIs enabled: Vertex AI (Gemini), Cloud Vision API, Cloud Speech-to-Text, and Google Drive, Docs, and Sheets

Google Service Account JSON key (with permissions for the above services)

WhatsApp Business API provider credentials (Twilio, 360dialog, or similar)

(Optional) Pinecone or another vector store if you want embeddings / retrieval augmentation

Setup Instructions

Import the workflow

Download the .json file from your package, then in the n8n editor choose Import → From File and select the JSON. Save the workflow.

Create credentials

Google Service Account: Create a service account in GCP, grant it the Vertex AI, Vision, Speech, Drive, Docs, and Sheets roles, and download the JSON key. Then create an n8n Google Service Account credential and upload the JSON.

WhatsApp: Add your WhatsApp API credentials (API key/token, phone ID, webhook secret).

(Optional) Vector store: Add Pinecone credentials if using embeddings.

Configure media receiver nodes

WhatsApp Trigger: Ensure the webhook URL is set with your WhatsApp provider so that incoming messages and media are forwarded to n8n.

Audio Receiver nodes: Point to the incoming audio payload path or download URL.

Image Receiver nodes: Point to the image URL or attachment field.
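
Where the media reference sits depends on the incoming payload. As a rough sketch in Python, assuming the trigger delivers a message object shaped like a WhatsApp Cloud API webhook message (`from`, `type`, `text.body`, `image.id`, `audio.id`), the routing decision might look like the following. The helper name `classify_message` and the sample values are hypothetical, and other providers such as Twilio use different field names.

```python
from typing import Any, Dict


def classify_message(message: Dict[str, Any]) -> Dict[str, Any]:
    """Return the message type plus the text or media reference it carries.

    Assumes a WhatsApp Cloud API style message object; adjust the field
    names for other providers.
    """
    msg_type = message.get("type", "unknown")
    result = {"user_id": message.get("from"), "type": msg_type,
              "text": None, "media_id": None}
    if msg_type == "text":
        result["text"] = message.get("text", {}).get("body")
    elif msg_type in ("image", "audio"):
        media = message.get(msg_type, {})
        result["media_id"] = media.get("id")
        if msg_type == "image":
            # Images may carry a caption typed by the user.
            result["text"] = media.get("caption")
    return result


# Placeholder payload for illustration only.
sample = {"from": "15550001111", "type": "image",
          "image": {"id": "MEDIA_ID_PLACEHOLDER", "caption": "Is this in stock?"}}
routed = classify_message(sample)
print(routed)
```

The `media_id` is what a receiver node would then resolve to a download URL before passing the file on to transcription or image analysis.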

Transcription & Image Analysis

Transcribe Audio: Select Google Speech-to-Text via your Google Service Account credential. Confirm the language and sampling settings.

Analyze Image: Point to the Vision API via the same Google credential. Choose the required outputs (labels, landmarks, OCR).

Aggregate media

Confirm the Aggregate node collects text + transcribed audio + image analysis results into a single payload for the AI Agent.
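
To make the aggregation step concrete, here is a minimal Python sketch of merging the three possible inputs into one payload. The function name `aggregate_inputs`, the output keys, and the section labels are assumptions for illustration; the actual Aggregate node output in the template may be shaped differently.

```python
from typing import Dict, Optional


def aggregate_inputs(text: Optional[str] = None,
                     transcript: Optional[str] = None,
                     image_analysis: Optional[str] = None) -> Dict[str, str]:
    """Merge whatever inputs are present into a single payload for the agent."""
    sections = [
        ("text", "User text", text),
        ("audio", "Voice message transcript", transcript),
        ("image", "Image analysis", image_analysis),
    ]
    # Keep only the inputs that actually contain something.
    present = [(kind, label, value.strip()) for kind, label, value in sections
               if value and value.strip()]
    if not present:
        message_type = "empty"
    elif len(present) == 1:
        message_type = present[0][0]
    else:
        message_type = "mixed"
    combined = "\n".join(f"{label}: {value}" for _, label, value in present)
    return {"message_type": message_type, "combined_input": combined}


payload = aggregate_inputs(text="Is this jacket available in blue?",
                           image_analysis="Labels: jacket, outerwear, red")
print(payload["message_type"])
print(payload["combined_input"])
```

Labeling each section lets the system prompt tell the agent how to weigh typed text against transcribed audio or image findings.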

AI Agent (Gemini)

Open the AI Agent node → choose your Google Gemini/Vertex AI credentials.

Edit the System Prompt to include: how to prioritize image/audio text, where to look up knowledge (Docs), and style/tone. Replace the placeholders (business name, policies).

Knowledge Fetch (Google Docs)

Configure the Get a document in Google Docs node to point to your knowledge docs folder or specific document IDs.

Conversation Logging

Configure the Google Sheets node to append rows to your chosen spreadsheet (structure below).
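
As an illustration, a row for the log could be assembled as in the Python sketch below, using the column names suggested in the Conversation Log section of this template. The helper `build_log_row`, the ISO timestamp format, and the sample values are assumptions, not part of the workflow itself.

```python
from datetime import datetime, timezone
from typing import List, Optional

# Column order taken from the suggested Conversation Log sheet.
COLUMNS = ["Timestamp", "UserID", "MessageType", "MessageText",
           "ImageAnalysis", "AttachmentURL", "AIResponse", "Notes"]


def build_log_row(user_id: str, message_type: str, message_text: str,
                  ai_response: str, image_analysis: str = "",
                  attachment_url: str = "", notes: str = "",
                  arrived_at: Optional[datetime] = None) -> List[str]:
    """Return one row of values in the same order as COLUMNS."""
    arrived_at = arrived_at or datetime.now(timezone.utc)
    values = {
        "Timestamp": arrived_at.isoformat(timespec="seconds"),
        "UserID": user_id,
        "MessageType": message_type,
        "MessageText": message_text,
        "ImageAnalysis": image_analysis,
        "AttachmentURL": attachment_url,
        "AIResponse": ai_response,
        "Notes": notes,
    }
    return [values[column] for column in COLUMNS]


# Sample values for illustration only.
row = build_log_row(user_id="15550001111", message_type="text",
                    message_text="What are your opening hours?",
                    ai_response="Sample reply text",
                    arrived_at=datetime(2026, 2, 3, 10, 30, tzinfo=timezone.utc))
print(dict(zip(COLUMNS, row)))
```

Keeping the column order in a single list makes it harder for the sheet header and the appended values to drift apart.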

WhatsApp Reply

Map the AI Agent output to the WhatsApp Send message node and choose template messaging if needed.
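
Long replies may need to be split before sending. The Python sketch below assumes a 4096-character limit per text message, which is the figure commonly cited for WhatsApp Cloud API text bodies; check your provider's documentation before relying on it. The helper `split_reply` is hypothetical and only shows one way a formatting step could chunk the agent's output.

```python
from typing import List


def split_reply(text: str, limit: int = 4096) -> List[str]:
    """Split text into chunks of at most `limit` characters.

    Breaks at the last space inside the limit when possible, otherwise
    falls back to a hard cut.
    """
    text = text.strip()
    chunks: List[str] = []
    while len(text) > limit:
        cut = text.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space found: hard cut
        chunks.append(text[:cut].rstrip())
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks


# Small limit used here only to make the behavior visible.
parts = split_reply("aaa bbb ccc", limit=7)
print(parts)
```

Each chunk would then be sent as its own message, in order.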

Test end-to-end

Activate the workflow, then send a text message, an image, and an audio message to your WhatsApp test number. Verify the AI reply and that logs appear in Google Sheets.

Google Sheet: Conversation Log (suggested columns)

| Column | Type | Description |
| --- | --- | --- |
| Timestamp | Date/Time | When the message arrived |
| UserID | Text | WhatsApp user identifier (phone) |
| MessageType | Text | text / image / audio / mixed |
| MessageText | Text | Original text or transcribed text |
| ImageAnalysis | Text | Vision API summary / labels / OCR text |
| AttachmentURL | Text | Link to image/audio file |
| AIResponse | Text | Final text reply sent to user |
| Notes | Text | Any extra flags (e.g., escalation) |

Customization

Swap Gemini with another model provider by changing the AI node credentials and prompt structure.

Add a vector store (Pinecone) to enable retrieval-augmented generation from indexed docs.

Extend image analysis to OCR for receipts/invoices or barcode detection.

Add an escalation path (send to human) by adding an "If → Escalate" node when confidence is low.
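
The escalation condition could be sketched in Python as below. The template does not say how confidence is measured, so the `confidence` field (a score between 0 and 1), the 0.6 threshold, and the helper `route_reply` are all assumptions for illustration.

```python
from typing import Any, Dict


def route_reply(agent_output: Dict[str, Any], threshold: float = 0.6) -> str:
    """Return "escalate" when confidence is missing or below the threshold."""
    confidence = agent_output.get("confidence")
    # A missing score is treated as low confidence, which is the cautious choice.
    if confidence is None or float(confidence) < threshold:
        return "escalate"
    return "reply"


print(route_reply({"text": "Your order ships tomorrow.", "confidence": 0.9}))
print(route_reply({"text": "I am not sure.", "confidence": 0.3}))
```

An escalated message could also be flagged in the Notes column of the conversation log so that a human can pick it up later.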

Suggested Node Renaming (for clarity)

Rename generic nodes to explicit names so reviewers and users instantly understand the flow:

WhatsApp Trigger → WhatsApp Trigger (Inbound)

Switch → Message Type Router

audio receiver1/2 → Audio Receiver (Download) / Audio Receiver (Fallback)

Transcribe a recording → Transcribe Audio (Speech-to-Text)

image receiver1/2 → Image Receiver (Download) / Image Receiver (Fallback)

Analyze image → Image Analysis (Vision API)

Aggregate → Aggregate Media Inputs

AI Agent → Multimodal AI Agent (Gemini)

Get a document in Google Docs → Fetch Knowledge Doc

Code → Format AI Response

Send message → WhatsApp – Send Reply

Testing & Going Live

Ensure the n8n webhook is publicly reachable (use an SSL domain/tunnel).

Test with a sandbox WhatsApp number first.

Monitor n8n Executions and enable an Error Workflow for graceful failure handling.

If you expect high media volume, consider a storage/retention policy for attachments.

Support & Notes

I offer setup assistance and will help troubleshoot credential issues or prompt tuning until your workflow is working perfectly. Include contact info and a short support pledge in your template description to build trust. Feel free to ping me anytime, even after launch! Contact: tharwat.elsayed2000@gmail.com | +20 106 180 3236

n8n WhatsApp AI Agent for Text, Image, and Audio

This n8n workflow creates a sophisticated AI agent that can interact with users via WhatsApp, understanding and responding to text, image, and audio messages. It leverages Google Gemini's multimodal capabilities to provide a versatile conversational experience.

What it does

This workflow automates the following steps:

  1. Receives WhatsApp Messages: It acts as a listener for incoming messages on a configured WhatsApp Business Cloud account.
  2. Processes Message Type: It checks the type of the incoming message (text, image, or audio).
  3. Handles Text Messages:
    • If a text message is received, it is passed directly to the AI Agent.
  4. Handles Image Messages:
    • If an image message is received, it downloads the image from the provided URL.
    • It then encodes the image to Base64.
    • The Base64 encoded image is passed to the AI Agent along with a default prompt "What is in this image?".
  5. Handles Audio Messages:
    • If an audio message is received, it downloads the audio file from the provided URL.
    • It then transcribes the audio using the Google Gemini node.
    • The transcribed text is passed to the AI Agent.
  6. Maintains Conversation History: A "Simple Memory" node keeps track of the conversation context, allowing the AI to remember previous interactions.
  7. Engages AI Agent: The AI Agent (powered by Google Gemini Chat Model) processes the input (text, image data, or transcribed audio) and generates a response.
  8. Sends WhatsApp Reply: The AI Agent's response is sent back to the user via WhatsApp.
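
The image branch in step 4 can be sketched in Python as follows: the downloaded bytes are Base64-encoded and paired with the default prompt. The helper `build_image_input`, the output keys, and the `image/jpeg` default are assumptions for illustration; the exact structure the AI Agent node expects may differ.

```python
import base64
from typing import Dict

DEFAULT_IMAGE_PROMPT = "What is in this image?"


def build_image_input(image_bytes: bytes,
                      mime_type: str = "image/jpeg",
                      prompt: str = DEFAULT_IMAGE_PROMPT) -> Dict[str, str]:
    """Base64-encode image bytes and pair them with a prompt for the agent."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"prompt": prompt, "mime_type": mime_type, "image_base64": encoded}


# Placeholder bytes stand in for a downloaded image file.
agent_input = build_image_input(b"placeholder image bytes")
print(agent_input["prompt"])
print(len(agent_input["image_base64"]))
```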

Prerequisites/Requirements

To use this workflow, you will need:

  • n8n Instance: A running n8n instance.
  • WhatsApp Business Cloud Account: Configured with a webhook pointing to your n8n WhatsApp Trigger node.
  • Google Gemini API Key: For the Google Gemini Chat Model and Google Gemini (for audio transcription).
  • Credentials: Appropriate credentials configured in n8n for WhatsApp Business Cloud and Google Gemini.

Setup/Usage

  1. Import the Workflow:
    • Download the provided JSON file.
    • In your n8n instance, go to "Workflows" and click "New".
    • Click the three dots menu (...) in the top right and select "Import from JSON".
    • Paste the workflow JSON or upload the file.
  2. Configure Credentials:
    • Locate the "WhatsApp Trigger" and "WhatsApp Business Cloud" nodes. Configure them with your WhatsApp Business Cloud credentials.
    • Locate the "Google Gemini Chat Model" and "Google Gemini" nodes. Configure them with your Google Gemini API key.
  3. Activate the Workflow:
    • Once all credentials are set up, activate the workflow by toggling the "Active" switch in the top right corner of the workflow editor.
  4. Test the Agent:
    • Send a text, image, or audio message to your WhatsApp Business number to test the AI agent's responses.
