
Convert image to text using GROQ LLaVA V1.5 7B

By felipe biava cataneo

What this template does

This template uses the Groq LLaVA v1.5-7B API, which offers fast inference for a multimodal model with vision capabilities for understanding and interpreting visual data from images.

Users send an image and receive a description of the image from the model.

Setup

  • Open the Telegram app and search for the BotFather user (@BotFather)
  • Start a chat with the BotFather
  • Type /newbot to create a new bot
  • Follow the prompts to name your bot and get a unique API token
  • Save your access token and username (you can verify the token with the request sketched after this list)
  • Once your bot is set up, you can send it an image and receive a description.
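
A quick way to confirm the token works before configuring the n8n credential is to call Telegram's getMe method. This is a minimal sketch (Node 18+ run as an ES module); TELEGRAM_BOT_TOKEN is a placeholder for the token issued by @BotFather.

    // Minimal sketch: verify a Telegram bot token by calling the Bot API's getMe method.
    // Replace TELEGRAM_BOT_TOKEN with the token issued by @BotFather.
    const TELEGRAM_BOT_TOKEN = 'TELEGRAM_BOT_TOKEN';

    const response = await fetch(`https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/getMe`);
    const body = await response.json();

    // A valid token returns { ok: true, result: { username: ..., ... } }.
    console.log(body.ok ? `Bot found: @${body.result.username}` : `Token rejected: ${body.description}`);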

n8n Workflow: Convert Image to Text using Groq LLaVA v1.5-7B

This n8n workflow converts images to text descriptions using the Groq LLaVA v1.5-7B model, triggered directly from Telegram. Simply send an image to your Telegram bot, and the workflow will process it and return the generated description.

What it does

This workflow automates the following steps:

  1. Listens for new images on Telegram: It acts as a Telegram bot, waiting for incoming messages that contain an image.
  2. Extracts image data: When an image is received, it extracts the binary data of the image.
  3. Prepares data for Groq API: It takes the extracted image data and formats it into a base64-encoded string suitable for the Groq API. It also constructs the JSON payload for the LLaVA v1.5-7B model, asking it to describe the image (a sketch of this step appears after this list).
  4. Sends request to Groq API: It makes an HTTP POST request to the Groq API endpoint for chat completions, sending the prepared image data and prompt.
  5. Processes Groq API response: It receives the response from the Groq API, which contains the extracted text description of the image.
  6. Sends extracted text back to Telegram: Finally, it sends the text generated by the LLaVA model back to the user who sent the image via Telegram.
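
The actual node configuration ships in the workflow JSON, but the following Code-node-style sketch illustrates steps 3 and 4. It assumes the Telegram photo arrives as a binary property named data and that the Groq endpoint accepts the OpenAI-compatible image_url format with a base64 data URI; the model ID is a placeholder, so substitute whichever vision model your Groq account currently lists.

    // Sketch of an n8n Code node that builds the Groq chat-completions payload (assumptions noted above).
    // The binary property name "data" and the model ID are placeholders.
    const imageBuffer = await this.helpers.getBinaryDataBuffer(0, 'data');
    const base64Image = imageBuffer.toString('base64');

    return [{
      json: {
        model: 'llava-v1.5-7b-4096-preview', // placeholder: check Groq's current model list
        messages: [
          {
            role: 'user',
            content: [
              { type: 'text', text: 'Describe this image.' },
              { type: 'image_url', image_url: { url: `data:image/jpeg;base64,${base64Image}` } },
            ],
          },
        ],
      },
    }];

The HTTP Request node can then send this JSON as the request body, together with the Authorization header described in the setup steps below.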

Prerequisites/Requirements

To use this workflow, you will need:

  • n8n Instance: A running instance of n8n.
  • Telegram Bot Token: A Telegram bot set up with a BotFather token. You'll need to configure a Telegram credential in n8n.
  • Groq API Key: An API key from Groq to access their LLaVA v1.5-7B model. This will be used in the HTTP Request node.

Setup/Usage

  1. Import the workflow: Download the provided JSON and import it into your n8n instance.
  2. Configure Telegram Trigger:
    • Click on the "Telegram Trigger" node.
    • Select or create a new Telegram API credential using your bot token.
    • Activate the workflow to enable the webhook.
  3. Configure HTTP Request (Groq API):
    • Click on the "HTTP Request" node.
    • In the "Headers" section, add an Authorization header with the value Bearer YOUR_GROQ_API_KEY, replacing YOUR_GROQ_API_KEY with your actual Groq API Key.
    • Ensure the URL is set to https://api.groq.com/openai/v1/chat/completions.
    • The "Body Parameters" are already configured to send the image data and prompt.
  4. Configure Telegram (Send Response):
    • Click on the "Telegram" node.
    • Select the same Telegram API credential used in the trigger.
    • The "Chat ID" and "Text" fields are already configured to send the response back to the original sender with the extracted text.
  5. Activate the workflow: Once all credentials and configurations are set, activate the workflow.
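
If the HTTP Request node returns 401/403 errors, it can help to test the Groq key outside n8n first. The sketch below (Node 18+ run as an ES module) sends a minimal text-only completion; GROQ_API_KEY and the model name are placeholders, not values taken from this template.

    // Minimal sketch: verify a Groq API key against the OpenAI-compatible chat completions endpoint.
    // Replace GROQ_API_KEY; the model name is a placeholder, use any model available to your account.
    const GROQ_API_KEY = 'GROQ_API_KEY';

    const response = await fetch('https://api.groq.com/openai/v1/chat/completions', {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${GROQ_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'llama-3.1-8b-instant', // placeholder model ID
        messages: [{ role: 'user', content: 'Reply with the word ok.' }],
      }),
    });

    console.log(response.status, await response.json());

A 200 response confirms the key; reuse the same Bearer value in the node's Authorization header.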

Now, you can send an image to your Telegram bot, and it will respond with a text description generated by the Groq LLaVA v1.5-7B model.
