
Automated structured data extract & summary via Decodo + Gemini & Google Sheets

Ranjan Dailata
112 views
2/3/2026

Who this is for

This workflow is designed for:

  • Automation engineers building AI-powered data pipelines

  • Product managers & analysts needing structured insights from web pages

  • Researchers & content teams extracting summaries from documentation or articles

  • HR, compliance, and knowledge teams converting unstructured web content into structured records

  • n8n self-hosted users leveraging advanced scraping and LLM enrichment

It suits anyone who wants to turn a public URL into structured data and a clean summary automatically.

What problem this workflow solves

Web content is often unstructured, verbose, and inconsistent, making it difficult to:

  • Extract structured fields reliably

  • Generate consistent summaries

  • Reuse data across spreadsheets, dashboards, or databases

  • Eliminate manual copy-paste and interpretation

This workflow solves the problem of turning arbitrary web pages into machine-readable JSON and human-readable summaries, without custom scrapers or manual parsing logic.

What this workflow does

The workflow integrates Decodo, Google Gemini, and Google Sheets to perform automated extraction of structured data.

Here’s how it works step-by-step:

  1. Input Setup

    • The workflow starts when you execute it manually or pass in a valid URL.
    • The only input field is url.
  2. Content Extraction with Decodo and Gemini

    • Decodo scrapes the content of the page at the given URL.
    • Google Gemini extracts structured data from the page in JSON format.
    • Google Gemini also generates a concise, factual summary of the page.
  3. JSON Parsing & Merging

    • The Code node cleans and safely parses the AI-generated JSON for reliable downstream use.
    • The Merge node combines the structured data and the AI-generated summary.
  4. Data Storage in Google Sheets

    • The Google Sheets node appends or updates the record, storing the structured JSON and the summary in a connected spreadsheet for reporting or downstream automation.
  5. End Output

    • Unified, machine-readable JSON plus an executive-level summary, suitable for data analysis or downstream automation.
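The cleanup and merge in step 3 can be sketched in Python. This is a minimal illustration, not the Code node shipped with the template: it assumes Gemini sometimes wraps its JSON in a Markdown code fence or surrounds it with extra prose, and the record keys (url, structured_data, summary) are hypothetical names chosen for the example.

```python
import json
import re
from typing import Any

# Matches a whole reply wrapped in a Markdown fence, optionally tagged "json".
_FENCE_RE = re.compile(r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL | re.IGNORECASE)


def clean_llm_json(raw: str) -> dict[str, Any]:
    """Strip a Markdown fence and stray prose from an LLM reply, then parse the JSON object.

    Raises ValueError (json.JSONDecodeError is a subclass) when no valid object is found.
    """
    text = raw.strip()
    fenced = _FENCE_RE.match(text)
    if fenced:
        text = fenced.group(1)
    # Keep only the span from the first "{" to the last "}" in case the model added commentary.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start : end + 1])


def merge_result(url: str, structured: dict[str, Any], summary: str) -> dict[str, Any]:
    """Combine the extraction and summary branches into one record (hypothetical key names)."""
    return {"url": url, "structured_data": structured, "summary": summary.strip()}


if __name__ == "__main__":
    reply = '```json\n{"title": "Example Domain", "links": 1}\n```'
    print(merge_result("https://example.com", clean_llm_json(reply), " A placeholder page. "))
```

Raising instead of silently returning an empty object lets a downstream IF node or error branch catch malformed output.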

Setup Instructions

Prerequisites

If you are new to Decodo, sign up at visit.decodo.com.

  • n8n account with workflow editor access
  • Decodo API credentials: register, log in, and obtain the Basic Authentication token from the Decodo dashboard


  • Google Gemini (PaLM) API access
  • Google Sheets OAuth credentials
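For orientation, here is roughly how a request authenticated with the Decodo Basic Authentication token might look, sketched with Python's standard library. Treat it as an assumption-laden illustration: the endpoint URL and the minimal {"url": ...} request body are examples that should be checked against Decodo's current Web Scraping API documentation, and the function only builds the request without sending it.

```python
import json
import urllib.request

# ASSUMPTION: example endpoint; confirm the current URL in Decodo's API documentation.
DECODO_SCRAPE_URL = "https://scraper-api.decodo.com/v2/scrape"


def build_decodo_request(page_url: str, basic_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking Decodo to scrape page_url.

    Assumes basic_token is the already-encoded token copied from the Decodo dashboard.
    """
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        DECODO_SCRAPE_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic_token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )
```

To actually send it, pass the request to urllib.request.urlopen(req, timeout=60) and read the response body; inside the workflow, the Decodo node handles this for you.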

Setup Steps

  1. Import the workflow into your n8n instance.

  2. Configure Credentials

    • Add your Decodo API credentials in the Decodo node.
    • Connect your Google Gemini (PaLM) credentials for both AI nodes.
    • Authenticate your Google Sheets account.
  3. Edit Input Node

    • In the Set the Input Fields node, replace the default URL with the page you want to process, or map it from a dynamic data source.
  4. Run the Workflow

    • Trigger manually or via webhook integration for automation.
    • Verify that the structured data and summary are written to the linked Google Sheet.
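If you switch to a Webhook trigger, another system can start the workflow by POSTing a URL to it. The sketch below assumes a Webhook node that reads a url field from a JSON body; the webhook address is a hypothetical placeholder (n8n shows the real production and test URLs on the Webhook node), and the URL check mirrors the "valid URL" requirement of the input step.

```python
import json
import urllib.request
from urllib.parse import urlparse

# HYPOTHETICAL: replace with the production URL shown on your Webhook node.
WEBHOOK_URL = "https://n8n.example.com/webhook/extract-summary"


def is_valid_page_url(candidate: str) -> bool:
    """Accept only absolute http(s) URLs with a host."""
    parts = urlparse(candidate.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def build_trigger_request(page_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST that hands one URL to the workflow."""
    if not is_valid_page_url(page_url):
        raise ValueError(f"not an http(s) URL: {page_url!r}")
    payload = json.dumps({"url": page_url.strip()}).encode("utf-8")
    return urllib.request.Request(
        WEBHOOK_URL,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Validating before sending keeps obviously bad input from consuming a Decodo request and a Gemini call.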

How to customize this workflow to your needs

You can easily extend or adapt this workflow:

Modify Structured Output

  • Change the Gemini extraction prompt to match your own JSON schema
  • Add required fields such as authors, dates, entities, or metadata
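One way to keep the Gemini prompt and your JSON schema in sync is to generate the prompt from the schema and check the reply against the same keys. This is a minimal sketch; the schema fields and the prompt wording are hypothetical examples, not the template's actual prompt.

```python
# HYPOTHETICAL schema: field name -> plain-language type hint for the model.
EXTRACTION_SCHEMA = {
    "title": "string",
    "authors": "list of strings",
    "published_date": "ISO 8601 date or null",
    "entities": "list of strings",
    "metadata": "object",
}


def build_extraction_prompt(schema: dict[str, str], page_text: str) -> str:
    """Render the schema as a bullet list inside an extraction instruction."""
    field_lines = "\n".join(f'- "{name}": {kind}' for name, kind in schema.items())
    return (
        "Extract the following fields from the page content below.\n"
        "Return ONLY a JSON object with exactly these keys; use null when a value is missing.\n"
        f"{field_lines}\n\n"
        f"Page content:\n{page_text}"
    )


def missing_keys(schema: dict[str, str], extracted: dict) -> list[str]:
    """List schema keys absent from the model's reply, in schema order."""
    return [name for name in schema if name not in extracted]
```

Because both functions read the same dictionary, adding a field such as dates or entities updates the prompt and the completeness check together.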

Improve Summarization

  • Adjust summary length or tone (technical, executive, simplified)
  • Add multi-language summarization using Gemini
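Length, tone, and language can be parameters of the summary prompt rather than hard-coded text. The sketch below is illustrative only; the three tone presets and their wording are assumptions modeled on the options listed above.

```python
from dataclasses import dataclass

# HYPOTHETICAL tone presets matching the technical / executive / simplified options.
TONE_HINTS = {
    "technical": "Use precise technical terminology.",
    "executive": "Focus on key outcomes and decisions; avoid jargon.",
    "simplified": "Use plain language a non-specialist can follow.",
}


@dataclass(frozen=True)
class SummaryStyle:
    max_words: int = 120
    tone: str = "executive"
    language: str = "English"


def build_summary_prompt(page_text: str, style: SummaryStyle) -> str:
    """Compose a summary instruction from the chosen length, tone, and output language."""
    if style.tone not in TONE_HINTS:
        raise ValueError(f"unknown tone: {style.tone!r}")
    if style.max_words <= 0:
        raise ValueError("max_words must be positive")
    return (
        f"Summarize the page content below in {style.language}, "
        f"in at most {style.max_words} words. {TONE_HINTS[style.tone]} "
        "State only facts present in the content.\n\n"
        f"Page content:\n{page_text}"
    )
```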

Change Output Destination

  • Replace Google Sheets with:

    • Databases (Postgres, MySQL)
    • Notion
    • Slack / Email
    • File storage (JSON, CSV)
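For the file storage option, the merged records can be serialized to JSON Lines or CSV. In CSV, nested structured data has to be flattened into one JSON string per cell. A standard-library sketch, assuming the hypothetical record keys url, structured_data, and summary:

```python
import csv
import io
import json
from typing import Any, Iterable

# Hypothetical column order; match it to your own records.
DEFAULT_COLUMNS = ("url", "structured_data", "summary")


def to_jsonl(records: Iterable[dict[str, Any]]) -> str:
    """One JSON object per line; nested structures are kept as-is."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


def to_csv(records: Iterable[dict[str, Any]], columns: tuple[str, ...] = DEFAULT_COLUMNS) -> str:
    """CSV with a header row; dict/list values are flattened to JSON strings."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(columns), extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow({
            key: json.dumps(value, ensure_ascii=False) if isinstance(value, (dict, list)) else value
            for key, value in record.items()
        })
    return buffer.getvalue()
```

Write the returned strings with open(path, "w", newline="", encoding="utf-8") so the CSV line endings are preserved.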

Add Validation or Filtering

  • Insert IF nodes to:

    • Reject incomplete data
    • Detect errors or hallucinated output
    • Trigger alerts for malformed JSON
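The checks an IF node would perform can be expressed as one routing function. This is a hedged sketch: the required fields are hypothetical, and the "grounding" test (does each extracted string appear verbatim in the scraped text?) is a crude heuristic for spotting hallucinated values, not a reliable detector.

```python
import json
from typing import Any

REQUIRED_FIELDS = ("title", "summary")  # HYPOTHETICAL required keys


def route_result(raw_json: str, page_text: str, required: tuple[str, ...] = REQUIRED_FIELDS) -> tuple[str, list[str]]:
    """Return a route ("ok", "review", "reject", "alert") plus the reasons behind it."""
    try:
        data: Any = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return "alert", [f"malformed JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return "alert", ["top-level JSON value is not an object"]
    # Incomplete data: a required key is absent or empty.
    missing = [f for f in required if data.get(f) in (None, "", [], {})]
    if missing:
        return "reject", [f"missing field: {f}" for f in missing]
    # Crude hallucination check: string values (except the paraphrased summary) must occur in the page.
    haystack = page_text.casefold()
    ungrounded = [
        key for key, value in data.items()
        if key != "summary" and isinstance(value, str) and value.casefold() not in haystack
    ]
    if ungrounded:
        return "review", [f"value not found in page text: {k}" for k in ungrounded]
    return "ok", []
```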

Scale the Workflow

  • Replace manual trigger with:

    • Webhook
    • Scheduled trigger
    • Batch URL processing
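For batch URL processing, it helps to deduplicate and validate the list before it reaches the scraper, then feed it in fixed-size chunks (in n8n, the Loop Over Items node plays this role). A minimal sketch; the batch size is an arbitrary choice left to the caller.

```python
from typing import Iterable, Iterator, TypeVar
from urllib.parse import urlparse

T = TypeVar("T")


def unique_valid_urls(urls: Iterable[str]) -> list[str]:
    """Trim, drop non-http(s) entries, and remove duplicates while keeping first-seen order."""
    seen: set[str] = set()
    result: list[str] = []
    for raw in urls:
        url = raw.strip()
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


def batches(items: list[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive chunks of at most `size` items (raises on first iteration if size < 1)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start : start + size]
```

Small batches keep a single failing page from stalling the whole run and make it easier to stay within API rate limits.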

Summary

This workflow provides a general-purpose solution for converting unstructured web pages into structured, AI-enriched datasets.

By combining Decodo for scraping, Google Gemini for intelligence, and Google Sheets for persistence, it enables repeatable, scalable, and production-ready data extraction without custom scrapers or brittle parsing logic.

Automated Structured Data Extraction & Summary via Decodō, Gemini, and Google Sheets

This n8n workflow automates the process of extracting structured data from text, summarizing it, and then storing the results in a Google Sheet. It uses Decodō for structured data extraction and Google Gemini for summarization, streamlining the processing of unstructured sources.

What it does

This workflow simplifies and automates the following steps:

  1. Manual Trigger: Initiates the workflow upon a manual execution.
  2. Edit Fields (Set): Prepares the input data by setting a text field with the content to be processed.
  3. Basic LLM Chain (Decodō): Utilizes a Decodō-powered LLM chain to extract structured data from the provided text based on a defined schema.
  4. Google Gemini Chat Model: Takes the extracted structured data and generates a concise summary using the Google Gemini AI model.
  5. Merge: Combines the original input data with the extracted structured data and the generated summary.
  6. Code: Transforms the merged data into a suitable format for writing to Google Sheets, specifically converting the structured data and summary into JSON strings.
  7. Google Sheets: Appends a new row to a specified Google Sheet, containing the original text, the extracted structured data (as a JSON string), and the generated summary (as a JSON string).
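Step 6 can be illustrated with a small function that turns one merged item into a Google Sheets row, serializing the nested values with json.dumps. It is a sketch under the column names given in the setup section (text, structuredData, summary); the template's actual Code node may differ, for example in key order or escaping.

```python
import json
from typing import Any


def to_sheet_row(item: dict[str, Any]) -> dict[str, str]:
    """Map one merged item to the text / structuredData / summary columns."""
    return {
        "text": str(item.get("text", "")),
        # Nested extraction result becomes a single JSON string cell; sorted keys keep it stable.
        "structuredData": json.dumps(item.get("structuredData", {}), ensure_ascii=False, sort_keys=True),
        # The description above says the summary is also written as a JSON string.
        "summary": json.dumps(item.get("summary", ""), ensure_ascii=False),
    }
```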

Prerequisites/Requirements

To use this workflow, you will need:

  • n8n Instance: A running instance of n8n.
  • Decodō API Key: For the "Basic LLM Chain" node to extract structured data.
  • Google Gemini API Key: For the "Google Gemini Chat Model" node to generate summaries.
  • Google Sheets Account: With a spreadsheet ready to receive the processed data. You'll need to configure Google Sheets credentials in n8n.

Setup/Usage

  1. Import the workflow: Download the provided JSON and import it into your n8n instance.
  2. Configure Credentials:
    • Google Sheets: Set up your Google Sheets OAuth2 or Service Account credentials in n8n.
    • Decodō: Configure your Decodō API key as a credential in n8n.
    • Google Gemini: Configure your Google Gemini API key as a credential in n8n.
  3. Update Node Settings:
    • Edit Fields (Set): Modify the text field to contain the actual text you want to process, or connect it to another node that provides the text dynamically.
    • Basic LLM Chain (Decodō): Ensure the "Decodō" credential is selected and the schema for structured data extraction is correctly defined within the node's configuration.
    • Google Gemini Chat Model: Ensure the "Google Gemini" credential is selected.
    • Google Sheets:
      • Select your Google Sheets credential.
      • Specify the Spreadsheet ID and Sheet Name where you want to append the data.
      • Verify the data mapping in the "Append Row" operation to ensure text, structuredData, and summary are mapped to the correct columns in your sheet.
  4. Execute the workflow: Click "Execute workflow" on the "Manual Trigger" node to run the workflow. You can also set up a different trigger (e.g., a Webhook, Cron) to automate its execution based on your needs.

Related Templates

Generate song lyrics and music from text prompts using OpenAI and Fal.ai Minimax

Spark your creativity instantly in any chat: turn a simple prompt like "heartbreak ballad" into original, full-length lyrics and a professional AI-generated music track, all without leaving your conversation.

📋 What This Template Does

This chat-triggered workflow harnesses AI to generate detailed, genre-matched song lyrics (at least 600 characters) from user messages, then queues them for music synthesis via Fal.ai's minimax-music model. It polls asynchronously until the track is ready, delivering the lyrics and audio URL back in chat.

  • Crafts original, structured lyrics with verses, choruses, and bridges using OpenAI
  • Submits to Fal.ai for melody, instrumentation, and vocals aligned to the style
  • Handles long-running generations with smart looping and status checks
  • Returns the complete song package (lyrics + audio link) for seamless sharing

🔧 Prerequisites

  • n8n account (self-hosted or cloud with chat integration enabled)
  • OpenAI account with API access for GPT models
  • Fal.ai account for AI music generation

🔑 Required Credentials

OpenAI API Setup

  1. Go to platform.openai.com → API keys (sidebar)
  2. Click "Create new secret key" → Name it (e.g., "n8n Songwriter")
  3. Copy the key and add it to n8n as an "OpenAI API" credential
  4. Test by sending a simple chat completion request

Fal.ai HTTP Header Auth Setup

  1. Sign up at fal.ai → Dashboard → API Keys
  2. Generate a new API key → Copy it
  3. In n8n, create an "HTTP Header Auth" credential: Name="Fal.ai", Header Name="Authorization", Header Value="Key [Your API Key]"
  4. Test with a simple GET to their queue endpoint (e.g., /status)

⚙️ Configuration Steps

  1. Import the workflow JSON into your n8n instance
  2. Assign OpenAI API credentials to the "OpenAI Chat Model" node
  3. Assign Fal.ai HTTP Header Auth to the "Generate Music Track", "Check Generation Status", and "Fetch Final Result" nodes
  4. Activate the workflow; the chat trigger will appear in your n8n chat interface
  5. Test by messaging: "Create an upbeat pop song about road trips"

🎯 Use Cases

  • Content Creators: YouTubers generating custom jingles for videos on the fly, streamlining production from idea to audio export
  • Educators: Music teachers using chat prompts to create era-specific folk tunes for classroom discussions, fostering interactive learning
  • Gift Personalization: Friends crafting anniversary R&B tracks from shared memories via quick chats, delivering emotional audio surprises
  • Artist Brainstorming: Songwriters prototyping hip-hop beats in real time during sessions, accelerating collaboration and iteration

⚠️ Troubleshooting

  • Invalid JSON from AI Agent: Ensure the system prompt stresses valid JSON; test the agent standalone with a sample query
  • Music Generation Fails (401/403): Verify that the Fal.ai API key has minimax-music access; check usage quotas in the dashboard
  • Status Polling Loops Indefinitely: Increase the wait time to 45-60s for complex tracks; inspect fal.ai queue logs for bottlenecks
  • Lyrics Under 600 Characters: Tweak the agent prompt to enforce fuller structures like [V1][C][V2][B][C]; verify output length in executions

By Daniel Nkencho
601

Automate Dutch Public Procurement Data Collection with TenderNed

TenderNed Public Procurement

What This Workflow Does

This workflow automates the collection of public procurement data from TenderNed (the official Dutch tender platform). It:

  • Fetches the latest tender publications from the TenderNed API
  • Retrieves detailed information in both XML and JSON formats for each tender
  • Parses and extracts key information like organization names, titles, descriptions, and reference numbers
  • Filters results based on your custom criteria
  • Stores the data in a database for easy querying and analysis

Setup Instructions

This template comes with sticky notes providing step-by-step instructions in Dutch and various query options you can customize.

Prerequisites

  • TenderNed API Access: Register at TenderNed for API credentials

Configuration Steps

  1. Set up TenderNed credentials:
    • Add HTTP Basic Auth credentials with your TenderNed API username and password
    • Apply these credentials to the three HTTP Request nodes: "Tenderned Publicaties", "Haal XML Details", and "Haal JSON Details"
  2. Customize filters:
    • Modify the "Filter op ..." node to match your specific requirements
    • Examples: specific organizations, contract values, regions, etc.

How It Works

  • Step 1: Trigger. The workflow can be triggered either manually for testing or automatically on a daily schedule.
  • Step 2: Fetch Publications. Makes an API call to TenderNed to retrieve a list of recent publications (up to 100 per request).
  • Step 3: Process & Split. Extracts the tender array from the response and splits it into individual items for processing.
  • Step 4: Fetch Details. For each tender, the workflow makes two parallel API calls:
    • XML endpoint: retrieves the complete tender documentation in XML format
    • JSON endpoint: fetches metadata including reference numbers and keywords
  • Step 5: Parse & Merge. Parses the XML data and merges it with the JSON metadata and batch information into a single data structure.
  • Step 6: Extract Fields. Maps the raw API data to clean, structured fields including:
    • Publication ID and date
    • Organization name
    • Tender title and description
    • Reference numbers (kenmerk, TED number)
  • Step 7: Filter. Applies your custom filter criteria to focus on relevant tenders only.
  • Step 8: Store. Inserts the processed data into your database for storage and future analysis.

Customization Tips

Modify API Parameters

  • In the "Tenderned Publicaties" node, you can adjust:
    • offset: Starting position for pagination
    • size: Number of results per request (max 100)
    • Additional query parameters for date ranges, status filters, etc.

Add More Fields

  • Extend the "Splits Alle Velden" node to extract additional fields from the XML/JSON data, such as:
    • Contract value estimates
    • Deadline dates
    • CPV codes (procurement classification)
    • Contact information

Integrate Notifications

  • Add a Slack, Email, or Discord node after the filter to get notified about new matching tenders.

Incremental Updates

  • Modify the workflow to only fetch new tenders by:
    • Storing the last execution timestamp
    • Adding date filters to the API query
    • Only processing publications newer than the last run

Troubleshooting

  • No data returned?
    • Verify that your TenderNed API credentials are correct
    • Check that you have set up your filter properly

Need help setting this up or interested in a complete tender analysis solution? Get in touch 🔗 LinkedIn – Wessel Bulte

By Wessel Bulte
247

🎓 How to transform unstructured email data into structured format with AI agent

This workflow automates the process of extracting structured, usable information from unstructured email messages across multiple platforms. It connects directly to Gmail, Outlook, and IMAP accounts, retrieves incoming emails, and sends their content to an AI-powered parsing agent built on OpenAI GPT models. The AI agent analyzes each email, identifies relevant details, and returns a clean JSON structure containing key fields:

  • From – sender’s email address
  • To – recipient’s email address
  • Subject – email subject line
  • Summary – short AI-generated summary of the email body

The extracted information is then automatically inserted into an n8n Data Table, creating a structured database of email metadata and summaries ready for indexing, reporting, or integration with other tools.

---

Key Benefits

  • ✅ Full Automation: Eliminates manual reading and data entry from incoming emails.
  • ✅ Multi-Source Integration: Handles data from different email providers seamlessly.
  • ✅ AI-Driven Accuracy: Uses advanced language models to interpret complex or unformatted content.
  • ✅ Structured Storage: Creates a standardized, query-ready dataset from previously unstructured text.
  • ✅ Time Efficiency: Processes emails in real time, improving productivity and response speed.
  • ✅ Scalability: Easily extendable to handle additional sources or extract more data fields.

---

How it works

This workflow automates the transformation of unstructured email data into a structured, queryable format. It operates through a series of connected steps:

  1. Email Triggering: The workflow is initiated by one of three different email triggers (Gmail, Microsoft Outlook, or a generic IMAP account), which constantly monitor for new incoming emails.
  2. AI-Powered Parsing & Structuring: When a new email is detected, its raw, unstructured content is passed to a central "Parsing Agent." This agent uses a specified OpenAI language model to intelligently analyze the email text.
  3. Data Extraction & Standardization: Following a predefined system prompt, the AI agent extracts key information from the email, such as the sender, recipient, subject, and a generated summary. It then forces the output into a strict JSON structure using a "Structured Output Parser" node, ensuring data consistency.
  4. Data Storage: Finally, the clean, structured data (the from, to, subject, and summarize fields) is inserted as a new row into a specified n8n Data Table, creating a searchable and reportable database of email information.

---

Set up steps

To implement this workflow, follow these configuration steps:

  1. Prepare the Data Table: Create a new Data Table within n8n. Define the columns with the following names and string type: From, To, Subject, and Summary.
  2. Configure Email Credentials: Set up the credential connections for the email services you wish to use (Gmail OAuth2, Microsoft Outlook OAuth2, and/or IMAP). Ensure the accounts have the necessary permissions to read emails.
  3. Configure AI Model Credentials: Set up the OpenAI API credential with a valid API key. The workflow is configured to use the model, but this can be changed in the respective nodes if needed.
  4. Connect the Nodes: The workflow canvas is already correctly wired. Visually confirm that the email triggers are connected to the "Parsing Agent," which is connected to the "Insert row" (Data Table) node. Also, ensure the "OpenAI Chat Model" and "Structured Output Parser" are connected to the "Parsing Agent" as its AI model and output parser, respectively.
  5. Activate the Workflow: Save the workflow and toggle the "Active" switch to ON. The triggers will begin polling for new emails according to their schedule (e.g., every minute), and the automation will start processing incoming messages.

---

Need help customizing? Contact me for consulting and support or add me on LinkedIn.

By Davide
1616