Enrich company data from Google Sheet with OpenAI Agent and ScrapingBee

2/3/2026

This workflow demonstrates how to enrich data from a list of companies in a spreadsheet. While this workflow is production-ready if all steps are followed, adding error handling would enhance its robustness.

Important notes

Check legal regulations: This workflow involves scraping, so make sure to check the legal regulations around scraping in your country before getting started. Better safe than sorry!
Mind those tokens: OpenAI tokens can add up fast, so keep an eye on usage unless you want a surprising bill that could knock your socks off! 💸

Main Workflow

Node 1 - `Webhook`

This node triggers the workflow via a webhook call. You can replace it with any other trigger of your choice, such as form submission, a new row added in Google Sheets, or a manual trigger.
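To try the webhook trigger from outside n8n, a call along these lines should work. This is a minimal sketch: the URL is a placeholder (copy the real one from your Webhook node), the payload is made up, and it assumes the node is set to accept POST requests.

```python
import json
import urllib.request

# Hypothetical URL: replace it with the one shown in your Webhook node.
WEBHOOK_URL = "https://n8n.example.com/webhook/enrich-companies"


def build_trigger_request(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that would trigger the workflow."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_trigger_request(WEBHOOK_URL, {"source": "manual-test"})
# To actually fire the webhook, uncomment the next line:
# urllib.request.urlopen(request, timeout=30)
```

The request is only built here, not sent, so you can inspect it before pointing it at a live workflow.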

Node 2 - `Get Rows from Google Sheet`

This node retrieves the list of companies from your spreadsheet. Here is the Google Sheet template you can use. The columns in this Google Sheet are:

Company: The name of the company
Website: The website URL of the company
These two fields are required at this step.
Business Area: The business area deduced by OpenAI from the scraped data
Offer: The offer deduced by OpenAI from the scraped data
Value Proposition: The value proposition deduced by OpenAI from the scraped data
Business Model: The business model deduced by OpenAI from the scraped data
ICP: The Ideal Customer Profile deduced by OpenAI from the scraped data
Additional Information: Information related to the scraped data, including:
- Information Sufficiency:
  - Description: Indicates if the information was sufficient to provide a full analysis.
  - Options: "Sufficient" or "Insufficient"
- Insufficient Details:
  - Description: If labeled "Insufficient," specifies what information was missing or needed to complete the analysis.
- Mismatched Content:
  - Description: Indicates whether the page content aligns with that of a typical company page.
- Suggested Actions:
  - Description: Provides recommendations if the page content is insufficient or mismatched, such as verifying the URL or searching for alternative sources.
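The column layout above can be pictured as a small data model. The sketch below is only an illustration: the class and attribute names are invented here, and the workflow itself reads the columns straight from the sheet.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AdditionalInformation:
    # Expected to be "Sufficient" or "Insufficient" once the agent has run.
    information_sufficiency: str = ""
    insufficient_details: str = ""
    mismatched_content: str = ""
    suggested_actions: str = ""


@dataclass
class CompanyRow:
    # Required before the workflow runs.
    company: str
    website: str
    # Filled in by the AI Agent from the scraped data.
    business_area: str = ""
    offer: str = ""
    value_proposition: str = ""
    business_model: str = ""
    icp: str = ""
    additional_information: AdditionalInformation = field(
        default_factory=AdditionalInformation
    )


def missing_required(row: CompanyRow) -> List[str]:
    """Return the names of required columns that are empty for this row."""
    required = (("Company", row.company), ("Website", row.website))
    return [name for name, value in required if not value.strip()]
```

A check like `missing_required` is one way to skip rows that cannot be scraped because the website is blank.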

Node 3 - `Loop Over Items`

This node ensures that, in subsequent steps, the website in "extra workflow input" corresponds to the row being processed. You can delete this node, but you'll need to ensure that the "query" sent to the scraping workflow corresponds to the website of the specific company being scraped (rather than just the first row).
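The point of the loop is that each query carries the website of its own row. Here is a rough sketch of that idea outside n8n; the sample rows, the `row_number` field, and the function name are all assumptions made for the example.

```python
# Sample rows standing in for what the Google Sheets node returns.
rows = [
    {"Company": "Acme", "Website": "https://acme.example"},
    {"Company": "Globex", "Website": "https://globex.example"},
]


def build_queries(sheet_rows):
    """Yield one scraper query per row, each carrying that row's own website."""
    # Start at 2 on the assumption that row 1 of the sheet holds the headers.
    for row_number, row in enumerate(sheet_rows, start=2):
        yield {
            "row_number": row_number,
            "company": row["Company"],
            "website": row["Website"],
        }


queries = list(build_queries(rows))
```

Without this per-row pairing, every query would risk reusing the website from the first row, which is the problem the note above warns about.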

Node 4 - `AI Agent`

This AI agent is configured with a prompt to extract data from the content it receives. The node has three sub-nodes:

OpenAI Chat Model: The model currently used is gpt-4o-mini.
Call n8n Workflow: This sub-node calls the workflow to use ScrapingBee and retrieves the scraped data.
Structured Output Parser: This parser structures the agent's output for clarity and ease of use, so that the next node can write it to the Google Sheet.
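What the Structured Output Parser guarantees can be approximated with a small check like the one below. Treat it as a sketch: the key names simply mirror the sheet columns, and the schema configured in the template may differ.

```python
import json

# Assumed to match the enriched columns of the Google Sheet.
EXPECTED_KEYS = [
    "Business Area",
    "Offer",
    "Value Proposition",
    "Business Model",
    "ICP",
    "Additional Information",
]


def parse_agent_output(raw: str) -> dict:
    """Parse the agent's JSON reply and keep only the expected fields."""
    data = json.loads(raw)  # raises ValueError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("Agent output must be a JSON object")
    missing = [key for key in EXPECTED_KEYS if key not in data]
    if missing:
        raise ValueError(f"Agent output is missing fields: {missing}")
    return {key: data[key] for key in EXPECTED_KEYS}
```

Rejecting incomplete replies early keeps half-filled rows out of the sheet.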

Node 5 - `Update Company Row in Google Sheet`

This node updates the specific company's row in Google Sheets with the enriched data.

Scraper Agent Workflow

Node 1 - `Tool Called from Agent`

This is the trigger for when the AI Agent calls the Scraper. A query is sent with:

Company name
Website (the URL of the website)

Node 2 - `Set Company URL`

This node renames a field, which may seem trivial but is useful for performing transformations on data received from the AI Agent.
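As an example of the kind of transformation this step makes room for, the sketch below renames the field and adds a scheme when one is missing. The field names `website` and `url` are assumptions, not the ones used in the template.

```python
def set_company_url(item: dict) -> dict:
    """Rename the incoming 'website' field to 'url' and normalize it."""
    url = str(item.get("website", "")).strip()
    # Add a scheme so the scraper receives a complete URL.
    if url and not url.startswith(("http://", "https://")):
        url = "https://" + url
    return {"company": item.get("company", ""), "url": url}


cleaned = set_company_url({"company": "Acme", "website": " acme.example "})
```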

Node 3 - `ScrapingBee: Scrape Company's Website`

This node scrapes data from the URL provided using ScrapingBee. You can use any scraper of your choice, but ScrapingBee is recommended, as it allows you to configure scraper behavior directly. Once configured, copy the provided "curl" command and import it into n8n.
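For reference, the request behind that "curl" command looks roughly like the URL built below. It is a sketch that uses only the `api_key`, `url`, and `render_js` parameters of ScrapingBee's HTML API; the key and target URL are placeholders, and your own configuration may add more options.

```python
from urllib.parse import urlencode

SCRAPINGBEE_ENDPOINT = "https://app.scrapingbee.com/api/v1/"


def build_scrapingbee_url(api_key: str, target_url: str, render_js: bool = False) -> str:
    """Build the GET URL that asks ScrapingBee to fetch target_url."""
    params = {
        "api_key": api_key,
        "url": target_url,  # urlencode percent-encodes this value
        "render_js": "true" if render_js else "false",
    }
    return f"{SCRAPINGBEE_ENDPOINT}?{urlencode(params)}"


# Placeholder key and URL, for illustration only.
scrape_url = build_scrapingbee_url("YOUR_API_KEY", "https://acme.example/pricing")
```

Turning `render_js` off is only an example of a scraper setting; whether you need JavaScript rendering depends on the sites you scrape.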

Node 4 - `HTML to Markdown`

This node converts the scraped HTML data to Markdown, which is then sent to OpenAI. The Markdown format generally uses fewer tokens than HTML.
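To see why the conversion saves tokens, here is a deliberately small stand-in for the node, written with Python's standard library. It is not the converter n8n uses: it only drops scripts and styles, keeps text, and marks h1 to h3 headings.

```python
from html.parser import HTMLParser


class MarkdownishExtractor(HTMLParser):
    """Collect visible text, one line per block element."""

    SKIPPED = {"script", "style"}
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    BLOCKS = {"p", "div", "li", "section", "br"} | set(HEADINGS)

    def __init__(self):
        super().__init__()
        self.lines = []
        self._buffer = []
        self._skip_depth = 0
        self._prefix = ""

    def flush(self):
        """Emit the buffered text as one line, collapsing whitespace."""
        text = " ".join("".join(self._buffer).split())
        if text:
            self.lines.append(self._prefix + text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.BLOCKS:
            self.flush()
            self._prefix = self.HEADINGS.get(tag, "")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCKS:
            self.flush()
            self._prefix = ""

    def handle_data(self, data):
        # Ignore anything inside <script> or <style>.
        if not self._skip_depth:
            self._buffer.append(data)


def html_to_markdownish(html: str) -> str:
    parser = MarkdownishExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text
    return "\n".join(parser.lines)


sample = (
    "<html><head><style>body{color:red}</style></head><body>"
    '<h1 class="hero">Acme</h1><p>We build <b>rockets</b>.</p></body></html>'
)
markdown = html_to_markdownish(sample)
```

The tags, attributes, and inline styles disappear, which is where most of the savings come from.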

Improving the Workflow

It's always a pleasure to share workflows, but creators sometimes want to keep some magic to themselves ✨. Here are some ways you can enhance this workflow:

Handle potential errors
Configure the scraper tool to scrape other pages on the website. Although this will cost more tokens, it can be useful (e.g., scraping "Pricing" or "About Us" pages in addition to the homepage).
Instead of Google Sheets, connect directly to your CRM to enrich company data.
Trigger the workflow from form submissions on your website and send the scraped data about the lead to a Slack or Teams channel.
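For the first suggestion, the usual pattern is to retry a failing step a few times before giving up. The sketch below shows that logic in plain Python. The function name and defaults are made up for the example, and in n8n you would configure this behavior on the nodes rather than write it by hand.

```python
import time


def with_retries(task, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run task(), retrying with exponential backoff when it raises."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as error:  # broad on purpose for this sketch
            last_error = error
            if attempt < attempts:
                # Wait 1x, 2x, 4x... the base delay between attempts.
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"Task failed after {attempts} attempts") from last_error
```

Wrapping the scraping call this way means a single timeout does not leave a company row empty.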

Enrich Company Data from Google Sheet with OpenAI Agent and ScrapingBee

This n8n workflow automates the process of enriching company data from a Google Sheet using an OpenAI agent and a web scraping tool (ScrapingBee, via a sub-workflow). It's designed to process a list of companies, gather additional information, and potentially update the original sheet.

What it does

This workflow performs the following key steps:

Triggers on execution: The workflow is designed to be triggered by another n8n workflow, indicating it's likely a sub-workflow or part of a larger automation.
Reads data from Google Sheets: It connects to a specified Google Sheet to read company data.
Prepares data for processing: An "Edit Fields (Set)" node likely transforms or selects specific fields from the Google Sheet data for further processing.
Loops over company items: It processes the company data in batches, allowing for efficient handling of large datasets.
Utilizes an AI Agent: For each company, an OpenAI Agent is invoked. This agent is configured with:
- OpenAI Chat Model: To provide conversational AI capabilities for understanding and generating responses.
- Structured Output Parser: To ensure the AI agent's output is in a structured, usable format (e.g., JSON).
- Call n8n Workflow Tool: This tool allows the AI agent to execute another n8n workflow. This is where the web scraping (e.g., using ScrapingBee) would likely occur, as hinted by the directory name.
Generates Markdown output: The final processed information is formatted into Markdown, possibly for logging, reporting, or further integration.

Prerequisites/Requirements

To use this workflow, you will need:

n8n Instance: A running instance of n8n.
Google Sheets Account: Configured with a Google Sheets credential in n8n.
OpenAI API Key: Configured as an OpenAI Chat Model credential in n8n.
ScrapingBee Account (Implicit): While not directly visible in this JSON, the "Call n8n Workflow Tool" and the directory name strongly suggest a sub-workflow that uses ScrapingBee. You would need a ScrapingBee API key and a separate n8n workflow configured to use it.
Another n8n Workflow: This workflow expects to be triggered by another workflow, which would pass the initial data.

Setup/Usage

Import the workflow: Download the JSON and import it into your n8n instance.
Configure Credentials:
- Set up your Google Sheets credential.
- Set up your OpenAI Chat Model credential with your OpenAI API key.
- Ensure the sub-workflow called by the "Call n8n Workflow Tool" (likely for ScrapingBee) is also configured with its necessary credentials.
Configure Google Sheets Node:
- Specify the Spreadsheet ID and Sheet Name from which to read company data.
- Ensure the columns containing company names or other relevant identifiers are correctly mapped.
Configure AI Agent Node:
- Review the prompt and instructions for the AI Agent to ensure it understands the task of enriching company data.
- Verify the "Call n8n Workflow Tool" is correctly configured to call your ScrapingBee sub-workflow and pass the necessary company information.
Activate the Workflow: Once configured, activate the workflow.
Trigger the Workflow: This workflow is designed to be triggered by another n8n workflow. You would typically have a parent workflow that initiates this one, passing the initial company data.
