Extract and analyze web data with Bright Data & Google Gemini

2/3/2026

This workflow performs structured data extraction and data mining from a web page by combining the capabilities of Bright Data and Google Gemini.

How it Works

This workflow extracts structured data from a web page using Bright Data's Web Unlocker product. It then uses n8n's AI nodes, specifically the Google Gemini Flash Exp model, for information extraction and custom sentiment analysis. The results are sent to webhooks and saved as local files.

Use Cases

Data Mining: Automating the process of extracting and analyzing data from websites.
Web Scraping: Gathering structured data for market research, competitive analysis, or content aggregation.
Sentiment Analysis: Performing custom sentiment analysis on unstructured text.

Setup Instructions

Bright Data Credentials: You need a Bright Data account with a Web Unlocker zone. Update the Header Auth account credentials in the Perform Bright Data Web Request node.
Google Gemini Credentials: Provide your Google Gemini(PaLM) Api account credentials for the AI-related nodes.
Configure URL and Zone: In the Set URL and Bright Data Zone node, set the web URL you want to scrape and your Bright Data zone.
Update Webhook: Update the Webhook Notification URL in the relevant HTTP Request nodes.
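
The settings above come together in a single authenticated HTTP call. The following is a minimal Python sketch of what the Perform Bright Data Web Request node is assumed to send. The endpoint and the body fields (`zone`, `url`, `format`, `data_format`) are assumptions based on Bright Data's public API and should be checked against the node configuration in the exported workflow; the token, zone name, and target URL are placeholders.

```python
import json
import urllib.request

# Assumed endpoint for Bright Data's Web Unlocker API; verify it against your account docs.
BRIGHT_DATA_ENDPOINT = "https://api.brightdata.com/request"


def build_unlocker_request(api_token: str, zone: str, target_url: str) -> urllib.request.Request:
    """Build (but do not send) a Web Unlocker request that asks for markdown output."""
    payload = {
        "zone": zone,               # your Web Unlocker zone name
        "url": target_url,          # the page to scrape
        "format": "raw",            # assumed: return the page body directly
        "data_format": "markdown",  # assumed: convert the HTML to markdown
    }
    return urllib.request.Request(
        BRIGHT_DATA_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",  # mirrors the Header Auth credential
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; nothing is sent over the network here.
request = build_unlocker_request("YOUR_API_TOKEN", "your_zone_name", "https://example.com")
```

Passing `request` to `urllib.request.urlopen` would actually send it; the sketch stops short of that so the request can be inspected first.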

Workflow Logic

Trigger: The workflow is triggered manually.
Set Parameters: It sets the target URL and the Bright Data zone.
Web Request: The workflow performs a web request to the specified URL using Bright Data's Web Unlocker. The output is formatted as markdown.
Data Extraction & Analysis: The markdown content is then processed by multiple AI nodes to:
- Extract textual data from the markdown.
- Perform topic analysis with a structured response.
- Analyze trends by location and category with a structured response.
Output: The extracted data and analysis are sent to webhooks and saved as JSON files on disk.
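
The logic above can be sketched outside n8n as a small pipeline. This is an illustrative Python outline, not the workflow's actual implementation: the function name `run_pipeline` and all of its parameters are hypothetical, and the fetch, extraction, analysis, and notification steps are passed in as callables so that the control flow can be read (and tried with stubs) without any real Bright Data or Gemini calls.

```python
import json
from pathlib import Path
from typing import Callable


def run_pipeline(
    fetch_markdown: Callable[[str], str],
    extract_text: Callable[[str], str],
    analyzers: dict[str, Callable[[str], dict]],
    notify: Callable[[str, dict], None],
    url: str,
    out_dir: Path,
) -> dict[str, dict]:
    """Fetch a page, extract its text, run each analysis, then notify and save."""
    markdown = fetch_markdown(url)   # Web Request step (markdown output)
    text = extract_text(markdown)    # markdown -> textual data
    results: dict[str, dict] = {}
    for name, analyze in analyzers.items():
        result = analyze(text)       # structured response for this analysis
        notify(name, result)         # webhook notification step
        # Save each analysis as <name>.json, e.g. topics.json and trends.json.
        (out_dir / f"{name}.json").write_text(json.dumps(result, indent=2), encoding="utf-8")
        results[name] = result
    return results
```

Keeping each step behind a callable mirrors how the workflow separates nodes: any one step can be swapped (a different scraper, model, or output target) without touching the others.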

Node Descriptions

| Node Name | Description |
|-----------|-------------|
| When clicking 'Test workflow' | A manual trigger node to start the workflow. |
| Set URL and Bright Data Zone | A Set node to define the URL to be scraped and the Bright Data zone to be used. |
| Perform Bright Data Web Request | An httpRequest node that performs the web request to Bright Data's API to retrieve the content. |
| Markdown to Textual Data Extractor | An AI node that uses Google Gemini to convert markdown content into plain text. |
| Google Gemini Chat Model | A node representing the Google Gemini model used for the data extraction. |
| Topic Extractor with the structured response | An AI node that performs topic analysis and outputs the results in a structured JSON format. |
| Trends by location and category with the structured response | An AI node that analyzes and clusters emerging trends by location and category, outputting a structured JSON. |
| Initiate a Webhook Notification... | These nodes send the output of the AI analysis to a webhook. |
| Create a binary file... | Function nodes that convert the JSON output into binary format for writing to a file. |
| Write the topics/trends file to disk | readWriteFile nodes that save the binary data to a local file (d:\topics.json and d:\trends.json). |
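
The "Create a binary file..." step in the table converts a JSON result into binary data before it is written to disk. The Python sketch below shows one way such a conversion can work, assuming the binary payload is carried as a base64 string alongside a MIME type and file name. The item layout and the helper name `json_to_binary_item` are illustrative assumptions, not copied from the workflow's Function nodes.

```python
import base64
import json


def json_to_binary_item(payload: dict, file_name: str) -> dict:
    """Wrap a JSON payload as a base64-encoded binary item (assumed, n8n-like shape)."""
    raw = json.dumps(payload, indent=2).encode("utf-8")  # serialize, then encode to bytes
    return {
        "json": payload,
        "binary": {
            "data": {
                "data": base64.b64encode(raw).decode("ascii"),  # bytes -> base64 text
                "mimeType": "application/json",
                "fileName": file_name,
            }
        },
    }


# Sample topics are made up for illustration.
item = json_to_binary_item({"topics": ["pricing", "support"]}, "topics.json")
```

Decoding `item["binary"]["data"]["data"]` with `base64.b64decode` and parsing the result as JSON returns the original payload, which is what a file-writing step would persist.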

Customization Tips

Change the web URL in the Set URL and Bright Data Zone node to scrape different websites.
Modify the AI prompts in the AI nodes to customize the analysis (e.g., change the sentiment analysis criteria).
Adjust the output path in the readWriteFile nodes to save the files to a different location.

Suggested Sticky Notes for Workflow

Note: "This workflow deals with the structured data extraction by utilizing Bright Data Web Unlocker Product... Please make sure to set the web URL of your interest within the 'Set URL and Bright Data Zone' node and update the Webhook Notification URL".
LLM Usages: "Google Gemini Flash Exp model is being used... Information Extraction is being used for the handling the custom sentiment analysis with the structured response".

Required Files

1GOrjyc9mtZCMvCr_Structured_Data_Extract,Data_Mining_with_Bright_Data&_Google_Gemini.json: The main n8n workflow export for this automation.

Testing Tips

Run the workflow and check the webhook to verify that the extracted data is being sent correctly.
Confirm that the d:\topics.json and d:\trends.json files are created on your disk with the expected structured data.

Suggested Tags & Categories

Engineering
AI

Extract and Analyze Web Data with Bright Data & Google Gemini

This n8n workflow demonstrates how to extract web data using an HTTP Request node (simulating a Bright Data or similar web scraping service) and then analyze that data using Google Gemini via Langchain nodes for information extraction.

What it does

This workflow automates the following steps:

Starts Manually: The workflow is triggered manually for demonstration purposes.
Simulates Web Data Extraction: An HTTP Request node is used to fetch data from a placeholder URL (https://n8n.io). In a real-world scenario, this would be configured to interact with a web scraping service like Bright Data, providing the scraped HTML content.
Prepares Data for AI: A Function node processes the raw HTML output from the HTTP Request, extracting the data property and passing it to the next step.
Extracts Information with Google Gemini: A "Basic LLM Chain" node, configured with a "Google Gemini Chat Model", takes the extracted web content as input and passes it to the Gemini model.
Structures Extracted Information: An "Information Extractor" node (part of the Langchain integration) then processes the output from the LLM chain to extract specific, structured information from the text.
Saves Extracted Data (Placeholder): A "Read/Write Files from Disk" node is included, which could be used to save the extracted information to a file. This node is currently not connected in the provided JSON, but serves as an example of a potential next step.
Sets Output Fields: An "Edit Fields (Set)" node is present, which could be used to transform or rename the final output fields. This node is also not connected in the provided JSON.
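
The "Prepares Data for AI" step above amounts to picking one field out of each incoming item. A minimal Python sketch of that idea follows. It assumes each item wraps its fields under a `json` key, as n8n items generally do, and the helper name `extract_data_property` is hypothetical; the actual Function node in the export may differ.

```python
def extract_data_property(items: list[dict]) -> list[dict]:
    """Keep only the `data` field of each item, skipping items that have none."""
    return [
        {"json": {"data": item["json"]["data"]}}
        for item in items
        if "data" in item.get("json", {})
    ]


# Sample input: one successful response and one without a `data` field.
sample_items = [
    {"json": {"data": "<html><body>Hello</body></html>", "statusCode": 200}},
    {"json": {"statusCode": 500}},
]
prepared = extract_data_property(sample_items)
```

Dropping items without a `data` field keeps empty or failed responses from reaching the LLM step.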

Prerequisites/Requirements

n8n Instance: A running instance of n8n.
Google Gemini API Key: You will need a Google Gemini API key configured as a credential in your n8n instance for the "Google Gemini Chat Model" node.
Bright Data Account (Optional, for real-world use): If you intend to use this for actual web scraping, you would need an account with Bright Data or a similar proxy/scraping service. The HTTP Request node would then be configured to interact with that service's API.

Setup/Usage

Import the workflow: Download the provided JSON and import it into your n8n instance.
Configure Credentials:
- Set up a credential for the "Google Gemini Chat Model" node using your Google Gemini API key.
Review HTTP Request Node: The "HTTP Request" node is currently set to https://n8n.io. To use a web scraping service:
- Modify the URL to your Bright Data (or other service) API endpoint.
- Configure the HTTP method, headers, and body according to the scraping service's API documentation.
- Ensure the output format of the scraping service provides the HTML content in a way that the subsequent "Function" node can process it.
Configure Information Extractor: Adjust the "Information Extractor" node's prompt and schema to define what specific information you want to extract from the web content (e.g., product names, prices, descriptions, article titles, dates).
Execute and Verify: Once configured, execute the workflow manually to test it (the manual trigger does not require the workflow to be activated). You can inspect the output of each node to verify the data flow and extraction.
Further Actions (Optional): Connect the "Read/Write Files from Disk" or "Edit Fields (Set)" nodes to store or further process the extracted data as needed (e.g., save to a database, send to a spreadsheet, post to a messaging service).
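
When adjusting the "Information Extractor" schema, it can help to write down the target structure and check sample outputs against it. The Python sketch below uses a hypothetical schema built from the example fields mentioned above (product name, price, description). The checker covers only required keys and top-level types; it is not a full JSON Schema validator and is not part of n8n.

```python
# Hypothetical extraction schema; field names are illustrative only.
EXTRACTION_SCHEMA = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price": {"type": "number"},
        "description": {"type": "string"},
    },
    "required": ["product_name", "price"],
}

_JSON_TYPES = {"string": str, "object": dict, "array": list, "boolean": bool}


def _is_type(value: object, json_type: str) -> bool:
    """Map a JSON type name to a Python type check (bool is not a number)."""
    if json_type == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, _JSON_TYPES[json_type])


def matches_schema(record: dict, schema: dict) -> bool:
    """Check required keys and top-level property types only."""
    if any(key not in record for key in schema.get("required", [])):
        return False
    return all(
        _is_type(record[key], spec["type"])
        for key, spec in schema.get("properties", {}).items()
        if key in record
    )
```

A check like this is useful while iterating on prompts: if sample outputs stop matching, the prompt or schema has likely drifted.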
