
Templates by Ranjan Dailata

Extract & transform HackerNews data to Google Docs using Gemini 2.0 Flash

Description
This workflow automates scraping the latest discussions from HackerNews, transforming raw threads into human-readable content with Google Gemini, and exporting the final content into a well-formatted Google Doc.

Overview
This n8n workflow extracts trending posts from the HackerNews API. It loops through each item, performs HTTP data extraction, uses Google Gemini to generate human-readable insights, and exports the enriched content to Google Docs for distribution, archiving, or content creation.

Who this workflow is for
- Tech Newsletter Writers: Automate the collection and summarization of trending HackerNews posts for inclusion in weekly or daily newsletters.
- Content Creators & Bloggers: Quickly generate structured summaries and insights from HackerNews threads to use as inspiration or supporting content for blog posts, videos, or social media.
- Startup Founders & Product Builders: Monitor HackerNews for discussions relevant to your niche or competitors, and keep a pulse on the community's opinions.
- Investors & Analysts: Surface early signals from the tech ecosystem by identifying what's trending and how the community is reacting.
- Researchers & Students: Analyze popular discussions and emerging trends in technology, programming, and startups, enriched with AI-generated insights.
- Digital Agencies & Consultants: Offer HackerNews monitoring and insight reports as a value-added service to clients interested in the tech space.

Tools Used
- n8n: The core automation engine that manages the trigger, transformation, and export.
- HackerNews API: Provides access to trending or new HN posts.
- Google Gemini: Enriches HackerNews content with structured insights and human-like summaries.
- Google Docs: Automatically creates and updates a document with the enriched content, ready for sharing or publishing.

How to Install
1. Import the Workflow: Download the .json file and import it into your n8n instance.
2. Set Up HackerNews Source: Choose whether to use the HN API (via HTTP Request node) or the RSS Feed node (see the sketch at the end of this entry).
3. Configure Gemini API: Add your Google Gemini API key and design the prompt to extract pros/cons, key themes, or insights.
4. Set Up Google Docs Integration: Connect your Google account and configure the Google Docs node to create or update a document.
5. Test and Deploy: Run a test job to ensure data flows correctly and outputs are formatted as expected.

Use Cases
- Tech Newsletter Authors: Generate ready-to-use summaries of trending HackerNews threads.
- Startup Founders: Stay informed on key discussions, product launches, and community feedback.
- Investors & Analysts: Spot early trends, technical insights, and startup momentum directly from HN.
- Researchers: Track community reactions to new technologies or frameworks.
- Content Creators: Use the enriched data to spark blog posts, YouTube scripts, or LinkedIn updates.

Connect with Me
Email: ranjancse@gmail.com
LinkedIn: https://www.linkedin.com/in/ranjan-dailata/
Get Bright Data: Bright Data (supports free workflows with a small commission)

Tags: n8n, automation, hackernews, contentcuration, aiwriting, geminiapi, googlegemini, techtrends, newsletterautomation, googleworkspace, rssautomation, nocode, structureddata, webscraping, contentautomation, hninsights, aiworkflow, googleintegration, webmonitoring, hnnews, aiassistant, gdocs, automationtools, gptlike, geminiwriter
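For reference, a minimal sketch (plain Node.js, outside n8n) of the "Set Up HackerNews Source" step, using the public HackerNews Firebase API; the story limit and log format are arbitrary choices:

```javascript
// Fetch the current top stories from the official HackerNews API
// (https://hacker-news.firebaseio.com/v0). Requires Node 18+ for fetch.
const HN = 'https://hacker-news.firebaseio.com/v0';

async function fetchTopStories(limit = 10) {
  const ids = await (await fetch(`${HN}/topstories.json`)).json();
  // Resolve each story id to its full item (title, url, score, ...)
  return Promise.all(
    ids.slice(0, limit).map(async (id) => (await fetch(`${HN}/item/${id}.json`)).json())
  );
}

fetchTopStories().then((stories) => {
  for (const s of stories) {
    console.log(`${s.score} pts | ${s.title} (${s.url ?? 'self post'})`);
  }
});
```

An HTTP Request node pointed at the same two endpoints achieves the equivalent inside n8n.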

By Ranjan Dailata

Real estate intelligence tracker with Bright Data & OpenAI

Who this is for
The Real Estate Intelligence Tracker is a powerful automated workflow designed for real estate analysts, investors, proptech startups, and market researchers who need to collect and analyze structured data from real estate listings across the web at scale.

This workflow is tailored for:
- Real Estate Analysts: Tracking property prices, locations, and market trends
- Investment Firms: Sourcing high-opportunity listings for portfolio decisions
- PropTech Developers: Automating listing insights for SaaS platforms
- Market Researchers: Extracting insights from competitive housing data
- Growth Teams: Monitoring geographic property trends and pricing fluctuations

What problem is this workflow solving?
Collecting structured real estate listing data from property websites is difficult due to bot protections and unstructured HTML content. Manual data collection is slow and error-prone, and traditional scrapers often get blocked or miss context.

This workflow solves:
- Automated bypass of anti-bot protection using Bright Data Web Unlocker
- Conversion of unstructured HTML content into clean text using a Markdown-to-text LLM pipeline
- Structured extraction of key listing data (price, location, property type, features) using OpenAI
- Aggregation and delivery of insights to Google Sheets, local storage, and webhook-based alerts

What this workflow does
- Convert to Text: Transforms scraped HTML/markdown into clean text using a Basic LLM Chain
- Structured Data Extraction: Uses OpenAI GPT-4o with the Information Extractor node to parse property attributes (price, address, area, type, etc.)
- Aggregate & Merge: Combines data from multiple pages or listings into a cohesive structure
- Outbound Data Handling:
  - Google Sheets: Appends the structured real estate data for further analysis
  - Save to Disk: Persists structured JSON/text data locally
  - Webhook Notification: Sends data alerts or summaries to any third-party platform

Pre-conditions
- You need a Bright Data account and the setup described in the "Setup" section below.
- You need an OpenAI account.

Setup
1. Sign up at Bright Data.
2. Navigate to Proxies & Scraping and create a new Web Unlocker zone by selecting Web Unlocker API under Scraping Solutions.
3. In n8n, configure the Header Auth account under Credentials (Generic Auth Type: Header Authentication). The Value field should be set to Bearer XXXXXXXXXXXXXX, where XXXXXXXXXXXXXX is replaced by your Web Unlocker token (see the request sketch after this list).
4. In n8n, configure the Google Sheets credentials with your own account. Follow this documentation: Set Google Sheet Credential.
5. In n8n, configure the OpenAI account credentials.
6. Ensure the URL and Bright Data zone name are correctly set in the Set URL, Filename and Bright Data Zone node.
7. Set the desired local path in the Write a file to disk node to save the responses.
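For orientation, a minimal sketch of the kind of request the Bright Data step makes, assuming Bright Data's standard Web Unlocker /request endpoint; the zone name and target URL are placeholders:

```javascript
// Unlock a protected listing page through Bright Data Web Unlocker
// (Node 18+). The Bearer token is the same one used in the n8n
// Header Auth credential described in the Setup section.
const response = await fetch('https://api.brightdata.com/request', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <WEB_UNLOCKER_TOKEN>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    zone: 'web_unlocker1', // placeholder: your Web Unlocker zone name
    url: 'https://www.example-realestate.com/listings/seattle', // placeholder
    format: 'raw', // return the raw page content
  }),
});
const html = await response.text();
console.log(html.slice(0, 500));
```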
How to customize this workflow to your needs
- Target Multiple Sites or Locations: Update the Bright Data URL node dynamically with a list of regional real estate websites; loop through different city/state filter URLs.
- Customize Extracted Fields: Modify the Information Extractor prompt to extract fields like property size, number of bedrooms/bathrooms, days on market, nearby amenities or schools, and agent contact details (see the example shape after this list).
- Integrate with More Destinations: Add nodes to export data to Notion, Airtable, HubSpot, or your custom database; generate automated reports using PDF generators and email them.
- Data Quality and Logging: Add validation checks (e.g., missing price or address); save intermediate files (markdown, raw HTML, JSON output) to disk for audit purposes.
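As a concrete reference for the "Customize Extracted Fields" point, this is the kind of output shape one might describe to the Information Extractor node; every field name here is an illustrative assumption, not the template's actual schema:

```javascript
// Example target shape for one extracted listing (illustrative only)
const exampleListing = {
  price: 750000,
  currency: 'USD',
  address: '123 Main St, Seattle, WA',
  propertyType: 'Single-family home',
  bedrooms: 3,
  bathrooms: 2,
  areaSqFt: 1850,
  daysOnMarket: 12,
  nearbyAmenities: ['Elementary school', 'Light rail station'],
  agentContact: 'Jane Doe, (555) 010-0000',
};
```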

By Ranjan Dailata

Create structured eBooks in minutes with Google Gemini Flash 2.0 to Google Docs

Note: This workflow contains community nodes that are only compatible with the self-hosted version of n8n.

Description
This workflow automates the creation of structured eBooks by generating chapters, a table of contents, and content using Google Gemini Flash 2.0.

Overview
This n8n workflow lets users input a topic or outline, which Google Gemini Flash 2.0 processes to generate chapter titles, a structured table of contents, and detailed section-wise content. The final output is formatted and exported into a Google Document, ready for review and further publishing.

Who This Workflow Is For
- Authors & Writers: Save time by auto-generating chapter ideas, summaries, and full-length content based on a topic or outline, great for fiction and nonfiction alike.
- Content Marketers: Rapidly create downloadable eBooks, whitepapers, or lead magnets for campaigns without relying on long production cycles.
- Educators & Course Creators: Convert your syllabus, course modules, or learning outcomes into structured, well-formatted educational eBooks.
- Agencies & Freelancers: Offer AI-powered eBook creation as a value-added service to clients in need of fast, professional content.
- Entrepreneurs & Coaches: Turn your knowledge, frameworks, or training material into publish-ready books to promote your brand or monetize content.
- Technical Writers & Documentarians: Generate structured documentation or guides from outlines, simplifying the technical writing process with the help of AI.

Tools Used
- n8n: Orchestrates input handling, AI processing, formatting, and export.
- Google Gemini Flash 2.0: Generates high-quality, structured content, including chapters, summaries, and body text.
- Google Docs: Used to compile and format the full eBook in a collaborative document.
- Google Drive / Email: Optional nodes for storing or delivering the final output.

How to Install
1. Import the Workflow: Download and import the .json file into your n8n instance.
2. Configure Gemini Flash 2.0: Add your API key and set the desired creativity, length, and tone options (a minimal API sketch follows this entry).
3. Provide Input: Use a webhook or manual trigger to define the eBook topic or structure.
4. Customize Format: Modify prompts or Gemini instructions to match your eBook format, voice, or domain (e.g., fiction, business, technical).
5. Export to Google Docs: Authenticate and configure the Google Docs node to write the output chapter-wise into a new or existing document.
6. Optional Distribution: Connect to Google Drive or Gmail to store or send the final eBook.

Use Cases
- Writers & Authors: Quickly draft entire eBooks based on minimal input.
- Marketers: Generate lead magnets, guides, and product documentation at scale.
- Educators: Produce structured learning materials or course eBooks.
- Agencies: Offer eBook creation services powered by AI.
- Entrepreneurs: Turn knowledge into content assets without hiring ghostwriters.

Connect with Me
Email: ranjancse@gmail.com
LinkedIn: https://www.linkedin.com/in/ranjan-dailata/
Get Bright Data: Bright Data (supports free workflows with a small commission)

Tags: n8n, automation, ebookcreation, googleai, geminiflash, aiwriting, gdocs, contentautomation, ebookworkflow, nocode, contentmarketing, gemini, aiwriter, automatedpublishing, aicontent, bookcreation, geminiworkflow, ebookgenerator, gptalternative, flash20, geminiflash2, authorautomation, educationalcontent, aiinmarketing, n8nworkflow
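A minimal sketch of the kind of Gemini call behind the chapter/TOC generation step, assuming the public Gemini REST API; the model name, prompt, and environment variable are illustrative:

```javascript
// Ask Gemini 2.0 Flash for an eBook table of contents (Node 18+).
const apiKey = process.env.GEMINI_API_KEY; // assumed env var
const url = `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=${apiKey}`;

const res = await fetch(url, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    contents: [{
      parts: [{ text: 'Generate a 10-chapter table of contents for an eBook on workflow automation with n8n.' }],
    }],
  }),
});
const data = await res.json();
// The generated text lives in the first candidate's first part
console.log(data.candidates?.[0]?.content?.parts?.[0]?.text);
```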

By Ranjan Dailata

Google Maps business scraper & lead enricher with Bright Data & Google Gemini

Notice
Community nodes can only be installed on self-hosted instances of n8n.

Description
This workflow automates scraping local business data from Google Maps and enriching it with AI to generate lead profiles. It's designed to help sales, marketing, and outreach teams collect high-quality B2B leads from Google Maps and enrich them with contextual insights without manual data entry.

Overview
This workflow scrapes business listings from Google Maps, extracts critical information like name, category, phone, address, and website using Bright Data, and passes the results to Google Gemini to generate enriched summaries and lead insights such as a company description, potential services offered, and an engagement score (an illustrative record follows this entry). The data is then structured and stored in spreadsheets for outreach.

Tools Used
- n8n: The core automation engine to manage flow and trigger actions.
- Bright Data: Scrapes business information from Google Maps at scale with proxy rotation and CAPTCHA solving.
- Google Gemini: Enriches the raw scraped data with smart business summaries, categorization, and lead scoring.
- Google Sheets: For storing and acting upon the enriched leads.

How to Install
1. Import the Workflow: Download the .json file and import it into your n8n instance.
2. Set Up Bright Data: Insert your Bright Data credentials and configure the Google Maps scraping proxy endpoint.
3. Configure Gemini API: Add your Google Gemini API key (or use via the Make.com plugin).
4. Customize the Inputs: Choose your target location, business category, and number of results per query.
5. Choose Storage: Connect to your preferred storage, such as Google Sheets.
6. Test and Deploy: Run a test scrape and enrichment before deploying for bulk runs.

Use Cases
- Sales Teams: Auto-generate warm B2B lead lists with company summaries and relevance scores.
- Marketing Agencies: Identify local business prospects for SEO, web development, or ads services.
- Freelancers: Find high-potential clients in specific niches or cities.
- Business Consultants: Collect and categorize local businesses for competitive analysis or partnerships.
- Recruitment Firms: Identify and score potential company clients for talent acquisition.

Connect with Me
Email: ranjancse@gmail.com
LinkedIn: https://www.linkedin.com/in/ranjan-dailata/
Get Bright Data: Bright Data (supports free workflows with a small commission)

Tags: n8n, automation, leadscraping, googlemaps, brightdata, leadgen, b2bleads, salesautomation, nocode, leadprospecting, marketingautomation, googlemapsdata, geminiapi, googlegemini, aiworkflow, scrapingworkflow, businessleads, datadrivenoutreach, crm, workflowautomation, salesintelligence, b2bmarketing
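To make the "enriched summaries and lead insights" concrete, here is an illustrative shape for one enriched lead as it might land in Google Sheets; all field names and the score scale are assumptions:

```javascript
// One enriched lead record (illustrative only)
const enrichedLead = {
  name: 'Acme Dental Care',
  category: 'Dentist',
  phone: '+1 555-0134',
  address: '456 Pine Ave, Austin, TX',
  website: 'https://acmedental.example',
  aiSummary: 'Family dental practice with strong review volume; likely fit for appointment-booking software.',
  potentialServices: ['Online booking', 'Review management'],
  engagementScore: 78, // assumed 0-100 scale from the Gemini enrichment prompt
};
```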

By Ranjan Dailata

Scrape Web Data with Bright Data, Google Gemini and MCP Automated AI Agent

Disclaimer
This template is only available on n8n self-hosted, as it makes use of the community node for the MCP Client.

Who this is for?
The Scrape Web Data with Bright Data and MCP Automated AI Agent workflow is built for professionals who need to automate large-scale, intelligent data extraction by utilizing the Bright Data MCP Server and Google Gemini.

This solution is ideal for:
- Data Analysts: Who require structured, enriched datasets for analysis and reporting.
- Marketing Researchers: Seeking fresh market intelligence from dynamic web sources.
- Product Managers: Who want competitive product and feature insights from various websites.
- AI Developers: Aiming to feed web data into downstream machine learning models.
- Growth Hackers: Looking for high-quality data to fuel campaigns, research, or strategic targeting.

What problem is this workflow solving?
Manually scraping websites, cleaning raw HTML data, and generating useful insights from it can be slow, error-prone, and non-scalable. This workflow solves these problems by:
- Automating complex web data extraction through Bright Data's MCP Server.
- Reducing the human effort needed for cleaning, parsing, and analyzing unstructured web content.
- Allowing seamless integration into further automation processes.

What this workflow does
This n8n workflow performs the following steps:
- Trigger: Start manually.
- Input URL(s): Specify the URL for the web scraping.
- Web Scraping (Bright Data): Use Bright Data's MCP Server tools to scrape web data in markdown and HTML format.
- Store / Output: Save results to disk and send a webhook notification.

Setup
1. Set up n8n locally with MCP Servers by navigating to n8n-nodes-mcp.
2. Install the Bright Data MCP Server @brightdata/mcp on your local machine.
3. Sign up at Bright Data.
4. Create a Web Unlocker proxy zone called mcp_unlocker on the Bright Data control panel. Navigate to Proxies & Scraping and create a new Web Unlocker zone by selecting Web Unlocker API under Scraping Solutions.
5. In n8n, configure the Google Gemini(PaLM) Api account with the Google Gemini API key (or access through Vertex AI or a proxy).
6. In n8n, configure the credentials to connect the MCP Client (STDIO) account with the Bright Data MCP Server. Make sure to copy the Bright Data API_TOKEN into the Environments textbox as API_TOKEN=<your-token> (see the credential sketch after this entry).
7. Update the LinkedIn person and company URLs in the workflow.
8. Update the Webhook HTTP Request node with the Webhook endpoint of your choice.
9. Update the file name and path to persist on disk.

How to customize this workflow to your needs
- Different Inputs: Instead of static URLs, accept URLs dynamically via webhook or form submissions.
- Outputs: Update the Webhook endpoints to send the response to Slack channels, Airtable, Notion, CRM systems, etc.
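Roughly what the MCP Client (STDIO) credential amounts to, expressed here as a config object; the exact field labels in n8n-nodes-mcp may differ, and the env var names follow the @brightdata/mcp documentation:

```javascript
// Sketch of the MCP Client (STDIO) credential values (illustrative)
const mcpClientConfig = {
  command: 'npx',
  args: ['-y', '@brightdata/mcp'],
  env: {
    API_TOKEN: '<your-bright-data-api-token>', // the Environments textbox entry
    WEB_UNLOCKER_ZONE: 'mcp_unlocker',         // the zone created in step 4
  },
};
```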

By Ranjan Dailata

Google search engine results page extraction and summarization with Bright Data

Who this is for?
This workflow is designed for professionals and teams who need real-time, structured insights from Google Search results without manual effort.

What problem is this workflow solving?
This n8n workflow automates Google Search result extraction, cleanup, summarization, and AI-enhanced formatting for downstream use, such as sending the results to a webhook or another system.

What this workflow does
- Automates Google Search via Bright Data: Uses Bright Data's proxy-based SERP API to run a Google Search query programmatically (see the request sketch after this entry), making the process repeatable and scriptable with different search terms and regions/zones.
- Cleans and Extracts Useful Content: The Google Search Data Extractor uses LLM-based cleaning to remove HTML/CSS/JS from the response and extract pure text data, converting messy, unstructured web content into a structured, machine-readable format.
- Summarizes Search Results: Through the Gemini Flash + Summarization Chain, it generates a concise summary of the search results, ideal for users who don't have time to read full pages of results.
- Formats Data Using AI Agent: The AI Agent acts like a virtual assistant that understands search results, formats them in a readable, JSON-compatible form, and prepares them for webhook delivery.
- Delivers Results to Webhook: Sends the final summary plus structured search results to a webhook (your app, a Slack bot, Google Sheets, or a CRM).

Setup
1. Sign up at Bright Data.
2. Navigate to Proxies & Scraping and create a new Web Unlocker zone by selecting Web Unlocker API under Scraping Solutions.
3. In n8n, configure the Header Auth account under Credentials (Generic Auth Type: Header Authentication). The Value field should be set to Bearer XXXXXXXXXXXXXX, where XXXXXXXXXXXXXX is replaced by the Web Unlocker token.
4. Obtain a Google Gemini API key (or access through Vertex AI or a proxy).
5. Update the Google Search query as you wish by navigating to the Set Google Search Query node.
6. Update the Webhook HTTP Request node with the Webhook endpoint of your choice.

How to customize this workflow to your needs
1. Change the Search Input
   - Default: searches a fixed query or dataset.
   - Customize: accept input from a Google Sheet, Airtable, or a form; auto-trigger searches based on keywords or schedules.
2. Customize Summarization Style (LLM Output)
   - Default: general summary using Google Gemini or OpenAI.
   - Customize: add tone (formal, casual, technical, executive summary); focus on specific sections (pricing, competitors, FAQs); translate the summaries into multiple languages; add bullet points, pros/cons, or insight tags.
3. Choose Where the Results Go
   - Options: Email, Slack, Notion, Airtable, Google Docs, or a dashboard.
   - Auto-create content drafts for WordPress or newsletters.
   - Feed into CRM notes or attach to Salesforce leads.
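A sketch of the SERP request this workflow issues, assuming Bright Data's standard /request endpoint; the zone name and query are placeholders:

```javascript
// Run a Google Search through Bright Data (Node 18+)
const query = encodeURIComponent('best n8n automation templates');
const res = await fetch('https://api.brightdata.com/request', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <WEB_UNLOCKER_TOKEN>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    zone: 'serp_zone', // placeholder: your SERP/Web Unlocker zone name
    url: `https://www.google.com/search?q=${query}&gl=us&hl=en`,
    format: 'raw',
  }),
});
const serpHtml = await res.text(); // raw results page, ready for LLM cleanup
```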

By Ranjan Dailata

Google trend data extract & summarization with Bright Data & Google Gemini

Who this is for
The Google Trend Data Extract & Summarization workflow is ideal for trend researchers, digital marketers, content strategists, and AI developers who want to automate the extraction, summarization, and distribution of Google Trends data. This end-to-end solution transforms trend signals into human-readable insights and delivers them across multiple channels.

It is built for:
- Market Researchers: Tracking trends by topic or region
- Content Strategists: Identifying content opportunities from trending data
- SEO Analysts: Monitoring search volume and shifts in keyword popularity
- Growth Hackers: Reacting quickly to real-time search behavior
- AI & Automation Engineers: Creating automated trend monitoring systems

What problem is this workflow solving?
Google Trends data can provide rich insights into user interests, but the raw data is not always structured or easily interpretable at scale. Manually extracting, cleaning, and summarizing trends from multiple regions or categories is time-consuming.

This workflow solves the following problems:
- Automates the conversion of markdown or scraped HTML into clean textual input
- Transforms unstructured data into a structured format ready for processing
- Uses AI summarization to generate easy-to-read insights from Google Trends
- Distributes summaries via email and webhook notifications
- Persists responses to disk for archiving, auditing, or future analytics

What this workflow does
- Receives input: Sets a URL for the data extraction and analysis, then uses Bright Data's Web Unlocker to extract content from the relevant site.
- Markdown to Textual Data Extractor: Converts markdown content into plain text using n8n's Function or Markdown nodes (a minimal Code-node sketch follows this entry).
- Structured Data Extract: Parses the plain text into structured JSON suitable for AI processing.
- Summarize Google Trends: Sends structured data to Google Gemini with a summarization prompt to extract key takeaways.
- Send Summary via Gmail: Composes an email with the AI-generated summary and sends it to a designated recipient.
- Persist to Disk: Writes the AI-structured data to disk.
- Webhook Notification: Sends the summarized response to an external system (e.g., Slack, Notion, Zapier) using a webhook.

Setup
1. Sign up at Bright Data.
2. Navigate to Proxies & Scraping and create a new Web Unlocker zone by selecting Web Unlocker API under Scraping Solutions.
3. In n8n, configure the Header Auth account under Credentials (Generic Auth Type: Header Authentication). The Value field should be set to Bearer XXXXXXXXXXXXXX, where XXXXXXXXXXXXXX is replaced by the Web Unlocker token.
4. Obtain a Google Gemini API key (or access through Vertex AI or a proxy).
5. Update the Set URL and Bright Data Zone node with the brand content URL and the Bright Data zone name.
6. Update the Webhook HTTP Request node with the Webhook endpoint of your choice.

How to customize this workflow to your needs
- Update Source: Change the workflow input to read from a Google Sheet, Airtable, etc.
- Gemini Prompt Tuning: Customize prompts to extract summaries like "Summarize the most significant trend shifts" or "Generate content ideas from the trending search topics".
- Email Personalization: Configure the Gmail node to use dynamic subject lines like "Weekly Google Trends Summary – {{date}}" and to send to multiple stakeholders or mailing lists.
- File Storage Customization: Save with timestamps, e.g., trends_summary_2025-04-29.json; extend to S3 or cloud drive integrations.
- Webhook Use Cases: Send the summary to internal dashboards, Slack channels, or automation tools like Make, Zapier, etc.
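A deliberately simple Code-node sketch for the Markdown to Textual Data Extractor step; the input field name is an assumption, and a real markdown parser would be more robust than these regexes:

```javascript
// n8n Code node (JavaScript): strip common markdown syntax to plain text
const markdown = $json.data ?? ''; // assumed input field name
const text = markdown
  .replace(/```[\s\S]*?```/g, '')          // drop fenced code blocks
  .replace(/!\[.*?\]\(.*?\)/g, '')         // drop images
  .replace(/\[([^\]]*)\]\([^)]*\)/g, '$1') // keep link text only
  .replace(/^#{1,6}\s*/gm, '')             // strip heading markers
  .replace(/[*_`>]/g, '')                  // strip emphasis/quote markers
  .replace(/\n{3,}/g, '\n\n')              // collapse extra blank lines
  .trim();
return [{ json: { text } }];
```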

By Ranjan Dailata

Automated resume job matching engine with Bright Data MCP & OpenAI 4o mini

Notice
Community nodes can only be installed on self-hosted instances of n8n.

Who this is for
The Automated Resume Job Matching Engine is an intelligent workflow designed for career platforms, HR tech startups, recruiting firms, and AI developers who want to streamline job-resume matching using real-time data from LinkedIn and job boards.

This workflow is tailored for:
- HR Tech Founders: Building next-gen recruiting products
- Recruiters & Talent Sourcers: Seeking automated candidate-job fit evaluation
- Job Boards & Portals: Enriching user experience with AI-driven job recommendations
- Career Coaches & Resume Writers: Offering personalized job fit analysis
- AI Developers: Automating large-scale matching tasks using LinkedIn and job data

What problem is this workflow solving?
Manually matching a resume to a job description is time-consuming, biased, and inefficient. Additionally, accessing live job postings and candidate profiles requires overcoming web scraping limitations.

This workflow solves:
- Automated LinkedIn profile and job post data extraction using Bright Data MCP infrastructure
- Semantic matching between job requirements and the candidate resume using OpenAI GPT-4o mini
- Pagination handling for high-volume job data
- End-to-end automation from scraping to delivery via webhook, with the matched response persisted to disk

What this workflow does
- Bright Data MCP for Job Data Extraction: Uses Bright Data MCP clients to extract multiple job listings (supports pagination), pulling job data from LinkedIn with pre-defined filtering criteria.
- OpenAI GPT-4o mini Matching Engine: Extracts paginated job data and textual job descriptions from the scraped pages via the MCP scrape_as_html tool. The AI Job Matching node compares the job description with the candidate resume to generate match scores with insights (an illustrative output shape follows this entry).
- Data Delivery: Sends the final match report to a Webhook Notification endpoint and persists the AI-matched job response to disk.

Pre-conditions
- Knowledge of the Model Context Protocol (MCP) is essential. Please read this blog post: model-context-protocol.
- You need a Bright Data account and the setup described in the Setup section below.
- You need a Google Gemini API key. Visit Google AI Studio.
- You need to install the Bright Data MCP Server @brightdata/mcp.
- You need to install n8n-nodes-mcp.

Setup
1. Set up n8n locally with MCP Servers by navigating to n8n-nodes-mcp.
2. Install the Bright Data MCP Server @brightdata/mcp on your local machine.
3. Sign up at Bright Data.
4. Navigate to Proxies & Scraping and create a new Web Unlocker zone by selecting Web Unlocker API under Scraping Solutions. Create a Web Unlocker proxy zone called mcp_unlocker on the Bright Data control panel.
5. In n8n, configure the OpenAI account credentials.
6. In n8n, configure the credentials to connect the MCP Client (STDIO) account with the Bright Data MCP Server. Make sure to copy the Bright Data API_TOKEN into the Environments textbox as API_TOKEN=<your-token>.
7. Update the Set input fields for the candidate resume, keywords, and other filtering criteria.
8. Update the Webhook HTTP Request node with the Webhook endpoint of your choice.
9. Update the file name and path to persist on disk.
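To make the matching output concrete, here is an illustrative shape for what the AI Job Matching node might emit; the score scale and field names are assumptions about the prompt design, not the template's actual schema:

```javascript
// One job-match result (illustrative only)
const matchResult = {
  jobTitle: 'Senior Backend Engineer',
  company: 'Example Corp',
  jobUrl: 'https://www.linkedin.com/jobs/view/0000000000', // placeholder
  matchScore: 82,                       // assumed 0-100 overall fit
  matchedSkills: ['Python', 'AWS', 'PostgreSQL'],
  gaps: ['Kubernetes experience'],
  insight: 'Strong technical overlap; the resume lacks the container orchestration the posting emphasizes.',
};
```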
How to customize this workflow to your needs
- Target Different Job Boards: Set input fields with sites like Indeed, ZipRecruiter, or Monster.
- Customize Matching Criteria: Adjust the prompt inside the AI Job Match node; include scoring metrics like skills match %, experience relevance, or cultural fit.
- Automate Scheduling: Use a Cron node to periodically check for new jobs matching a profile; set triggers based on webhook or input form submissions.
- Output Customization: Add Markdown/PDF formatting for report summaries; extend with Google Sheets export for internal analytics.
- Enhance Data Security: Mask personal info before sending to external endpoints.

By Ranjan Dailata

Create AI-ready vector datasets for LLMs with Bright Data, Gemini & Pinecone

Who this is for?
This workflow enables automated, scalable collection of high-quality, AI-ready data from websites using Bright Data's Web Unlocker, with a focus on preparing that data for LLM training. Leveraging LLM Chains and AI agents, the system formats and extracts key information, then stores the structured embeddings in a Pinecone vector database.

This workflow is tailored for:
- ML Engineers & Researchers building or fine-tuning domain-specific LLMs.
- AI Startups needing clean, structured content for product training.
- Data Teams preparing knowledge bases for enterprise-grade AI apps.
- LLM-as-a-Service Providers sourcing dynamic web content across niches.

What problem is this workflow solving?
Training a large language model (LLM) requires vast amounts of clean, relevant, and structured data. Manual collection is slow, error-prone, and lacks scalability. This workflow:
- Automatically extracts web data from specified URLs.
- Bypasses anti-bot measures using Bright Data's Web Unlocker.
- Formats, cleans, and transforms raw content using LLM agents.
- Stores semantically searchable vectors in Pinecone.
- Makes datasets AI-ready for fine-tuning, RAG, or domain-specific training.

What this workflow does
This workflow automates the process of collecting, cleaning, and vectorizing web content to create structured, high-quality datasets ready for LLM training or retrieval-augmented generation (RAG):
- Web crawling with Bright Data Web Unlocker.
- AI information extraction and data formatting.
- AI data formatting to produce JSON-structured data.
- Persistence in the Pinecone vector DB (see the sketch after this entry).
- Webhook notification of the structured data.

Setup
1. Sign up at Bright Data.
2. Navigate to Proxies & Scraping and create a new Web Unlocker zone by selecting Web Unlocker API under Scraping Solutions.
3. In n8n, configure the Header Auth account under Credentials (Generic Auth Type: Header Authentication). The Value field should be set to Bearer XXXXXXXXXXXXXX, where XXXXXXXXXXXXXX is replaced by the Web Unlocker token.
4. Obtain a Google Gemini API key (or access through Vertex AI or a proxy).
5. Update the LinkedIn URL by navigating to the Set LinkedIn URL node.
6. Update the Set Fields - URL and Webhook URL node with the URL for web data extraction and the Webhook notification URL.

How to customize this workflow to your needs
- Set Your Target URLs: Target sites that are high-quality, domain-specific, and relevant to your LLM's purpose.
- Adjust Bright Data Web Unlocker Settings: Geo-location, headers / User-Agent strings, retry rules, and proxies.
- Modify the Information Extraction Logic: Change prompts to extract specific attributes; use structured templates or few-shot examples in prompts.
- Swap the Embedding Model: Use OpenAI, Hugging Face, or your own hosted embedding model API.
- Customize Pinecone Metadata Fields: Store extra fields in Pinecone for better filtering and semantic querying.
- Add Data Validation or Deduplication: Skip duplicates or low-quality content.
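A minimal sketch of the Pinecone persistence step, assuming the official @pinecone-database/pinecone JavaScript client; the index name, vector size, and metadata fields are placeholders:

```javascript
// Upsert one embedded document into Pinecone (Node 18+, ESM)
import { Pinecone } from '@pinecone-database/pinecone';

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const index = pc.index('llm-training-data'); // placeholder index name

// Stand-in vector; in the workflow this comes from the embedding step
const embedding = new Array(768).fill(0.01);

await index.upsert([
  {
    id: 'doc-001',
    values: embedding,
    metadata: { sourceUrl: 'https://example.com/article', topic: 'fintech' },
  },
]);
```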

By Ranjan Dailata

Search & summarize web data with Perplexity, Gemini AI & Bright Data to webhooks

Who this is for?
This workflow is designed for professionals and teams who need real-time, structured insights from Perplexity search results without manual effort.

What problem is this workflow solving?
This n8n workflow automates Perplexity search result extraction, cleanup, summarization, and AI-enhanced formatting for downstream use, such as sending the results to a webhook or another system.

What this workflow does
- Automates Perplexity Search via Bright Data: Uses Bright Data's proxy-based SERP API to run a Perplexity search query programmatically, making the process repeatable and scriptable with different search terms and regions/zones.
- Cleans and Extracts Useful Content: The Readable Data Extractor uses LLM-based cleaning to remove HTML/CSS/JS from the response and extract pure text data, converting messy, unstructured web content into a structured, machine-readable format.
- Summarizes Search Results: Through the Gemini Flash + Summarization Chain, it generates a concise summary of the search results, ideal for users who don't have time to read full pages of results.
- Formats Data Using AI Agent: The AI Agent acts like a virtual assistant that understands search results, formats them in a readable, JSON-compatible form, and prepares them for webhook delivery.
- Delivers Results to Webhook: Sends the final summary plus structured search results to a webhook (your app, a Slack bot, Google Sheets, or a CRM; a delivery sketch follows this entry).

Setup
1. Sign up at Bright Data.
2. Navigate to Proxies & Scraping and create a new Web Unlocker zone by selecting Web Unlocker API under Scraping Solutions.
3. In n8n, configure the Header Auth account under Credentials (Generic Auth Type: Header Authentication). The Value field should be set to Bearer XXXXXXXXXXXXXX, where XXXXXXXXXXXXXX is replaced by the Web Unlocker token.
4. In n8n, configure the Google Gemini(PaLM) Api account with the Google Gemini API key (or access through Vertex AI or a proxy).
5. Update the Perplexity Search Request node with the prompt you wish to search.
6. Update the Webhook HTTP Request node with the Webhook endpoint of your choice.

How to customize this workflow to your needs
1. Change the Perplexity Search Input
   - Default: searches a fixed query or dataset.
   - Customize: accept input from a Google Sheet, Airtable, or a form; auto-trigger searches based on keywords or schedules.
2. Customize Summarization Style (LLM Output)
   - Default: general summary using Google Gemini or OpenAI.
   - Customize: add tone (formal, casual, technical, executive summary); focus on specific sections (pricing, competitors, FAQs); translate the summaries into multiple languages; add bullet points, pros/cons, or insight tags.
3. Choose Where the Results Go
   - Options: Email, Slack, Notion, Airtable, Google Docs, or a dashboard.
   - Auto-create content drafts for WordPress or newsletters.
   - Feed into CRM notes or attach to Salesforce leads.
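A sketch of the final webhook delivery step; the endpoint and payload shape are placeholders for whatever your receiving system expects:

```javascript
// POST the summary plus structured results to a webhook (Node 18+)
await fetch('https://example.com/webhooks/perplexity-summary', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'latest LLM benchmark results',            // placeholder
    summary: 'Concise Gemini-generated summary text', // placeholder
    results: [{ title: '...', url: '...', snippet: '...' }],
    generatedAt: new Date().toISOString(),
  }),
});
```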

By Ranjan Dailata

Legal Case Research Extractor, Data Miner with Bright Data MCP & Google Gemini

Notice
Community nodes can only be installed on self-hosted instances of n8n.

Who this is for
The Legal Case Research Extractor is a powerful automated workflow designed for legal tech teams, researchers, law firms, and data scientists focused on transforming unstructured legal case data into actionable, structured insights.

This workflow is tailored for:
- Legal Researchers automating case law data mining
- Litigation Support Teams handling large volumes of case records
- LawTech Startups building AI-powered legal research assistants
- Compliance Analysts extracting case-specific insights
- AI Developers working on legal NLP, summarization, and search engines

What problem is this workflow solving?
Legal case data is often locked in semi-structured or raw HTML formats, scattered across jurisdiction-specific websites. Manually extracting and processing this data is tedious and inefficient.

This workflow automates:
- Extraction of legal case data via Bright Data's powerful MCP infrastructure
- Parsing of HTML into clean, readable text using the Google Gemini LLM
- Structuring and delivering the output through a webhook and file storage

What this workflow does
- Input: The Set the Legal Case Research URL node sets the legal case URL for the data extraction.
- Bright Data MCP Data Extractor: The Bright Data MCP Client For Legal Case Research node performs the legal case extraction via the Bright Data MCP tool scrape_as_html.
- Case Extractor: A Google Gemini based Case Extractor produces a paginated list of cases.
- Loop through Legal Case URLs: Receives a collection of legal case links to process; each URL represents a different case from a target legal website.
- Bright Data MCP Scraping: Utilizes Bright Data's scrape_as_html MCP mode to retrieve the raw HTML content of each legal case.
- Google Gemini LLM Extraction: Transforms raw HTML into clean, structured text and performs additional information extraction if required (e.g., case summary, court, jurisdiction, etc.).
- Webhook Notification: Sends extracted legal case content to a configurable webhook URL, enabling downstream processing or storage in legal databases.
- Binary Conversion & File Persistence: Converts the structured text to binary format and saves the final response to disk for archival or further processing (see the sketch after this entry).

Pre-conditions
- Knowledge of the Model Context Protocol (MCP) is essential. Please read this blog post: model-context-protocol.
- You need a Bright Data account and the setup described in the Setup section below.
- You need a Google Gemini API key. Visit Google AI Studio.
- You need to install the Bright Data MCP Server @brightdata/mcp.
- You need to install n8n-nodes-mcp.

Setup
1. Set up n8n locally with MCP Servers by navigating to n8n-nodes-mcp.
2. Install the Bright Data MCP Server @brightdata/mcp on your local machine.
3. Sign up at Bright Data.
4. Create a Web Unlocker proxy zone called mcp_unlocker on the Bright Data control panel. Navigate to Proxies & Scraping and create a new Web Unlocker zone by selecting Web Unlocker API under Scraping Solutions.
5. In n8n, configure the Google Gemini(PaLM) Api account with the Google Gemini API key (or access through Vertex AI or a proxy).
6. In n8n, configure the credentials to connect the MCP Client (STDIO) account with the Bright Data MCP Server. Make sure to copy the Bright Data API_TOKEN into the Environments textbox as API_TOKEN=<your-token>.

How to customize this workflow to your needs
- Target New Legal Portals: Modify the legal case input URLs to scrape from different state or federal case databases.
- Customize LLM Extraction: Modify the prompt to extract specific fields (case number, plaintiff, case summary, outcome, legal precedents, etc.); add a summarization step if needed.
- Enhance Loop Handling: Integrate with a Google Sheet or API to dynamically fetch case URLs; add error handling logic to skip failed cases and log them.
- Improve Security & Compliance: Redact sensitive information before sending via webhook; store processed case data in encrypted cloud storage.
- Output Formats: Save as PDF, JSON, or Markdown; enable output to cloud storage (S3, Google Drive) or legal document management systems.
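A hedged Code-node sketch of the binary conversion step referenced above; the input field and file name are assumptions:

```javascript
// n8n Code node (JavaScript): turn extracted text into a binary item
// that a Read/Write File node can persist to disk.
const text = $json.extractedCase ?? ''; // assumed input field name
return [{
  json: { fileName: 'legal_case.txt' },
  binary: {
    data: {
      data: Buffer.from(text, 'utf-8').toString('base64'),
      mimeType: 'text/plain',
      fileName: 'legal_case.txt',
    },
  },
}];
```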

By Ranjan Dailata

TrustPilot SaaS product review tracker with Bright Data & OpenAI

Who this is for
The TrustPilot SaaS Product Review Tracker is designed for product managers, SaaS growth teams, customer experience analysts, and marketing teams who need to extract, summarize, and analyze customer feedback at scale from TrustPilot.

This workflow is tailored for:
- Product Managers: Monitoring feedback to drive feature improvements
- Customer Support & CX Teams: Identifying sentiment trends or recurring issues
- Marketing & Growth Teams: Leveraging testimonials and market perception
- Data Analysts: Tracking competitor reviews and benchmarking
- Founders & Executives: Wanting aggregated insights into customer satisfaction

What problem is this workflow solving?
Manually monitoring, extracting, and summarizing TrustPilot reviews is time-consuming, fragmented, and hard to scale across multiple SaaS products. This workflow automates that process, from unlocking the data behind anti-bot layers to summarizing and storing customer insights, enabling teams to respond faster, spot trends, and make data-backed product decisions.

This workflow solves:
- The challenge of scraping protected review data (using Bright Data Web Unlocker)
- The need for structured insights from unstructured review content
- The lack of automated delivery to storage and alerting systems like Google Sheets or webhooks

What this workflow does
- Extract TrustPilot Reviews: Uses Bright Data Web Unlocker to bypass anti-bot protections and pull markdown-based content from product review pages
- Convert Markdown to Text: Leverages a basic LLM chain to clean and convert scraped markdown into plain text
- Structured Information Extraction: Uses OpenAI GPT-4o via the Information Extractor node to extract fields like product name, review date, rating, and reviewer sentiment
- Summarization Chain: Generates concise summaries of overall review sentiment and themes using OpenAI
- Merge & Aggregate Output: Consolidates individual extracted records into a structured batch output
- Outbound Data Delivery:
  - Google Sheets: Appends summary and structured review data
  - Write to Disk: Persists raw and processed content locally
  - Webhook Notification: Sends a real-time alert with summarized insights

Pre-conditions
- You need a Bright Data account and the setup described in the "Setup" section below.
- You need an OpenAI account.

Setup
1. Sign up at Bright Data.
2. Navigate to Proxies & Scraping and create a new Web Unlocker zone by selecting Web Unlocker API under Scraping Solutions.
3. In n8n, configure the Header Auth account under Credentials (Generic Auth Type: Header Authentication). The Value field should be set to Bearer XXXXXXXXXXXXXX, where XXXXXXXXXXXXXX is replaced by the Web Unlocker token.
4. In n8n, configure the Google Sheets credentials with your own account. Follow this documentation: Set Google Sheet Credential.
5. In n8n, configure the OpenAI account credentials.
6. Ensure the URL and Bright Data zone name are correctly set in the Set URL, Filename and Bright Data Zone node.
7. Set the desired local path in the Write a file to disk node to save the responses.
How to customize this workflow to your needs
- Target Multiple Products: Configure the Bright Data input URL dynamically for different SaaS product TrustPilot URLs; loop through a product list and run parallel jobs for each.
- Customize Extraction Fields: Update the prompt in the Information Extractor to include the review title, the company's response, specific feature mentions, or competitor references.
- Tune Summarization Style: Change the tone (executive summary, customer pain-point focus, or marketing quote extract); enable sentiment aggregation, e.g., 30% negative, 50% neutral, 20% positive (see the sketch after this list).
- Expand Output Destinations: Push to Notion, Airtable, or CRM tools using additional webhook nodes; generate and send PDF reports (via PDFKit or HTML-to-PDF nodes); schedule summary digests via Gmail or Slack.
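A small Code-node sketch for the sentiment-aggregation idea above; the sentiment field name is an assumption about what the Information Extractor produces:

```javascript
// n8n Code node (JavaScript): aggregate per-review sentiment labels
const reviews = $input.all().map((item) => item.json);
const counts = { positive: 0, neutral: 0, negative: 0 };
for (const r of reviews) {
  const s = String(r.sentiment ?? '').toLowerCase(); // assumed field
  if (s in counts) counts[s] += 1;
}
const total = reviews.length || 1; // avoid divide-by-zero
const percentages = Object.fromEntries(
  Object.entries(counts).map(([k, v]) => [k, `${Math.round((v / total) * 100)}%`])
);
return [{ json: { totalReviews: reviews.length, counts, percentages } }];
```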

By Ranjan Dailata