Yelp Business Scraper by URL via Bright Data API with Google Sheets Storage
Overview
This n8n workflow automates the process of scraping comprehensive business information from Yelp using individual business URLs. It integrates with Bright Data for professional web scraping and Google Sheets for centralized data storage, providing detailed business intelligence for market research, competitor analysis, and lead generation.
Workflow Components
1. 📥 Form Trigger
- Type: Form Trigger
- Purpose: Initiates the workflow with user-submitted Yelp business URL
- Input Fields:
- URL (Yelp business page URL)
- Function: Captures target business URL to start the scraping process
2. 🔍 Trigger Bright Data Scrape
- Type: HTTP Request (POST)
- Purpose: Sends scraping request to Bright Data API for Yelp business data
- Endpoint: `https://api.brightdata.com/datasets/v3/trigger`
- Parameters:
  - Dataset ID: `gd_lgugwl0519h1p14rwk`
  - Include errors: true
  - Limit multiple results: 5
  - Limit per input: 20
- Function: Initiates comprehensive business data extraction from Yelp
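Outside of n8n, the same trigger call can be sketched as a plain HTTP request. This is a minimal sketch, assuming the dataset parameters above are passed as query parameters and the business URL is sent as a one-element JSON array with a `url` field; verify both against your Bright Data account before relying on it.

```python
import json
import urllib.request

# Endpoint and dataset ID come from the workflow; the "url" input field
# name is an assumption to be checked against the dataset's schema.
TRIGGER_URL = "https://api.brightdata.com/datasets/v3/trigger"
DATASET_ID = "gd_lgugwl0519h1p14rwk"

def build_trigger_request(yelp_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST that starts a scrape for one URL."""
    query = (
        f"?dataset_id={DATASET_ID}"
        "&include_errors=true"
        "&limit_multiple_results=5"
        "&limit_per_input=20"
    )
    body = json.dumps([{"url": yelp_url}]).encode()
    return urllib.request.Request(
        TRIGGER_URL + query,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_trigger_request("https://www.yelp.com/biz/joes-pizza", "API_KEY")
print(req.full_url)
```

Sending the request (e.g. with `urllib.request.urlopen`) returns a JSON body containing the `snapshot_id` that the monitoring step below polls.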
3. 📡 Monitor Snapshot Status
- Type: HTTP Request (GET)
- Purpose: Monitors the progress of the Yelp scraping job
- Endpoint: `https://api.brightdata.com/datasets/v3/progress/{snapshot_id}`
- Function: Checks whether the business data scraping is complete
4. ⏳ Wait 30 Sec for Snapshot
- Type: Wait Node
- Purpose: Implements intelligent polling mechanism
- Duration: 30 seconds
- Function: Pauses workflow before rechecking scraping status to optimize API usage
5. 🔁 Retry Until Ready
- Type: IF Condition
- Purpose: Evaluates scraping completion status
- Condition: `status === "ready"`
- Logic:
- True: Proceeds to data retrieval
- False: Loops back to status monitoring with wait
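The monitor/wait/retry trio (nodes 3–5) amounts to a bounded polling loop. A minimal sketch, with the progress GET abstracted behind an injectable `check_status` callable so the logic can be shown without a live API key:

```python
import time
from typing import Callable

def wait_until_ready(
    check_status: Callable[[], str],
    interval_sec: int = 30,
    max_attempts: int = 20,
    sleep=time.sleep,
) -> bool:
    """Poll until the snapshot's status is "ready", or give up.

    `check_status` stands in for the GET to
    /datasets/v3/progress/{snapshot_id} and returns its `status` field.
    The `max_attempts` cap is an assumption; the workflow itself loops
    until completion.
    """
    for _ in range(max_attempts):
        if check_status() == "ready":
            return True
        sleep(interval_sec)  # the Wait node's 30-second pause
    return False  # treat as timeout / failure

# Example: a fake status source that becomes ready on the third check.
statuses = iter(["running", "running", "ready"])
print(wait_until_ready(lambda: next(statuses), sleep=lambda _: None))
```

Injecting `sleep` also makes the loop trivially testable; in production the default `time.sleep` reproduces the 30-second interval.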
6. 📥 Fetch Scraped Business Data
- Type: HTTP Request (GET)
- Purpose: Retrieves the final scraped business information
- Endpoint: `https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}`
- Format: JSON
- Function: Downloads completed Yelp business data with comprehensive details
7. 📊 Store to Google Sheet
- Type: Google Sheets Node
- Purpose: Appends scraped business data to a spreadsheet for analysis and sharing
- Operation: Append rows
- Target: "Yelp scraper data by URL" sheet
- Data Mapping:
- Business Name, Overall Rating, Reviews Count
- Business URL, Images/Videos URLs
- Additional business metadata fields
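The mapping from a scraped JSON record to the sheet's columns can be sketched as a small transform. The source field names (`name`, `overall_rating`, and so on) are assumptions — check them against an actual snapshot payload before relying on this:

```python
def to_sheet_row(record: dict) -> dict:
    """Map one Bright Data result record onto the sheet's columns.

    Missing fields fall back to empty strings so a partial record
    still produces a well-formed row.
    """
    return {
        "Name": record.get("name", ""),
        "Overall Rating": record.get("overall_rating", ""),
        "Reviews Count": record.get("reviews_count", ""),
        "URL": record.get("url", ""),
        "Images/Videos URLs": ", ".join(record.get("media_urls", [])),
    }

row = to_sheet_row({
    "name": "Joe's Pizza Restaurant",
    "overall_rating": 4.5,
    "reviews_count": 247,
    "url": "https://www.yelp.com/biz/joes-pizza",
    "media_urls": ["https://s3-media1.fl.yelpcdn.com/photo1.jpg"],
})
print(row["Name"])
```

In the workflow itself this mapping lives in the Google Sheets node's column configuration rather than in code.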
Workflow Flow
Form Input → Trigger Scrape → Monitor Status → Wait 30s → Check Ready
                                   ↑                          │
                                   └──── loop (not ready) ────┤
                                                              ↓ ready
                                              Fetch Data → Store to Sheet
Configuration Requirements
API Keys & Credentials
- Bright Data API Key: Required for Yelp business scraping
- Google Sheets OAuth2: For data storage and export access
- n8n Form Webhook: For user input collection
Setup Parameters
- Google Sheet ID: Target spreadsheet identifier
- Dataset ID: `gd_lgugwl0519h1p14rwk` (Yelp business scraper)
- Form Webhook ID: User input form identifier
- Google Sheets Credential ID: OAuth2 authentication
Key Features
Comprehensive Business Data Extraction
- Complete business profile information
- Customer ratings and review counts
- Contact details and business hours
- Photo and video content URLs
- Location and category information
Intelligent Status Monitoring
- Real-time scraping progress tracking
- Automatic retry mechanisms with 30-second intervals
- Status validation before data retrieval
- Error handling and timeout management
Centralized Data Storage
- Automatic Google Sheets export
- Organized business data format
- Historical scraping records
- Easy sharing and collaboration
URL-Based Processing
- Direct Yelp business URL input
- Single business deep-dive analysis
- Flexible input through web form
- Real-time workflow triggering
Use Cases
Market Research
- Competitor business analysis
- Local market intelligence gathering
- Industry benchmark establishment
- Service offering comparison
Lead Generation
- Business contact information extraction
- Potential client identification
- Market opportunity assessment
- Sales prospect development
Business Intelligence
- Customer sentiment analysis through ratings
- Competitor performance monitoring
- Market positioning research
- Brand reputation tracking
Location Analysis
- Geographic business distribution
- Local competition assessment
- Market saturation evaluation
- Expansion opportunity identification
Data Output Fields
| Field | Description | Example |
|-------|-------------|---------|
| Name | Business name | "Joe's Pizza Restaurant" |
| Overall Rating | Average customer rating | "4.5" |
| Reviews Count | Total number of reviews | "247" |
| URL | Original Yelp business URL | "https://www.yelp.com/biz/joes-pizza..." |
| Images/Videos URLs | Media content links | "https://s3-media1.fl.yelpcdn.com/..." |
Technical Notes
- Polling Interval: 30-second status checks for optimal API usage
- Result Limiting: Maximum 20 businesses per input, 5 multiple results
- Data Format: JSON with structured field mapping
- Error Handling: Comprehensive error tracking in all API requests
- Retry Logic: Automatic status rechecking until completion
- Form Input: Single URL field with validation
- Storage Format: Structured Google Sheets with predefined columns
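Validating the form's single URL field before triggering a scrape avoids wasting an API call on a bad input. A rough sketch, assuming only canonical `https://www.yelp.com/biz/<slug>` business URLs should pass:

```python
import re

# Rough check for a Yelp business-page URL. The slug character class is
# an assumption; tighten or loosen it against real URLs as needed.
YELP_BIZ_RE = re.compile(r"^https://(www\.)?yelp\.com/biz/[\w~-]+", re.IGNORECASE)

def is_valid_yelp_biz_url(url: str) -> bool:
    """Return True only for URLs that look like Yelp business pages."""
    return bool(YELP_BIZ_RE.match(url.strip()))

print(is_valid_yelp_biz_url("https://www.yelp.com/biz/joes-pizza-new-york"))
print(is_valid_yelp_biz_url("https://www.yelp.com/search?find_desc=pizza"))
```

The same pattern could be applied inside the workflow, e.g. in an IF node right after the form trigger, routing invalid submissions to an error message instead of the Bright Data call.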
Setup Instructions
Step 1: Import Workflow
- Copy the JSON workflow configuration
- Import into n8n: Workflows → Import from JSON
- Paste configuration and save
Step 2: Configure Bright Data
- Set up credentials:
  - Navigate to Credentials → Add Bright Data API
  - Enter your Bright Data API key
  - Test the connection
- Update API key references:
  - Replace `BRIGHT_DATA_API_KEY` in all HTTP request nodes
  - Verify dataset access for `gd_lgugwl0519h1p14rwk`
Step 3: Configure Google Sheets
- Create the target spreadsheet:
  - Create a new Google Sheet named "Yelp Business Data" or similar
  - Copy the Sheet ID from the URL
- Set up OAuth2 credentials:
  - Add a Google Sheets OAuth2 credential in n8n
  - Complete the authentication process
- Update workflow references:
  - Replace `YOUR_GOOGLE_SHEET_ID` with the actual Sheet ID
  - Update `YOUR_GOOGLE_SHEETS_CREDENTIAL_ID` with the credential reference
Step 4: Test and Activate
- Test with a sample URL:
  - Use a known Yelp business URL
  - Monitor the execution progress
  - Verify the data appears in the Google Sheet
- Activate the workflow:
  - Toggle the workflow to "Active"
  - Share the form URL with users
Sample Business Data
The workflow captures comprehensive business information including:
- Basic Information: Name, category, location
- Performance Metrics: Ratings, review counts, popularity
- Contact Details: Phone, website, address
- Visual Content: Photos, videos, gallery URLs
- Operational Data: Hours, services, amenities
- Customer Feedback: Review summaries, sentiment indicators
Advanced Configuration
Batch Processing
Modify the input to accept multiple URLs:
```json
[
  {"url": "https://www.yelp.com/biz/business-1"},
  {"url": "https://www.yelp.com/biz/business-2"},
  {"url": "https://www.yelp.com/biz/business-3"}
]
```
Enhanced Data Fields
Add more extraction fields by updating the dataset configuration:
- Business hours and schedule
- Menu items and pricing
- Customer photos and reviews
- Special offers and promotions
Notification Integration
Add alert mechanisms:
- Email notifications for completed scrapes
- Slack messages for team updates
- Webhook triggers for external systems
Error Handling
Common Issues
- Invalid URL: Ensure URL is a valid Yelp business page
- Rate Limiting: Bright Data API usage limits exceeded
- Authentication: Google Sheets or Bright Data credential issues
- Data Format: Unexpected response structure from Yelp
Troubleshooting Steps
- Verify URLs: Ensure Yelp business URLs are correctly formatted
- Check Credentials: Validate all API keys and OAuth tokens
- Monitor Logs: Review n8n execution logs for detailed errors
- Test Connectivity: Verify network access to all external services
Performance Specifications
- Processing Time: 2-5 minutes per business URL
- Data Accuracy: 95%+ for publicly available business information
- Success Rate: 90%+ for valid Yelp business URLs
- Concurrent Processing: Depends on Bright Data plan limits
- Storage Capacity: Unlimited (Google Sheets based)
For any questions or support, please contact info@incrementors.com or fill out this form: https://www.incrementors.com/contact-us/
n8n Yelp Business Scraper by URL with Bright Data API and Google Sheets
This n8n workflow automates the process of scraping Yelp business data using a provided URL, leveraging the Bright Data API for web scraping, and then storing the extracted information into a Google Sheet. It's designed to streamline the collection of business details from Yelp for analysis, lead generation, or other data-driven tasks.
What it does
This workflow simplifies the following steps:
- Triggers on Form Submission: The workflow starts when a new submission is made to an n8n form. This form is expected to contain the Yelp business URL(s) to be scraped.
- Scrapes Yelp Business Data: It sends an HTTP request to the Bright Data API, passing the Yelp business URL from the form submission. Bright Data then handles the actual scraping of the Yelp page, extracting relevant business information.
- Appends Data to Google Sheet: The scraped data received from Bright Data is then appended as new rows to a specified Google Sheet, organizing the collected information for easy access and further processing.
Prerequisites/Requirements
To use this workflow, you will need:
- n8n Instance: A running n8n instance (cloud or self-hosted).
- Google Account: A Google account with access to Google Sheets. You will need to set up Google Sheets credentials in n8n.
- Bright Data Account: An active Bright Data account with API access. You will need to configure HTTP Request credentials in n8n for Bright Data.
Setup/Usage
- Import the Workflow:
- Download the provided JSON file for this workflow.
- In your n8n instance, click on "Workflows" in the left sidebar.
- Click "New" -> "Import from JSON" and paste the workflow JSON or upload the file.
- Configure Credentials:
- Google Sheets: Locate the "Google Sheets" node. You will need to create or select an existing Google Sheets OAuth2 credential. Follow the n8n documentation for setting up Google Sheets credentials if you haven't already.
- Bright Data API: Locate the "HTTP Request" node. You will need to create or select an existing HTTP Basic Auth or API Key credential for Bright Data. Configure it with your Bright Data API key and secret.
- Configure the n8n Form Trigger:
- Open the "On form submission" node.
- Customize the form fields if necessary to ensure it collects the Yelp business URL(s) you intend to scrape. The default setup expects a field that provides the URL.
- Activate the workflow by toggling the "Active" switch in the top right corner.
- Test the Workflow:
- Access the public URL of the n8n form trigger (you can find this in the "On form submission" node settings).
- Submit a Yelp business URL through the form.
- Observe the workflow execution in n8n and check your Google Sheet for the newly added data.
- Specify Google Sheet:
- In the "Google Sheets" node, ensure you specify the correct "Spreadsheet ID" and "Sheet Name" where you want the scraped data to be appended.