Yelp Business Scraper by URL via Bright Data API with Google Sheets Storage
Overview
This n8n workflow automates the process of scraping comprehensive business information from Yelp using individual business URLs. It integrates with Bright Data for professional web scraping and Google Sheets for centralized data storage, providing detailed business intelligence for market research, competitor analysis, and lead generation.
Workflow Components
1. 📥 Form Trigger
- Type: Form Trigger
- Purpose: Initiates the workflow with user-submitted Yelp business URL
- Input Fields:
- URL (Yelp business page URL)
- Function: Captures target business URL to start the scraping process
2. 🔍 Trigger Bright Data Scrape
- Type: HTTP Request (POST)
- Purpose: Sends scraping request to Bright Data API for Yelp business data
- Endpoint: `https://api.brightdata.com/datasets/v3/trigger`
- Parameters:
  - Dataset ID: `gd_lgugwl0519h1p14rwk`
  - Include errors: true
  - Limit multiple results: 5
  - Limit per input: 20
- Function: Initiates comprehensive business data extraction from Yelp
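Outside of n8n, the same trigger call can be sketched as a plain HTTP request. This is a minimal sketch, assuming the dataset parameters above are passed as query parameters and the business URL is sent as a one-element JSON array with a `url` field; verify both against your Bright Data account before relying on it.

```python
import json
import urllib.request

# Endpoint and dataset ID come from the workflow; the "url" input field
# name is an assumption to be checked against the dataset's schema.
TRIGGER_URL = "https://api.brightdata.com/datasets/v3/trigger"
DATASET_ID = "gd_lgugwl0519h1p14rwk"

def build_trigger_request(yelp_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST that starts a scrape for one URL."""
    query = (
        f"?dataset_id={DATASET_ID}"
        "&include_errors=true"
        "&limit_multiple_results=5"
        "&limit_per_input=20"
    )
    body = json.dumps([{"url": yelp_url}]).encode()
    return urllib.request.Request(
        TRIGGER_URL + query,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_trigger_request("https://www.yelp.com/biz/joes-pizza", "API_KEY")
print(req.full_url)
```

Sending the request (e.g. with `urllib.request.urlopen`) returns a JSON body containing the `snapshot_id` that the monitoring step below polls.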
3. 📡 Monitor Snapshot Status
- Type: HTTP Request (GET)
- Purpose: Monitors the progress of the Yelp scraping job
- Endpoint: `https://api.brightdata.com/datasets/v3/progress/{snapshot_id}`
- Function: Checks whether the business data scraping is complete
4. ⏳ Wait 30 Sec for Snapshot
- Type: Wait Node
- Purpose: Implements intelligent polling mechanism
- Duration: 30 seconds
- Function: Pauses workflow before rechecking scraping status to optimize API usage
5. 🔁 Retry Until Ready
- Type: IF Condition
- Purpose: Evaluates scraping completion status
- Condition: `status === "ready"`
- Logic:
- True: Proceeds to data retrieval
- False: Loops back to status monitoring with wait
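The monitor/wait/retry trio (nodes 3–5) amounts to a bounded polling loop. A minimal sketch, with the progress GET abstracted behind an injectable `check_status` callable so the logic can be shown without a live API key:

```python
import time
from typing import Callable

def wait_until_ready(
    check_status: Callable[[], str],
    interval_sec: int = 30,
    max_attempts: int = 20,
    sleep=time.sleep,
) -> bool:
    """Poll until the snapshot's status is "ready", or give up.

    `check_status` stands in for the GET to
    /datasets/v3/progress/{snapshot_id} and returns its `status` field.
    The `max_attempts` cap is an assumption; the workflow itself loops
    until completion.
    """
    for _ in range(max_attempts):
        if check_status() == "ready":
            return True
        sleep(interval_sec)  # the Wait node's 30-second pause
    return False  # treat as timeout / failure

# Example: a fake status source that becomes ready on the third check.
statuses = iter(["running", "running", "ready"])
print(wait_until_ready(lambda: next(statuses), sleep=lambda _: None))
```

Injecting `sleep` also makes the loop trivially testable; in production the default `time.sleep` reproduces the 30-second interval.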
6. 📥 Fetch Scraped Business Data
- Type: HTTP Request (GET)
- Purpose: Retrieves the final scraped business information
- Endpoint: `https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}`
- Format: JSON
- Function: Downloads completed Yelp business data with comprehensive details
7. 📊 Store to Google Sheet
- Type: Google Sheets Node
- Purpose: Appends scraped business data to a spreadsheet for analysis and sharing
- Operation: Append rows
- Target: "Yelp scraper data by URL" sheet
- Data Mapping:
- Business Name, Overall Rating, Reviews Count
- Business URL, Images/Videos URLs
- Additional business metadata fields
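The mapping from a scraped JSON record to the sheet's columns can be sketched as a small transform. The source field names (`name`, `overall_rating`, and so on) are assumptions — check them against an actual snapshot payload before relying on this:

```python
def to_sheet_row(record: dict) -> dict:
    """Map one Bright Data result record onto the sheet's columns.

    Missing fields fall back to empty strings so a partial record
    still produces a well-formed row.
    """
    return {
        "Name": record.get("name", ""),
        "Overall Rating": record.get("overall_rating", ""),
        "Reviews Count": record.get("reviews_count", ""),
        "URL": record.get("url", ""),
        "Images/Videos URLs": ", ".join(record.get("media_urls", [])),
    }

row = to_sheet_row({
    "name": "Joe's Pizza Restaurant",
    "overall_rating": 4.5,
    "reviews_count": 247,
    "url": "https://www.yelp.com/biz/joes-pizza",
    "media_urls": ["https://s3-media1.fl.yelpcdn.com/photo1.jpg"],
})
print(row["Name"])
```

In the workflow itself this mapping lives in the Google Sheets node's column configuration rather than in code.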
Workflow Flow
Form Input → Trigger Scrape → Monitor Status → Wait 30s → Check Ready
                                   ↑                          │
                                   └──── loop (not ready) ────┤
                                                              ↓ ready
                                              Fetch Data → Store to Sheet
Configuration Requirements
API Keys & Credentials
- Bright Data API Key: Required for Yelp business scraping
- Google Sheets OAuth2: For data storage and export access
- n8n Form Webhook: For user input collection
Setup Parameters
- Google Sheet ID: Target spreadsheet identifier
- Dataset ID: `gd_lgugwl0519h1p14rwk` (Yelp business scraper)
- Form Webhook ID: User input form identifier
- Google Sheets Credential ID: OAuth2 authentication
Key Features
Comprehensive Business Data Extraction
- Complete business profile information
- Customer ratings and review counts
- Contact details and business hours
- Photo and video content URLs
- Location and category information
Intelligent Status Monitoring
- Real-time scraping progress tracking
- Automatic retry mechanisms with 30-second intervals
- Status validation before data retrieval
- Error handling and timeout management
Centralized Data Storage
- Automatic Google Sheets export
- Organized business data format
- Historical scraping records
- Easy sharing and collaboration
URL-Based Processing
- Direct Yelp business URL input
- Single business deep-dive analysis
- Flexible input through web form
- Real-time workflow triggering
Use Cases
Market Research
- Competitor business analysis
- Local market intelligence gathering
- Industry benchmark establishment
- Service offering comparison
Lead Generation
- Business contact information extraction
- Potential client identification
- Market opportunity assessment
- Sales prospect development
Business Intelligence
- Customer sentiment analysis through ratings
- Competitor performance monitoring
- Market positioning research
- Brand reputation tracking
Location Analysis
- Geographic business distribution
- Local competition assessment
- Market saturation evaluation
- Expansion opportunity identification
Data Output Fields
| Field | Description | Example |
|-------|-------------|---------|
| Name | Business name | "Joe's Pizza Restaurant" |
| Overall Rating | Average customer rating | "4.5" |
| Reviews Count | Total number of reviews | "247" |
| URL | Original Yelp business URL | "https://www.yelp.com/biz/joes-pizza..." |
| Images/Videos URLs | Media content links | "https://s3-media1.fl.yelpcdn.com/..." |
Technical Notes
- Polling Interval: 30-second status checks for optimal API usage
- Result Limiting: Maximum 20 businesses per input, 5 multiple results
- Data Format: JSON with structured field mapping
- Error Handling: Comprehensive error tracking in all API requests
- Retry Logic: Automatic status rechecking until completion
- Form Input: Single URL field with validation
- Storage Format: Structured Google Sheets with predefined columns
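Validating the form's single URL field before triggering a scrape avoids wasting an API call on a bad input. A rough sketch, assuming only canonical `https://www.yelp.com/biz/<slug>` business URLs should pass:

```python
import re

# Rough check for a Yelp business-page URL. The slug character class is
# an assumption; tighten or loosen it against real URLs as needed.
YELP_BIZ_RE = re.compile(r"^https://(www\.)?yelp\.com/biz/[\w~-]+", re.IGNORECASE)

def is_valid_yelp_biz_url(url: str) -> bool:
    """Return True only for URLs that look like Yelp business pages."""
    return bool(YELP_BIZ_RE.match(url.strip()))

print(is_valid_yelp_biz_url("https://www.yelp.com/biz/joes-pizza-new-york"))
print(is_valid_yelp_biz_url("https://www.yelp.com/search?find_desc=pizza"))
```

The same pattern could be applied inside the workflow, e.g. in an IF node right after the form trigger, routing invalid submissions to an error message instead of the Bright Data call.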
Setup Instructions
Step 1: Import Workflow
- Copy the JSON workflow configuration
- Import into n8n: Workflows → Import from JSON
- Paste configuration and save
Step 2: Configure Bright Data
- Set up credentials:
  - Navigate to Credentials → Add Bright Data API
  - Enter your Bright Data API key
  - Test the connection
- Update API key references:
  - Replace `BRIGHT_DATA_API_KEY` in all HTTP request nodes
  - Verify dataset access for `gd_lgugwl0519h1p14rwk`
Step 3: Configure Google Sheets
- Create the target spreadsheet:
  - Create a new Google Sheet named "Yelp Business Data" or similar
  - Copy the Sheet ID from the URL
- Set up OAuth2 credentials:
  - Add a Google Sheets OAuth2 credential in n8n
  - Complete the authentication process
- Update workflow references:
  - Replace `YOUR_GOOGLE_SHEET_ID` with the actual Sheet ID
  - Update `YOUR_GOOGLE_SHEETS_CREDENTIAL_ID` with the credential reference
Step 4: Test and Activate
- Test with a sample URL:
  - Use a known Yelp business URL
  - Monitor the execution progress
  - Verify the data appears in the Google Sheet
- Activate the workflow:
  - Toggle the workflow to "Active"
  - Share the form URL with users
Sample Business Data
The workflow captures comprehensive business information including:
- Basic Information: Name, category, location
- Performance Metrics: Ratings, review counts, popularity
- Contact Details: Phone, website, address
- Visual Content: Photos, videos, gallery URLs
- Operational Data: Hours, services, amenities
- Customer Feedback: Review summaries, sentiment indicators
Advanced Configuration
Batch Processing
Modify the input to accept multiple URLs:
```json
[
  {"url": "https://www.yelp.com/biz/business-1"},
  {"url": "https://www.yelp.com/biz/business-2"},
  {"url": "https://www.yelp.com/biz/business-3"}
]
```
Enhanced Data Fields
Add more extraction fields by updating the dataset configuration:
- Business hours and schedule
- Menu items and pricing
- Customer photos and reviews
- Special offers and promotions
Notification Integration
Add alert mechanisms:
- Email notifications for completed scrapes
- Slack messages for team updates
- Webhook triggers for external systems
Error Handling
Common Issues
- Invalid URL: Ensure URL is a valid Yelp business page
- Rate Limiting: Bright Data API usage limits exceeded
- Authentication: Google Sheets or Bright Data credential issues
- Data Format: Unexpected response structure from Yelp
Troubleshooting Steps
- Verify URLs: Ensure Yelp business URLs are correctly formatted
- Check Credentials: Validate all API keys and OAuth tokens
- Monitor Logs: Review n8n execution logs for detailed errors
- Test Connectivity: Verify network access to all external services
Performance Specifications
- Processing Time: 2-5 minutes per business URL
- Data Accuracy: 95%+ for publicly available business information
- Success Rate: 90%+ for valid Yelp business URLs
- Concurrent Processing: Depends on Bright Data plan limits
- Storage Capacity: Unlimited (Google Sheets based)
For any questions or support, please contact info@incrementors.com or fill out this form: https://www.incrementors.com/contact-us/
n8n Yelp Business Scraper by URL with Bright Data API and Google Sheets
This n8n workflow automates the process of scraping Yelp business data using a provided URL, leveraging the Bright Data API for web scraping, and then storing the extracted information into a Google Sheet. It's designed to streamline the collection of business details from Yelp for analysis, lead generation, or other data-driven tasks.
What it does
This workflow simplifies the following steps:
- Triggers on Form Submission: The workflow starts when a new submission is made to an n8n form. This form is expected to contain the Yelp business URL(s) to be scraped.
- Scrapes Yelp Business Data: It sends an HTTP request to the Bright Data API, passing the Yelp business URL from the form submission. Bright Data then handles the actual scraping of the Yelp page, extracting relevant business information.
- Appends Data to Google Sheet: The scraped data received from Bright Data is then appended as new rows to a specified Google Sheet, organizing the collected information for easy access and further processing.
Prerequisites/Requirements
To use this workflow, you will need:
- n8n Instance: A running n8n instance (cloud or self-hosted).
- Google Account: A Google account with access to Google Sheets. You will need to set up Google Sheets credentials in n8n.
- Bright Data Account: An active Bright Data account with API access. You will need to configure HTTP Request credentials in n8n for Bright Data.
Setup/Usage
- Import the Workflow:
- Download the provided JSON file for this workflow.
- In your n8n instance, click on "Workflows" in the left sidebar.
- Click "New" -> "Import from JSON" and paste the workflow JSON or upload the file.
- Configure Credentials:
- Google Sheets: Locate the "Google Sheets" node. You will need to create or select an existing Google Sheets OAuth2 credential. Follow the n8n documentation for setting up Google Sheets credentials if you haven't already.
- Bright Data API: Locate the "HTTP Request" node. You will need to create or select an existing HTTP Basic Auth or API Key credential for Bright Data. Configure it with your Bright Data API key and secret.
- Configure the n8n Form Trigger:
- Open the "On form submission" node.
- Customize the form fields if necessary to ensure it collects the Yelp business URL(s) you intend to scrape. The default setup expects a field that provides the URL.
- Activate the workflow by toggling the "Active" switch in the top right corner.
- Test the Workflow:
- Access the public URL of the n8n form trigger (you can find this in the "On form submission" node settings).
- Submit a Yelp business URL through the form.
- Observe the workflow execution in n8n and check your Google Sheet for the newly added data.
- Specify Google Sheet:
- In the "Google Sheets" node, ensure you specify the correct "Spreadsheet ID" and "Sheet Name" where you want the scraped data to be appended.