🌐 Firecrawl website content extractor

677 views

2/3/2026

Google Sheets Data Transformation Data Sync HTTP Request API

🌐 Firecrawl Website Content Extractor (n8n Workflow)

This n8n automation workflow uses Firecrawl API to extract structured data (e.g., quotes and authors) from web pages — such as Quotes to Scrape — and handles retries in case of delayed extraction.

🔁 Workflow Overview

🎯 Purpose:

Crawl and extract structured web data using Firecrawl
Wait for asynchronous scraping to complete
Retrieve and validate results
Support retries if content is not ready

🔧 Step-by-Step Node Breakdown

1. 🧪 Manual Trigger

Node: When clicking ‘Test workflow’
Used to manually test or execute the workflow during setup or debugging.

2. 📤 Firecrawl Extract API Request

Node: Extract
Sends a POST request to https://api.firecrawl.dev/v1/extract
Payload includes:
- urls: List of pages to crawl (https://quotes.toscrape.com/*)
- prompt: "Extract all quotes and their corresponding authors from the website."
- schema: JSON schema defining expected structure (quotes[], each with text and author)

> 📌 Uses an HTTP Header Auth credential for Firecrawl API

3. ⏱️ Wait for 30 Seconds

Node: 30 Secs
Gives Firecrawl time to finish processing in the background
Prevents hitting the API before results are ready

4. 📥 Get Results

Node: Get Results
Performs a GET request to the status URL using {{ $('Extract').item.json.id }} to retrieve extraction results.

5. ✅❌ Condition Check

Node: If
Checks if the data array is empty (i.e., no results yet)
If data is empty:
- Waits 10 more seconds and retries
If data is available:
- Passes data to the next step (e.g., processing or storage)

6. 🔁 Retry Delay

Node: 10 Seconds
Waits briefly before sending another GET request to Firecrawl

7. 🛠️ Edit Fields (Optional Output Formatting)

Node: Edit Fields
Placeholder to structure or format the extracted results (quotes and authors)

🧾 Sticky Note: Firecrawl Setup Guide

Included as an embedded reference:

🔗 10% Firecrawl Discount
🧰 Instructions to:
- Add Firecrawl API credentials in n8n
- Use Firecrawl Community Node for self-hosted instances
- Set up the schema and prompt for targeted data extraction

✅ Key Features

🔌 API-based crawling with schema-structured output
⏱️ Smart waiting + retry mechanism
🧠 AI prompt integration for intelligent data parsing
⚙️ Flexible for different URLs, prompts, and schemas

📦 Sample Output Schema

{
  "quotes": [
    {
      "text": "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.",
      "author": "Albert Einstein"
    },
    {
      "text": "It is our choices, Harry, that show what we truly are, far more than our abilities.",
      "author": "J.K. Rowling"
    }
  ]
}

n8n Firecrawl Website Content Extractor

This n8n workflow provides a basic structure for interacting with an external API, potentially for extracting website content, and then conditionally processing the response. While the workflow itself is a starting point, it demonstrates how to make an HTTP request and use an If node for conditional logic.

What it does

This workflow, when manually triggered, performs the following steps:

Starts Manually: The workflow is initiated by a manual trigger.
Makes an HTTP Request: It executes an HTTP request to an external API. The specifics of this request (URL, method, headers, body) are not defined in the provided JSON, but this is where the interaction with the Firecrawl API (or any other web content extraction service) would be configured.
Applies Conditional Logic: It then passes the response from the HTTP request to an If node. This node is designed to evaluate a condition and route the workflow based on whether the condition is true or false. The specific condition is not defined in the JSON.
Sets Fields (Edit Fields): If the condition in the If node evaluates to true, the workflow proceeds to an Edit Fields (Set) node. This node can be used to add, remove, or modify fields in the data, preparing it for subsequent steps.
Waits (Wait): If the condition in the If node evaluates to false, the workflow proceeds to a Wait node, which pauses the execution for a configurable duration.

Prerequisites/Requirements

n8n Instance: An active n8n instance to import and run the workflow.
API Endpoint: The URL and any necessary authentication/parameters for the external API you intend to call (e.g., Firecrawl API).

Setup/Usage

Import the Workflow: Import the provided JSON into your n8n instance.
Configure HTTP Request:
- Locate the "HTTP Request" node.
- Configure the URL, Method (e.g., GET, POST), Headers (e.g., API Key), and Body (if applicable) to interact with your desired API (e.g., Firecrawl).
Configure If Node:
- Locate the "If" node.
- Define the condition(s) you want to evaluate based on the output of the "HTTP Request" node. For example, you might check for a specific status code, a value in the response body, or the presence of certain data.
Configure Edit Fields (Set) Node:
- If the "If" node's true branch is used, configure the "Edit Fields (Set)" node to transform the data as needed.
Configure Wait Node:
- If the "If" node's false branch is used, configure the "Wait" node with the desired delay duration.
Execute the Workflow: Click the "Execute Workflow" button in the n8n editor to run the workflow manually. You can also activate the workflow to run on a schedule or via a webhook if you replace the "Manual Trigger" with a suitable trigger node.

Related Templates

AI-powered code review with linting, red-marked corrections in Google Sheets & Slack

Advanced Code Review Automation (AI + Lint + Slack) Who’s it for For software engineers, QA teams, and tech leads who want to automate intelligent code reviews with both AI-driven suggestions and rule-based linting — all managed in Google Sheets with instant Slack summaries. How it works This workflow performs a two-layer review system: Lint Check: Runs a lightweight static analysis to find common issues (e.g., use of var, console.log, unbalanced braces). AI Review: Sends valid code to Gemini AI, which provides human-like review feedback with severity classification (Critical, Major, Minor) and visual highlights (red/orange tags). Formatter: Combines lint and AI results, calculating an overall score (0–10). Aggregator: Summarizes results for quick comparison. Google Sheets Writer: Appends results to your review log. Slack Notification: Posts a concise summary (e.g., number of issues and average score) to your team’s channel. How to set up Connect Google Sheets and Slack credentials in n8n. Replace placeholders (<YOURSPREADSHEETID>, <YOURSHEETGIDORNAME>, <YOURSLACKCHANNEL_ID>). Adjust the AI review prompt or lint rules as needed. Activate the workflow — reviews will start automatically whenever new code is added to the sheet. Requirements Google Sheets and Slack integrations enabled A configured AI node (Gemini, OpenAI, or compatible) Proper permissions to write to your target Google Sheet How to customize Add more linting rules (naming conventions, spacing, forbidden APIs) Extend the AI prompt for project-specific guidelines Customize the Slack message formatting Export analytics to a dashboard (e.g., Notion or Data Studio) Why it’s valuable This workflow brings realistic, team-oriented AI-assisted code review to n8n — combining the speed of automated linting with the nuance of human-style feedback. It saves time, improves code quality, and keeps your team’s review history transparent and centralized.

By higashiyama

Automate RSS to social media pipeline with AI, Airtable & GetLate for multiple platforms

Overview Automates your complete social media content pipeline: sources articles from Wallabag RSS, generates platform-specific posts with AI, creates contextual images, and publishes via GetLate API. Built with 63 nodes across two workflows to handle LinkedIn, Instagram, and Bluesky—with easy expansion to more platforms. Ideal for: Content marketers, solo creators, agencies, and community managers maintaining a consistent multi-platform presence with minimal manual effort. How It Works Two-Workflow Architecture: Content Aggregation Workflow Monitors Wallabag RSS feeds for tagged articles (to-share-linkedin, to-share-instagram, etc.) Extracts and converts content from HTML to Markdown Stores structured data in Airtable with platform assignment AI Generation & Publishing Workflow Scheduled trigger queries Airtable for unpublished content Routes to platform-specific sub-workflows (LinkedIn, Instagram, Bluesky) LLM generates optimized post text and image prompts based on custom brand parameters Optionally generates AI images and hosts them on Imgbb CDN Publishes via GetLate API (immediate or draft mode) Updates Airtable with publication status and metadata Key Features: Tag-based content routing using Wallabag's native system Swappable AI providers (Groq, OpenAI, Anthropic) Platform-specific optimization (tone, length, hashtags, CTAs) Modular design—duplicate sub-workflows to add new platforms in \~30 minutes Centralized Airtable tracking with 17 data points per post Set Up Steps Setup time: \~45-60 minutes for initial configuration Create accounts and get API keys (\~15 min) Wallabag (with RSS feeds enabled) GetLate (social media publishing) Airtable (create base with provided schema—see sticky notes) LLM provider (Groq, OpenAI, or Anthropic) Image service (Hugging Face, Fal.ai, or Stability AI) Imgbb (image hosting) Configure n8n credentials (\~10 min) Add all API keys in n8n's credential manager Detailed credential setup instructions in workflow sticky notes Set up Airtable database (\~10 min) Create "RSS Feed - Content Store" base Add 19 required fields (schema provided in workflow sticky notes) Get Airtable base ID and API key Customize brand prompts (\~15 min) Edit "Set Custom SMCG Prompt" node for each platform Define brand voice, tone, goals, audience, and image preferences Platform-specific examples provided in sticky notes Configure platform settings (\~10 min) Set GetLate account IDs for each platform Enable/disable image generation per platform Choose immediate publish vs. draft mode Adjust schedule trigger frequency Test and deploy Tag test articles in Wallabag Monitor the first few executions in draft mode Activate workflows when satisfied with the output Important: This is a proof-of-concept template. Test thoroughly with draft mode before production use. Detailed setup instructions, troubleshooting tips, and customization guidance are in the workflow's sticky notes. Technical Details 63 nodes: 9 Airtable operations, 8 HTTP requests, 7 code nodes, 3 LangChain LLM chains, 3 RSS triggers, 3 GetLate publishers Supports: Multiple LLM providers, multiple image generation services, unlimited platforms via modular architecture Tracking: 17 metadata fields per post, including publish status, applied parameters, character counts, hashtags, image URLs Prerequisites n8n instance (self-hosted or cloud) Accounts: Wallabag, GetLate, Airtable, LLM provider, image generation service, Imgbb Basic understanding of n8n workflows and credential configuration Time to customize prompts for your brand voice Detailed documentation, Airtable schema, prompt examples, and troubleshooting guides are in the workflow's sticky notes. Category Tags social-media-automation, ai-content-generation, rss-to-social, multi-platform-posting, getlate-api, airtable-database, langchain, workflow-automation, content-marketing

By Mikal Hayden-Gates

188

Ai website scraper & company intelligence

AI Website Scraper & Company Intelligence Description This workflow automates the process of transforming any website URL into a structured, intelligent company profile. It's triggered by a form, allowing a user to submit a website and choose between a "basic" or "deep" scrape. The workflow extracts key information (mission, services, contacts, SEO keywords), stores it in a structured Supabase database, and archives a full JSON backup to Google Drive. It also features a secondary AI agent that automatically finds and saves competitors for each company, building a rich, interconnected database of company intelligence. --- Quick Implementation Steps Import the Workflow: Import the provided JSON file into your n8n instance. Install Custom Community Node: You must install the community node from: https://www.npmjs.com/package/n8n-nodes-crawl-and-scrape FIRECRAWL N8N Documentation https://docs.firecrawl.dev/developer-guides/workflow-automation/n8n Install Additional Nodes: n8n-nodes-crawl-and-scrape and n8n-nodes-mcp fire crawl mcp . Set up Credentials: Create credentials in n8n for FIRE CRAWL API,Supabase, Mistral AI, and Google Drive. Configure API Key (CRITICAL): Open the Web Search tool node. Go to Parameters → Headers and replace the hardcoded Tavily AI API key with your own. Configure Supabase Nodes: Assign your Supabase credential to all Supabase nodes. Ensure table names (e.g., companies, competitors) match your schema. Configure Google Drive Nodes: Assign your Google Drive credential to the Google Drive2 and save to Google Drive1 nodes. Select the correct Folder ID. Activate Workflow: Turn on the workflow and open the Webhook URL in the “On form submission” node to access the form. --- What It Does Form Trigger Captures user input: “Website URL” and “Scraping Type” (basic or deep). Scraping Router A Switch node routes the flow: Deep Scraping → AI-based MCP Firecrawler agent. Basic Scraping → Crawlee node. Deep Scraping (Firecrawl AI Agent) Uses Firecrawl and Tavily Web Search. Extracts a detailed JSON profile: mission, services, contacts, SEO keywords, etc. Basic Scraping (Crawlee) Uses Crawl and Scrape node to collect raw text. A Mistral-based AI extractor structures the data into JSON. Data Storage Stores structured data in Supabase tables (companies, company_basicprofiles). Archives a full JSON backup to Google Drive. Automated Competitor Analysis Runs after a deep scrape. Uses Tavily web search to find competitors (e.g., from Crunchbase). Saves competitor data to Supabase, linked by company_id. --- Who's It For Sales & Marketing Teams: Enrich leads with deep company info. Market Researchers: Build structured, searchable company databases. B2B Data Providers: Automate company intelligence collection. Developers: Use as a base for RAG or enrichment pipelines. --- Requirements n8n instance (self-hosted or cloud) Supabase Account: With tables like companies, competitors, social_links, etc. Mistral AI API Key Google Drive Credentials Tavily AI API Key (Optional) Custom Nodes: n8n-nodes-crawl-and-scrape --- How It Works Flow Summary Form Trigger: Captures “Website URL” and “Scraping Type”. Switch Node: deep → MCP Firecrawler (AI Agent). basic → Crawl and Scrape node. Scraping & Extraction: Deep path: Firecrawler → JSON structure. Basic path: Crawlee → Mistral extractor → JSON. Storage: Save JSON to Supabase. Archive in Google Drive. Competitor Analysis (Deep Only): Finds competitors via Tavily. Saves to Supabase competitors table. End: Finishes with a No Operation node. --- How To Set Up Import workflow JSON. Install community nodes (especially n8n-nodes-crawl-and-scrape from npm). Configure credentials (Supabase, Mistral AI, Google Drive). Add your Tavily API key. Connect Supabase and Drive nodes properly. Fix disconnected “basic” path if needed. Activate workflow. Test via the webhook form URL. --- How To Customize Change LLMs: Swap Mistral for OpenAI or Claude. Edit Scraper Prompts: Modify system prompts in AI agent nodes. Change Extraction Schema: Update JSON Schema in extractor nodes. Fix Relational Tables: Add Items node before Supabase inserts for arrays (social links, keywords). Enhance Automation: Add email/slack notifications, or replace form trigger with a Google Sheets trigger. --- Add-ons Automated Trigger: Run on new sheet rows. Notifications: Email or Slack alerts after completion. RAG Integration: Use the Supabase database as a chatbot knowledge source. --- Use Case Examples Sales Lead Enrichment: Instantly get company + competitor data from a URL. Market Research: Collect and compare companies in a niche. B2B Database Creation: Build a proprietary company dataset. --- WORKFLOW IMAGE --- Troubleshooting Guide | Issue | Possible Cause | Solution | |-------|----------------|-----------| | Form Trigger 404 | Workflow not active | Activate the workflow | | Web Search Tool fails | Missing Tavily API key | Replace the placeholder key | | FIRECRAWLER / find competitor fails | Missing MCP node | Install n8n-nodes-mcp | | Basic scrape does nothing | Switch node path disconnected | Reconnect “basic” output | | Supabase node error | Wrong table/column names | Match schema exactly | --- Need Help or More Workflows? Want to customize this workflow for your business or integrate it with your existing tools? Our team at Digital Biz Tech can tailor it precisely to your use case from automation logic to AI-powered enhancements. Contact: shilpa.raju@digitalbiz.tech For more such offerings, visit us: https://www.digitalbiz.tech ---

By DIGITAL BIZ TECH

923