
Web site scraper for LLMs with Airtop

Airtop
2/3/2026

Recursive Web Scraping

Use Case

Automating web scraping with recursive depth is ideal for collecting content across multiple linked pages, which makes it a good fit for content aggregation, lead generation, or research projects.

What This Automation Does

This automation reads a list of URLs from a Google Sheet, scrapes each page, stores the content in a document, and adds newly discovered links back to the sheet. It continues this process for a specified number of iterations based on the defined scraping depth.

Input Parameters:

  • Seed URL: The starting URL to begin the scraping process.
    Example: https://example.com/
  • Links must contain: Restricts the links to those that contain this specified string.
    Example: https://example.com/
  • Depth: The number of iterations (layers of links) to scrape beyond the initial set.
    Example: 3
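
To illustrate how the Links must contain parameter narrows the crawl, here is a minimal sketch in JavaScript. The helper name `filterLinks` and the `alreadySeen` set are hypothetical and are not part of the published workflow; the sketch assumes a plain substring match, as the parameter description suggests.

```javascript
// Hypothetical helper: keep only links that contain the "Links must contain"
// string and that have not been queued or scraped before.
function filterLinks(links, mustContain, alreadySeen = new Set()) {
  const kept = [];
  for (const link of links) {
    if (!link.includes(mustContain)) continue; // outside the allowed scope
    if (alreadySeen.has(link)) continue;       // already queued or scraped
    alreadySeen.add(link);
    kept.push(link);
  }
  return kept;
}

const seen = new Set(["https://example.com/"]);
const found = [
  "https://example.com/docs",
  "https://other.example.org/page",
  "https://example.com/docs",
  "https://example.com/",
];
console.log(filterLinks(found, "https://example.com/", seen));
// → [ 'https://example.com/docs' ]
```

Using the seed URL itself as the filter string, as in the example values above, keeps the crawl on the same site.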

How It Works

  1. Starts by reading the Seed URL from the Google Sheet.
  2. Scrapes each page and saves its content to the specified document.
  3. Extracts new links from each page that match the Links must contain string and appends them to the Google Sheet.
  4. Repeats steps 2–3 for the number of times specified by Depth - 1.
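
The steps above can be sketched as a loop over an in-memory list. This is an illustration under stated assumptions, not the workflow's actual implementation: `crawl`, `scrapePage`, `fakeSite`, and `fakeScrape` are hypothetical names, an array stands in for the Google Sheet, another array stands in for the document, and Depth is read as the total number of passes (one initial pass plus Depth - 1 repeats, following step 4).

```javascript
// Sketch of the loop in "How It Works". `scrapePage` is a hypothetical
// stand-in for the real scraping step; it resolves to { content, links }.
async function crawl(seedUrl, mustContain, depth, scrapePage) {
  const sheet = [seedUrl]; // plays the role of the Google Sheet rows
  const doc = [];          // plays the role of the output document
  let start = 0;           // index of the first row not yet scraped

  for (let pass = 0; pass < depth; pass++) {
    const end = sheet.length;  // rows added during this pass wait for the next one
    if (start === end) break;  // nothing new to scrape
    for (let i = start; i < end; i++) {
      const { content, links } = await scrapePage(sheet[i]);
      doc.push({ url: sheet[i], content });
      for (const link of links) {
        if (link.includes(mustContain) && !sheet.includes(link)) {
          sheet.push(link);
        }
      }
    }
    start = end;
  }
  return { sheet, doc };
}

// Fake three-page site used only for this demonstration.
const fakeSite = {
  "https://example.com/": ["https://example.com/a", "https://elsewhere.test/x"],
  "https://example.com/a": ["https://example.com/b", "https://example.com/"],
  "https://example.com/b": [],
};
const fakeScrape = async (url) => ({
  content: `text of ${url}`,
  links: fakeSite[url] ?? [],
});

crawl("https://example.com/", "https://example.com/", 2, fakeScrape)
  .then(({ sheet, doc }) => {
    console.log(sheet.length, doc.length); // → 3 2
  });
```

With a depth of 2, the third page is discovered and added to the list but not scraped, which is why the list holds three URLs while the document holds two entries.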

Setup Requirements

  1. Airtop API Key (free to generate).
  2. Google Docs credentials (requires creating a project in the Google Cloud Console).
  3. Google Sheets credentials.

Next Steps

  • Add Filtering Rules: Filter which links to follow based on domain, path, or content type.
  • Combine with Scheduler: Run this automation on a schedule to continuously explore newly discovered pages.
  • Export Structured Data: Extend the process to store extracted data in a CSV or database for analysis.
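
As one possible starting point for the Export Structured Data idea, the sketch below turns scraped records into CSV text. The function names `csvField` and `toCsv` and the `url` and `title` columns are hypothetical; the quoting follows the common RFC 4180 convention of wrapping a field in double quotes and doubling any inner quotes.

```javascript
// Quote a single CSV field only when it contains a quote, comma, or line break.
function csvField(value) {
  const text = String(value ?? "");
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build CSV text with a header row from an array of plain objects.
function toCsv(rows, columns) {
  const header = columns.map(csvField).join(",");
  const body = rows.map((row) => columns.map((c) => csvField(row[c])).join(","));
  return [header, ...body].join("\r\n");
}

const rows = [
  { url: "https://example.com/", title: "Home" },
  { url: "https://example.com/a", title: 'Say "hi", world' },
];
console.log(toCsv(rows, ["url", "title"]));
```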

Read more about website scraping for LLMs

n8n Website Scraper for LLMs with Airtop

This n8n workflow provides a robust solution for scraping website content, processing it, and storing it in Google Sheets, with an optional integration for Airtop. It's designed to prepare web content for use with Large Language Models (LLMs) or for general data collection and analysis.

What it does

This workflow automates the following steps:

  1. Triggers on Form Submission: The workflow is initiated when a form is submitted, likely containing the URL of the website to be scraped.
  2. Processes Input Data: An "Edit Fields" (Set) node prepares the incoming data, potentially extracting the target URL and other parameters.
  3. Executes Custom Code: A "Code" node runs custom JavaScript, which is likely responsible for the actual web scraping logic. This node would fetch the content from the provided URL.
  4. Conditional Logic for Airtop: An "If" node checks a condition, possibly to determine if the scraped data should be sent to Airtop.
  5. Integrates with Airtop (Conditional): If the condition is met, the scraped data is sent to Airtop, likely for further processing or storage within the Airtop platform.
  6. Stores Data in Google Sheets: Regardless of the Airtop integration, the scraped and processed data is then written to a specified Google Sheet, providing a centralized repository for the web content.
  7. Generates Google Docs (Optional/Future): A "Google Docs" node is present but not connected in the provided JSON, suggesting a potential future enhancement for generating documents from the scraped content.
  8. Sticky Note for Documentation: A "Sticky Note" is included for internal documentation within the workflow.

Prerequisites/Requirements

To use this workflow, you will need:

  • n8n Instance: A running n8n instance to import and execute the workflow.
  • Google Sheets Account: Configured n8n credentials for Google Sheets to store the scraped data.
  • Airtop Account (Optional): Configured n8n credentials for Airtop if you intend to use the Airtop integration path.
  • Custom JavaScript for Scraping: The "Code" node requires custom JavaScript logic to perform the actual web scraping. This script will need to be written or provided based on the specific scraping requirements.
  • n8n Form Trigger: The workflow is triggered by an n8n Form, which needs to be set up to collect the necessary input (e.g., website URL).

Setup/Usage

  1. Import the Workflow: Download the workflow JSON and import it into your n8n instance.
  2. Configure Credentials:
    • Set up your Google Sheets credentials in n8n.
    • If using Airtop, set up your Airtop credentials.
  3. Configure the Form Trigger:
    • Access the "On form submission" node and configure its fields to accept the website URL and any other relevant parameters for scraping.
  4. Customize the Code Node:
    • Open the "Code" node. You will need to write or paste your custom JavaScript code here to perform the web scraping. This code should take the URL from the previous node's output and return the desired scraped content.
  5. Configure Google Sheets:
    • In the "Google Sheets" node, specify the Spreadsheet ID and Sheet Name where the scraped data should be stored. Map the incoming data fields to the appropriate columns in your Google Sheet.
  6. Configure Airtop (Optional):
    • If you plan to use Airtop, configure the "Airtop" node with the necessary operation and resource, mapping the scraped data fields as required.
  7. Activate the Workflow: Once configured, activate the workflow. You can then trigger it by submitting data to the n8n form.
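
Since the "Code" node needs custom JavaScript, the following sketch shows one way the link-extraction part could look. It is an assumption-laden example, not the template's own code: `extractLinks` is a hypothetical helper, and a regular expression is used for brevity where a real HTML parser would be more robust.

```javascript
// Hypothetical helper for the "Code" node: pull absolute http(s) links
// out of raw HTML. The regex is a simplification, not a full HTML parser.
function extractLinks(html, baseUrl) {
  const links = new Set();
  const hrefPattern = /<a\s[^>]*?href\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(hrefPattern)) {
    try {
      const url = new URL(match[1], baseUrl); // resolves relative paths
      if (url.protocol === "http:" || url.protocol === "https:") {
        url.hash = "";                        // ignore in-page anchors
        links.add(url.href);
      }
    } catch {
      // skip malformed href values
    }
  }
  return [...links];
}

const html = `
  <a href="/docs">Docs</a>
  <a class="x" href="https://example.com/blog#top">Blog</a>
  <a href="mailto:hi@example.com">Mail</a>
`;
console.log(extractLinks(html, "https://example.com/"));
// → [ 'https://example.com/docs', 'https://example.com/blog' ]
```

Inside an n8n Code node running once for all items, a function like this would typically be applied to the items read with `$input.all()`, and the node would return an array of objects with a `json` key. The field that holds the HTML depends on the preceding node and is left open here.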
