Scrape any web page into structured JSON data with ScrapeNinja and AI
Disclaimer: This template currently works only on self-hosted n8n instances, as it uses a community node.
Use Case
Web scrapers often break when a web page's layout changes. This workflow mitigates that problem by auto-generating the data extractor code via an LLM, so the extractor can simply be regenerated whenever the page structure shifts.
How It Works
This workflow leverages the ScrapeNinja n8n community node to:
- scrape the webpage HTML,
- feed it into an LLM (Google Gemini) and ask it to write a JS extractor function, and then
- execute the generated JS extractor against the scraped HTML to pull structured data out of the page (the code is safely executed in a sandbox; see the sketch below).
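For illustration, a generated extractor might look like the following minimal sketch. The `extract(input, cheerio)` signature is an assumption based on ScrapeNinja's convention of passing raw HTML plus a cheerio instance into the sandbox, and the CSS selectors are hypothetical placeholders that depend entirely on the target page:

```javascript
// A minimal sketch of an LLM-generated extractor. ScrapeNinja-style
// extractors receive the raw HTML plus a cheerio instance; the
// '.product-card', '.title', and '.price' selectors are placeholders.
function extract(input, cheerio) {
  const $ = cheerio.load(input);
  const items = [];
  $('.product-card').each((_, el) => {
    items.push({
      title: $(el).find('.title').text().trim(),
      price: $(el).find('.price').text().trim(),
      url: $(el).find('a').attr('href') || null,
    });
  });
  return { items };
}
```

Because the function is regenerated by the LLM from fresh HTML, a selector that breaks after a redesign costs one re-run rather than a manual rewrite.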
Installation
To install the ScrapeNinja node on your self-hosted instance, go to Settings -> Community nodes, enter "n8n-nodes-scrapeninja", and click Install.
Make sure you are using at least v0.3.0.
See this in action: https://www.linkedin.com/feed/update/urn:li:activity:7289659870935490560/
n8n Basic LLM Chain with Google Gemini
This n8n workflow demonstrates a simple integration of a Large Language Model (LLM) chain using LangChain with Google Gemini. It provides a basic, manual trigger to initiate a conversation or task with the AI model.
What it does
This workflow performs the following steps:
- Manual Trigger: Initiates the workflow when you manually click "Execute workflow" in the n8n editor.
- Basic LLM Chain: Sets up a foundational LangChain LLM chain, which acts as the orchestrator for interacting with the language model.
- Google Gemini Chat Model: Connects to the Google Gemini chat model, allowing the LLM chain to send prompts and receive responses from Gemini.
Prerequisites/Requirements
To use this workflow, you will need:
- n8n Instance: An active n8n instance.
- Google Gemini API Key: An API key for the Google Gemini model, configured as a credential in your n8n instance.
Setup/Usage
- Import the workflow:
- Copy the JSON content of this workflow.
- In your n8n instance, go to "Workflows" and click "New".
- Click the three dots menu (...) in the top right and select "Import from JSON".
- Paste the JSON and click "Import".
- Configure Credentials:
- Locate the "Google Gemini Chat Model" node.
- Ensure you have a Google Gemini API credential configured in n8n. If not, create one by going to "Credentials" in n8n, clicking "New Credential", searching for "Google Gemini API", and entering your API key.
- Select your Google Gemini credential in the "Google Gemini Chat Model" node's settings.
- Configure the LLM Chain (Optional):
- Open the "Basic LLM Chain" node.
- You can customize the prompt or input for the LLM here to define what task you want Gemini to perform; an example prompt is sketched after these steps.
- Execute the workflow:
- Click the "Execute Workflow" button in the top right corner of the n8n editor to run the workflow manually.
- Observe the output from the "Google Gemini Chat Model" node to see the AI's response.
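For example, to make this chain generate extractor code the way the scraping workflow above does, the prompt in the "Basic LLM Chain" node could look like the following. The wording is an illustrative assumption, not a prompt bundled with the template, and `{{ $json.body }}` is an n8n expression standing in for HTML from an upstream node:

```javascript
// Hypothetical prompt text for the "Basic LLM Chain" node, shown as a
// JS template string. "{{ $json.body }}" is an n8n expression that
// would inject the HTML produced by an upstream scraping node.
const prompt = `You are a web scraping assistant.
Given the HTML below, write a single JavaScript function
extract(input, cheerio) that returns the page's main entities
as a JSON object. Reply with only the function code.

HTML:
{{ $json.body }}`;
```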
Related Templates
Track competitor SEO keywords with Decodo + GPT-4.1-mini + Google Sheets
This workflow automates competitor keyword research using an OpenAI LLM and Decodo for intelligent web scraping.

Who this is for
- SEO specialists, content strategists, and growth marketers who want to automate keyword research and competitive intelligence.
- Marketing analysts managing multiple clients or websites who need consistent SEO tracking without manual data pulls.
- Agencies or automation engineers using Google Sheets as an SEO data dashboard for keyword monitoring and reporting.

What problem this workflow solves
Tracking competitor keywords manually is slow and inconsistent, and most SEO tools provide limited API access or lack contextual keyword analysis. This workflow solves that by:
- Automatically scraping any competitor's webpage with Decodo.
- Using OpenAI GPT-4.1-mini to interpret keyword intent, density, and semantic focus.
- Storing structured keyword insights directly in Google Sheets for ongoing tracking and trend analysis.

What this workflow does
1. Trigger – Manually start the workflow or schedule it to run periodically.
2. Input Setup – Define the website URL and target country (e.g., https://dev.to, france).
3. Data Scraping (Decodo) – Fetch competitor web content and metadata.
4. Keyword Analysis (OpenAI GPT-4.1-mini) – Extract primary and secondary keywords, identify focus topics and semantic entities, generate a keyword density summary and SEO strength score, and recommend optimization and internal linking opportunities.
5. Data Structuring – Clean and convert the GPT output into JSON format (see the sketch after this description).
6. Data Storage (Google Sheets) – Append structured keyword data to a Google Sheet for long-term tracking.

Setup
Prerequisites:
- If you are new to Decodo, please sign up at visit.decodo.com
- An n8n account with workflow editor access
- Decodo API credentials
- An OpenAI API key
- A Google Sheets account connected via OAuth2
- The Decodo community node installed

Create a Google Sheet:
- Add columns such as primary_keywords, seo_strength_score, keyword_density_summary, etc.
- Share it with your n8n Google account.

Connect credentials:
- Decodo API – register, log in, and obtain the Basic Authentication token via the Decodo Dashboard.
- OpenAI API (for GPT-4.1-mini)
- Google Sheets OAuth2

Configure input fields: edit the "Set Input Fields" node to set your target site and region.

Run the workflow: click Execute Workflow in n8n and view the structured results in your connected Google Sheet.

How to customize this workflow
- Track Multiple Competitors → Use a Google Sheet or CSV list of URLs; loop through them using the Split In Batches node.
- Add Language Detection → Add a Gemini or GPT node before keyword analysis to detect content language and adjust prompts.
- Enhance the SEO Report → Expand the GPT prompt to include backlink insights, metadata optimization, or readability checks.
- Integrate Visualization → Connect your Google Sheet to Looker Studio for SEO performance dashboards.
- Schedule Auto-Runs → Use the Cron node to run weekly or monthly for competitor keyword refreshes.

Summary
This workflow automates competitor keyword research using Decodo for intelligent web scraping, OpenAI GPT-4.1-mini for keyword and SEO analysis, and Google Sheets for live tracking and reporting. It is a complete AI-powered SEO intelligence pipeline, ideal for teams that want actionable insights on keyword gaps, optimization opportunities, and content focus trends without relying on expensive SEO SaaS tools.
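For the Data Structuring step above, an n8n Code node along these lines could flatten the model's reply for the Sheets append. The field names and the path to the model output are assumptions; adjust them to your prompt and node wiring:

```javascript
// Hypothetical n8n Code node: turn the GPT reply into flat columns for
// Google Sheets. Field names and the output path are assumptions.
const raw = $input.first().json.message?.content ?? '{}';

// Models often wrap JSON in extra text or fences; keep only the
// substring between the first '{' and the last '}'.
const start = raw.indexOf('{');
const end = raw.lastIndexOf('}');
const parsed = JSON.parse(raw.slice(start, end + 1));

return [{
  json: {
    primary_keywords: (parsed.primary_keywords || []).join(', '),
    seo_strength_score: parsed.seo_strength_score ?? null,
    keyword_density_summary: parsed.keyword_density_summary ?? '',
  },
}];
```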
Dynamic HubSpot lead routing with GPT-4 and Airtable sales team distribution
AI Agent for Dynamic Lead Distribution (HubSpot + Airtable)

🧠 AI-Powered Lead Routing and Sales Team Distribution
This intelligent n8n workflow automates end-to-end lead qualification and allocation by integrating HubSpot, Airtable, OpenAI, Gmail, and Slack. The system ensures that every new lead is instantly analyzed, scored, and routed to the best-fit sales representative, all powered by AI logic.

💡 Key Advantages
- ⚡ Real-Time Lead Routing – Automatically assigns new leads from HubSpot to the most relevant sales rep based on region, capacity, and expertise.
- 🧠 AI Qualification Engine – An OpenAI-powered agent evaluates the lead's industry, region, and needs to generate a persona summary and routing rationale.
- 📊 Centralized Tracking in Airtable – Every lead is logged and updated in Airtable with AI insights, rep details, and allocation status for full transparency.
- 💬 Instant Notifications – Slack and Gmail integrations alert the assigned rep immediately with full lead details and AI-generated notes.
- 🔁 Seamless CRM Sync – Updates the original HubSpot record with the lead persona, routing info, and timeline notes for an audit-ready history.

⚙️ How It Works
1. HubSpot Trigger – Captures a new lead as soon as it's created in HubSpot.
2. Fetch Contact Data – Retrieves all relevant fields such as name, company, and industry.
3. Clean & Format Data – A Code node standardizes and structures the data for consistency (see the sketch after this description).
4. Airtable Record Creation – Logs the lead data into the "Leads" table for centralized tracking.
5. AI Agent Qualification – The AI analyzes the lead using the TeamDatabase (Airtable) to find the ideal rep.
6. Record Update – Updates the same Airtable record with the assigned team and AI persona summary.
7. Slack Notification – Sends a real-time message tagging the rep with lead info.
8. Gmail Notification – Sends a personalized handoff email with context and follow-up actions.
9. HubSpot Sync – Updates the original contact in HubSpot with the assignment details and AI rationale.

🛠️ Setup Steps
1. Trigger Node: HubSpot → detect new leads.
2. HubSpot Node: retrieve complete lead details.
3. Code Node: clean and normalize data.
4. Airtable Node: log lead info in the "Leads" table.
5. AI Agent Node: process the lead and match it with the sales team.
6. Slack Node: notify the designated representative.
7. Gmail Node: email the rep with details.
8. HubSpot Node: update the CRM with the AI summary and allocation status.

🔐 Credentials Required
- HubSpot OAuth2 API – to fetch and update leads.
- Airtable Personal Access Token – to store and update lead data.
- OpenAI API – to power the AI qualification and matching logic.
- Slack OAuth2 – for sending team notifications.
- Gmail OAuth2 – for automatic email alerts to assigned reps.

👤 Ideal For
- Sales Operations and RevOps teams managing multiple regions
- B2B SaaS and enterprise teams handling large lead volumes
- Marketing teams requiring AI-driven, bias-free lead assignment
- Organizations optimizing CRM efficiency with automation

💬 Bonus Tip
You can easily extend this workflow by adding lead scoring logic, language translation for follow-ups, or Salesforce integration. The entire system is modular, making it easy to scale across global sales teams.
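A minimal sketch of the Clean & Format Data step above might look like this. The HubSpot property names are assumptions; the real keys depend on your portal and on the shape the previous node returns:

```javascript
// Hypothetical n8n Code node: normalize HubSpot contact fields before
// they are written to Airtable. Property names are assumptions.
const props = $input.first().json.properties ?? {};

// HubSpot APIs sometimes nest values as { value: ... }; handle both shapes.
const pick = (key) => String(props[key]?.value ?? props[key] ?? '').trim();

return [{
  json: {
    name: `${pick('firstname')} ${pick('lastname')}`.trim(),
    email: pick('email').toLowerCase(),
    company: pick('company'),
    industry: pick('industry'),
    region: pick('country') || 'unknown',
  },
}];
```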
Create personalized email outreach with AI, Telegram bot & website scraping
Demo Personalized Email
This n8n workflow is built for AI and automation agencies to promote their workflows through an interactive demo that prospects can try themselves. The featured system is a deeply personalized email demo.

🔄 How It Works
1. Prospect Interaction – A prospect starts the demo via Telegram. The Telegram bot (created with BotFather) connects directly to your n8n instance.
2. Demo Guidance – The RAG agent and instructor guide the user step by step through the demo. Instructions and responses are dynamically generated based on user input.
3. Workflow Execution – When the user triggers an action (e.g., testing the email demo), n8n runs the workflow, which collects website data using Crawl4AI or standard HTTP requests.
4. Email Demo – The system personalizes and sends a demo email through SparkPost, showing the automation's capability.
5. Logging and Control – Each user interaction is logged in your database using the user's name and ID. The workflow checks limits to prevent misuse or spam.
6. Error Handling – If a low-CPU scraping method fails, the workflow automatically escalates to a higher-CPU method.

⚙️ Requirements
Before setting up, make sure you have the following:
- n8n – automation platform to run the workflow
- Docker – required to run Crawl4AI
- Crawl4AI – for intelligent website crawling
- Telegram account – to create your Telegram bot via BotFather
- SparkPost account – to send personalized demo emails
- A database (e.g., PostgreSQL, MySQL, or SQLite) – to store log data such as user name and ID

🚀 Features
- Telegram interface using the BotFather API
- Instructor and RAG agent to guide prospects through the demo
- Flow generation limits per user ID to prevent abuse
- Low-cost yet powerful web scraping, escalating from low- to high-CPU flows if earlier ones fail

💡 Development Ideas
- Replace the RAG logic with your own query-answering and guidance method
- Remove the flow limit if you're confident the demo can't be misused
- Swap the personalized email demo with any other workflow you want to showcase

🧠 Technical Notes
- Telegram bot created with BotFather
- Website crawl process (see the sketch after this description): extract sub-links via /sitemap.xml, sitemap_index.xml, or standard HTTP requests; fall back to Crawl4AI if normal requests fail; fetch sub-link content via HTTPS, with Crawl4AI as backup
- SparkPost used for sending demo emails

⚙️ Setup Instructions
1. Create a Telegram Bot – Use BotFather on Telegram to create your bot and get the API token. This token will be used to connect your n8n workflow to Telegram.
2. Create a Log Data Table – In your database, create a table to store user logs. The table must include at least the following columns: name (the user's name or Telegram username) and id (the user's unique identifier).
3. Install Crawl4AI with Docker – Follow the installation guide from the official repository: https://github.com/unclecode/crawl4ai. Crawl4AI will handle website crawling and content extraction in your workflow.

📦 Notes
This setup is optimized for low cost, easy scalability, and real-time interaction with prospects. You can customize each component (Telegram bot behavior, RAG logic, scraping strategy, and email workflow) to fit your agency's demo needs.

👉 You can try the live demo here: @emaildemobot
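A minimal sketch of the sitemap-first link collection described in the technical notes above. It assumes a runtime with global `fetch` (Node 18+); in an n8n Code node you may need `this.helpers.httpRequest` instead, and the Crawl4AI fallback branch is only indicated in a comment, since its API is not shown here:

```javascript
// Hypothetical sketch: collect sub-links from a site's sitemap first,
// leaving escalation to Crawl4AI for the case where both candidates fail.
const base = 'https://example.com'; // placeholder for the prospect's site

async function fetchSitemapUrls(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  const xml = await res.text();
  // Naive <loc> extraction; adequate for well-formed sitemaps.
  return [...xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)].map((m) => m[1]);
}

let links = [];
for (const path of ['/sitemap.xml', '/sitemap_index.xml']) {
  try {
    links = await fetchSitemapUrls(base + path);
    if (links.length) break;
  } catch {
    // Try the next candidate; if all fail, the workflow would route
    // this item to the Crawl4AI branch instead.
  }
}

return links.map((url) => ({ json: { url } }));
```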