# Templates by scrapeless official
## Automated SEO content engine with Claude AI, Scrapeless, and competitor analysis
This workflow contains community nodes that are only compatible with the self-hosted version of n8n.

### How it works

This n8n workflow helps you build a fully automated SEO content engine using Scrapeless and AI. It's designed for teams running international websites (SaaS products, e-commerce platforms, or content-driven businesses) that want to grow targeted search traffic through high-conversion content, without relying on manual research or hit-or-miss topic picks.

The flow runs in three key phases:

🔍 **Phase 1: Topic Discovery**
Starting from a seed keyword, the workflow automatically finds high-potential long-tail keywords using Google Trends via Scrapeless. An AI agent analyzes each keyword's trend strength and assigns it a priority (P0–P3).

🧠 **Phase 2: Competitor Research**
For each P0–P2 keyword, the flow performs a Google Search (via Deep SerpAPI) and extracts the top 3 organic results. Scrapeless then crawls each result and returns the full article content as clean Markdown. This gives you a structured, comparable view of how competitors write about each topic.

✍️ **Phase 3: AI Article Generation**
Using an LLM (OpenAI or another provider), the workflow generates a complete SEO article draft, including:

- SEO title
- Slug
- Meta description
- Trend-based strategy summary
- Structured JSON article body with H2/H3 blocks

Finally, the article is stored in Supabase (or any other supported database), ready for review, API-based publishing, or further automation.

### Set up steps

This flow requires intermediate familiarity with n8n and API key setup. Full configuration may take 30–60 minutes.

✅ **Prerequisites**

- Scrapeless account (for Google Trends and web crawling)
- LLM provider (e.g. OpenAI or Claude)
- Supabase or Google Sheets (to store keywords and article output)

🧩 **Required credentials in n8n**

- Scrapeless API key
- OpenAI (or other LLM) credentials
- Supabase or Google Sheets credentials

---

🔧 **Setup instructions (simplified)**

1. **Input seed keyword**: Edit the "Set Seed Keyword" node to define your niche, e.g. "project management".
2. **Google Trends via Scrapeless**: Use Scrapeless to retrieve "related queries" and their interest-over-time data.
3. **Trend analysis with AI agent**: The AI evaluates each keyword's trend strength and assigns a priority (P0–P3).
4. **Filter & store keyword data**: Group and sort keywords by priority, then store them in Google Sheets.
5. **Competitor research**: Use Deep SerpAPI to get the top 3 Google results, then crawl each with Scrapeless.
6. **AI content generation**: Feed competitor content and trend data into the AI, and output a structured SEO blog article.
7. **Store final article**: Save the full article JSON (title, meta, slug, content) to Supabase; a validation sketch follows below.
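Before the final Supabase insert, it is worth normalizing the AI's draft in a Code node so malformed output never reaches the table. The following is a minimal sketch, not part of the published template: the field names (`title`, `slug`, `meta_description`, `body`) are assumptions, so align them with the schema your LLM prompt actually returns.

```javascript
// n8n Code node ("Run Once for All Items"), placed between the AI
// generation step and the Supabase insert. Field names are illustrative.
return $input.all().map((item) => {
  const article = item.json;

  // Derive a URL-safe slug from the SEO title if the model omitted one.
  const slug = (article.slug || article.title || '')
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9\s-]/g, '')
    .replace(/\s+/g, '-');

  // Fail fast on drafts missing required fields rather than writing
  // incomplete rows to Supabase.
  for (const field of ['title', 'meta_description', 'body']) {
    if (!article[field]) {
      throw new Error(`AI draft is missing required field: ${field}`);
    }
  }

  return { json: { ...article, slug, created_at: new Date().toISOString() } };
});
```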
## Create AI-ready vector datasets from web content with Claude, Ollama & Qdrant
### How it works

This n8n workflow builds an AI-powered web data pipeline that automates the entire process of:

- Extraction
- Structuring
- Vectorization
- Storage

It integrates multiple tools to transform messy web pages into clean, searchable vector databases.

**Integrated tools**

- **Scrapeless**: Bypasses JavaScript-heavy websites and anti-bot protections to reliably extract HTML content.
- **Claude AI**: Uses LLMs to analyze unstructured HTML and generate clean, structured JSON data.
- **Ollama embeddings**: Generates local vector embeddings from structured text using the all-minilm model.
- **Qdrant vector DB**: Stores semantic vector data for fast, meaningful search.
- **Webhook notifications**: Sends real-time updates when workflows complete or errors occur.

From messy web pages to structured vector data, this pipeline is well suited to building intelligent agents, knowledge bases, or research automation tools.

---

### Setup steps

**Install n8n**

> Requires Node.js v18, v20, or v22

```bash
npm install -g n8n
n8n
```

After installation, access the n8n interface at http://localhost:5678.

**Set up Scrapeless**

1. Register at Scrapeless
2. Copy your API token
3. Paste the token into the HTTP Request node labeled "Scrapeless Web Request"

**Set up the Claude API (Anthropic)**

1. Sign up at the Anthropic Console
2. Generate your Claude API key
3. Add the API key to the following nodes: Claude Extractor, AI Data Checker, Claude AI Agent

**Install and run Ollama**

macOS:

```bash
brew install ollama
```

Linux:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

Windows: download the installer from https://ollama.com

Start the Ollama server and pull the embedding model:

```bash
ollama serve
ollama pull all-minilm
```

**Install Qdrant (via Docker)**

```bash
docker pull qdrant/qdrant
docker run -d \
  --name qdrant-server \
  -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant
```

Test that Qdrant is running:

```bash
curl http://localhost:6333/healthz
```

**Configure the n8n workflow**

1. Modify the trigger (manual or scheduled)
2. Enter your target URLs and collection name in the designated nodes
3. Paste all required API tokens/keys into their corresponding nodes
4. Ensure your Qdrant and Ollama services are running (a quick smoke test follows below)

**Ideal use cases**

- Custom AI chatbots
- Private search engines
- Research tools
- Internal knowledge bases
- Content monitoring pipelines
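Before triggering the workflow, it helps to confirm that Ollama and Qdrant are actually reachable and that the embedding dimension lines up. Here is a minimal standalone Node.js smoke test (Node 18+, built-in `fetch`), separate from the workflow itself; the collection name `web_content` is an assumption, and the collection must already exist in Qdrant with a matching vector size (384 for all-minilm).

```javascript
// Smoke test: embed one sentence with Ollama, then upsert the vector
// into Qdrant. Ports match the defaults from the setup steps above.
const OLLAMA = 'http://localhost:11434';
const QDRANT = 'http://localhost:6333';

async function main() {
  // 1. Get a local embedding from Ollama (all-minilm returns 384 dims).
  const embRes = await fetch(`${OLLAMA}/api/embeddings`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'all-minilm', prompt: 'hello vector world' }),
  });
  const { embedding } = await embRes.json();

  // 2. Upsert into Qdrant. The 'web_content' collection is hypothetical
  // and must be created beforehand with vector size 384.
  const upsert = await fetch(`${QDRANT}/collections/web_content/points`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      points: [{ id: 1, vector: embedding, payload: { source: 'smoke-test' } }],
    }),
  });

  console.log(`Embedded ${embedding.length} dims; Qdrant status ${upsert.status}`);
}

main().catch(console.error);
```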
## Automated job finder agent
This workflow contains community nodes that are only compatible with the self-hosted version of n8n.

### Brief overview

This automation template helps you track the latest job listings from the Y Combinator Jobs page. By using Scrapeless to scrape job listings, n8n to orchestrate the workflow, and Google Sheets to store the results, you can build a zero-code job tracking solution that runs automatically every 6 hours.

### How it works

1. **Trigger on a schedule**: Every 6 hours, the workflow kicks off automatically.
2. **Scrape job listings**: Scrapeless crawls the Y Combinator Jobs page and returns structured Markdown data.
3. **Extract & parse content**: JavaScript nodes process the Markdown to extract job titles and links (a parsing sketch follows at the end of this section).
4. **Flatten data**: Each job becomes a single row with its title and link.
5. **Save to Google Sheets**: New job listings are appended to your Google Sheet for easy viewing and sharing.

### Features

- No-code, automated job listing scraper.
- Scrapes and structures the latest Y Combinator job posts.
- Saves data directly to Google Sheets.
- Easy to schedule and run without manual effort.
- Extensible: add Telegram, Slack, or email notifications easily in n8n.

### Requirements

- **Scrapeless API key**: credentials for calling the Scrapeless service. Log in to the Scrapeless Dashboard, click "Setting" on the left, select "API Key Management", then click "Create API Key". Finally, click the API key you created to copy it.
- **n8n instance**: self-hosted or n8n.cloud account.
- **Google account**: for Google Sheets API access.
- **Target site**: this template is designed for the Y Combinator Jobs page but can be modified for other job boards.

### Installation

1. Deploy n8n on your preferred platform.
2. Import this workflow JSON file into your n8n workspace.
3. Create and add your Scrapeless API key in n8n's credential manager.
4. Connect your Google Sheets account in n8n.
5. Update the target Google Sheet document URL and sheet name.

### Usage

This automated job finder agent is ideal for:

| Industry / Role | Use Case |
|---|---|
| Job Seekers | Automatically track newly posted startup jobs without manually visiting job boards. |
| Recruitment Agencies | Monitor YC job postings and build a candidate-job matching system. |
| Startup Founders / CTOs | Stay aware of which startups are hiring, for networking and market insights. |
| Tech Media & Bloggers | Aggregate new job listings for newsletters, blogs, or social media sharing. |
| HR & Talent Acquisition Teams | Monitor competitors' hiring activity. |
| Automation Enthusiasts | Example use case for learning web scraping, automation, and data storage. |
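A minimal sketch of the parsing step is shown below, written for an n8n Code node. It assumes the Scrapeless node put the page's Markdown in a `markdown` field and that job detail links point at ycombinator.com/companies; both are assumptions, so adjust them to your actual output.

```javascript
// n8n Code node — sketch of the "Extract & parse content" step.
const markdown = $input.first().json.markdown || '';

// Markdown links look like [Job Title](https://…); capture both parts.
const linkPattern = /\[([^\]]+)\]\((https?:\/\/[^)]+)\)/g;

const jobs = [];
for (const match of markdown.matchAll(linkPattern)) {
  const [, title, link] = match;
  // Keep only links that look like YC job/company detail pages
  // (hypothetical filter — verify against the scraped output).
  if (link.includes('ycombinator.com/companies')) {
    jobs.push({ json: { title: title.trim(), link } });
  }
}

// One item per job, which maps cleanly onto one Google Sheets row each.
return jobs;
```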
## Generate SEO-optimized blog content with Gemini, Scrapeless and Pinecone RAG
This workflow contains community nodes that are only compatible with the self-hosted version of n8n.

### How it works

This advanced automation builds a fully autonomous SEO blog writer using n8n, Scrapeless, LLMs, and the Pinecone vector database. It's powered by a Retrieval-Augmented Generation (RAG) system that collects high-performing blog content, stores it in a vector store, and then generates new blog posts based on that knowledge, endlessly.

**Part 1: Build a knowledge base from popular blogs**

- Scrape existing articles from a well-established writer (in this case, Mark Manson) using the Scrapeless node.
- Extract content from blog pages and store it in Pinecone, a vector database that supports similarity search.
- Use Gemini Embedding 001 (or another supported embedding model) to encode blog content into vectors.

Result: a searchable vector store of expert-level content, ready to be used for content generation and intelligent search.

**Part 2: SERP analysis & AI blog generation**

- Use Scrapeless' SERP node to fetch search results based on your keyword and search intent.
- Send the results to an LLM (such as Gemini, OpenRouter, or OpenAI) to generate a keyword analysis report in Markdown, which is then converted to HTML.
- Extract long-tail keywords, search-intent insights, and content angles from this report.
- Feed everything into another LLM with access to your Pinecone knowledge base, and generate a fully SEO-optimized blog post.

### Set up steps

**Prerequisites**

- Scrapeless API key
- Pinecone account and index setup (an index-creation sketch follows below)
- An embedding model (Gemini, OpenAI, etc.)
- n8n instance with the community node n8n-nodes-scrapeless installed

**Credential configuration**

- Add your Scrapeless and Pinecone credentials in n8n under the "Credentials" tab
- Choose embedding dimensions according to the model you use (e.g. 768 for Gemini Embedding 001)

### Key highlights

- **Clones a real content creator**: replicates knowledge and writing style from top-performing blog authors.
- **Auto-scrapes** hundreds of blog posts without being blocked.
- **Stores expert content** in a vector DB to build a reusable knowledge base.
- **Performs real-time SERP analysis** using Scrapeless to fetch and analyze search data.
- **Generates SEO blog drafts** using RAG with detailed keyword intelligence.
- **Output includes**: blog title, HTML summary report, long-tail keywords, and AI-written article body.

### RAG + SEO: the future of content creation

This template combines:

- AI reasoning from large language models
- Reliable data scraping from Scrapeless
- Scalable storage via the Pinecone vector DB
- Flexible orchestration using n8n nodes

This is not just an automation; it's a full-stack SEO content machine that enables you to:

- Build a domain-specific knowledge base
- Run intelligent keyword research
- Generate traffic-ready content on autopilot

💡 **Use cases**

- SaaS content teams cloning competitor success
- Affiliate marketers scaling high-traffic blog production
- Agencies offering automated SEO content services
- AI researchers building personal knowledge bots
- Writers automating first-draft generation with a real-world tone
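The Pinecone index must exist before the workflow runs, and its dimension has to match the embedding model. Below is a minimal standalone Node.js sketch using the `@pinecone-database/pinecone` client; the index name and serverless region are assumptions, so adjust them to your account.

```javascript
// Create the Pinecone index the workflow's vector-store nodes point at.
// Run once with PINECONE_API_KEY set in the environment (ESM module).
import { Pinecone } from '@pinecone-database/pinecone';

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });

await pc.createIndex({
  name: 'seo-knowledge-base',   // hypothetical — must match the n8n Pinecone nodes
  dimension: 768,               // 768 for Gemini Embedding 001; change per model
  metric: 'cosine',             // similarity search over blog content chunks
  spec: { serverless: { cloud: 'aws', region: 'us-east-1' } }, // adjust to your plan
});

console.log('Index created — point the n8n Pinecone nodes at it.');
```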
## Intelligent B2B lead generation workflow using Scrapeless and Claude
> ⚠️ Disclaimer: This workflow uses Scrapeless and Claude AI via community nodes, which require a self-hosted n8n instance to work properly.

---

### 🔁 How it works

This intelligent B2B lead generation workflow combines search automation, website crawling, AI analysis, and multi-channel output:

1. It starts by using Scrapeless's Deep SERP API to find company websites from targeted Google Search queries.
2. Each result is then crawled individually with Scrapeless's Crawler module, retrieving key business information from pages like /about, /contact, and /services.
3. The raw web content is processed by a Code node to clean, extract, and prepare structured data.
4. The cleaned data is passed to Claude Sonnet (Anthropic), which analyzes and qualifies each lead based on content richness, contact data, and relevance.
5. A filter step ensures only high-quality leads (e.g. lead score ≥ 6) are kept.
6. Qualified leads are sent via a Discord webhook for real-time notification (Discord can be replaced with Slack, email, or CRM tools).

> 📌 The result is a fully automated system that finds, qualifies, and organizes B2B leads with high efficiency and minimal manual input.

---

### ✅ Pre-conditions

Before using this workflow, make sure you have:

- An n8n self-hosted instance
- A Scrapeless account and API key
- An Anthropic Claude API key
- A configured Discord webhook URL (or an alternative notification service)

### ⚙️ Workflow overview

Manual Trigger → Scrapeless Google Search → Item Lists → Scrapeless Crawler → Code (Data Cleaning) → Claude Sonnet → Code (Response Parser) → Filter → Discord Notification

### 🔨 Step-by-step breakdown

1. **Manual Trigger**: for testing purposes (can be replaced with Cron or Webhook)
2. **Scrapeless Google Search**: queries target B2B topics via Scrapeless's Deep SERP API
3. **Item Lists**: splits search results into individual items
4. **Scrapeless Crawler**: visits each company domain and scrapes structured content
5. **Code node (Data Cleaner)**: extracts and formats content for LLM input
6. **Claude Sonnet (via HTTP Request)**: evaluates lead quality, relevance, and contact info
7. **Code node (Parser)**: parses Claude's JSON response (a parsing sketch follows below)
8. **IF Filter**: filters leads based on the score threshold
9. **Discord Webhook**: sends a formatted message with company info

---

### 🧩 Customization guidance

You can easily adjust the workflow to match your needs:

- **Lead criteria**: modify the Claude prompt and scoring logic in the Code node
- **Output channels**: replace the Discord webhook with Slack, Email, Airtable, or any CRM node
- **Search topics**: change the query in the Scrapeless SERP node to find leads in different niches or countries
- **Scoring threshold**: adjust the filter logic (`lead_score >= 6`) to match your quality tolerance

---

### 🧪 How to use

1. Insert your Scrapeless and Claude API credentials in the designated nodes
2. Replace or configure the Discord webhook (or alternative outputs)
3. Run the workflow manually (or schedule it)
4. View qualified leads directly in your chosen notification channel

---

### 📦 Output example

Each qualified lead includes:

- 🏢 Company name
- 🌐 Website
- ✉️ Email(s)
- 📞 Phone(s)
- 📍 Location
- 📈 Lead score
- 📝 Summary of relevant content

---

### 👥 Ideal users

This workflow is perfect for:

- AI SaaS companies targeting mid-market and enterprise leads
- Marketing agencies looking for qualified B2B leads
- Automation consultants building scraping solutions
- No-code developers working with n8n, Make, or Pipedream
- Sales teams needing enriched prospecting data
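The Response Parser is where this kind of pipeline tends to break, since LLM output is not guaranteed to be valid JSON. A minimal sketch of that Code node follows, assuming the Claude HTTP Request node returns the standard Anthropic Messages response (text under `content[0].text`) and that your prompt asks for a JSON object with a numeric `lead_score` field; adapt it to your actual prompt and response shape.

```javascript
// n8n Code node — sketch of the "Response Parser" step.
const raw = $input.first().json.content?.[0]?.text ?? '';

// Claude sometimes wraps JSON in a ```json fence; strip it before parsing.
const cleaned = raw.replace(/```json\s*|```/g, '').trim();

let lead;
try {
  lead = JSON.parse(cleaned);
} catch (err) {
  // Surface malformed responses instead of silently passing junk downstream.
  throw new Error(`Could not parse Claude response as JSON: ${err.message}`);
}

// Normalize the score so the IF node's `lead_score >= 6` check is reliable.
lead.lead_score = Number(lead.lead_score) || 0;

return [{ json: lead }];
```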