Extract website intelligence & classify ecommerce URLs with Gemini & Firecrawl to Google Sheets
Description
This n8n template automates website analysis and ecommerce URL classification using AI. It scrapes a website, extracts business intelligence, maps all internal pages, and categorises them into products, categories, or non-commerce pages. All outputs are saved in Google Sheets for easy access.
Use cases
- Lead enrichment for sales and marketing teams
- Ecommerce product & category discovery
- Competitor website analysis
- Website audits and content mapping
- Market and industry research
How it works
- A user submits a website URL via an n8n form.
- The homepage is scraped and cleaned.
- AI extracts company insights (value proposition, industry, audience, B2B/B2C).
- Firecrawl maps all internal URLs.
- URLs are enriched with metadata.
- AI classifies each URL as product, category, or other.
- Results are written into structured Google Sheets tabs.
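The rule-based part of step 6 can be pictured as a simple URL heuristic. This is a hypothetical sketch: the path patterns below are assumptions, not the template's exact rules, which live in the workflow's AI prompt and Code nodes.

```javascript
// Hypothetical sketch of rule-based URL pre-classification.
// The path patterns are assumptions; adjust them to the sites you target.
function classifyUrl(url) {
  const path = new URL(url).pathname.toLowerCase();
  if (/\/(product|products|item|p)\//.test(path)) return "product";
  if (/\/(category|categories|collections|c)\//.test(path)) return "category";
  return "other";
}

console.log(classifyUrl("https://example.com/products/blue-shirt")); // "product"
console.log(classifyUrl("https://example.com/collections/shoes"));   // "category"
console.log(classifyUrl("https://example.com/about"));               // "other"
```

Pre-classifying obvious URLs this way keeps LLM calls (and cost) down; only ambiguous paths need to reach the model.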
How to use
- Import the workflow into n8n.
- Connect Google Sheets, Firecrawl, and AI credentials.
- Update the Google Sheets document links.
- Open the form URL and submit a website.
- Let the workflow run and review the results in Sheets.
Requirements
- n8n (self-hosted or cloud)
- Firecrawl API key
- Google Gemini or compatible LLM credentials
- Google Sheets account
Customising this workflow
- Change AI prompts to match your niche (SaaS, ecommerce, services).
- Add filters to exclude unwanted URLs (blogs, legal pages, etc.).
- Extend Sheets with scoring, tagging, or lead qualification logic.
- Replace the LLM with another supported model if needed.
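The URL-filtering customisation above could be a small Code-node step placed before classification. This is a sketch under assumptions: the exclusion list is illustrative, not part of the template.

```javascript
// Hypothetical n8n Code-node sketch: drop unwanted URLs before AI classification.
// The exclusion patterns are assumptions; extend them for your niche.
const EXCLUDED = [/\/blog\//, /\/legal\//, /\/privacy/, /\/terms/, /\/careers/];

function filterUrls(urls) {
  return urls.filter((u) => !EXCLUDED.some((re) => re.test(new URL(u).pathname)));
}

console.log(filterUrls([
  "https://example.com/products/shoe",
  "https://example.com/blog/news",
  "https://example.com/privacy-policy",
])); // keeps only the product URL
```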
What this template demonstrates
- End-to-end website intelligence extraction
- Rule-guided AI classification that constrains the model's outputs and reduces hallucinations
- Scalable URL processing with batching
- Clean data pipelines into Google Sheets
- Practical AI usage for real business workflows
This template is designed to work out-of-the-box for website intelligence, ecommerce mapping, and lead research.
Feel free to reach out for custom implementation or enhancements:
📧 Email: dinakars2003@gmail.com
n8n Website Intelligence and E-commerce URL Classifier
This n8n workflow extracts information from submitted URLs, classifies them using Google Gemini, and then stores the results in a Google Sheet. It's designed to automate the process of gathering website intelligence and categorizing e-commerce URLs.
What it does
This workflow automates the following steps:
- Triggers on Form Submission: Initiates when a URL is submitted via an n8n form.
- Fetches Website Content: Uses an HTTP Request node to retrieve the HTML content of the submitted URL.
- Extracts Key Information: Employs an HTML Extract node to pull specific data from the website's HTML.
- Prepares Data for AI: A Code node processes the extracted data, formatting it into a prompt payload for the AI model.
- Classifies with Google Gemini: Sends the prepared data to Google Gemini for classification, determining whether the site is an e-commerce site and categorizing its type.
- Stores Results in Google Sheets: Appends the original URL, extracted data, and Gemini's classification to a Google Sheet.
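The "Prepares Data for AI" step can be sketched as a small Code-node transform. This is a hypothetical example: the field names (`title`, `metaDescription`, `headings`) are assumptions based on a typical HTML Extract configuration, not the template's exact schema.

```javascript
// Hypothetical Code-node sketch: turn extracted HTML fields into a compact
// prompt payload for Gemini. Field names are assumptions.
function buildPrompt(extracted) {
  const { url, title = "", metaDescription = "", headings = [] } = extracted;
  return [
    `URL: ${url}`,
    `Title: ${title.trim()}`,
    `Description: ${metaDescription.trim()}`,
    `Headings: ${headings.slice(0, 10).join(" | ")}`,
    "Classify this website as 'E-commerce', 'Blog', or 'Company Homepage'.",
  ].join("\n");
}

console.log(buildPrompt({
  url: "https://example.com",
  title: " Example Shop ",
  metaDescription: "Buy things online.",
  headings: ["New arrivals", "Best sellers"],
}));
```

Keeping the payload short and structured like this makes the classification cheaper and the model's answers easier to parse downstream.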
Prerequisites/Requirements
To use this workflow, you will need:
- n8n Instance: A running n8n instance.
- Google Sheets Account: To store the extracted and classified data.
- Google Gemini API Key: For the AI classification step.
- Basic understanding of CSS Selectors: For configuring the HTML Extract node to target specific website elements.
Setup/Usage
- Import the Workflow:
- Download the provided JSON file.
- In your n8n instance, go to "Workflows" and click "New".
- Click the three dots menu (⋮) in the top right and select "Import from JSON".
- Paste the workflow JSON or upload the file.
- Configure Credentials:
- Google Sheets: Set up a Google Sheets credential to allow n8n to write data to your spreadsheet.
- Google Gemini: Configure a Google Gemini credential with your API key.
- Configure Nodes:
- On form submission (Form Trigger): No specific configuration needed beyond the default, unless you want to customize the form fields.
- HTTP Request: This node will dynamically fetch the URL provided by the form trigger. No changes are typically needed here.
- HTML Extract: Configure the CSS selectors to extract the specific data points you need from the websites (e.g., product titles, descriptions, prices, categories).
- Code: Review the JavaScript code to ensure it processes the extracted data as expected for your classification needs. You might need to adjust it based on the output of the HTML Extract node.
- Google Gemini: Configure the prompt for Gemini to accurately classify the website content (e.g., "Classify this website as 'E-commerce', 'Blog', 'Company Homepage', etc. If e-commerce, provide 3 potential product categories.").
- Google Sheets: Specify the Spreadsheet ID and Sheet Name where the data should be written. Map the input fields from previous nodes to the columns in your Google Sheet.
- Activate the Workflow: Once configured, activate the workflow.
- Submit URLs: Use the n8n Form Trigger URL to submit website URLs for processing.
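The final Google Sheets append in the steps above amounts to mapping earlier node outputs onto sheet columns. This is a hypothetical sketch: the column names are assumptions and should be matched to your actual sheet headers.

```javascript
// Hypothetical sketch: shape one row for the Google Sheets node.
// Column names are assumptions; align them with your sheet's headers.
function toSheetRow(url, extracted, classification) {
  return {
    URL: url,
    Title: extracted.title || "",
    Classification: classification.type || "Unknown",
    Categories: (classification.categories || []).join(", "),
    ProcessedAt: new Date().toISOString(),
  };
}

const row = toSheetRow(
  "https://example.com",
  { title: "Example Shop" },
  { type: "E-commerce", categories: ["Shoes", "Bags"] }
);
console.log(row);
```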