AI website scraper & company intelligence
AI Website Scraper & Company Intelligence
Description
This workflow automates the process of transforming any website URL into a structured, intelligent company profile. It's triggered by a form, allowing a user to submit a website and choose between a "basic" or "deep" scrape. The workflow extracts key information (mission, services, contacts, SEO keywords), stores it in a structured Supabase database, and archives a full JSON backup to Google Drive. It also features a secondary AI agent that automatically finds and saves competitors for each company, building a rich, interconnected database of company intelligence.
---
Quick Implementation Steps
1. Import the Workflow: Import the provided JSON file into your n8n instance.
2. Install Custom Community Node: You must install the community node from https://www.npmjs.com/package/n8n-nodes-crawl-and-scrape. Firecrawl's n8n documentation: https://docs.firecrawl.dev/developer-guides/workflow-automation/n8n
3. Install Additional Nodes: Install n8n-nodes-mcp, which is used for the Firecrawl MCP connection. (n8n-nodes-crawl-and-scrape is covered in the previous step.)
4. Set up Credentials: Create credentials in n8n for the Firecrawl API, Supabase, Mistral AI, and Google Drive.
5. Configure API Key (CRITICAL): Open the Web Search tool node. Go to Parameters → Headers and replace the hardcoded Tavily AI API key with your own.
6. Configure Supabase Nodes: Assign your Supabase credential to all Supabase nodes. Ensure table names (e.g., companies, competitors) match your schema.
7. Configure Google Drive Nodes: Assign your Google Drive credential to the “Google Drive2” and “save to Google Drive1” nodes. Select the correct Folder ID.
8. Activate Workflow: Turn on the workflow and open the Webhook URL in the “On form submission” node to access the form.
---
What It Does
- Form Trigger: Captures user input: “Website URL” and “Scraping Type” (basic or deep).
- Scraping Router: A Switch node routes the flow. Deep Scraping → AI-based MCP Firecrawler agent. Basic Scraping → Crawlee node.
- Deep Scraping (Firecrawl AI Agent): Uses Firecrawl and Tavily Web Search. Extracts a detailed JSON profile: mission, services, contacts, SEO keywords, etc.
- Basic Scraping (Crawlee): Uses the Crawl and Scrape node to collect raw text. A Mistral-based AI extractor structures the data into JSON.
- Data Storage: Stores structured data in Supabase tables (companies, company_basicprofiles). Archives a full JSON backup to Google Drive.
- Automated Competitor Analysis: Runs after a deep scrape. Uses Tavily web search to find competitors (e.g., from Crunchbase). Saves competitor data to Supabase, linked by company_id.
---
Who's It For
- Sales & Marketing Teams: Enrich leads with deep company info.
- Market Researchers: Build structured, searchable company databases.
- B2B Data Providers: Automate company intelligence collection.
- Developers: Use as a base for RAG or enrichment pipelines.
---
Requirements
- n8n instance (self-hosted or cloud)
- Supabase account with tables such as companies, competitors, social_links, etc.
- Firecrawl API key
- Mistral AI API key
- Google Drive credentials
- Tavily AI API key (optional)
- Community nodes: n8n-nodes-crawl-and-scrape and n8n-nodes-mcp
---
How It Works
Flow Summary
1. Form Trigger: Captures “Website URL” and “Scraping Type”.
2. Switch Node: deep → MCP Firecrawler (AI Agent); basic → Crawl and Scrape node.
3. Scraping & Extraction: Deep path: Firecrawler → JSON structure. Basic path: Crawlee → Mistral extractor → JSON.
4. Storage: Saves the JSON to Supabase and archives it in Google Drive.
5. Competitor Analysis (Deep Only): Finds competitors via Tavily and saves them to the Supabase competitors table.
6. End: Finishes with a No Operation node.
---
How To Set Up
1. Import the workflow JSON.
2. Install the community nodes (especially n8n-nodes-crawl-and-scrape from npm).
3. Configure credentials (Firecrawl, Supabase, Mistral AI, Google Drive).
4. Add your Tavily API key.
5. Connect the Supabase and Google Drive nodes properly.
6. Fix the disconnected “basic” path if needed by reconnecting the Switch node's “basic” output.
7. Activate the workflow.
8. Test via the webhook form URL.
---
How To Customize
- Change LLMs: Swap Mistral for OpenAI or Claude.
- Edit Scraper Prompts: Modify the system prompts in the AI agent nodes.
- Change Extraction Schema: Update the JSON Schema in the extractor nodes.
- Fix Relational Tables: Add a node that splits arrays into separate items (for example, Split Out) before the Supabase inserts. This way, array fields such as social links and keywords are written as one row per item.
- Enhance Automation: Add email/Slack notifications, or replace the form trigger with a Google Sheets trigger.
---
Add-ons
- Automated Trigger: Run on new sheet rows.
- Notifications: Email or Slack alerts after completion.
- RAG Integration: Use the Supabase database as a chatbot knowledge source.
---
Use Case Examples
- Sales Lead Enrichment: Instantly get company + competitor data from a URL.
- Market Research: Collect and compare companies in a niche.
- B2B Database Creation: Build a proprietary company dataset.
---
Troubleshooting Guide
| Issue | Possible Cause | Solution |
|-------|----------------|----------|
| Form Trigger 404 | Workflow not active | Activate the workflow |
| Web Search Tool fails | Missing Tavily API key | Replace the placeholder key with your own |
| FIRECRAWLER / find competitor fails | Missing MCP node | Install n8n-nodes-mcp |
| Basic scrape does nothing | Switch node path disconnected | Reconnect the “basic” output |
| Supabase node error | Wrong table/column names | Match the schema exactly |
---
Need Help or More Workflows?
Want to customize this workflow for your business or integrate it with your existing tools? Our team at Digital Biz Tech can tailor it precisely to your use case, from automation logic to AI-powered enhancements.
Contact: shilpa.raju@digitalbiz.tech
For more such offerings, visit us: https://www.digitalbiz.tech
---
Convert PDF to PDF/A using ConvertAPI
Who is this for?
For developers and organizations that need to convert PDF files to PDF/A for long-term archiving.
What problem is this workflow solving?
Converting standard PDF files into the archive-oriented PDF/A format.
What this workflow does
1. Downloads the PDF file from the web.
2. Converts the PDF file to PDF/A.
3. Stores the PDF/A file in the local file system.
How to customize this workflow to your needs
1. Open the HTTP Request node.
2. Adjust the URL parameter (all endpoints are listed in the ConvertAPI documentation).
3. Use your API Token for authentication. Pass the token in the Authorization header as a Bearer token. You can manage your API Tokens in the User panel → Authentication.
4. Optionally, add further Body Parameters for the converter.
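The Python sketch below approximates what the HTTP Request node sends, using only the standard library. It relies on several assumptions based on our understanding of ConvertAPI's v2 REST conventions:
- the endpoint URL `https://v2.convertapi.com/convert/pdf/to/pdfa`;
- the multipart field name `File`;
- the JSON response shape (`Files[0].FileData` as base64).

Verify all three against the ConvertAPI documentation before relying on them. The function names are hypothetical.

```python
import base64
import json
import urllib.request
import uuid

# Assumption: ConvertAPI v2 endpoint for PDF -> PDF/A; verify in the API docs.
CONVERT_URL = "https://v2.convertapi.com/convert/pdf/to/pdfa"


def build_pdfa_request(pdf_bytes: bytes, api_token: str,
                       filename: str = "input.pdf") -> urllib.request.Request:
    """Build a multipart/form-data POST that uploads one PDF for conversion."""
    boundary = uuid.uuid4().hex
    # One form part; the field name "File" is an assumption about the API.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="File"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        CONVERT_URL,
        data=head + pdf_bytes + tail,
        headers={
            # The API token is passed as a Bearer token in the Authorization header.
            "Authorization": f"Bearer {api_token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def pdf_url_to_pdfa_file(pdf_url: str, api_token: str, out_path: str) -> None:
    """Download a PDF, convert it to PDF/A, and save the result locally."""
    with urllib.request.urlopen(pdf_url) as resp:      # 1. download from the web
        pdf_bytes = resp.read()
    req = build_pdfa_request(pdf_bytes, api_token)
    with urllib.request.urlopen(req) as resp:          # 2. convert via the API
        payload = json.loads(resp.read())
    # Assumed response shape: {"Files": [{"FileData": "<base64>", ...}]}.
    pdfa_bytes = base64.b64decode(payload["Files"][0]["FileData"])
    with open(out_path, "wb") as fh:                   # 3. store on local disk
        fh.write(pdfa_bytes)
```

With a real URL and token, `pdf_url_to_pdfa_file` performs all three steps of the workflow. `build_pdfa_request` makes no network call, so you can inspect its headers and body offline before sending anything.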
Filter breaking geopolitical news with AI scoring & Telegram alerts
Geopolitics Breaking News Alert System
Workflow Name: Geopolitics Breaking News Alert System
Author: Devjothi Dutta
Category: Productivity, News & Media, AI/Machine Learning
Complexity: Medium
Setup Time: 45-60 minutes
---
📖 Description
An intelligent geopolitical monitoring system that filters 200+ daily news articles down to only the critical breaking news that matters to you. This workflow uses smart keyword filtering and AI-powered scoring to eliminate noise, reduce AI costs, and deliver only high-priority geopolitical alerts to Telegram.
The Problem: Traditional news monitoring is overwhelming. You face hundreds of articles per hour, 95% of them irrelevant to your region of interest, with no urgency prioritization, so critical breaking news gets buried in noise.
The Solution: This workflow combines dual-layer filtering (primary + secondary keywords) with AI scoring to distinguish actual breaking news from general news coverage. By filtering first and scoring second, you reduce AI API costs by 80-90% while ensuring you never miss critical geopolitical developments. Switch between monitoring India, China, Middle East, Russia-Ukraine, or any region by simply changing a configuration file.
Perfect for government analysts, corporate security teams, investment research firms, news organizations, or anyone who needs to stay informed about geopolitical developments without information overload.
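The filter-first idea can be sketched in a few lines of Python. This is a minimal illustration, not the workflow's actual Dynamic Filter node. The config keys `primary_keywords` and `secondary_keywords`, the `Article` type, and the sample headlines are all hypothetical. An article passes only if it mentions at least one primary (regional) keyword and at least one secondary (event-type) keyword. Only the survivors would go on to AI scoring.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Article:
    title: str
    description: str
    link: str


# Hypothetical config shape; the workflow's real JSON keys may differ.
CONFIG = {
    "primary_keywords": ["india", "modi", "delhi"],                # region/entity
    "secondary_keywords": ["military", "conflict", "trade deal"],  # event type
}


def matches_any(text: str, keywords: list[str]) -> bool:
    """Case-insensitive substring match against any keyword in the list."""
    lowered = text.lower()
    return any(kw.lower() in lowered for kw in keywords)


def dual_layer_filter(articles: list[Article], config: dict) -> list[Article]:
    """Keep articles mentioning >= 1 primary AND >= 1 secondary keyword."""
    kept = []
    for article in articles:
        text = f"{article.title} {article.description}"
        if matches_any(text, config["primary_keywords"]) and matches_any(
            text, config["secondary_keywords"]
        ):
            kept.append(article)
    return kept


feed = [
    Article("India and China hold border talks after military standoff", "", "https://example.com/1"),
    Article("Delhi weather update", "Heavy rain expected", "https://example.com/2"),
    Article("New trade deal signed in Brussels", "EU ministers meet", "https://example.com/3"),
]
to_score = dual_layer_filter(feed, CONFIG)  # only these would reach the AI scorer
```

Plain substring matching is cheap but loose: "modi" also matches "commodity". If false positives become a problem, word-boundary regular expressions such as `re.search(rf"\b{re.escape(kw)}\b", text)` are a stricter alternative.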
👥 Who's it for
For Government & Defense Analysts:
- Monitor specific regions for military actions, diplomatic developments, and security threats
- Filter by mission-critical keywords to eliminate irrelevant news
- AI scoring identifies genuine breaking news vs. routine coverage
- Reduce analyst workload by 90% through intelligent automation
For Corporate Security & Risk Teams:
- Track geopolitical risks affecting global supply chains and operations
- Custom keyword filters for industry-specific concerns (e.g., "semiconductor", "tariff", "sanctions")
- Timely alerts, at your configured schedule interval, for events impacting business continuity
- Cost-efficient monitoring with minimal AI API usage
For Investment Research Firms:
- Monitor emerging market geopolitical risks affecting portfolio companies
- AI scoring differentiates market-moving events from background noise
- Configurable alert thresholds based on investment strategy (conservative vs. aggressive)
- Track multiple regions simultaneously with different configs
For News Organizations & Journalists:
- Monitor breaking geopolitical developments for editorial coverage
- Filter by urgency to prioritize assignment desk resources
- Aggregate multiple international news sources in one place
- Extend alerts to newsroom Slack channels or email
✨ Key Features
- 🎯 Smart Dual-Layer Filtering - Primary keywords ensure regional relevance; secondary keywords filter by event type (military, diplomatic, economic)
- 🤖 AI-Powered Urgency Scoring - GPT-4o-mini scores articles 1-10 based on geopolitical urgency, distinguishing breaking news from routine coverage
- 💰 Cost-Efficient Design - The filter-first, score-second approach reduces AI API calls by 80-90% (only ~5 articles analyzed out of 200)
- 🌍 Multi-Region Support - Monitor India, China, Middle East, Russia-Ukraine, or any region by switching config files
- 📰 Multi-Source RSS Aggregation - Combines 6 international news sources (NYT, BBC, Al Jazeera, SCMP, regional feeds)
- 🔄 Duplicate Detection - Persistent storage prevents re-analyzing the same articles across multiple executions
- 📊 Consolidated Alerts - A single Telegram message with all breaking news, grouped by urgency score
- ⏰ Flexible Scheduling - Configure the trigger interval to your needs (15min for active conflicts, 3hr for routine monitoring)
- 💾 Config-Driven Architecture - All filters, keywords, and scoring rules live in a Google Drive JSON file
- 🔒 Production Ready - Tested end-to-end with real-world India and China configurations
- 📈 Scalable Design - Run multiple regional configs in parallel; extend to Slack/WhatsApp/Email delivery
🛠️ Requirements
Required Services:
- n8n (version 1.0+) - Workflow automation platform
  - Free tier: n8n cloud or self-hosted Docker
  - Required feature: Data Tables (for duplicate tracking)
- OpenAI API (GPT-4o-mini) - AI scoring engine
  - Cost: ~$0.10/day for 30min intervals
  - Free tier: $5 credit for new accounts
- Telegram Bot - Alert delivery
  - Free: Create via @BotFather on Telegram
  - Get the chat ID via @userinfobot
- Google Drive - Config file storage
  - Free: Any Google account
  - Used for publicly shared JSON config files
Required Credentials:
- OpenAI API Key - Get from platform.openai.com (GPT-4o-mini access)
- Telegram Bot Token - Create a bot via @BotFather and get its token
- n8n Data Table - Built-in n8n feature (no external credential)
Optional:
- Slack Webhook URL (for extending alerts to Slack)
- SMTP credentials (for email alerts)
- Twilio account (for WhatsApp/SMS alerts)
📦 What's Included
This workflow package includes:
- Complete n8n workflow JSON (ready to import)
- Complete setup guide - Detailed configuration with Data Table setup and troubleshooting
- Technical architecture documentation
- Use cases and customization guide
- 4 pre-built regional configs (India, China, Middle East, Russia-Ukraine)
🚀 Quick Start
Full setup takes 45-60 minutes.
For detailed step-by-step instructions, see SETUP_GUIDE.md.
Overview
1. Create an n8n Data Table (analyzed_articles with 2 columns)
2. Upload the config to Google Drive (choose a region, share publicly, get the file ID)
3. Import the workflow (22 nodes ready to configure)
4. Configure nodes:
   - Update the Google Drive config URL with your file ID
   - Update the 6 RSS Feed URLs for your region
   - Link the 3 Data Table nodes to the analyzed_articles table
5. Add credentials (OpenAI API, Telegram Bot)
6. Set the schedule (15min to daily, based on monitoring needs)
7. Test the workflow (verify that filtering, scoring, and alerts work)
8. Activate (the workflow runs automatically on schedule)
Quick Start Result:
✅ 200+ articles processed → 5-7 filtered → 3-5 scored → 1-3 alerts sent
✅ Telegram receives a consolidated breaking news message
✅ Workflow runs every 30min (or your chosen interval)
✅ Total monthly cost: $3-5 (OpenAI API only)
Need help? See the detailed SETUP_GUIDE.md for complete instructions with screenshots and troubleshooting.
📊 Workflow Stats
- Nodes: 22
- Complexity: Medium
- Execution Time: ~30-60 seconds per run
- Monthly Cost: $3-5 (OpenAI API usage only)
- Maintenance: Minimal (update RSS feeds if sources change)
- Scalability: Handles 200+ articles per execution; easily scales to 10+ RSS feeds
🎨 Customization Options
- Add more regions: Create new config JSON files for North Korea, Taiwan, Africa, Latin America, etc.
- Multi-channel alerts: Extend to Slack, WhatsApp, Email, Discord, Microsoft Teams, SMS
- Severity-based routing: Send critical alerts (score 9-10) via SMS and others to Telegram
- Custom scoring models: Switch between GPT-4o-mini, GPT-4o, and Claude based on config
- Exclude keywords: Add an "exclude_keywords" array to filter out sports, entertainment, weather
- Alert digest mode: Aggregate alerts into daily/weekly summary emails instead of real-time alerts
- Dashboard integration: Connect to Grafana or Metabase for visual trend analysis
- Webhook triggers: Use workflow output to trigger other n8n workflows or external systems
- Custom RSS feeds: Add industry-specific or regional news sources
- Adjust alert threshold: Change from score >= 6 to higher/lower based on notification preferences
🔧 How it Works
1. Schedule Trigger (Configurable): The workflow runs at your configured interval (15min, 30min, 1hr, 3hr, daily, etc.). Trigger frequency depends on the use case: active conflicts need more frequent monitoring.
2. Config Loading: An HTTP Request node fetches the JSON config from Google Drive. The config contains keywords, scoring rules, AI role, alert threshold, and Telegram chat ID.
3. RSS Aggregation:
   - 6 RSS Feed nodes fetch articles from international news sources.
   - A Merge node combines all feeds (~200 articles per execution).
   - An RSS Cleanup node strips HTML and normalizes each item to 5 fields (60-75% size reduction).
4. Smart Filtering (Cost Optimization Layer 1):
   - The Dynamic Filter checks PRIMARY keywords (geographic/entity: "india", "modi", "delhi") and SECONDARY keywords (event type: "military", "conflict", "trade deal").
   - Both conditions are required: an article must mention at least one primary AND one secondary keyword.
   - Result: 200 articles reduced to ~5-7 relevant articles (95% reduction).
   - Why this matters: it eliminates noise BEFORE expensive AI scoring.
5. Duplicate Detection (Cost Optimization Layer 2):
   - Queries the Data Table for previously analyzed article links and filters out articles already scored in the last 7 days.
   - Result: ~5-7 filtered articles reduced to ~3-5 new articles.
   - Why this matters: it prevents redundant AI API calls (saves 80% on repeat articles).
6. Dynamic AI Prompt Generation:
   - A Code node builds the system prompt from config.ai_role and config.scoring_criteria.
   - It instructs the AI: "You are a geopolitical analyst for [REGION]. Score articles 1-10..."
   - It includes the scoring rubric: 9-10 = Military Action, 7-8 = Trade/Economic, etc.
7. AI Urgency Scoring (Breaking News Detection):
   - The Breaking News Analyzer (GPT-4o-mini) scores each article's geopolitical urgency from 1 to 10, distinguishing genuine breaking news from routine coverage.
   - It returns score, category, reasoning, and should_alert (true/false based on the threshold).
   - Cost: ~$0.002 per article (only ~3-5 articles scored per execution).
8. Alert Decision: An IF node checks should_alert === true (score >= config.alert_threshold). Only high-priority alerts proceed to Telegram; articles below the threshold are logged but not sent.
9. Alert Aggregation: Consolidates multiple breaking news alerts into a single Telegram message, grouped by urgency score with color-coded emojis (🔴 9-10, 🟠 7-8, 🟡 6-7). Each alert includes its score, category, title, and link.
10. Telegram Delivery: Sends the consolidated alert to the configured Telegram chat, using HTML formatting for bold text and clickable links. The chat ID is loaded dynamically from the config (different regions → different chats).
💡 Pro Tips
- Start with a Higher Threshold: Begin with alert_threshold = 7 to avoid alert fatigue; lower it to 6 after tuning keywords
- Regional RSS Matters: Use region-specific news sources for better coverage (e.g., Times of India for India, not just BBC/NYT)
- Test Keywords First: Run the workflow manually with "Test Workflow" to verify keyword filtering before activating the schedule
- Monitor AI Costs: Check the OpenAI usage dashboard after the first week to confirm the ~$0.10/day cost estimate
- Tune Secondary Keywords: Add domain-specific terms to the secondary keywords (e.g., "semiconductor" for tech supply chain monitoring)
- Use Separate Configs for Critical Regions: Clone the workflow for high-priority regions instead of switching configs manually
- Schedule Based on Time Zones: Align execution intervals with business hours in the monitored region (e.g., 9AM-6PM IST for India)
- Clear Duplicates for Testing: Manually clear the analyzed_articles Data Table when testing new configs for fresh results
- Back Up Working Configs: Export and version-control config files before making major keyword changes
- Consider Alert Fatigue: Score 9-10 events are rare (0-1 per day), while score 6-8 events are common (2-5 per day); set the threshold accordingly
🔗 Related Workflows
- Multi-Region Geopolitics Dashboard - Combine multiple regional configs into a single monitoring dashboard
- Geopolitical Risk Scoring for Portfolios - Integrate with stock portfolio data to assess investment risk
- Automated Geopolitical Intelligence Reports - Generate daily/weekly PDF reports from breaking news data
- Conflict Escalation Tracker - Track score trends over time to detect escalating tensions
- Supply Chain Risk Alerting - Focus on trade/sanctions news affecting global supply chains
📧 Support & Feedback
For questions, issues, or feature requests:
- GitHub: n8n-geopolitics-breaking-news-alert Repository
- n8n Community Forum: Tag @devdutta
- Email: devjothi@gmail.com
📄 License
MIT License - Free to use, modify, and distribute
---
⭐ If you find this workflow useful, please share your feedback and star the workflow!
---
Discord to Google Sheets task manager with GPT prioritization and deep work focus
AI-Powered Discord Task Manager with Priority Intelligence
Mission-Aligned Task Tracker: Discord + AI + Google Sheets
Opening Summary
This n8n template demonstrates how to automate task management by syncing tasks from a Discord channel to Google Sheets, enriching them with AI-driven prioritization, and delivering a daily prioritized digest back to Discord. It streamlines task organization in line with your personal mission and productivity frameworks.
Use cases are many:
- Try managing your team’s project tasks by automatically prioritizing them based on strategic goals!
- Try personal task tracking with AI-powered prioritization for optimized daily productivity!
- Try automating follow-ups and completed task archiving seamlessly between Discord and Google Sheets!
Good to know
- Using the OpenAI GPT-4.1 mini and GPT-5 mini models may incur API costs based on usage (check your OpenAI pricing plan).
- The Google Sheets API has rate limits; large task volumes may require batch adjustments to avoid quota errors.
- Discord API OAuth2 authentication is needed, with permissions to read messages, add reactions, and post messages.
- The workflow requires a shared Google Sheets spreadsheet with sheets named Tasks and completed tasks (template link provided).
- Reaction emojis in Discord (✍️ for processed, ✅ for completed) are used to track task status within Discord.
- AI-driven prioritization follows mission alignment based on the Eisenhower Matrix, energy levels, and impact scoring.
- Uses concepts from:
  - Deep Work – Cal Newport
  - Essentialism: The Disciplined Pursuit of Less – Greg McKeown (2014)
  - Getting Results the Agile Way – J.D. Meier
  - Hyperfocus – Chris Bailey (2018)
  - Slow Productivity – Cal Newport (2024): Newport’s newest book, explicitly about doing fewer things, working at a natural pace, and obsessing over quality; basically Deep Work 2.0
How it works
- The Schedule Trigger fires hourly to initiate task syncing.
- The “Set discord IDs here” node defines the Discord server and channel IDs for input/output.
- The “get data - tasks Channel” node fetches all messages from the Discord input channel.
- “Loop Over Items1” and “if message is recorded already” prevent tasks from being reprocessed.
- “clean data” normalizes the message information into a uniform structure.
- The “ai task organizer” node sends each task text to an OpenAI GPT-4.1 mini agent. The agent analyzes the task and assigns priority, impact, energy level, category, and other metadata aligned to the user's mission.
- The “Append row in task sheet” node appends tasks to the Tasks sheet.
- “react to confirm” adds a reaction on Discord to mark the message as processed.
- “Get tasks to do” retrieves all in-progress tasks from Google Sheets for daily prioritization.
- The AI Agent (GPT-5 mini) analyzes the aggregated task data and selects the top 6 tasks (3 high-energy, 3 low-energy).
- The daily prioritized list is split into chunks that fit Discord's message size limit. The “Send a message” node sends them to the Discord output channel.
- The workflow checks for tasks with ✅ reactions in Discord (“get checked ones”) and updates their status to "Completed" in Google Sheets (“Update row in sheet”).
- Completed tasks are moved to the separate completed tasks sheet (“move completed rows to completed sheet”). They are then deleted from the active list (“delete completed rows”), in a loop until none remain.
- Wait nodes and limits control API call pacing and batch sizes.
How to use
- Set your Discord server and channel IDs for input (tasks-to-do) and output (my-prio-tasks-today) in the “Set discord IDs here” node.
- Connect your Google Sheets account and set the Spreadsheet ID in all relevant nodes. The spreadsheet must have Tasks and completed tasks sheets with the expected columns.
- Add your OpenAI API credentials for GPT-4.1 mini (task processing) and GPT-5 mini (daily digest).
- Ensure your Discord app has OAuth2 tokens with message read, react, and post permissions.
- Post tasks as messages in the configured Discord input channel.
- Run the workflow or activate it; it will sync, process, prioritize, and update tasks automatically on schedule.
- Customize the schedule trigger if you want more frequent or different syncing intervals.
Requirements
- A Discord account and bot/app with OAuth2 app credentials for message read, react, and post permissions.
- A Google Sheets account for task data storage, with a spreadsheet structured as specified (sheets: Tasks, completed tasks).
- An OpenAI API account with access to the GPT-4.1 mini and GPT-5 mini models for AI task analysis and summarization.
- Google Sheets OAuth2 credentials configured in n8n.
- Correctly set Discord server and channel IDs in the workflow.
Customising this workflow
- Try adding support for multiple Discord servers or channels to centralize tasks from different teams or projects.
- Extend the AI prompts to include deadline parsing or automated reminders.
- Customize the Google Sheets columns or the scoring logic to fit your own productivity frameworks or KPIs.
- Add notifications via email or Slack based on task priority or completion.
- Replace Google Sheets with another database if you need more scalable storage.
- Adjust the energy level and impact criteria in the AI prompts to match your personal productivity rhythms.
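In the workflow, the GPT-5 mini agent makes the daily selection, and the list is split before the “Send a message” node. The Python below is a deterministic stand-in for that step, not the workflow's actual logic. It assumes each task already carries the `priority` and `energy` fields produced by the “ai task organizer” node; the `Task` type and all function names are hypothetical. The sketch does two things:
- it picks the three highest-priority high-energy tasks and the three highest-priority remaining tasks;
- it splits the digest on line boundaries so that every chunk stays within Discord's 2,000-character message limit.

```python
from __future__ import annotations

from dataclasses import dataclass

DISCORD_LIMIT = 2000  # Discord's per-message content limit, in characters


@dataclass
class Task:
    title: str
    priority: int  # e.g. 0-100, as assigned by the AI organizer (assumed field)
    energy: str    # "High", "Medium" or "Low" (assumed values)


def pick_daily_six(tasks: list[Task]) -> tuple[list[Task], list[Task]]:
    """Deterministic stand-in for the AI agent: top 3 high-energy + top 3 others."""
    by_priority = sorted(tasks, key=lambda t: t.priority, reverse=True)
    high = [t for t in by_priority if t.energy == "High"][:3]
    low = [t for t in by_priority if t.energy != "High"][:3]
    return high, low


def format_digest(high: list[Task], low: list[Task]) -> str:
    """Render a plain-text digest with a deep-work block and a downtime block."""
    lines = ["🔥 Today's Agenda", "⚡ Morning Deep Work Blocks (High Energy Required)"]
    lines += [f"{i}. {t.title} (Priority: {t.priority})" for i, t in enumerate(high, 1)]
    lines.append("🎯 DOWNTIME BLOCK (Low-Medium Energy)")
    lines += [f"{i}. {t.title} (Priority: {t.priority})" for i, t in enumerate(low, 1)]
    return "\n".join(lines)


def chunk_message(text: str, limit: int = DISCORD_LIMIT) -> list[str]:
    """Split on line boundaries so each chunk fits in one Discord message."""
    chunks: list[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # Hard-split any single line that is itself longer than the limit.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        # Start a new chunk if appending this line would overflow the current one.
        if len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


tasks = [
    Task("Improve personal portfolio", 100, "High"),
    Task("Complete n8n courses", 88, "High"),
    Task("Finish debt tracker template", 86, "High"),
    Task("Draft blog outline", 70, "High"),
    Task("Publish n8n workflow", 96, "Medium"),
    Task("Watch new movie", 20, "Low"),
]
high, low = pick_daily_six(tasks)
messages = chunk_message(format_digest(high, low))
```

Splitting on line boundaries keeps each task entry readable. Only a single line that is longer than the limit is cut mid-line.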
Sample inputs:
- "publish tasks tracker asap"
- "Improve personal portfolio asap"
- "Watch new movie - Jujutsu Kaisen"
Sample output:
🔥 Today's Agenda
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚡ Morning Deep Work Blocks (High Energy Required)
Do these during peak hours (6 AM to 2 PM or whenever you're sharpest)
1️⃣ Improve personal portfolio for job applications
💡 Why: Critical for landing > 50k automation job - enhances job application success
📊 Priority: 100 | Impact: 10/10
Link: (https://discord.com/channels/1373770435146689/1481777943919293/1440107502032)
2️⃣ Complete all Udemy n8n courses ASAP
💡 Why: Essential skill for landing >50k automation job and digital product creation
📊 Priority: 88 | Impact: 9/10
Link: (https://discord.com/channels/137770435134668/14348177539192/14365639629204)
3️⃣ Finish finance debt tracker and publish as template
💡 Why: Generates immediate income potential and supports income generation
📊 Priority: 86 | Impact: 9/10
Link: (https://discord.com/channels/1373767704351346/1434817779453919/1436445965471973)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🎯 DOWNTIME BLOCK (Low-Medium Energy)
Do these during energy dips (post-lunch, late afternoon, tired moments)
1️⃣ Call Bank collections agency (09277055515 / 09559050973)
💡 Why: Limited-time offer; could significantly monthly payments
📊 Priority: 96 | Impact: 10/10 | Energy: Medium
Link: (https://discord.com/channels/13767704351/14348177453/14379926889894)
2️⃣ Publish n8n workflow and submit for verification
💡 Why: Enables earning from this and future n8n workflows — immediate income opportunity
📊 Priority: 96 | Impact: 9/10 | Energy: Medium
Link: (https://discord.com/channels/1373767435134/14347794539/143810998822)
3️⃣ Plan and pay Loan
💡 Why: Immediate debt payments reduce penalties/interest and support financial stability
📊 Priority: 96 | Impact: 10/10 | Energy: Medium
Link: (https://discord.com/channels/1373767704351346/143481777945391/14382823609982)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💪 EXECUTION STRATEGY
🛡️ Morning Block: Protect this ruthlessly — no meetings, no social, deep work only.
⚙️ Downtime Block: Tackle these during lower-energy windows; they move the money/debt needle without burning you out.
🎯 Win Condition: Complete all 6 = massive progress toward landing high-paying work and eliminating high-priority debt.
Protect your morning deep work at all costs — it's your leverage. You're building financial freedom one prioritized action at a time.
If you finish all 6 today: your portfolio and skills will be significantly closer to landing high-paying work, and you'll make a major dent in urgent debt obligations.
Questions?
If you have questions or need help with this workflow, feel free to reach out:
elijahmamuri@gmail.com
elijahfxtrading@gmail.com