Indeed job scraper with AI filtering & company research using Apify and Tavily
This workflow contains community nodes that are only compatible with the self-hosted version of n8n.
This workflow scrapes job listings from Indeed via Apify, automatically retrieves the resulting dataset, extracts information about each listing, filters jobs by relevance, finds a decision maker at the company, and updates a database (Google Sheets) with that information for outreach. All you need to do is run the Apify actor; the database then updates with the processed data.
Benefits:
- Complete Job Search Automation - A webhook monitors the Apify actor, which triggers the integration and starts the process
- AI-Powered Filter - Uses ChatGPT to analyze content and context, identify company goals, and filter based on the job description
- Smart Duplicate Prevention - Automatically tracks processed job listings in a database to avoid redundancy
- Multi-Platform Intelligence - Combines Indeed scraping with web research via Tavily and enriches each listing
- Niche Focus - Processes content from multiple niches (6 currently, hardcoded), but the prompt of the "job filter" node can be changed to fit other niches
How It Works:
- Indeed Job Discovery:
- Search Indeed and apply filters for relevant job listings, then copy the results URL and use it in Apify
- Uses Apify's Indeed job scraper to scrape job listings from the URL of interest
- Automatically scrapes the information, stores it in a dataset, and triggers the integration that sends an HTTP request to the n8n webhook
- Incoming Data Processing:
- Loops over up to 500 items (configurable) with a batch size of 55 items (configurable) to avoid running into API timeouts
- Multiple filters ensure every listing is scraped with the required metrics (the company website must exist and the number of employees must be under 250); a hedged sketch of this filtering logic appears after this section
- Duplicate job listings are removed from the incoming batch before processing
- Job Analysis & Filter:
- An additional filter removes any job listing from the incoming batch that already exists in the Google Sheets database
- All new job listings are then passed to ChatGPT, which uses the job post and description to determine whether the listing is relevant to us
- Each job gets a new field, "verdict" (true or false), and we keep only the listings where verdict is true
- Enrich & Update Database:
- Uses Tavily to search for a decision maker (it doesn't always find one) and populates a row in the Google Sheet with information about the job listing, the company, and a decision maker at that company
- Waits 1 minute and 30 seconds to avoid Google Sheets and ChatGPT API timeouts, then loops back to the next batch to start filtering again until all job listings are processed
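A minimal sketch of the pre-filters and in-batch duplicate removal described above, written as an n8n Code node ("Run Once for All Items"). The field names (companyLinks/corporateWebsite, companyNumEmployees, jobUrl) are taken from the column list below; adjust them if your Apify dataset uses different keys.

```javascript
// Hedged sketch, not the exact node from the workflow: keep only listings that
// have a company website and fewer than 250 employees, then drop duplicates
// within the incoming batch keyed on jobUrl.
const seen = new Set();
const filtered = [];

for (const item of $input.all()) {
  const job = item.json;
  const website = job['companyLinks/corporateWebsite'];
  const employees = Number(job.companyNumEmployees);

  // Required metrics: website must exist and the company must have < 250 employees
  if (!website || !Number.isFinite(employees) || employees >= 250) continue;

  // In-batch duplicate prevention (the Google Sheets check happens later)
  if (seen.has(job.jobUrl)) continue;
  seen.add(job.jobUrl);

  filtered.push(item);
}

return filtered;
```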
Required Google Sheets Database Setup:
Before running this workflow, create a Google Sheets database with these exact column headers.
Essential Columns:
- jobUrl - Unique identifier for job listings
- title - Position title
- descriptionText - Description of the job listing
- hiringDemand/isHighVolumeHiring - Are they hiring at high volume?
- hiringDemand/isUrgentHire - Are they hiring with high urgency?
- isRemote - Is this job remote?
- jobType/0 - Job type: in person, remote, part-time, etc.
- companyCeo/name - CEO name collected from Tavily's search
- icebreaker - Column for holding custom icebreakers for each job listing (not completed in this workflow; I will upload another that does this, called "Personalized IJSFE")
- scrapedCeo - CEO name collected from the Apify scraper
- email - Email listed on the job listing
- companyName - Name of the company that posted the job
- companyDescription - Description of the company that posted the job
- companyLinks/corporateWebsite - Website of the company that posted the job
- companyNumEmployees - Number of employees the company says it has
- location/country - Location where the job is to take place
- salary/salaryText - Salary on the job listing
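As a hedged convenience (not part of the workflow itself), the expected header row can be captured as a constant and checked against whatever a Google Sheets read returns; headerRow here is a hypothetical input.

```javascript
// The exact column names this workflow's Google Sheets nodes expect in row 1.
const REQUIRED_HEADERS = [
  'jobUrl', 'title', 'descriptionText',
  'hiringDemand/isHighVolumeHiring', 'hiringDemand/isUrgentHire',
  'isRemote', 'jobType/0', 'companyCeo/name', 'icebreaker', 'scrapedCeo',
  'email', 'companyName', 'companyDescription',
  'companyLinks/corporateWebsite', 'companyNumEmployees',
  'location/country', 'salary/salaryText',
];

// Returns the names of any columns missing from a header row read off the sheet.
function findMissingHeaders(headerRow) {
  return REQUIRED_HEADERS.filter((name) => !headerRow.includes(name));
}
```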
Setup Instructions:
- Create a new Google Sheet with these column headers in the first row
- Name the sheet whatever you please
- Connect your Google Sheets OAuth credentials in n8n
- Update the document ID in the workflow nodes
The merge logic relies on the jobUrl column (the unique identifier) to prevent duplicate processing, so this structure is essential for the workflow to function correctly. Feel free to reach out for additional help or clarification at my Gmail address, terflix45@gmail.com, and I'll get back to you as soon as I can.
Set Up Steps:
- Configure Apify Integration:
- Sign up for an Apify account and obtain API key
- Get the Indeed job scraper actor and use Apify's integration to send an HTTP request to your n8n webhook (if the test URL doesn't work, use the production URL)
- Use the Apify node with Resource: Dataset and Operation: Get items, and use your API key as your credentials
- Set Up AI Services:
- Add OpenAI API credentials for job filtering
- Add Tavily API credentials for company research
- Set up appropriate rate limiting for cost control
- Database Configuration:
- Create Google Sheets database with provided column structure
- Connect Google Sheets OAuth credentials
- Configure the merge logic for duplicate detection
- Content Filtering Setup:
- Customize the AI prompts for your specific niche, requirements, or interests
- Adjust the filtering criteria to fit your needs (hedged example sketches of the Apify dataset fetch, the ChatGPT relevance check, and the Tavily decision-maker search follow these steps)
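If you'd rather pull the dataset with a Code node or HTTP Request node instead of the Apify node, a hedged sketch using Apify's dataset-items endpoint is shown below. The path to the dataset ID inside the webhook payload and the APIFY_TOKEN placeholder are assumptions; check what your Apify integration actually sends and use n8n credentials rather than a hardcoded key.

```javascript
// Hedged sketch: fetch the scraped job listings from the Apify dataset that the
// webhook call refers to, and emit one n8n item per listing.
const payload = $input.first().json;

// Apify integration webhooks usually carry the run's default dataset ID;
// adjust this path to match the payload you actually receive (assumption).
const datasetId = payload.resource?.defaultDatasetId || payload.datasetId;
const token = 'APIFY_TOKEN'; // placeholder - store as an n8n credential in practice

const url = `https://api.apify.com/v2/datasets/${datasetId}/items?format=json&clean=true&token=${token}`;
const jobs = await this.helpers.httpRequest({ method: 'GET', url, json: true });

return jobs.map((job) => ({ json: job }));
```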
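The relevance check in the "job filter" step amounts to asking the model for a true/false verdict per listing. A hedged sketch of that call from a Code node follows; the model name, prompt wording, and OPENAI_API_KEY placeholder are illustrative assumptions, since the workflow itself uses the OpenAI node with your own prompt.

```javascript
// Hedged sketch: attach a boolean "verdict" field so a downstream Filter node
// can keep only the listings where verdict is true.
const job = $input.first().json;

const res = await this.helpers.httpRequest({
  method: 'POST',
  url: 'https://api.openai.com/v1/chat/completions',
  headers: { Authorization: 'Bearer OPENAI_API_KEY' }, // placeholder - use n8n credentials
  body: {
    model: 'gpt-4o-mini', // placeholder - use whichever model your OpenAI node is set to
    response_format: { type: 'json_object' },
    messages: [
      {
        role: 'system',
        content:
          'You decide whether a job listing is relevant to our niche. ' +
          'Reply with JSON: {"verdict": true} or {"verdict": false}.',
      },
      {
        role: 'user',
        content: `Title: ${job.title}\nDescription: ${job.descriptionText}`,
      },
    ],
  },
  json: true,
});

const verdict = JSON.parse(res.choices[0].message.content).verdict === true;
return [{ json: { ...job, verdict } }];
```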
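The decision-maker lookup is a single Tavily search per company; it doesn't always surface a name, which is why the companyCeo/name column can stay empty. A hedged sketch follows; the query wording, the decisionMakerAnswer field name, and the TAVILY_API_KEY placeholder are assumptions, and you should check Tavily's current docs for the preferred authentication style.

```javascript
// Hedged sketch: ask Tavily for a likely decision maker at the hiring company.
const job = $input.first().json;

const res = await this.helpers.httpRequest({
  method: 'POST',
  url: 'https://api.tavily.com/search',
  body: {
    api_key: 'TAVILY_API_KEY', // placeholder - Tavily also accepts a Bearer header
    query: `Who is the CEO or hiring decision maker at ${job.companyName}?`,
    search_depth: 'basic',
    include_answer: true,
    max_results: 3,
  },
  json: true,
});

// Tavily's "answer" field (when include_answer is true) is a short synthesized answer.
return [{ json: { ...job, decisionMakerAnswer: res.answer || '' } }];
```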
n8n Workflow: Indeed Job Scraper with AI Filtering & Company Research
This n8n workflow automates the process of finding relevant job postings on Indeed, filtering them using AI, and enriching the data with company research. It helps you focus on the most promising opportunities by leveraging a combination of web scraping, AI-powered analysis, and external data sources.
What it does
This workflow performs the following key steps:
- Triggers Manually: The workflow is designed to be triggered manually, allowing you to initiate the job search when needed.
- Reads Job Data (Google Sheets): It connects to a Google Sheets document, presumably to retrieve scraped job listings, job search parameters, or a list of companies to target.
- Filters Data (Placeholder): An "If" node is present, suggesting conditional logic for filtering data, though the specific conditions are not defined in the provided JSON. This could be used for initial screening based on simple criteria.
- Prepares Data for AI (Edit Fields): A "Set" node is used to manipulate or prepare the data, likely extracting specific fields or formatting them for subsequent AI processing.
- Loops Over Items: The "Split in Batches" node indicates that the workflow will process job listings or company data in batches, likely to manage API limits or improve performance for subsequent steps.
- Performs AI Analysis (OpenAI): An OpenAI node is included, suggesting that AI is used to analyze job descriptions, company information, or other data points. This could involve filtering based on relevance, sentiment analysis, or extracting key information.
- Merges Data: A "Merge" node combines data streams, likely bringing together the original job posting data with the results of the AI analysis or company research.
- Removes Duplicates: The "Remove Duplicates" node ensures that only unique job postings or company entries are processed further, preventing redundant operations.
- Filters Again (Placeholder): Another "Filter" node is present, indicating a second stage of filtering. This could be used to narrow down results based on the AI's output or more refined criteria.
- Waits (Placeholder): A "Wait" node is included, which might be used to introduce delays between API calls or processing steps to avoid rate limits or allow for asynchronous operations.
- Final Output (Google Sheets): The final step writes the processed and filtered job data, potentially including AI insights and company research, back to a Google Sheet.
Prerequisites/Requirements
To use this workflow, you will need:
- n8n Instance: A running instance of n8n.
- Google Sheets Account: Configured credentials for Google Sheets to read initial data and write results.
- OpenAI API Key: Credentials for OpenAI to utilize its AI models for filtering and analysis.
- Indeed Scraper (External): While not explicitly shown as an n8n node, the workflow name suggests an external Indeed job scraper is used to feed data into the Google Sheet or directly into the workflow via the initial Google Sheets read operation.
- Apify/Tavily Accounts (Implied): The directory name "Indeed Job Scraper with AI Filtering & Company Research using Apify and Tavily" suggests that Apify (for scraping) and Tavily (for company research) might be used, although their specific nodes are not present in the provided JSON. It's possible these services are integrated via the Google Sheets input/output or custom HTTP requests not detailed in the node definitions.
Setup/Usage
- Import the workflow: Download the provided JSON and import it into your n8n instance.
- Configure Credentials:
- Set up your Google Sheets credentials. Ensure the Google Sheet used for input/output is accessible and correctly formatted.
- Provide your OpenAI API Key in the OpenAI node's credentials.
- Review Node Configurations:
- Google Sheets: Verify the spreadsheet ID, sheet name, and operation (e.g., "Read" for input, "Append" for output) are correctly configured.
- Edit Fields (Set): Adjust the fields being set or modified to match your data structure and AI requirements.
- If/Filter Nodes: Define the specific conditions for filtering job postings or company data based on your criteria.
- OpenAI: Configure the OpenAI model, prompt, and any other parameters for your AI filtering and analysis tasks.
- Loop Over Items (Split in Batches): Adjust the batch size if necessary.
- Activate the workflow: Once configured, activate the workflow.
- Trigger Manually: Execute the workflow manually to start the job scraping, filtering, and research process.