AI agent that can scrape webpages
This template is a proof of concept (PoC) of a ReAct AI agent capable of fetching arbitrary web pages, not only Wikipedia or Google search results.
In the top part of the workflow, a manual chat node is connected to a LangChain ReAct Agent. The agent has access to a workflow tool for getting page content.
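For readers new to the ReAct pattern: the model alternates between reasoning and tool calls, naming a tool and its input, and the framework runs the tool and feeds the result back as an observation. The sketch below shows only that dispatch step. It assumes LangChain-style `Action:` / `Action Input:` lines; the tool name `get_webpage` and the stub function are hypothetical stand-ins for the n8n workflow tool, and no LLM is called.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Assumed output format (LangChain-style ReAct):
#   Action: get_webpage
#   Action Input: url=https://example.com&method=full
ACTION_RE = re.compile(r"^Action:[ \t]*(.+?)[ \t]*$", re.MULTILINE)
INPUT_RE = re.compile(r"^Action Input:[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def parse_action(llm_output: str) -> Optional[Tuple[str, str]]:
    """Return (tool_name, tool_input) if the model requested a tool, else None."""
    action = ACTION_RE.search(llm_output)
    if action is None:
        return None  # no tool call: the text is treated as a final answer
    tool_input = INPUT_RE.search(llm_output)
    return action.group(1), (tool_input.group(1) if tool_input else "")


def run_step(llm_output: str, tools: Dict[str, Callable[[str], str]]) -> str:
    """Run one step: return the tool's observation, or the text if it is a final answer."""
    parsed = parse_action(llm_output)
    if parsed is None:
        return llm_output
    name, tool_input = parsed
    tool = tools.get(name)
    if tool is None:
        return f"Error: unknown tool {name!r}"
    return tool(tool_input)


def fake_get_webpage(query: str) -> str:
    """Hypothetical stub standing in for the n8n workflow tool."""
    return f"(page content for {query})"


step = "Thought: I need the page.\nAction: get_webpage\nAction Input: url=https://example.com"
print(run_step(step, {"get_webpage": fake_get_webpage}))
# prints: (page content for url=https://example.com)
```

In the real workflow the observation string would be appended to the conversation and the model called again until it produces a final answer.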
Page content extraction starts by converting the query parameters into a JSON object. There are three predefined parameters:
- url: the address of the page to fetch
- method: full / simplified
- maxlimit: the maximum length of the final page content. For longer pages, an error message is returned to the agent
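A rough Python equivalent of this conversion step, using the standard `urllib.parse.parse_qs`, is sketched below. It is not the template's own node logic: the function name `parse_tool_query`, the validation rules, and the fallback `method` of `full` are assumptions, while the 70000 default for `maxlimit` is the CONFIG value given in this description.

```python
from urllib.parse import parse_qs

VALID_METHODS = {"full", "simplified"}
DEFAULT_MAXLIMIT = 70000  # CONFIG default stated in the template description


def parse_tool_query(query: str) -> dict:
    """Convert the agent's query string into a parameter dict with defaults."""
    # parse_qs maps each key to a list of values; keep the first one.
    raw = {key: values[0] for key, values in parse_qs(query).items()}
    if "url" not in raw:
        raise ValueError("missing required parameter: url")
    method = raw.get("method", "full")  # assumption: 'full' when omitted
    if method not in VALID_METHODS:
        raise ValueError(f"unsupported method: {method}")
    # int() raises ValueError for non-numeric input, which is also reported.
    maxlimit = int(raw.get("maxlimit", DEFAULT_MAXLIMIT))
    return {"url": raw["url"], "method": method, "maxlimit": maxlimit}


print(parse_tool_query("url=https%3A%2F%2Fexample.com%2Fpage&method=simplified"))
# {'url': 'https://example.com/page', 'method': 'simplified', 'maxlimit': 70000}
```

Raising on bad input lets the error text be returned to the agent, which can then retry with corrected parameters.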
Page content fetching is a multistep process:
- An HTTP Request node tries to get the page content.
If the page content is successfully retrieved, a series of post-processing steps begins:
- Extract the HTML BODY content
- Remove all unnecessary tags to reduce the page size
- Further eliminate external URLs and IMG src values (depending on the method query parameter)
- Convert the remaining HTML to Markdown, reducing the page length even more while preserving the basic page structure
- Send the remaining content back to the agent if it is not too long (maxlimit = 70000 by default; see the CONFIG node).
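The post-processing chain can be approximated with the Python standard library, as sketched below. This is an illustration under stated assumptions, not the template's implementation: regular expressions stand in for a real HTML parser, the set of removed tags is a guess, the Markdown conversion covers only headings, links, images, list items and line breaks, and treating `simplified` as the mode that drops link targets and image sources is an assumption.

```python
import html
import re

FLAGS = re.IGNORECASE | re.DOTALL
# Assumption: tags whose entire content is noise for the agent.
NOISE_TAGS = ("script", "style", "noscript", "svg", "iframe")


def extract_body(page: str) -> str:
    """Keep only what sits inside <body>...</body> (whole page if none)."""
    match = re.search(r"<body[^>]*>(.*)</body>", page, FLAGS)
    return match.group(1) if match else page


def strip_noise(fragment: str) -> str:
    """Drop noisy elements together with their content, plus HTML comments."""
    for tag in NOISE_TAGS:
        fragment = re.sub(rf"<{tag}\b.*?</{tag}>", "", fragment, flags=FLAGS)
    return re.sub(r"<!--.*?-->", "", fragment, flags=re.DOTALL)


def to_markdown(fragment: str, method: str) -> str:
    """Tiny HTML-to-Markdown subset: headings, links, images, list items, breaks."""
    for level in range(1, 7):
        fragment = re.sub(rf"<h{level}[^>]*>(.*?)</h{level}>",
                          "\n" + "#" * level + r" \1" + "\n", fragment, flags=FLAGS)
    if method == "simplified":
        # Assumption: 'simplified' keeps link text but drops URLs and images.
        fragment = re.sub(r"<a\b[^>]*>(.*?)</a>", r"\1", fragment, flags=FLAGS)
        fragment = re.sub(r"<img\b[^>]*>", "", fragment, flags=FLAGS)
    else:
        fragment = re.sub(r'<a\b[^>]*href="([^"]*)"[^>]*>(.*?)</a>', r"[\2](\1)", fragment, flags=FLAGS)
        fragment = re.sub(r'<img\b[^>]*src="([^"]*)"[^>]*>', r"![](\1)", fragment, flags=FLAGS)
    fragment = re.sub(r"<li\b[^>]*>", "\n- ", fragment, flags=FLAGS)
    fragment = re.sub(r"<br\s*/?>|</p>", "\n", fragment, flags=FLAGS)
    fragment = html.unescape(re.sub(r"<[^>]+>", "", fragment))  # drop leftover tags
    lines = [line.strip() for line in fragment.splitlines()]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


def process_page(page: str, method: str = "full", maxlimit: int = 70000) -> str:
    """Body -> noise removal -> Markdown, then enforce the length limit."""
    text = to_markdown(strip_noise(extract_body(page)), method)
    if len(text) > maxlimit:
        return f"ERROR: page content is {len(text)} characters, over maxlimit={maxlimit}"
    return text
```

A dedicated HTML parser and HTML-to-Markdown converter would be more robust than these regular expressions; the sketch only shows the order of the steps and where the method and maxlimit parameters take effect.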
NB:
- You can isolate the HTTP Request part into a separate workflow.
- Check the Workflow Tool description: it guides the agent to provide a query string with several parameters instead of a JSON object.
Please reach out to Eduard if you need further assistance with your n8n workflows and automations!
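To illustrate the query-string format, the sketch below encodes the three parameters with the standard `urllib.parse.urlencode` and decodes them again. The helper name `build_tool_query` and the example URL are hypothetical; the point is that percent-encoding the target URL keeps any `&` or `?` inside it from being read as a parameter separator.

```python
from urllib.parse import parse_qs, urlencode


def build_tool_query(url: str, method: str, maxlimit: int) -> str:
    """Encode the tool parameters as a query string rather than a JSON object."""
    # urlencode percent-escapes each value, so '&' and '?' inside the
    # target URL cannot be mistaken for parameter separators.
    return urlencode({"url": url, "method": method, "maxlimit": maxlimit})


query = build_tool_query("https://example.com/search?q=n8n&page=2", "simplified", 70000)
print(query)
# url=https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dn8n%26page%3D2&method=simplified&maxlimit=70000
print(parse_qs(query)["url"][0])  # round trip recovers the original URL
```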
Note that to use this template, you need to be on n8n version 1.19.4 or later.
AI Agent That Can Scrape Webpages
This n8n workflow demonstrates how to set up an AI agent capable of interacting with a user, making decisions, and potentially calling other n8n workflows as tools. While the workflow's current JSON definition doesn't include a direct web scraping action, it lays the groundwork for an AI agent that can be extended to perform such tasks by integrating a web scraping tool.
What it does
This workflow sets up a basic AI agent interaction loop:
- Listens for Chat Messages: It starts by waiting for a chat message from a user.
- Initializes AI Agent: It then passes the received message to an AI Agent node.
- Configures Language Model: The AI Agent uses an OpenAI Chat Model to process the input and determine the next steps.
- Defines Tools: It provides the AI Agent with the ability to "Call n8n Workflow Tool". This means the AI can decide to execute another n8n workflow based on its understanding of the user's request.
- Processes Agent Output: The output from the AI Agent is then processed.
- Conditional Logic: An "If" node checks a condition (though the specific condition isn't defined in the provided JSON, it would typically check for a specific response or action from the AI).
- Prepares Output (True Branch): If the condition is true, an "Edit Fields (Set)" node prepares the data.
- Formats Output (Markdown): The prepared data is then formatted as Markdown.
- Prepares Output (False Branch): If the condition is false, a different "Edit Fields (Set)" node prepares the data.
- Formats Output (Markdown): This data is also formatted as Markdown.
- Placeholder for External Execution: An "Execute Workflow Trigger" node suggests that this workflow might be called by another workflow or could trigger another workflow upon completion.
- Documentation Note: A "Sticky Note" is included, likely for internal documentation or reminders within the workflow.
Prerequisites/Requirements
- n8n Instance: A running n8n instance.
- OpenAI API Key: Required for the "OpenAI Chat Model" node to function. This credential needs to be configured within n8n.
- LangChain Nodes: Ensure the @n8n/n8n-nodes-langchain package is installed in your n8n instance, as it provides the "AI Agent", "OpenAI Chat Model", and "Call n8n Workflow Tool" nodes.
Setup/Usage
- Import the Workflow: Import the provided JSON into your n8n instance.
- Configure Credentials:
- Locate the "OpenAI Chat Model" node and configure your OpenAI API Key credential.
- Configure AI Agent:
- Review the "AI Agent" node settings. You may want to adjust the system message, model, temperature, and other parameters to fine-tune its behavior.
- Define "Call n8n Workflow Tool":
- The "Call n8n Workflow Tool" node is currently generic. To enable the AI to perform specific actions like web scraping, you would need to:
- Create a separate n8n workflow dedicated to web scraping.
- Configure the "Call n8n Workflow Tool" node to point to this web scraping workflow, providing a clear description of what the tool does so the AI can understand when to use it.
- Configure Conditional Logic:
- Adjust the "If" node's condition to match your desired branching logic based on the AI agent's output.
- Activate the Workflow: Once configured, activate the workflow.
- Test: Send a chat message to the "Manual Chat Trigger" to test the AI agent's responses and tool usage.
Related Templates
Two-way property repair management system with Google Sheets & Drive
This workflow automates the repair request process between tenants and building managers, keeping all updates organized in a single spreadsheet. It is composed of two coordinated workflows, because two separate triggers are required: one for new repair submissions and another for repair updates. Each request is assigned a unique Unit ID corresponding to an individual unit, and timestamps are used to match repair updates to specific requests.
General use cases include:
- Property managers who manage multiple buildings or units.
- Building owners looking to centralize tenant repair communication.
- Automation builders who want to learn multi-trigger workflow design in n8n.
How It Works
Workflow 1: New Repair Requests
Behind the scenes: a tenant fills out a Google Form ("Repair Request Form"), which automatically adds a new row to a linked Google Sheet.
Steps:
- Trigger: Google Sheets rowAdded, which runs when a new form entry appears.
- Extract & Format: collects all relevant form data (address, unit, urgency, contacts).
- Generate Unit ID: creates a standardized identifier (e.g., BUILDING-UNIT) for tracking.
- Email Notification: sends the building manager a formatted email summarizing the repair details, including a link to a Repair Update Form (which activates Workflow 2).
Workflow 2: Repair Updates
Behind the scenes: triggered when the building manager submits a follow-up form ("Repair Update Form").
Steps:
- Lookup by UUID: uses the Unit ID from Workflow 1 to find the existing row in the Google Sheet.
- Conditional Logic: if photos are uploaded, saves each image to a Google Drive folder, renames the files consistently, and adds their URLs to the sheet; if there are no photos, skips the upload step and processes textual updates only.
- Merge & Update: combines the new data with the existing repair info in the same spreadsheet row, giving a full repair history in one place.
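The Unit ID and update-merge logic can be illustrated in plain Python, with a list of dicts standing in for the Google Sheet. Everything beyond the BUILDING-UNIT pattern is an assumption for illustration: the column names (`unit_id`, `submitted_at`, `photo_urls`, `updated_at`), the uppercase normalization, and storing photo links as a comma-separated string.

```python
import re
from datetime import datetime, timezone
from typing import Dict, List, Optional


def make_unit_id(building: str, unit: str) -> str:
    """Build a BUILDING-UNIT identifier (uppercase, spaces and punctuation removed)."""
    def norm(part: str) -> str:
        return re.sub(r"[^A-Z0-9]+", "", part.upper())
    return f"{norm(building)}-{norm(unit)}"


def apply_update(rows: List[Dict], unit_id: str, submitted_at: str,
                 update: Dict, photo_urls: Optional[List[str]] = None) -> Dict:
    """Find the request row by Unit ID plus timestamp and merge an update into it."""
    row = next((r for r in rows
                if r["unit_id"] == unit_id and r["submitted_at"] == submitted_at), None)
    if row is None:
        raise KeyError(f"no repair request for {unit_id} at {submitted_at}")
    row.update(update)  # text fields from the Repair Update Form
    if photo_urls:  # photo branch: append Drive links to any already stored
        existing = [u for u in row.get("photo_urls", "").split(",") if u]
        row["photo_urls"] = ",".join(existing + photo_urls)
    row["updated_at"] = datetime.now(timezone.utc).isoformat()
    return row
```

Matching on both the Unit ID and the request timestamp keeps updates attached to the right request when one unit has several open repairs.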
Requirements
- Google Account (for Forms, Sheets, and Drive)
- Gmail/email node connected for sending notifications
- n8n credentials configured for Google API access
Setup Instructions (see more detail in the workflow)
- Import both workflows into n8n, then copy one into a second workflow.
- Change the manual trigger in Workflow 2 to an n8n Form node.
- Connect Google credentials to all nodes.
- Update the spreadsheet and folder IDs in the corresponding nodes.
- Customize the email text, sender name, and form links for your organization.
- Test each workflow with a sample repair request and a repair update submission.
Customization Ideas
- Add Slack or Telegram notifications for urgent repairs.
- Auto-create folders per building or unit for photo uploads.
- Generate monthly repair summaries using Google Sheets triggers.
- Add an AI node to summarize long repair requests or extract the relevant repair data from them.
Send WooCommerce cross-sell offers to customers via WhatsApp using Rapiwa API
Who Is This For?
This n8n workflow enables automated cross-selling by identifying each WooCommerce customer's most frequently purchased product, finding a related product to recommend, and sending a personalized WhatsApp message using the Rapiwa API. It also verifies whether the customer's number is WhatsApp-enabled before sending, and logs both successful and unsuccessful attempts to Google Sheets for tracking.
What This Workflow Does
- Retrieves all paying customers from your WooCommerce store
- Identifies each customer's most purchased product
- Finds the latest product in the same category as their most purchased item
- Cleans and verifies customer phone numbers for WhatsApp compatibility
- Sends personalized WhatsApp messages with product recommendations
- Logs all activities to Google Sheets for tracking and analysis
- Handles both verified and unverified numbers appropriately
Key Features
- Customer Segmentation: automatically identifies paying customers in your WooCommerce store
- Product Analysis: determines each customer's most purchased product
- Smart Recommendations: finds the latest products in the same category as customer favorites
- WhatsApp Integration: uses the Rapiwa API for message delivery
- Phone Number Validation: verifies WhatsApp numbers before sending messages
- Dual Logging System: tracks both successful and failed message attempts in Google Sheets
- Rate Limiting: uses batching and Wait nodes to prevent API overload
- Personalized Messaging: includes the customer name and product details in messages
Requirements
- WooCommerce store with API access
- Rapiwa account with API access for WhatsApp verification and messaging
- Google account with Sheets access
- Customer phone numbers in WooCommerce (stored in the billing.phone field)
How to Use: Step-by-Step Setup
Credentials Setup
- WooCommerce API: configure WooCommerce API credentials in n8n (e.g., "WooCommerce (get customer)" and "WooCommerce (get customer data)")
- Rapiwa Bearer Auth: create an HTTP Bearer credential with your Rapiwa API token
- Google Sheets OAuth2: set up OAuth2 credentials for Google Sheets access
Configure Google Sheets
- Ensure your sheet has the required columns listed in the Google Sheet Required Columns section below
Verify Code Nodes
- Code (get paying_customer): filters customers to include only those who have made purchases
- Get most buy product id & Clear Number: identifies the most purchased product and cleans phone numbers
Configure HTTP Request Nodes
- Get customer data: verify the WooCommerce API endpoint for retrieving customer orders
- Get specific product data: verify the WooCommerce API endpoint for product details
- Get specific product recommend latest product: verify the WooCommerce API endpoint for finding the latest products by category
- Check valid WhatsApp number Using Rapiwa: verify the Rapiwa endpoint for WhatsApp number validation
- Rapiwa Sender: verify the Rapiwa endpoint for sending messages
Google Sheet Required Columns
You'll need two Google Sheets (or two tabs in one spreadsheet), which the workflow uses to track message attempts. Both must have the following headers, matching exactly, as in this sample:

| name | number | email | address1 | price | suk | title | product link | validity | staus |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Abdul Mannan | 8801322827799 | contact@spagreen.net | mirpur dohs | 850 | | Sharp Most Demanding Hoodie x Nike | https://yourshopdomain/p-img-nike | verified | sent |
| Abdul Mannan | 8801322827799 | contact@spagreen.net | mirpur dohs | 850 | | Sharp Most Demanding Hoodie x Nike | https://yourshopdomain/p-img-nike | unverified | not sent |
| Abdul Mannan | 8801322827799 | contact@spagreen.net | mirpur dohs | 850 | | Sharp Most Demanding Hoodie x Nike | https://yourshopdomain/p-img-nike | verified | sent |

Important Notes
- Phone Number Format: the workflow cleans phone numbers by removing all non-digit characters. Ensure your WooCommerce phone numbers are in a compatible format.
- API Rate Limits: the Rapiwa and WooCommerce APIs have rate limits. Adjust batch sizes and wait times accordingly.
- Data Privacy: ensure compliance with data protection regulations when sending marketing messages.
- Error Handling: the workflow logs unverified numbers but doesn't have extensive error handling. Consider adding error notifications for failed API calls.
- Product Availability: the workflow recommends the latest product in a category but doesn't check whether it's in stock. Consider adding a stock status check.
- Testing: always test with a small batch before running the workflow on your entire customer list.
Useful Links
- Dashboard: https://app.rapiwa.com
- Official Website: https://rapiwa.com
- Documentation: https://docs.rapiwa.com
Support & Help
- WhatsApp: Chat on WhatsApp
- Discord: SpaGreen Community
- Facebook Group: SpaGreen Support
- Website: https://spagreen.net
- Developer Portfolio: Codecanyon SpaGreen
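Three of the steps above (cleaning the phone number, picking each customer's most purchased product, and choosing the latest product in that category) can be sketched as below. The dict shapes loosely follow WooCommerce REST objects (`line_items` with `product_id` and `quantity`, products with `categories` and `date_created`), but the field access, tie-breaking, and skipping the already-bought product are assumptions, not the template's Code node; the sample data in any usage is made up.

```python
import re
from collections import Counter
from typing import Dict, List, Optional


def clean_number(phone: str) -> str:
    """Remove every non-digit character, as the template describes."""
    return re.sub(r"\D", "", phone or "")


def most_bought_product_id(orders: List[Dict]) -> Optional[int]:
    """Sum quantities per product across a customer's orders; return the top id."""
    totals: Counter = Counter()
    for order in orders:
        for item in order.get("line_items", []):
            totals[item["product_id"]] += item.get("quantity", 1)
    if not totals:
        return None
    # On ties, most_common keeps first-encountered order (assumed acceptable).
    return totals.most_common(1)[0][0]


def latest_in_category(products: List[Dict], category_id: int,
                       exclude_id: int) -> Optional[Dict]:
    """Pick the newest product in the category, skipping the one already bought."""
    candidates = [p for p in products
                  if p["id"] != exclude_id
                  and any(c["id"] == category_id for c in p.get("categories", []))]
    # ISO 8601 strings in one format sort chronologically as plain strings.
    return max(candidates, key=lambda p: p["date_created"], default=None)
```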
Track SDK documentation drift with GitHub, Notion, Google Sheets, and Slack
Description
Automatically track SDK releases from GitHub, compare documentation freshness in Notion, and send Slack alerts when docs lag behind. This workflow keeps documentation in sync with releases, improves visibility, and reduces version drift across teams.
What This Template Does
- Step 1: Listens to GitHub repository events to detect new SDK releases.
- Step 2: Fetches release metadata, including version, tag, and publish date.
- Step 3: Logs release data to Google Sheets for record-keeping and analysis.
- Step 4: Retrieves FAQ or documentation data from Notion.
- Step 5: Merges GitHub and Notion data to calculate documentation drift.
- Step 6: Flags SDKs whose documentation is more than 30 days out of date.
- Step 7: Sends detailed Slack alerts to notify the responsible teams.
Key Benefits
- Keeps SDK documentation aligned with product releases
- Prevents outdated information from reaching users
- Provides centralized release tracking in Google Sheets
- Sends real-time Slack alerts for overdue updates
- Strengthens DevRel and developer experience operations
Features
- GitHub release trigger for real-time monitoring
- Google Sheets logging for tracking and auditing
- Notion database integration for documentation comparison
- Automated drift calculation (days since last update)
- Slack notifications for overdue documentation
Requirements
- GitHub OAuth2 credentials
- Notion API credentials
- Google Sheets OAuth2 credentials
- Slack Bot token with chat:write permissions
Target Audience
- Developer Relations (DevRel) and SDK engineering teams
- Product documentation and technical writing teams
- Project managers tracking SDK and doc release parity
Step-by-Step Setup Instructions
- Connect your GitHub account and select your SDK repository.
- Replace YOUR_GOOGLE_SHEET_ID and YOUR_SHEET_GID with the values for your tracking spreadsheet.
- Add your Notion FAQ database ID.
- Configure your Slack channel ID for alerts.
- Run once manually to validate the setup, then enable automation.
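The drift check itself is simple date arithmetic. The sketch below assumes drift means the whole days by which the last documentation edit predates the SDK release (zero if the docs were edited afterwards), flagged when it exceeds the 30-day threshold from the description. It also assumes ISO 8601 timestamps such as GitHub's `published_at`; which Notion field supplies the last-edited date is left open, and the function names are hypothetical.

```python
from datetime import datetime

DRIFT_THRESHOLD_DAYS = 30  # threshold stated in the template description


def parse_ts(value: str) -> datetime:
    """Parse ISO 8601, accepting a trailing 'Z' on Python versions before 3.11."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def drift_days(release_published: str, docs_last_edited: str) -> int:
    """Whole days the docs lag behind the release (0 if edited afterwards)."""
    lag = parse_ts(release_published) - parse_ts(docs_last_edited)
    return max(lag.days, 0)


def needs_alert(release_published: str, docs_last_edited: str) -> bool:
    """True when the drift exceeds the threshold and a Slack alert is due."""
    return drift_days(release_published, docs_last_edited) > DRIFT_THRESHOLD_DAYS
```

Measuring against the release date rather than the current date keeps the result reproducible for a given release; a "days since last update" variant would subtract from `datetime.now(timezone.utc)` instead.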