Web research assistant: automated search & scraping with Gemini AI and spreadsheet reports

1021 views

2/3/2026

⚠️ IMPORTANT: This template requires self-hosted n8n hosting due to the use of community nodes (MCP tools). It will not work on n8n Cloud. Make sure you have access to a self-hosted n8n instance before using this template.

Overview

Screenshot 20250905 103811.png

This workflow automation allows a Google Gemini-powered AI Agent to orchestrate multi-source web intelligence using MCP (Model Context Protocol) tools such as Firecrawl, Brave Search, and Apify.

The system allows users to interact with the agent in natural language, which then leverages various external data collection tools, processes the results, and automatically organizes them into structured spreadsheets.

With built-in memory, flexible tool execution, and conversational capabilities, this workflow acts as a multi-agent research assistant, capable of retrieving, synthesizing, and delivering actionable insights in real time.

How the system works

AI Agent + MCP Pipeline

User Interaction A chat message is received and forwarded to the AI Agent.
AI Orchestration The agent, powered by Google Gemini, decides which MCP tools to invoke based on the query.
- Firecrawl-MCP: Recursive web crawling and content extraction.
- Brave-MCP: Real-time web search with structured results.
- Apify-MCP: Automation of web scraping tasks with scalable execution.
Memory Management A memory module stores context across conversations, ensuring multi-turn reasoning and task continuity.
Spreadsheet automation Results are structured in a new, automatically created Google Spreadsheet, enriched with formatting and additional metadata.
Data processing The workflow generates the spreadsheet content, updates the sheet, and improves results via HTTP requests and field edits.
Delivery of results Users receive a structured and contextualized dataset ready for review, analysis, or integration into other systems.

Configuration instructions

Estimated setup time: 45 minutes

Prerequisites

Self-hosted n8n instance (v0.200.0 or higher recommended)
Google Gemini API key
MCP-compatible nodes (Firecrawl, Brave, Apify) configured
Google Sheets credentials for spreadsheet automation

Detailed configuration steps

Step 1: Configuring the AI Agent

AI Agent node:
- Select Google Gemini as the LLM model
- Configure your Google Gemini API key in the n8n credentials
- Set the system prompt to guide the agent's behavior
- Connect the Simple Memory node to enable context tracking

Step 2: Integrating MCP Tools

Firecrawl-MCP Configuration:
- Install the @n8n/n8n-nodes-firecrawl-mcp package
- Configure your Firecrawl API key
- Set crawling parameters (depth, CSS selectors)
Brave-MCP configuration:
- Install the @n8n/n8n-nodes-brave-mcp package
- Add your Brave Search API key
- Configure search filters (region, language, SafeSearch)
Apify-MCP configuration:
- Install the @n8n/n8n-nodes-apify-mcp package
- Configure your Apify credentials
- Select the appropriate actors for your use cases

Step 3: Spreadsheet automation

“Create Spreadsheet” node:
- Configure Google Sheets authentication (OAuth2 or Service Account)
- Set the file name with dynamic timestamps
- Specify the destination folder in Google Drive
“Generate Spreadsheet Content” node:
- Transform the agent's outputs into tabular format
- Define the columns: URL, Title, Description, Source, Timestamp
- Configure data formatting (dates, links, metadata)
“Update Spreadsheet” node:
- Insert the data into the created sheet
- Apply automatic formatting (headers, colors, column widths)
- Add summary formulas if necessary

Step 4: Post-processing and delivery

“Data Enrichment Request” node (formerly “HTTP Request1”):
- Configure optional API calls to enrich the data
- Add additional metadata (geolocation, sentiment, categorization)
- Manage errors and timeouts
“Edit Fields” node:
- Refine the final dataset (metadata, tags, filters)
- Clean and normalize the data
- Prepare the final response for the user

Structure of generated Google Sheets

Default columns

| Column | Description | Type | |---------|-------------|------| | URL | Data source URL | Hyperlink | | Title | Page/resource title | Text | | Description | Description or content excerpt | Long text | | Source | MCP tool used (Brave/Firecrawl/Apify) | Text | | Timestamp | Date/time of collection | Date/Time | | Metadata | Additional data (JSON) | Text |

Automatic formatting

Headings: Bold font, colored background
URLs: Formatted as clickable links
Dates: Standardized ISO 8601 format
Columns: Width automatically adjusted to content

Use cases

Business and enterprise

Competitive analysis combining search, crawling, and structured scraping
Market trend research with multi-source aggregation
Automated reporting pipelines for business intelligence

Research and academia

Literature discovery across multiple sources
Data collection for research projects
Automated bibliographic extraction from online sources

Engineering and development

Discovery of APIs and documentation
Aggregation of product information from multiple platforms
Scalable structured scraping for datasets

Personal productivity

Automated creation of newsletters or knowledge hubs
Personal research assistant compiling spreadsheets from various online data

Key features

Multi-source intelligence

Firecrawl for deep crawling
Brave for real-time search
Apify for structured web scraping

AI-driven orchestration

Google Gemini for reasoning and tool selection
Memory for multi-turn interactions
Context-based adaptive workflows

Structured data output

Automatic spreadsheet creation
Data enrichment and formatting
Ready-to-use datasets for reporting

Performance and scalability

Handles multiple simultaneous tool calls
Scalable web data extraction
Real-time aggregation from multiple MCPs

Security and privacy

Secure authentication based on API keys
Data managed in Google Sheets / n8n
Configurable retention and deletion policies

Technical architecture

Workflow

User query → AI agent (Gemini) → MCP tools (Firecrawl / Brave / Apify) → Aggregated results → Spreadsheet creation → Data processing → Results delivery

Supported data types

Text and metadata from crawled web pages
Search results from Brave queries
Structured data from Apify scrapers
Tabular reports via Google Sheets

Integration options

Chat interfaces

Web widget for conversational queries
Slack/Teams chatbot integration
REST API access points

Data sources

Websites (via Firecrawl/Apify)
Search engines (via Brave)
APIs (via HTTP Request enrichment)

Performance specifications

Query response: < 5 seconds (search tasks)
Crawl capacity: Thousands of pages per run
Spreadsheet automation: Real-time creation and updates
Accuracy: > 90% when using combined sources

Advanced configuration options

Customization

Set custom prompts for the AI Agent
Adjust the spreadsheet schema for reporting needs
Configure retries for failed tool runs

Analytics and monitoring

Track tool usage and costs
Monitor crawl and search success rates
Log queries and outputs for auditing

Troubleshooting and support

Timeouts: Manually re-run failed MCP executions
Data gaps: Validate Firecrawl/Apify selectors
Spreadsheet errors: Check Google Sheets API quotas

Web Research Assistant: Automated Search & Scraping with Gemini AI and Spreadsheet Reports

This n8n workflow automates the process of conducting web research, extracting information, summarizing it using Google Gemini AI, and generating a report in Google Sheets. It acts as a powerful research assistant, streamlining data collection and analysis.

What it does

This workflow simplifies and automates the following steps:

Receives Chat Messages: The workflow is triggered by an incoming chat message, likely containing a research query.
Initial Data Preparation: It prepares the incoming chat message for further processing.
AI-Powered Web Research: It leverages an AI Agent (likely configured with web scraping tools) to perform a comprehensive search based on the provided query.
Information Extraction & Summarization: The AI Agent, powered by the Google Gemini Chat Model, extracts relevant information from the search results and summarizes it.
Memory Management: It uses a simple memory buffer to maintain context during the AI interaction, enhancing the quality of the research.
Data Transformation: It processes and formats the AI-generated research summary.
Google Sheet Reporting: The summarized research findings are then written into a Google Sheet, creating a structured report.

Prerequisites/Requirements

To use this workflow, you will need:

n8n Instance: A running instance of n8n.
Google Sheets Account: To store the research reports. You'll need to configure Google Sheets credentials in n8n.
Google Gemini API Key: For the Google Gemini Chat Model to function. You'll need to configure the credential for Google Gemini in n8n.
LangChain AI Agent Configuration: The AI Agent node will require configuration, likely including tools for web scraping (e.g., HTTP Request node) and instructions for summarization.

Setup/Usage

Import the Workflow: Download the provided JSON and import it into your n8n instance.
Configure Credentials:
- Google Sheets: Set up your Google Sheets OAuth2 or API Key credentials in n8n.
- Google Gemini: Configure your Google Gemini API Key credential in n8n.
Configure Nodes:
- Chat Trigger (Node 1247): Ensure this node is correctly configured to listen for chat messages from your desired platform (e.g., Slack, Telegram, Discord, etc.).
- AI Agent (Node 1119): This is the core of the research. You will need to configure:
  - Tools: Define the tools the AI Agent can use for web research (e.g., an HTTP Request node for making web calls, potentially integrated with a scraping service).
  - Instructions: Provide clear instructions to the AI Agent on how to conduct research, extract information, and summarize it.
- Google Sheets (Node 18): Configure this node to specify the spreadsheet ID, sheet name, and the data to be written from the AI's output.
Activate the Workflow: Once configured, activate the workflow.
Send a Chat Message: Send a chat message with your research query to the configured chat platform. The workflow will then execute, perform the research, and update your Google Sheet.

Related Templates

Automate Dutch Public Procurement Data Collection with TenderNed

TenderNed Public Procurement What This Workflow Does This workflow automates the collection of public procurement data from TenderNed (the official Dutch tender platform). It: Fetches the latest tender publications from the TenderNed API Retrieves detailed information in both XML and JSON formats for each tender Parses and extracts key information like organization names, titles, descriptions, and reference numbers Filters results based on your custom criteria Stores the data in a database for easy querying and analysis Setup Instructions This template comes with sticky notes providing step-by-step instructions in Dutch and various query options you can customize. Prerequisites TenderNed API Access - Register at TenderNed for API credentials Configuration Steps Set up TenderNed credentials: Add HTTP Basic Auth credentials with your TenderNed API username and password Apply these credentials to the three HTTP Request nodes: "Tenderned Publicaties" "Haal XML Details" "Haal JSON Details" Customize filters: Modify the "Filter op ..." node to match your specific requirements Examples: specific organizations, contract values, regions, etc. How It Works Step 1: Trigger The workflow can be triggered either manually for testing or automatically on a daily schedule. Step 2: Fetch Publications Makes an API call to TenderNed to retrieve a list of recent publications (up to 100 per request). Step 3: Process & Split Extracts the tender array from the response and splits it into individual items for processing. Step 4: Fetch Details For each tender, the workflow makes two parallel API calls: XML endpoint - Retrieves the complete tender documentation in XML format JSON endpoint - Fetches metadata including reference numbers and keywords Step 5: Parse & Merge Parses the XML data and merges it with the JSON metadata and batch information into a single data structure. Step 6: Extract Fields Maps the raw API data to clean, structured fields including: Publication ID and date Organization name Tender title and description Reference numbers (kenmerk, TED number) Step 7: Filter Applies your custom filter criteria to focus on relevant tenders only. Step 8: Store Inserts the processed data into your database for storage and future analysis. Customization Tips Modify API Parameters In the "Tenderned Publicaties" node, you can adjust: offset: Starting position for pagination size: Number of results per request (max 100) Add query parameters for date ranges, status filters, etc. Add More Fields Extend the "Splits Alle Velden" node to extract additional fields from the XML/JSON data, such as: Contract value estimates Deadline dates CPV codes (procurement classification) Contact information Integrate Notifications Add a Slack, Email, or Discord node after the filter to get notified about new matching tenders. Incremental Updates Modify the workflow to only fetch new tenders by: Storing the last execution timestamp Adding date filters to the API query Only processing publications newer than the last run Troubleshooting No data returned? Verify your TenderNed API credentials are correct Check that you have setup youre filter proper Need help setting this up or interested in a complete tender analysis solution? Get in touch 🔗 LinkedIn – Wessel Bulte

By Wessel Bulte

247

🎓 How to transform unstructured email data into structured format with AI agent

This workflow automates the process of extracting structured, usable information from unstructured email messages across multiple platforms. It connects directly to Gmail, Outlook, and IMAP accounts, retrieves incoming emails, and sends their content to an AI-powered parsing agent built on OpenAI GPT models. The AI agent analyzes each email, identifies relevant details, and returns a clean JSON structure containing key fields: From – sender’s email address To – recipient’s email address Subject – email subject line Summary – short AI-generated summary of the email body The extracted information is then automatically inserted into an n8n Data Table, creating a structured database of email metadata and summaries ready for indexing, reporting, or integration with other tools. --- Key Benefits ✅ Full Automation: Eliminates manual reading and data entry from incoming emails. ✅ Multi-Source Integration: Handles data from different email providers seamlessly. ✅ AI-Driven Accuracy: Uses advanced language models to interpret complex or unformatted content. ✅ Structured Storage: Creates a standardized, query-ready dataset from previously unstructured text. ✅ Time Efficiency: Processes emails in real time, improving productivity and response speed. *✅ Scalability: Easily extendable to handle additional sources or extract more data fields. --- How it works This workflow automates the transformation of unstructured email data into a structured, queryable format. It operates through a series of connected steps: Email Triggering: The workflow is initiated by one of three different email triggers (Gmail, Microsoft Outlook, or a generic IMAP account), which constantly monitor for new incoming emails. AI-Powered Parsing & Structuring: When a new email is detected, its raw, unstructured content is passed to a central "Parsing Agent." This agent uses a specified OpenAI language model to intelligently analyze the email text. Data Extraction & Standardization: Following a predefined system prompt, the AI agent extracts key information from the email, such as the sender, recipient, subject, and a generated summary. It then forces the output into a strict JSON structure using a "Structured Output Parser" node, ensuring data consistency. Data Storage: Finally, the clean, structured data (the from, to, subject, and summarize fields) is inserted as a new row into a specified n8n Data Table, creating a searchable and reportable database of email information. --- Set up steps To implement this workflow, follow these configuration steps: Prepare the Data Table: Create a new Data Table within n8n. Define the columns with the following names and string type: From, To, Subject, and Summary. Configure Email Credentials: Set up the credential connections for the email services you wish to use (Gmail OAuth2, Microsoft Outlook OAuth2, and/or IMAP). Ensure the accounts have the necessary permissions to read emails. Configure AI Model Credentials: Set up the OpenAI API credential with a valid API key. The workflow is configured to use the model, but this can be changed in the respective nodes if needed. Connect the Nodes: The workflow canvas is already correctly wired. Visually confirm that the email triggers are connected to the "Parsing Agent," which is connected to the "Insert row" (Data Table) node. Also, ensure the "OpenAI Chat Model" and "Structured Output Parser" are connected to the "Parsing Agent" as its AI model and output parser, respectively. Activate the Workflow: Save the workflow and toggle the "Active" switch to ON. The triggers will begin polling for new emails according to their schedule (e.g., every minute), and the automation will start processing incoming messages. --- Need help customizing? Contact me for consulting and support or add me on Linkedin.

By Davide

1616

Tax deadline management & compliance alerts with GPT-4, Google Sheets & Slack

AI-Driven Tax Compliance & Deadline Management System Description Automate tax deadline monitoring with AI-powered insights. This workflow checks your tax calendar daily at 8 AM, uses GPT-4 to analyze upcoming deadlines across multiple jurisdictions, detects overdue and critical items, and sends intelligent alerts via email and Slack only when immediate action is required. Perfect for finance teams and accounting firms who need proactive compliance management without manual tracking. 🏛️🤖📊 Good to Know AI-Powered: GPT-4 provides risk assessment and strategic recommendations Multi-Jurisdiction: Handles Federal, State, and Local tax requirements automatically Smart Alerts: Only notifies executives when deadlines are overdue or critical (≤3 days) Priority Classification: Categorizes deadlines as Overdue, Critical, High, or Medium priority Dual Notifications: Critical alerts to leadership + daily summaries to team channel Complete Audit Trail: Logs all checks and deadlines to Google Sheets for compliance records How It Works Daily Trigger - Runs at 8:00 AM every morning Fetch Data - Pulls tax calendar and company configuration from Google Sheets Analyze Deadlines - Calculates days remaining, filters by jurisdiction/entity type, categorizes by priority AI Analysis - GPT-4 provides strategic insights and risk assessment on upcoming deadlines Smart Routing - Only sends alerts if overdue or critical deadlines exist Critical Alerts - HTML email to executives + Slack alert for urgent items Team Updates - Slack summary to finance channel with all upcoming deadlines Logging - Records compliance check results to Google Sheets for audit trail Requirements Google Sheets Structure Sheet 1: TaxCalendar DeadlineID | DeadlineName | DeadlineDate | Jurisdiction | Category | AssignedTo | IsActive FED-Q1 | Form 1120 Q1 | 2025-04-15 | Federal | Income | John Doe | TRUE Sheet 2: CompanyConfig (single row) Jurisdictions | EntityType | FiscalYearEnd Federal, California | Corporation | 12-31 Sheet 3: ComplianceLog (auto-populated) Date | AlertLevel | TotalUpcoming | CriticalCount | OverdueCount 2025-01-15 | HIGH | 12 | 3 | 1 Credentials Needed Google Sheets - Service Account OAuth2 OpenAI - API Key (GPT-4 access required) SMTP - Email account for sending alerts Slack - Bot Token with chat:write permission Setup Steps Import workflow JSON into n8n Add all 4 credentials Replace these placeholders: YOURTAXCALENDAR_ID - Tax calendar sheet ID YOURCONFIGID - Company config sheet ID YOURLOGID - Compliance log sheet ID C12345678 - Slack channel ID tax@company.com - Sender email cfo@company.com - Recipient email Share all sheets with Google service account email Invite Slack bot to channels Test workflow manually Activate the trigger Customizing This Workflow Change Alert Thresholds: Edit "Analyze Deadlines" node: Critical: Change <= 3 to <= 5 for 5-day warning High: Change <= 7 to <= 14 for 2-week notice Medium: Change <= 30 to <= 60 for 2-month lookout Adjust Schedule: Edit "Daily Tax Check" trigger: Change hour/minute for different run time Add multiple trigger times for tax season (8 AM, 2 PM, 6 PM) Add More Recipients: Edit "Send Email" node: To: cfo@company.com, director@company.com CC: accounting@company.com BCC: archive@company.com Customize Email Design: Edit "Format Email" node to change colors, add logo, or modify layout Add SMS Alerts: Insert Twilio node after "Is Critical" for emergency notifications Integrate Task Management: Add HTTP Request node to create tasks in Asana/Jira for critical deadlines Troubleshooting | Issue | Solution | |-------|----------| | No deadlines found | Check date format (YYYY-MM-DD) and IsActive = TRUE | | AI analysis failed | Verify OpenAI API key and account credits | | Email not sending | Test SMTP credentials and check if critical condition met | | Slack not posting | Invite bot to channel and verify channel ID format | | Permission denied | Share Google Sheets with service account email | 📞 Professional Services Need help with implementation or customization? Our team offers: 🎯 Custom workflow development 🏢 Enterprise deployment support 🎓 Team training sessions 🔧 Ongoing maintenance 📊 Custom reporting & dashboards 🔗 Additional API integrations Discover more workflows – Get in touch with us

By Oneclick AI Squad