Daily RAG research paper hub with arXiv, Gemini AI, and Notion

2/3/2026

Fetch research papers on a user-defined topic from arXiv on a daily schedule, clean and structure the data, store it in a Notion database, and deliver a daily digest by email and IM.

Paper Topic: single query keyword
Update Frequency: Daily updates, with fewer than 20 entries expected per day
Tools:
- Platform: n8n, for end-to-end workflow configuration
- AI Model: Gemini-2.5-Flash, for daily paper summarization and data processing
- Database: Notion, with two tables — Daily Paper Summary and Paper Details
- Message: Feishu (IM bot notifications), Gmail (email notifications)

1. Data Retrieval

arXiv API

arXiv provides a public API for querying research papers by topic or by predefined category.

arXiv API User Manual

Key Notes:

Response Format: The API returns results as an Atom 1.0 XML feed.
Timezone & Update Frequency:
- The arXiv submission process operates on a 24-hour cycle.
- Newly submitted articles become available in the API only at midnight after they have been processed.
- Feeds are updated daily at midnight Eastern Standard Time (EST).
- Therefore, a single request per day is sufficient.
Request Limits:
- A single query can return at most 30,000 results in total.
- Results must be retrieved in slices of at most 2,000 at a time, using the start and max_results query parameters.
Time Format:
- Date ranges use the form [YYYYMMDDTTTT+TO+YYYYMMDDTTTT], for example in a submittedDate filter.
- TTTT is the 24-hour time to the minute (HHMM), in GMT.
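The notes above can be turned into a request URL roughly as follows. This is a minimal sketch, not the workflow's actual node configuration: the function name buildArxivUrl, the "all:" field prefix, the sort parameters, and the default page size of 20 are assumptions chosen for illustration.

```javascript
// Sketch only: buildArxivUrl and its defaults are illustrative assumptions.
const ARXIV_ENDPOINT = "http://export.arxiv.org/api/query";
const MAX_SLICE = 2000; // per-request slice limit from the notes above

// from/to are strings in YYYYMMDDTTTT form (GMT, 24-hour, to the minute).
function buildArxivUrl(keyword, from, to, start = 0, maxResults = 20) {
  if (!/^\d{12}$/.test(from) || !/^\d{12}$/.test(to)) {
    throw new Error("dates must be 12 digits: YYYYMMDDTTTT");
  }
  // Never ask for more than one slice per request.
  const size = Math.min(maxResults, MAX_SLICE);
  // The literal "+" separators are kept as-is; only the keyword is encoded.
  const search =
    `all:${encodeURIComponent(keyword)}+AND+submittedDate:[${from}+TO+${to}]`;
  return (
    `${ARXIV_ENDPOINT}?search_query=${search}` +
    `&start=${start}&max_results=${size}` +
    `&sortBy=submittedDate&sortOrder=descending`
  );
}
```

With fewer than 20 entries expected per day, a single request with start=0 should be enough; for larger result sets, increase start by the slice size on each call.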

Scheduled Task

Execution Frequency: Daily
Execution Time: 6:00 AM
Time Parameter Handling (JS):
Because arXiv refreshes its feed once a day, the scheduled task queries papers whose submittedDate falls on the previous day (T-1).
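A possible shape for the T-1 calculation in an n8n Code node is sketched below. It assumes that "previous day" means the previous calendar day in UTC (the source does not state the time zone of the 6:00 AM trigger); the function name previousDayRange is hypothetical.

```javascript
// Sketch: returns the previous UTC day's window in arXiv's YYYYMMDDTTTT form.
// Assumption: T-1 is measured in UTC, matching the GMT times the API expects.
function previousDayRange(now = new Date()) {
  // Midnight at the start of "today" in UTC, then step back one day.
  const startOfToday = Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate()
  );
  const yesterday = new Date(startOfToday - 24 * 60 * 60 * 1000);
  // "2026-02-02T00:00:00.000Z" -> "20260202"
  const ymd = yesterday.toISOString().slice(0, 10).replace(/-/g, "");
  return { from: `${ymd}0000`, to: `${ymd}2359` };
}
```

The two strings can be dropped directly into a submittedDate:[from+TO+to] filter.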

2. Data Extraction

Data Cleaning Rules (Convert to Standard JSON)

Remove Header
- Keep only the <entry>…</entry> blocks representing paper items.
Single Item
- Each <entry>…</entry> block represents a single item.
Field Processing Rules
- <id>…</id> ➡️ id
  Extract content.
  Example:
  <id>http://arxiv.org/abs/2409.06062v1</id> → http://arxiv.org/abs/2409.06062v1
- <updated>…</updated> ➡️ updated
  Convert timestamp to yyyy-mm-dd hh:mm:ss
- <published>…</published> ➡️ published
  Convert timestamp to yyyy-mm-dd hh:mm:ss
- <title>…</title> ➡️ title
  Extract text content
- <summary>…</summary> ➡️ summary
  Keep text, remove line breaks
- <author>…</author> ➡️ author
  Combine all authors into an array
  Example: [ "Ernest Pusateri", "Anmol Walia" ] (for Notion multi-select field)
- <arxiv:comment>…</arxiv:comment> ➡️ Ignore / discard
- <link type="text/html"> ➡️ html_url
  Extract URL
- <link type="application/pdf"> ➡️ pdf_url
  Extract URL
- <arxiv:primary_category term="cs.CL"> ➡️ primary_category
  Extract term value
- <category> ➡️ category
  Merge all <category> values into an array
  Example: [ "eess.AS", "cs.SD" ] (for Notion multi-select field)
Add Empty Fields
- github
- huggingface

3. Data Processing

Analyze and summarize paper data using AI, then standardize output as JSON.

Single Paper Basic Information Analysis and Enhancement
Daily Paper Summary and Multilingual Translation
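Standardizing model output as JSON usually needs a small guard, since chat models sometimes wrap their answer in a Markdown code fence. The sketch below shows one way to handle that; the prompt wording and the output keys "tldr" and "keywords" are assumptions for illustration, not the workflow's actual prompt or schema.

```javascript
// Sketch: prompt text and the "tldr"/"keywords" keys are assumptions.
function buildSummaryPrompt(paper) {
  return [
    "Summarize the paper below and reply with JSON only, using the keys",
    '"tldr" (string) and "keywords" (array of strings).',
    `Title: ${paper.title}`,
    `Abstract: ${paper.summary}`,
  ].join("\n");
}

function parseModelJson(text) {
  // Strip a leading/trailing Markdown code fence (three backticks) if present.
  const cleaned = text
    .trim()
    .replace(/^`{3}(?:json)?\s*/i, "")
    .replace(/\s*`{3}$/, "");
  try {
    return JSON.parse(cleaned);
  } catch (err) {
    return null; // caller decides whether to retry or skip the paper
  }
}
```

Returning null instead of throwing lets a downstream "If" node route unparseable answers to a retry or a fallback branch.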

4. Data Storage: Notion Database

Create a corresponding database in Notion with the same predefined field names.
In Notion, create an integration under Integrations and grant access to the database. Obtain the corresponding Secret Key.
Use the Notion "Create a database page" node to configure the field mapping and store the data.

Notes

"Create a database page" only adds new pages; it does not update existing ones, so re-running the workflow for the same day can create duplicates.
The updated and published timestamps of arXiv papers are in UTC.
In the Notion API a single-select property holds exactly one option, while a multi-select property takes an array. Comma-separated strings are not parsed automatically, so format multi-select values as proper arrays.
Notion does not accept null values; sending one causes a 400 error. Use an empty value or omit the property instead.
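If you write to the Notion API directly rather than through the n8n node, the notes above translate into a properties object roughly like the one below. This is a sketch under assumptions: the property names ("Title", "Authors", and so on) are hypothetical and must match the columns of your own database, and replacing commas in option names is a precaution, since Notion is understood to reject commas in select options.

```javascript
// Sketch: property names are assumptions; adapt them to your database.
function toNotionProperties(paper) {
  // Multi-select expects an array of { name } objects, never a joined string.
  const multi = (values) =>
    (values || [])
      .filter(Boolean)
      .map((v) => ({ name: String(v).replace(/,/g, " ") }));

  const props = {
    Title: { title: [{ text: { content: paper.title || "" } }] },
    // Notion caps a single rich-text content string at 2000 characters.
    Summary: {
      rich_text: [{ text: { content: (paper.summary || "").slice(0, 2000) } }],
    },
    Authors: { multi_select: multi(paper.author) },
    Categories: { multi_select: multi(paper.category) },
    "Primary Category": paper.primary_category
      ? { select: { name: paper.primary_category } }
      : null,
    URL: paper.html_url ? { url: paper.html_url } : null,
    // "yyyy-mm-dd hh:mm:ss" (UTC) -> ISO 8601 for the date property.
    Published: paper.published
      ? { date: { start: paper.published.replace(" ", "T") + "Z" } }
      : null,
  };

  // Drop null properties entirely instead of sending them.
  return Object.fromEntries(
    Object.entries(props).filter(([, value]) => value !== null)
  );
}
```

Omitting a property leaves the corresponding Notion field blank, which avoids the 400 error that a null value would trigger.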

5. Data Delivery

Set up two channels for message delivery: EMAIL and IM, and define the message format and content.

Email: Gmail

GMAIL OAuth 2.0 – Official Documentation
Configure your OAuth consent screen

Steps:

Enable Gmail API
Create OAuth consent screen
Create OAuth client credentials
Audience: Add Test users under Testing status

Message format: HTML
(Model: OpenAI GPT, used to design the HTML email template)
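For reference, a very small HTML digest can also be assembled in a Code node without a model. The sketch below is not the template used by this workflow; the function names and markup are assumptions, and the only point it illustrates is that paper titles and summaries should be HTML-escaped before being placed in the email body.

```javascript
// Sketch: a bare-bones digest renderer; markup and names are illustrative.
const escapeHtml = (s) =>
  String(s).replace(/[&<>"']/g, (c) => ({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  }[c]));

function renderDigestHtml(date, papers) {
  const items = papers
    .map(
      (p) =>
        `<li><a href="${escapeHtml(p.html_url)}">${escapeHtml(p.title)}</a>` +
        `<br>${escapeHtml(p.summary)}</li>`
    )
    .join("\n");
  return (
    `<h2>arXiv digest for ${escapeHtml(date)}</h2>\n` +
    `<p>${papers.length} new paper(s)</p>\n` +
    `<ul>\n${items}\n</ul>`
  );
}
```

The returned string can be passed to the Gmail node as the message body with the email type set to HTML.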

IM: Feishu (LARK)

Bots in groups
Use bots in groups
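A group's custom bot accepts a JSON body posted to its webhook URL. The sketch below builds a plain-text message in the msg_type/content shape used by Feishu custom bots; the function name and the message wording are assumptions, and the webhook URL itself comes from the bot's settings in your group.

```javascript
// Sketch: builds the JSON body for a Feishu custom-bot text message.
// buildFeishuTextMessage and the message layout are illustrative.
function buildFeishuTextMessage(date, papers) {
  const lines = papers.map((p, i) => `${i + 1}. ${p.title}\n${p.html_url}`);
  return {
    msg_type: "text",
    content: {
      text: [`arXiv digest ${date}: ${papers.length} paper(s)`, ...lines].join("\n"),
    },
  };
}
```

In n8n, the returned object can be sent with an HTTP Request node (method POST, JSON body) pointed at the bot's webhook URL.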

Daily RAG Research Paper Hub with ArXiv, Gemini AI, and Notion

This n8n workflow automates the process of discovering, summarizing, and organizing daily research papers from ArXiv using Google Gemini AI, and storing them in Notion. It acts as a personalized Research Paper Hub, keeping you updated with relevant academic literature.

What it does

This workflow performs the following key steps:

Scheduled Trigger: The workflow is initiated on a regular schedule (e.g., daily).
Fetch ArXiv Papers: It makes an HTTP request to the ArXiv API to retrieve the latest research papers.
Process and Filter Papers:
- It uses a "Switch" node to route papers based on certain criteria (the exact criteria are not specified in the JSON, but would be configured in the node).
- An "If" node further filters the papers, likely based on relevance or keywords (again, specific conditions are not detailed in the JSON).
Summarize with Google Gemini AI: For each filtered paper, it utilizes the Google Gemini Chat Model (via a Basic LLM Chain) to generate a concise summary.
Store in Notion: The summarized research paper, along with its details, is then created as an item in a specified Notion database.
Email Notification (Optional/Conditional): Depending on the filtering logic, it can send an email notification via Gmail, potentially for highly relevant papers or a summary of the day's findings.
Code Execution (Optional): A "Code" node is present, suggesting custom logic can be applied at various stages, such as data transformation, advanced filtering, or formatting before Notion integration.
Sticky Note: A "Sticky Note" node is included for documentation or temporary notes within the workflow.

Prerequisites/Requirements

To use this workflow, you will need:

n8n Instance: A running n8n instance.
ArXiv API: No direct API key is typically needed for public ArXiv API access, but familiarity with its query parameters is useful.
Google Gemini AI Credentials: An API key or credentials for accessing the Google Gemini Chat Model.
Notion Account: A Notion account with a pre-configured database where the research papers will be stored. You will need to create an integration and grant it access to your database.
Gmail Account (Optional): If you wish to send email notifications, a configured Gmail credential in n8n.

Setup/Usage

Import the Workflow: Import the provided JSON into your n8n instance.
Configure Credentials:
- Google Gemini Chat Model: Set up your Google Gemini credentials in the "Google Gemini Chat Model" node.
- Notion: Set up your Notion credentials in the "Notion" node and specify the database ID where you want to store the papers.
- Gmail (Optional): Configure your Gmail credentials if you plan to use the email notification feature.
Customize ArXiv Request: In the "HTTP Request" node, adjust the ArXiv API URL and query parameters to fetch papers relevant to your research interests (e.g., keywords, categories, authors).
Refine Filtering Logic:
- Switch Node: Configure the conditions in the "Switch" node to categorize or route papers based on your needs.
- If Node: Define the conditions in the "If" node to filter papers for summarization and Notion storage.
Adjust Gemini Prompt: In the "Basic LLM Chain" node, you might want to refine the prompt passed to the Google Gemini Chat Model to get summaries in a specific format or with particular focus.
Notion Database Structure: Ensure your Notion database has the necessary properties (e.g., "Title", "Summary", "ArXiv Link", "Date Published") to store the extracted information.
Schedule the Workflow: Configure the "Schedule Trigger" node to run the workflow at your desired interval (e.g., once daily).
Activate the Workflow: Once configured, activate the workflow to start automating your research paper discovery!
