Evaluation metric example: categorization
AI evaluation in n8n
This is a template for n8n's evaluation feature.
Evaluation is a technique for getting confidence that your AI workflow performs reliably, by running a test dataset containing different inputs through the workflow.
By calculating a metric (score) for each input, you can see where the workflow is performing well and where it isn't.
How it works
This template shows how to calculate a workflow evaluation metric: whether a category matches the expected one.
The workflow takes support tickets and generates a category and priority, which is then compared with the correct answers in the dataset.
- We use an evaluation trigger to read in our dataset
- It is wired up in parallel with the regular trigger so that the workflow can be started from either one. More info
- Once the category is generated by the agent, we check whether it matches the expected one in the dataset
- Finally we pass this information back to n8n as a metric
n8n AI Agent for Evaluation Metric Example Categorization
This n8n workflow demonstrates how to use an AI Agent to categorize examples for evaluation metrics. It's designed to process input data, apply AI-driven categorization, and then prepare the results for evaluation.
What it does
This workflow automates the following steps:
- Receives Evaluation Data: It starts by listening for incoming data, likely representing a dataset row to be evaluated.
- Prepares Input for AI: An "Edit Fields" node is used to transform or set the necessary input fields for the AI agent. This could involve extracting specific text or context from the incoming data.
- Applies AI Categorization: An "AI Agent" node, powered by an "OpenAI Chat Model", processes the prepared input. This agent is configured to perform categorization based on a "Structured Output Parser", ensuring the output is in a predictable format (e.g., JSON).
- Responds to Webhook: The categorized output from the AI Agent is then used to respond to the initial webhook, providing the AI's categorization result back to the caller.
- Records Evaluation Metrics: Finally, an "Evaluation" node captures and records the output, allowing for the assessment of the AI agent's performance against defined metrics.
Prerequisites/Requirements
To use this workflow, you will need:
- n8n Instance: A running instance of n8n.
- OpenAI API Key: Configured as a credential in your n8n instance for the "OpenAI Chat Model" node.
- LangChain Nodes: Ensure the
@n8n/n8n-nodes-langchainpackage is installed in your n8n instance, as it provides the "AI Agent", "OpenAI Chat Model", and "Structured Output Parser" nodes.
Setup/Usage
- Import the Workflow: Download the provided JSON and import it into your n8n instance.
- Configure Credentials:
- Set up your OpenAI API Key as a credential within n8n.
- Activate the Webhook: The "Webhook" node will generate a unique URL when the workflow is activated. This URL is where you will send your evaluation data.
- Configure "Edit Fields": Adjust the "Edit Fields" node (ID: 38) to correctly format the incoming data for the AI agent. This might involve renaming fields or creating new ones based on the AI agent's expected input.
- Configure "AI Agent" and "OpenAI Chat Model":
- Ensure the "OpenAI Chat Model" node (ID: 1153) is correctly linked to your OpenAI API Key credential.
- Review the configuration of the "AI Agent" (ID: 1119) and "Structured Output Parser" (ID: 1179) to match your specific categorization requirements. The "Structured Output Parser" will define the expected JSON schema for the AI's output.
- Test the Workflow: Send a sample dataset row to the webhook URL to test the categorization process.
- Monitor Evaluation: The "Evaluation" node (ID: 1301) will capture the results, which can then be used to analyze the performance of your AI categorization.
Related Templates
Automated YouTube video uploads with 12h interval scheduling in JST
This workflow automates a batch upload of multiple videos to YouTube, spacing each upload 12 hours apart in Japan Standard Time (UTC+9) and automatically adding them to a playlist. ⚙️ Workflow Logic Manual Trigger — Starts the workflow manually. List Video Files — Uses a shell command to find all .mp4 files under the specified directory (/opt/downloads/单词卡/A1-A2). Sort and Generate Items — Sorts videos by day number (dayXX) extracted from filenames and assigns a sequential order value. Calculate Publish Schedule (+12h Interval) — Computes the next rounded JST hour plus a configurable buffer (default 30 min). Staggers each video’s scheduled time by order × 12 hours. Converts JST back to UTC for YouTube’s publishAt field. Split in Batches (1 per video) — Iterates over each video item. Read Video File — Loads the corresponding video from disk. Upload to YouTube (Scheduled) — Uploads the video privately with the computed publishAtUtc. Add to Playlist — Adds the newly uploaded video to the target playlist. 🕒 Highlights Timezone-safe: Pure UTC ↔ JST conversion avoids double-offset errors. Sequential scheduling: Ensures each upload is 12 hours apart to prevent clustering. Customizable: Change SPANHOURS, BUFFERMIN, or directory paths easily. Retry-ready: Each upload and playlist step has retry logic to handle transient errors. 💡 Typical Use Cases Multi-part educational video series (e.g., A1–A2 English learning). Regular content release cadence without manual scheduling. Automated YouTube publishing pipelines for pre-produced content. --- Author: Zane Category: Automation / YouTube / Scheduler Timezone: JST (UTC+09:00)
Detect holiday conflicts & suggest meeting reschedules with Google Calendar and Slack
Who’s it for Remote and distributed teams that schedule across time zones and want to avoid meetings landing on public holidays—PMs, CS/AM teams, and ops leads who own cross-regional calendars. What it does / How it works The workflow checks next week’s Google Calendar events, compares event dates against public holidays for selected country codes, and produces a single Slack digest with any conflicts plus suggested alternative dates. Core steps: Workflow Configuration (Set) → Fetch Public Holidays (via a public holiday API such as Calendarific/Nager.Date) → Get Next Week Calendar Events (Google Calendar) → Detect Holiday Conflicts (compare dates) → Generate Reschedule Suggestions (find nearest business day that isn’t a holiday/weekend) → Format Slack Digest → Post Slack Digest. How to set up Open Workflow Configuration (Set) and edit: countryCodes, calendarId, slackChannel, nextWeekStart, nextWeekEnd. Connect your own Google Calendar and Slack credentials in n8n (no hardcoded keys). (Optional) Adjust the Trigger to run daily or only on Mondays. Requirements n8n (Cloud or self-hosted) Google Calendar read access to the target calendar Slack app with permission to post to the chosen channel A public-holiday API (no secrets needed for Nager.Date; Calendarific requires an API key) How to customize the workflow Time window: Change nextWeekStart/End to scan a different period. Holiday sources: Add or swap APIs; merge multiple regions. Suggestion logic: Tweak the look-ahead window or rules (e.g., skip Fridays). Output: Post per-calendar messages, DM owners, or create tentative reschedule events automatically.
Automate Reddit brand monitoring & responses with GPT-4o-mini, Sheets & Slack
How it Works This workflow automates intelligent Reddit marketing by monitoring brand mentions, analyzing sentiment with AI, and engaging authentically with communities. Every 24 hours, the system searches Reddit for posts containing your configured brand keywords across all subreddits, finding up to 50 of the newest mentions to analyze. Each discovered post is sent to OpenAI's GPT-4o-mini model for comprehensive analysis. The AI evaluates sentiment (positive/neutral/negative), assigns an engagement score (0-100), determines relevance to your brand, and generates contextual, helpful responses that add genuine value to the conversation. It also classifies the response type (educational/supportive/promotional) and provides reasoning for whether engagement is appropriate. The workflow intelligently filters posts using a multi-criteria system: only posts that are relevant to your brand, score above 60 in engagement quality, and warrant a response type other than "pass" proceed to engagement. This prevents spam and ensures every interaction is meaningful. Selected posts are processed one at a time through a loop to respect Reddit's rate limits. For each worthy post, the AI-generated comment is posted, and complete interaction data is logged to Google Sheets including timestamp, post details, sentiment, engagement scores, and success status. This creates a permanent audit trail and analytics database. At the end of each run, the workflow aggregates all data into a comprehensive daily summary report with total posts analyzed, comments posted, engagement rate, sentiment breakdown, and the top 5 engagement opportunities ranked by score. This report is automatically sent to Slack with formatted metrics, giving your team instant visibility into your Reddit marketing performance. --- Who is this for? Brand managers and marketing teams needing automated social listening and engagement on Reddit Community managers responsible for authentic brand presence across multiple subreddits Startup founders and growth marketers who want to scale Reddit marketing without hiring a team PR and reputation teams monitoring brand sentiment and responding to discussions in real-time Product marketers seeking organic engagement opportunities in product-related communities Any business that wants to build authentic Reddit presence while avoiding spammy marketing tactics --- Setup Steps Setup time: Approx. 30-40 minutes (credential configuration, keyword setup, Google Sheets creation, Slack integration) Requirements: Reddit account with OAuth2 application credentials (create at reddit.com/prefs/apps) OpenAI API key with GPT-4o-mini access Google account with a new Google Sheet for tracking interactions Slack workspace with posting permissions to a marketing/monitoring channel Brand keywords and subreddit strategy prepared Create Reddit OAuth Application: Visit reddit.com/prefs/apps, create a "script" type app, and obtain your client ID and secret Configure Reddit Credentials in n8n: Add Reddit OAuth2 credentials with your app credentials and authorize access Set up OpenAI API: Obtain API key from platform.openai.com and configure in n8n OpenAI credentials Create Google Sheet: Set up a new sheet with columns: timestamp, postId, postTitle, subreddit, postUrl, sentiment, engagementScore, responseType, commentPosted, reasoning Configure these nodes: Brand Keywords Config: Edit the JavaScript code to include your brand name, product names, and relevant industry keywords Search Brand Mentions: Adjust the limit (default 50) and sort preference based on your needs AI Post Analysis: Customize the prompt to match your brand voice and engagement guidelines Filter Engagement-Worthy: Adjust the engagementScore threshold (default 60) based on your quality standards Loop Through Posts: Configure max iterations and batch size for rate limit compliance Log to Google Sheets: Replace YOURSHEETID with your actual Google Sheets document ID Send Slack Report: Replace YOURCHANNELID with your Slack channel ID Test the workflow: Run manually first to verify all connections work and adjust AI prompts Activate for daily runs: Once tested, activate the Schedule Trigger to run automatically every 24 hours --- Node Descriptions (10 words each) Daily Marketing Check - Schedule trigger runs workflow every 24 hours automatically daily Brand Keywords Config - JavaScript code node defining brand keywords to monitor Reddit Search Brand Mentions - Reddit node searches all subreddits for brand keyword mentions AI Post Analysis - OpenAI analyzes sentiment, relevance, generates contextual helpful comment responses Filter Engagement-Worthy - Conditional node filters only high-quality relevant posts worth engaging Loop Through Posts - Split in batches processes each post individually respecting limits Post Helpful Comment - Reddit node posts AI-generated comment to worthy Reddit discussions Log to Google Sheets - Appends all interaction data to spreadsheet for permanent tracking Generate Daily Summary - JavaScript aggregates metrics, sentiment breakdown, generates comprehensive daily report Send Slack Report - Posts formatted daily summary with metrics to team Slack channel