Evaluate AI agent response relevance using OpenAI and cosine similarity

1034 views

2/3/2026

Export Appwrite Data Transformation Couchbase Capella CSV

This n8n template demonstrates how to calculate the evaluation metric "Relevance" which in this scenario, measures the relevance of the agent's response to the user's question.

The scoring approach is adapted from the open-source evaluations project RAGAS and you can see the source here https://github.com/explodinggradients/ragas/blob/main/ragas/src/ragas/metrics/_answer_relevance.py

How it works

This evaluation works best for Q&A agents.
For our scoring, we analyse the agent's response and ask another AI to generate a question from it. This generated question is then compared to the original question using cosine similarity.
A high score indicates relevance and the agent's successful ability to answer the question whereas a low score means agent may have added too much irrelevant info, went off script or hallucinated.

Requirements

n8n version 1.94+
Check out this Google Sheet for a sample data https://docs.google.com/spreadsheets/d/1YOnu2JJjlxd787AuYcg-wKbkjyjyZFgASYVV0jsij5Y/edit?usp=sharing

Evaluate AI Agent Response Relevance using OpenAI and Cosine Similarity

This n8n workflow automates the process of evaluating the relevance of AI agent responses using OpenAI's embedding models and cosine similarity. It's designed to help assess the quality of AI-generated content against a predefined "golden answer" or expected response.

What it does

This workflow performs the following key steps:

Triggers Evaluation: Initiates the workflow for each row in a dataset, typically containing a user query, an AI agent's response, and a "golden answer" for comparison.
Sets up Evaluation Fields: Prepares the data, extracting the AI agent's response and the golden answer for subsequent processing.
Generates OpenAI Embeddings:
- Sends the AI agent's response to OpenAI to generate a vector embedding.
- Sends the golden answer to OpenAI to generate a vector embedding.
Calculates Cosine Similarity: Uses custom JavaScript code to compute the cosine similarity between the two generated embeddings. This metric indicates how semantically similar the AI agent's response is to the golden answer.
Records Evaluation Metrics: Stores the calculated cosine similarity score as an evaluation metric, allowing for quantitative analysis of response relevance.

Prerequisites/Requirements

To use this workflow, you will need:

n8n Instance: A running n8n instance.
OpenAI API Key: An API key for OpenAI with access to embedding models (e.g., text-embedding-ada-002). This key needs to be configured as an n8n credential.
Evaluation Dataset: A dataset (e.g., CSV, Google Sheet, database) containing at least:
- ai_agent_response: The response generated by the AI agent.
- golden_answer: The expected or ideal response for the given query.

Setup/Usage

Import the Workflow:
- Download the provided JSON file for this workflow.
- In your n8n instance, click "Workflows" in the left sidebar.
- Click "New" -> "Import from JSON" and upload the downloaded JSON file.
Configure Credentials:
- Locate the "OpenAI Chat Model" nodes (1153) within the workflow.
- Ensure they are configured with your OpenAI API Key credential. If not, create a new OpenAI API credential and select it for these nodes.
Configure the Evaluation Trigger:
- The "When fetching a dataset row" (1300) node is an Evaluation Trigger. This node is designed to be used within n8n's evaluation framework.
- When running an evaluation, you will select your dataset and map the relevant columns to ai_agent_response and golden_answer as expected by the "Edit Fields" (38) node.
Activate the Workflow:
- Once configured, activate the workflow by toggling the "Active" switch in the top right corner of the workflow editor.
Run an Evaluation:
- Navigate to the "Evaluations" section in n8n and create a new evaluation.
- Select this workflow and your dataset.
- Map the input fields (ai_agent_response, golden_answer) from your dataset to the corresponding fields in the "Edit Fields" node.
- Run the evaluation to get relevance scores for your AI agent's responses.

Related Templates

Generate song lyrics and music from text prompts using OpenAI and Fal.ai Minimax

Spark your creativity instantly in any chat—turn a simple prompt like "heartbreak ballad" into original, full-length lyrics and a professional AI-generated music track, all without leaving your conversation. 📋 What This Template Does This chat-triggered workflow harnesses AI to generate detailed, genre-matched song lyrics (at least 600 characters) from user messages, then queues them for music synthesis via Fal.ai's minimax-music model. It polls asynchronously until the track is ready, delivering lyrics and audio URL back in chat. Crafts original, structured lyrics with verses, choruses, and bridges using OpenAI Submits to Fal.ai for melody, instrumentation, and vocals aligned to the style Handles long-running generations with smart looping and status checks Returns complete song package (lyrics + audio link) for seamless sharing 🔧 Prerequisites n8n account (self-hosted or cloud with chat integration enabled) OpenAI account with API access for GPT models Fal.ai account for AI music generation 🔑 Required Credentials OpenAI API Setup Go to platform.openai.com → API keys (sidebar) Click "Create new secret key" → Name it (e.g., "n8n Songwriter") Copy the key and add to n8n as "OpenAI API" credential type Test by sending a simple chat completion request Fal.ai HTTP Header Auth Setup Sign up at fal.ai → Dashboard → API Keys Generate a new API key → Copy it In n8n, create "HTTP Header Auth" credential: Name="Fal.ai", Header Name="Authorization", Header Value="Key [Your API Key]" Test with a simple GET to their queue endpoint (e.g., /status) ⚙️ Configuration Steps Import the workflow JSON into your n8n instance Assign OpenAI API credentials to the "OpenAI Chat Model" node Assign Fal.ai HTTP Header Auth to the "Generate Music Track", "Check Generation Status", and "Fetch Final Result" nodes Activate the workflow—chat trigger will appear in your n8n chat interface Test by messaging: "Create an upbeat pop song about road trips" 🎯 Use Cases Content Creators: YouTubers generating custom jingles for videos on the fly, streamlining production from idea to audio export Educators: Music teachers using chat prompts to create era-specific folk tunes for classroom discussions, fostering interactive learning Gift Personalization: Friends crafting anniversary R&B tracks from shared memories via quick chats, delivering emotional audio surprises Artist Brainstorming: Songwriters prototyping hip-hop beats in real-time during sessions, accelerating collaboration and iteration ⚠️ Troubleshooting Invalid JSON from AI Agent: Ensure the system prompt stresses valid JSON; test the agent standalone with a sample query Music Generation Fails (401/403): Verify Fal.ai API key has minimax-music access; check usage quotas in dashboard Status Polling Loops Indefinitely: Bump wait time to 45-60s for complex tracks; inspect fal.ai queue logs for bottlenecks Lyrics Under 600 Characters: Tweak agent prompt to enforce fuller structures like [V1][C][V2][B][C]; verify output length in executions

By Daniel Nkencho

601

AI-powered code review with linting, red-marked corrections in Google Sheets & Slack

Advanced Code Review Automation (AI + Lint + Slack) Who’s it for For software engineers, QA teams, and tech leads who want to automate intelligent code reviews with both AI-driven suggestions and rule-based linting — all managed in Google Sheets with instant Slack summaries. How it works This workflow performs a two-layer review system: Lint Check: Runs a lightweight static analysis to find common issues (e.g., use of var, console.log, unbalanced braces). AI Review: Sends valid code to Gemini AI, which provides human-like review feedback with severity classification (Critical, Major, Minor) and visual highlights (red/orange tags). Formatter: Combines lint and AI results, calculating an overall score (0–10). Aggregator: Summarizes results for quick comparison. Google Sheets Writer: Appends results to your review log. Slack Notification: Posts a concise summary (e.g., number of issues and average score) to your team’s channel. How to set up Connect Google Sheets and Slack credentials in n8n. Replace placeholders (<YOURSPREADSHEETID>, <YOURSHEETGIDORNAME>, <YOURSLACKCHANNEL_ID>). Adjust the AI review prompt or lint rules as needed. Activate the workflow — reviews will start automatically whenever new code is added to the sheet. Requirements Google Sheets and Slack integrations enabled A configured AI node (Gemini, OpenAI, or compatible) Proper permissions to write to your target Google Sheet How to customize Add more linting rules (naming conventions, spacing, forbidden APIs) Extend the AI prompt for project-specific guidelines Customize the Slack message formatting Export analytics to a dashboard (e.g., Notion or Data Studio) Why it’s valuable This workflow brings realistic, team-oriented AI-assisted code review to n8n — combining the speed of automated linting with the nuance of human-style feedback. It saves time, improves code quality, and keeps your team’s review history transparent and centralized.

By higashiyama

Auto-reply & create Linear tickets from Gmail with GPT-5, gotoHuman & human review

This workflow automatically classifies every new email from your linked mailbox, drafts a personalized reply, and creates Linear tickets for bugs or feature requests. It uses a human-in-the-loop with gotoHuman and continuously improves itself by learning from approved examples. How it works The workflow triggers on every new email from your linked mailbox. Self-learning Email Classifier: an AI model categorizes the email into defined categories (e.g., Bug Report, Feature Request, Sales Opportunity, etc.). It fetches previously approved classification examples from gotoHuman to refine decisions. Self-learning Email Writer: the AI drafts a reply to the email. It learns over time by using previously approved replies from gotoHuman, with per-classification context to tailor tone and style (e.g., different style for sales vs. bug reports). Human Review in gotoHuman: review the classification and the drafted reply. Drafts can be edited or retried. Approved values are used to train the self-learning agents. Send approved Reply: the approved response is sent as a reply to the email thread. Create ticket: if the classification is Bug or Feature Request, a ticket is created by another AI agent in Linear. Human Review in gotoHuman: How to set up Most importantly, install the gotoHuman node before importing this template! (Just add the node to a blank canvas before importing) Set up credentials for gotoHuman, OpenAI, your email provider (e.g. Gmail), and Linear. In gotoHuman, select and create the pre-built review template "Support email agent" or import the ID: 6fzuCJlFYJtlu9mGYcVT. Select this template in the gotoHuman node. In the "gotoHuman: Fetch approved examples" http nodes you need to add your formId. It is the ID of the review template that you just created/imported in gotoHuman. Requirements gotoHuman (human supervision, memory for self-learning) OpenAI (classification, drafting) Gmail or your preferred email provider (for email trigger+replies) Linear (ticketing) How to customize Expand or refine the categories used by the classifier. Update the prompt to reflect your own taxonomy. Filter fetched training data from gotoHuman by reviewer so the writer adapts to their personalized tone and preferences. Add more context to the AI email writer (calendar events, FAQs, product docs) to improve reply quality.

By gotoHuman

353