Evaluate AI agent response relevance using OpenAI and cosine similarity
This n8n template demonstrates how to calculate the evaluation metric "Relevance", which in this scenario measures how relevant the agent's response is to the user's question.
The scoring approach is adapted from the open-source evaluations project RAGAS. The source is available at https://github.com/explodinggradients/ragas/blob/main/ragas/src/ragas/metrics/_answer_relevance.py
How it works
- This evaluation works best for Q&A agents.
- For our scoring, we analyse the agent's response and ask another AI to generate a question from it. This generated question is then compared to the original question using cosine similarity.
- A high score indicates that the response is relevant and the agent answered the question successfully, whereas a low score means the agent may have added too much irrelevant information, gone off script or hallucinated.
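The comparison step described above can be sketched in JavaScript. This is a minimal illustration, not the template's actual code: the function name `cosineSimilarity` and the toy three-dimensional vectors are assumptions made for the example, and real embeddings have far more dimensions.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// The result lies in [-1, 1]; higher means the vectors point in a more similar direction.
function cosineSimilarity(a, b) {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // assumption: treat a zero vector as "no similarity"
  }
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical embeddings: the original question, and two questions
// generated from agent responses (one on topic, one off topic).
const originalQuestion = [1, 0, 1];
const generatedOnTopic = [1, 0, 1];
const generatedOffTopic = [0, 1, 0];

console.log(cosineSimilarity(originalQuestion, generatedOnTopic));  // same direction, high score
console.log(cosineSimilarity(originalQuestion, generatedOffTopic)); // orthogonal, low score
```

A generated question that points in the same direction as the original scores near 1, while an unrelated one scores near 0.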
Requirements
- n8n version 1.94+
- Check out this Google Sheet for sample data: https://docs.google.com/spreadsheets/d/1YOnu2JJjlxd787AuYcg-wKbkjyjyZFgASYVV0jsij5Y/edit?usp=sharing
Evaluate AI Agent Response Relevance using OpenAI and Cosine Similarity
This n8n workflow automates the process of evaluating the relevance of AI agent responses using OpenAI's embedding models and cosine similarity. It's designed to help assess the quality of AI-generated content against a predefined "golden answer" or expected response.
What it does
This workflow performs the following key steps:
- Triggers Evaluation: Initiates the workflow for each row in a dataset, typically containing a user query, an AI agent's response, and a "golden answer" for comparison.
- Sets up Evaluation Fields: Prepares the data, extracting the AI agent's response and the golden answer for subsequent processing.
- Generates OpenAI Embeddings:
  - Sends the AI agent's response to OpenAI to generate a vector embedding.
  - Sends the golden answer to OpenAI to generate a vector embedding.
- Calculates Cosine Similarity: Uses custom JavaScript code to compute the cosine similarity between the two generated embeddings. This metric indicates how semantically similar the AI agent's response is to the golden answer.
- Records Evaluation Metrics: Stores the calculated cosine similarity score as an evaluation metric, allowing for quantitative analysis of response relevance.
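In the workflow the n8n nodes handle the embedding calls, but it can help to see the data shapes involved. The sketch below builds a request body for the OpenAI embeddings endpoint (`POST /v1/embeddings`) and reads the vector out of a response. The helper names `buildEmbeddingRequest` and `extractEmbedding` and the stubbed response values are hypothetical, and no network call is made.

```javascript
// Build the JSON body for an OpenAI embeddings request.
// The default model name is the example given in the prerequisites;
// swap in whichever embedding model your credential can access.
function buildEmbeddingRequest(text, model = "text-embedding-ada-002") {
  if (typeof text !== "string" || text.trim() === "") {
    throw new Error("Text to embed must be a non-empty string");
  }
  return { model, input: text };
}

// Read the vector from a response shaped like { data: [{ embedding: [...] }] }.
function extractEmbedding(response) {
  const embedding = response?.data?.[0]?.embedding;
  if (!Array.isArray(embedding)) {
    throw new Error("Response does not contain an embedding vector");
  }
  return embedding;
}

// Hypothetical stubbed response so the sketch runs offline.
const stubResponse = { data: [{ embedding: [0.12, -0.03, 0.57] }] };

console.log(buildEmbeddingRequest("Paris is the capital of France."));
console.log(extractEmbedding(stubResponse).length);
```

Both the agent's response and the golden answer would go through the same two steps, yielding the two vectors that the cosine similarity step compares.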
Prerequisites/Requirements
To use this workflow, you will need:
- n8n Instance: A running n8n instance.
- OpenAI API Key: An API key for OpenAI with access to embedding models (e.g., text-embedding-ada-002). This key needs to be configured as an n8n credential.
- Evaluation Dataset: A dataset (e.g., CSV, Google Sheet, database) containing at least:
  - ai_agent_response: The response generated by the AI agent.
  - golden_answer: The expected or ideal response for the given query.
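Before running an evaluation, it may be worth checking that every dataset row carries both required columns. The following is a minimal sketch, assuming the rows have already been loaded as plain objects. The helper name `findInvalidRows` and the sample rows are hypothetical.

```javascript
// Column names taken from the dataset requirements above.
const REQUIRED_COLUMNS = ["ai_agent_response", "golden_answer"];

// Return one entry per row that has a missing or blank required column.
function findInvalidRows(rows) {
  const problems = [];
  rows.forEach((row, index) => {
    const missing = REQUIRED_COLUMNS.filter(
      (col) => typeof row[col] !== "string" || row[col].trim() === ""
    );
    if (missing.length > 0) {
      problems.push({ index, missing });
    }
  });
  return problems;
}

// Hypothetical sample rows: the second has a blank response, the third lacks the column.
const sampleRows = [
  { ai_agent_response: "Paris.", golden_answer: "Paris is the capital of France." },
  { ai_agent_response: "", golden_answer: "42" },
  { golden_answer: "Blue" },
];

console.log(findInvalidRows(sampleRows));
```

Rows flagged here would otherwise produce an empty embedding input, so catching them early keeps the relevance scores meaningful.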
Setup/Usage
- Import the Workflow:
  - Download the provided JSON file for this workflow.
  - In your n8n instance, click "Workflows" in the left sidebar.
  - Click "New" -> "Import from JSON" and upload the downloaded JSON file.
- Configure Credentials:
  - Locate the "OpenAI Chat Model" nodes within the workflow.
  - Ensure they are configured with your OpenAI API Key credential. If not, create a new OpenAI API credential and select it for these nodes.
- Configure the Evaluation Trigger:
  - The "When fetching a dataset row" node is an Evaluation Trigger. This node is designed to be used within n8n's evaluation framework.
  - When running an evaluation, you will select your dataset and map the relevant columns to ai_agent_response and golden_answer as expected by the "Edit Fields" node.
- Activate the Workflow:
  - Once configured, activate the workflow by toggling the "Active" switch in the top right corner of the workflow editor.
- Run an Evaluation:
  - Navigate to the "Evaluations" section in n8n and create a new evaluation.
  - Select this workflow and your dataset.
  - Map the input fields (ai_agent_response, golden_answer) from your dataset to the corresponding fields in the "Edit Fields" node.
  - Run the evaluation to get relevance scores for your AI agent's responses.