
Compare GPT-4, Claude & Gemini Responses with Contextual AI's LMUnit Evaluation

Problem

Evaluating and comparing responses from multiple LLMs (OpenAI, Claude, Gemini) is difficult to do manually. Each model produces outputs that differ in clarity, tone, and reasoning structure; traditional evaluation metrics like ROUGE or BLEU fail to capture nuanced quality differences, and human evaluation is inconsistent, slow, and difficult to scale. This workflow automates LLM response quality evaluation using Contextual AI's LMUnit, a natural-language unit-testing framework that provides systematic, fine-grained feedback on response clarity and conciseness.

> Note: LMUnit offers natural language-based evaluation with a 1–5 scoring scale, enabling consistent and interpretable results across different model outputs.

How it works

- A chat trigger node collects responses from multiple LLMs such as OpenAI GPT-4.1, Claude 4.5 Sonnet, and Gemini 2.5 Flash. Each model receives the same input prompt to ensure a fair comparison; the responses are then aggregated and associated with each test case.
- Contextual AI's LMUnit node evaluates each response against predefined quality criteria (see the API sketch after this description):
  - "Is the response clear and easy to understand?" (clarity)
  - "Is the response concise and free from redundancy?" (conciseness)
- LMUnit produces an evaluation score (1–5) for each test.
- Results are aggregated and formatted into a structured summary showing per-model performance and overall averages (see the aggregation sketch below).

How to set up

1. Create a free Contextual AI account and obtain your CONTEXTUAL_AI_API_KEY.
2. In your n8n instance, add this key as a credential under "Contextual AI."
3. Obtain and add credentials for each model provider you wish to test:
   - OpenAI API key: platform.openai.com/account/api-keys
   - Anthropic API key: console.anthropic.com/settings/keys
   - Gemini API key: ai.google.dev/gemini-api/docs/api-key
4. Send prompts through the chat interface to automatically generate model outputs and evaluations.

How to customize the workflow

- Add more evaluation criteria (e.g., factual accuracy, tone, completeness) in the LMUnit test configuration.
- Include additional LLM providers by duplicating the response generation nodes.
- Adjust thresholds and aggregation logic to suit your evaluation goals.
- Enhance the final summary formatting for dashboards, tables, or JSON exports.

For detailed API parameters, refer to the LMUnit API reference. If you have feedback or need support, email feedback@contextual.ai.
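For reference, here is a minimal Python sketch of the call the workflow's LMUnit evaluation node performs, made directly against the REST API rather than inside n8n. The endpoint path, the request fields (query, response, unit_test), and the score field in the reply reflect my reading of Contextual AI's LMUnit documentation; verify them against the current API reference.

```python
import os
import requests

CONTEXTUAL_AI_API_KEY = os.environ["CONTEXTUAL_AI_API_KEY"]

def lmunit_score(query: str, response: str, unit_test: str) -> float:
    """Score `response` against one natural-language unit test on a 1-5 scale."""
    r = requests.post(
        "https://api.contextual.ai/v1/lmunit",  # assumed endpoint path; check the docs
        headers={"Authorization": f"Bearer {CONTEXTUAL_AI_API_KEY}"},
        json={"query": query, "response": response, "unit_test": unit_test},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()["score"]  # assumed field name for the 1-5 score

prompt = "Explain retrieval-augmented generation in two sentences."
answer = "RAG retrieves relevant documents and passes them to an LLM as context."
print(lmunit_score(prompt, answer, "Is the response clear and easy to understand?"))
print(lmunit_score(prompt, answer, "Is the response concise and free from redundancy?"))
```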
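And a sketch of the aggregation step: averaging each model's criterion scores into per-model and overall numbers, the same shape of summary the workflow produces. The scores here are made-up placeholders, not real evaluation results.

```python
from statistics import mean

# Placeholder scores keyed by model, then by criterion (1-5 LMUnit scale).
scores = {
    "gpt-4.1": {"clarity": 4.6, "conciseness": 4.2},
    "claude-4.5-sonnet": {"clarity": 4.8, "conciseness": 4.5},
    "gemini-2.5-flash": {"clarity": 4.3, "conciseness": 4.7},
}

# Per-model averages.
for model, by_criterion in scores.items():
    print(f"{model}: {by_criterion} -> overall {mean(by_criterion.values()):.2f}")

# Overall average across every model and criterion.
all_scores = [s for criteria in scores.values() for s in criteria.values()]
print(f"overall average: {mean(all_scores):.2f}")
```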

By Jinash Rouniyar

Automate lead qualification with AI voice calls using GPT-3.5, Notion and Vapi

Website Leads to Voice Demo and Scheduling
Creator: Summer Chang

AI Booking Agent Setup Guide

Overview

This automation turns your website into an active booking agent. When someone fills out your form, it automatically:
- Adds their information to Notion
- Researches their business from their website with AI
- Calls them immediately with a personalized pitch
- Updates Notion with the call results

Total setup time: 30-45 minutes

What You Need

Before starting, create accounts and gather these:
- n8n account (cloud or self-hosted)
- Notion account (the free plan works); duplicate my Notion template
- OpenRouter API key, from openrouter.ai
- Vapi account, from vapi.ai: create an AI assistant, set up a phone number, and copy your API key, Assistant ID, and Phone Number ID

How It Works: The Complete Flow

1. A visitor fills out the form on your website.
2. The form submission creates a new record in Notion with Status = "New".
3. The Notion Trigger detects the new record (it checks every minute).
4. The Main Workflow executes: it fetches the lead's website, AI analyzes their business, it updates Notion with the analysis, and it makes a Vapi call with a personalized intro (see the call sketch after this guide).
5. The call takes place between your AI agent and the lead.
6. When the call ends, Vapi sends a webhook to n8n.
7. The Webhook Workflow executes: it fetches the call details from Vapi, AI generates a call summary, and it updates Notion with the results and the recording (see the webhook sketch below).
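For reference, here is a minimal Python sketch of the "make the call" step as a direct request to Vapi's create-call endpoint instead of the n8n node. The endpoint, the body shape, and the assistantOverrides.firstMessage field for the personalized intro reflect my understanding of Vapi's API; the IDs are placeholders you would copy from the Vapi dashboard.

```python
import os
import requests

VAPI_API_KEY = os.environ["VAPI_API_KEY"]

def start_call(lead_phone: str, pitch: str) -> dict:
    """Kick off an outbound Vapi call to a lead with a personalized opener."""
    r = requests.post(
        "https://api.vapi.ai/call",  # assumed create-call endpoint; check Vapi docs
        headers={"Authorization": f"Bearer {VAPI_API_KEY}"},
        json={
            "assistantId": "YOUR_ASSISTANT_ID",       # placeholder, from the dashboard
            "phoneNumberId": "YOUR_PHONE_NUMBER_ID",  # placeholder, from the dashboard
            "customer": {"number": lead_phone},       # lead's number in E.164 form
            # Assumed field for overriding the opening line per call:
            "assistantOverrides": {"firstMessage": pitch},
        },
        timeout=30,
    )
    r.raise_for_status()
    return r.json()
```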
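And a sketch of the webhook side: receive Vapi's end-of-call event, then fetch the full call record before summarizing and updating Notion. The "end-of-call-report" event name and the payload field paths are assumptions based on Vapi's webhook documentation; verify them for your account.

```python
import os
import requests
from flask import Flask, request

app = Flask(__name__)
VAPI_API_KEY = os.environ["VAPI_API_KEY"]

@app.post("/vapi-webhook")
def vapi_webhook():
    payload = request.get_json(silent=True) or {}
    message = payload.get("message", {})
    if message.get("type") == "end-of-call-report":  # assumed event name
        call_id = message.get("call", {}).get("id")
        # Fetch the complete call record (transcript, recording, etc.).
        detail = requests.get(
            f"https://api.vapi.ai/call/{call_id}",
            headers={"Authorization": f"Bearer {VAPI_API_KEY}"},
            timeout=30,
        ).json()
        # ...summarize `detail` with an LLM and update the Notion record here.
    return {"ok": True}
```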

By Summer