
Scheduled YouTube transcription with de-duplication using Transcript.io and Supabase

By automedia

Scheduled YouTube Transcription with Duplicate Prevention

Who's It For?

This template is for advanced users, content teams, and data analysts who need a robust, automated system for capturing YouTube transcripts. It’s ideal for those who monitor multiple channels and want to ensure they only process and save each video's transcript once.

What It Does

This is an advanced, "set-it-and-forget-it" workflow that runs on a daily schedule to monitor YouTube channels for new content. It enhances the basic transcription process by connecting to your Supabase database to prevent duplicate entries.

The workflow fetches all recent videos from the channels you track, filters out any that are too old, and then checks your database to see if a video's transcript has already been saved. Only brand-new videos are sent for transcription via the youtube-transcript.io API, with the final data (title, URL, full transcript, author) being saved back to your Supabase table.
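
As a rough sketch of that duplicate check, the logic between the database lookup and the transcription call looks roughly like this. Field names (link, source_url) are assumptions for illustration, not the template's exact output fields:

```javascript
// Minimal sketch of the de-duplication step. Field names (link, source_url)
// are assumptions for illustration, not the template's exact output fields.
const recentVideos = [
  { title: "New upload", link: "https://www.youtube.com/watch?v=abc123", author: "Some Channel" },
  { title: "Old upload", link: "https://www.youtube.com/watch?v=def456", author: "Some Channel" },
];

// Rows returned by the "Check if URL Is In Database" lookup (shape assumed).
const existingRows = [{ source_url: "https://www.youtube.com/watch?v=def456" }];

const knownUrls = new Set(existingRows.map((row) => row.source_url));

// Only videos whose URL is not already stored are sent for transcription.
const newVideos = recentVideos.filter((video) => !knownUrls.has(video.link));

console.log(newVideos); // only the abc123 entry remains
```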

Requirements

  • A Supabase account with a table to store video data. This table must have a column for the source_url to enable duplicate checking.
  • An API key from youtube-transcript.io (offers a free tier).
  • The Channel ID for each YouTube channel you want to track (the public feed each ID maps to is sketched after this list).
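
For reference, every Channel ID maps to a public Atom/RSS feed of that channel's latest uploads, which is what a scheduled check like this one polls. The ID below is a placeholder:

```javascript
// Each tracked Channel ID corresponds to a public feed of recent uploads.
// The ID here is a placeholder; substitute the channels you want to monitor.
const channelIds = ["UCxxxxxxxxxxxxxxxxxxxxxx"];

const feedUrls = channelIds.map(
  (id) => `https://www.youtube.com/feeds/videos.xml?channel_id=${id}`
);

console.log(feedUrls);
```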

How to Set Up

  1. Set Your Time Filter:
    • In the "Max Days" node, set the number of days you want to look back for new videos (e.g., 7 for the last week).
  2. Add Channel IDs:
    • In the "Channels To Track" node, replace the example YouTube Channel IDs with the ones you want to monitor.
  3. Configure API Credentials:
    • Select the "Get Transcript from API" node.
    • In the credentials tab, create a new "Header Auth" credential named youtube-transcript-io. Set the "Name" field to x-api-key and paste your API key into the "Value" field (a hedged request sketch follows this list).
  4. Connect Your Supabase Account:
    • This workflow uses Supabase in two places: "Check if URL Is In Database" and "Add to Content Queue Table".
    • You must configure your Supabase credentials in both nodes.
    • In each node, select your target table and ensure the columns are mapped correctly.
  5. Adjust the Schedule:
    • The "Schedule Trigger" node is set to run once a day. Click it to adjust the time and frequency to your needs.
  6. Activate the Workflow:
    • Save your changes and toggle the workflow to Active.
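
If you want to sanity-check the youtube-transcript.io credential outside n8n, a request along the following lines can help. The endpoint path and body shape are assumptions for illustration only; the template's "Get Transcript from API" node already contains the correct URL, and only the x-api-key header mirrors the Header Auth credential from step 3.

```javascript
// Hedged sketch: the endpoint path and request body are assumptions, so check
// the youtube-transcript.io docs (the preconfigured node already has the real URL).
// Only the x-api-key header mirrors the Header Auth credential from step 3.
const API_KEY = "your-youtube-transcript-io-key"; // placeholder

async function fetchTranscript(videoId) {
  const response = await fetch("https://www.youtube-transcript.io/api/transcripts", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": API_KEY,
    },
    body: JSON.stringify({ ids: [videoId] }),
  });

  if (!response.ok) {
    throw new Error(`Transcript request failed: ${response.status}`);
  }
  return response.json();
}

fetchTranscript("dQw4w9WgXcQ").then((data) => console.log(data));
```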

n8n YouTube Transcription and De-duplication Workflow

This n8n workflow automates the process of fetching new YouTube videos from a specified channel, transcribing them using an external API, and storing the transcriptions in a Supabase database. It includes de-duplication logic to avoid processing the same video multiple times.

What it does

This workflow streamlines the transcription of YouTube content by:

  1. Scheduling a Check: Periodically (e.g., daily, hourly) checks a YouTube channel's RSS feed for new video uploads.
  2. Extracting Video IDs: Parses the RSS feed to get the YouTube video IDs of recently uploaded videos (a short extraction sketch follows this list).
  3. De-duplicating Videos: Queries a Supabase database to check if a video with the extracted ID has already been transcribed.
  4. Filtering New Videos: Proceeds only with videos that have not yet been processed.
  5. Requesting Transcription: For each new video, it sends a request to a transcription API (e.g., Transcript.io) to get the video's transcript.
  6. Formatting Data: Prepares the video metadata and transcription for storage.
  7. Storing in Supabase: Inserts the video ID and its transcription into a Supabase table.
  8. Error Handling: Includes a "Stop and Error" node, which can halt the run and surface failed transcription attempts or database operations.
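
As a small illustration of step 2, the video ID can be read from each RSS entry's watch URL. Field names here follow a typical RSS Read output and are assumptions, not the template's exact fields:

```javascript
// Illustrative only: YouTube feed entries link to the watch URL, and the
// video ID is the "v" query parameter. Field names (title, link) are assumed.
const rssItems = [
  { title: "Video one", link: "https://www.youtube.com/watch?v=abc123" },
  { title: "Video two", link: "https://www.youtube.com/watch?v=def456" },
];

const withIds = rssItems.map((item) => ({
  ...item,
  videoId: new URL(item.link).searchParams.get("v"),
}));

console.log(withIds); // each item now carries a videoId field
```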

Prerequisites/Requirements

To use this workflow, you will need:

  • n8n Instance: A running n8n instance.
  • YouTube Channel RSS Feed URL: The RSS feed URL for the YouTube channel you want to monitor.
  • Transcription API Key: An API key for a transcription service (e.g., Transcript.io, as suggested by the directory name, though the JSON uses a generic HTTP Request node).
  • Supabase Account: Access to a Supabase project with a database table configured to store video IDs and their transcriptions.
    • You will need your Supabase Project URL and Anon Key.
    • The Supabase table should have at least two columns: one for the YouTube video ID (e.g., video_id) and another for the transcription (e.g., transcript).
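
For context, the duplicate check the Supabase node performs is roughly equivalent to the REST query below. The table and column names (transcripts, video_id) follow the example schema above and are assumptions; the project URL and anon key are placeholders.

```javascript
// Rough REST equivalent of the duplicate check the Supabase node performs.
// Table/column names follow the example schema above and are assumptions.
const SUPABASE_URL = "https://your-project.supabase.co"; // placeholder
const SUPABASE_ANON_KEY = "your-anon-key";               // placeholder

async function alreadyTranscribed(videoId) {
  const response = await fetch(
    `${SUPABASE_URL}/rest/v1/transcripts?select=video_id&video_id=eq.${videoId}`,
    {
      headers: {
        apikey: SUPABASE_ANON_KEY,
        Authorization: `Bearer ${SUPABASE_ANON_KEY}`,
      },
    }
  );
  const rows = await response.json();
  return rows.length > 0;
}

alreadyTranscribed("abc123").then((exists) => console.log({ exists }));
```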

Setup/Usage

  1. Import the Workflow:
    • Download the provided JSON file.
    • In your n8n instance, go to "Workflows" and click "New".
    • Click the "Import from JSON" button and paste the workflow JSON or upload the file.
  2. Configure Credentials:
    • HTTP Request (Transcription API):
      • Locate the "HTTP Request" node (ID: 19) that calls the transcription API.
      • Configure the URL, method (likely POST), and any necessary headers (e.g., Authorization with your API key) or body parameters required by your chosen transcription service.
      • You might need to dynamically pass the YouTube video URL or ID to this API.
    • Supabase:
      • Locate the "Supabase" node (ID: 545).
      • Click on "Credentials" and add a new Supabase API credential.
      • Enter your Supabase Project URL and Anon Key.
  3. Configure Nodes:
    • Schedule Trigger:
      • Adjust the "Schedule Trigger" node (ID: 839) to your desired interval for checking new videos (e.g., every hour, daily).
    • RSS Read:
      • In the "RSS Read" node (ID: 37), set the "URL" field to the RSS feed URL of the YouTube channel you want to monitor.
    • Edit Fields (Set):
      • Review the "Edit Fields" nodes (e.g., ID: 38) to ensure the data is being transformed as expected for your Supabase table and transcription API.
    • Supabase (Read):
      • In the "Supabase" node that checks for existing videos, ensure the "Table" and "Filters" are correctly configured to query your transcription table using the video_id.
    • Supabase (Write):
      • In the "Supabase" node that inserts new transcriptions, ensure the "Table" and "Column values" are correctly mapped to your Supabase table's schema.
    • Code Node:
      • The "Code" node (ID: 834) likely contains custom logic for extracting video IDs or preparing data. Review and adjust it if your RSS feed structure or data requirements differ.
  4. Activate the Workflow:
    • Once all configurations are complete, save and activate the workflow. It will now run automatically based on your schedule.
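
For reference on the write path (the "Supabase (Write)" node above), the insert is roughly equivalent to the REST call below. Table and column names again follow the example schema and are assumptions; in the workflow itself the Supabase node handles this for you.

```javascript
// Rough REST equivalent of the insert performed by the Supabase write node.
// Table/column names are assumptions matching the example schema above.
const SUPABASE_URL = "https://your-project.supabase.co"; // placeholder
const SUPABASE_ANON_KEY = "your-anon-key";               // placeholder

async function saveTranscript(videoId, transcript) {
  const response = await fetch(`${SUPABASE_URL}/rest/v1/transcripts`, {
    method: "POST",
    headers: {
      apikey: SUPABASE_ANON_KEY,
      Authorization: `Bearer ${SUPABASE_ANON_KEY}`,
      "Content-Type": "application/json",
      Prefer: "return=representation",
    },
    body: JSON.stringify({ video_id: videoId, transcript }),
  });

  if (!response.ok) {
    throw new Error(`Insert failed: ${response.status}`);
  }
  return response.json();
}

saveTranscript("abc123", "Full transcript text...").then((row) => console.log(row));
```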

Related Templates

Document RAG & chat agent: Google Drive to Qdrant with Mistral OCR

By DIGITAL BIZ TECH

Newsletter signup flow with Email Verification API, Gmail & Google Sheets tracking

By Jitesh Dugar

Automate job searching & resume customization with AI, LinkedIn & Google Sheets

By Jordan Hoyle