Mariela Slavenova
🚀 Fractional Head of AI Ops | COO | CTO | I diagnose, fix & ship automations that pay for themselves | The Harden Method™ - Discover→Design→Build→Break→Harden→Launch→Monitor | Founder @ MarinextAI
Templates by Mariela Slavenova
From sitemap crawling to vector storage: Creating an efficient workflow for RAG
This template crawls a website from its sitemap, deduplicates URLs in Supabase, scrapes pages with Crawl4AI, cleans and validates the text, then stores content and metadata in a Supabase vector store using OpenAI embeddings. It's a reliable, repeatable pipeline for building searchable knowledge bases, SEO research corpora, and RAG datasets.

Good to know
• Built-in de-duplication via a scrape_queue table (status: pending/completed/error).
• Resilient flow: waits, retries, and marks failed tasks.
• Costs depend on Crawl4AI usage and OpenAI embeddings.
• Replace any placeholders (API keys, tokens, URLs) before running.
• Respect website robots/ToS and applicable data laws when scraping.

How it works
1. Sitemap fetch & parse: Load sitemap.xml and extract all URLs.
2. De-dupe: Normalize URLs, check the Supabase scrape_queue, and insert only new ones (see the de-dupe sketch after this section).
3. Scrape: Send URLs to Crawl4AI and poll task status until completed (see the polling sketch below).
4. Clean & score: Remove boilerplate and markup, detect content type, compute quality metrics, and extract metadata (title, domain, language, length).
5. Chunk & embed: Split the text and create OpenAI embeddings.
6. Store: Upsert into the Supabase vector store (documents) with metadata and update the job status (see the embed-and-store sketch below).

Requirements
• Supabase (Postgres with the vector extension enabled)
• Crawl4AI API key (or header auth)
• OpenAI API key (for embeddings)
• n8n credentials set up for HTTP and Postgres/Supabase

How to use
1. Configure credentials (Supabase/Postgres, Crawl4AI, OpenAI).
2. (Optional) Run the provided SQL to create scrape_queue and documents.
3. Set your sitemap URL in the HTTP Request node.
4. Execute the workflow (manual trigger) and monitor statuses in Supabase.
5. Query your documents table or vector store from your app/RAG stack.

Potential Use Cases
This automation is ideal for:
• Market research teams collecting competitive data
• Content creators monitoring web trends
• SEO specialists tracking website content updates
• Analysts gathering structured data for insights
• Anyone needing reliable, structured web content for analysis

Need help customizing? Contact me for consulting and support: LinkedIn
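For reference, here is a minimal sketch of the de-dupe step described above, assuming a Node.js environment with @supabase/supabase-js. The url and status column names follow the scrape_queue description; verify them against your own schema.

```ts
import { createClient } from "@supabase/supabase-js";

// Replace with your own project URL and service-role key before running.
const supabase = createClient("https://YOUR-PROJECT.supabase.co", "YOUR_SERVICE_ROLE_KEY");

/** Strip fragments, lowercase the host, and trim trailing slashes so near-duplicates collapse. */
function normalizeUrl(raw: string): string {
  const u = new URL(raw);
  u.hash = "";
  u.hostname = u.hostname.toLowerCase();
  u.pathname = u.pathname.replace(/\/+$/, "") || "/";
  return u.toString();
}

/** Insert only URLs not already queued; returns the URLs that are new. */
async function enqueueNewUrls(urls: string[]): Promise<string[]> {
  const normalized = [...new Set(urls.map(normalizeUrl))];

  const { data: existing, error } = await supabase
    .from("scrape_queue")
    .select("url")
    .in("url", normalized);
  if (error) throw error;

  const seen = new Set((existing ?? []).map((r) => r.url));
  const fresh = normalized.filter((u) => !seen.has(u));

  if (fresh.length > 0) {
    const { error: insertError } = await supabase
      .from("scrape_queue")
      .insert(fresh.map((url) => ({ url, status: "pending" })));
    if (insertError) throw insertError;
  }
  return fresh;
}
```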
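The scrape step's submit-and-poll loop can look roughly like the sketch below. The /crawl and /task/{id} endpoint paths and the status/result response fields are assumptions; check them against your Crawl4AI deployment's API documentation.

```ts
// Placeholders: point these at your own Crawl4AI instance and token.
const CRAWL4AI_BASE = "https://your-crawl4ai-host";
const CRAWL4AI_TOKEN = "YOUR_API_TOKEN";

async function scrapeWithPolling(url: string, timeoutMs = 120_000): Promise<unknown> {
  const headers = {
    Authorization: `Bearer ${CRAWL4AI_TOKEN}`,
    "Content-Type": "application/json",
  };

  // Submit the scrape task (endpoint shape is an assumption, see above).
  const submit = await fetch(`${CRAWL4AI_BASE}/crawl`, {
    method: "POST",
    headers,
    body: JSON.stringify({ urls: [url] }),
  });
  const { task_id } = await submit.json();

  // Poll until the task completes, fails, or we give up.
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const res = await fetch(`${CRAWL4AI_BASE}/task/${task_id}`, { headers });
    const body = await res.json();
    if (body.status === "completed") return body.result;
    if (body.status === "failed") throw new Error(`Task ${task_id} failed`);
    await new Promise((r) => setTimeout(r, 3_000)); // wait before re-checking
  }
  throw new Error(`Task ${task_id} timed out; mark it "error" in scrape_queue`);
}
```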
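And a sketch of the chunk, embed, and store steps, assuming the common Supabase vector-store layout (content, metadata, and embedding columns on documents). The chunk sizes and embedding model are illustrative defaults, not the template's exact settings.

```ts
import OpenAI from "openai";
import { createClient } from "@supabase/supabase-js";

const openai = new OpenAI({ apiKey: "YOUR_OPENAI_KEY" }); // placeholder
const supabase = createClient("https://YOUR-PROJECT.supabase.co", "YOUR_SERVICE_ROLE_KEY");

/** Naive fixed-size chunking with overlap; swap in a smarter splitter as needed. */
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

async function embedAndStore(url: string, cleanedText: string, metadata: Record<string, unknown>) {
  const chunks = chunkText(cleanedText);

  // One batched embeddings call covers all chunks of the page.
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small", // assumption: any OpenAI embedding model works here
    input: chunks,
  });

  const rows = response.data.map((item, i) => ({
    content: chunks[i],
    metadata: { ...metadata, url, chunk: i },
    embedding: item.embedding,
  }));

  const { error } = await supabase.from("documents").insert(rows);
  if (error) throw error;

  // Mark the page done in the queue.
  await supabase.from("scrape_queue").update({ status: "completed" }).eq("url", url);
}
```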
Generate personalized cold email openers with website scraping using Claude & GPT-4
This template enriches a lead list by analyzing each contact's company website and auto-generating a single personalized cold-email opener. Drop a spreadsheet into a Google Drive folder; the workflow parses the rows, fetches website content via Jina AI, uses OpenAI to check whether the site contains valid business info, then calls Anthropic to craft a one-liner. It writes both the website summary and the personalized opener back to Google Sheets, and finally sends you a Telegram confirmation with the file link.

What it does
Turns a CSV/Google Sheet of leads into tailored cold-email openers. For each lead, the workflow fetches the company website, writes a 300-word business summary, then crafts a one-sentence, emotionally engaging opening line. Results are written back to the same Sheet, and you get a Telegram ping when processing finishes.

How it works (high-level)
1. Trigger: Watches a Google Drive folder. When a new Sheet is added, the flow starts.
2. Parse: Reads rows (expects columns like First Name, Last Name, Email, domain).
3. Enrich: An AI Agent calls Jina ("r.jina.ai/{url}") to fetch the page as markdown, then produces a structured website summary (see the fetch sketch after this section).
4. Validate: An OpenAI step checks whether the fetched content is a real business page (hasWebsite: true/false; see the validation sketch below).
5. Personalize (see the Anthropic sketch below):
   • If true: Anthropic crafts a bespoke opener using the summary.
   • If false: A fallback prompt creates a strong opener using the domain plus universal lead-gen pains.
6. Update: Writes websiteSummary and personalization back to the Sheet (matching on domain).
7. Notify: Sends a Telegram message with the file name and link when done.

What you need
• Google Drive (folder to watch)
• Google Sheets (the uploaded Sheet to enrich)
• Jina HTTP header auth (for the markdown fetch tool)
• OpenAI (JSON check for website validity)
• Anthropic (Claude Sonnet 4 for copy quality)
• Telegram Bot (to receive completion alerts)

Inputs & expected schema
• A Google Sheet with at least: First Name, Last Name, Email, domain
• Optional columns are preserved; rows are processed in batches.

Outputs
• New/updated columns per row:
   • websiteSummary: a concise, structured business overview
   • personalization: a single, high-impact opening sentence
• Telegram confirmation with the file name and link.

Customization tips
• Tweak the system prompts for tone or length.
• Add scoring (e.g., ICP fit) before personalization.
• Expand validation (e.g., handle multi-page sites or language detection).
• Swap or parallelize LLMs to balance quality, cost, and speed.

Nodes & key logic
• Google Drive Trigger → Google Drive (Download) → Spreadsheet File (parse) → Split in Batches
• LangChain Agent with HTTP Tool (Jina) + Think
• OpenAI (JSON validator) → If (website present?)
• Anthropic Chat (separate branches with and without a website)
• Edit Fields (Set) → Google Sheets (Update) → Telegram

Great for
Sales teams, SDRs, and founders who want fast, high-quality personalization at scale without manual research.

Need help customizing? Contact me for consulting and support: LinkedIn
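As a reference for the enrich step, here is a minimal sketch of the Jina Reader fetch, assuming Bearer header auth; swap in whatever header your Jina credential actually uses.

```ts
// Fetch a page as LLM-ready markdown via the Jina Reader endpoint used by
// the workflow's HTTP tool.
async function fetchWebsiteMarkdown(domain: string): Promise<string> {
  const res = await fetch(`https://r.jina.ai/https://${domain}`, {
    headers: { Authorization: "Bearer YOUR_JINA_API_KEY" }, // placeholder key
  });
  if (!res.ok) throw new Error(`Jina fetch failed: ${res.status}`);
  return res.text(); // markdown body of the rendered page
}
```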
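The validation step can be approximated like this, assuming OpenAI's JSON response format; the model name and prompt wording are illustrative, not the template's exact configuration.

```ts
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: "YOUR_OPENAI_KEY" }); // placeholder

/** Returns true when the fetched markdown looks like a real business page. */
async function hasValidWebsite(markdown: string): Promise<boolean> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o", // assumption: the template's exact model may differ
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          'You check scraped website content. Respond with JSON: {"hasWebsite": true} ' +
          "if the text describes a real business (products, services, team, contact), " +
          'otherwise {"hasWebsite": false}.',
      },
      { role: "user", content: markdown.slice(0, 8000) }, // keep the prompt bounded
    ],
  });
  const parsed = JSON.parse(completion.choices[0].message.content ?? "{}");
  return parsed.hasWebsite === true;
}
```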
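And a sketch of the personalize branch with Anthropic, assuming the official @anthropic-ai/sdk; the model id shown is a placeholder for whichever Claude Sonnet 4 snapshot you use, and the system prompt is illustrative.

```ts
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({ apiKey: "YOUR_ANTHROPIC_KEY" }); // placeholder

/** Craft the one-sentence opener from the website summary. */
async function craftOpener(firstName: string, summary: string): Promise<string> {
  const msg = await anthropic.messages.create({
    model: "claude-sonnet-4-20250514", // placeholder: use your Claude Sonnet 4 id
    max_tokens: 200,
    system:
      "You write one emotionally engaging, specific cold-email opening sentence. " +
      "Reference a concrete detail from the company summary. No greetings; one sentence only.",
    messages: [
      { role: "user", content: `Lead first name: ${firstName}\nCompany summary:\n${summary}` },
    ],
  });
  const block = msg.content[0];
  return block.type === "text" ? block.text : "";
}
```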