Dataki

I am passionate about transforming complex processes into seamless automations with n8n. My expertise spans ETL pipelines, sales automations, and data- and AI-driven workflows. As an avid problem solver, I thrive on optimizing workflows to drive efficiency and innovation.

Total Views: 99,670
Templates: 8

Templates by Dataki

✨ Vision-based AI agent scraper - with Google Sheets, ScrapingBee, and Gemini

Important note — check legal regulations: This workflow involves scraping, so ensure you comply with the legal regulations in your country before getting started. Better safe than sorry!

😮‍💨 Tired of struggling with XPath, CSS selectors, or DOM specificity when scraping? This AI-powered solution simplifies your workflow: with a vision-based AI agent, you can extract data without worrying about how the DOM is structured.

This workflow leverages a vision-based AI Agent, integrated with Google Sheets, ScrapingBee, and the Gemini-1.5-Pro model, to extract structured data from webpages. The AI Agent primarily uses screenshots for data extraction but switches to HTML scraping when necessary, ensuring high accuracy.

Key Features:
- Google Sheets Integration: Manage URLs to scrape and store structured results.
- ScrapingBee: Capture full-page screenshots and retrieve HTML data for fallback extraction.
- AI-Powered Data Parsing: Use Gemini-1.5-Pro for vision-based scraping and a Structured Output Parser to format extracted data into JSON.
- Token Efficiency: HTML is converted to Markdown to optimize processing costs.

This template is designed for e-commerce scraping but can be customized for various use cases.
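The screenshot-first, HTML-fallback pattern is easy to reproduce outside n8n. Here is a minimal Python sketch of the two ScrapingBee calls the workflow relies on; the target URL is a placeholder, and the screenshot parameters reflect ScrapingBee's documented API but should be verified against your plan:

```python
import requests

SCRAPINGBEE_ENDPOINT = "https://app.scrapingbee.com/api/v1/"
API_KEY = "YOUR_SCRAPINGBEE_API_KEY"        # placeholder
url = "https://example.com/product/123"     # hypothetical target page

# Primary path: full-page screenshot for the vision model (returns PNG bytes)
shot = requests.get(SCRAPINGBEE_ENDPOINT, params={
    "api_key": API_KEY,
    "url": url,
    "screenshot": "true",
    "screenshot_full_page": "true",
})
with open("page.png", "wb") as f:
    f.write(shot.content)

# Fallback path: raw HTML, to be converted to Markdown before the LLM call
html = requests.get(SCRAPINGBEE_ENDPOINT, params={
    "api_key": API_KEY,
    "url": url,
}).text
```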


AI-powered information monitoring with OpenAI, Google Sheets, Jina AI and Slack

Check legal regulations: This workflow involves scraping, so ensure you comply with the legal regulations in your country before getting started. Better safe than sorry!

📌 Purpose
This workflow enables automated, AI-driven topic monitoring, delivering concise article summaries directly to a Slack channel in a structured and easy-to-read format. It lets users stay informed on specific topics of interest without manually checking multiple sources, ensuring a time-efficient and focused monitoring experience. To get started, copy the Google Sheets template required for this workflow.

🎯 Target Audience
This workflow is designed for:
- Industry professionals looking to track key developments in their field.
- Research teams who need up-to-date insights on specific topics.
- Companies aiming to keep their teams informed with relevant content.

⚙️ How It Works
1. Trigger: A Scheduler initiates the workflow at regular intervals (default: every hour).
2. Data Retrieval: RSS feeds are fetched using the RSS Read node, and previously monitored articles are checked in Google Sheets to avoid duplicates.
3. Content Processing: Article relevance is assessed using OpenAI (GPT-4o-mini), relevant articles are scraped using Jina AI to extract their content, and summaries are generated and formatted for Slack.
4. Output: Summaries are posted to the specified Slack channel, and article metadata is stored in Google Sheets for tracking.

🛠️ Key APIs and Nodes Used
- Scheduler node: Triggers the workflow periodically.
- RSS Read: Fetches the latest articles from defined RSS feeds.
- Google Sheets: Stores monitored articles and manages feed URLs.
- OpenAI API (GPT-4o-mini): Classifies article relevance and generates summaries.
- Jina AI API: Extracts the full content of relevant articles.
- Slack API: Posts formatted messages to Slack channels.

This workflow provides an efficient and intelligent way to stay informed about your topics of interest, directly within Slack.
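The two AI-facing steps — relevance classification and content extraction — can be sketched in a few lines of Python. The Jina AI Reader endpoint (https://r.jina.ai/<url>) returns a page as LLM-friendly Markdown; the topic, article fields, and prompts below are illustrative placeholders, not the template's exact configuration:

```python
import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
topic = "LLM observability"  # hypothetical monitored topic

# Fields as they would arrive from the RSS Read node
article = {
    "title": "Tracing LLM calls in production",
    "link": "https://example.com/some-article",
}

# Relevance check with GPT-4o-mini (illustrative prompt, not the template's own)
verdict = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": f"Answer YES or NO: is an article titled "
                   f"'{article['title']}' relevant to the topic '{topic}'?",
    }],
).choices[0].message.content

if verdict.strip().upper().startswith("YES"):
    # Jina AI Reader converts the page to Markdown for the summarization call
    markdown = requests.get(f"https://r.jina.ai/{article['link']}").text
    summary = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": "Summarize this article in 3 bullet points:\n\n" + markdown}],
    ).choices[0].message.content
    print(summary)  # in the workflow, this is posted to Slack
```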


WordPress - AI chatbot to enhance user experience - with Supabase and OpenAI

This is the first version of a template for a RAG/GenAI app using WordPress content. As creating, sharing, and improving templates brings me joy 😄, feel free to reach out on LinkedIn if you have any ideas to enhance this template!

How It Works
This template includes three workflows:
- Workflow 1: Generate embeddings for your WordPress posts and pages, then store them in the Supabase vector store.
- Workflow 2: Handle upserts for WordPress content when edits are made.
- Workflow 3: Enable chat functionality by performing Retrieval-Augmented Generation (RAG) on the embedded documents.

Why use this template?
This template can be applied to various use cases:
- Build a GenAI application that requires embedded documents from your website's content.
- Embed or create a chatbot page on your website to enhance user experience as visitors search for information.
- Gain insights into the types of questions visitors are asking on your website.
- Simplify content management by asking the AI for related content ideas or checking if similar content already exists. Useful for internal linking.

Prerequisites
- Access to Supabase for storing embeddings.
- Basic knowledge of Postgres and pgvector.
- A WordPress website with content to be embedded.
- An OpenAI API key.
- Ensure that your n8n workflow, Supabase instance, and WordPress website are set to the same timezone (or use GMT) for consistency.

Workflow 1: Initial Embedding
This workflow retrieves your WordPress pages and posts, generates embeddings from the content, and stores them in Supabase using pgvector.

Step 0: Create Supabase Tables
Nodes:
- Postgres - Create Documents Table: This table is structured to support OpenAI embedding models with 1536 dimensions.
- Postgres - Create Workflow Execution History Table
These two nodes create tables in Supabase: the documents table, which stores embeddings of your website content, and the n8nwebsiteembedding_histories table, which logs workflow executions for efficient management of upserts by tracking the workflow execution ID and execution timestamp.

Step 1: Retrieve and Merge WordPress Pages and Posts
Nodes:
- WordPress - Get All Posts
- WordPress - Get All Pages
- Merge WordPress Posts and Pages
These three nodes retrieve all content and metadata from your posts and pages and merge them. Important: apply filters to avoid generating embeddings for all site content.

Step 2: Set Fields, Apply Filter, and Transform HTML to Markdown
Nodes:
- Set Fields
- Filter - Only Published & Unprotected Content
- HTML to Markdown
These three nodes prepare the content for embedding by:
- Setting up the necessary fields for content embeddings and document metadata.
- Filtering to include only published and unprotected content (protected=false), ensuring private or unpublished content is excluded from your GenAI application.
- Converting HTML to Markdown, which enhances performance and relevance in Retrieval-Augmented Generation (RAG) by optimizing document embeddings.

Step 3: Generate Embeddings, Store Documents in Supabase, and Log Workflow Execution
Nodes:
- Supabase Vector Store (sub-nodes: Embeddings OpenAI, Default Data Loader, Token Splitter)
- Aggregate
- Supabase - Store Workflow Execution
This step involves generating embeddings for the content and storing it in Supabase, followed by logging the workflow execution details.
- Generate Embeddings: The Embeddings OpenAI node generates vector embeddings for the content.
- Load Data: The Default Data Loader prepares the content for embedding storage.
The metadata stored includes the content title, publication date, modification date, URL, and ID, which is essential for managing upserts. ⚠️ Important: be careful not to store any sensitive information in metadata fields, as this information will be accessible to the AI and may appear in user-facing answers.
- Token Management: The Token Splitter segments content into manageable sizes to comply with token limits.
- Aggregate: Ensures the last node runs only once (for a single item).
- Store Execution Details: The Supabase - Store Workflow Execution node saves the workflow execution ID and timestamp, enabling tracking of when each content update was processed.
This setup ensures that content embeddings are stored in Supabase for use in downstream applications, while workflow execution details are logged for consistency and version tracking. This workflow should be executed only once, for the initial embedding. Workflow 2, described below, handles all future upserts, ensuring that new or updated content is embedded as needed.

Workflow 2: Handle Document Upserts
Content on a website follows a lifecycle: it may be updated, new content might be added, or, at times, content may be deleted. In this first version of the template, the upsert workflow manages:
- Newly added content
- Updated content

Step 1: Retrieve WordPress Content with a Regular CRON
Nodes:
- CRON - Every 30 Seconds
- Postgres - Get Last Workflow Execution
- WordPress - Get Posts Modified After Last Workflow Execution
- WordPress - Get Pages Modified After Last Workflow Execution
- Merge Retrieved WordPress Posts and Pages
A CRON job (set to run every 30 seconds in this template, but adjustable as needed) initiates the workflow. A Postgres SQL query on the n8nwebsiteembedding_histories table retrieves the timestamp of the latest workflow execution. Next, the HTTP nodes use the WordPress API (update the example URL in the template with your own website's URL and add your WordPress credentials) to request all posts and pages modified after the last workflow execution date. This process captures both newly added and recently updated content. The retrieved content is then merged for further processing.

Step 2: Set Fields and Apply Filter
Nodes:
- Set Fields2
- Filter - Only Published and Unprotected Content
Same as Step 2 in Workflow 1, except that HTML to Markdown is applied in a later step.

Step 3: Loop Over Items to Identify and Route Updated vs. Newly Added Content
Here, I initially aimed to use 'update documents' instead of the delete + insert approach, but encountered challenges, especially with updating both content and metadata columns together. Any help or suggestions are welcome! :)
Nodes:
- Loop Over Items
- Postgres - Filter on Existing Documents
- Switch
Route existing_documents (if documents with matching IDs are found in metadata):
- Supabase - Delete Row if Document Exists: Removes any existing entry for the document, preparing for an update.
- Aggregate2: Aggregates documents on Supabase by ID to ensure that Set Fields3 is executed only once for each piece of WordPress content, avoiding duplicate execution.
- Set Fields3: Sets fields required for embedding updates.
Route new_documents (if no matching documents are found with IDs in metadata):
- Set Fields4: Configures fields for embedding newly added content.
In this step, a loop processes each item, directing it based on whether the document already exists.
The Aggregate2 node acts as a control to ensure Set Fields3 runs only once per piece of WordPress content, effectively avoiding duplicate execution and optimizing the update process.

Step 4: HTML to Markdown, Supabase Vector Store, Update Workflow Execution Table
The HTML to Markdown node mirrors Workflow 1 - Step 2; refer to that section for a detailed explanation of how HTML content is converted to Markdown for improved embedding performance and relevance. Following this, the content is stored in the Supabase vector store to manage embeddings efficiently. Lastly, the workflow execution table is updated. These nodes mirror the Workflow 1 - Step 3 nodes.

Workflow 3: An Example GenAI App with WordPress Content — a Chatbot to Embed on Your Website

Step 1: Retrieve Supabase Documents, Aggregate, and Set Fields After a Chat Input
Nodes:
- When Chat Message Received
- Supabase - Retrieve Documents from Chat Input
- Embeddings OpenAI1
- Aggregate Documents
- Set Fields
When a user sends a message to the chat, the prompt (user question) is sent to the Supabase vector store retriever. The RPC function match_documents (created in Workflow 1 - Step 0) retrieves documents relevant to the user's question, enabling a more accurate and relevant response. In this step:
- The Supabase vector store retriever fetches documents that match the user's question, including metadata.
- The Aggregate Documents node consolidates the retrieved data.
- Finally, Set Fields organizes the data to create a more readable input for the AI agent.
Directly using the AI agent without these nodes would prevent metadata from being sent to the language model (LLM), but metadata is essential for enhancing the context and accuracy of the AI's response. By including metadata, the AI's answers can reference relevant document details, making the interaction more informative.

Step 2: Call AI Agent, Respond to User, and Store Chat Conversation History
Nodes:
- AI Agent (sub-nodes: OpenAI Chat Model, Postgres Chat Memories)
- Respond to Webhook
This step involves calling the AI agent to generate an answer, responding to the user, and storing the conversation history. The model used is gpt-4o-mini, chosen for its cost-efficiency.
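Outside of n8n, the retrieval step of Workflow 3 boils down to one embedding call plus one RPC call. A minimal Python sketch with the openai and supabase clients, assuming match_documents follows the standard Supabase/LangChain signature (query_embedding, match_count, filter) — verify against the function you actually created in Step 0, and note the embedding model is an assumption (any 1536-dimension OpenAI model matching your documents table works):

```python
from openai import OpenAI
from supabase import create_client

SUPABASE_URL = "https://your-project.supabase.co"  # placeholder
SUPABASE_KEY = "YOUR_SERVICE_ROLE_KEY"             # placeholder

openai_client = OpenAI()
supabase = create_client(SUPABASE_URL, SUPABASE_KEY)

question = "What plans do you offer?"  # hypothetical visitor question

# 1536-dimension embedding, matching the documents table created in Step 0
embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=question,
).data[0].embedding

# Fetch the most similar documents, metadata included
matches = supabase.rpc("match_documents", {
    "query_embedding": embedding,
    "match_count": 4,
    "filter": {},
}).execute()

for doc in matches.data:
    print(doc["metadata"].get("title"), "->", doc["content"][:80])
```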


Enrich company data from Google Sheet with OpenAI Agent and ScrapingBee

This workflow demonstrates how to enrich data from a list of companies in a spreadsheet. While this workflow is production-ready if all steps are followed, adding error handling would enhance its robustness.

Important notes
- Check legal regulations: This workflow involves scraping, so make sure to check the legal regulations around scraping in your country before getting started. Better safe than sorry!
- Mind those tokens: OpenAI tokens can add up fast, so keep an eye on usage unless you want a surprising bill that could knock your socks off! 💸

Main Workflow

Node 1 - Webhook
This node triggers the workflow via a webhook call. You can replace it with any other trigger of your choice, such as a form submission, a new row added in Google Sheets, or a manual trigger.

Node 2 - Get Rows from Google Sheet
This node retrieves the list of companies from your spreadsheet; a Google Sheet template is provided. The columns in this Google Sheet are:
- Company: The name of the company (required)
- Website: The website URL of the company (required)
- Business Area: The business area deduced by OpenAI from the scraped data
- Offer: The offer deduced by OpenAI from the scraped data
- Value Proposition: The value proposition deduced by OpenAI from the scraped data
- Business Model: The business model deduced by OpenAI from the scraped data
- ICP: The Ideal Customer Profile deduced by OpenAI from the scraped data
- Additional Information: Information related to the scraped data, including:
  - Information Sufficiency: Indicates whether the information was sufficient to provide a full analysis ("Sufficient" or "Insufficient").
  - Insufficient Details: If labeled "Insufficient", specifies what information was missing or needed to complete the analysis.
  - Mismatched Content: Indicates whether the page content aligns with that of a typical company page.
  - Suggested Actions: Provides recommendations if the page content is insufficient or mismatched, such as verifying the URL or searching for alternative sources.

Node 3 - Loop Over Items
This node ensures that, in subsequent steps, the website in "extra workflow input" corresponds to the row being processed. You can delete this node, but you'll need to ensure that the "query" sent to the scraping workflow corresponds to the website of the specific company being scraped (rather than just the first row).

Node 4 - AI Agent
This AI agent is configured with a prompt to extract data from the content it receives. The node has three sub-nodes:
- OpenAI Chat Model: The model currently used is gpt-4o-mini.
- Call n8n Workflow: Calls the workflow that uses ScrapingBee and retrieves the scraped data.
- Structured Output Parser: Structures the output for clarity and ease of use, before rows are added to the Google Sheet.

Node 5 - Update Company Row in Google Sheet
This node updates the specific company's row in Google Sheets with the enriched data.

Scraper Agent Workflow

Node 1 - Tool Called from Agent
This is the trigger for when the AI Agent calls the scraper. A query is sent with:
- Company name
- Website (the URL of the website)

Node 2 - Set Company URL
This node renames a field, which may seem trivial but is useful for performing transformations on data received from the AI Agent.

Node 3 - ScrapingBee: Scrape Company's Website
This node scrapes data from the provided URL using ScrapingBee.
You can use any scraper of your choice, but ScrapingBee is recommended, as it allows you to configure scraper behavior directly. Once configured, copy the provided "curl" command and import it into n8n.

Node 4 - HTML to Markdown
This node converts the scraped HTML data to Markdown, which is then sent to OpenAI. The Markdown format generally uses fewer tokens than HTML.

Improving the Workflow
It's always a pleasure to share workflows, but creators sometimes want to keep some magic to themselves ✨. Here are some ways you can enhance this workflow:
- Handle potential errors.
- Configure the scraper tool to scrape other pages on the website. Although this will cost more tokens, it can be useful (e.g., scraping "Pricing" or "About Us" pages in addition to the homepage).
- Instead of Google Sheets, connect directly to your CRM to enrich company data.
- Trigger the workflow from form submissions on your website and send the scraped data about the lead to a Slack or Teams channel.
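The extraction step — Markdown conversion followed by structured output — can be approximated in a few lines of Python. This is an illustrative stand-in for the Structured Output Parser, not the template's exact prompt or schema; the JSON keys mirror the sheet's columns but are our own naming:

```python
import json
import requests
from markdownify import markdownify
from openai import OpenAI

client = OpenAI()
html = requests.get("https://example.com").text  # stand-in for the ScrapingBee output

# Markdown is usually far cheaper in tokens than raw HTML
markdown = markdownify(html)

# Ask gpt-4o-mini for JSON matching the sheet's columns (illustrative prompt)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": "From this company homepage, return a JSON object with the keys "
                   "business_area, offer, value_proposition, business_model, icp, "
                   "and information_sufficiency ('Sufficient' or 'Insufficient'):\n\n"
                   + markdown,
    }],
)
enriched = json.loads(response.choices[0].message.content)
print(enriched["business_area"])
```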


Store Notion's Pages as Vector Documents into Supabase with OpenAI

Workflow updated on 17/06/2024: Added a 'Summarize' node to avoid creating a row for each Notion content block in the Supabase table.

This workflow assumes you have a Supabase project with a table that has a vector column. If you don't, follow the instructions in the Supabase Langchain Guide.

Workflow Description
This workflow automates the process of storing Notion pages as vector documents in a Supabase database with a vector column. The steps are as follows:
1. Notion Page Added Trigger: Monitors a specified Notion database for newly added pages. You can create a dedicated Notion database into which you copy the pages you want to store in Supabase. (Node: Page Added in Notion Database)
2. Retrieve Page Content: Fetches all block content from the newly added Notion page. (Node: Get Blocks Content)
3. Filter Non-Text Content: Excludes blocks of type "image" and "video" to focus on textual content. (Node: Filter - Exclude Media Content)
4. Summarize Content: Concatenates the Notion blocks' content to create a single text for embedding. (Node: Summarize - Concatenate Notion's blocks content)
5. Store in Supabase: Stores the processed documents and their embeddings in a Supabase table with a vector column. (Node: Store Documents in Supabase)
6. Generate Embeddings: Uses OpenAI's API to generate embeddings for the textual content. (Node: Generate Text Embeddings)
7. Create Metadata and Load Content: Loads the block content and creates associated metadata, such as page ID and block ID. (Node: Load Block Content & Create Metadata)
8. Split Content into Chunks: Divides the text into smaller chunks for easier processing and embedding generation. (Node: Token Splitter)
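The Token Splitter step is the one piece worth seeing outside n8n. A rough Python equivalent using tiktoken, with chunk size and overlap as illustrative values rather than the template's exact settings:

```python
import tiktoken
from openai import OpenAI

def split_by_tokens(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into token-bounded chunks with a small overlap."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(enc.decode(tokens[start:start + chunk_size]))
    return chunks

client = OpenAI()
page_text = "...concatenated Notion block content..."  # output of the Summarize node
chunks = split_by_tokens(page_text)

# One embedding per chunk; each chunk becomes a row in the Supabase vector table
embeddings = client.embeddings.create(
    model="text-embedding-3-small",
    input=chunks,
).data
```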


AI-generated summary block for WordPress posts

What is this workflow?
This n8n template automates the process of adding an AI-generated summary at the top of your WordPress posts. It retrieves, processes, and updates your posts dynamically, ensuring efficiency and flexibility without relying on a heavy WordPress plugin.

How It Works
1. Triggers → Runs on a scheduled interval or via a webhook when a new post is published.
2. Retrieves posts → Fetches content from WordPress and converts HTML to Markdown for AI processing.
3. AI Summary Generation → Uses OpenAI to create a concise summary.
4. Post Update → Inserts the summary at the top of the post while keeping the original excerpt intact.
5. Data Logging & Notifications → Saves processed posts to Google Sheets and notifies a Slack channel.

Why use this workflow?
✅ No need for a WordPress plugin → Keeps your site lightweight.
✅ Highly flexible → Easily connect with Google Sheets, Slack, or other services.
✅ Customizable → Adapt AI prompts, formatting, and integrations to your needs.
✅ Smart filtering → Ensures posts are not reprocessed unnecessarily.

💡 Check the detailed sticky notes for setup instructions and customization options!
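The core loop is two HTTP calls plus one model call. A hedged Python sketch against the standard WordPress REST API (/wp-json/wp/v2/posts); the site URL, post ID, application-password credentials, and the summary wrapper markup are all placeholders to adapt:

```python
import requests
from markdownify import markdownify
from openai import OpenAI

WP_BASE = "https://your-site.example/wp-json/wp/v2"  # placeholder
AUTH = ("api-user", "application-password")           # placeholder credentials
client = OpenAI()

post = requests.get(f"{WP_BASE}/posts/123", auth=AUTH).json()  # hypothetical post ID
content_html = post["content"]["rendered"]

# Markdown input keeps the summarization call cheap
summary = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": "Write a 3-sentence summary of this post:\n\n"
                          + markdownify(content_html)}],
).choices[0].message.content

# Prepend the summary block; the wrapper div is our own illustrative markup
updated = f'<div class="ai-summary"><p>{summary}</p></div>' + content_html
requests.post(f"{WP_BASE}/posts/123", auth=AUTH, json={"content": updated})
```

In the real template, the smart-filtering step (checking Google Sheets for already-processed posts) would run before this loop so a post is never summarized twice.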


Enrich Pipedrive's Organization Data with OpenAI GPT-4o & Notify it in Slack

This workflow enriches a new Pipedrive organization's data by adding a note to the organization object in Pipedrive. It assumes there is a custom "website" field in your Pipedrive setup, as data will be scraped from this website to generate a note using OpenAI. A notification is then sent in Slack.

⚠️ Disclaimer
This workflow uses a scraping API. Before using it, ensure you comply with the regulations regarding web scraping in your country or state.

Important Notes
- The OpenAI model used is GPT-4o, chosen for its large input token capacity. However, it is not the cheapest model if cost matters to you.
- The system prompt in the OpenAI node generates output with relevant information, but feel free to improve or modify it according to your needs.

How It Works

Node 1: Pipedrive Trigger - An Organization Is Created
This is the trigger of the workflow. When an organization object is created in Pipedrive, this node is triggered and retrieves the data. Make sure you have a "website" custom field in Pipedrive (the name of the field in the n8n node will appear as a random ID, not as the Pipedrive custom field name).

Node 2: ScrapingBee - Get Organization's Website's Homepage Content
This node scrapes the content from the website URL associated with the Pipedrive organization created in Node 1. The workflow uses the ScrapingBee API, but you can use any preferred API or simply the HTTP Request node in n8n.

Node 3: OpenAI - Message GPT-4o with Scraped Data
This node sends the HTML-scraped data from the previous node to the OpenAI GPT-4o model. The system prompt instructs the model to extract company data, such as products or services offered and competitors (if known by the model), and format it as HTML for optimal use in a Pipedrive note.

Node 4: Pipedrive - Create a Note with OpenAI Output
This node adds a note to the organization created in Pipedrive using the OpenAI node's output. The note will include the company description, target market, selling products, and competitors (if GPT-4o was able to determine them).

Nodes 5 & 6: HTML to Markdown & Code - Markdown to Slack Markdown
These two nodes format the HTML output into Slack Markdown. The note created in Pipedrive is in HTML format, as specified by the system prompt of the OpenAI node. To send it to Slack, it needs to be converted to Markdown and then to Slack Markdown.

Node 7: Slack - Notify
This node sends a message in Slack containing the Pipedrive organization note created by this workflow.
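Nodes 5 & 6 are the fiddly part, since Slack's mrkdwn dialect differs from standard Markdown. A rough Python equivalent of that conversion — the regexes cover only bold and links, which is what a typical note needs, so treat this as a sketch to extend rather than the template's actual Code node:

```python
import re
from markdownify import markdownify

def html_to_slack_mrkdwn(html: str) -> str:
    md = markdownify(html)
    # Slack bold uses single asterisks: **text** -> *text*
    md = re.sub(r"\*\*(.+?)\*\*", r"*\1*", md)
    # Slack links use <url|text>: [text](url) -> <url|text>
    md = re.sub(r"\[(.+?)\]\((.+?)\)", r"<\2|\1>", md)
    return md

note_html = "<p><strong>Acme Corp</strong> sells <a href='https://acme.example'>widgets</a>.</p>"
print(html_to_slack_mrkdwn(note_html))
# *Acme Corp* sells <https://acme.example|widgets>.
```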


Answer questions about documentation with BigQuery RAG and OpenAI

BigQuery RAG with OpenAI Embeddings
This workflow demonstrates how to use Retrieval-Augmented Generation (RAG) with BigQuery and OpenAI. By default, you cannot directly use OpenAI cloud models within BigQuery.

Try it
This template comes with access to a public BigQuery table that stores part of the n8n documentation (about nodes and triggers), allowing you to try the workflow right away: n8n-docs-rag.n8ndocs.n8ndocs_embeddings

⚠️ Important: BigQuery uses the requester-pays model. The table is small (~40 MB), and BigQuery provides 1 TB of free processing per month. Running 3–4 test queries should remain within the free tier, unless your project has already consumed its quota. See BigQuery Pricing for details.

Why this workflow?
Many organizations already use BigQuery to store enterprise data, and OpenAI for LLM use cases. For RAG, the common approach is to rely on dedicated vector databases such as Qdrant, Pinecone, Weaviate, or PostgreSQL with pgvector. Those are good choices, but when an organization already uses and is familiar with BigQuery, it can be more efficient to leverage its built-in vector capabilities for RAG. Then comes the question of the LLM: if OpenAI is the chosen provider, teams are often frustrated that it is not directly compatible with BigQuery. This workflow solves that limitation.

Prerequisites
To use this workflow, you will need:
- A good understanding of BigQuery and its vector capabilities.
- A BigQuery table containing documents and an embeddings column. The embeddings column must be of type FLOAT and mode REPEATED (to store arrays).
- A data pipeline that generates embeddings with the OpenAI API and stores them in BigQuery.
This template comes with a public table that stores part of the n8n documentation (about nodes and triggers), so you can try it out: n8n-docs-rag.n8ndocs.n8ndocs_embeddings

How it works
The system consists of two workflows:
- Main workflow → Hosts the AI Agent, which connects to a subworkflow for RAG.
- Subworkflow → Queries the BigQuery vector table. The retrieved documents are then used by the AI Agent to generate an answer for the user.
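The subworkflow's job — embed the question with OpenAI, then rank rows by vector distance in BigQuery — can be sketched as below. ML.DISTANCE is standard BigQuery SQL over ARRAY<FLOAT64> columns, but the content and embedding column names are assumptions to check against the public table's schema, and the embedding model must match whichever one produced the stored vectors. Remember that running the query incurs requester-pays processing:

```python
from google.cloud import bigquery
from openai import OpenAI

question = "How does the Webhook node handle authentication?"  # sample user question

# Embed the question with OpenAI (model must match the one used to build the table)
embedding = OpenAI().embeddings.create(
    model="text-embedding-3-small",
    input=question,
).data[0].embedding

bq = bigquery.Client()
sql = """
SELECT
  content,
  ML.DISTANCE(embedding, @query_embedding, 'COSINE') AS distance
FROM `n8n-docs-rag.n8ndocs.n8ndocs_embeddings`
ORDER BY distance
LIMIT 5
"""
job = bq.query(sql, job_config=bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ArrayQueryParameter("query_embedding", "FLOAT64", embedding),
    ],
))
for row in job.result():
    print(f"{row.distance:.4f}  {row.content[:80]}")
```

The retrieved rows are what the AI Agent in the main workflow receives as context to compose its answer.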
