
AI-powered web scraping with Jina, Google Sheets and OpenAI: the EASY way

Derek Cheung
50430 views
2/3/2026
Official Page

Purpose of workflow: Automate the scraping of a website, transform the scraped content into a structured format, and load it directly into a Google Sheets spreadsheet.

How it works:

  1. Web Scraping: Uses the Jina AI service to scrape website data and convert it into LLM-friendly text (see the request sketch just after this list).
  2. Information Extraction: Employs an AI node to extract specific book details (title, price, availability, image URL, product URL) from the scraped data.
  3. Data Splitting: Splits the extracted information into individual book entries.
  4. Google Sheets Integration: Automatically populates a Google Sheets spreadsheet with the structured book data.
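
A minimal sketch of the Jina call behind step 1, shown in Python outside n8n for clarity. The target URL is a placeholder, and the Bearer-style Authorization header is the usual way to pass a Jina API key; adjust both to your own setup.

```python
import requests

JINA_API_KEY = "YOUR_JINA_API_KEY"        # from your Jina AI account
TARGET_URL = "https://example.com/books"  # placeholder for the site you want to scrape

# Jina's reader endpoint is simply the target URL prefixed with https://r.jina.ai/
response = requests.get(
    f"https://r.jina.ai/{TARGET_URL}",
    headers={"Authorization": f"Bearer {JINA_API_KEY}"},
    timeout=60,
)
response.raise_for_status()

# The response body is LLM-friendly markdown/plain text of the rendered page
page_text = response.text
print(page_text[:500])
```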

Step by step setup:

  1. Set up Jina AI service:

    • Sign up for a Jina AI account and obtain an API key.
  2. Configure the HTTP Request node:

    • Set the URL to Jina's reader endpoint followed by the target website (e.g., https://r.jina.ai/{target_url}).
    • Add the API key to the request headers for authentication.
  3. Set up the Information Extractor node:

    • Upload a screenshot of the target website to Claude AI and ask it to suggest a JSON schema for extracting the required information.
    • Copy the generated schema into the Information Extractor node (a sample schema also appears after this list).
  4. Configure the Split node:

    • Set it up to separate the extracted data into individual book entries (see the splitting sketch after this list).
  5. Set up the Google Sheets node:

    • Create a Google Sheets spreadsheet with columns for title, price, availability, image URL, and product URL.
    • Configure the node to map the extracted data to the appropriate columns.
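
If you prefer not to generate the schema with Claude, a hand-written schema along the lines below should also work. This is a sketch only: the field names are illustrative, and it assumes the Information Extractor node accepts a standard JSON Schema, so rename the fields to match your spreadsheet columns.

```python
import json

# Illustrative JSON Schema for the book fields listed above.
book_schema = {
    "type": "object",
    "properties": {
        "books": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "price": {"type": "string"},
                    "availability": {"type": "string"},
                    "image_url": {"type": "string"},
                    "product_url": {"type": "string"},
                },
                "required": ["title", "price"],
            },
        }
    },
}

# Print it so it can be pasted into the Information Extractor node.
print(json.dumps(book_schema, indent=2))
```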

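The Split step (step 4) only flattens the single extracted object into one item per book, which is the shape the Google Sheets node expects. In plain Python the equivalent transformation looks like this; the sample data is hypothetical and mirrors the schema sketch above.

```python
# Hypothetical extractor output shaped like the schema sketch above
extracted = {
    "books": [
        {"title": "Book A", "price": "£10.00", "availability": "In stock",
         "image_url": "https://example.com/a.jpg", "product_url": "https://example.com/a"},
        {"title": "Book B", "price": "£12.50", "availability": "In stock",
         "image_url": "https://example.com/b.jpg", "product_url": "https://example.com/b"},
    ]
}

# One row per book, in the same column order as the spreadsheet
rows = [
    [b["title"], b["price"], b["availability"], b["image_url"], b["product_url"]]
    for b in extracted["books"]
]
for row in rows:
    print(row)
```
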
AI-Powered Web Scraping with Jina, Google Sheets, and OpenAI

This n8n workflow provides an easy way to perform AI-powered web scraping. It fetches clean, readable page content with Jina AI's reader service, uses OpenAI to extract structured information and generate a concise summary, and stores the results in a Google Sheet.

What it does

  1. Triggers Manually: The workflow starts when manually executed.
  2. Reads URLs from Google Sheets: It fetches a list of URLs from a specified Google Sheet.
  3. Scrapes Web Content with Jina AI: For each URL, it uses Jina AI's reader service to fetch the clean, readable content of the web page.
  4. Extracts Structured Information with OpenAI: It then uses an OpenAI Chat Model (via the Langchain Information Extractor node) to extract specific, structured information (e.g., title, description, author, publication date) from the scraped content based on a defined schema.
  5. Summarizes Content with OpenAI: It also uses another OpenAI Chat Model to generate a concise summary of the web page content.
  6. Writes Data to Google Sheets: Finally, it appends the extracted structured data and the generated summary for each URL back into a Google Sheet.
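
Outside n8n, the same end-to-end flow might look roughly like the sketch below. It is only meant to make the data flow concrete; the sheet tab names, credential file, prompts, and model name are all assumptions, and the workflow itself does this with the nodes described above rather than custom code.

```python
import requests
import gspread
from openai import OpenAI

JINA_API_KEY = "YOUR_JINA_API_KEY"
SPREADSHEET_ID = "YOUR_SPREADSHEET_ID"

gc = gspread.service_account(filename="service_account.json")  # Google credentials (assumed path)
sheet = gc.open_by_key(SPREADSHEET_ID)
urls_ws = sheet.worksheet("URLs")     # assumed input tab, URLs in column A
out_ws = sheet.worksheet("Results")   # assumed output tab

client = OpenAI()  # reads OPENAI_API_KEY from the environment

for url in urls_ws.col_values(1)[1:]:  # skip the header row
    # 1) Scrape readable text via Jina's reader endpoint
    page_text = requests.get(
        f"https://r.jina.ai/{url}",
        headers={"Authorization": f"Bearer {JINA_API_KEY}"},
        timeout=60,
    ).text

    # 2) Extract structured fields as JSON (the workflow uses the Information Extractor node)
    extraction = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": "Return JSON with title, description, author and "
                       f"published_date for this page:\n\n{page_text[:8000]}",
        }],
    ).choices[0].message.content

    # 3) Generate a concise summary
    summary = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user",
                   "content": f"Summarize this page in 2-3 sentences:\n\n{page_text[:8000]}"}],
    ).choices[0].message.content

    # 4) Append one row per URL back to the sheet
    out_ws.append_row([url, extraction, summary])
```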

Prerequisites/Requirements

  • n8n Account: A running n8n instance.
  • Google Sheets Account: To store and retrieve URLs and extracted data.
  • Jina AI API Key: For the Jina AI Reader service (used implicitly by the HTTP Request node for web scraping).
  • OpenAI API Key: For the OpenAI Chat Model nodes.

Setup/Usage

  1. Import the workflow: Download the JSON provided and import it into your n8n instance.
  2. Configure Credentials:
    • Google Sheets: Set up your Google Sheets OAuth2 credentials. Ensure the service account or user has read/write access to the target spreadsheet.
    • OpenAI: Set up your OpenAI API Key credentials.
  3. Configure Google Sheets (Read):
    • In the first "Google Sheets" node, specify the "Spreadsheet ID" and "Sheet Name" where your URLs are listed.
    • Ensure your URLs are in a column that can be easily referenced (e.g., URL).
  4. Configure HTTP Request (Jina AI):
    • The "HTTP Request" node is configured to use Jina AI's reader service. You might need to add your Jina AI API key as a header or modify the URL if Jina's API requires it differently.
    • The URL for Jina AI's reader typically looks like https://r.jina.ai/{your_target_url}.
  5. Configure OpenAI Chat Model (Information Extractor):
    • In the "Information Extractor" node, define the JSON schema for the structured data you want to extract (e.g., {"title": "string", "description": "string", "author": "string", "published_date": "string"}); a fuller schema example appears after this list.
    • Ensure the "Input Data" points to the content scraped by Jina AI.
  6. Configure OpenAI Chat Model (Summary):
    • In the "OpenAI Chat Model" node for summarization, define a clear prompt for generating a summary.
    • Ensure the "Input Data" points to the content scraped by Jina AI.
  7. Configure Google Sheets (Write):
    • In the second "Google Sheets" node, specify the "Spreadsheet ID" and "Sheet Name" where you want to write the extracted data.
    • Map the fields from the "Information Extractor" and "OpenAI Chat Model" (summary) nodes to the corresponding columns in your Google Sheet (see the mapping sketch after this list).
  8. Execute the workflow: Click "Execute Workflow" to run it manually and process the URLs from your Google Sheet.
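
The shorthand in step 5 can be written out as a full JSON Schema. Below is a minimal sketch, assuming the Information Extractor node accepts standard JSON Schema and that these four fields are all you need; rename or extend the fields to match your sheet.

```python
import json

# Illustrative schema matching the example fields in step 5.
page_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "Page or article title"},
        "description": {"type": "string", "description": "Short description of the page"},
        "author": {"type": "string", "description": "Author name, if present"},
        "published_date": {"type": "string", "description": "Publication date as text"},
    },
    "required": ["title"],
}

# Print it so it can be pasted into the Information Extractor node.
print(json.dumps(page_schema, indent=2))
```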

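For step 7, the mapping boils down to turning the extractor's JSON output plus the summary into one row whose order matches your sheet's header. Here is a hedged sketch using gspread outside n8n; the sheet name, column order, and sample values are assumptions.

```python
import json
import gspread

# Hypothetical outputs from the extraction and summary steps
extraction_json = '{"title": "Example", "description": "...", "author": "Jane Doe", "published_date": "2024-01-01"}'
summary = "A short AI-generated summary of the page."

fields = json.loads(extraction_json)

gc = gspread.service_account(filename="service_account.json")
results_ws = gc.open_by_key("YOUR_SPREADSHEET_ID").worksheet("Results")

# Column order must match the header row of the Results sheet:
# url | title | description | author | published_date | summary
results_ws.append_row([
    "https://example.com/article",
    fields.get("title", ""),
    fields.get("description", ""),
    fields.get("author", ""),
    fields.get("published_date", ""),
    summary,
])
```
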
Related Templates

Auto-create TikTok videos with VEED.io AI avatars, ElevenLabs & GPT-4

💥 Viral TikTok Video Machine: Auto-Create Videos with Your AI Avatar

🎯 Who is this for?
This workflow is for content creators, marketers, and agencies who want to use Veed.io's AI avatar technology to produce short, engaging TikTok videos automatically. It's ideal for creators who want to appear on camera without recording themselves, and for teams managing multiple brands who need to generate videos at scale.

⚙️ What problem this workflow solves
Manually creating videos for TikTok can take hours: finding trends, writing scripts, recording, and editing. By combining Veed.io, ElevenLabs, and GPT-4, this workflow transforms a simple Telegram input into a ready-to-post TikTok video featuring your AI avatar powered by Veed.io, speaking naturally with your cloned voice.

🚀 What this workflow does
This automation links Veed.io's video-generation API with multiple AI tools:
  • Analyzes TikTok trends via Perplexity AI
  • Writes a 10-second viral script using GPT-4
  • Generates your voiceover via ElevenLabs
  • Uses Veed.io (Fabric 1.0 via FAL.ai) to animate your avatar and sync the lips to the voice
  • Creates an engaging caption + hashtags for TikTok virality
  • Publishes the video automatically via the Blotato TikTok API
  • Logs all results to Google Sheets for tracking

🧩 Setup
  • Telegram Bot: Create your bot via @BotFather and configure it as the trigger for sending your photo and theme.
  • Connect Veed.io: Create an account on Veed.io, get your FAL.ai API key (Veed Fabric 1.0 model), and use HTTPS image/audio URLs compatible with Veed Fabric.
  • Other APIs: Add Perplexity, ElevenLabs, and Blotato TikTok keys, and connect your Google Sheet for logging results.

🛠️ How to customize this workflow
  • Change your avatar: Upload a new image through Telegram, and Veed.io will generate a new talking version automatically.
  • Modify the script style: Adjust the GPT prompt for tone (educational, funny, storytelling).
  • Adjust voice tone: Tweak ElevenLabs stability and similarity settings.
  • Expand platforms: Add Instagram, YouTube Shorts, or X (Twitter) posting nodes.
  • Track performance: Customize your Google Sheet to measure your most successful Veed.io-based videos.

🧠 Expected outcome
In just a few seconds after sending your photo and theme, this workflow, powered by Veed.io, creates a fully automated TikTok video featuring your AI avatar with natural lip-sync and voice. The result is a continuous stream of viral short videos, made without cameras, editing, or effort.

✅ Import the JSON file into n8n, add your API keys (including Veed.io via FAL.ai), and start generating viral TikTok videos starring your AI avatar today!

🎥 Watch This Tutorial
📄 Documentation: Notion Guide
Need help customizing? Contact me for consulting and support: LinkedIn / YouTube

By Dr. Firas
39510

Track competitor SEO keywords with Decodo + GPT-4.1-mini + Google Sheets

This workflow automates competitor keyword research using an OpenAI LLM and Decodo for intelligent web scraping.

Who this is for
  • SEO specialists, content strategists, and growth marketers who want to automate keyword research and competitive intelligence.
  • Marketing analysts managing multiple clients or websites who need consistent SEO tracking without manual data pulls.
  • Agencies or automation engineers using Google Sheets as an SEO data dashboard for keyword monitoring and reporting.

What problem this workflow solves
Tracking competitor keywords manually is slow and inconsistent. Most SEO tools provide limited API access or lack contextual keyword analysis. This workflow solves that by:
  • Automatically scraping any competitor's webpage with Decodo.
  • Using OpenAI GPT-4.1-mini to interpret keyword intent, density, and semantic focus.
  • Storing structured keyword insights directly in Google Sheets for ongoing tracking and trend analysis.

What this workflow does
  1. Trigger: Manually start the workflow or schedule it to run periodically.
  2. Input Setup: Define the website URL and target country (e.g., https://dev.to, france).
  3. Data Scraping (Decodo): Fetch competitor web content and metadata.
  4. Keyword Analysis (OpenAI GPT-4.1-mini): Extract primary and secondary keywords, identify focus topics and semantic entities, generate a keyword density summary and SEO strength score, and recommend optimization and internal linking opportunities.
  5. Data Structuring: Clean and convert the GPT output into JSON format.
  6. Data Storage (Google Sheets): Append structured keyword data to a Google Sheet for long-term tracking.

Setup
  1. Prerequisites: If you are new to Decodo, sign up at visit.decodo.com. You also need an n8n account with workflow editor access, Decodo API credentials, an OpenAI API key, and a Google Sheets account connected via OAuth2. Make sure to install the Decodo Community node.
  2. Create a Google Sheet: Add columns for primarykeywords, seostrengthscore, keyworddensity_summary, etc., and share it with your n8n Google account.
  3. Connect Credentials: Add credentials for the Decodo API (register, log in, and obtain the Basic Authentication Token via the Decodo Dashboard), the OpenAI API (for GPT-4.1-mini), and Google Sheets OAuth2.
  4. Configure Input Fields: Edit the "Set Input Fields" node to set your target site and region.
  5. Run the Workflow: Click Execute Workflow in n8n and view the structured results in your connected Google Sheet.

How to customize this workflow
  • Track multiple competitors: Use a Google Sheet or CSV list of URLs and loop through them with the Split In Batches node.
  • Add language detection: Add a Gemini or GPT node before keyword analysis to detect content language and adjust prompts.
  • Enhance the SEO report: Expand the GPT prompt to include backlink insights, metadata optimization, or readability checks.
  • Integrate visualization: Connect your Google Sheet to Looker Studio for SEO performance dashboards.
  • Schedule auto-runs: Use the Cron node to run weekly or monthly for competitor keyword refreshes.

Summary
This workflow automates competitor keyword research using Decodo for intelligent web scraping, OpenAI GPT-4.1-mini for keyword and SEO analysis, and Google Sheets for live tracking and reporting. It is a complete AI-powered SEO intelligence pipeline, ideal for teams that want actionable insights on keyword gaps, optimization opportunities, and content focus trends without relying on expensive SEO SaaS tools.

By Ranjan Dailata
161

Two-way property repair management system with Google Sheets & Drive

This workflow automates the repair request process between tenants and building managers, keeping all updates organized in a single spreadsheet. It is composed of two coordinated workflows, as two separate triggers are required: one for new repair submissions and another for repair updates. A unique Unit ID that corresponds to individual units is attributed to each request, and timestamps are used to coordinate repair updates with specific requests.

General use cases include:
  • Property managers who manage multiple buildings or units.
  • Building owners looking to centralize tenant repair communication.
  • Automation builders who want to learn multi-trigger workflow design in n8n.

⚙️ How It Works

Workflow 1 – New Repair Requests
Behind the scenes: A tenant fills out a Google Form ("Repair Request Form"), which automatically adds a new row to a linked Google Sheet.
Steps:
  1. Trigger: Google Sheets rowAdded runs when a new form entry appears.
  2. Extract & Format: Collects all relevant form data (address, unit, urgency, contacts).
  3. Generate Unit ID: Creates a standardized identifier (e.g., BUILDING-UNIT) for tracking.
  4. Email Notification: Sends the building manager a formatted email summarizing the repair details and including a link to a Repair Update Form (which activates Workflow 2).

Workflow 2 – Repair Updates
Behind the scenes: Triggered when the building manager submits a follow-up form ("Repair Update Form").
Steps:
  1. Lookup by Unit ID: Uses the Unit ID from Workflow 1 to find the existing row in the Google Sheet.
  2. Conditional Logic: If photos are uploaded, saves each image to a Google Drive folder, renames the files consistently, and adds the URLs to the sheet; if there are no photos, skips the upload step and processes textual updates only.
  3. Merge & Update: Combines the new data with the existing repair info in the same spreadsheet row, enabling a full repair history in one place.

🧩 Requirements
  • Google Account (for Forms, Sheets, and Drive)
  • Gmail/email node connected for sending notifications
  • n8n credentials configured for Google API access

⚡ Setup Instructions (see more detail in the workflow)
  1. Import both workflows into n8n, then copy one into a second workflow.
  2. Change the manual trigger in Workflow 2 to an n8n Form node.
  3. Connect Google credentials to all nodes.
  4. Update the spreadsheet and folder IDs in the corresponding nodes.
  5. Customize the email text, sender name, and form links for your organization.
  6. Test each workflow with a sample repair request and a repair update submission.

🛠️ Customization Ideas
  • Add Slack or Telegram notifications for urgent repairs.
  • Auto-create folders per building or unit for photo uploads.
  • Generate monthly repair summaries using Google Sheets triggers.
  • Add an AI node to create summaries or extract relevant repair data from requests that include long submissions.

By Matt@VeraisonLabs
208