Analyze Crunchbase startups by keyword with Bright Data, Gemini AI & Google Sheets


2/3/2026


This n8n workflow automates the discovery, enrichment, and comparative analysis of startups from the Crunchbase dataset via Bright Data, enhanced with AI, and exports structured results to Google Sheets.

🚀 What It Does

Receives a keyword from the user that describes the area of interest — such as an industry, sector, technology, or trend (e.g., "AI in healthcare", "carbon capture", "edtech").
This keyword is used to filter relevant startups from the Crunchbase dataset via Bright Data.
Fetches data from Bright Data's Crunchbase snapshot API.
Extracts and cleans key fields from the JSON response.
Sorts startups by founding date, most recent first.
Selects the top 10 most recent companies.
Sends these 10 companies to Google Gemini AI for comparative analysis.
Embeds the AI-generated summary into the final export.
Appends results to a Google Sheet for tracking and reporting.
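The comparative step sends all selected companies to Gemini in a single request. As a rough illustration, the prompt for that request could be assembled as below. This is a minimal sketch, not the template's actual prompt: the function name `build_comparison_prompt`, the prompt wording, and the choice of fields included are all assumptions.

```python
import json


def build_comparison_prompt(keyword, companies):
    """Build one prompt asking for a comparison of all companies at once.

    Hypothetical helper: the wording and the selected fields are
    illustrative, not taken from the workflow itself.
    """
    lines = [
        f"Compare the following {len(companies)} startups related to '{keyword}'.",
        "Highlight differences in focus, funding, and team size, then summarize.",
        "",
    ]
    for i, company in enumerate(companies, start=1):
        # Keep only a few fields so the prompt stays short.
        summary = {
            "name": company.get("name"),
            "founded": company.get("founded"),
            "about": company.get("about"),
            "funding_total": company.get("funding_total"),
        }
        lines.append(f"{i}. {json.dumps(summary, ensure_ascii=False)}")
    return "\n".join(lines)
```

Sending every company in one prompt is what lets the model return a single unified comparison rather than ten separate summaries.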

🛠️ Step-by-Step Setup

Get user keyword input from a form.
Use three Bright Data HTTP requests: start the snapshot, poll its status until it is ready, then fetch the snapshot data in JSON format.
Use a Python Code node to:
Parse and sort companies by founded_date.
Clean and standardize data fields.
Pass the top 10 companies into Gemini AI for comparative insight.
Merge the AI output back with company data.
Send everything to Google Sheets.
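The parse, sort, and clean steps handled by the Python Code node can be sketched as plain Python. This is an assumption-laden outline rather than the template's code: the helper names (`parse_founded`, `clean_company`, `top_recent`), the ISO-style date format, and the reduced field set are hypothetical, and the wiring to n8n's input items is left out.

```python
from datetime import date

BATCH_SIZE = 10  # assumed default; the template sets its limit in the first Code node


def parse_founded(value):
    """Return a date for ISO-like strings such as '2021-05-01', else None."""
    if not value:
        return None
    try:
        # Drop any time component before parsing (assumes YYYY-MM-DD prefix).
        return date.fromisoformat(str(value)[:10])
    except ValueError:
        return None


def clean_company(raw):
    """Normalize a raw record into a consistent shape (reduced field set)."""
    founded = parse_founded(raw.get("founded_date"))
    return {
        "name": (raw.get("name") or "").strip(),
        "founded": founded.isoformat() if founded else "",
        "about": (raw.get("about") or "").strip(),
        "website": (raw.get("website") or "").strip(),
    }


def top_recent(records, limit=BATCH_SIZE):
    """Clean all records and return the most recently founded ones."""
    cleaned = [clean_company(r) for r in records]
    # ISO date strings sort chronologically; records without a date end up last.
    cleaned.sort(key=lambda c: c["founded"], reverse=True)
    return cleaned[:limit]
```

Changing `limit` (or `BATCH_SIZE`) corresponds to adjusting the batch size that is later passed to the AI step.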

🧠 How It Works

Snapshot Control: Polls every few seconds until the Bright Data snapshot is complete.
Code Cleanup: Ensures consistent structure and formatting across all records.
Comparative AI Analysis: Gemini compares all 10 companies at once and returns a unified analysis.
Merging Output: AI analysis is merged into the first company’s record (to avoid duplication), while all 10 are exported.
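The snapshot-control loop can be pictured as a small polling helper. The sketch below is illustrative only: `wait_until_ready`, the status strings `"ready"` and `"failed"`, the 5-second default interval, and the attempt cap are assumptions, and `get_status` stands in for whatever call checks the Bright Data snapshot status.

```python
import time


def wait_until_ready(get_status, interval_s=5.0, max_attempts=60, sleep=time.sleep):
    """Call get_status() repeatedly until it reports 'ready'.

    Returns the number of attempts used. The status values and defaults
    are assumptions, not values confirmed by the Bright Data API.
    """
    for attempt in range(1, max_attempts + 1):
        status = get_status()
        if status == "ready":
            return attempt
        if status == "failed":
            raise RuntimeError("snapshot reported failure")
        # Not ready yet: pause before asking again.
        sleep(interval_s)
    raise TimeoutError(f"snapshot not ready after {max_attempts} attempts")
```

Capping the number of attempts keeps a stuck snapshot from holding the workflow open indefinitely; in n8n the same effect is usually achieved with a Wait node and an If node looping back to the status request.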

📤 Google Sheet Output

Each row includes:

name, founded, about, num_employees, type, ipo_status, full_description, social_media_links, address, website, funding_total, num_investors, lead_investors, founders, products_and_services, monthly_visits, crunchbase_link, ai_analysis.
The ai_analysis column holds the AI comparative analysis summary. It is filled in only once per batch, on the first company's row.
All of the fields above can be customized in the Python code, and you can add further fields from the Bright Data output.
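Attaching the analysis to only the first row could look like the following. This is a minimal sketch under the assumption that each company is a plain dictionary; `attach_analysis` is a hypothetical name, and only the `ai_analysis` column name comes from the field list above.

```python
def attach_analysis(companies, analysis):
    """Return copies of the company rows with ai_analysis set on the first only."""
    rows = []
    for index, company in enumerate(companies):
        row = dict(company)  # copy so the input records are not mutated
        # Every row gets the column, but only the first carries the text.
        row["ai_analysis"] = analysis if index == 0 else ""
        rows.append(row)
    return rows
```

Giving every row an `ai_analysis` key, even when empty, keeps the columns aligned when the rows are appended to the sheet.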

🔐 Required Credentials

Bright Data – Replace YOUR_API_KEY in 3 HTTP Request nodes.
Google Gemini API – For AI analysis.
Google Sheets OAuth2 – For spreadsheet export.

⚠️ Notes

AI output is shared once per batch of 10 companies, attached to the first company entry. You can change the batch size limit in the first "Code" node.

Analyze Crunchbase Startups by Keyword with Bright Data, Gemini AI, and Google Sheets

This n8n workflow automates the process of extracting startup data from Crunchbase using Bright Data, enriching it with AI-powered analysis from Google Gemini, and then storing the results in Google Sheets. It's designed to help you quickly identify and understand startups relevant to specific keywords.

What it does

Triggers on Form Submission: The workflow starts when a new submission is received via an n8n form. This form collects the keyword to search for.
Extracts Keywords: It uses a Code node to process the form submission and extract the keywords provided by the user.
Scrapes Crunchbase with Bright Data: An HTTP Request node is configured to interact with the Bright Data API. It sends the extracted keywords to Bright Data to initiate a scraping job on Crunchbase, collecting relevant startup information.
Waits for Scraping Completion: A Wait node pauses the workflow execution for a specified duration to allow the Bright Data scraping job to complete.
Retrieves Scraped Data: After the wait, another HTTP Request node fetches the completed scraping results from Bright Data.
Analyzes Data with Google Gemini AI: The retrieved startup data is then passed to a Basic LLM Chain node, which uses the Google Gemini Chat Model. This AI component is configured to analyze the scraped data (e.g., company descriptions, funding rounds, industries) and provide insights or summaries based on the keywords.
Conditionally Stores Results: An If node checks if the AI analysis yielded meaningful results.
- If successful: The analyzed data is then written to a Google Sheet.
- If no results/error: The workflow proceeds to merge the empty data.
Merges Data: A Merge node combines the successful AI analysis output with any empty data paths, ensuring the workflow completes consistently.
Writes to Google Sheets: Finally, the processed and AI-enriched startup data is appended as new rows to a designated Google Sheet, providing a structured overview of the findings.

Prerequisites/Requirements

n8n Instance: A running n8n instance.
Bright Data Account: An account with Bright Data (formerly Luminati) and an API key for web scraping.
Google Cloud Project & Gemini API: Access to Google Gemini API (requires a Google Cloud project and API key).
Google Sheets Account: A Google account with access to Google Sheets.
Google Sheets Credential in n8n: An n8n credential configured for Google Sheets.
Bright Data Credential in n8n: Your Bright Data API key, made available to the Bright Data HTTP Request nodes.
Google Gemini Credential in n8n: An n8n credential configured for Google Gemini.

Setup/Usage

Import the Workflow: Download the provided JSON and import it into your n8n instance.
Configure Credentials:
- Set up your Google Sheets credential in n8n.
- Set up your Bright Data credential in n8n.
- Set up your Google Gemini credential in n8n.
Configure the "On form submission" Trigger:
- Activate the form trigger to generate its unique URL.
- Customize the form fields to collect the desired keywords for your Crunchbase search.
Configure "Code" Node (ID: 834):
- Review and adjust the node's code so that it correctly extracts the keywords from the incoming form data.
Configure "HTTP Request" Nodes (ID: 19):
- Bright Data Scrape Request: Update the URL and body to match your Bright Data API endpoint for Crunchbase scraping, including passing the keywords from the previous node.
- Bright Data Get Results: Update the URL to retrieve the results of your Bright Data job.
Configure "Wait" Node (ID: 514):
- Adjust the wait time as necessary to ensure Bright Data has sufficient time to complete the scraping job.
Configure "Basic LLM Chain" Node (ID: 1123) and "Google Gemini Chat Model" Node (ID: 1262):
- Ensure the Gemini credential is selected.
- Refine the prompt within the LLM Chain to guide Gemini AI on how to analyze the Crunchbase data effectively based on your requirements (e.g., "Summarize the company's mission," "Identify key investors," "Assess market potential for X keyword").
Configure "Google Sheets" Node (ID: 18):
- Select your Google Sheets credential.
- Specify the Spreadsheet ID and Sheet Name where you want to store the analyzed startup data.
- Map the output fields from the AI analysis to the correct columns in your Google Sheet.
Activate the Workflow: Once all configurations are complete, activate the workflow.
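When mapping fields to sheet columns, it can help to flatten each record into values in a fixed column order first. The sketch below shows one way to do that outside n8n; `to_sheet_row` and the `COLUMNS` subset are hypothetical, and joining list values with commas is an assumption about how multi-value fields such as founders should appear in a cell.

```python
# Illustrative subset of the sheet columns; extend it to match your sheet.
COLUMNS = ["name", "founded", "about", "website", "founders", "ai_analysis"]


def to_sheet_row(record, columns=COLUMNS):
    """Flatten one record into a list of cell values in column order."""
    values = []
    for column in columns:
        value = record.get(column, "")
        if value is None:
            value = ""
        elif isinstance(value, (list, tuple)):
            # Multi-value fields become one comma-separated cell.
            value = ", ".join(str(item) for item in value)
        # Keep numbers numeric so the sheet can sort and sum them.
        values.append(value if isinstance(value, (int, float)) else str(value))
    return values
```

Missing fields become empty cells, so a record that lacks a value does not shift the remaining columns.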

Now, whenever you submit the n8n form with new keywords, the workflow will automatically scrape Crunchbase, analyze the results with Google Gemini, and update your Google Sheet.
