Extract and structure Thai documents to Google Sheets using Typhoon OCR and Llama 3.1
⚠️ Note: This template requires a community node and works only on self-hosted n8n installations. It uses the Typhoon OCR Python package and custom command execution. Make sure to install required dependencies locally.
Who is this for?
This template is for developers, operations teams, and automation builders in Thailand (or any Thai-speaking environment) who regularly process PDFs or scanned documents in Thai and want to extract structured text into a Google Sheet.
It is ideal for:
- Local government document processing
- Thai-language enterprise paperwork
- AI automation pipelines requiring Thai OCR
What problem does this solve?
Typhoon OCR is one of the most accurate OCR tools for Thai text. However, integrating it into an end-to-end workflow usually requires manual scripting and data wrangling.
This template solves that by:
- Running Typhoon OCR on PDF files
- Using AI to extract structured data fields
- Automatically storing results in Google Sheets
What this workflow does
- Trigger: Run manually or from any automation source
- Read Files: Load local PDF files from a doc/ folder
- Execute Command: Run Typhoon OCR on each file using a Python command
- LLM Extraction: Send the OCR markdown to an AI model (e.g., GPT-4, or a model such as Llama 3.1 served through OpenRouter) to extract fields
- Code Node: Parse the LLM output as JSON
- Google Sheets: Append structured data into a spreadsheet
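The JSON-parsing step in the list above can be sketched as follows. This is a minimal standalone Python illustration, not the template's actual Code node; the function name `parse_llm_json` and the sample reply (including its field values) are hypothetical. It assumes the model may wrap its JSON in a fenced block or surround it with extra prose.

```python
import json
import re

# Built at runtime so this example does not contain a literal code fence.
FENCE = "`" * 3


def parse_llm_json(raw: str) -> dict:
    """Extract the first JSON object from an LLM reply and parse it."""
    text = raw.strip()
    # Models often wrap JSON in a fenced block; keep only the fenced body.
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, flags=re.DOTALL)
    if fenced:
        text = fenced.group(1).strip()
    # Fall back to the outermost {...} span if prose surrounds the object.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in LLM output")
    return json.loads(text[start:end + 1])


# Hypothetical reply, for illustration only.
reply = "Here is the result:\n" + FENCE + 'json\n{"book_id": "123/2567", "subject": "ขออนุมัติ"}\n' + FENCE
record = parse_llm_json(reply)
print(record["book_id"])
```

Raising an error when no object is found lets the workflow fail loudly on that item instead of appending an empty row to the sheet.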
Setup
1. Install Requirements
- Python 3.10+
- typhoon-ocr: pip install typhoon-ocr
- Install Poppler and add it to the system PATH (needed for pdftoppm and pdfinfo)
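Before running the workflow, it can help to confirm that these requirements are actually visible to the process that will run the OCR command. The sketch below is one possible check using only the standard library; the helper name `missing_tools` is hypothetical.

```python
import shutil
import sys


def missing_tools(tools=("pdftoppm", "pdfinfo")):
    """Return the names of required executables that are not on PATH."""
    return [name for name in tools if shutil.which(name) is None]


# Python 3.10+ is listed as a requirement above.
if sys.version_info < (3, 10):
    print("Python 3.10+ is required")

missing = missing_tools()
print("Missing Poppler tools:", ", ".join(missing) if missing else "none")
```

Run it with the same user and environment as n8n, since PATH often differs between an interactive shell and a service or Docker container.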
2. Create folders
- Create a folder called doc in the same directory where n8n runs (or mount it via Docker)
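As a rough illustration of what the file-reading step works with, the following sketch creates the doc folder if it is missing and lists the PDFs inside it. The function name `list_pdfs` is hypothetical, and the workflow itself uses an n8n node for this rather than Python.

```python
from pathlib import Path


def list_pdfs(folder="doc"):
    """Create the folder if needed and return its PDF files in sorted order."""
    root = Path(folder)
    root.mkdir(parents=True, exist_ok=True)
    # Match the extension case-insensitively so "scan.PDF" is also picked up.
    return sorted(p for p in root.iterdir() if p.suffix.lower() == ".pdf")


if __name__ == "__main__":
    for pdf in list_pdfs():
        print(pdf.name)
```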
3. Google Sheet
Create a Google Sheet with the following column headers:
| book_id | date | subject | detail | signed_by | signed_by2 | contact | download_url |
| ------- | ---- | ------- | ------ | --------- | ---------- | ------- | ------------ |
You can use this example Google Sheet as a reference.
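To show how extracted fields line up with these headers, here is a small sketch that orders a parsed record by column and fills missing fields with empty strings. The helper `to_row` and the sample record are hypothetical; in the workflow, the Google Sheets node performs this mapping.

```python
HEADERS = ["book_id", "date", "subject", "detail",
           "signed_by", "signed_by2", "contact", "download_url"]


def to_row(record: dict) -> list:
    """Order a parsed record by the sheet headers, blank for missing fields."""
    return ["" if record.get(key) is None else str(record[key]) for key in HEADERS]


# Hypothetical record with only some fields present.
row = to_row({"book_id": "001", "subject": "Budget request", "signed_by2": None})
print(row)
```

Keeping the header list in one place makes it easier to change the output fields and the sheet columns together.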
4. API Key
Export your TYPHOON_OCR_API_KEY and OPENAI_API_KEY in your environment (or set them inside the command string in the Execute Command node).
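A quick way to confirm the keys are visible to a process is to check the environment directly. This is a minimal sketch; the helper name `missing_keys` is hypothetical, and the two variable names are the ones given above.

```python
import os

REQUIRED_KEYS = ("TYPHOON_OCR_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=os.environ, names=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty."""
    return [name for name in names if not env.get(name, "").strip()]


print("Missing keys:", missing_keys() or "none")
```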
How to customize this workflow
- Replace the LLM provider in the Basic LLM Chain node (currently supports OpenRouter)
- Change output fields to match your data structure (adjust the prompt and Google Sheet headers)
- Add trigger nodes (e.g., Dropbox Upload, Webhook) to automate input
About Typhoon OCR
Typhoon is a multilingual LLM and toolkit optimized for Thai NLP. It includes typhoon-ocr, a Python OCR library designed for Thai-centric documents. It is open-source, highly accurate, and works well in automation pipelines. Perfect for government paperwork, PDF reports, and multilingual documents in Southeast Asia.
n8n Workflow: Extract and Structure Thai Documents with Typhoon OCR and Llama 3.1 to Google Sheets
This n8n workflow automates the process of extracting text from a specified document, processing it with an OCR engine (presumably Typhoon OCR, based on the directory name), structuring the extracted data using a large language model (Llama 3.1), and finally writing the structured data to a Google Sheet.
What it does
This workflow performs the following steps:
- Manual Trigger: The workflow is initiated manually, allowing for on-demand processing.
- Read/Write Files from Disk: Reads a document file from the local disk. This node is configured to read binary data, implying it's designed to handle image or PDF files for OCR.
- Execute Command: Executes a shell command. Based on the context, this is likely where an external OCR tool (like Typhoon OCR) would be invoked to process the document and extract text.
- Code: Processes the output from the OCR step using custom JavaScript code. This step likely extracts the relevant text from the OCR output and prepares it for the LLM.
- Basic LLM Chain: Utilizes a LangChain "Basic LLM Chain" to process the extracted text. This chain is configured to use an "OpenRouter Chat Model".
- OpenRouter Chat Model: Interacts with a large language model via the OpenRouter service. This model (likely Llama 3.1, based on the directory name) is responsible for structuring the raw text into a desired format (e.g., JSON, key-value pairs) suitable for a spreadsheet.
- Google Sheets: Appends the structured data received from the LLM to a specified Google Sheet.
- Sticky Note: A sticky note is present in the workflow, likely for documentation or temporary notes.
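The structuring step depends on a prompt that tells the model which fields to return. The sketch below shows one way such a prompt could be assembled; the wording and the function name `build_prompt` are assumptions for illustration, not the prompt shipped with the template. The field names follow the Google Sheet headers listed earlier.

```python
FIELDS = ["book_id", "date", "subject", "detail",
          "signed_by", "signed_by2", "contact", "download_url"]


def build_prompt(ocr_markdown: str, fields=FIELDS) -> str:
    """Assemble an extraction prompt from OCR text and the wanted field names."""
    field_list = ", ".join(fields)
    return (
        "Extract the following fields from the Thai document below and "
        f"reply with a single JSON object using exactly these keys: {field_list}. "
        "Use an empty string for any field that is not present.\n\n"
        f"Document:\n{ocr_markdown.strip()}"
    )


print(build_prompt("# หนังสือราชการ\nเรื่อง ขออนุมัติงบประมาณ"))
```

Asking for a single JSON object with fixed keys keeps the downstream parsing and column mapping simple.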
Prerequisites/Requirements
To use this workflow, you will need:
- n8n Instance: A running n8n instance.
- Google Account: With access to Google Sheets.
- Google Sheets Credential: Configured in n8n to allow access to your Google Sheets.
- OpenRouter Account: An API key for the OpenRouter service.
- OpenRouter Chat Model Credential: Configured in n8n for the OpenRouter service.
- External OCR Tool (e.g., Typhoon OCR): This workflow assumes an external command-line OCR tool is installed and accessible on the server hosting n8n, as indicated by the "Execute Command" node. The specific command and its parameters will need to be configured.
- Document File: The input document file (e.g., image, PDF) that needs to be processed.
Setup/Usage
- Import the Workflow: Download the provided JSON and import it into your n8n instance.
- Configure Credentials:
  - Set up your Google Sheets credential in n8n.
  - Set up your OpenRouter Chat Model credential in n8n using your OpenRouter API key.
- Configure "Read/Write Files from Disk" Node:
  - Specify the path to the document file you want to process.
- Configure "Execute Command" Node:
  - Update the "Command" field with the actual command to run your OCR tool (e.g., typhoon-ocr --input {{ $node["Read/Write Files from Disk"].binary.data.fileName }} --output output.txt).
  - Ensure the OCR tool is installed and accessible from your n8n server.
- Configure "Code" Node:
  - Review and adjust the JavaScript code to correctly parse the output from your specific OCR tool and extract the relevant text for the LLM.
- Configure "Basic LLM Chain" and "OpenRouter Chat Model" Nodes:
  - Select your OpenRouter credential.
  - Adjust the model and prompt as needed to guide the LLM in structuring the extracted Thai document data into the desired format for your Google Sheet.
- Configure "Google Sheets" Node:
  - Select your Google Sheets credential.
  - Specify the "Spreadsheet ID" and "Sheet Name" where the data should be written.
  - Ensure the "Operation" is set to "Append Row" or a suitable alternative, and map the LLM's output fields to the correct columns in your sheet.
- Execute the Workflow: Click "Execute Workflow" on the "Manual Trigger" node to run the workflow.
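When configuring the Execute Command step, file names containing spaces or Thai characters can break an unquoted shell command. The sketch below shows shell-safe quoting with Python's standard `shlex.quote`. The `typhoon-ocr --input ... --output ...` form simply mirrors the placeholder command in the setup steps and is not a confirmed CLI; the function name `build_ocr_command` and the sample path are hypothetical.

```python
import shlex


def build_ocr_command(input_path: str, output_path: str = "output.txt") -> str:
    """Build the placeholder OCR command with shell-safe quoting."""
    # The command name and flags mirror the placeholder in the setup steps;
    # replace them with whatever your OCR installation really exposes.
    return (
        f"typhoon-ocr --input {shlex.quote(input_path)} "
        f"--output {shlex.quote(output_path)}"
    )


# Hypothetical file name with a space and Thai characters.
print(build_ocr_command("doc/หนังสือ ราชการ.pdf"))
```

Note that `shlex.quote` targets POSIX shells, so a Windows host would need a different quoting approach.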