How I Built a Free n8n RAG Chatbot That Answers From Google Drive (Powered by Gemini + Pinecone)

Here is how I turned a messy Google Drive into an instant-answer chatbot using n8n, the Gemini APIs (embeddings + chat), and Pinecone (vector DB). It watches a Drive folder, indexes the documents, and answers sales questions with exact field values and source citations. This guide breaks down every node, setting, and connection so you can copy it for your own sales agency and demo it for free.

Why this matters (and why it’s free-friendly)

Sales teams lose time hunting across spreadsheets and PDFs during calls. This RAG (Retrieval-Augmented Generation) chatbot answers from your own documents under strict guardrails: no made-up numbers, and a source citation on every answer. The index updates automatically when files change. You can run the whole demo on free tiers or trials and move to production later without re-architecting.

What you’ll build

A Drive watcher that reacts to new/updated files in a chosen folder.
An ingest pipeline: download → convert to text → split into chunks → embed with Gemini → upsert to Pinecone.
A hosted chat (n8n) with an AI Agent that must call a Vector Store Tool before answering, so replies are grounded in your documents.

Architecture at a glance

Orchestrator: n8n (cloud)
Knowledge source: Google Drive (PDF, Docs, Slides converted to PDF automatically)
Embeddings + Chat: Google Gemini (Embeddings node + Chat model)
Vector DB: Pinecone (index for your company docs) — index name used here: realstate-prod (rename to your brand).

Node-by-node: exact workflow I shipped

1) Keep the index fresh (Drive triggers → search → download)

Google Drive File Created (poll every minute; scoped to your folder) and Google Drive File Updated (same). Both feed the ingest path so any new/edited file is processed. The workflow points to a folder labeled “RealState” in Drive for the demo.
Search files and folders narrows the results to files inside that folder before download.
Download File From Google Drive with Google file conversion enabled (Docs/Slides/Drawings → PDF). Converting everything to PDF keeps text extraction consistent downstream.

Pro tip: Two Wait nodes are included but disabled—use them for throttling/backoff if you expect spikes.
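
To make the throttling idea concrete, here is a minimal sketch of what the Wait nodes do conceptually: process files in small batches and pause between batches. The function names, the batch size, and the pause length are all assumptions for illustration; they are not settings from the workflow.

```python
import time
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def ingest_throttled(
    file_ids: Iterable[str],
    size: int = 5,                      # assumed batch size
    pause_seconds: float = 2.0,         # assumed pause between batches
    process: Callable[[str], None] = print,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Process files batch by batch, pausing before every batch except the first."""
    processed = 0
    for index, batch in enumerate(batched(file_ids, size)):
        if index > 0:
            sleep(pause_seconds)
        for file_id in batch:
            process(file_id)
            processed += 1
    return processed
```

Passing `process` and `sleep` as parameters keeps the sketch easy to test without real delays.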

2) Extract, chunk, embed, and index

Default Data Loader set to binary turns the just-downloaded file into a document for processing.
Recursive Character Text Splitter with chunkOverlap = 100 preserves context across chunks for better retrieval quality.
Embeddings Google Gemini generates vectors for each chunk (works seamlessly with the n8n Gemini credentials).
Pinecone Vector Store in insert/upsert mode writes the vectors and metadata to your index (realstate-prod). Use stable IDs so re-ingesting updates instead of duplicating.
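
The chunking and stable-ID ideas above can be sketched in plain Python. This is a simplified fixed-size character splitter, not the Recursive Character Text Splitter itself (which also tries to break on separators such as paragraphs and sentences). The overlap of 100 comes from the workflow; the chunk size of 1000 and the ID scheme (hash of the Drive file ID plus the chunk index) are my assumptions.

```python
import hashlib
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> List[str]:
    """Split text into fixed-size chunks; each chunk repeats the last
    `chunk_overlap` characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reached the end of the text
        start += step
    return chunks


def stable_chunk_id(file_id: str, chunk_index: int) -> str:
    """Derive a deterministic ID so re-ingesting the same file overwrites
    the same vectors instead of creating new ones (hypothetical scheme)."""
    digest = hashlib.sha256(file_id.encode("utf-8")).hexdigest()[:16]
    return f"{digest}-{chunk_index}"
```

One caveat with index-based IDs: if an edited file produces fewer chunks than before, the old higher-numbered chunks stay in the index unless you delete them.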

3) Chat experience with guarded tool use

When chat message received (n8n hosted chat) triggers the inference chain.
AI Agent with a strict system message (sales assistant persona, tool-first answering, no reformatting numbers, fallback if not grounded).
Vector Store Tool named get_data_tool (description: retrieve from company documents). It connects the Agent to Pinecone Vector Store (Retrieval), Embeddings Google Gemini (retrieval), and Google Gemini Chat Model (retrieval), so the Agent can form a query, embed it, retrieve the top-K chunks, and compose an answer from the retrieved data only.
Google Gemini Chat Model is also attached to the Agent (model set to models/gemini-2.0-flash-exp in the demo; you can switch to a stable non-exp model for production).
Window Buffer Memory keeps short conversational context; good for quick follow-ups without losing guardrails.
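
If the memory node feels opaque, here is a rough sketch of the idea behind a window buffer: keep only the last few exchanges and drop older ones. The class below is a hypothetical illustration, not n8n's implementation, and the window size of 5 is an assumption.

```python
from collections import deque
from typing import Deque, List, Tuple


class WindowBufferMemory:
    """Keep only the most recent `window_size` (user, assistant) exchanges."""

    def __init__(self, window_size: int = 5) -> None:
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        # A deque with maxlen discards the oldest entry automatically.
        self._turns: Deque[Tuple[str, str]] = deque(maxlen=window_size)

    def add(self, user_message: str, assistant_reply: str) -> None:
        self._turns.append((user_message, assistant_reply))

    def context(self) -> List[Tuple[str, str]]:
        """Return the retained exchanges, oldest first."""
        return list(self._turns)
```

A small window is enough for quick follow-up questions, and it keeps old turns from crowding out the retrieved document text.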

Step-by-step build guide (you can follow this in n8n)

A) Credentials (one time)

Add Google Drive OAuth2 and pick your Drive account.
Add Google Gemini (PaLM) API credentials (API key from AI Studio).
Add Pinecone API credentials (API key + project).
The workflow’s “Sticky Note” includes a concise setup checklist you can reuse.

B) Create your Pinecone index

Name: e.g., company-files (the demo uses realstate-prod)
Metric: cosine (recommended)
Dimension: match your embedding model (e.g., 768 for Gemini text-embedding-004)
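
To show why the metric and dimension matter, here is a small stand-alone sketch of cosine similarity plus a dimension check. Pinecone computes the similarity for you; this code only illustrates what the cosine metric measures and why a vector of the wrong length has to be rejected. The function names are mine, and 768 is the dimension quoted above.

```python
import math
from typing import Sequence

EXPECTED_DIMENSION = 768  # dimension quoted above for text-embedding-004


def check_dimension(vector: Sequence[float], expected: int = EXPECTED_DIMENSION) -> None:
    """Raise if a vector does not match the index dimension."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} values, got {len(vector)}")


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1 means same direction,
    0 means orthogonal, -1 means opposite direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

Because cosine ignores vector length and compares only direction, it is a common default for text embeddings.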

C) Ingest workflow connections

Google Drive File Created and File Updated → (optionally Wait) → Search files and folders → Download File From Google Drive (with conversion) → Default Data Loader → Recursive Character Text Splitter → Embeddings Google Gemini → Pinecone Vector Store (insert/upsert).

D) Chat workflow connections

When chat message received → AI Agent
AI Agent (tools connector) → Vector Store Tool (get_data_tool)
Vector Store Tool → Pinecone Vector Store (Retrieval) + Embeddings Google Gemini (retrieval) + Google Gemini Chat Model (retrieval)
AI Agent (language model connector) → Google Gemini Chat Model
AI Agent (memory connector) → Window Buffer Memory
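
The wiring above can also be written down as data and checked, which helps when you rebuild the workflow by hand. This is a sketch only: the node names are transcribed from the list above, but the connector labels are descriptive placeholders I chose, not n8n's internal connection names, and this is not the workflow export format.

```python
from typing import Iterable, List, Set, Tuple

# (source node, connector label, target node), transcribed from the list above.
CONNECTIONS: List[Tuple[str, str, str]] = [
    ("When chat message received", "main", "AI Agent"),
    ("AI Agent", "tool", "Vector Store Tool (get_data_tool)"),
    ("Vector Store Tool (get_data_tool)", "vector_store", "Pinecone Vector Store (Retrieval)"),
    ("Vector Store Tool (get_data_tool)", "embeddings", "Embeddings Google Gemini (retrieval)"),
    ("Vector Store Tool (get_data_tool)", "language_model", "Google Gemini Chat Model (retrieval)"),
    ("AI Agent", "language_model", "Google Gemini Chat Model"),
    ("AI Agent", "memory", "Window Buffer Memory"),
]


def connectors_of(node: str) -> Set[str]:
    """Return the connector labels that have something attached for `node`."""
    return {label for source, label, _ in CONNECTIONS if source == node}


def missing_connectors(node: str, required: Iterable[str]) -> List[str]:
    """List required connectors that nothing is attached to, sorted by name."""
    return sorted(set(required) - connectors_of(node))
```

For example, `missing_connectors("AI Agent", ["tool", "language_model", "memory"])` returns an empty list when the Agent is fully wired.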

E) Prompt guardrails that keep answers trustworthy

The system message in the Agent enforces: tool-first retrieval, no answering from memory, no number reformatting, and a fallback:

“I can’t find this info in our knowledge base.”
It also asks for field-by-field outputs (Summary, Property Details, Source) so sales reps see exact values and where they came from.
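
If you want to verify replies outside the prompt, a small check like the one below can flag answers that skip the required structure. It is a sketch under assumptions: the fallback sentence and the three section names come from the text above, but the "Name:" label format and the function itself are hypothetical and not part of the workflow.

```python
FALLBACK = "I can’t find this info in our knowledge base."
REQUIRED_SECTIONS = ("Summary", "Property Details", "Source")


def is_well_formed(reply: str) -> bool:
    """Accept either the exact fallback sentence or a reply that contains
    every required section label, in order (assumed format: 'Name:')."""
    text = reply.strip()
    if text == FALLBACK:
        return True
    position = 0
    for section in REQUIRED_SECTIONS:
        label = section + ":"
        found = text.find(label, position)
        if found == -1:
            return False
        # Continue searching after this label so the order is enforced.
        position = found + len(label)
    return True
```

A check like this does not prove the numbers are right; it only confirms the reply has the shape the prompt asked for, including a Source line.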

Production tips & free-tier hygiene

Avoid experimental models in production (swap gemini-2.0-flash-exp for a stable model if you hit tight quotas).
Batch/throttle ingestion with the Wait nodes if you drop many files at once (they’re already in the template and can be enabled).
Store structured fields in Pinecone metadata (e.g., Project, Price, Bedrooms) to enable filters (city, bedroom count, price caps) right in the retrieval tool.
Confidence-gate weak retrievals: if the top score is low or no documents come back, auto-respond with the fallback to avoid shaky answers.
Use stable IDs when upserting so modified files update in place.
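
Two of the tips above, metadata filters and confidence gating, are easy to sketch. The `$eq`, `$gte`, and `$lte` operators follow Pinecone's metadata filter syntax; the field names (`city`, `bedrooms`, `price`), the 0.75 threshold, and the shape of the `matches` list are assumptions for illustration, so adjust them to your own metadata and score range.

```python
from typing import Any, Dict, List, Optional


def build_filter(
    city: Optional[str] = None,
    min_bedrooms: Optional[int] = None,
    max_price: Optional[float] = None,
) -> Dict[str, Any]:
    """Build a metadata filter from whichever constraints are provided.
    Field names are hypothetical; use the keys you actually store."""
    conditions: Dict[str, Any] = {}
    if city is not None:
        conditions["city"] = {"$eq": city}
    if min_bedrooms is not None:
        conditions["bedrooms"] = {"$gte": min_bedrooms}
    if max_price is not None:
        conditions["price"] = {"$lte": max_price}
    return conditions


def should_fallback(matches: List[Dict[str, Any]], min_score: float = 0.75) -> bool:
    """Return True when retrieval is too weak to answer from:
    no matches at all, or a best score below the assumed threshold."""
    if not matches:
        return True
    top_score = max(match["score"] for match in matches)
    return top_score < min_score
```

When `should_fallback` returns True, reply with the fallback sentence instead of letting the model compose an answer from weak context.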

Who this is for

Any sales agency or team that answers customer questions from internal files: real estate, SaaS plans, pricing sheets, insurance policies, support runbooks, onboarding docs—you name it. The patterns and nodes here are industry-agnostic.

About me (and how I can help)

I’m a Senior Full-Stack Software Engineer & AI Consultant. I build search-quality, sales-ready assistants using your existing tools (Google Drive, SharePoint, Sheets, CRMs) with strict guardrails and immediate ROI.

FAQs

Q: Can I keep my files in Google Drive as my source of truth?
Yes. The workflow watches a specific folder for adds/edits and re-indexes automatically.

Q: Do I need code?
No. This is a no-code/low-code n8n build; all steps are nodes and connections.

Q: What about numbers being misread?
Use Drive conversion to PDF for consistent text extraction, store exact numeric fields in metadata, and keep the “no reformatting” rule in the prompt.

Q: Can I adapt this beyond real estate?
Absolutely. It’s a general RAG pattern for any sales/support team answering from internal docs.

Call to action

If you want the JSON workflow and a quick-start checklist to copy this in under an hour, drop a comment or contact me. I’ll share the exact template I used—including the Agent prompt, tool wiring, and Pinecone settings—and help you adapt it to your business.

This post reflects the exact configuration shipped in my workflow export (node names, settings, and connections). Import, update credentials, point to your Drive folder, and you’re live.
