Building an AI Chatbot with RAG - Complete Deep Dive

Hey everyone! Welcome back to another tutorial. Today we're going to learn how to build an AI-powered chatbot that can answer questions about ANY business using its website content!

I am super excited about this topic because this is how modern AI assistants work! Companies like Intercom, Drift, and Zendesk use similar technology. Trust me, understanding RAG will make you stand out in interviews!

What we will cover:

  • What is RAG (Retrieval Augmented Generation)?
  • System Architecture Overview
  • Web Scraping with BeautifulSoup
  • Vector Databases and Embeddings
  • Building the RAG Pipeline
  • FastAPI Backend Server
  • Frontend Chat Interface
  • How Everything Works Together
  • Interview Questions

What is RAG (Retrieval Augmented Generation)?

Let's start with the most important question - What is RAG?

Here's the simple definition:

"RAG is a technique that combines information retrieval with AI text generation to provide accurate, context-aware responses."

Wait, what does that mean? Let me break it down for you!

Traditional ChatGPT:
====================
User: "What are your hotel's check-in hours?"
AI: "I don't have specific information about your hotel..."

RAG-powered Chatbot:
====================
User: "What are your hotel's check-in hours?"
AI: *searches hotel website data*
AI: "Our check-in time is 3:00 PM and check-out is 11:00 AM!"

See the difference? RAG gives the AI access to YOUR specific data!

  • Retrieval - Find relevant information from a database
  • Augmented - Add that information to the AI's context
  • Generation - Generate a response using both AI knowledge + your data
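
Here's the whole idea in a few lines of Python - a minimal sketch where rag_answer, build_prompt, vector_store, and llm are illustrative placeholders (we build the real versions later in this tutorial):

# Minimal RAG sketch - the three steps in order
# (vector_store, build_prompt, and llm are placeholders we build later)
def rag_answer(question: str) -> str:
    docs = vector_store.similarity_search(question, k=5)  # 1. Retrieval
    prompt = build_prompt(question, docs)                 # 2. Augmentation
    return llm.invoke(prompt).content                     # 3. Generation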

Why is RAG important?

  • AI can answer questions about YOUR specific business
  • No need to fine-tune expensive models
  • Data stays up-to-date (just re-scrape!)
  • Reduces AI hallucinations

System Architecture Overview

Before diving into code, let's understand the big picture!

AI Chatbot Architecture:
========================

                    ┌─────────────────┐
                    │   User Browser  │
                    │  (Chat Widget)  │
                    └────────┬────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │  FastAPI Server │
                    │   (server.py)   │
                    └────────┬────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │   Main Agent    │
                    │   (main.py)     │
                    └────────┬────────┘
                             │
              ┌──────────────┼──────────────┐
              ▼              ▼              ▼
     ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
     │    Qdrant    │ │   OpenAI     │ │   OpenAI     │
     │ Vector Store │ │  Embeddings  │ │   GPT-4o     │
     └──────────────┘ └──────────────┘ └──────────────┘

The system has 3 main phases:

  1. Indexing Phase - Scrape website → Create embeddings → Store in Qdrant
  2. Query Phase - User asks question → Convert to vector → Find relevant chunks in Qdrant
  3. Response Phase - AI generates the answer using the retrieved context

Phase 1: Web Scraping with BeautifulSoup

First, we need to scrape the business website to get all the content!

Let's look at the indexing.py file:

# indexing.py

from bs4 import BeautifulSoup
import requests
from urllib.parse import urljoin, urlparse

def scrape_website(base_url, max_depth=2):
    visited = set()
    documents = []
    headers = {"User-Agent": "aichat-scraper/1.0"}

    def scrape_page(url, depth):
        if depth > max_depth or url in visited:
            return
        visited.add(url)

        try:
            response = requests.get(url, headers=headers, timeout=10)
            response.raise_for_status()  # skip pages that return errors
            soup = BeautifulSoup(response.content, 'html.parser')

            # Remove script and style tags
            for script in soup(["script", "style"]):
                script.decompose()

            # Extract clean text
            text = soup.get_text(separator=' ', strip=True)

            if text:
                documents.append({
                    "page_content": text,
                    "metadata": {"source": url}
                })

            # Find and follow internal links
            if depth < max_depth:
                for link in soup.find_all('a', href=True):
                    href = link['href']
                    full_url = urljoin(url, href).split('#')[0]  # drop #fragments to avoid re-visits
                    # Only follow same-domain links
                    if urlparse(full_url).netloc == urlparse(base_url).netloc:
                        scrape_page(full_url, depth + 1)

        except Exception as e:
            print(f"Error scraping {url}: {e}")

    scrape_page(base_url, 0)
    return documents

How does web scraping work?

Web Scraping Flow:
==================

1. Start at base URL (e.g., https://myhotel.com/)
   │
2. Download HTML content using requests
   │
3. Parse HTML with BeautifulSoup
   │
4. Remove <script> and <style> tags (noise!)
   │
5. Extract clean text content
   │
6. Find all <a href="..."> links
   │
7. Follow internal links (same domain only)
   │
8. Repeat until max_depth reached!

Key Points:

  • max_depth=2 - How many levels deep to crawl (home → subpage → sub-subpage)
  • visited set - Prevents visiting same page twice
  • User-Agent header - Identifies our scraper to the website
  • Same domain check - Don't follow external links!
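
To connect the scraper to the indexing step that comes next, here's a small hedged sketch of a driver that saves the scraped pages to website_documents.json (the file name indexing.py expects; the URL is just our example hotel):

# Sketch: run the scraper and save its output for the indexing step
import json

docs = scrape_website("https://myhotel.com/", max_depth=2)
with open("website_documents.json", "w") as f:
    json.dump(docs, f, indent=2)
print(f"Scraped {len(docs)} pages")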

Phase 2: Vector Embeddings and Qdrant

Now comes the magic part - converting text into vectors!

What are Embeddings?

Embeddings convert text into numbers (vectors) that capture meaning!

Text to Embedding:
==================

"Our hotel has a swimming pool"
        │
        ▼ OpenAI Embedding Model
        │
[0.023, -0.156, 0.892, 0.445, ... 1536 numbers]

"We offer a pool for guests"
        │
        ▼
[0.021, -0.148, 0.887, 0.451, ... 1536 numbers]

These vectors are SIMILAR because meanings are similar!

Why vectors? Because we can do similarity search!

# Using OpenAI Embeddings
from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings(model="text-embedding-3-small")

# This model converts text → 1536-dimensional vector
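
You can check both the dimensionality and the similarity yourself. A quick sketch (assumes OPENAI_API_KEY is set in your environment; numpy is used only for the cosine math):

# Sketch: embed two similar sentences and measure cosine similarity
import numpy as np
from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings(model="text-embedding-3-small")
v1 = embedding.embed_query("Our hotel has a swimming pool")
v2 = embedding.embed_query("We offer a pool for guests")

cosine = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(len(v1))    # 1536 dimensions
print(cosine)     # close to 1.0 because the meanings are similar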

What is Qdrant?

Qdrant is a Vector Database - it stores and searches vectors efficiently!

Vector Database (Qdrant):
=========================

┌──────────────────────────────────────────────────┐
│  Collection: "website_collection"                │
├──────────────────────────────────────────────────┤
│  ID   │  Vector (1536 dims)  │  Metadata         │
├───────┼──────────────────────┼───────────────────┤
│  1    │  [0.02, -0.15, ...]  │  {url: "/about"}  │
│  2    │  [0.89, 0.23, ...]   │  {url: "/rooms"}  │
│  3    │  [-0.12, 0.67, ...]  │  {url: "/contact"}│
└──────────────────────────────────────────────────┘

When user asks: "Do you have rooms available?"
→ Convert question to vector
→ Find similar vectors in database
→ Return matching content!
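
We'll use LangChain for this search later, but here's roughly what it looks like against the raw client - a hedged sketch with the qdrant-client library (the payload layout is how langchain-qdrant stores chunks, which can vary by version; embedding is the OpenAIEmbeddings object from above):

# Sketch: similarity search with the raw Qdrant client
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
query_vector = embedding.embed_query("Do you have rooms available?")

hits = client.search(
    collection_name="website_collection",
    query_vector=query_vector,
    limit=3,
)
for hit in hits:
    print(hit.score, hit.payload)  # payload holds the chunk text + metadata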

Here's how we index the website content:

# indexing.py

import json

from langchain_core.documents import Document
from langchain_qdrant import QdrantVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

def index_website():
    # Load scraped documents
    with open("website_documents.json", "r") as f:
        documents_json = json.load(f)

    # Convert to LangChain Document objects
    documents = [
        Document(page_content=doc["page_content"], metadata=doc["metadata"])
        for doc in documents_json
    ]

    # Split into smaller chunks (important!)
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,      # Max 1000 characters per chunk
        chunk_overlap=200     # 200 char overlap between chunks
    )
    texts = splitter.split_documents(documents)

    # Store in Qdrant with embeddings
    vector_store = QdrantVectorStore.from_documents(
        texts,
        embedding=embedding,
        collection_name="website_collection",
        url="http://localhost:6333"
    )

Why do we split text into chunks?

Why Chunking is Important:
==========================

Original page: 5000 characters
"Welcome to our hotel... [lots of content] ...Contact us!"

Problem: Too big! Embedding loses detail.

Solution: Split into chunks!

Chunk 1: "Welcome to our hotel. We offer luxury rooms..."
Chunk 2: "Our amenities include swimming pool, gym..."
Chunk 3: "Room types: Standard, Deluxe, Suite..."
Chunk 4: "Contact us at email@hotel.com..."

Now each chunk has FOCUSED meaning!
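
You can watch the splitter do this with a synthetic page - a quick sketch using the same settings as indexing.py (the repeated sentence just stands in for real content):

# Sketch: chunking a long page with the same settings as indexing.py
from langchain_text_splitters import RecursiveCharacterTextSplitter

long_page = "Welcome to our hotel. We offer luxury rooms and more. " * 100  # ~5500 chars
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_text(long_page)

print(len(chunks))     # several focused chunks
print(len(chunks[0]))  # each chunk is at most 1000 characters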

Phase 3: The RAG Query Pipeline

Now let's see how the chatbot answers questions!

Look at the main.py file:

# main.py

def query_website(query: str) -> str:
    # Connect to existing Qdrant collection
    vector_store = QdrantVectorStore.from_existing_collection(
        embedding=embedding,
        collection_name="website_collection",
        url=QDRANT_URL
    )

    # Find top 10 most similar chunks!
    relevant_docs = vector_store.similarity_search(query, k=10)

    # Combine all relevant content
    combined_content = "\n".join([doc.page_content for doc in relevant_docs])

    return combined_content

How similarity search works:

User Query: "What rooms do you have?"
                │
                ▼
        Convert to vector
        [0.45, -0.23, 0.89, ...]
                │
                ▼
    Compare with ALL vectors in Qdrant
                │
                ▼
    Return top 10 most similar chunks:

    1. "Room types: Standard, Deluxe, Suite..." (0.92 similarity)
    2. "Our Deluxe rooms feature king beds..." (0.89 similarity)
    3. "Suite amenities include jacuzzi..." (0.85 similarity)
    ...
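
A note on those similarity numbers: similarity_search returns only the documents. If you also want the scores, LangChain has a scored variant - a small sketch reusing the vector_store from query_website:

# Sketch: retrieving chunks together with their similarity scores
results = vector_store.similarity_search_with_score("What rooms do you have?", k=10)
for doc, score in results:
    print(f"{score:.2f}  {doc.page_content[:50]}...")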

Now we pass this context to GPT-4o:

# main.py

def get_business_info(query: str) -> str:
    # Get relevant content from vector store
    content = query_website(query)

    llm = ChatOpenAI(model="gpt-4o", temperature=0.8)

    prompt = f"""
You are responding on behalf of {business_name} team.
Always answer as "we" or "our" (the staff), not as a third-party.
If information is not available, politely say so.

Website Content:
{content}

User Query:
{query}

Your answer (as the business team):
"""

    response = llm.invoke(prompt)
    return response.content
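
Once indexing has run, you can call this function directly (business_name is assumed to be configured elsewhere, e.g. via an environment variable):

# Example call, assuming the collection has already been indexed
print(get_business_info("Do you have a gym?"))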

This is the complete RAG flow:

Complete RAG Pipeline:
======================

User: "Do you have a gym?"
        │
        ▼
┌─────────────────────────────┐
│ 1. RETRIEVE                 │
│    - Convert query → vector │
│    - Search Qdrant          │
│    - Get relevant chunks    │
└─────────────────────────────┘
        │
        ▼
┌─────────────────────────────┐
│ 2. AUGMENT                  │
│    - Create prompt          │
│    - Add retrieved content  │
│    - Add instructions       │
└─────────────────────────────┘
        │
        ▼
┌─────────────────────────────┐
│ 3. GENERATE                 │
│    - Send to GPT-4o         │
│    - Get AI response        │
│    - Return to user         │
└─────────────────────────────┘
        │
        ▼
AI: "Yes! We have a fully equipped fitness center
     on the 3rd floor, open 24/7 for our guests!"

LangChain Tools and Agents

The code also uses LangChain Tools - let's understand them!

# main.py

from langchain.tools import tool

@tool
def business_info_tool(query: str) -> str:
    """
    Tool to get website related information
    """
    return get_business_info(query)

What is a Tool?

A Tool is a function that AI can call when it needs specific capabilities!

LangChain Agent with Tools:
===========================

┌──────────────────────────────────────────────────┐
│                     AI AGENT                     │
│                                                  │
│  "Hmm, user is asking about hotel services..."   │
│  "I should use the business_info_tool!"          │
│                                                  │
│  ┌────────────────────────────────────────────┐  │
│  │ Available Tools:                           │  │
│  │                                            │  │
│  │ 🔧 business_info_tool - Get website info    │  │
│  │ 🔧 book_service - Book a service            │  │
│  └────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────┘

The agent decides which tool to use based on the question!

# main.py

def mainAgent(query: str, messages: list) -> str:
    llm = ChatOpenAI(model="gpt-4o", temperature=0)

    # Bind tools to the LLM
    tool_calling_llm = llm.bind_tools([business_info_tool])

    # LLM decides if it needs to call a tool
    ai_message = tool_calling_llm.invoke(messages + [("user", query)])

    # If tool was called, execute it
    if ai_message.tool_calls:
        for tool_call in ai_message.tool_calls:
            if tool_call.get("name") == "business_info_tool":
                result = business_info_tool.invoke(tool_call.get("args"))
                return result

    return ai_message.content
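
One note: mainAgent returns the tool's raw output directly. In a fuller agent loop you would usually feed the tool result back to the LLM so it can phrase the final reply itself. A hedged sketch of that pattern (main_agent_v2 is a hypothetical variant, not part of the project code):

# Sketch: a fuller agent loop that lets the LLM phrase the final answer
from langchain_core.messages import HumanMessage, ToolMessage
from langchain_openai import ChatOpenAI

def main_agent_v2(query: str) -> str:
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    tool_calling_llm = llm.bind_tools([business_info_tool])

    messages = [HumanMessage(content=query)]
    ai_message = tool_calling_llm.invoke(messages)

    if not ai_message.tool_calls:
        return ai_message.content

    # Record the tool call and its result, then ask the LLM to answer
    messages.append(ai_message)
    for tool_call in ai_message.tool_calls:
        result = business_info_tool.invoke(tool_call["args"])
        messages.append(ToolMessage(content=result, tool_call_id=tool_call["id"]))

    return tool_calling_llm.invoke(messages).content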

FastAPI Backend Server

The backend uses FastAPI to handle HTTP requests!

# server.py

from fastapi import FastAPI, Request
from fastapi.responses import FileResponse
from main import mainAgent

app = FastAPI()

# Serve the chat interface
@app.get("/app")
async def get_app():
    return FileResponse("index.html")

# Chat endpoint - receives user messages
@app.post("/chat")
async def chat_endpoint(request: Request):
    data = await request.json()
    query = data.get("query", "")

    # Process with AI agent
    response = mainAgent(query, [])

    return {"response": response}

API Flow:

Frontend → Backend Flow:
========================

Browser                          Server
   │                               │
   │  POST /chat                   │
   │  {"query": "What rooms?"}     │
   │ ─────────────────────────────→│
   │                               │  → mainAgent()
   │                               │  → query_website()
   │                               │  → GPT-4 response
   │  {"response": "We offer..."}  │
   │ ←─────────────────────────────│
   │                               │
   Display response                │

Frontend Chat Interface

The frontend is a simple HTML + JavaScript chat widget!

// index.html - Key parts

// Send message to backend
form.addEventListener("submit", async (e) => {
    e.preventDefault();
    const userMsg = input.value.trim();

    // Show user message in chat
    addMessage("user", userMsg);

    // Call backend API
    const response = await fetch("http://localhost:8110/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ query: userMsg })
    });

    const result = await response.json();

    // Show AI response (with Markdown support!)
    addMessage("assistant", marked.parse(result.response));
});

The UI uses Tailwind CSS for styling and marked.js for Markdown rendering!

Docker Setup

The project uses Docker Compose to run everything together!

# docker-compose.yml

services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"   # REST API
    volumes:
      - qdrant_data:/qdrant/storage

  web:
    build: .
    ports:
      - "8110:8000"
    depends_on:
      - qdrant
    command: sh -c "python indexing.py && python server.py"

Docker Architecture:
====================

┌────────────────────────────────────────────┐
│               Docker Compose               │
│                                            │
│  ┌──────────────┐    ┌──────────────┐      │
│  │   qdrant     │    │     web      │      │
│  │  (Vector DB) │◄───│  (FastAPI)   │      │
│  │  Port: 6333  │    │  Port: 8110  │      │
│  └──────────────┘    └──────────────┘      │
│                                            │
└────────────────────────────────────────────┘
         │                    │
         ▼                    ▼
    localhost:6333      localhost:8110/app
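
Before indexing, it helps to confirm Qdrant is actually reachable. A tiny hedged check against its REST API (GET /collections lists the existing collections):

# Sketch: readiness check for Qdrant before running indexing.py
import requests

resp = requests.get("http://localhost:6333/collections", timeout=5)
resp.raise_for_status()
print("Qdrant is up:", resp.json())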

Complete Data Flow - Putting It All Together

Let's see the entire system working together!

Complete System Flow:
=====================

SETUP PHASE (One Time):
───────────────────────
1. Run: docker-compose up -d qdrant
   → Starts Qdrant vector database

2. Run: python indexing.py
   → Scrapes website (BeautifulSoup)
   → Splits text into chunks
   → Creates embeddings (OpenAI)
   → Stores in Qdrant

3. Run: uvicorn server:app --port 8110
   → Starts FastAPI server


RUNTIME PHASE (Every Query):
────────────────────────────
User types: "What amenities do you have?"
                    │
                    ▼
            ┌───────────────┐
            │   Frontend    │
            │  (index.html) │
            └───────┬───────┘
                    │ POST /chat
                    ▼
            ┌───────────────┐
            │   FastAPI     │
            │  (server.py)  │
            └───────┬───────┘
                    │ mainAgent()
                    ▼
            ┌───────────────┐
            │  Main Agent   │
            │  (main.py)    │
            └───────┬───────┘
                    │
        ┌───────────┴───────────┐
        ▼                       ▼
┌───────────────┐       ┌───────────────┐
│    Qdrant     │       │   OpenAI      │
│ Vector Search │       │    GPT-4      │
└───────┬───────┘       └───────┬───────┘
        │                       │
        │ Relevant chunks       │ AI Response
        └───────────┬───────────┘
                    │
                    ▼
            "We offer a swimming pool,
             fitness center, spa, and
             complimentary breakfast!"

Tech Stack Summary

Component        | Technology                                 | Purpose
-----------------|--------------------------------------------|----------------------------
Web Scraping     | BeautifulSoup + Requests                   | Extract website content
Text Splitting   | LangChain RecursiveCharacterTextSplitter   | Break content into chunks
Embeddings       | OpenAI text-embedding-3-small              | Convert text to vectors
Vector Database  | Qdrant                                     | Store and search vectors
LLM              | OpenAI GPT-4o                              | Generate responses
Framework        | LangChain                                  | Orchestrate the RAG pipeline
Backend          | FastAPI                                    | REST API server
Frontend         | HTML + Tailwind CSS + JavaScript           | Chat interface
Container        | Docker + Docker Compose                    | Deployment

Interview Questions - Quick Fire!

Q: What is RAG?

"RAG stands for Retrieval Augmented Generation. It's a technique that retrieves relevant information from a knowledge base and uses it to augment the AI's context before generating a response. This allows AI to answer questions about specific data without fine-tuning."

Q: Why use vector embeddings?

"Vector embeddings convert text into numerical representations that capture semantic meaning. This allows us to perform similarity searches - finding text that is semantically similar even if the exact words differ. For example, 'room' and 'accommodation' would have similar vectors."

Q: What is Qdrant?

"Qdrant is an open-source vector database optimized for storing and searching high-dimensional vectors. It's designed for AI/ML applications and provides fast similarity search using algorithms like HNSW."

Q: Why split documents into chunks?

"Splitting into chunks improves retrieval accuracy. Large documents have mixed topics, making embeddings less focused. Smaller chunks have specific meanings, so when we search, we get more relevant results. Overlap ensures context isn't lost at boundaries."

Q: What is LangChain?

"LangChain is a framework for building applications with LLMs. It provides tools for connecting LLMs to external data sources, creating agents that can use tools, and building RAG pipelines. It abstracts away complexity of working with embeddings and vector stores."

Q: How does the agent decide which tool to use?

"The LLM analyzes the user query and the tool descriptions. Based on the query intent and the tool's docstring, it decides if calling a tool would help answer the question. This is called function calling or tool calling in modern LLMs."

Key Points to Remember

  • RAG = Retrieval + Augmentation + Generation
  • Embeddings convert text to vectors that capture meaning
  • Vector databases (Qdrant) enable fast similarity search
  • Chunking improves retrieval accuracy (chunk_size=1000, overlap=200)
  • LangChain orchestrates the entire RAG pipeline
  • Tools give AI agents specific capabilities
  • FastAPI handles HTTP requests from frontend
  • Docker Compose runs all services together
  • AI responds as "we" to represent the business
  • This architecture is used by modern AI assistants!

Running the Project

Here's how to run this project yourself:

# Step 1: Clone and setup
cd aichat

# Step 2: Configure environment
cp .env.example .env
# Add your OPENAI_API_KEY
# Set BASE_URL to the website you want to index

# Step 3: Start Qdrant
docker-compose up -d qdrant

# Step 4: Index the website
python indexing.py

# Step 5: Start the server
uvicorn server:app --reload --host 0.0.0.0 --port 8110

# Step 6: Open browser
# Go to http://localhost:8110/app

What's Next?

Now that you understand how RAG-powered chatbots work, you can:

  • Build chatbots for any business website
  • Add more tools (booking, scheduling, etc.)
  • Implement streaming responses
  • Add conversation memory
  • Deploy to production with Docker

Keep coding, keep learning! See you in the next one!