Sarvam.AI RAG Assignment

Building an Interactive Learning Tool with RAG and AI Agents

Introduction

Hello! I'm Shivansh Fulper, a passionate ML and data science enthusiast, currently in my 3rd year at IIIT Jabalpur, with a keen interest in leveraging AI to build stuff. Recently, I found an opportunity to intern at Sarvam.ai. It involved an assignment to build a RAG system with some agents/tools, with a bonus for integrating Sarvam TTS.

Okay, so, picture this. I get this assignment from Sarvam.ai. It's all about building a cool interactive tool to help students learn about sound from the NCERT textbook. Now, I'm not gonna lie, I was a bit intimidated at first. AI, LLMs, vector databases... it all sounded super fancy and complex. Also, this was my first time working on a RAG system, but hey, challenge accepted!

The problem I set out to solve had two parts:

  1. Implement a RAG system and develop tools for it.
  2. Add text-to-speech functionality.

Now the question was which tools to develop, so I started by pinning down a few basic questions:

  1. How can we make studying more interactive and engaging?
  2. Can we provide instant, accurate answers to students' questions?
  3. Is it possible to generate custom study materials tailored to individual needs?

With these questions in mind, I dove into the world of RAG and AI agents. Let me take you through my development journey.

Developing the RAG System

The core of my project is the RAG (Retrieval-Augmented Generation) system. Here's how I built it:

  1. Document Ingestion: I started by creating an ingest.py script to load and process the NCERT Sound chapter PDF. I used both PyPDFLoader and PDFPlumberLoader to ensure robust PDF parsing. The text was then split into manageable chunks using RecursiveCharacterTextSplitter.
  2. Vector Database: Next, I implemented vector_db.py to create a Chroma vector store. This allows for efficient similarity searches based on text embeddings. I chose the "sentence-transformers/all-MiniLM-L6-v2" model for generating embeddings, since it is small and fast while still giving good retrieval quality.
  3. RAG System Implementation: The heart of the system is in rag_system.py. Here, I integrated Google's Gemini 1.5 Flash model for generating responses. The RAG system retrieves relevant context from the vector store and uses it to inform the AI's responses to user queries.
  4. API Development: To make the RAG system accessible, I created a FastAPI backend (app.py). This exposes endpoints for various functions like generating responses, creating quizzes, and more.
  5. Frontend Design: For user interaction, I developed a Streamlit frontend (frontend.py). This provides an intuitive interface for students to ask questions, take quizzes, and access other learning tools.
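
To make the chunk, retrieve, and generate flow from steps 1 to 3 concrete, here is a minimal sketch in plain Python. It is not the code from ingest.py, vector_db.py, or rag_system.py: it swaps the real pieces for toy stand-ins so it runs with no dependencies. A fixed-size character window stands in for RecursiveCharacterTextSplitter, word-count vectors stand in for the all-MiniLM-L6-v2 embeddings, an in-memory list stands in for Chroma, and the final prompt is only built, not sent to Gemini. All function names (split_into_chunks, embed, cosine, retrieve, build_prompt) are hypothetical and chosen just for this illustration.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping character windows (toy stand-in for a text splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (assumption: NOT a real sentence embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (stand-in for a vector store search)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the text that would be sent to the LLM: retrieved context plus the question."""
    joined = "\n---\n".join(context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {query}"
    )
```

In use, the chapter text would go through split_into_chunks once at ingestion time, and each student question would go through retrieve and then build_prompt before being handed to the model. The real system does the same thing, just with proper embeddings and a persistent vector store doing the ranking.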

Creating Specialized Agents and Tools

With the RAG system in place, I focused on developing specialized agents and tools to enhance the learning experience: