A Complete Guide to Production-Ready Retrieval Augmented Generation with Modern Technologies
In the rapidly evolving landscape of AI applications, Retrieval Augmented Generation (RAG) has emerged as a game-changing technique that combines the power of large language models with real-time information retrieval. This comprehensive guide will walk you through building a production-ready RAG system that integrates cutting-edge technologies including Streamlit, LangChain, and the innovative Model Context Protocol (MCP).
Our system, dubbed "IntelliSearch RAG", represents a significant advancement over traditional search applications by providing intelligent, context-aware responses powered by real-time web data retrieval and sophisticated AI analysis.
Frontend, Application Logic, and Data Processing Flow
The architecture is organized into a user interface layer, processing layers, and integrated backend services, built on the following stack:
- Frontend: Streamlit + custom CSS for a modern UI with tabs, search history, and metrics
- Orchestration: LangChain for agent management and tool orchestration
- Web search: Exa API for high-quality, relevant search results
- Content extraction: BeautifulSoup + Requests for web page scraping
- Embeddings: Ollama (mxbai-embed-large) for text vectorization
- Vector store: FAISS for fast similarity search over embeddings
```text
# requirements.txt
streamlit
langchain
langchain-community
langchain-core
langchain-groq
langchain-mcp-adapters
python-dotenv
requests
beautifulsoup4
exa-py
firecrawl
faiss-cpu
langchain-ollama
```
The frontend leverages Streamlit with extensive custom CSS for a modern, responsive interface:
```python
# Streamlit page configuration and hero section (the gradient styling comes
# from custom CSS injected separately with st.markdown)
import streamlit as st

st.set_page_config(
    page_title="IntelliSearch RAG",
    page_icon="🚀",
    layout="wide",
    initial_sidebar_state="expanded"
)

# Hero section with gradient background
st.markdown("""
<div class="hero-header">
    <h1 class="hero-title">🚀 IntelliSearch RAG</h1>
    <p class="hero-subtitle">Advanced AI-Powered Search with RAG</p>
</div>
""", unsafe_allow_html=True)

# Enhanced tabs for results display
tab1, tab2, tab3, tab4 = st.tabs([
    "🌐 Web Results",
    "🧠 AI Analysis",
    "📑 Source Documents",
    "📈 Insights"
])
```
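The `hero-header`, `hero-title`, and `hero-subtitle` classes referenced above come from the app's custom stylesheet, which is injected with another `st.markdown` call. The project's exact CSS isn't reproduced here; the sketch below shows one plausible version of the gradient styling, purely for illustration.

```python
# Hypothetical custom CSS for the hero section; the project's real stylesheet
# may differ. This only illustrates how Streamlit apps inject CSS.
import streamlit as st

CUSTOM_CSS = """
<style>
.hero-header {
    background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
    padding: 2rem;
    border-radius: 12px;
    text-align: center;
}
.hero-title { color: white; font-size: 2.5rem; margin-bottom: 0.5rem; }
.hero-subtitle { color: rgba(255, 255, 255, 0.85); font-size: 1.1rem; }
</style>
"""

# Inject the stylesheet once, near the top of the app
st.markdown(CUSTOM_CSS, unsafe_allow_html=True)
```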
The MCP client handles communication between the frontend and backend services:
```python
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_ollama import ChatOllama


class LangchainMCPClient:
    def __init__(self, mcp_server_url="http://localhost:8000"):
        self.llm = ChatOllama(
            model="llama2-70b",
            temperature=0.7
        )
        # Server configuration with timeout and retry settings
        server_config = {
            "default": {
                "url": f"{mcp_server_url}/sse",
                "transport": "sse",
                "options": {
                    "timeout": 30.0,
                    "retry_connect": True,
                    "max_retries": 3
                }
            }
        }
        self.mcp_client = MultiServerMCPClient(server_config)
        self.tools = []

    async def initialize(self):
        # Load the tools exposed by the MCP server (e.g. search_and_analyze)
        self.tools = await self.mcp_client.get_tools()

    async def process_message(self, user_input: str):
        # Process the query through the MCP server's first tool
        if not self.tools:
            await self.initialize()
        tool = self.tools[0]
        result = await tool.ainvoke({"query": user_input})
        return result
```
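On the Streamlit side, the app only needs to run the async `process_message` call for whatever the user typed. A minimal usage sketch follows; the module and variable names here are illustrative, not necessarily those used in the project's actual app.py.

```python
# Minimal wiring of the MCP client into a Streamlit page (sketch).
import asyncio

import streamlit as st

from mcp_client import LangchainMCPClient  # assumed module name

client = LangchainMCPClient(mcp_server_url="http://localhost:8000")

query = st.text_input("What would you like to search for?")
if st.button("Search") and query:
    with st.spinner("Searching and analyzing..."):
        result = asyncio.run(client.process_message(query))
    st.write(result)
```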
High-quality web search using the Exa API for relevant, up-to-date information:
```python
import os

import requests
from bs4 import BeautifulSoup
from exa_py import Exa
from langchain_core.documents import Document

# Browser-style User-Agent string (value here is illustrative)
USER_AGENT = "Mozilla/5.0 (compatible; IntelliSearchRAG/1.0)"


async def search_web(query: str, num_results: int = 5):
    # Initialize Exa client
    exa = Exa(api_key=os.getenv("EXA_API_KEY"))

    # Perform search with page contents and AI-generated summaries
    search_results = exa.search_and_contents(
        query,
        num_results=num_results,
        summary={"query": "Main points and key takeaways"}
    )

    # Format results for display (helper defined elsewhere in the module)
    formatted_results = format_search_results(search_results)
    return formatted_results, search_results.results


async def get_web_content(url: str):
    # Fetch the page and extract readable text with BeautifulSoup
    headers = {"User-Agent": USER_AGENT}
    response = requests.get(url, headers=headers, timeout=30)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Remove script and style elements
    for script in soup(["script", "style"]):
        script.decompose()

    text = soup.get_text(separator='\n', strip=True)
    return [Document(page_content=text, metadata={"source": url})]
```
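`format_search_results` is referenced above but not shown. A minimal sketch of what such a helper might look like, assuming each Exa result exposes `title`, `url`, and `summary` attributes (which `search_and_contents` returns when summaries are requested):

```python
# Hypothetical sketch of the formatting helper referenced in search_web.
# The real helper in the project may structure its output differently.
def format_search_results(search_results) -> str:
    lines = []
    for i, result in enumerate(search_results.results, start=1):
        title = result.title or "Untitled"
        summary = getattr(result, "summary", "") or ""
        lines.append(f"{i}. {title}\n   {result.url}\n   {summary}")
    return "\n\n".join(lines)
```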
The RAG system creates embeddings and performs similarity search for relevant context:
```python
from typing import List

from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_ollama import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter


async def create_rag_from_documents(documents: List[Document]):
    # Initialize Ollama embeddings
    embeddings = OllamaEmbeddings(
        model="mxbai-embed-large:latest",
        base_url="http://localhost:11434"
    )

    # Split documents into overlapping chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=2000,
        chunk_overlap=200,
        length_function=len,
    )
    split_documents = text_splitter.split_documents(documents)

    # Create FAISS vector store from the chunks
    vectorstore = FAISS.from_documents(
        documents=split_documents,
        embedding=embeddings
    )
    return vectorstore


async def search_rag(query: str, vectorstore: FAISS, k: int = 5):
    # Return the k chunks most similar to the query
    results = vectorstore.similarity_search(query, k=k)
    return results
```
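Putting the scraping and retrieval pieces together, the RAG side can be exercised on its own before the MCP server is involved. A quick sketch, assuming `get_web_content`, `create_rag_from_documents`, and `search_rag` from the snippets above (the URL is just an example):

```python
# Quick end-to-end retrieval check (sketch).
import asyncio


async def demo():
    docs = await get_web_content("https://en.wikipedia.org/wiki/Retrieval-augmented_generation")
    vectorstore = await create_rag_from_documents(docs)
    hits = await search_rag("How does retrieval augmented generation work?", vectorstore, k=3)
    for doc in hits:
        print(doc.metadata["source"], doc.page_content[:200])


asyncio.run(demo())
```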
The MCP server orchestrates the entire search and analysis pipeline:
```python
from mcp.server.fastmcp import FastMCP

import rag     # module with the RAG helpers shown above
import search  # module with the Exa search and scraping helpers shown above

# Initialize FastMCP server
mcp = FastMCP(
    name="web_search_rag",
    version="1.0.0",
    description="Advanced web search with RAG integration",
    host="localhost",
    type="sse",
    port=8000,
    timeout=30,
    keep_alive=True
)


@mcp.tool()
async def search_and_analyze(
    query: str,
    num_results: int = 5,
    rag_results: int = 3
):
    # Perform web search
    formatted_results, raw_results = await search.search_web(query, num_results)

    # Extract the result URLs and build a vector store from their contents
    urls = [result.url for result in raw_results]
    vectorstore = await rag.create_rag(urls)
    rag_docs = await rag.search_rag(query, vectorstore, k=rag_results)

    # Format and return the combined response
    return {
        "search_results": formatted_results,
        "rag_analysis": [
            {
                "content": doc.page_content,
                "metadata": {"source": doc.metadata.get("source")}
            } for doc in rag_docs
        ]
    }


if __name__ == "__main__":
    # Expose the tool over SSE so the Streamlit client can connect
    mcp.run(transport="sse")
```
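The server calls `rag.create_rag(urls)`, a URL-level wrapper that isn't shown above; presumably it downloads each page and feeds the resulting documents into `create_rag_from_documents`. A hedged sketch of what such a wrapper could look like, reusing `get_web_content` from the scraping snippet:

```python
# Hypothetical sketch of rag.create_rag; the project's actual wrapper may
# handle errors, caching, or Firecrawl-based extraction differently.
from typing import List

from langchain_core.documents import Document


async def create_rag(urls: List[str]):
    documents: List[Document] = []
    for url in urls:
        try:
            # get_web_content returns a one-element list of Documents
            documents.extend(await get_web_content(url))
        except Exception as exc:
            # Skip pages that fail to download or parse
            print(f"Skipping {url}: {exc}")
    return await create_rag_from_documents(documents)
```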
Taken together, the finished system offers:
- Real-time web search with AI-powered result analysis and contextual understanding.
- Sophisticated retrieval and generation using vector embeddings and similarity search.
- A beautiful, responsive interface with real-time updates and progress tracking.
- A modular design built on the MCP protocol for easy scaling and integration.
```bash
# Clone the repository
git clone https://github.com/akshayram1/Mcp_Rag_websearch_tool
cd Mcp_Rag_websearch_tool

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```
```bash
# Create a .env file in the project root with your API keys
EXA_API_KEY=your_exa_api_key_here
FIRECRAWL_API_KEY=your_firecrawl_api_key_here
GROQ_API_KEY=your_groq_api_key_here
```
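These keys are loaded into the environment at startup via python-dotenv (listed in requirements.txt), so calls like `os.getenv("EXA_API_KEY")` in the search module resolve correctly. A minimal sketch of the loading step, typically placed near the top of the server or app module:

```python
# Load API keys from .env into the process environment (sketch; placement
# in the actual project files may differ).
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

assert os.getenv("EXA_API_KEY"), "EXA_API_KEY is missing from .env"
```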
```bash
# Install Ollama (if not already installed)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the embedding model
ollama pull mxbai-embed-large

# Start the Ollama server
ollama serve
```
```bash
# Start the MCP server (Terminal 1)
python mcp_server.py

# Start the Streamlit app (Terminal 2)
streamlit run app.py

# Access the application: open a browser to http://localhost:8501
```
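Before opening the UI, it can help to confirm that the embedding model is actually reachable. A quick sanity-check sketch, assuming Ollama is serving on its default port:

```python
# Verify that Ollama is serving the embedding model (sketch).
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(
    model="mxbai-embed-large:latest",
    base_url="http://localhost:11434",
)
vector = embeddings.embed_query("hello world")
print(f"Embedding dimension: {len(vector)}")  # mxbai-embed-large returns 1024-dimensional vectors
```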
This comprehensive guide has walked you through building a sophisticated RAG system that combines the latest advances in AI, web search, and user interface design. The IntelliSearch RAG system demonstrates how modern technologies can be integrated to create powerful, user-friendly applications that provide intelligent, context-aware responses.
Key achievements of this implementation include:
- A modern architecture with async processing, MCP protocol integration, and efficient vector search capabilities.
- A beautiful, responsive interface with real-time feedback and comprehensive result visualization.
As the field of AI continues to evolve rapidly, this foundation provides a solid base for future enhancements and adaptations. The modular architecture ensures that new features and improvements can be easily integrated, keeping the system at the forefront of AI-powered search technology.
Start experimenting with the code, customize it for your needs, and contribute to the future of intelligent search!
Finally, because the two core technologies are easy to conflate, the table below summarizes how MCP and RAG differ:

| Feature | MCP Protocol | RAG (Retrieval Augmented Generation) |
|---|---|---|
| Purpose | Standardizes communication between AI agents and tools | Combines retrieval of external knowledge with generative models |
| Core Functionality | Defines message formats, tool APIs, and agent orchestration | Retrieves relevant documents and augments LLM responses |
| Architecture | Protocol layer for interoperability and modularity | System design pattern for knowledge-intensive tasks |
| Scalability | Highly scalable, supports distributed agent networks | Scalable with vector databases and retrieval systems |
| Integration | Integrates with any tool or agent supporting MCP | Integrates with search APIs, vector DBs, and LLMs |
| Use Case | Multi-agent orchestration, tool interoperability | Knowledge retrieval, context-aware generation |
| Example | LangChain MCP adapters, FastMCP server | RAG with LangChain, FAISS, and Ollama |