With AI adoption rapidly increasing, businesses and individuals are looking to build smarter search experiences—tools that go beyond simple keyword matching and understand the semantic meaning behind queries. One of the most powerful ways to do this is through Retrieval-Augmented Generation (RAG) using LangChain, a framework that connects LLMs like GPT-4 to external data sources.
In this guide, we’ll walk through how to build a semantic search app from scratch using LangChain, OpenAI, and Chroma (a vector database). This app will allow users to search through a knowledge base of documents and receive contextually relevant, AI-generated answers.
Before diving in, make sure you have the following setup: some .txt or .md files to serve as your knowledge base, and the required packages installed:
pip install langchain openai chromadb tiktoken unstructured pdfplumber pdfminer.six pytesseract beautifulsoup4
LangChain handles the framework, OpenAI provides the LLM, Chroma acts as your vector database, and Tiktoken is used for efficient text tokenization.
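A quick aside on tiktoken: it's handy for checking how many tokens a chunk of text will consume before you send it to the model. Here's a minimal sketch (the sample string is just illustrative; cl100k_base is the encoding used by OpenAI's recent chat and embedding models). Also note that the OpenAI classes used below read your API key from the OPENAI_API_KEY environment variable, so make sure it's set.
import tiktoken
# Count tokens the way OpenAI's models will see them
encoding = tiktoken.get_encoding("cl100k_base")
sample = "LangChain connects LLMs to external data sources."
print(len(encoding.encode(sample)))  # number of tokens in the sample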
The first step is to load documents into LangChain and split them into manageable chunks. This allows the LLM to handle context efficiently.
LangChain provides loaders for a variety of formats:
from langchain.document_loaders import TextLoader, PDFPlumberLoader, UnstructuredHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Load different types of documents
txt_loader = TextLoader("./data/note.txt")
pdf_loader = PDFPlumberLoader("./data/report.pdf")
html_loader = UnstructuredHTMLLoader("./data/blog.html")
# Combine documents
documents = txt_loader.load() + pdf_loader.load() + html_loader.load()
# Split into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
split_docs = splitter.split_documents(documents)
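Before moving on, it's worth a quick sanity check that loading and splitting produced what you expect (a hypothetical print, assuming the variables above):
# Quick sanity check on the chunking step
print(f"Loaded {len(documents)} documents")
print(f"Split into {len(split_docs)} chunks")
print(split_docs[0].page_content[:200])  # preview the first chunk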
You can also use UnstructuredImageLoader with OCR for images, or NotionDBLoader for Notion databases.
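For instance, a sketch along these lines should work if you have images or a Notion database to pull in (the file path, integration token, and database ID are placeholders):
from langchain.document_loaders import UnstructuredImageLoader, NotionDBLoader
# OCR an image (requires pytesseract and the tesseract binary)
image_docs = UnstructuredImageLoader("./data/diagram.png").load()
# Pull pages from a Notion database (placeholder credentials)
notion_docs = NotionDBLoader(
    integration_token="your-notion-integration-token",
    database_id="your-database-id"
).load()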
Now that the data is chunked, we’ll convert each piece into a vector embedding using OpenAI’s embedding model.
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
embedding_model = OpenAIEmbeddings()
# Create a persistent Chroma DB
db = Chroma.from_documents(split_docs, embedding_model, persist_directory="db")
db.persist()
This stores your documents as vector embeddings so they can be queried semantically later.
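You can already run a raw semantic query against the store at this point; a quick check like the following (the question is just an example) returns the chunks closest in meaning to the query:
# Retrieve the 3 chunks most semantically similar to the query
hits = db.similarity_search("How do I split documents into chunks?", k=3)
for hit in hits:
    print(hit.page_content[:100])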
Now let’s create a retrieval-based question answering chain using the embedded data and OpenAI’s language model.
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
retriever = db.as_retriever()
llm = OpenAI(temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)
query = "What does LangChain do?"
result = qa_chain(query)
print("Answer:", result['result'])
LangChain handles passing the query to the retriever, pulling the most relevant context, and combining it with the LLM for the final answer.
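Because we set return_source_documents=True, the result also carries the chunks the answer was grounded in, which is useful for showing citations:
# Inspect which chunks the answer was based on
for doc in result["source_documents"]:
    print(doc.metadata.get("source", "unknown"), "->", doc.page_content[:80])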
Want to filter by topic, date, or document source? You can add metadata to each document:
from langchain.schema import Document
docs = [
    Document(
        page_content="LangChain connects LLMs to external tools.",
        metadata={"topic": "LangChain", "type": "guide"}
    )
]
This allows for filtered retrieval (e.g., only from "guides").
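With Chroma, one way to apply such a filter is through the retriever's search_kwargs (a sketch; the filter keys match the metadata defined above):
# Only retrieve chunks whose metadata marks them as guides
guide_retriever = db.as_retriever(search_kwargs={"filter": {"type": "guide"}})
guide_chain = RetrievalQA.from_chain_type(llm=llm, retriever=guide_retriever)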
You can integrate your backend with a simple web UI. With Streamlit, for example:
import streamlit as st
st.title("Ask your documents anything")
query = st.text_input("Your question")
if query:
    result = qa_chain(query)  # call the chain directly; .run() requires a single output key
    st.write(result["result"])
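Save this together with the earlier setup code in a single file (say, app.py) and launch it with streamlit run app.py; the app opens in your browser.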
If you're looking to track and debug your LangChain pipelines more effectively, LangSmith is your go-to developer tool. It lets you visualize every step of your chains, monitor performance, and identify edge cases with ease.
Here’s how to quickly integrate LangSmith logging into your existing project:
pip install langsmith
Add the following to your terminal or .env file:
export LANGCHAIN_API_KEY="your-langsmith-api-key"
export LANGCHAIN_PROJECT="your-project-name"
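If you want every run traced without wrapping code in a context manager, you can also flip the global tracing switch (to the best of my knowledge this is the standard LangSmith toggle):
export LANGCHAIN_TRACING_V2="true"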
Option A: One-line global tracing
from langchain.callbacks import tracing_v2_enabled
with tracing_v2_enabled():
    result = qa_chain("What is LangChain?")
    print(result["result"])
Option B: Manual tracer setup
from langchain.callbacks.tracers.langchain import LangChainTracer
tracer = LangChainTracer()
result = qa_chain("What is LangChain?", callbacks=[tracer])
That’s it! Your LangChain application is now wired to log runs to LangSmith, which makes debugging, optimizing, and showcasing your chains significantly easier. The complete code for the semantic search app now looks like this:
from langchain.document_loaders import TextLoader, PDFPlumberLoader, UnstructuredHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.schema import Document
from langchain.callbacks import tracing_v2_enabled
import streamlit as st
# Load different types of documents
txt_loader = TextLoader("./data/note.txt")
pdf_loader = PDFPlumberLoader("./data/report.pdf")
html_loader = UnstructuredHTMLLoader("./data/blog.html")
# Combine documents
documents = txt_loader.load() + pdf_loader.load() + html_loader.load()
# Split into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
split_docs = splitter.split_documents(documents)
# Embed documents
embedding_model = OpenAIEmbeddings()
db = Chroma.from_documents(split_docs, embedding_model, persist_directory="db")
db.persist()
# Set up retrieval chain
retriever = db.as_retriever()
llm = OpenAI(temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)
# Optional metadata usage
docs = [
    Document(
        page_content="LangChain connects LLMs to external tools.",
        metadata={"topic": "LangChain", "type": "guide"}
    )
]
# Streamlit frontend
st.title("Ask your documents anything")
query = st.text_input("Your question")
if query:
    with tracing_v2_enabled():  # LangSmith logging
        result = qa_chain(query)
    st.write(result["result"])
With just a few lines of code, you've created an AI-powered semantic search system that loads documents in multiple formats, chunks and embeds them into a vector store, retrieves the most relevant context for each question, and generates grounded answers with an LLM.
This RAG architecture is the backbone of tools like ChatPDF, AskYourPDF, and enterprise chatbots. Whether you're building an internal tool, customer support bot, or personal assistant, LangChain provides everything you need.
Share what you're building or ask questions in the comments. If you found this helpful, check out our full LangChain series and follow along for more advanced tutorials!