Chat with your data
Provide users with a convenient, interactive way to query the information in their PDF documents.
Document preprocessing: When a user uploads a PDF document, the app preprocesses it to extract and structure the text content.
Chunking: The text content from the document is divided into smaller, manageable chunks. The chunk size multiplied by the number of relevant chunks selected must not exceed the maximum context window supported by the underlying language model (GPT-3.5). In practice the budget must also leave room for the prompt instructions, the user's question, and the generated response.
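The chunking constraint can be sketched as follows. This is a minimal illustration, not the app's actual code: the function names `chunk_words` and `max_chunks_in_budget`, the overlap between chunks, and all numeric values are assumptions, and whitespace-separated words are used only as a rough stand-in for model tokens.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in one of them (overlap is an assumption,
    not something the app is known to do).
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def max_chunks_in_budget(context_window: int, chunk_tokens: int, reserved_tokens: int) -> int:
    """Number of retrieved chunks that fit in the context window.

    reserved_tokens covers everything that is not retrieved context:
    prompt instructions, the question, and the generated response.
    """
    if chunk_tokens <= 0:
        raise ValueError("chunk_tokens must be positive")
    return max(0, (context_window - reserved_tokens) // chunk_tokens)


if __name__ == "__main__":
    print(chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1))
    # Illustrative numbers only; check the real limit of the model you use.
    print(max_chunks_in_budget(context_window=4096, chunk_tokens=500, reserved_tokens=1000))
```

With the illustrative numbers above, 4096 - 1000 = 3096 tokens remain for context, so at most 6 chunks of 500 tokens fit.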
Vectorization: Each chunk of text is converted into a numerical vector representation using text embedding techniques. These vectors capture the semantic meaning of the text and allow for efficient searching and retrieval.
Storage: The vector representations of the text chunks are stored in a vector database. This database serves as a repository of contextual information from the documents.
Context retrieval: The app performs a similarity search within the database of vectorized documents to find the most relevant chunks based on the vector representation of the user’s question. These retrieved chunks serve as context for generating a relevant response.
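The vectorization, storage, and retrieval steps can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: `embed` is a toy word-count vectorizer standing in for a real embedding model (it matches shared words, not meaning), and `InMemoryVectorStore` is a hypothetical stand-in for a real vector database. Only the ranking by cosine similarity reflects the technique described above.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real app would call an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: stores (chunk, vector) pairs."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, chunks: list[str]) -> None:
        for chunk in chunks:
            self._items.append((chunk, embed(chunk)))

    def search(self, question: str, k: int = 2) -> list[str]:
        """Return the k chunks most similar to the question."""
        query = embed(question)
        ranked = sorted(self._items, key=lambda item: cosine(query, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.add([
        "Invoices are due within 30 days.",
        "The warranty covers two years.",
        "Support is available by email.",
    ])
    print(store.search("How long is the warranty?", k=1))
```

A real vector database typically adds persistence and approximate nearest-neighbor indexing so that search stays fast on large collections; the linear scan here is only practical for small examples.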
Chat history: Because the app is a chatbot, it saves the chat history and uses it to rewrite each follow-up question as a standalone question that can be understood without the earlier turns.
Generate response: The user’s question and the retrieved context are sent to the GPT model, which generates a natural-language response from them. The response is presented to the user through the chatbot interface.
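The chat-history and response steps can be sketched as two prompt-building functions. This is a hedged illustration under stated assumptions: the prompt wording is invented, `llm` is a hypothetical callable that takes a prompt string and returns the model's text (wrap whichever GPT client you use), and `retrieve` is a hypothetical callable that returns the relevant chunks for a question.

```python
from typing import Callable

History = list[tuple[str, str]]  # (user question, assistant reply) pairs


def format_history(history: History) -> str:
    return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)


def condense_question(llm: Callable[[str], str], history: History, question: str) -> str:
    """Rewrite a follow-up question so it stands alone; skip the call on the first turn."""
    if not history:
        return question
    prompt = (
        "Rewrite the follow-up question as a standalone question, "
        "using the conversation for any missing context.\n\n"
        f"Conversation:\n{format_history(history)}\n\n"
        f"Follow-up question: {question}\nStandalone question:"
    )
    return llm(prompt).strip()


def answer(
    llm: Callable[[str], str],
    retrieve: Callable[[str], list[str]],
    history: History,
    question: str,
) -> str:
    """Condense the question, fetch context, ask the model, and record the turn."""
    standalone = condense_question(llm, history, question)
    context = "\n\n".join(retrieve(standalone))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {standalone}\nAnswer:"
    )
    reply = llm(prompt).strip()
    history.append((question, reply))  # history is mutated in place
    return reply
```

Passing `llm` and `retrieve` in as callables keeps the sketch independent of any particular model client or vector database, and makes it easy to test with stubs.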
1. Document upload:
The user begins by uploading one or more PDF documents into the app. These documents contain the information the user wants to access and query.
2. Chatbot interaction:
After the documents are uploaded, the user interacts with the chatbot interface provided by the app, typing questions in natural language. The user can keep asking questions and receiving responses, creating an iterative conversation with the chatbot.