Chat with your data

large language models

Chat with your data

Provide users with a convenient and interactive way to access information within PDF documents

Technical procedure

Document preprocessing: When a user uploads a PDF document, the app preprocesses it to extract and structure the text content.

Chunking: The text content from the document is divided into smaller, manageable chunks. The chunk size multiplied by the number of relevant chunks selected should not exceed the maximum context window supported by the underlying language model (GPT-3.5)

Vectorization: Each chunk of text is converted into a numerical vector representation with Word Embeddings techniques. These vectors capture the semantic meaning of the text and allow for efficient searching and retrieval.

Storage: The vector representations of the text chunks are stored in a vector database. This database serves as a repository of contextual information from the documents.

Context retrieval: The app performs a similarity search within the database of vectorized documents to find the most relevant chunks based on the vector representation of the user’s question. These retrieved chunks serve as context for generating a relevant response.

Chat history: The app is a chatbot, therefore it saves the chat history and uses it to generate a standalone new question based on it.

Generate response: The user’s question and the retrieved context are sent as input to the GPT model to generate a response in natural language based on this input. The generated response is presented to the user through the chatbot interface.

User instructions

1. Document upload:

The user begins by uploading one or more PDF documents into the app. These documents contain the information the user wants to access and query.

2. Chatbot interaction:

After the documents are uploaded, the user interacts with the chatbot interface provided by the app. The user can type questions in natural language to the chatbot. The user can continue to ask questions and receive responses, creating an iterative conversation with the chatbot.

Download the example document

Click here to upload the document

Questions

What is the full name of the notary public who certified this document?
Who are the individuals involved in establishing the 'CHAHUAN Y FILIPPI LIMITADA' company?
What is the registered capital of 'CHAHUAN Y FILIPPI LIMITADA' and how was it contributed?
What is the stated business objective or purpose of the company?
Where is the registered office of the company?
What is the extent of liability for the partners in this Limited Liability Company (Sociedad de Responsabilidad Limitada)?
Who can administrate, represent, and use the company's business name according to the document?
Is there a specified term for the existence of 'CHAHUAN Y FILIPPI LIMITADA'?
What is the date of the document's certification?

Soko Solutions is headquartered in the Washington DC metro area and has development centers in Latin America & Europe.

We are Global

Washington DC Metro Area
8609 Westwood Center, Tysons Corner, Virginia 22182
Buenos Aires, Argentina
Av. Sánchez de Loria 2395 3rd Floor, Of. A
London, UK
3rd Floor, 12 Gough Square, EC4A 3DW
Stockholm, Sweden
Wallingatan 12 - 111 60 Stockholm
Santiago, Chile
Fidel Oteiza 1921, Piso 5, Providencia
Mendoza, Argentina
Huarpes 2414