
A chat-PDF application for interacting with PDFs. Users can upload a PDF and ask questions about the uploaded documents.

mathanprasannakumar/CHAT-PDF


CHAT-PDF

Demo: CHAT-PDF

Video: link

Implementation

Technologies used: vanilla JS (logic), Express (server), Node.js (runtime), LangChain (connecting to OpenAI models), HTML and CSS (UI)

With this implementation, you can build a ChatGPT-like application for any set of documents or for any organization.

Step 1: Setting up the API keys and environment

  • An OpenAI model is used both for embedding the text and as the chat model.

  • To access the OpenAI models through LangChain, you need an OpenAI API key.

  • Visit the OpenAI platform (link), create an account, then create an API key and save it.

  • Visit the Pinecone site, create an API key, and create an index.

  • An index is the storage where the vectors are stored and from which they are retrieved.

  • All the API keys and index information are set as variables inside the .env file.
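The sketch below shows the kind of check the `.env` setup involves. It parses `KEY=VALUE` lines and reports which required keys are missing, so the server can fail at startup instead of on the first request. The variable names `OPENAI_API_KEY`, `PINECONE_API_KEY` and `PINECONE_INDEX` are assumptions and not necessarily the names this repository uses. In a real Node.js project the `dotenv` package (`require("dotenv").config()`) would normally load the file into `process.env` instead of a hand-written parser.

```javascript
// Hypothetical variable names: the repository's actual .env keys may differ.
const REQUIRED_VARS = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX"];

// Parse the contents of a .env file into a plain object.
// Minimal sketch: one KEY=VALUE pair per line, blank lines and
// "#" comment lines are skipped, and matching surrounding quotes are removed.
function parseEnv(text) {
  const env = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not a KEY=VALUE line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const quoted =
      value.length >= 2 &&
      (value[0] === '"' || value[0] === "'") &&
      value[value.length - 1] === value[0];
    if (quoted) value = value.slice(1, -1);
    env[key] = value;
  }
  return env;
}

// Return the names of required variables that are missing or empty.
function missingVars(env, required = REQUIRED_VARS) {
  return required.filter((name) => !env[name]);
}

// Example usage with an inline .env text (placeholder values).
const sample =
  'OPENAI_API_KEY="sk-placeholder"\n# Pinecone settings\nPINECONE_API_KEY=pc-placeholder\n';
const parsed = parseEnv(sample);
console.log(missingVars(parsed)); // [ 'PINECONE_INDEX' ]
```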

Step 2: Building the UI for the file upload and the chat box

Step 3: Handling the uploaded PDF file

  • The file is sent to the server, where it is parsed by pdfparser to extract the text from the binary data.

  • The text is then split into chunks of size 1000 with an overlap of 100.

  • All the text chunks are converted to vector embeddings by the OpenAI embedding model and stored in the index using PineconeStore. This returns a vector store that can be used to run similarity searches over the stored embeddings at query time.
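To show what "chunk size 1000, overlap 100" means, here is a simplified fixed-window splitter. Each chunk starts 900 characters after the previous one, so neighbouring chunks share 100 characters. This is a simplification and not LangChain's splitter: LangChain's text splitters also try to break on separators such as paragraphs and sentences, so their chunk boundaries will differ.

```javascript
// Split text into chunks of at most `chunkSize` characters. Each chunk
// starts `chunkSize - chunkOverlap` characters after the previous one,
// so consecutive chunks share `chunkOverlap` characters.
// Simplified fixed-window splitter (no separator-aware splitting).
function splitText(text, chunkSize = 1000, chunkOverlap = 100) {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

// A 2500-character text yields windows starting at 0, 900 and 1800.
const demoChunks = splitText("x".repeat(2500));
console.log(demoChunks.map((c) => c.length)); // [ 1000, 1000, 700 ]
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one of the two neighbouring chunks, provided the sentence is no longer than the overlap.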
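The similarity search that the vector store performs at query time can be pictured with a toy in-memory store that ranks stored chunks by cosine similarity to the query vector. `ToyVectorStore` is a hypothetical stand-in for the Pinecone index and does not reproduce PineconeStore's API. The 3-dimensional vectors are made up; real OpenAI embeddings have far more dimensions.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical in-memory stand-in for a Pinecone index: stores
// (vector, text) records and returns the k most similar texts.
class ToyVectorStore {
  constructor() {
    this.records = [];
  }

  add(vector, text) {
    this.records.push({ vector, text });
  }

  similaritySearch(queryVector, k = 2) {
    return this.records
      .map((r) => ({ text: r.text, score: cosineSimilarity(queryVector, r.vector) }))
      .sort((x, y) => y.score - x.score) // highest similarity first
      .slice(0, k);
  }
}

// Toy 3-dimensional "embeddings" for three chunks.
const store = new ToyVectorStore();
store.add([1, 0, 0], "Chunk about invoices");
store.add([0, 1, 0], "Chunk about shipping");
store.add([0.9, 0.1, 0], "Chunk about payments");
console.log(store.similaritySearch([1, 0, 0], 2).map((r) => r.text));
// [ 'Chunk about invoices', 'Chunk about payments' ]
```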

Step 4: Handling the prompt

  • The messages sent through the chat box are forwarded to the server.

  • Here, we need to take the chat history into account and set up the OpenAI model to generate responses based on the external knowledge base we just built.

  • For the chat history, a LangChain buffer memory is created that keeps track of the last 5 message completions.

  • For the external knowledge base, a ConversationalRetrievalChain is created.

  • The chain works in three steps:
                 1) The input prompt is combined with the chat history to form a standalone question.
                 2) Records similar to the query are retrieved from the vector database.
                 3) The model receives the standalone question together with the retrieved data as its knowledge base and generates a relevant response.
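The chat-history idea can be sketched as a sliding window that keeps only the 5 most recent exchanges and drops the oldest when a new one arrives. `ChatWindowMemory` is a hypothetical class written for illustration and is not LangChain's memory API. Treating one question/answer pair as one "message completion" is an assumption.

```javascript
// Hypothetical sliding-window chat memory: keeps only the most recent
// `maxExchanges` question/answer pairs (not LangChain's API).
class ChatWindowMemory {
  constructor(maxExchanges = 5) {
    this.maxExchanges = maxExchanges;
    this.exchanges = [];
  }

  // Record one user question and the model's answer, dropping the
  // oldest exchange once the window is full.
  save(question, answer) {
    this.exchanges.push({ question, answer });
    if (this.exchanges.length > this.maxExchanges) {
      this.exchanges.shift();
    }
  }

  // Render the remembered exchanges as plain text for a prompt.
  toPromptText() {
    return this.exchanges
      .map((e) => `Human: ${e.question}\nAssistant: ${e.answer}`)
      .join("\n");
  }
}

const memory = new ChatWindowMemory(5);
for (let i = 1; i <= 7; i++) {
  memory.save(`question ${i}`, `answer ${i}`);
}
console.log(memory.exchanges.length); // 5
console.log(memory.exchanges[0].question); // question 3
```

Bounding the history keeps the prompt (and its token cost) from growing without limit as the conversation continues.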
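The three steps of the chain can be sketched as one async function. The model and the retriever are passed in as plain async functions, so the flow runs here with stubs and no API keys. This only illustrates the condense, retrieve and answer pattern; it is not ConversationalRetrievalChain's implementation. The prompt wording and the names `conversationalRetrieval`, `llm` and `retrieve` are hypothetical.

```javascript
// Hypothetical prompt templates; LangChain's built-in templates differ in wording.
function buildCondensePrompt(chatHistory, question) {
  return [
    "Given the conversation below and a follow-up question,",
    "rephrase the follow-up as a standalone question.",
    "",
    `Chat history:\n${chatHistory}`,
    `Follow-up question: ${question}`,
    "Standalone question:",
  ].join("\n");
}

function buildAnswerPrompt(standaloneQuestion, docs) {
  const context = docs.map((d, i) => `[${i + 1}] ${d}`).join("\n");
  return [
    "Answer the question using only the context below.",
    `Context:\n${context}`,
    `Question: ${standaloneQuestion}`,
  ].join("\n");
}

// `llm` and `retrieve` are injected async functions: hypothetical stand-ins
// for the OpenAI chat model and the Pinecone-backed retriever.
async function conversationalRetrieval(question, chatHistory, { llm, retrieve }) {
  // 1) Combine the new question with the chat history into a standalone
  //    question. With no history, the question is already standalone.
  const standalone = chatHistory
    ? await llm(buildCondensePrompt(chatHistory, question))
    : question;
  // 2) Retrieve the most similar chunks from the vector database.
  const docs = await retrieve(standalone);
  // 3) Ask the model to answer the standalone question from those chunks.
  return llm(buildAnswerPrompt(standalone, docs));
}

// Usage with stub functions so the flow runs without API keys.
const fakeLlm = async (prompt) =>
  prompt.startsWith("Given") ? "What does the document say about refunds?" : `PROMPT SENT:\n${prompt}`;
const fakeRetrieve = async (query) => [`placeholder chunk for: ${query}`];

conversationalRetrieval("And refunds?", "Human: hi\nAssistant: hello", {
  llm: fakeLlm,
  retrieve: fakeRetrieve,
}).then(console.log);
```

Injecting `llm` and `retrieve` keeps the orchestration testable and makes it easy to swap the stubs for real model and vector-store calls.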
           

References

  • The LangChain documentation helped a lot. If you don't understand the docs, ask the chatbot in the documentation, since OpenAI's ChatGPT doesn't even know what LangChain is.

  • YouTube links: all of these implementations are done in Python with Streamlit or in TypeScript; I only took the idea from those videos.

    Streamlit & Python

    ChatGPT - TypeScript

    Streamlit & Python
