This script lets you ask natural-language questions about one or more PDF documents and get answers generated by a (S)LLM of your choice. It builds a retrieval-augmented generation (RAG) pipeline: the most relevant passages of the PDF are retrieved and passed to the model, which uses them to answer your query.
- Question-Answering: Ask questions in natural language about the content of your PDF.
- Hugging Face Integration: Leverages the Hugging Face Transformers library to access a wide range of state-of-the-art LLMs.
- Sentence Embeddings: Uses sentence embeddings to efficiently find the most relevant parts of the PDF to answer your questions.
- Automatic Dependency Management: Checks and installs required libraries to ensure a smooth setup.
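The retrieval step behind the Sentence Embeddings feature can be illustrated with a minimal sketch. This is not the script's actual code: the function names (`embed`, `cosine`, `top_chunks`, `build_prompt`) are hypothetical, and a toy bag-of-words vector stands in for a real sentence-embedding model so the example runs with the standard library alone. The idea is the same in both cases: embed the question and each PDF chunk, rank chunks by cosine similarity, and pass the best ones to the LLM as context.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question, best first."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Assemble an LLM prompt from the retrieved context and the question."""
    context = "\n".join(top_chunks(question, chunks, k))
    return (
        "Answer using only this context:\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Hypothetical chunks, as if extracted from a PDF.
    pdf_chunks = [
        "The warranty lasts two years.",
        "Shipping takes five days.",
        "Returns are accepted within 30 days.",
    ]
    print(build_prompt("How long is the warranty?", pdf_chunks, k=1))
```

In a real pipeline, `embed` would call a sentence-embedding model and the ranking would typically be delegated to a vector store; the toy version only shows the shape of the computation.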
- Python 3.9 or higher: Please ensure you have a compatible version of Python installed.
- Hugging Face Account: You'll need a Hugging Face account to access their models. You can create one for free at https://huggingface.co/.
- Libraries: The following Python libraries are required and will be installed automatically if not present:
  - langchain
  - transformers
  - accelerate
  - bitsandbytes
  - sentence_transformers
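A check for missing libraries like the one described above might look roughly like this. It is a sketch under assumptions, not the script's own implementation: `missing_packages` and `install` are hypothetical helper names, and the approach (probing with `importlib.util.find_spec`, then calling `pip` through the running interpreter) is one common way to do it.

```python
import importlib.util
import subprocess
import sys

# Import names of the libraries listed above.
REQUIRED = [
    "langchain",
    "transformers",
    "accelerate",
    "bitsandbytes",
    "sentence_transformers",
]


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def install(names):
    """Install packages with the pip that belongs to the running interpreter."""
    if names:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *names])


if __name__ == "__main__":
    to_install = missing_packages(REQUIRED)
    if to_install:
        print("Missing:", ", ".join(to_install))
        install(to_install)
    else:
        print("All required libraries are present.")
```

Using `sys.executable -m pip` rather than a bare `pip` command helps ensure the packages land in the same environment that runs the script.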
- Save the Script: Download this script and save it as pdf_qa.py.
- Install Dependencies: The script tries to install and update all required libraries itself, but this sometimes fails. In that case, open your terminal or command prompt and run:
  pip install -r requirements.txt
- Run the Script:
  python3 pdf_qa.py [model_id] [pdf_file_path]
  Replace [model_id] with the Hugging Face model ID you want to use (e.g., mistralai/Mistral-7B-Instruct-v0.1). You can find a list of available models at https://huggingface.co/models. Replace [pdf_file_path] with the path to your PDF file(s).
- Ask Questions: You'll be prompted to enter questions. Type your questions in natural language and press Enter. The script will provide answers based on the content of the PDF.
- Exit: Type exit and press Enter to quit the script.
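The command-line interface and question loop described in the steps above can be sketched as follows. This is an illustration, not the script's source: `parse_args` and `question_loop` are hypothetical names, accepting several PDF paths via `nargs="+"` is an assumption based on the "PDF file(s)" wording, and `answer` is a placeholder for whatever function queries the model.

```python
import argparse


def parse_args(argv=None):
    """Parse the model ID and one or more PDF paths from the command line."""
    parser = argparse.ArgumentParser(description="Ask questions about PDF files.")
    parser.add_argument("model_id", help="Hugging Face model ID")
    parser.add_argument("pdf_paths", nargs="+", help="path(s) to the PDF file(s)")
    return parser.parse_args(argv)


def question_loop(answer, read=input, write=print):
    """Prompt for questions until the user types 'exit'.

    `answer` is any callable that maps a question string to a reply string.
    """
    while True:
        question = read("Question: ").strip()
        if question.lower() == "exit":
            break
        if question:  # ignore empty input
            write(answer(question))


if __name__ == "__main__":
    args = parse_args()
    # Placeholder answer function; a real script would query the LLM here.
    question_loop(lambda q: f"(no model loaded) You asked: {q}")
```

Passing `read` and `write` as parameters keeps the loop easy to exercise without a terminal, for example by feeding it a fixed list of questions.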
This code is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. See LICENSE.md for details.
Contributions are welcome! Please feel free to fork this repository and submit pull requests.
This script is provided as-is for educational and personal use. It is not intended for production or commercial applications. The author assumes no liability for any consequences arising from the use of this script.