This project is an end-to-end text summarizer that uses modular coding, a pipeline architecture, and the Google Pegasus model. It is trained on the SAMSUM corpus and uses ROUGE as an evaluation metric. The summarizer is deployed via Flask.
The project uses a pipeline architecture, meaning the system is divided into several stages that are executed in sequence. The pipeline includes the following stages:
- Data Ingestion: In this stage, data is downloaded from the source and stored in a dedicated directory.
- Data Validation: This stage involves checking the data for errors or invalid format.
- Data Transformation: In this stage, the data is transformed into a format that can be used by the model.
- Model Trainer: In this stage, the model is trained using the transformed data. The training process involves optimizing the model's parameters to minimize the loss function and improve the model's performance.
- Model Evaluation: In this stage, the model is evaluated using metrics like ROUGE.
- Summary Generation: In this stage, the model is used to generate summaries for new input text.
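The staged design above can be sketched as a minimal sequential pipeline runner. This is an illustrative sketch only: the class names and the `run()` interface are assumptions, not the project's actual components.

```python
# Minimal sketch of a sequential pipeline: each stage exposes run()
# and passes its output ("artifact") on to the next stage.
class Stage:
    name = "base"

    def run(self, artifact):
        raise NotImplementedError


class DataIngestion(Stage):
    name = "data_ingestion"

    def run(self, artifact):
        # Stand-in: the real stage downloads and unpacks the dataset.
        return {"raw": ["dialogue 1", "dialogue 2"]}


class DataValidation(Stage):
    name = "data_validation"

    def run(self, artifact):
        # Fail fast if the expected data is missing or empty.
        assert artifact.get("raw"), "no raw data found"
        return artifact


class DataTransformation(Stage):
    name = "data_transformation"

    def run(self, artifact):
        # Stand-in for tokenization / feature preparation.
        artifact["tokenized"] = [d.split() for d in artifact["raw"]]
        return artifact


def run_pipeline(stages):
    artifact = {}
    for stage in stages:
        print(f">>> running {stage.name}")
        artifact = stage.run(artifact)
    return artifact


result = run_pipeline([DataIngestion(), DataValidation(), DataTransformation()])
```

Because every stage shares the same interface, adding a training or evaluation stage is just a matter of appending another object to the list.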
The project uses the Google Pegasus model, which is a state-of-the-art model for text summarization. Pegasus is a pre-trained transformer model that is fine-tuned for summarization tasks. It is designed to generate high-quality summaries that are coherent, fluent, and informative.
The project is trained on the SAMSUM corpus, which is a dataset of human-written conversations between two speakers. The corpus contains 16,336 dialogues and 347,791 utterances. The dataset is used to train the Pegasus model for summarization tasks.
The project uses ROUGE as an evaluation metric to measure the quality of the summaries. ROUGE is a set of metrics that measure the overlap between the generated summary and the reference summary. It includes ROUGE-1, ROUGE-2, and ROUGE-L, which measure the overlap of unigrams, bigrams, and longest common subsequences, respectively.
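The unigram-overlap idea behind ROUGE-1 can be shown with a short hand-rolled computation. This is a simplified sketch for intuition; real evaluations should use a maintained library such as `rouge_score`, which also handles stemming and ROUGE-2/ROUGE-L.

```python
from collections import Counter


def rouge1(candidate: str, reference: str):
    """Simplified ROUGE-1: clipped unigram overlap precision/recall/F1."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # per-word counts clipped to the min
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return precision, recall, f1


p, r, f = rouge1("the cat sat", "the cat sat on the mat")
# All three candidate words appear in the reference, so precision is perfect,
# while recall is lower because the reference contains extra words.
```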
The project is deployed via Flask, which is a lightweight web framework for building web applications in Python. Flask provides a simple and flexible way to expose the summarizer as a web service, allowing users to input text and receive a summary as a response.
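A Flask service of the kind described can be sketched as below. The `/summarize` route and the JSON payload shape are assumptions for illustration, and the model call is stubbed out with a trivial first-sentence extractor so the sketch stays self-contained.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)


def summarize(text: str) -> str:
    # Placeholder: the real app would call the fine-tuned Pegasus model here.
    return text.split(".")[0].strip() + "."


@app.route("/summarize", methods=["POST"])
def summarize_endpoint():
    # Expect JSON like {"text": "..."} and return {"summary": "..."}.
    payload = request.get_json(force=True)
    summary = summarize(payload.get("text", ""))
    return jsonify({"summary": summary})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

The route can be exercised without a running server via Flask's built-in test client: `app.test_client().post("/summarize", json={"text": "..."})`.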
During the development of the text summarizer project, I learned several key lessons:
- Data Quality: High-quality data is critical for model performance. Invest time in data validation and cleaning.
- Model Training: Careful tuning of hyperparameters can significantly improve model performance.
- Evaluation Metrics: Evaluation metrics like ROUGE scores are essential for model selection.
These lessons will be valuable for future projects and will help me continue improving my skills and expertise in natural language processing.
Clone the project

```bash
git clone https://github.com/bhaveshk22/Text_Summarizer.git
```

Go to the project directory

```bash
cd Text_Summarizer
```

Install dependencies

```bash
pip install -r requirements.txt
```

Train the model

```bash
python main.py
```

Start the server

```bash
python app.py
```
- Update config.yaml
- Update params.yaml
- Update entity
- Update the configuration manager in src config
- Update components
- Update pipeline
- Update main.py
- Update app.py
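The workflow above centers on keeping `config.yaml` and `params.yaml` in sync with typed config entities read by the configuration manager. A minimal sketch of that pattern follows; the keys shown are illustrative, not the project's actual schema, and it assumes PyYAML is available.

```python
from dataclasses import dataclass

import yaml  # PyYAML

# Illustrative snippet of what a config.yaml section might look like.
CONFIG_YAML = """
data_ingestion:
  root_dir: artifacts/data_ingestion
  source_url: https://example.com/samsum.zip
"""


@dataclass(frozen=True)
class DataIngestionConfig:
    """Typed "entity" mirroring the data_ingestion section of config.yaml."""
    root_dir: str
    source_url: str


def load_data_ingestion_config(raw: str) -> DataIngestionConfig:
    # The configuration manager parses the YAML and hands each pipeline
    # component a typed config object instead of a raw dict.
    cfg = yaml.safe_load(raw)["data_ingestion"]
    return DataIngestionConfig(**cfg)


config = load_data_ingestion_config(CONFIG_YAML)
```

Using frozen dataclasses means a typo in `config.yaml` fails loudly at load time rather than surfacing as a `KeyError` deep inside a pipeline stage.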
I'm a Full Stack Data Scientist
- C, C++, Python
- SQL
- Machine Learning
- Deep Learning
- Data Science
👩‍💻 I'm currently a student
🧠 B.Tech in Computer Science
💬 more details loading