This repository contains a dataset for analyzing the COVID-19 pandemic across the states of India, covering infection rates, mortality rates, testing data, and vaccination statistics. The dataset was sourced from Kaggle and can be accessed here.
The goal of this repository is to provide a centralized collection of COVID-19 data and facilitate in-depth analysis to understand the trends, impact, and progression of the pandemic across India. This dataset includes key metrics for monitoring the spread and control of the virus.
Some of the main columns in the dataset are:
- Date: The date when the data was recorded.
- Country/Region: The country or region where the data was collected (focus on India).
- Confirmed Cases: The total number of confirmed COVID-19 cases reported.
- Deaths: The total number of deaths attributed to COVID-19.
- Recovered: The total number of individuals who have recovered from COVID-19.
- Testing: The total number of COVID-19 tests conducted.
- Vaccinations: The total number of COVID-19 vaccine doses administered.
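As a starting point, the columns listed above could be loaded and checked with pandas. This is a minimal sketch rather than the repository's actual loading code: it assumes the CSV headers match the names in the list exactly, which may not hold for the Kaggle files, and the sample rows are synthetic placeholders used only to make the snippet runnable.

```python
import io

import pandas as pd

# Assumed headers, copied from the column list above; the real CSV may differ.
EXPECTED_COLUMNS = [
    "Date", "Country/Region", "Confirmed Cases", "Deaths",
    "Recovered", "Testing", "Vaccinations",
]


def load_covid_data(source):
    """Read the CSV, verify the expected columns, and sort rows by date."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    df["Date"] = pd.to_datetime(df["Date"])
    return df.sort_values("Date").reset_index(drop=True)


# Synthetic placeholder rows (not real figures), deliberately out of order.
SAMPLE_CSV = """Date,Country/Region,Confirmed Cases,Deaths,Recovered,Testing,Vaccinations
2021-01-02,India,150,3,100,2000,50
2021-01-01,India,100,2,60,1000,0
"""

df = load_covid_data(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
```

To use it on the downloaded data, pass the path of the CSV file in place of the `io.StringIO` object and adjust `EXPECTED_COLUMNS` to the headers that file actually uses.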
The following key EDA tasks were performed to extract insights from the COVID-19 dataset:
- Trend Analysis of Confirmed Cases, Recoveries, and Deaths
  - Visualized daily and cumulative trends of confirmed cases, recoveries, and deaths.
  - Analyzed the growth rate and peak periods of COVID-19 cases.
- Mortality and Recovery Rate Analysis
  - Calculated and visualized the mortality and recovery rates over time.
  - Compared rates across different states to identify regions with high impact and successful recovery strategies.
- Testing and Vaccination Trends
  - Analyzed the testing rates over time and their correlation with the number of confirmed cases.
  - Studied vaccination rollout progress and its impact on infection rates.
- State-wise Analysis
  - Compared COVID-19 metrics across different states in India to identify regional patterns and hotspots.
  - Analyzed state-wise vaccination rates and their effectiveness in controlling the spread.
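The trend and rate tasks above can be sketched in pandas roughly as follows. This is an illustration under stated assumptions, not the repository's analysis code: the `State` column name is hypothetical, the counts are assumed to be cumulative, and the mortality and recovery rates are assumed to be deaths and recoveries as a percentage of confirmed cases. The sample rows are synthetic placeholders.

```python
import pandas as pd


def add_daily_and_rates(df):
    """Derive daily new cases and mortality/recovery rates per state."""
    out = df.sort_values(["State", "Date"]).reset_index(drop=True)
    # Daily new cases: difference of the cumulative count within each state.
    # The first row of a state has no predecessor, so it keeps its cumulative value.
    out["Daily Cases"] = (
        out.groupby("State")["Confirmed Cases"].diff().fillna(out["Confirmed Cases"])
    )
    # Mask zero counts so the division yields NaN instead of infinity.
    confirmed = out["Confirmed Cases"].where(out["Confirmed Cases"] > 0)
    out["Mortality Rate (%)"] = out["Deaths"] / confirmed * 100
    out["Recovery Rate (%)"] = out["Recovered"] / confirmed * 100
    return out


# Synthetic placeholder rows (not real figures); "State" is a hypothetical column.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02"]),
    "State": ["A", "A", "B", "B"],
    "Confirmed Cases": [100, 150, 0, 40],
    "Deaths": [2, 3, 0, 1],
    "Recovered": [50, 90, 0, 10],
})

result = add_daily_and_rates(sample)
print(result[["State", "Date", "Daily Cases", "Mortality Rate (%)", "Recovery Rate (%)"]])
```

Taking the last row per state, for example with `result.groupby("State").tail(1)`, gives the latest rates for a state-wise comparison.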
Key findings from the analysis:
- Identified significant trends in the spread and control of COVID-19 across different periods and regions.
- Highlighted the impact of vaccination efforts in reducing new infection rates.
- Provided insights into the effectiveness of testing strategies in controlling the spread.
The following tools and libraries were used:
- Python: Programming language used for data analysis.
- Pandas & NumPy: For data manipulation and numerical computations.
- Matplotlib & Seaborn: For creating visualizations to display trends and patterns.
- Plotly: For interactive visualizations and dashboards.
Planned future work:
- Incorporate data from additional sources to provide a more comprehensive view of the pandemic.
- Implement predictive modeling to forecast future COVID-19 cases and trends.
- Create an interactive dashboard for real-time visualization of key metrics.
Feel free to contribute to this repository by adding your own data analysis scripts, visualizations, or additional datasets related to the COVID-19 pandemic. Contributions can help provide a deeper understanding of the data and offer valuable insights.
Acknowledgments:
- Kaggle and Sudalai Rajkumar for providing the COVID-19 dataset used in this analysis.
- The data science community for sharing valuable insights and resources for pandemic analysis.