A streamlined API for aggregating and summarizing news from Fijian sources. Get concise, up-to-date local news at your fingertips.
tldr-fijinews-api is an educational passion project that automates the collection of news from various Fijian platforms and provides a unified access point through an API. This approach allows for easy consumption of aggregated, summarized news data by any type of application, whether mobile or desktop.
The project:
- Scrapes data from local Fijian news platforms (currently only Fijivillage is supported)
- Summarizes and aggregates the collected information
- Provides the data via an API endpoint
This project is an educational passion project and is not intended for commercial use. All data and content scraped and aggregated by this application are copyrighted by their respective sources. The creator of this project has no intention to monetize or sell this software, website, or any of the content it aggregates. This tool is meant for personal use and learning purposes only.
Users of this project should be aware of and respect the copyright and terms of use of the original news sources. Always ensure you have the right to use the data as intended and consider reaching out to the original sources for permission if you plan to use the aggregated data for anything other than personal, non-commercial purposes.
- BeautifulSoup (BS4): For parsing HTML and extracting data from web pages
- Sumy: A text summarization library to create concise news summaries
- FastAPI: For creating the API
- MongoDB: As the database backend
- Poetry: For dependency management and packaging
- Jupyter Notebooks: For experimenting with and refining scraping methods
- Ensure you have Python and Poetry installed on your system.
- Clone this repository to your local machine.
- Navigate to the project directory and run `poetry install` to install dependencies.
- Create a `.env` file in the project root and add the following:

```
MONGODB_URI=your_mongodb_uri_here
MONGODB_DBNAME=your_collection_name_here
```
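As a rough sketch of how the application might consume these environment variables (the helper function and fallback defaults here are illustrative assumptions, not the project's actual code):

```python
# Sketch: reading the MongoDB settings supplied via .env. The variable names
# MONGODB_URI and MONGODB_DBNAME come from this README; the helper and its
# local-development defaults are assumptions for illustration only.
import os

def load_mongo_settings() -> dict:
    """Return MongoDB connection settings, falling back to local defaults."""
    return {
        "uri": os.environ.get("MONGODB_URI", "mongodb://localhost:27017"),
        "dbname": os.environ.get("MONGODB_DBNAME", "tldr_fijinews"),
    }

settings = load_mongo_settings()
print(settings["uri"])
```

Reading configuration from the environment like this keeps credentials out of the repository, which is why the `.env` file should never be committed.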
To help you get started with the scraping process and allow for easy experimentation, we've included Jupyter notebooks in the `notebooks/` directory. These notebooks demonstrate various scraping techniques and can be used as a playground to develop and refine your scraping methods.
To use the notebooks:
- Ensure you have Jupyter installed. If not, you can install it with `pip install jupyter`.
- Navigate to the `notebooks/` directory in your terminal.
- Start Jupyter by running `jupyter notebook`.
- Open the desired notebook in your browser.
Feel free to modify these notebooks, create new ones, and experiment with different scraping techniques. The insights gained from these experiments can be incorporated into the main project to improve its functionality.
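For instance, a notebook cell experimenting with BeautifulSoup might look like the following. The HTML snippet and class names are invented for illustration and do not reflect Fijivillage's actual markup:

```python
# A minimal scraping experiment: extract headlines from static HTML with
# BeautifulSoup. In a real notebook you would fetch the page first (e.g. with
# requests) and inspect its structure to find the right selectors.
from bs4 import BeautifulSoup

html = """
<div class="news-list">
  <article><h2 class="headline">Headline one</h2></article>
  <article><h2 class="headline">Headline two</h2></article>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# CSS selectors keep the extraction logic short and easy to tweak per site.
headlines = [h2.get_text(strip=True) for h2 in soup.select("h2.headline")]
print(headlines)  # ['Headline one', 'Headline two']
```

Iterating on selectors against saved HTML like this is much faster than re-fetching live pages, and it avoids hammering the source site while you experiment.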
To start the API server, run the following command:
`poetry run uvicorn api:app --reload`
This command starts the FastAPI server with hot-reloading enabled for development purposes.
The API provides the following endpoints:
- **News Endpoint**
  - URL: `/news`
  - Method: GET
  - Description: Retrieve aggregated news data
  - Response: 200 OK with JSON content containing news items
- **Grab News Endpoint**
  - URL: `/grabnews`
  - Method: POST
  - Description: Trigger the news scraping process
  - Response: 200 OK with JSON content (likely a confirmation or status message)
All endpoints return JSON responses. For more detailed information about request and response schemas, you can access the OpenAPI documentation at `/docs` while the API is running.
Contributions to this project are welcome! If you'd like to contribute, please feel free to open a Pull Request (PR) with your proposed changes. Here are some guidelines for contributing:
- Fork the repository and create your branch from `main`.
- If you've added code that should be tested, add tests.
- Ensure your code follows the project's coding style.
- Make sure your code lints.
- Issue that pull request!