Project ReaderWise is a database analysis tool created as a solution to the "Logs Analysis" project within the Udacity Full Stack Developer Nanodegree Program. The mission of this project is to analyze the contents of Udacity's provided database of articles, authors, and activity (logs) for an online newspaper, then deliver some meaningful insights from that data.
Project ReaderWise is developed by Evan McCullough using Python version 3.5.2.
Follow the instructions and links below to download/install Project ReaderWise's dependencies:
Execution of Project ReaderWise depends on a Linux environment configured very specifically. To easily reproduce this environment, you may utilize VirtualBox to create a virtualized environment, and Vagrant to configure that environment correctly.
Download VirtualBox here: https://www.virtualbox.org/wiki/Downloads
Download Vagrant here: https://www.vagrantup.com/downloads.html
As mentioned briefly above, Vagrant is a system designed to help you automatically configure virtual machines. More specifically, rather than having to go through the manual setup process normally involved in setting up a virtual machine with VirtualBox, Vagrant allows you to write files that help configure your virtual machine automatically. For a more in-depth explanation of Vagrant and its utility, check out this YouTube video.
The folks at Udacity have provided the configuration that we need for the environment that will run Project ReaderWise, as well as some other files that you might need in this GitHub project: https://github.com/udacity/fullstack-nanodegree-vm
cd into the directory where you would like to store these files and run git clone to download a copy of the repo. After that, cd into the fullstack-nanodegree-vm folder created by this clone, then run git clone on this repository to copy the code here. cd into your new project-readerwise folder and you are ready to go! (Once you have the other dependencies installed below, that is)
To make sure nothing residual throws off the operation of Project ReaderWise, it is best to start fresh with a newly installed copy of the news database. You can get the file for that from Udacity here. Unzip this file, move the extracted newsdata.sql file into your project-readerwise directory, then run the following two commands to start up and connect to your virtual machine:
vagrant up
vagrant ssh
Once in your virtual machine, run this command to access your project files:
cd /vagrant
Next, use this command to run the newsdata.sql file and set up the database:
psql -d news -f newsdata.sql
Then install the Python packages that Project ReaderWise depends on:
pip3 install psycopg2
sudo pip3 install pycodestyle
Execute the following command to run ReaderWise and view the analytics insights provided by the tool.
python3 readerwise.py
python readerwise.py
Need Git Bash? Download it at https://gitforwindows.org/.
Project ReaderWise has the mission of answering three questions, in particular:
- What are the newspaper's three most popular articles of all time?
- Who are the most popular article authors of all time for the newspaper?
- On which days did more than 1% of page requests by readers result in errors?
The following views were created in the process of developing this project. These views are set up to simplify querying the database and organize data logically for operation. To create these views, you will need to access the news database via psql with the following command:
psql -d news
From here, simply copy and paste any of the code snippets below to re-create my custom views.
This view is a table of author ids (from the articles table) and the total number of article views accrued by each author. This tallies only pages successfully viewed (status = '200 OK') and ignores 'near-match' paths which resulted in 404 errors.
Re-create this view with the following code in psql:
create view author_views as
select author, count(*) as view_count
from articles, log
where log.path = '/article/' || articles.slug
and log.status = '200 OK'
group by author;
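As a rough illustration of how this view could feed the "most popular authors" question, the sketch below joins it to the authors table to turn author ids into names. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without the virtual machine. The column layout of the authors table (id, name), the trimmed-down articles and log tables, and all sample rows are assumptions made for the example, not the real news database.

```python
import sqlite3

# SQLite stands in for PostgreSQL so this sketch runs anywhere.
# Table layouts and rows below are assumptions made for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table authors (id integer primary key, name text);
    create table articles (author integer, slug text);
    create table log (path text, status text);

    insert into authors values (1, 'Author A'), (2, 'Author B');
    insert into articles values (1, 'first-post'), (2, 'second-post');
    insert into log values
        ('/article/first-post', '200 OK'),
        ('/article/first-post', '200 OK'),
        ('/article/second-post', '200 OK'),
        ('/article/first-pos', '404 NOT FOUND');

    -- Modeled on the author_views view: exact path match plus a
    -- '200 OK' status check, so the near-match 404 row is ignored.
    create view author_views as
    select author, count(*) as view_count
    from articles, log
    where log.path = '/article/' || articles.slug
    and log.status = '200 OK'
    group by author;
""")

# Resolve author ids to names and rank authors by total views.
rows = conn.execute("""
    select authors.name, author_views.view_count
    from authors
    join author_views on authors.id = author_views.author
    order by author_views.view_count desc
""").fetchall()

for name, view_count in rows:
    print("{} - {} views".format(name, view_count))
```

Against the real database the same join would presumably be sent through psycopg2 rather than sqlite3; only the connection code would change.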
This view is a table of each date in the log table, along with a count of all records for that date.
Re-create this view with the following code in psql:
create view views_per_day as
select time::date, count(*) as views
from log
group by time::date;
This view is a table of each date in the log table, along with a count of records for that date where the status code was '404 NOT FOUND'. Analysis of the table showed that the only two status codes listed in the log table were '200 OK' and '404 NOT FOUND', meaning the query constructing this view was safe to search specifically for the '404 NOT FOUND' status.
Re-create this view with the following code in psql:
create view errors_per_day as
select time::date, count(*) as err_count
from log
where status = '404 NOT FOUND'
group by time::date;
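One way the two per-day views could be combined to answer the error-rate question is sketched below. It is a minimal example, not necessarily how readerwise.py does it: the function name high_error_days, its threshold parameter, and the sample rows are all hypothetical. The input is assumed to be (date, count) tuples like those a database driver returns when selecting from views_per_day and errors_per_day.

```python
from datetime import date


def high_error_days(views_per_day, errors_per_day, threshold=1.0):
    """Return (day, percent) pairs where errors exceed `threshold` percent."""
    errors = dict(errors_per_day)
    flagged = []
    for day, views in views_per_day:
        # A day with no 404 rows never appears in errors_per_day,
        # so a missing day is treated as zero errors.
        err_count = errors.get(day, 0)
        percent = 100.0 * err_count / views
        if percent > threshold:
            flagged.append((day, round(percent, 2)))
    return flagged


# Made-up sample rows shaped like the output of the two views.
views = [
    (date(2016, 7, 1), 1000),
    (date(2016, 7, 2), 2000),
    (date(2016, 7, 3), 500),
]
errors = [
    (date(2016, 7, 1), 5),
    (date(2016, 7, 2), 50),
]

result = high_error_days(views, errors)
for day, percent in result:
    print("{} - {}% errors".format(day, percent))
```

The same comparison could also be written as a single SQL join of the two views on their date column; doing it in Python here keeps the zero-error case explicit.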
Evan McCullough - Python Development
Udacity - Project premise, Database contents
Project ReaderWise is shared under the MIT License. As a note, other Udacity FSND students should not use this code. Such use would constitute a breach of academic integrity.