This is the third project of the Full Stack Web Development Nanodegree program, from Udacity.
This project aim to create an internal reporting tool that will analyzed the provided database, answering three specific questions using good code practices for both technologies involved, Python (PEP-8) and SQL (one single query per question).
The questions were:
- What are the three most popular articles of all time?
- Who are the most popular article authors of all time?
- On which day did more than 1% of requests lead to errors?
To run this project successfully you'll need:
I used a vagrant setup into a virtual machine, which included the database and the needed libraries. To have the exact same setup, you must:
Install VirtualBox and vagrant;
Clone this repository;
In your command prompter (bash) Navigate to the vagrant subdirectory in the repository folder and type vagrant up
. If it's the first time, vagrant will download Linux and install it. When you get your prompt back, follow it by the command vagrant ssh
to log in to your vagrant development environment.
Download and place the newsdata.sql into the vagrant directory inside the repository folder. Use psql -d news -f newsdata.sql
to setup the database.
To run your script: python <script_name>.py
while logged into your development environment, assuming that it has all the requirements correctly installed.
The news database is a large database containing information regarding newspaper articles. It's composed by three tables, as follows:
Column | Type | Modifiers |
---|---|---|
name | text | not null |
bio | text | |
id | integer | not null default nextval('authors_id_seq'::regclass) |
Column | Type | Modifiers |
---|---|---|
author | integer | not null |
title | text | not null |
slug | text | not null |
lead | text | not null |
body | text | |
time | timestamp with time zone | default now() |
id | integer | not null default nextval('articles_id_seq'::regclass) |
Column | Type | Modifiers |
---|---|---|
path | text | |
ip | text | |
method | text | |
status | text | |
time | timestamp with time zone | default now() |
id | integer | not null default nextval('log_id_seq'::regclass) |
Two views were created for the third and last SQL query: logviews
and logerrors
.
CREATE VIEW logviews as SELECT time::date as day, count(*) as views from log GROUP BY time::date;
select * from logviews limit 10;
day | views
------------+-------
2016-07-01 | 38705
2016-07-02 | 55200
2016-07-03 | 54866
2016-07-04 | 54903
2016-07-05 | 54585
2016-07-06 | 54774
2016-07-07 | 54740
2016-07-08 | 55084
2016-07-09 | 55236
2016-07-10 | 54489
(10 rows)
CREATE VIEW logerrors as SELECT time::date as day, count(*) as errors from log WHERE status like '%404%' GROUP BY time::date;
select * from logerrors limit 10;
day | errors
------------+--------
2016-07-31 | 329
2016-07-06 | 420
2016-07-17 | 1265
2016-07-12 | 373
2016-07-10 | 371
2016-07-25 | 391
2016-07-14 | 383
2016-07-28 | 393
2016-07-30 | 397
2016-07-22 | 406
(10 rows)
querying_db()
-> connects to the PostgreSQL database returning the results of the given querywriting_log()
-> writes messages to the log filetop_three_articles()
-> process the results of the first query, addressing the first questiontop_authors()
-> process the results of the second query, addressing the second questionover_1percent_daily_errors()
-> process the results of the third query, addressing the third question
You can check the output text file here.