
I have reached the end of lesson 2. The final project is entitled "Data dig" where we are given a set of interesting
data sets: NASA astronauts, Superbowl results, Pokemon stats, NBA players, Top movies, Top countries by population, Solar system objects by size, Marvel characters, Furniture store sales, Earned KA badges, Winston's donut logs, Card game results, and NFL draft picks.
We are to pick one of those data sets and use advanced SELECT queries to discover things about the data. What sort of questions might one have about that data, like if they were using it for an app or a business idea? Here are some ideas:What are average, max, and min values in the data?
What about those numbers per category in the data (using HAVING)?
What ways are there to group the data values that don’t exist yet (using CASE)?
What interesting ways are there to filter the data (using AND/OR)?
Basically, the above given questions will serve as a guide on what we could dig from the given data set. Since these are generic questions that will serve as guide, we will deal with it later. For now, let us select what data set we shall use. I chose Top movies. Given below is the database schema.
topmovies100 rows |
---|
RankINTEGER |
TitleTEXT |
StudioTEXT |
WorldwideREAL |
DomesticREAL |
DomesticPctREAL |
OverseasREAL |
OverseasPctREAL |
YearINTEGER |
If I have this dataset, the first question I could possibly ask is, "How much is the maximum and minimum worldwide box office earnings for movies that belong to the top 100?" Here, we could use the following code to get the value:
SELECT title, MAX(Worldwide) FROM topmovies;
SELECT title, MIN(Worldwide) FROM topmovies;
QUERY RESULTS
Title | Max(Worldwide) |
---|
Avatar | 2788 |
Title | MIN(Worldwide) |
---|
Tangled | 591.8 |
The next question I could possibly ask is, "How much is the average worldwide, average domestic, and average overseas gross revenue for the top 100 movies?" Here, we could use the following code to get the value:
SELECT title, ROUND(AVG(Worldwide)) AS avg_worldwide, ROUND(AVG(Domestic)) AS avg_domestic, ROUND(AVG(Overseas)) AS avg_overseas FROM topmovies;
By doing this, up and coming movies will be able to assess whether they are below, equal, or above the worldwide, domestic, and overseas average gross revenues.
QUERY RESULTS
avg_worldwide | avg_domestic | avg_overseas |
---|
863 | 307 | 555 |
For example, for the movie, "The Greatest Showman" which I liked watching, its worldwide box office is $430.58 million, domestic box office is $174.34 million, and overseas box office is $256.24 million. All these values are below the calculated averages.
Another thing, I could ask is, "How many movies were included in the top 100 movies when grouped by studio? Which among the studios have the highest number of count and the lowest number of count?" Here, we could use the following code to get the value:
SELECT Studio, COUNT(*) FROM topmovies GROUP BY Studio ORDER BY COUNT(*);
SELECT Studio, COUNT(*) FROM topmovies GROUP BY Studio HAVING COUNT(*) = 23;
SELECT Studio, COUNT(*) FROM topmovies GROUP BY Studio HAVING COUNT(*) = 1;
QUERY RESULTS
Studio | COUNT(*) |
---|
DW | 1 |
LG/S | 1 |
NM | 1 |
LGF | 3 |
NL | 3 |
Sum. | 3 |
Par. | 7 |
Uni. | 8 |
P/DW | 9 |
Sony | 11 |
Fox | 13 |
WB | 17 |
BV | 23 |
Studio | COUNT(*) |
---|
DW | 1 |
LG/S | 1 |
NM | 1 |
Another question would be, "How much is the average worldwide box office revenue when grouped by studio?" Here, we could use the following code to get the value:
SELECT Studio, ROUND(AVG(Worldwide)) AS avg_worldwide FROM topmovies GROUP BY Studio ORDER BY avg_worldwide;
QUERY RESULTS
Studio | avg_worldwide |
---|
NM | 612 |
Sum. | 707 |
P/DW | 763 |
Sony | 768 |
LGF | 769 |
LG/S | 830 |
Uni. | 845 |
BV | 871 |
DW | 920 |
WB | 923 |
Fox | 924 |
Par. | 962 |
NL | 972 |
Comments
Post a Comment