Day 1: Intro to Project 3

Welcome to class!

Spiritual Thought

Announcements

  1. Project 2 Highlights
  2. Project 2 comments
  • Turn them in
  • Clean up graphs (main titles, axis labels, legends)
  • Column headers on tables in your report (don’t include index number either)
  • Technically Proportion of all flights delayed by weather, not the proportion of delayed flights
  • JSON should look like a text example of a record, not a table
  1. Things for next project:
  • Be sure to give section headers meaningful titles (NOT “Question 1”)
  • Drop “my useless chart” from your graphs

What is Structured Query Language (SQL)?



Ok, but how does it work?

SQL uses keywords to pull (or “fetch”, “extract”) the data we want from a database. The computer reads those keywords in a specific order.

From EverSQL we can get some more background:

This is the logical order of operations, also known as the order of execution, for an SQL query:


  1. FROM, including JOINs
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. WINDOW functions
  6. SELECT
  7. DISTINCT
  8. UNION
  9. ORDER BY
  10. LIMIT and OFFSET

But the reality isn’t that easy nor straight forward. As we said, the SQL standard defines the order of execution for the different SQL query clauses. Said that, modern databases are already challenging that default order by applying some optimization tricks which might change the actual order of execution, though they must end up returning the same result as if they were running the query at the default execution order.

For CSE 250: Don’t think too hard about optimization at this point. Let the database figure out the optimized routine.

Most SQL queries are typed in the following pattern:

SELECT -- <columns> and <column calculations>
FROM -- <table name>
  JOIN -- <table name>
  ON -- <columns to join>
WHERE -- <filter condition>
GROUP BY -- <subsets for column calculations>
HAVING -- <grouped filter condition>
ORDER BY -- <how the output is returned in sequence>
LIMIT -- <number of rows to return>

Project 3 - what are our goals?

Do we understand the questions being asked in Project 3?


The baseball data

Let’s start exploring the baseball data!

import pandas as pd
import sqlite3

con = sqlite3.connect('lahmansbaseballdb.sqlite')

df = pd.read_sql_query("SELECT * FROM fielding LIMIT 5", con)
df

How can we see what tables are in the database?

import pandas as pd
import sqlite3

con = sqlite3.connect('lahmansbaseballdb.sqlite')

pd.read_sql_query("""

SELECT name 
FROM sqlite_master 
WHERE type='table'

""", con)


Understanding SQL queries

Make sure you do the project readings!