Day 1: Intro to Project 3

Welcome to class!

Gratitude Journal

Announcements


What is Structured Query Language (SQL)?



Ok, but how does it work?

SQL uses keywords to pull (or “fetch”, “extract”) the data we want from a database. The computer reads those keywords in a specific order.

From EverSQL we can get some more background:

This is the logical order of operations, also known as the order of execution, for an SQL query:


  1. FROM, including JOINs
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. WINDOW functions
  6. SELECT
  7. DISTINCT
  8. UNION
  9. ORDER BY
  10. LIMIT and OFFSET

But the reality isn’t that easy nor straight forward. As we said, the SQL standard defines the order of execution for the different SQL query clauses. Said that, modern databases are already challenging that default order by applying some optimization tricks which might change the actual order of execution, though they must end up returning the same result as if they were running the query at the default execution order.

For CSE 250: Don’t think too hard about optimization at this point. Let the database figure out the optimized routine.

Most SQL queries are typed in the following pattern:

SELECT -- <columns> and <column calculations>
FROM -- <table name>
  JOIN -- <table name>
  ON -- <columns to join>
WHERE -- <filter condition>
GROUP BY -- <subsets for column calculations>
HAVING -- <grouped filter condition>
ORDER BY -- <how the output is returned in sequence>
LIMIT -- <number of rows to return>

Project 3 - what are our goals?

Do we understand the questions being asked in Project 3?


The baseball data

Let’s start exploring the baseball data!

import pandas as pd
import sqlite3

con = sqlite3.connect('lahmansbaseballdb.sqlite')

df = pd.read_sql_query("SELECT * FROM fielding LIMIT 5", con)
df

Understanding SQL queries

Make sure you do the project readings!