Day 12: We should join together in our SQL calculations

SQL queries are typed in the following pattern;

SELECT -- <columns> and <column calculations>
FROM -- <table name>
  JOIN -- <table name>
  ON -- <columns to join>
WHERE -- <filter condition>
GROUP BY -- <subsets for column calculations>
HAVING -- <filter conditions on groups>
ORDER BY -- <how the output is returned in sequence>
LIMIT -- <number of rows to return>

Connecting to data.world in Python

Let’s make the connection

Class reading

import datadotworld as dw

results = dw.query('byuidss/cse-250-baseball-database', 
    'SELECT * FROM batting LIMIT 5')

batting5 = results.dataframe

Let’s view our tables

I want to do a calculation in SQL and return it in a new column in Python?

Use the batting table to show the player and his team with his at batts and runs together with a calculated value of ab / r that is called runs_atbat.

  • Try do complete the above statement without using the info in the questions below.
q = '''
SELECT *
FROM batting
LIMIT 5
'''

dw.query('byuidss/cse-250-baseball-database', q).dataframe

q = '''
SELECT playerid, teamid, ab, r
FROM batting
LIMIT 5
'''

dw.query('byuidss/cse-250-baseball-database', q).dataframe

q = '''
SELECT playerid, teamid, ab, r, ab/r 
FROM batting
LIMIT 5
'''

batting_calc = dw.query('byuidss/cse-250-baseball-database', q).dataframe

q = '''
SELECT playerid, teamid, ab, r, ab/r as runs_atbat
FROM batting
LIMIT 5
'''

batting_calc = dw.query('byuidss/cse-250-baseball-database', q).dataframe

I want to join two tables to help in decision making

For seasons after 1999, which year had the most players selected as All Stars but didn’t play in the All Star game?

  • Provide a summary of how many games, hits, and at bats occurred by those players had in that years post season.
import pandas as pd 
import altair as alt
import numpy as np
import datadotworld as dw

con_url = 'byuidss/cse-250-baseball-database'
# %%
# allstar table

dw.query(con_url, 
'''
SELECT *
FROM AllstarFull
WHERE 
    AND 
LIMIT 5
''').dataframe

dw.query(con_url, 
'''
SELECT yearid, -- <stuff to calculate>
FROM AllstarFull
WHERE yearid > 1999 
    AND gp != 1
GROUP BY --?
ORDER BY --?
''').dataframe
dw.query(con_url, 
'''
SELECT *
FROM BattingPost as bp
LIMIT 5
''').dataframe

Let’s only keep players with at least one at bat in the post season

dw.query(con_url, 
'''
SELECT -- <columns to keep>
FROM BattingPost as bp
JOIN AllstarFull as asf
    ON  -- <two columns for the join>
WHERE bp.yearid > 1999
    AND gp != 1
    AND -- <at bat condition>
LIMIT 15

'''
).dataframe

For seasons after 1999, which year had the most players selected as All Stars but didn’t play in the All Star game?

  • Provide a summary of how many games, hits, and at bats occurred by those players had in that years post season.
dw.query('byuidss/cse-250-baseball-database', 
'''
SELECT -- <lots of calculations>
FROM BattingPost as bp
JOIN AllstarFull as asf
    ON  bp.playerid = asf.playerid AND
        bp.yearid = asf.yearid
WHERE bp.yearid > 1999
    AND gp != 1
    AND ab > 0
GROUP BY -- <column>
ORDER BY -- <column>
'''
).dataframe

I get SQL and want to be challenged.

Do this Math 335 task with SQL commands in Python.