On this page
Project 3 WorkBook
Tutoring Lab Info
The data science lab is a resource you can use in person, online, and in Slack.
SQL Query and Data Retrieval
1. sqlite3.connect()
The sqlite3.connect()
function establishes a connection to an SQLite database file. In this example, the file named ‘lahmansbaseballdb.sqlite’ is being connected to. Use Relative Path
meaning this sqlite file is in the same directory as the file you are working with. This will ensure a connection when using sqlite3
.
2. SELECT/FROM
The SELECT
statement is used to retrieve data from one or more tables in a database. It specifies which columns to include in the result set. The FROM
clause specifies the table or tables from which to retrieve data. It forms the foundation of the SELECT
statement.
3. WHERE
In SQL, when using the WHERE
clause, logical symbols are commonly employed to specify conditions based on column values. These symbols help define the relationship between a column’s value and the desired condition.
Code Snippet
Code Snippet
Code Snippet
5. ORDER BY
The default order for the ORDER BY
clause is ascending (ASC)
. ASC
sorts from the lowest value to the highest value. DESC
sorts from the highest value to the lowest value.
Code Snippet
7. ALIAS
An alias is a temporary name assigned to a table or column in a SQL query. It can be used to make the output more readable or to shorten lengthy column names.
6. CAST
The CAST
function is used to convert data from one data type to another. It is particularly useful for performing calculations or comparisons on data of different types.
8. HAVING
The HAVING
clause is used to filter rows in a result set based on a specified condition. It is similar to the WHERE
clause but is used with aggregate functions in GROUP BY
queries.
9. LIMIT
The LIMIT
clause is used to limit the number of rows returned in a result set. It is often used in combination with the ORDER BY
clause to retrieve a specific number of top or bottom records.
10. GROUP BY
The GROUP BY
clause is used to group rows in a result set based on one or more columns. It is typically used with aggregate functions to perform calculations on grouped data.