Intro to Relational Databases and SQL
Learning Objectives
Students will be able to: |
---|
Describe the use case of databases |
Describe the anatomy of a relational database |
Describe the use case of SQL |
Use the psql Interactive Terminal |
Use SQL to create a database and a table |
Use SQL to perform CRUD data operations |
Use a SQL JOIN clause to combine data from multiple tables |
Database Intro
What is a Database?
The vast majority of applications manipulate and display data.
Early on, our programs held data in "memory" using data structures such as arrays and objects. However, when the app was exited, any changes to the data were lost - databases avoid this...
Simply put, a database provides permanent storage for data.
Different Types of Databases
This site ranks databases according to their popularity.
As you can see, there are several different types of databases and several "brands" within each type.
Most database technologies can be broken down into two main categories:
- Relational databases
- NoSQL databases
Relational databases are by far the most popular type of database technology. Conceived by E.F. Codd while working for IBM back in 1970.
In SEI, we'll be using PostgreSQL since it's arguably the best open-source relational database management system (RDBMS) available.
Anatomy of a Relational Database
Schema
The structure of a particular database is defined by a schema.
Schemas define the database's:
- Tables, including the number and data type of each column
- Indexes for efficient access of data
- Constraints (rules, such as whether a field can be null or not)
Tables
The primary container for data in a relational database is a table:
As you can see, database tables look a lot like a spreadsheet since they consist of columns and rows.
Tables are also known as relations, thus the term relational database.
A single table in a relational database holds data for a particular data entity, for example, customers, orders, reviews, etc.
TABLE: artists
id (PK) | name | nationality |
---|---|---|
1 | Prince | American |
2 | Sir Elton John | British |
TABLE: songs
id (PK) | name | year_released | artist_id (FK) |
---|---|---|---|
1 | Tiny Dancer | 1971 | 2 |
2 | Little Red Corvette | 1982 | 1 |
3 | Raspberry Beret | 1985 | 1 |
4 | Your Song | 1970 | 2 |
The naming convention for tables is typically snake-cased and always plural.
Rows (Records / Tuples)
A row in a table represents a single instance of the data entity.
For example a particular artist in the artists table.
Columns (Fields)
The columns of a table have a:
- Name
- Data type (all data in a column must be of the same type)
- Optional contraints
The typical naming convention is usually snake-cased and singular.
PostgreSQL has many data types for columns, but common ones include:
- integer
- decimal
- varchar (variable-length strings)
- text (unlimited length strings)
- date (does not include time)
- timestamp (both date and time)
- boolean
Common constraints for a column include:
PRIMARY KEY
: column, or group of columns, uniquely identify a rowREFERENCES
(Foreign Key): value in column must match the primary key in another tableNOT NULL
: column must have a value, it cannot be empty (null)UNIQUE
: data in this column must be unique among all rows in the table
Primary Keys (PK) and Foreign Keys (FK)
The field (or fields) that uniquely identify each row in table are know known as that table's primary key (PK).
Since only one type of data entity can be held in a single table, related data, for example, the songs for an artist, are stored in separate tables and "linked" via what is known as a foreign key (FK). Note that foreign key fields hold the value of its related parent's PK.
Database Review Questions
❓ A __ defines the structure of a particular database.
❓ A table in a database consists of __ & __.
❓ A __ key uniquely identifies a row within a table.
❓ A __ key references the primary key in a different table.
SQL
What is SQL?
SQL (Structured Query Language), typically pronounced "sequel", is a programming language used to CRUD data stored in a relational database.
SQL syntax is similar to the English language.
Although SQL is fairly standard, it can vary from depending on the particular RDBMS. For example, the SQLite RDBMS has a minimal implementation of SQL commands.
The psql
Interactive Terminal
There are several GUI tools available for working with PostgreSQL, however, in SEI we won't need one because we'll mostly be accessing the database using Python/Django and it's better to learn SQL commands in terminal.
psql
is a tool that runs in terminal and allows us to work with PostgreSQL databases by typing in commands. It was installed with PostgreSQL.
Open a terminal session and type: psql
.
You'll see your PostgreSQL version and psql's prompt:
$ psql
psql (9.6.3)
Type "help" for help.
jimclark=#
Here are some useful commands (note the use of a backslash):
help -- general help
\? -- help with psql commands
\h -- help with SQL commands
\l -- Lists all databases
\c -- Connect to a database
\q -- exits psql
q -- exits a psql list or dialogue
Creating a Database and a Table
Let's create a database named music
and a bands
table:
CREATE DATABASE music; -- Don't forget the semicolon!
\l -- What changed?
\c music -- Connect to the music database
\d -- Lists all tables
-- Define a table's schema
CREATE TABLE bands (
id serial PRIMARY KEY, -- serial is auto-incrementing integer
name varchar NOT NULL,
genre varchar
);
\d -- There should now be a bands table
The backslash commands, e.g. \d
, are psql commands.
The CREATE DATABASE
and CREATE TABLE
are SQL commands.
Basic Querying and Inserting Data
Now let's write some more SQL to query (SELECT
) and create data (INSERT INTO
):
SELECT * FROM bands; -- The * represents all fields
-- For text, use single quotes, not double
INSERT INTO bands (name) VALUES ('The Smiths');
INSERT INTO bands (name, genre) VALUES ('Rush', 'prog rock');
SELECT * FROM bands; -- Use the up arrow to access previous commands
Because the type of id
is set to serial
, the RDBMS automatically assigns it.
Creating a Table for a Related Data Entity
Let's say we have the following data relationship: Band ---< Musician
A Band has many Musicians and a Musician belongs to a Band
Whenever you have a one:many relationship, the rows in the table for the many-side must include a column that references which row in the table on the one-side it belongs to.
This column is a foreign key (FK) that we discussed earlier.
The FK must be of the same data type as the primary key in the parent table - usually an integer.
Here's how we could define the musicians
table:
-- REFERENCES creates a FK constraint
CREATE TABLE musicians (
id serial PRIMARY KEY,
name varchar NOT NULL,
quote text,
band_id integer NOT NULL REFERENCES bands (id)
);
\d musicians -- details for table
The REFERENCES
constraint is what makes a column a FK.
Now let's attempt to add a musician with a bogus foreign key:
INSERT INTO musicians (name, band_id) VALUES ('Geddy Lee', 999);
-- Let's try again, but first, we need the id of the band
SELECT * FROM bands;
-- Assuming 'Rush' has an id of 2
INSERT INTO musicians (name, band_id) VALUES ('Geddy Lee', 2);
SELECT * FROM musicians; -- There's Geddy!
-- Now let's add Neil
-- Use two single quotes to embed a single quote
INSERT INTO musicians (name, quote, band_id)
VALUES (
'Neil Peart',
'If you''ve got a problem, take it out on a drum',
2);
Note: It's possible to insert multiple rows by providing comma separated value lists:
...VALUES ('Geddy Lee', 2), ('Neil Peart', 2);
Querying Data using a JOIN
Clause
The JOIN
clause is used with a SELECT
to query for data from more than one table.
Let's query for all of the bands with their musicians:
-- table right of JOIN has the FKs
SELECT * FROM bands JOIN musicians ON bands.id = musicians.band_id;
Note that no records are returned for bands without any musicians. This is called an INNER JOIN, which is the default.
There are several types of joins.
If we want to return all bands, regardless of whether or not there's any matches for musicians, we use whats called a LEFT JOIN
:
-- Using aliases for the table names
SELECT *
FROM bands b
LEFT JOIN musicians m ON b.id = m.band_id;
Querying Data using a WHERE
Clause
The WHERE clause allows selecting records that meet a condition or conditions:
SELECT *
FROM bands b
LEFT JOIN musicians m ON b.id = m.band_id
WHERE b.name = 'Rush' AND m.name LIKE 'G%';
The LIKE
operator uses:
%
to match any number of characters_
to match a single character
Conditions can be based on any column:
SELECT * FROM musicians WHERE id > 1;
Updating Data
Time to give Geddy a quote by using the SQL UPDATE
command:
UPDATE musicians
SET quote = 'I love to write, it''s my first love.'
WHERE name = 'Geddy Lee';
Deleting Data
Be careful with this command because if you don't use a WHERE
clause, you can accidentally delete all of the data from a table:
SELECT * FROM bands;
DELETE FROM bands WHERE name LIKE '%Smiths';
SELECT * FROM bands;
SQL - Summary
As much fun as it is to write SQL, most developers don't have many opportunities to do so because they use software known as an Object Relational Mapper (ORM) to automatically write SQL and communicate with the database server.
Regardless, having an understanding of SQL always looks good on a resume!
For additional practice after the lab, check out this interactive site: PG Exercises
Essential Questions
Before moving on to the lab, let's answer a few questions...
❓ A database contains a _ for each data entity that an application has.
❓ True or False: In a relational database, all of the data in a given column must be of the same data type.
❓ A single instance of a data entity is represented by a __ in a table.
❓ The programming language used by relational databases is commonly referred to as __.
Further Study
If you'd like to go deeper into SQL, look into: