Learning PostgreSQL and psycopg2
Table of contents
Prerequisites
Introduction
PostgreSQL Installation
psycopg2 Installation
Setup Database and Basic Table Operations in PostgreSQL
Setup and PostgreSQL Operations in psycopg2
Additional Resources
Prerequisites
Before learning about PostgreSQL and psycopg2, ensure that you have knowledge about the following:
- Python: High level general purpose programming language.
- Relational Algebra: Understanding the behaviour of relational operations.
- Database Theory: Understanding of constraints, tables, relations, and schemas.
- Basic understanding of SQL: Understand the restrictions of SQL and what information is required to perform operations.
Introduction
PostgreSQL is an open-source DBMS. As the name states, PostgreSQL is a relational Database Management System (DBMS) based on relations and queries and is great for dealing with large amounts of data and complex queries. It’s also good for referencing data quickly and allowing the user to have some flexibility in how they want to represent their data. Furthermore, PostgreSQL supports table inheritance and function overloading.
pyscopg2 is an adapter for Python that allows you to easily perform PostgreSQL operations in your Python programs. It’s commonly used with other Python libraries like Flask, which collectively allow you to modify a PostgreSQL database in your application with minimal hassle.
PostgreSQL installation
The following link is to download PostgreSQL onto your computer: https://www.postgresql.org/download/ . This link also shows the different versions of PostgreSQL to match your computer. Follow the instructions after downloading and make sure to remember the username and password during setup. For your convenience, here are the links to download the installer for PostgreSQL on different OS:
- Windows: https://www.postgresql.org/download/windows/
- MacOS: https://www.postgresql.org/download/macosx/
- Linux: https://www.postgresql.org/download/linux/
If you are using MacOS, you can also run the command brew install postgresql
in your terminal. Note that you also have to have Homebrew installed for this command to work. brew install postgresql
allows the user to use PostgreSQL on their command line by simply initiating a PostgreSQL environment with the command psql
. More information can be found here.
For Windows users, the user should download PostgreSQL through the link provided above. If the user wants to utilize PostgreSQL on the command line, they must (after installation):
- Add the PostgreSQL bin directory path to the PATH environment variable.
- Run the command
psql -U username
, whereusername
is the username you selected during installation. - To exit psql, run
\q;
.
Once installed, you can create a database, relation, and tables with the PostgreSQL client called pgAdmin. pgAdmin is automatically downloaded when using the links above and can be accessed by searching your task bar. (If you are using Homebrew, you have to install pgAdmin separately by running brew install --cask pgadmin4
.) You can also perform queries on relations and tables through its query tool. More information on how to use pgAdmin can be found at this link.
psycopg2 Installation
To work with psycopg2, you can use pip to install it by running pip install psycopg2
in your console. Make sure you execute this command in the directory you will be using psycopg2. Once finished, you can import psycopg2 (import psycopg2
at the top of the relevant Python file), and perform the desired operations.
Setup Database and Basic Table Operations in PostgreSQL
A quick and basic runover of PostgreSQL.
Create Database
To create a database in PostgreSQL, you can either use pgAdmin or the command line.
In pgAdmin, you can:
- Click on Servers.
- Click on PostgreSQL. The version of PostgreSQL should follow after the name.
- Right click on Databases and click Create
- Follow the instructions in pgAdmin.
In the command line:
- Type
psql
to initialize the PostgreSQL environment. - Type
CREATE DATABASE dbname;
, wheredbname
is the name of the database to create. Documentation on setting database parameters can be found here. To list all available databases, use\l;
.
Create Schema
To create a table, you must first create a schema in your database.
To do this in pgAdmin:
- Click on your desired database.
- Right click the Schemas directory and select Create.
- Fill in the fields on the dialog and press Save.
To do this in the command line:
- Execute
psql
to initialize the PostgreSQL environment. - Execute
\c dbname;
wheredbname
is the name of the desired database. - Run
CREATE SCHEMA schema_name;
. If the operation was successful, you should see the messageCREATE SCHEMA
on the next line. Run\dn
to get a list of available schemas.
Create Tables
After creating your schema you can create tables/relations to perform operations on. A table contains columns, which describes the attributes of the data being stored, and rows, which are records of instances of the table. Each row contains a set of values that corresponds to the columns of the table.
To create a table in pgAdmin:
- Right click on your desired schema.
- Select Create then Table.
- Follow the instructions in pgAdmin.
To create a table in the command line:
- Execute
psql
to initialize the PostgreSQL environment. - Execute
\c dbname
wheredbname
is the name of the desired database - Run
SET search_path TO schema_name;
whereschema_name
is the name of the schema you want to put your table in.search_path
is a variable that determines the order in which schemas are searched when looking for database objects. This command is allows you to define the schema order. In this example, PostgreSQL searches for objects inschema_name
. Ifsearch_path
were set toSET search_path TO schema1, schema2
and an object was not found inschema1
, PostgreSQL would continue searching in the subsequent schemaschema2
.
- Type
CREATE TABLE table_name (column1 int PRIMARY KEY, ...);
, wheretable_name
is the name of the table to create. Inside the round braces is a list of the columns that you want in the table. For each column, you must specify a name and datatype. For example, a table can be created like this: To see the structure of a specific table, use\d table_name;
or\d+ table_name;
for more detailed information (comments associated with the table columns). To list all tables in the current database, use\dt;
CREATE TABLE sports (
sport_id SERIAL PRIMARY KEY,
sport_name VARCHAR(50) UNIQUE NOT NULL,
);
CREATE TABLE teams (
team_id SERIAL PRIMARY KEY,
team_name VARCHAR(50) NOT NULL,
sport_id INT NOT NULL,
FOREIGN KEY (sport_id) REFERENCES sports(sport_id),
);
CREATE TABLE players {
player_id SERIAL PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
age VARCHAR(50) NOT NULL,
team_id INT NOT NULL,
FOREIGN KEY (team_id) REFERENCES teams(team_id),
);
The text after the datatypes (e.g. UNIQUE NOT NULL
) are optional constraints that define conditions that must be satisfied by the data.
Here is a link for information on datatypes in PostgreSQL: https://www.postgresql.org/docs/current/datatype.html
Here is a link for more details and examples of how to create a table: https://www.postgresql.org/docs/current/sql-createtable.html
Running queries
In SQL, a query is a statement/command used to retrieve or manipulate data in a relational database. To execute SQL scripts, you can use \i path/to/script.sql
. Alternatively, you can run SQL queries directly on psql.
Suppose we have the following tables:
sports
:
teams
:
players
:
SELECT
To retrieve all columns of a table, you can use the SELECT *
statement and FROM
clause:
SELECT * FROM players;
This query outputs all columns of all rows of the table players
, i.e. the entire table players
:
To query only certain columns of a table, you can use the SELECT
statement and specify the columns you want in the clause:
SELECT first_name, last_name FROM players;
In this example, the output is a smaller table that only contains the columns first_name
and last_name
of each row from the table players
:
CROSS JOIN
You can get the Cartesian product of two or more tables by listing the tables in the FROM
clause separated by commas:
SELECT * FROM sports, teams;
or by using the CROSS JOIN
clause:
SELECT * FROM sports CROSS JOIN teams;
In these queries, each row of table sports
is combined with each row of the table teams
, resulting in a combination of all rows:
WHERE
The table teams
has the column sport_id
, but suppose we want to read off the sport name of each team. To do this, we need to combine it with the table sports
to get the name associated with each sport_id
. In the query above, we combined the two tables, but not all rows of the output are meaningful. For example, the fourth row associates basketball with Manchester City F.C., which is a soccer team. To filter out the nonsensical combinations, we can use the WHERE
clause:
SELECT * FROM sports, teams WHERE sports.sport_id = teams.sport_id;
This clause allows you to retrieve only the rows that meet the specified conditions. The output of this query would be:
By using WHERE
, we were able to eliminate the rows where the sport_id
’s didn’t match. The output has one row per team, and each row contains the name of the sport associated with respective team.
Although the query is now more targeted and relevant, the two sport_id
columns are redundant. To get rid of the redundancy, one solution is to change the columns in the SELECT
clause so that only one of the sport_id
columns is retrieved:
SELECT sports.sport_id, sport_name, team_id, team_name FROM sports, teams WHERE sports.sport_id = teams.sport_id;
NATURAL JOIN
Another option is to use NATURAL JOIN
in the FROM
clause:
SELECT * FROM sports NATURAL JOIN teams;
NATURAL JOIN
automatically matches and combines the columns with the same name in the tables being joined. Since sport_id
is an attribute in both sports
and teams
, the resulting set includes all unique columns from both tables and excludes duplicate columns. The output of these queries are:
To select specific columns of the result set, we can use the same syntax that was mentioned before:
SELECT sport_name, team_name FROM sports NATURAL JOIN teams;
Output:
AS
It is possible to rename attributes in the result set using the AS
clause:
SELECT sport_name as sport, team_name as team FROM sports NATURAL JOIN teams;
Output:
ORDER BY
To order the output based on a specific column(s), you can use the ORDER BY
clause:
SELECT * FROM teams ORDER BY team_name;
Output:
SELECT sport_name as sport, team_name as team FROM sports NATURAL JOIN teams ORDER BY sport, team;
Output:
(Note: although using the original column names in the ORDER BY
clause may work, it is generally recommended to use the aliases defined in the SELECT
clause.)
To order in descending order, use the DESC
attribute:
SELECT * FROM teams ORDER BY team_name DESC;
Output:
PostgreSQL has several aggregate functions that perform calculations on a set of rows and returns a single value. Some commonly used functions are COUNT()
, SUM()
, AVG()
, MAX()
and MIN()
. You can read more about them here.
There are many other features of PostgreSQL. Specifics of the syntax of PostgreSQL can be found in this link. More about queries can be found here and types of joins can be found here.
Setup and PostgreSQL Operations in psycopg2
To start using PostgreSQL in a Python program with psycopg2, ensure you have finished the relevant installations, then follow the instructions below:
- Create a database in PostgreSQL. This can be done through the command line or pgAdmin. The following steps occur in your python script.
- Import psycopg2 (see psycopg2 Installation above for details).
- Connect to the new database with connect, eg.
conn = psycopg2.connect("dbname=test user=postgres")
. In this example, test is the name of the database that you created, and postgres is the default username when you install PostgreSQL. conn refers to the Connection (class) you have created, and can be continually referenced until you close the connection, eg.conn.close()
. It is reccommended to close the connection when you are finished with it. - To use your new Connection, you must initialize a Cursor (class) with it. This will allow you to perform PostgreSQL operations on the database. Do this with
cur = conn.cursor()
, where conn is our previous Connection, and cur is our new Cursor. - Using the cursor, you can execute any PostgreSQL command, which will be executed on your database via the settings specified in connect() (see step 3). The PostgreSQL query
SELECT * FROM test;
, for instance, would be executed withcur.execute("SELECT * FROM test;")
. You can make use of Python’s native operations to aid you in your database operations, if you wish. Remember to never give users direct access to the input of the execute function. If users can insert text that directly makes its way into the input string, you may be allowing sql injection, which could damage or destroy the contents of your database, and leak sensitive information. - When you have made your desired changes, use
conn.commit()
to save the changes to your database, and then close your connection withconn.close()
. If you do not commit the changes, your work will not transfer to your database, but note that if you are only querying data and not making changes, the commit is not required.
psycopg2 can also be used for data processing, with built-in functions such as fetchone()
and fetchmany(x)
that return one row and x rows from a query response, respectively. From the example in step 5, after we run cur.execute("SELECT * FROM test;")
, we can do row = cur.fetchone()
to save the first row we selected from test to row.
These functions can be useful in many situations. For example, if you run a pet store, and possess a PostgreSQL database of adoptable pets, you may have a site that has room to show 4 pets, and an option to show more. Using psycopg2, if you wanted to choose 4 pets in your database, after executing your query with a cursor cur, you could call data = cur.fetchmany(x)
, and then pass your data to your website in whatever way you desire, to be extracted and displayed (Flask has potentially useful psycopg2 integration for this task- see Additional Resourses at the bottom of this page).
For a comprehensive collection of psycopg2 information, including all provided functions and syntax, visit the offical documentation.