Intro to Programming Database Internet of Things IT Project Management Networking Web Development For Research Students

Introduction

This section of the website introduces the concept of relational algebra. I have found that substantial official-looking sources about relational algebra on the web are just plain wrong. For example, a site called w3resource (which looks a lot like w3c) has a relational algebra operator for sorting. This is ridiculous. The tuples (rows) of a relation are sets- they can't be sorted. Another website, GeeksForGeeks argued there are performance issues with the relational algebra. This is nonsense, because the relational algebra is math- it is a concept, not an implementation. How something is implemented affects performance and the math concepts behind relational algebra can be implemented in multiple ways, including in ways that would optimize similarly to modern SQL.

Accordingly, I have put together this portion of my website to provide a simple introduction to relational algebra. I will try to introduce what are mathematical concepts in a hopefully not so math way.

This portion of the website assumes familiarity with SQL. I find explaining relational algebra without SQL is a bit of a nightmare. However, once people have played with SQL, it is much easier to understand relational algebra and to realise how kludgy SQL is as compared to what a proper relational algebra-based query language would look like. Also, there is an assumption the reader is familiar with the hospital database used in the SQL examples on this website.

Note I am trying to keep things simple. So, for example, I don't distinguish between natural join, theta join, etc. I just call them all inner join. Similarly, I don't discuss divide by or intersection because divide by isn't implemented in any RDBMS and intersection isn't something I have ever seen used. Likewise, I don't bother formally defining most operations.

The current contents are:

Here are good alternate website to learn about the concept:

Any questions or comments should be directed to: The creator's email

Relations

In a database context, a relation is not-quite-a-table. You can represent it so it looks like a table, but there are notable differences.

Formally (in math terms), a relation is a pair comprising (1) an ordered set of domains, and (2) a set of tuples, where each tuple is an ordered set where each element of the tuple maps to each domain. Basically, you can think of this as (1) a list of attributes or column names with datatypes and (2) a set of rows where the values in each row corresponds to what is in the column names. However, there are key differences between what the simple explanation says and what the math says. First, going from left to right (i.e., across the columns), everything is ordered. This means columns can repeat. However, going from top to bottom (i.e., across the rows), everything is a set. This means rows cannot repeat.

Basically, to understand relations, think about tables. However:

It helps to think of relations like this:

Relational Algebra Concept English Concept Except
Relation Table Tables can repeat rows. Tables allow anything while values in tuples are constrained by a domain.
Ordered set List
Domain Name with a data type
Tuple Row Rows can repeat in a table. Tuples cannot repeat. Rows can be sorted. Tuples cannot be sorted, because in the relational algebra, a relation is defined as a set of tuples, not an ordered set or bag of tuples.

Here's an example of how tables and relations are different. Imagine the following table:

Person ID First Name Last Name
1 John Smith
2 Mary Contrary
3 John Smith

If you remove the first column (Person ID), the result in relational algebra would be:

First Name Last Name
John Smith
Mary Contrary

Notice how the third row disappeared. This is because it now duplicates the first row.

Why Relational Algebra

Relational algebra is a powerful tool for design. Implementation is hard and therefore costly. Every time you implement something wrong, you spend money correcting your mistake. If you don't correct your mistake, bad things can happen. Ideally, you fix the mistake before you start implementing. With relational algebra, you can simulate how your database works before you actually implement your database.

As a mathematical system, relational algebra has certain powerful properties. Notably, every operation you perform in relational algebra produces a relation. This means you can guarantee certain behaviors in relational algebra in the same way arithmetic guarantees certain behaviors in regular algebra.

In arithmetic, except for divide by 0, every operation results in a number. The result of 4+2, 4-2, 4×2, and 4÷2 are all numbers. This allows us to be able to substitute symbols for numbers. We can say let a=4, let b=2 and whether we do c=a+b, c=a-b, c=a×b or c=a÷b, with the exception of when b=0 and the operation is ÷, we know c will be a number.

The same holds true of the relational algebra. This allows us to treat relational algebra just like algebra. We can substitute symbols for things to determine what will happen to our database. We can therefore prove mathematical properties of our database. We can ask questions like:

In other words, if we are very careful and take the time, we can reach a reasonable level of surety our database will work before implementing.

Project (π)

The project operator takes a relation and removes columns from it.

Assume a table called people that looks like this:

PersonID FirstName LastName
1 John Smith
2 Mary Contrary
3 John Smith

πFirstName,LastNamePeople
would be:


FirstName LastName
John Smith
Mary Contrary

It is going to get complex from hereon, so what I will do is assume you know about the hospital database, and I will show the equivalent command in SQL. Where SQL produces a different result, I will explain it.

The project operator (π) is equivalent to the things you put after the Select statement in SQL. So:

English Description Relational Algebra SQL Equivalent
Find the name and doctorno of all patients. πname,doctorno
patient
Select name, doctorno
from patient;

Restrict (σ)

The restrict operator (π) is equivalent to the where clause in SQL. So:

English Description Relational Algebra SQL Equivalent
Give all patient information about the patient named Bell. σname='Bell'
patient
Select *
from patient
where name='Bell';
Give the name and doctor ID of the patient named Bell. πname,doctorno
σname ='Bell'
patient
Select name, doctorno
from patient
where name='Bell';

And (∧), Or (∨) and Not (¬)

Relational algebra is based on sets and thus uses the and, or and not operators from boolean logic. So:

English Description Relational Algebra SQL Equivalent
Find ward information about wards with wardnos greater than or equal to w2 and less than or equal to w7. σwardno ≥'w2' ∧ wardno ≤'w7'
ward
Select *
from ward
where wardno>='w2' and
wardno<='w7';
Find patient information for the patients named Bell and Robinson. σname ='Bell' ∨ name='Robinson'
patient
Select *
from patient
where name='Bell' or
name='Robinson';
Find patient information about patients whose name is not Bell. σ¬(name ='Bell')
patient
Select *
from patient
where not (name='Bell');

Between

Use the less than and equal or greater than and equal operators to do between:

English Description Relational Algebra SQL Equivalent
Find ward information for wards with wardnos between w2 and w7. σ'w2' ≤ wardno ≤'w7'
ward
Select *
from ward
where wardno between 'w2' and 'w7';

Simple In (∈) and Not In (∉)

Use the symbols ∈ and ∉ for in and not in. These are set operators and take sets:

English Description Relational Algebra SQL Equivalent
Find patient information about patients with the names Bell or Robinson. σname ∈ {'Bell','Robinson'}
patient
Select *
from patient
where name in ('Bell','Robinson');
Find ward information about wards with the ward numbers w2, w3, or w7. σwardno ∉ {'w2','w3','w7'}
ward
Select *
from ward
where wardno not in ('w2','w3','w7');

Do not use ∈ and ∉ to do the equivalent of type I and type II nested queries. Instead, use join (⨝) and antijoin (▷).

Regular Expressions (Like)

The SQL operator Like is a travesty. There is no equivalent concept in math. Instead, use the more powerful concept of regular expressions. Regular expressions are a very powerful way of matching strings and their permutations are beyond the scope of what I can cover here. Here are some examples:

English Description Relational Algebra SQL Equivalent
Find patients who have an e in their name. σRegExp(name,'(.*)e(.*)')
patient
Select *
from patient
where name like '%e%';
Find patients who have an e as the second letter in their name. σRegExp(name,'.e(.*)')
patient
Select *
from patient
where name like '_e%';

Cartesian Product (×)

The cartesian product combines all the tuples of two relations together. For example, given the two below tables:

Table: People

PersonID PersonName
1 Albert
2 Bertha
3 Charlie

   

Table: Fruit

FruitID FruitName
A Apple
B Banana
C Cherry


People × Fruit
would be:

PersonID PersonName FruitID FruitName
1 Albert A Apple
1 Albert B Banana
1 Albert C Cherry
2 Bertha A Apple
2 Bertha B Banana
2 Bertha C Cherry
3 Charlie A Apple
3 Charlie B Banana
3 Charlie C Cherry

The relational algebra and SQL equivalents look like:

English Description Relational Algebra SQL Equivalent
Combine every row in patient and doctor. patient×doctor Select *
from patient,doctor;

Inner Join (⨝)

Inner join works in a way similar to that of SQL's inner join. It connects two relations (tables) together based on matching attributes.

English Description Relational Algebra SQL Equivalent
Combine the staff and doctor tables. staffstaff.staffno=doctor.staffno
doctor
Select *
from staff
inner join doctor
on staff.staffno=doctor.staffno;
List the staff ID, name, rank and specialization of every doctor. πstaff.staffno,staffname,docposition,specialty
(staffstaff.staffno=doctor.staffno
doctor)
Select staff.staffno,staffname,docposition,specialty
from staff
inner join doctor
on staff.staffno=doctor.staffno;

As with SQL, an inner join is the same as a cartesian product followed by a restrict:

staffstaff.staffno=doctor.staffno
doctor
=
Select *
from staff
inner join doctor
on staff.staffno=doctor.staffno
=
σstaff.staffno=doctor.staffno
(staff × doctor)
=
Select *
from staff, doctor
where staff.staffno=doctor.staffno;

Outer Joins (⟕, ⟖, ⟗)

The symbols are: left join (⟕), right join (⟖) and full outer join (⟗).

In the hospital database, a ward can have zero or more nurses.

English Description Relational Algebra SQL Equivalent
Find the staff IDs, names, ward numbers, wardnames and max bed capacity for all wards. Include wards that have no staff. πstaff.staffno,staffname,ward.wardno,wardname,maxbeds
wardward.wardno=nurse.wardno
(nursenurse.staffno=staff.staffno
staff)
Select staff.staffno,staffname,ward.wardno,wardname,maxbeds
from ward
left join
(nurse inner join
staff
on nurse.staffno=staff.staffno)
on ward.wardno=nurse.wardno;

Null (⊥)

Use an upside down T to represent null (⊥).

Union (⋃)

The union operator requires both of the relations to have the same domains. In other words, like SQL, the columns for both relations (tables) in a union need to be the same.

English Description Relational Algebra SQL Equivalent
List the staff ID, and names. For doctors, show their rank and specialty. For nurses, show their ward number. πstaff.staffno,staffname,docposition,specialty,⊥
staffstaff.staffno=doctor.staffno
doctor

πstaff.staffno,staffname,⊥,⊥,wardno
staffstaff.staffno=nurse.staffno
nurse
Select staff.staffno,staffname,docposition,specialty,null
from staff
inner join doctor
on staff.staffno=doctor.staffno
union
Select staff.staffno,staffname,null,null,wardno
from staff
inner join nurse
on staff.staffno=nurse.staffno;

Rename (ρ)

Allows you to refer to a relation or the domain of the relation by another name. It is the same concept as aliasing in SQL. If it is outside the parenthesis, it refers to the relation name. If it is inside the parenthesis, it is the domain name. Remember domains in relations are an ordered list, so you need to name the domains in order.

English Description Relational Algebra SQL Equivalent
Rename the staff relation and its domains staffno and staffname as employees, employeeid and employeename. ρemployees(employeeid,employeename)
πstaffno,staffname
staff
Select staffno as employeeid, staffname as employeename
from staff employees;

Self Join

There are two ways to do a self join in the relational algebra, the implicit and explicit way.

Implicit

Implicitly, in math (and therefore the relational algebra), you can represent a copy of something using the prime (') symbol. Thus, the programming concept:

x = x + 1

Can be represented mathematically as:

So, the following is valid relational algebra syntax:

English Description Relational Algebra SQL Equivalent
Find all patients with the same name. Display their names and patient IDs. πpatient.name, patient.patientid, patient'.patientid
(patient ⨝patient.name=patient'.name ∧ patient.patientid<patient'.patientid
patient')
Select p1.name, p1.patientid, p2.patientid
from patient p1
inner join patient p2
on p1.name=p2.name and p1.patientid<p2.patientid;

Explicit

You can use the rename operator to explicitly cast a relation:

English Description Relational Algebra SQL Equivalent
Find all patients with the same name. Display their names and patient IDs. πp1.name,p1.patientid,p2.patientid
[(ρp1patient)p1.name=p2.name ∧ p1.patientid<p2.patientid
(ρp2patient)]
Select p1.name, p1.patientid, p2.patientid
from patient p1
inner join patient p2
on p1.name=p2.name and p1.patientid<p2.patientid;

Minus (-)

Minus is used to remove records from a relation. It works like minus in SQL. Minus must be 'Union Compatible,' i.e., just like the union operator, the domains in the relations must be exactly the same.

Also, note that because the relational algebra is based on sets, doing a minus can cause records to disappear when there are duplicates. Thus, minus in relational algebra is equivalent to an SQL minus followed by a distinct. So:

English Description Relational Algebra SQL Equivalent
Find all doctors who have no patients. πstaffnodoctor
-
πdoctornopatient
Select staffno
from doctor
minus
select doctorno
from patient;

Antijoin (▷)

The antijoin is the mechanism by which we do the equivalent of type I and type II nested queries.

English Description Relational Algebra SQL Equivalent
Give the IDs and names of all doctors who have no patients. πstaffno, staffname
(staff staffno=doctorno
patient)
Select staffno, staffname
from staff
where staffno not in (
(select doctorno
from patient);
Find all unpaid items from the bills. billitem
billitem.billno=rcptdetail.billno ∧
billitem.lineno=rcptdetail.lineno

rcptdetail
Select *
from billitem
where not exists (
(select null
from rcptdetail
where billitem.billno=rcptdetail.billno
and billitem.lineno=rcptdetail.lineno);

Aggregation (γ)

The aggregation operator works just like the SQL group by. It takes three inputs, two in the subscript. The subscript input pair are an ordered list of domains to group by and an ordered list of domains to calculate- similar to the attributes in the group by and in the select clause in a regular group by. The third input is the relation to process.

Some sources separate the two subscripted parts into a left and right subscript. This ends up being a very awkward way of writing things so I have opted to put them in parenthesis.

English Description Relational Algebra SQL Equivalent
How many patients does each doctor have? γ[(staffno, staffname),(staffno, staffname, count(*))]
(staff staffno=doctorno
patient)
Select staffno, staffname, count(*)
from staff
inner join patient
on staffno=doctorno
group by staffno, staffname;

Having Does Not Exist - Use Restrict

There is no need for a having clause in relational algebra. Indeed, there really isn't a need for a having clause in SQL either- it is just bad design. Use a restrict operator (i.e., the equivalent of the SQL where clause).

In the below example, we calculate the number of patients for each doctor. We get the max of that. The relation that is the max number of patients is renamed as maxcount. We then find the staffno, staffname and number of patients for each doctor. We rename this relation as doctorwpatient. We then join doctorwpatient with maxcount by the count of patients.

English Description Relational Algebra SQL Equivalent
Which doctor(s) have the most patients? ρdoctorwpatient
[(γ[(staffno, staffname),(staffno, staffname, count(*))]
(staff staffno=doctorno
patient)]
doctorwpatient.count(*)=maxcount.max(count(*))
ρmaxcount{
γ[(),(max(count(*)))]
γ[(doctorno),(count(*))]
patient}
Select staffno, staffname, count(*)
from staff
inner join patient
on staffno=doctorno
group by staffno, staffname having count(*)>=All (Select count(*)
from patient
group by doctorno);

Sorting

You can't sort a relation. Period. Tuples in a relation are grouped together as sets. Mathematical sets have no ordering. Some websites introduce a sort operator τ. Those websites don't know what they are talking about.

Given relational algebra doesn't have sorting, doesn't that mean relational algebra sucks? No! Relational algebra is meant to allow us to study the behavior of the relations, not the way the relations are displayed. This is like the difference between content and markup. Relational algebra is a tool for manipulating the content of a database design, not the markup of the database design.

In fact, sorting in SQL is the bad decision. Since we can sort in SQL, why not add features like bolding, italics or highlighting? Sorting is a visual concept, not a data manipulation concept. Introducing sorting into SQL has meant SQL has a number of silly kludges. For example, the order by clause must be the last clause in an SQL command. In the relational algebra, the operations can occur in any order!

Another analogy is the difference between a relational database management system and a report writer like PowerBI or Tableau. That relational database management systems typically don't have good reporting tools doesn't mean they suck. You are supposed to combine tools to get your business needs fulfilled.

Assignment (←)

The assignment operator changes the value of a variable.

English Description Relational Algebra SQL Equivalent
Make x be 1. x1 Select 1 as x
from dual;

Insert

Insertion is an assignment with a union. Recall a relation is a pair comprising an ordered set of domains and a set of tuples where each tuple contains an ordered set of values corresponding to those domains. Thus, the below is a relation:

( (staffno,staffname), {(111,'New Person')} )

Given this, I can add the above to the staff relation by:

English Description Relational Algebra SQL Equivalent
Add the record to the staff relation staff'
staff

( (staffno,staffname),
{(111,'New Person')}
)
insert into staff(staffno, staffname)
values
(111,'New Person');

Delete

Similar to insertion, deletion is a minus followed by an assignment.

English Description Relational Algebra SQL Equivalent
Remove the record from the staff relation staff'
staff
-
( (staffno,staffname),
{(111,'New Person')}
)
delete from staff
where
staffno=111 and
staffname='New Person';

Update

An update is a deletion followed by an insertion.

English Description Relational Algebra SQL Equivalent
Change the record from the staff relation so 'New Person' becomes 'A New Person' staff'
staff
-
( (staffno,staffname),
{(111,'New Person')}
)

( (staffno,staffname),
{(111,'A New Person')}
)
update staff
set staffno=111,
staffname='A New Person'
where staffno=111 and
staffname='New Person';

Writing in Relational Algebra

Relational algebra is math. So, we use symbols from other math systems like discrete mathematics (set theory) and boolean logic. Most new operations used in relational algebra are lowercase greek letters.

Many relational algebra operations take multiple arguments. Often, the second or third argument is written as a subscript. For example, the project operation (π) takes two arguments, (1) a relation, and (2) a set of domains to project. Project is written like this:

πstaffno,staffnamestaff

Notice how staffno and staffname are subscripted.

Likewise, the relational algebra inner join operation (strictly theta-join) takes three arguments, the two relations to join and the join condition. The join condition is subscripted as follows:

staffstaff.staffno=doctor.staffnodoctor

The subscript is important!