A database is a collection of data that is organised in a way that makes it easily accessed and maintained.
We normally store data using a relational database. This means that we’ll store the data in more than one table, which will be linked together by certain fields.
To better understand databases some key terms are worth learning.
In this lesson, we will learn about:
When talking about types of databases, we often mean between flat-file databases and relational databases.
Flat-file databases are where all data is stored in a single table, whereas relational databases are where the data is split across several tables.
However, there are also different types of relational databases. We will normally classify them as:
Let’s look into each of these types of relational databases in more detail.
This is a database management system that is designed to store data locally, to be accessed on an individual personal computer system.
An example of this is Microsoft Access.
These systems are often specifically targeted to be simple to use, often not requiring any understanding of the SQL that underlines relational databases
Instead, they use an easy-to-use GUI that makes it easier to navigate and perform actions with.
This is a database management system that is designed to be stored on a server that can be accessed over a network by a large number of users on client devices.
This is like a website’s database. Most websites store usernames and passwords, as well as many other things, in a database kept on the website server. Then, hundreds or even thousands of users can submit data to the database and retrieve data from it.
These systems often require far greater technical knowledge, requiring users to perform operations using SQL in a command-line interface rather than through a GUI.
Proprietary server RDBMS software, such as Oracle, is often very expensive.
However, a number of free open-source options, such as MySQL, are very commonly used, particularly on the web.
To understand the structure of a database, we need to know a few important terms. These are:
Let’s look at each of these in more detail.
Relations
Relations are the tables in a database. Each table is used to store information about a specific topic or type of data.
Think of a table that you might use to keep track of your friends’ contact information, with each row containing details about a different friend. That whole table is called a relation.
Tuples
Tuples are the rows in a table. Each row (or tuple) represents a single item or record in the table.
If we go back to our friends’ contact information table, one row (tuple) might have the details for one of your friends, such as their name, phone number, and email address.
Domains
Domains are the set of possible values that an attribute can have. It’s like a rule for what kind of data can go into each column.
For instance, the domain for the “Phone Number” attribute might be that it must be a string of digits, while the domain for the “Email Address” attribute might be that it must be a string that contains an “@” symbol.
Cardinality
Cardinality refers to how unique an attribute is in terms of its data values. Where there are many unique values, this is known as “High Cardinality”. Where there are many repeated values, this is known as “Low Cardinality”.
For example, in the friends table several friends may have the same name, meaning the name attribute has low cardinality.
Imagine you have a table called “Friends” in a database. Here’s how these terms fit together:
To manipulate and retrieve data from a database, we use Structured Query Language.
The basis of SQL, and other query languages, is defined by a mathematical query language called relational algebra.
Relational algebra uses several operations to allow us to work with our database. These include:
Let’s look at each of these in more detail.
Select
This retrieves a subset of tuples from a relation. This subset is made up of those tuples that meet certain criteria.
We represent the select operation with the symbol σ.
Union
This retrieves all the tuples from two relations combined together. This requires both relations to have the same number of fields, with matching data types. Any duplicate tuples following the union are removed.
We represent a union operation with the symbol ∪.
Intersect
This is similar to a Union, but only retrieves the tuples from two relations that appear in both relations.
We represent an intersect operation with the symbol ∩.
Join
This combines the attributes of two relations. So, the columns from the first table will be shown alongside those in the second. This requires the two relations to have one attribute in common.
We represent a join operation with the symbol ⋈.
A relational database stores data across multiple tables that are linked together through related fields.
Desktop databases are designed to store data locally.
Server databases are designed to be accessed over a network.
A relation is a table in a database, a tuple is a row in a database table, and an attribute is an individual data item in a record.
The domain is all the possible allowable values for an attribute, and cardinality is how unique an attribute is in terms of its data values.
A select operation retrieves a subset of tuples from a relation, and a union operation retrieves all the tuples from two relations.
An intersect operation retrieves the tuples from two relations that appear in both relations, and a join operation combines the attributes from two relations.