This is the Data models course theme. In this section we introduce concepts for modelling data and language for communicating those models.
[Complete set of notes PDF 870Kb].
Databases are pervasive in modern society. So many of our actions and attributes are logged and stored in organised information repositories, or Databases
[Section notes PDF 227Kb].
These are all examples of relatively simple databases. All of the information is textual or referential.
Databases don't have to store just text. Increasingly Database servers are storing, indexing and delivering rich-media content, explicitly images, audio and video.
Microsoft's next generation File storage system (WinFS) is a relational database. From a user perspective, searching (the process of indexing content by keyword) is already mainstream. Users are moving away from rigid directory structures (files and folders) and towards keyword-tagged content.
We've seen that databases are used in a variety of contexts. Those roles imply properties of each of the systems. An interlinked text-only database (such as Unix/Linux's MAN pages) will require much less storage than a video archive.
Some databases are perceptually more complex. Ford's staff management model would be represented as a matrix (in this case 2 dimensional). Computers are very good at organising multi-dimensional space.
A single definition of a database is hard to come by. Dictionary.com defines a database as: a comprehensive collection of related data organized for convenient access, generally in a computer. The Wikipedia definition runs for several pages.
In some of your previous lab assignments, or practical experience, you may have been faced with the problem of caching information persistently in a file, later to be reloaded.
When writing the data into the file, we are storing more than just that information. We are storing implicitly a design/grammar for that data. That implicit design is evident when accessing the file with a naive interface. If you try to read the data out in a different order, it fails.
A better solution is to split the way the information is stored from the actually information stored.
When considering the database systems as a whole, we need to look at all the components, including elements that interact with the DBMS (users, whom we categorise for simplicity).
This course will contain a discussion of the components that make up the system and the way they interact (system architecture).
A data model is an invention. It is a construct that allows us to share an understanding of how the system works. As with all good constructs, it's an abstraction; a simplification; a story.
In this course we're going to look at the Relational model, where the database is organised into tables (relationals) and each row (tuple) within that relation is coded (keyed) to allow referencing between the relationals.
The Relational model, inspite of being innovated in the 1970s is still the most popular, underpining mainstream modern databases such as Oracle 10i and MySQL 5.0
As programming languages are becoming increasingly Object orientated, programmers require a means of persistently storing their Objects. Object Orientated Databases (OODBs) exist to fulfil this purpose. OODBs may ultimately replace relational databases, but it's not clear at this stage when.
MySQL is a highly general database system, in that it supports many different designs. My mobile phone address book is a highly specific database system and as such is not easily extensible.
Earlier I made mention of this problem. Databases tie into the wider Software Engineering field. Within Software Engineering, post-development issues of code re-use, maintenance, future evolution etc. necessitate a logical flexible approach to program design. Databases are such an approach. In order to store information in a database you invest a small amount of time in explicitly structuring it, however you then get things like flexibility (data independence etc.) for free.
Here's a summary of what we need from a DBMS
...and this is what you get for free. These are the consequences, largely positive, of adopting a database approach to an information storage problem.
What follows here is an introduction to the terms which make up the language that we use to describe data models.
A data model is a model that describes in an abstract way how data are represented in a business organization, an information system or a database management system - Wikipedia
[Section notes PDF 310Kb].
The concept of a relation is abstract, therefore we have a number of different ways of visualising it.
All the operations described in the next few sections are abstract. We're going to see how valuable they can be in processing real world data later.
At this stage we're talking about set theoretical operators on the Relational model, not SQL instructions which confusingly have identical names and only similar behaviour.
The result set of the operation is itself a relational. That output relation will contain the same number of rows as the input, however it may contain a different number of columns; fewer if a subset of attributes is projected; more if derived or aggregated attributes are included.
In this lecture we look at...
[Section notes PDF 233Kb].
One number identifies a single tuple in one relation (local), one number identifies a single tuple in another relation (foreign).
int a=0; int b=0; a = &b;
In databases, typically done with unique identifiers (IDs) rather than memory addresses.
typedef struct car { int ID; char[] make; char[] model; char[] derivative; int optionID; } car; typedef struct option { int ID; char[] name; int price; } option; car c; option o; //...data structure populating c.optionID = o.ID;
In this lecture we look at...
[Section notes PDF 75Kb].
In this lecture we look at...
[Section notes PDF 86Kb].