DinS Written on 2018/5/3

1. Some basic knowledge about database

When we talk about database nowadays, we usually refer to relational database. A plain definition about relational database is that it models data by storing rows and columns in tables. In fact there’re two sets of terminology. One is called table, row and column/field, which is used by most programmers. The other set is called relation, tuple and attribute, which has more math-flavor in it and points to the essence of relational database. I’m a programmer so I’ll use the first set here.

Database is a complex field. As users we need an easy way to talk to it. The industry standard is called SQL, structured query language. The main operations of SQL are CRUD: create, read, update and delete. I suppose the readers know the basic knowledge of SQL. If not, you can easily find it on the web. SQL is pretty clear and easy-to-write. Normally learning SQL won’t take you a day.

In practice there’re two types of databases. One is client-server database. Each time you manipulate the database you send your request through the net to the servers and get feedback. Famous providers include Oracle, MySQL and SQLServer. The other type is embedded database. Your database is a file on disk. Each time you manipulate the database you’re doing some I/O on disk. A famous one is SQLite.

2. Database design: logical model

Why do we need to design database? If your program is well-structured, you’ll save yourself much pain in the future maintenance. The same goes for database.

Database design is not an easy task, same for program design. It is an art form that structures multiple tables in an easy-to-read model. Notice the word “art” here. Whenever this word appears, it means this task needs experience and insight. However, there’re certain ways to help you design a good database, or at least provide you some perspectives on how to design a good database.

A word in advance: although database design and object-oriented approach are totally different realms, they do share certain subtle similarity, much to my surprise and please. We’ll return to this discussion once we finish this part.

The first thing to do is writing a list of all the concepts in your project. For example, we want to make an application that organizes the tracks of artists. In this particular project, we can get the following list of concepts: tracks, album, artist, genre, rating, length, count of plays. These concepts should be enough for our simple app. Notice that locating concepts is not an easy task. Sometimes it takes years to extract concepts from real world problems. That’s why database design needs experience. Let’s just take that you’re already a field experts and can do this step well. If not, start to study soon.

The next step is to ask yourself a question for each piece of concepts: is this concept an object or an attribute of an object? The aim of this step is to get all objects and their related attributes for the app. The object will stand for a table, and attributes will be columns. There’ll usually be many concepts. the rule of thumb is to start from the central concept of the app.

We’re doing a track-managing app, thus track should be the first place to start. It’s an object, no doubt. Once we got that, let’s scan through other concepts and find that rating, length and count of plays are attributes of track, clear. Our first track table will be something like this.

Track must have a title. We’ll use title here. The table is called track.

Let’s continue. Album is not an attribute of track. Rather an album has many tracks. In other words, album is an object. An album has many tracks. Notice here that I’m not saying track is an attribute of album. Let’s put that later.

The same idea applies to artist. Artist is an object.

The hard point in this example is what genre is? At first genre seems to be an attribute of tracks, but if you look deeper it’s not that obvious. What if users want to select all tracks under certain genre? Surely you can traverse all tracks and check each genre. That, however, is not what database is supposed to work, too much time. The basic rule is: don’t put the same string in table twice; use a relation instead. Genre is an independent concept and has something to do with tracks. Let’s make it a table.

One way to help you is to argue with if-statement. If genre is an attribute of tracks, it means that if genre changes the track changes too. Is this what we want? Certainly no. Then genre is an object. In fact for some problem there’s no conclusion. As long as you can persuade others, it’s fine.

Now that we’ve identified objects and attributes, the last step is to define the relationships between objects. In relational database, there’re mainly two types of relationships: one-to-many and many-to-many. To define relationships between objects basically means that we need to decide whether a table links to another via one-to-many or many-to-many. The trick here is to ask yourself: Does A has many B or B has many A?

An album has many tracks, which is the same as saying track belongs to album. For this kind of relationship, album is one and track is many.

What about artist and album? To make matters simple, let’s now assume an album has only one artist. This means that artist is one and album is many. We’ll revisit the assumption later.

What about genre? Under certain genre there’re many tracks. Genre is one and track is many.

When we draw all the relationships out, we’ll get the logical model.

Let’s discuss what we’ve left before: DD (database design) and OO (object oriented).

First about the subtle similarity part. If you look back to what we’ve done to get the logical model, it’s very similar to OO, where we also analyze objects and attributes, and come up with UML. Putting into analogy, the table is the class, column is the attribute, and each row is an instance of the class. Behold that we can write a class to represent the track table.

| class Track

| {

| Private:

| String Title;

| Int Rating;

| Double Length;

| Int CountOfPlays;

| };

But don’t be led astray. DD and OO are totally different ways of thinking. The similarity ends here. The most obvious difference is the relationship between tables and classes. In OO classes are inherited to form a giant web, while in DD tables are linked by one-to-many or many-to-many to form a giant web. If you really want to push through, one-to-many may look like “has-a” in OO. T table contains another table. A class contains another class. As for many-to-many, though we haven’t covered yet, there’s no corresponding concept in OO.

I bring up this subject not to confuse you but aid you. Remember we’ve done three steps to come up with a logical model. First get all concepts; second get all objects and attributes; third determine relationships between object. In fact during the first and second step, OO experience can help you in DD. They’re similar because they all aim to analyze the world and organize it in structured ways. They’re not totally split.

However, the aim of OO and DD are opposite, so to speak. OO aims to organize codes in flexible ways so that future maintenance won’t change previous codes. DD aims to organize data in a way that allow rapid access of data.

So much discussion now. Let’s move next.

3. Database design: physical model

After we got the logical model, we need to turn it into physical model. If the logical model is beautiful, this step will lead to no further problem.

Some terminology first:

Primary key(PK) : a unique number for each row, usually 1,2,3,4… and stores in a column called id.

Foreign key(FK) : a column that stores the primary key of other table. Functions like a pointer.

Logical key(LK) : a column that provides easy keywords for query, usually appears in WHERE clause.

Let’s see an example how to turn logical model to physical model.

This is part of the logical model we got. Now all we have to do is adding PK to each table and project the “belong to” relation to a FK.

The physical model will look like this:

It’s a convention to use tablename_id to name the column of FK.

By the same method we can add artist and genre table to it. Here’s the outcome:

When we’ve come this far, we’re almost finished. The last thing to do is implementation. In fact using a database tool will be easier than wring code to do it.

Now when we want to inset data, we need to designate the FK. The question remains is how to get data from many tables linked by FK?We can use JOIN … ON …, a powerful SQL command.

For example, if we want to see track.title-album.title-artist.name, we can write this SQL:

SELECT Track.title, Album.title, Artist.name FROM Track JOIN Album JOIN Artist ON Track.Album_id=Album.id AND Album.Artist_id=Artist.id

It may seem complicated at first glance, but naming convention saves the day.

4. Many-to-many relationships

I’ve been avoiding this topic. This is the hard part. Once you got the hang of one-to-many, let’s look at this problem.

I said that “to make matters simple, let’s now assume an album has only one artist”. This is not the fact however. an artist has many albums for sure, and sometimes an album has many artists, say a collection of different singers. This suggests that the relationship between album and artist is many-to-many. If we put that into logical model, it look like this.

The problem is we can’t add FK of each other in tables. We need to do something else.

The answer is called junction table. Junction table has only FK. Its aim is to establish the connection between tables.

The physical model like be like this:

Of course we can add extra information in junction table to describe the attributes of relations. We’ll save it here. How to get the data using junction table? We can still use JOIN … ON …

SELECT Artist.name, Album.title FROM Artist JOIN A-A JOIN Album ON A-A.Artist_id=Artist.id AND A-A.Album_id=Album.id

Same as we do one-to-many relationships.

Well, how to understand it?

Many-to-many is one way to describe relationship. The meaning of junction table is to describe how to describe relationship. By providing a description of describing relationship we can decompose complex relations and reconstruct them to our need.