An introduction to structured data modelling

This is an introductory chapter of my upcoming book ‘Data Badass’ (pictured below):

Data modelling is all about designing the way that your data is going to be organized. Think about it like building a house; you wouldn’t start laying bricks, without a plan. How would you know that the end result would meet your needs? How would you ensure it wouldn’t fall apart? You plan accordingly to ensure these things.

Similarly, when we are planning our data structures, we start to design them, in a process we call data modelling.

There are some key terms you should know. Firstly, an entity is the thing we want to store data about. Examples of entities are Products, Customers and Orders – they are something in the business environment that we wish to store data about. We can map entities to tables when we think about a database – the customers table is the implementation of the customer entity.

An attribute is a piece of information about the entity. For example, the Customer entity may have the attributes: Name, Age, Address, etc.. An attribute is a column in a database table.

Sometimes, entities are linked. For example, the customers entity and the orders entity are likely linked together – because a customer places an order. We call this a relationship; we can have three types of relationship:

  • One to one: 
    • One customer can have one order
    • One order can have one customer
  • One to many: 
    • One customer can have many orders
    • One order can have one customer
  • Many to many: 
    • One customer can have many orders
    • One order can have many customers

In the above example, we would have a one to many relationship. Because a customer can create many orders, but an order can only be placed by one customer. We would draw this relationship as below; where the forked end to the line depicts the many end of the relationship.

Another example would be about a student and online classes. A student can register for many classes and a class can include many students. Hence, this is a many to many relationship. It would be depicted as below.

Conceptual Data Models

Above, we have a very simple conceptual data model. It doesn’t provide granular details of the attributes that reside within an entity; it simply gives a high level view of the relationships we expect to have between them. In the below, I have outlined a bit more of a complex example showing the kind of data we collect about our online learning platform.

  • A course can have one instructor; but an instructor can have many courses. Hence, this is a one to many relationship.
  • One course can have many reviews, but a review can only be associated to one course. This also is a one to many relationship.
  • One student can leave many reviews but one review can only be left by one student. So again, we have a one to many relationship.
  • One course can have many students and one student can be enrolled on many courses. In this instance, we have a many to many relationship.
  • One instructor can have one bank account and one bank account can only be associated with one instructor. We now have a one to one relationship.
  • A student can add many courses to their wishlist, but that wishlist is associated to only one student. We therefore have a one to many relationship.
  • Finally, one student can have many payment methods for their courses, but each payment method will only be associated to one student. This is a one to many relationship.

These diagrams, while high level, start the thought process around how we should structure our data sources. Without these diagrams, there would be much more trial and error associated with the design and implementation of your schemas.

Logical data models

A logical data model is the next level of detail. It provides the same structural overview, but it also provides information around the attributes and the primary/foreign keys that will be used to make the relationship between the entities.

In each of the boxes above, I have identified the primary key and the foreign key for the table. We can now identify from the diagram exactly how the entities link to one another and which fields they join on.

Physical Data Model

A physical data model is yet a further extension to the modelling we have completed above. Here, we start to provide all of the fields in each of the tables along with their properties, required for the physical implementation. For example, field types, lengths and whether it can be nullable.

This diagram should now give enough information to make the implementation of your model.