Dev Journal #3 - ERD and Data Normalization
Entity Relationship Diagram
Initial draft of ERD
Exercise tracking consists of the four main entities: Users
, Exercises
, Sessions
, and Logs
. Users
is the entity that will represent the user accounts and profile information. Exercises
will hold information about each type of exercise that can be tracked. Sessions
will represent each workout session. Finally, Logs
will hold information of each set performed within the workout sessions.
In order to organize each workout so that I can query and create statistics, I must normalize the information into a separate table listed above. This process of breaking down the information into granular bits is called data normalization. It is crucial to properly normalize the data to avoid redundant or inconsistent data. There are multiple levels of normalizations that can be applied to the relational database. For most cases, level three, or also known as the third normal form is considered an adequate level of normalization. In the following sections, I will give a brief explanation of how to apply the first, second and third normal forms with related examples.
First Normal Form
For the sake of creating an example, let’s say I performed
- 1 set of pull ups on an angled pull up bar to exercise my back muscles
- 1 set of pull ups on a straight pull up bar to exercise my back muscles
- 1 set of muscle up on the rings to exercise my back muscles
- 1 set of dips on a straight pull up bar to exercise my chest muscles
- 2 sets of dips on the parallel bars to exercise my c hest muscles
- 2 sets of dips on the rings to exercise my chest muscles
The equivalent piece of information translated into the database without any normalization would look like this:
exercise_info |
---|
set 1 of pull ups on angled pull up bar for back muscle |
set 2 of pull ups on straight pull up bar for back muscle |
set 1 of muscle ups on rings for back muscle |
set 1 of dips on straight pull up bars for chest muscle |
set 2 of dips on parallel bars for chest muscle |
set 3 of dips on parallel bars for chest muscle |
set 4 of dips on rings for chest muscle |
set 5 of dips on rings for chest muscle |
A relation in the first normal form (1NF) should only store atomic values to each table attributes, and each row should be uniquely identified. To achieve 1NF, I split each row into multiple columns where each column holds indivisible information. Note that I don’t actually have the column bar_type in my real application. This info was added as an afterthought to explain 3NF violation in the later section.
set_number | exercise_name | equipment | muscle_group | bar_type |
---|---|---|---|---|
1 | pull ups | pull up bar | back | angled |
2 | pull ups | pull up bar | back | straight |
1 | muscle ups | rings | back | null |
1 | dips | pull up bar | chest | straight |
2 | dips | parallel bars | chest | null |
3 | dips | parallel bars | chest | null |
4 | dips | rings | chest | null |
5 | dips | rings | chest | null |
Second Normal Form
The second normal form (2NF) aims to remove the functional dependency on anything other than the entity’s candidate key. The candidate key is the combination of attributes that uniquely identifies a row in the database. In my example, the candidate key is (set_number, exercise_name, equipment) and the functional dependency formula can be defined as:
FD = { set_number, exercise_name → equipment, muscle_group, bar_type }
This schema is not in 2NF because exercise_name → muscle_group, as in exercise_name alone can uniquely determine the value of muscle_group. Because exercise_name is a subset of the candidate key, this violates 2NF. To convert my example into 2NF, I can move the exercising muscle group information into a separate table named Exercises
. ExerciseLogs
table should now refer to this data in Exercises
table by a foreign key.
Exercises
exercise_name | muscle_group |
---|---|
pull ups | back |
muscle ups | back |
dips | chest |
ExerciseLogs
set_number | exercise_name | equipment | bar_type |
---|---|---|---|
1 | pull ups | pull up bar | angled |
2 | pull ups | pull up bar | straight |
1 | muscle ups | rings | null |
1 | dips | pull up bar | straight |
2 | dips | parallel bars | null |
3 | dips | parallel bars | null |
4 | dips | rings | null |
5 | dips | rings | null |
Third Normal Form
A relation is in third normal form (3NF) if it already satisfies 2NF requirements and no non-prime attribute is transitively dependent on the candidate keys. The newly created Exercises
table meets the 3NF requirements. The candidate key is exercise_name and { exercise_name → muscle_group } is the only functional dependency. However, ExerciseLogs
table fails 3NF due to the transitive functional dependency { bar_type → equipment } which is a dependency between a non-prime attribute to another non-prime attribute. bar_type values are non-trivially dependent on specific equipment values which in this case only applies to the pull up bar. To fix 3NF violations, I can further extract the equipment information into a separate entity with an identifiable primary key like the following:
Exercises
exercise_name | muscle_group |
---|---|
pull ups | back |
muscle ups | back |
dips | chest |
Equipments
equipment_id | equipment_type | bar_type |
---|---|---|
0 | pull up bar | angled |
1 | pull up bar | straight |
2 | parallel bars | null |
3 | rings | null |
ExerciseLogs
set_number | exercise_name | equipment_id |
---|---|---|
1 | pull ups | 0 |
2 | pull ups | 1 |
1 | muscle ups | 3 |
1 | dips | 1 |
2 | dips | 2 |
3 | dips | 2 |
4 | dips | 3 |
5 | dips | 3 |
In some cases where 3NF relation has multiple overlapping candidate keys, it may require additional normalization to satisfy the Boyce Codd Normal Form (BCNF). But as mentioned earlier, generally normalizing to 3NF is enough to avoid anomalies during an update, insert and delete operation. Higher level of normalization also comes down with performance costs because it increases the number of tables and joins. I am sure I will come across more complicated normalization problems later down the road but I think this is good enough of a review for now.