Introduction
Database Normalization is a technique of organizing the data in the database. Normalization is a systematic approach of decomposing tables to eliminate data redundancy(repetition) and undesirable characteristics like Insertion, Update and Deletion Anomalies. It is a multi-step process that puts data into tabular form, removing duplicated data from the relation tables. Normalization is used for mainly two purposes,
- Eliminating redundant(useless) data.
- Ensuring data dependencies make sense i.e data is logically stored.
Normalization of a Database is achieved by following a set of rules called ‘forms’ in creating the database
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce-Codd Normal Form (BCNF)
First Normal Form
The first normal form expects you to follow a few simple rules while designing your database, and they are:
- Each column contains atomic values, and there are not repeating groups of columns.
Each column of your table should be single valued which means they should not contain multiple values. - Attribute Domain should not change
In each column the values stored must be of the same kind or type. - Unique name for Attributes/Columns
This rule expects that each column in a table should have a unique name. This is to avoid confusion at the time of retrieving data or performing any other operation on the stored data. - Order doesn’t matters
This rule says that the order in which you store the data in your table doesn’t matter. - No rows are duplicate.
Second Normal Form
For a table to be in the Second Normal Form, it must satisfy two conditions:
- The table should be in the First Normal Form.
- There should be no Partial Dependency i.e. All the non-key columns are dependent on the table’s primary key.
- All Non-key attributes are fully functionally dependent on primary key. If primary is is not composite key then all non key attributes are fully functionally dependent on primary key.
The primary key provides a means to uniquely identify each row in a table. When we talk about columns depending on the primary key, we mean, that in order to find a particular value, you would first have to know the primary key. Partial Dependency exists, when for a composite primary key, any attribute in the table depends only on a part of the primary key and not on the complete primary key.
Third Normal Form
A table is in third normal form if:
- A table is in 2nd normal form.
- It contains only columns that are non-transitively dependent on the primary key
When something is transitive, then if something applies from the beginning to the end, it also applies from the middle to the end. When Table 1 is functionally dependent on Table 2, and Table 2 is functionally dependent on Table 3 then Table 3 is transitively dependent on Table 1 via Table 2.
An object has a dependence on another object when it relies upon it. In the case of databases, when we say that a column has a dependence on another column, we mean that the value can be derived from the other. For example, my age is dependent on my birthday.
Putting the two words together to formulate a meaning for transitive dependence, it is simplest to think of transitive dependence to mean a column’s value relies upon another column through a second intermediate column.
For example, in Table 1 Employee No determines the Salary Slip No, and Salary Slip no Determines Employee name. Therefore Employee No determines Employee Name.We have transitive functional dependency so that this structure not satisfying Third Normal Form. For that split tables into two tables as shown in Table 2.
# Table 1 Emp No SS No EmpName Salary 1 0001 Amit 50000 2 0002 Divya 40000 3 0003 Rama 57000 # Table 2 Employee table: EmpNo SS No EmpName 1 0001 Amit 2 0002 Divya 3 0003 Rama Salary Table: SS No Salary 0001 50000 0002 40000 0003 57000