Introduction

Database Normalization is a technique of organizing the data in the database. Normalization is a systematic approach of decomposing tables to eliminate data redundancy(repetition) and undesirable characteristics like Insertion, Update and Deletion Anomalies. It is a multi-step process that puts data into tabular form, removing duplicated data from the relation tables. Normalization is used for mainly two purposes,

  • Eliminating redundant(useless) data.
  • Ensuring data dependencies make sense i.e data is logically stored.

Normalization of a Database is achieved by following a set of rules called ‘forms’ in creating the database

  • First Normal Form (1NF)
  • Second Normal Form (2NF)
  • Third Normal Form (3NF)
  • Boyce-Codd Normal Form (BCNF)

First Normal Form

The first normal form expects you to follow a few simple rules while designing your database, and they are:

  • Each column contains atomic values, and there are not repeating groups of columns.
    Each column of your table should be single valued which means they should not contain multiple values.
  • Attribute Domain should not change
    In each column the values stored must be of the same kind or type.
  • Unique name for Attributes/Columns
    This rule expects that each column in a table should have a unique name. This is to avoid confusion at the time of retrieving data or performing any other operation on the stored data.
  • Order doesn’t matters
    This rule says that the order in which you store the data in your table doesn’t matter.
  • No rows are duplicate.

Second Normal Form

For a table to be in the Second Normal Form, it must satisfy two conditions:

  • The table should be in the First Normal Form.
  • There should be no Partial Dependency i.e. All the non-key columns are dependent on the table’s primary key.
  • All Non-key attributes are fully functionally dependent on primary key. If primary is is not composite key then all non key attributes are fully functionally dependent on primary key.

The primary key provides a means to uniquely identify each row in a table. When we talk about columns depending on the primary key, we mean, that in order to find a particular value, you would first have to know the primary key. Partial Dependency exists, when for a composite primary key, any attribute in the table depends only on a part of the primary key and not on the complete primary key.

Third Normal Form

A table is in third normal form if:

  • A table is in 2nd normal form.
  • It contains only columns that are non-transitively dependent on the primary key

When something is transitive, then if something applies from the beginning to the end, it also applies from the middle to the end. When Table 1 is functionally dependent on Table 2, and Table 2 is functionally dependent on Table 3 then Table 3 is transitively dependent on Table 1 via Table 2.

An object has a dependence on another object when it relies upon it.  In the case of databases, when we say that a column has a dependence on another column, we mean that the value can be derived from the other.  For example, my age is dependent on my birthday.

Putting the two words together to formulate a meaning for transitive dependence, it is simplest to think of transitive dependence to mean a column’s value relies upon another column through a second intermediate column.

For example, in Table 1 Employee No determines the Salary Slip No, and Salary Slip no Determines Employee name. Therefore Employee No determines Employee Name.We have transitive functional dependency so that this structure not satisfying Third Normal Form. For that split tables into two tables as shown in Table 2.

# Table 1
Emp No	SS No	EmpName	Salary
1	       0001	       Amit	                50000
2	       0002	       Divya	        40000
3	       0003	      Rama	        57000
 
# Table 2
Employee table:
EmpNo	SS No	EmpName
1	        0001	Amit
2	        0002	Divya
3	        0003	Rama

Salary Table:
SS No	Salary
0001	50000
0002	40000
0003	57000

Reference

Database Normalization