In this article, I am going to discuss Database Normalization in SQL Server with real-time examples.
What is Normalization?
Database normalization is the process of reorganizing data in a relational database in accordance with the series of so-called normal forms in order to reduce the amount of redundant data and also, at the same time, to improve upon the data integrity. Normalization is a process to organize the data into database tables. To make a good database design, you have to follow Normalization practices. Without normalization, a database system might be slow, inefficient and might not produce the expected result. Normalization reduces data redundancy and inconsistent data dependency. The two principles that govern the process of Database Normalization are:
- There is no redundant data available (all the data is stored only at one place)
- Data dependencies should be logical (all related data should be stored together)
The database is a centralized place to store the data and it should not accept duplicate, bad, and redundant data to store in it. So, that the end-user can trust the data i.e. the Data Integrity should be there.
First Normal Form (1NF):
- The First Normal Form is achieved when each of the table cell contains only a single value, and also, each record needs to be unique.
- The first point seems to be clear enough that there can’t be more than one value assigned for each column in a database table.
- The second point is achieved by the usage of Primary Key (A Primary key is the single column of a table that uniquely identifies a database record.)
- A point to note here is that the Primary Key can be composed of more than one key. A Composite Key is the set of columns that can be used in conjunction with each other to uniquely identify a database record.
These definitions would help in understanding the concepts of a table, database and etc. Please take a look at these if you feel are required:
- A Table can be defined as a set of data elements using database rows and columns
- A Table cell can be defined as the value when a specific table row and column intersect
- A Table record / Row is a specific set of values for a set of columns that define a table
- A Column is defined as a specific element of a table and a combination of such columns constitute a table
Second Normal Form (2NF)
A database table is said to be in 2NF if it is in 1NF and contains only those fields/columns that are functionally dependent(means the value of the field is determined by the value of another field(s)) on the primary key. In 2NF we remove the partial dependencies of any non-key field.The process of converting the database table into 2NF is as follows:
- Remove the partial dependencies(A type of functional dependency where a field is only functionally dependent on the part of primary key) of any non-key field.
- If field B depends on field A and vice versa. Also for a given value of B, we have only one possible value of A and vice versa, Then we put the field B into new database table where B will be the primary key and also marked as a foreign key in a parent table.
Third Normal Form (3NF):
- A table can be taken to the 3rd Normal Form only if it is fully compliant with the rules laid for being called 2nd NF.
- The next rule that comes into picture is that there should be no transitive functional dependencies.
- A transitive functional dependency is that scenario when changing a non-key column may affect in changing any other non-key column values.
- Ideally, a table cannot go further than this level in the process of normalization, but for the sake of knowing things, we will go through the others as well.
BCNF (Boyce-Codd Normal Form)
Even when a database is in 3rd Normal Form, still there would be anomalies resulted if it has more than one Candidate Key.
Sometimes is BCNF is also referred as 3.5 Normal Form.
4NF (Fourth Normal Form)
If no database table instance contains two or more, independent and multivalued data describing the relevant entity, then it is in 4th Normal Form.
Fifth Normal Form (5NF):
- Firstly, a table can be in the 5th NF only if it is compliant with all the rules laid for it to be called 4th NF.
- A database table should no longer be in a state to be decomposed to further tables without losing any data, only then it will be called to be in the 5th NF form.
In order to achieve this, the first and most important thing is how you design your database? Below are the some points which we need to know before creating database tables.
- The first database design mistake the developers are doing is they don’t put the proper primary key and unique keys or candidate keys.
- The second design mistake people do is, they are adding multiple values into a single column. For example, let say a customer buys multiple products, then what people are doing is, they are adding all products in a single column separated them either by comma or pipe symbol.
- Repeating group problems means we are creating columns that are exactly the same. In order to solve the above repeating group problem what we need to do is we need to create another table called Product i.e. to resolve this problem we need to split the table. So, what we are going to do is, we are going to create two tables, one table is going to hold Customer data and the other table is going to hold product information.
- The fourth design problem that lots of developers do is, they make non-key columns (columns without primary key) in the tables which are not dependent on the primary key column.
So if we take care of the above problems before creating the database tables then we are able to create a proper and efficient database. And this is called the Normalization of Database.