First normal form

Property of a relation in a relational database From Wikipedia, the free encyclopedia

First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form when no attribute domain has relations as elements.[1] Or in SQL terminology, when no table column can have tables as values. The purpose of 1NF is to make the data model more flexible, to simplify the relational algebra and query language, and to enable further normalization levels which reduce redundancy and anomalies.[2] Standard SQL does not support creating or using table-valued columns, which means most relational databases will be in first normal form by necessity. Normalization to 1NF is the process of eliminating nested data records and instead representing relationships between relations through foreign keys[3]. This in contrast to hierarchical databases and document databases where complex data structures are represented as nested records. Such database systems are sometimes called NoSQL systems.

Overview

In a hierarchical database, a record can contain sets of child records ― known as repeating groups or table-valued attributes. If such a data model is represented as relations, a repeating group would be an attribute where the value is itself a relation. First normal form eliminates nested relations by turning them into separate "top-level" relations associated with the parent row through foreign keys rather than through direct containment.[4]

The purpose of this normalization is to increase flexibility and data independence, and to simplify the data language.[5] It also opens the door to further normalization, which eliminates redundancy and anomalies.[6]

Most relational database management systems do not support nested records, so tables are in first normal form by default. In particular, SQL does not have any facilities for creating or exploiting nested tables. Normalization to first normal form would therefore be a necessary step when moving data from a hierarchical database to a relational database.

Definition

A relation is in first normal form when no attribute domain has relations as elements.[7]

In a relation, each attribute has a domain which is the set of possible values. (For example the set of integers in a given range.) But a domain can be any set, hence it might also contain relations as elements.

A tuple contains exactly one element from the attribute domain per attribute, but if the domain contains relations, this value would be a relation which in turn may contain multiple tuples and attributes.

A domain which contains relations is called a nonsimple domain, while a domain which does not contain relations is called a simple domain. The term repeating group is analogous to a nonsimple domain.[8] Normalization to 1NF is thus a process of eliminating nonsimple domains from all relations.

E. F. Codd uses the terms atomic and nondecomposable for elements of simple domains[9]. Thus an atomic value is any value which is not a relation. Such values cannot be decomposed using just relational algebra operators like selection or projection.

Rationale

The rationale for normalizing to 1NF:[10]

  • Allows presenting, storing and interchanging relational data in the form of regular two-dimensional arrays. Supporting nested relations would require more complex data structures.
  • Simplifies the data language, since any data item can be identified just by relation name, attribute name and key. Supporting nested relations would require a more complex language with support for hierarchical data paths in order to address nested data items.
  • Representing relationships using foreign keys is more flexible, where a hierarchical model only can represent one-to many relationships.
  • Since locating data items is not directly coupled to the parent-child hierarchy, the database is more resilient to structural changes over time.
  • Makes further normalization levels possible which eliminate data redundancy and anomalies.

Drawbacks and criticism

  • Performance worsens for certain operations. In a hierarchical model, nested records are physically stored after the parent record, which means a whole sub-tree can be retrieved in a single read operation. In a 1NF form, it will require a join operation per record type, which can be costly, especially for complex trees. For this reason document databases eschew 1NF.
  • Object-oriented languages represent runtime state as trees or directed graphs of objects connected by pointers or references. This does not map cleanly to a 1NF relational database, a problem sometimes called the object-relational impedance mismatch and which object relational mapper (ORM) libraries try to bridge.
  • 1NF has been interpreted as not allowing complex data types for values. This is open to interpretation though, and C. J. Date has argued that values can be arbitrarily complex objects.[citation needed]

History

First normal form was introduced in 1970 by Edgar F. Codd in the paper A Relational Model of Data for Large Shared Data Banks, although it was initially just called "Normal Form". It was renamed to "First Normal Form" when additional normal forms were introduced in the paper Further Normalization of the Relational Model in 1971.[11]

Examples

Summarize
Perspective

The following scenarios first illustrate how a database design might violate first normal form, followed by examples that comply.

Designs that violate 1NF

This table over customers' credit card transactions does not conform to first normal form:

More information Customer, Customer ID ...
CustomerCustomer IDTransactions
Abraham1
Transaction IDDateAmount
12890 2003-10-14 87
12904 2003-10-15 50
Isaac2
Transaction IDDateAmount
12898 2003-10-14 21
Jacob3
Transaction IDDateAmount
12907 2003-10-15 18
14920 2003-11-20 70
15003 2003-11-27 60
Close

To each customer corresponds a 'repeating group' of transactions. Such a design can be represented in a hierarchical database but not a SQL database, since SQL does not support nested tables.

The automated evaluation of any query relating to customers' transactions would broadly involve two stages:

  1. Unpacking one or more customers' groups of transactions allowing the individual transactions in a group to be examined, and
  2. Deriving a query result based on the results of the first stage

For example, in order to find out the monetary sum of all transactions that occurred in October 2003 for all customers, the system would have to know that it must first unpack the Transactions group of each customer, then sum the Amounts of all transactions thus obtained where the Date of the transaction falls in October 2003.

One of Codd's important insights was that structural complexity can be reduced. Reduced structural complexity gives users, applications, and DBMSs more power and flexibility to formulate and evaluate the queries. A more normalized equivalent of the structure above might look like this:

Designs that comply with 1NF

To bring the model into the first normal form, we can perform normalization. Normalization (to first normal form) is a process where attributes with non-simple domains are extracted to separate stand-alone relations. The extracted relations are amended with foreign keys referring to the primary key of the relation which contained it. The process can be applied recursively to non-simple domains nested in multiple levels.[12]

In this example, Customer ID is the primary key of the containing relations and will therefore be appended as foreign key to the new relation:

More information Customer, Customer ID ...
CustomerCustomer ID
Abraham1
Isaac2
Jacob3
Close
More information Customer ID, Transaction ID ...
Customer IDTransaction IDDateAmount
1128902003-10-1487
1129042003-10-1550
2128982003-10-1421
3129072003-10-1518
3149202003-11-2070
3150032003-11-2760
Close

In the modified structure, the primary key is {Customer ID} in the first relation, and {Customer ID, Transaction ID} in the second relation.

Now each row represents an individual credit card transaction, and the DBMS can obtain the answer of interest, simply by finding all rows with a Date falling in October, and summing their Amounts. The data structure places all of the values on an equal footing, exposing each to the DBMS directly, so each can potentially participate directly in queries; whereas in the previous situation some values were embedded in lower-level structures that had to be handled specially. Accordingly, the normalized design lends itself to general-purpose query processing, whereas the unnormalized design does not.

It is worth noting that this design meets the additional requirements for second and third normal form.

Atomicity

Summarize
Perspective

Edgar F. Codd's definition of 1NF makes reference to the concept of 'atomicity'. Codd states that the "values in the domains on which each relation is defined are required to be atomic with respect to the DBMS."[13] Codd defines an atomic value as one that "cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions)"[14] meaning a column should not be divided into parts with more than one kind of data in it such that what one part means to the DBMS depends on another part of the same column.

Hugh Darwen and Chris Date have suggested that Codd's concept of an "atomic value" is ambiguous, and that this ambiguity has led to widespread confusion about how 1NF should be understood.[15][16] In particular, the notion of a "value that cannot be decomposed" is problematic, as it would seem to imply that few, if any, data types are atomic:

  • A character string would seem not to be atomic, as the RDBMS typically provides operators to decompose it into substrings.
  • A fixed-point number would seem not to be atomic, as the RDBMS typically provides operators to decompose it into integer and fractional components.
  • An ISBN would seem not to be atomic, as it includes language and publisher identifier.

Date suggests that "the notion of atomicity has no absolute meaning":[17][18] a value may be considered atomic for some purposes, but may be considered an assemblage of more basic elements for other purposes. If this position is accepted, 1NF cannot be defined with reference to atomicity. Columns of any conceivable data type (from string types and numeric types to array types and table types) are then acceptable in a 1NF table—although perhaps not always desirable; for example, it may be more desirable to separate a Customer Name column into two separate columns as First Name, Surname.

1NF tables as representations of relations

Summarize
Perspective

According to Date's definition, a table is in first normal form if and only if it is "isomorphic to some relation", which means, specifically, that it satisfies the following five conditions:[19]

  1. There's no top-to-bottom ordering to the rows.
  2. There's no left-to-right ordering to the columns.
  3. There are no duplicate rows.
  4. Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
  5. All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].

Violation of any of these conditions would mean that the table is not strictly relational, and therefore that it is not in first normal form.

Examples of tables (or views) that would not meet this definition of first normal form are:

  • A table that lacks a unique key constraint. Such a table would be able to accommodate duplicate rows, in violation of condition 3.
  • A view whose definition mandates that results be returned in a particular order, so that the row-ordering is an intrinsic and meaningful aspect of the view. (Such views cannot be created using SQL that conforms to the SQL:2003 standard.) This violates condition 1. The tuples in true relations are not ordered with respect to each other.
  • A table with at least one nullable attribute. A nullable attribute would be in violation of condition 4, which requires every column to contain exactly one value from its column's domain. This aspect of condition 4 is controversial. It marks an important departure from Codd's later vision of the relational model,[20] which made explicit provision for nulls.[21] First normal form, as defined by Chris Date, permits relation-valued attributes (tables within tables). Date argues that relation-valued attributes, by means of which a column within a table can contain a table, are useful in rare cases.[22]

See also

References

Further reading

Loading related searches...

Wikiwand - on

Seamless Wikipedia browsing. On steroids.