Loading presentation...

Present Remotely

Send the link below via email or IM


Present to your audience

Start remote presentation

  • Invited audience members will follow you as you navigate and present
  • People invited to a presentation do not need a Prezi account
  • This link expires 10 minutes after you close the presentation
  • A maximum of 30 users can follow your presentation
  • Learn more about this feature in our knowledge base article

Do you really want to delete this prezi?

Neither you, nor the coeditors you shared it with will be able to recover it again.



Database Design I - 1DL300; Normalization

Sobhan Badiozamany

on 4 February 2014

Comments (0)

Please log in to add your comment.

Report abuse

Transcript of db1-vt14-Normalization

What is a bad relational database design?
What is relational database design?
The grouping of attributes to form the relation schema.
Normal forms:
DATABASE DESIGN I - 1DL300 Spring 2014
Sobhan badiozamany
Department of Information Technology, Uppsala University

Elmasri/Navathe ch 14
Padron-McCarthy/Risch ch 11

How "grouping of attributes" affects the quality of the design?
Informal design guidelines for relational databases
Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation.
In addition, redundant data causes logical design problems:
Modification anomalies
Insertion anomalies
Deletion anomalies
Update Anomaly:
Assume we want to change the name of project number 30 [Pnumber=30] from “Newbenefits” to “software_prj”.
This requires updating all rows whose employees work on the project.
Inserting an employee that is not assigned to project is not possible.
Defining a project to which no employee is assigned yet is not possible.
How can one delete a project?

How can one delete an employee?
Insert Anomaly:
A bad design causes data redundancy
Storage space is wasted
Let's apply the informal guidelines!
1-Separate attributes of different entities.
4-Eliminate the need for NULL values.
3-Eliminate insertion, update and deletion anomalities.
2-Use foreign keys to relate entities.
What is a good design here?
and Why?
The first step towards formal specification of a "good" relational database design
First normal form - 1NF
Second normal form - 2NF
Third normal form - 3NF
Boyce-Codd normal form - BCNF
Functional Dependencies
formally specify how good a relational design is.
are constraints deduced from the mini-world concepts That describe the relationships between attributes.
set of attributes X
functionally determines a
set of attributes Y
if the value of X determines a unique value for Y.
X -> Y holds if whenever two tuples have the same value for X, they must have the same value for Y, in other words:

For any two tuples t1 and t2 in any relation instance r(R): If t1[X]=t2[X], then t1[Y]=t2[Y]

X -> Y in R specifies a constraint on all relation instances r(R)

Written as X -> Y; can be displayed graphically on a relation schema as in Figures. ( denoted by the arrow: ).

FDs are derived from the real-world constraints on the attributes
Examples in EMP_PROJ relation:
SSN --> Ename
Pnumber --> Pname
{SSN,Pnumber} --> Hours
Candidate keys, by definition, functionally determine all other attributes of the relation.
Armstrong's axioms
X, Y, Z and W are sets of attributes
Full Functional Dependency (FFD)
Prime attribute
Definition: an attribute that is a member in any of the candidate keys is called a prime attribute of R.
Attributes that is not part of any candidate key is called non-prime or non-key attributes.
Prime attributes:
non-prime attributes:
Breaking up tables to smaller ones to improve relational database design
Are quality metrics:
Set of Criteria expressed in terms of FFDs/keys
The criteria get more strict as the level of normalization increases
Following are not allowed:
composite attributes
multivalued attributes
nested relations; attributes whose values for an individual tuple are non-atomic
Only atomic values are allowed as attribute values in the relational model.
A relation schema R is in 2NF if:
It is in 1NF
partial dependencies
are allowed, that is, a part of a candidate key shall not determine any none-prime attribute in R.
Here SSN-->Ename is a partial dependency, which violates 2NF.
A relation schema R is in 3NF if:
It is in 2NF
No non-prime attribute A in R is allowed to be FFD of any other non-prime attribute.
That is,Transitive (indirect) functional dependencies shall not exist:
if X -> Y and Y -> Z hold, X -> Z also holds (transitivity)
Z is indirectly determined by X
[here Y and Z are none-prime attributes, and X is a candidate key]
Here we have SSN->Dnumber and Dnumber ->{Dname,DMGR_ssn}
So there is a transitional functional dependency:
which violates 3NF criteria.
A relation schema R is in BCNF if:
It is in 1NF
Every determinant X is a candidate key.

here C-->B violates BCNF criteria
The difference between BCNF and 3NF is that in BCNF, a prime attribute can neither be FFD of a non-key (or non-prime) attribute nor of a prime attribute (i.e. a partial key). Therefore BCNF is a more strict condition.
• BCNF is a strong condition. It is not always possible to transform (through decomposition) a schema to BCNF and keep the dependencies.
• 3NF has the most of BCNF's advantages and can still be fulfilled without giving up dependencies.
• In most practical cases, when a relation is in 3NF it is also in BCNF.
"Bad" Relational database design
"Good" Relational database design
1 NF ................. Only atomic attribute values
2 NF ................. No partial dependencies allowed
3 NF ................. No transitive FDs allowed
BCNF ................. Left hand sides of FFDs should be candidate keys
It is written as X-->Y
More formally
Functional dependency
Here is the Delete Anomaly:
Note the compound key: {SSN,Pnumber}
Relations should be designed such that their tuples will have as few NULL values as possible
Design a schema that does not suffer from the insertion, deletion and update anomalies.
Design a schema that can be explained easily relation by relation.
Foreign keys should be used to refer to other entities.
{SNAME,INAME} -->{PRICE} . . . . . . . . . . . . . . (FD/FFD?)
{SNAME,INAME} -->{SADDR} . . . . . . . . . . . . . (FD/FFD?)
{SNAME} -->{SADDR} . . . . . . . . . . . . . . . . . . . (FD/FFD?)
How should we normalize this?
Other explanations? why this table has to be broken like this?
Functional dependencies (FDs):
What is the other partial dependency?
By normalizing relations, we avoid modification, insertion and deletion anomalies.
BCNF <- 3NF <- 2NF <- 1NF
Least strict
Most strict
Ideally, no FDs should be lost during normalization
By deleting a project, all employees working in that project who don't work in any other project, will be deleted from the database.
OK, employee rows need to be deleted. What could go wrong?
Come up with a scenario where deleting an employee could potentially lead to losing information.
(Supplier name, Item name, Supplier address, price)
A | B | C
Wake up! the lecture is over!
To see the animated Prezi click:
Full transcript