Loading presentation...

Present Remotely

Send the link below via email or IM


Present to your audience

Start remote presentation

  • Invited audience members will follow you as you navigate and present
  • People invited to a presentation do not need a Prezi account
  • This link expires 10 minutes after you close the presentation
  • A maximum of 30 users can follow your presentation
  • Learn more about this feature in our knowledge base article

Do you really want to delete this prezi?

Neither you, nor the coeditors you shared it with will be able to recover it again.



Database Design I - 1DL300; Normalization

Sobhan Badiozamany

on 31 January 2013

Comments (0)

Please log in to add your comment.

Report abuse

Transcript of db1-vt13-Normalization

What is a bad relational database design? Summary What is relational database design?
The grouping of attributes to form the relation schema. Normal forms: DATABASE DESIGN I - 1DL300 Spring 2013
Sobhan badiozamany
Department of Information Technology, Uppsala University Normalization Elmasri/Navathe ch 14
Padron-McCarthy/Risch ch 11 How "grouping of attributes" affects the quality of the design? Informal design guidelines for relational databases Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation. In addition, redundant data causes logical design problems: Modification anomalies Insertion anomalies Deletion anomalies Update Anomaly: Assume we want to change the name of project number 30 [Pnumber=30] from “Newbenefits” to “software_prj”. This causes updating all records whose employees work on the project. Inserting an employee that is not assigned to project is not possible.
Defining a project to which no employee is assigned yet is not possible. If you delete a project, all employees working in that project, who don't work in any other project, will be deleted from the database.
If only one employee works on a given project, deleting her would delete the project from the database. Here is the Delete Anomaly: Insert Anomaly: A bad design causes data redundancy Storage space is wasted Let's apply the informal guidelines! 1-Separate attributes of different entities. 4-Eliminate the need for NULL values. 3-Eliminate insertion, update and deletion anomalities. 2-Use foreign keys to relate entities. What is a good design here?
and Why? The first step towards formal specification of a "good" relational database design First normal form - 1NF Second normal form - 2NF Third normal form - 3NF Boyce-Codd normal form - BCNF Functional Dependencies formally specify how good a relational design is.
are constraints deduced from the mini-world concepts That describe the relationships between attributes. A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y. X -> Y holds if whenever two tuples have the same value for X, they must have the same value for Y, in other words:

For any two tuples t1 and t2 in any relation instance r(R): If t1[X]=t2[X], then t1[Y]=t2[Y]

X -> Y in R specifies a constraint on all relation instances r(R)

Written as X -> Y; can be displayed graphically on a relation schema as in Figures. ( denoted by the arrow: ).

FDs are derived from the real-world constraints on the attributes Examples in EMP_PROJ relation:
SSN --> Ename
Pnumber --> Pname
{SSN,Pnumber} --> Hours Candidate keys, by definition, functionally determine all other attributes of the relation. Armstrong's axioms X, Y, Z and W are sets of attributes Full Functional Dependency (FFD) Prime attribute Definition: an attribute that is a member in any of the candidate keys is called a prime attribute of R. Attributes that is not part of any candidate key is called non-prime or non-key attributes. Prime attributes:
Pnumber non-prime attributes:
Pname Normalization Breaking up tables to smaller ones to improve relational database design Are quality metrics:
Set of Criteria expressed in terms of FFDs/keys Conditions get more strict as the level of normalization increases Following are not allowed:
composite attributes
multivalued attributes
nested relations; attributes whose values for an individual tuple are non-atomic Only atomic values are allowed as attribute values in the relational model. A relation schema R is in 2NF if:
It is in 1NF
No partial dependencies are allowed, that is, a part of a candidate key shall not determine any none-prime attribute in R. Here SSN-->Ename reveals a partial dependency, which violates 2NF. A relation schema R is in 3NF if:
It is in 2NF
No non-prime attribute A in R is allowed to be FFD of any other non-prime attribute.
That is,Transitive (indirect) functional dependencies shall not exist:
if X -> Y and Y -> Z hold, X -> Z also holds (transitivity)
Z is indirectly determined by X
[here Y and Z are none-prime attributes, and X is a candidate key] Here we have SSN->Dnumber and Dnumber ->{Dname,DMGR_ssn}
So there is a transitional functional dependency:
which violates 3NF criteria. A relation schema R is in BCNF if:
It is in 1NF
Every determinant X is a candidate key.
The difference between BCNF and 3NF is that in BCNF, a prime attribute can not be FFD of a non-key (or non-prime) attribute or of a prime attribute (i.e. a partial key). Therefore BCNF is a more strict condition. • BCNF is a strong condition. It is not always possible to transform (through
decomposition) a schema to BCNF and keep the dependencies.
• 3NF has the most of BCNF's advantages and can still be fulfilled without giving up dependencies.
• In most practical cases, when a relation is in 3NF it is also in BCNF. "Bad" Relational database design "Good" Relational database design 1 NF ................. Only atomic attribute values
2 NF ................. No partial dependencies allowed
3 NF ................. No transitive FDs allowed
BCNF ................. Left hand sides of FFDs should be candidate keys It is written as X-->Y More formally Functional dependency Note the compound key: {SSN,Pnumber} Note the compound key: {SSN,Pnumber} Relations should be designed such that their tuples will have as few NULL values as possible Design a schema that does not suffer from the insertion, deletion and update anomalies. Design a schema that can be explained easily relation by relation. Foreign keys should be used to refer to other entities. {SNAME,INAME} -->{PRICE} . . . . . . . . . . . . . . (FD/FFD?)
{SNAME,INAME} -->{SADDR} . . . . . . . . . . . . . (FD/FFD?)
{SNAME} -->{SADDR} . . . . . . . . . . . . . . . . . . . (FD/FFD?) FFD FD FFD How should we normalize this? Other explanations? why this table has to be broken like this? Functional dependencies (FDs): What is the other partial dependency? By normalizing relations, we avoid modification, insertion and deletion anomalies. BCNF <- 3NF <- 2NF <- 1NF Least strict Most strict Ideally, no FDs should be lost during normalization
Full transcript