Send the link below via email or IMCopy
Present to your audienceStart remote presentation
- Invited audience members will follow you as you navigate and present
- People invited to a presentation do not need a Prezi account
- This link expires 10 minutes after you close the presentation
- A maximum of 30 users can follow your presentation
- Learn more about this feature in our knowledge base article
Do you really want to delete this prezi?
Neither you, nor the coeditors you shared it with will be able to recover it again.
Make your likes visible on Facebook?
You can change this under Settings & Account at any time.
Transcript of db1-vt13-physical
An storage/retrieval program Index Files Preferred secondary storage device for high storage capacity and low cost.
Data stored as magnetized areas on magnetic disk surfaces. Disks Disks are divided into concentric circular tracks on each disk surface. A track is divided into smaller blocks or sectors:
The block size B is fixed for each system.
Typical block sizes range from B=512 bytes to B=4096 bytes.
Whole blocks are transferred between disk and main memory for processing. A read-write head moves to the track that contains the block to be transferred.
Disk rotation moves the block under the read-write head for reading or writing.
Reading or writing a disk block is time consuming because of:
the seek time: the time it takes to move the arm.
rotational delay: the time it takes for the disk to rotate such that the head reaches the desired sector. The goal:
Minimize the number of blocks transferred.
Minimize the arms movements. The physical database is a collection of stored records that have been organized in files on the hard disk. Records are used for physical storage of:
Tuples, where each attribute in the tuple is stored as a field.
Objects in object-oriented databases. Data files Records contain fields which have values of a particular type. Block transfer is slow Block 1 Block 2 Block 3 Block 4 Block 5 Block 6 Block 7 Block 8 Block 9 Block 10 Block 11 Block 12 Block 13 Block 14 Block 15 Block 16 Block 17 Block 18 Block 19 Block 20 Block 21 Block 22 Block 23 Block 24 Block 25 Block 26 Block 27 File header Block 0 Block 1 Block 2 Block 3 Block 4 Block 5 Block 6 Block 7 Block 8 Block 9 Block 10 Block 11 Block 12 Block 13 Block 14 Block 15 Block 16 Block 17 Block 18 Block 19 Block 20 Block 21 Block 22 Block 23 Block 24 Block 25 Block 26 Block 27 Record 0 Record 1 Record 2 Record 3 Record 4 Record 5 Record 6 Record 7 Record 8 Record 9 Record 10 Record 11 Record 12 Record 13 Record 14 Record 15 Record 16 Record 17 Record 18 Record 19 Record 20 Record 21 Record 22 Record 24 Record 25 Record 26 Record 27 Record 28 File header mix of fixed and variable length records 11 bytes [characters] 7 bytes [characters] 4 bytes [integer] 11 bytes [characters] 22 bytes [characters] Field separator: space is wasted, but only where NULL values occur. locating records and fields needs more processing:
For example, location of the salary field depends on the length of name in the record. Advantages: Disadvantage: Block 1 Block 2 Block 3 Block 4 Block 5 Block 6 Block 7 Block 8 Block 9 Block 10 Block 11 Block 12 Block 13 Block 14 Block 15 Block 16 Block 17 Block 18 Block 19 Block 20 Block 21 Block 22 Block 23 Block 24 Block 25 Block 26 Block 27 File header files with Variable-field records Field separator: Record terminator: Less space is wasted, specially if there might be many NULL values. More processing is needed for:
Identifying NULL values
need to see the whole record to identify a NULL value Advantages: Disadvantage: Files with fixed length record. All fields have fixed length. Easy to locate record r, given its record number n and size of each record s:
position(r)= n * s
Easy to locate field values Space is wasted For example, consider a files containing records of this type: Advantages: Disadvantage: 50 bytes [characters] 7 bytes [characters] 4 bytes [integer] 11 bytes [characters] 40 bytes [characters] Block 0 Block 0 Block 2 Block 3 Block 4 Block 5 Block 6 Block 7 Block 8 Block 9 Block 10 Block 11 Block 12 Block 13 Block 14 Block 15 Block 16 Block 17 Block 18 Block 19 Block 20 Block 21 Block 22 Block 23 Block 24 Block 25 Block 26 Block 27 File header Block 0 There might be left over space in the file blocks Record 1 Record 2 There is space, but only two records can fit, so part of the block is wasted. Blocking factor The highest number of records that can be contained in a block is called the block factor (here: bfr) for the file of records. bfr = floor[B/R] where R is the record size and B the block size. In this example, assume a block size B is 512 bytes, and record size R is 200 bytes.
– B / R = 512/200 = 2.56
– Rounded off downwards gives bfr = 2, i.e. we can store 2 records per block. A file with r no. of records therefore require:
b = ceiling[r/bfr]
in our example, bfr=2, so a file with r=605 records needs 605/2=302.5 ~> 303 blocks. similarly, on file level: Q: How much space is wasted in total? Buffer manager Hard Disk System Data in Blocks Buffer:
portion of main memory available to store copies of disk blocks. Database system tries to minimize the number of block transfers between the disk and main memory.
We can reduce the number of disk accesses by keeping as many blocks as possible in main memory. The subsystem responsible for allocating buffer space in main memory. The DBMS engine calls the buffer manager when it needs a block from disk. The block that is thrown out is written back to disk only if it was modified since the most recent time that it was written to/fetched from the disk.
Once space is allocated in the buffer, the buffer manager reads in the block from the disk to the buffer, and passes the address of the block in main memory to the requester. The DBMS is given the address of the block in the main memory, if it is already present in the buffer. If the block is not in the buffer:
the buffer manager allocates space in the buffer for the block.
if buffer is full, it has to through out some other block. To find records, one or several blocks transferred to (one or several buffers in) main memory. These blocks can then be searched to find the records that were sought.
But which blocks?
If the address to the block containing the record is unknown one has to search through all block in the file (so called linear search). Contains information that is needed for record access:
record format etc. The database is stored as a collection of files. Each file is a sequence of records.
Records can have constant (simplest) or variable length
A file can store records of the same type (simplest) or of different type.
Specific files can be used to store specific relations (simplest) or the same file can store different relations (maybe even the whole database). File organization The organization of records within the files Heap files A record can be placed anywhere in the file where there is space Ordered files store records in sequential order, based on the value of the search key of each record. External Hashing A hash function is computed on some attribute of each record; the result specifies in which block of the file the record should be placed O o!
I need to start studying for the exam :-( Where is it in the book that
the relational algebra
is covered? Chapter 1: introduction
Chapter 6: Formal Relational Languages: the Algebra and Calculus.
... Table of contents ...
relational data model .... 55-80
relational algebra .......... 143-170
... Index of key words Relational algebra 143 bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla If there was no index/table-of-content the student had to "scan" the whole book! Indexing in DBMS's An index (or index file) is an extra file structure that is used to make the retrieval of records faster. An index file consists of records (called index entries) of the form: These entries determine the physical address for records having a certain value in their index field.
Index files are typically much smaller than the original file.
The file that should be indexed is called the data file. Search key pointer Search key (or index field): Attribute or set of attributes (data fields) used to look up records in a file. pointer (or address): The address of blocks that contain records with the given search key. Select *
from employee e
where e.manager=10; Query parser and optimizer Indexing: Book example Index evaluation metrics e.g. where salary=2000; e.g.
where salary between 2000 and 3000; Primary index Cluster index Secondary index Data Index The first record in each block is called the Anchor Record of the block. Search key pointer the same type as the ordering field (index field) for the data file. pointer to a block (block pointer) A primary index has one index record for each block in the data file, and therefore is called a sparse index (or “nondense index”)
A dense index consists of one record for each record in the data file. Cluster index is defined for files that are ordered according to a non-key field (the cluster field), i.e. several records in the data file can have the same value for the cluster field. Search key pointer the same type as the Cluster field for the data file. pointer to a block (block pointer) Search key pointer the same type as the indexing field(any field in the data
file) pointer to a block (block pointer) The data file is not sorted according to the index field. There are two different cases:
1. The index field has unique values for each record (the figure).
2. Several records in the data file can have the same values for the index field (see Elmasri/Navathe Fig 18.5) Disks The physical database is a collection of stored records that have been organized in files on the hard disk. The records in the file are ordered according to the value of a certain field, in the figure according to name. – Ordered retrieval very fast (no sorting needed).
– Next record in the order is found on the same block (except for the last record in the block)
– Search is fast (binary search - log2b)
– Insert and delete are expensive since the file must be kept sorted.
– Suitable for applications that require sequential processing of the entire file
– Need to reorganize the file from time to time to restore sequential order • To make record insertions cheaper:
– Create a temporary unsorted file a so called overflow file or transaction file (the main file is then called “the master file”)
– Update the master file periodically in accordance with the transaction file.
– These measures improve insertion time but search for records bceomes more complicated.
• Ordered files are not used that often in databases.
– Exception: when extra access paths are created, so called primary indexes. Binary search Watch between 0:50 to 01:32 The file blocks are divided into M equal-sized buckets, numbered bucket0, bucket1, ..., bucketM-1.
Typically, a bucket corresponds to one (or a fixed number of) disk block. One of the file fields is designated to be the hash key of the file.
The record with hash key value K is stored in bucket i, where i=h(K), and h is the hashing function. Search is very efficient on the hash key.
Collisions occur when a new record hashes to a bucket that is already full.
An overflow file is kept for storing such records.
Overflow records that hash to each bucket can be linked together. Hash key:
employee number emp_num
Number of Buckets M=100
h(emp_num) = emp_num mod 100 To find the block that contains the record for employee number 102 we should compute h(102)=102 mod 100 =2 Example: The block that contains emp_num=102 record. Properties Data files Physical database design DATABASE DESIGN I - 1DL300 Spring 2013
Department of Information Technology, Uppsala University Elmasri/Navathe ch 17 and18
Padron-McCarthy/Risch ch 21 and 22 Indexing in SQL Creates an index on a table. Duplicate values are allowed. CREATE INDEX index_name
ON table_name (column_name); Creates a unique index on a table. Duplicate values are not allowed. CREATE UNIQUE INDEX index_name
ON table_name (column_name); The data file is sorted on the primary key field.