
Data Representation in Computer Systems

Edgar C.

The general idea behind positional numbering systems is that a numeric value is represented through increasing powers of a radix. This is often referred to as a weighted numbering system because each position is weighted by a power of the radix.

Positional Numbering Systems
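The weighted expansion described above can be sketched in a few lines of illustrative Python (not part of the original presentation; `positional_value` is a hypothetical helper name):

```python
def positional_value(digits, radix):
    """Evaluate a digit sequence (most significant digit first) as the
    weighted sum d * radix**position, using Horner's form."""
    value = 0
    for d in digits:
        value = value * radix + d  # shift left by one position, add digit
    return value

# 249 in base 10: 2*10**2 + 4*10**1 + 9*10**0
print(positional_value([2, 4, 9], 10))        # 249
# 11010 in base 2: 16 + 8 + 2
print(positional_value([1, 1, 0, 1, 0], 2))   # 26
```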

Gottfried Leibniz was the first to generalize the idea of the decimal system to other bases. Because of its simplicity, the binary numbering system translates easily into electronic circuitry.

Decimal to Binary Conversions

Converting Unsigned Whole Numbers

Conversion between base systems can be done by using either repeated subtraction or a division-remainder method. The subtraction method is cumbersome and requires a familiarity with the powers of the radix being used.

Converting Fractions

We can convert fractions between different bases using methods analogous to the repeated subtraction and division-remainder methods for converting integers.
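The method analogous to division-remainder for fractions is repeated multiplication: each multiplication by the radix produces the next digit after the radix point. A minimal Python sketch (illustrative, with a hypothetical function name):

```python
def frac_to_base(frac, radix, places):
    """Repeatedly multiply the fractional part by the radix; each
    integer part produced is the next digit after the radix point."""
    digits = []
    for _ in range(places):
        frac *= radix
        digit = int(frac)
        digits.append(str(digit))
        frac -= digit
        if frac == 0:      # conversion terminated exactly
            break
    return "0." + "".join(digits)

print(frac_to_base(0.8125, 2, 8))  # '0.1101'
```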

Binary numbers are often expressed in hexadecimal and sometimes octal to improve their readability. A group of 4 bits is easily recognized as a hexadecimal digit. Similarly, a group of 3 bits is expressible as one octal digit. Using these relationships, we can therefore convert a number from binary to octal or hexadecimal by doing little more than looking at it.

Converting between Power-of-Two Radices
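The group-and-look-up idea can be made concrete with a short Python sketch (illustrative code; the helper names are my own): 4-bit groups map to hexadecimal digits, 3-bit groups to octal digits.

```python
def bin_to_hex(bits):
    """Pad on the left to a multiple of 4 bits, then map each
    4-bit nibble directly to one hexadecimal digit."""
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    return "".join("0123456789ABCDEF"[int(bits[i:i+4], 2)]
                   for i in range(0, len(bits), 4))

def bin_to_oct(bits):
    """Pad on the left to a multiple of 3 bits, then map each
    3-bit group directly to one octal digit."""
    bits = bits.zfill((len(bits) + 2) // 3 * 3)
    return "".join(str(int(bits[i:i+3], 2))
                   for i in range(0, len(bits), 3))

print(bin_to_hex("110101000110"))  # 1101 0100 0110 -> 'D46'
print(bin_to_oct("110101000110"))  # 110 101 000 110 -> '6506'
```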

Signed Integer Representation

The storage location can be as small as an 8-bit byte or as large as several words, depending on the programming language and the computer system. There are three commonly used approaches: signed magnitude, one's complement, and two's complement. The most intuitive of these is signed magnitude.

Note: A simple rule for detecting an overflow condition: If the carry into the sign bit equals the carry out of the sign bit, no overflow has occurred. If the carry into the sign bit is different from the carry out of the sign bit, overflow has occurred.
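The rule in the note can be checked mechanically. Below is a hedged Python sketch assuming 8-bit two's complement operands (the function name and bit width are my own choices, not from the presentation):

```python
def add_with_overflow_check(a, b, bits=8):
    """Add two bit-limited values and apply the rule above: overflow
    iff the carry into the sign bit differs from the carry out of it."""
    mask = (1 << bits) - 1
    # carry into the sign bit: produced by adding the lower-order bits
    low = (a & (mask >> 1)) + (b & (mask >> 1))
    carry_in = (low >> (bits - 1)) & 1
    total = (a & mask) + (b & mask)
    carry_out = (total >> bits) & 1
    return total & mask, carry_in != carry_out

print(add_with_overflow_check(0b01111111, 0b00000001))  # (128, True): 127 + 1 overflows
print(add_with_overflow_check(0b00000011, 0b00000101))  # (8, False): 3 + 5 = 8
```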

A signed-magnitude number has the sign as its left-most bit, also referred to as the high-order bit or the most significant bit, while the remaining bits represent the magnitude, or absolute value, of the number. Computers must be able to perform arithmetic calculations on integers that are represented using this notation. When the operands have opposite signs, the sign of the result is the same as the sign of the operand with the larger magnitude, and the magnitude is obtained by subtracting the smaller magnitude from the larger one.

Signed Magnitude

Number theorists have known for hundreds of years that one decimal number can be subtracted from another by adding the difference of the subtrahend from all nines and adding back a carry. This is called taking the nine’s complement of the subtrahend, or more formally, finding the diminished radix complement of the subtrahend.

Complement Systems

The diminished radix complement of a number in base 10 is found by subtracting the subtrahend from the base minus one, which is 9 in decimal.

It is important to note at this point that although we can find the nine’s complement of any decimal number or the one’s complement of any binary number, we are most interested in using complement notation to represent negative numbers.

One’s complement

Two’s complement

Is an example of a radix complement. Given a number N in base r having d digits, the radix complement of N is defined to be r^d − N for N ≠ 0, and 0 for N = 0. The radix complement is often more intuitive than the diminished radix complement: since the subtrahend is incremented at the outset, there is no end-around carry to worry about. We simply discard any carries involving the high-order bits.
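The definition r^d − N translates directly into code (an illustrative Python sketch; the function name is my own):

```python
def radix_complement(n, radix, digits):
    """Radix complement of N in base r with d digits:
    r**d - N for N != 0, and 0 for N == 0."""
    return 0 if n == 0 else radix**digits - n

print(radix_complement(23, 10, 2))          # ten's complement of 23 -> 77
print(bin(radix_complement(0b0011, 2, 4)))  # two's complement of 0011 -> 0b1101
print(radix_complement(0, 10, 3))           # 0, by definition
```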

Floating-point numbers consist of three parts: a sign bit, an exponent part, and a fractional part called the significand. The number of bits used for the exponent and significand depends on whether we would like to optimize for range or precision. For the remainder of this section, we will use a 14-bit model with a 5-bit exponent, an 8-bit significand, and a sign bit. The idea behind using a bias value is to convert every integer in the range into a non-negative integer, which is then stored as a binary numeral. The integers in the desired range of exponents are first adjusted by adding a bias equal to half of the range of possible values, the value that we select to represent an exponent of zero.

Floating-Point Representation
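Assuming an excess-16 bias (half of the 32 values a 5-bit exponent can hold; the presentation does not fix the exact bias), the 14-bit model might be sketched like this for small positive integers (illustrative Python, function name my own):

```python
def encode_simple_float(value):
    """Sketch of a 14-bit model: 1 sign bit, 5-bit excess-16 exponent,
    8-bit significand normalized as 0.1xxxxxxx. Assumes a positive
    integer that fits in 8 significant bits."""
    bits = bin(value)[2:]              # e.g. 17 -> '10001'
    exponent = len(bits)               # 17 = 0.10001 x 2**5
    significand = bits.ljust(8, "0")   # left-justified 8-bit fraction
    stored_exp = exponent + 16         # apply the excess-16 bias
    return "0 {:05b} {}".format(stored_exp, significand)

print(encode_simple_float(17))  # '0 10101 10001000'
```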

If we want to add two decimal numbers that are expressed in scientific notation, we first rewrite one of the numbers so that both of them are expressed with the same power of the base.

Floating-Point Arithmetic
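The exponent-alignment step can be sketched with exact rational arithmetic (illustrative Python; the function name and the (significand, exponent) pair representation are my own):

```python
from fractions import Fraction

def add_scientific(a, b):
    """Add two (significand, exponent) pairs in base 10: rewrite the
    number with the smaller exponent so both share the larger
    exponent, then add the significands."""
    (sa, ea), (sb, eb) = a, b
    if ea < eb:
        (sa, ea), (sb, eb) = (sb, eb), (sa, ea)
    sb = sb / Fraction(10) ** (ea - eb)   # align to the larger exponent
    return sa + sb, ea

s, e = add_scientific((Fraction("1.25"), 3), (Fraction("7.5"), 1))
print(f"{s} x 10^{e}")   # 53/40 x 10^3, i.e. 1.325 x 10^3
```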

When we call upon our computers to carry out floating point calculations, we are modeling the infinite system of real numbers in a finite system of integers. What we have, in truth, is an approximation. Floating point errors can be blatant, subtle, or unnoticed. The blatant errors, such as numeric overflow or underflow, are the ones that cause programs to crash. Subtle errors can lead to wildly erroneous results that are often hard to detect before they cause real problems.

Floating-Point Errors

IEEE–754 floating point standard:

  • Single-precision numbers use an excess-127 bias with an 8-bit exponent. The significand is 23 bits, for 32 bits in total.
  • Double-precision numbers use a 64-bit word consisting of a sign bit, an 11-bit exponent, and a 52-bit significand. The bias is 1023.

IEEE–754
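The single-precision field layout can be inspected directly with Python's standard `struct` module (an illustrative sketch; the function name is my own):

```python
import struct

def decompose_single(x):
    """Split an IEEE-754 single-precision value into its sign,
    biased exponent, and 23-bit fraction fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # stored with a bias of 127
    fraction = bits & 0x7FFFFF       # low 23 bits
    return sign, exponent, fraction

print(decompose_single(1.0))   # (0, 127, 0): true exponent 0, stored as 127
print(decompose_single(-2.5))  # (1, 128, 2097152): -1.01 x 2**1
```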

Character Codes

Is a numeric coding system used primarily in IBM mainframe and midrange systems. As its name implies, BCD encodes each digit of a decimal number in a 4-bit binary form. When stored in an 8-bit byte, the upper nibble is called the zone and the lower part is called the digit.

Binary coded decimal
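The digit-to-nibble mapping is simple enough to show in one line of illustrative Python (function name my own):

```python
def to_bcd(number):
    """Encode each decimal digit of a non-negative integer
    as its own 4-bit binary nibble."""
    return " ".join("{:04b}".format(int(d)) for d in str(number))

print(to_bcd(365))  # '0011 0110 0101'
```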

This code was severely limited in how it could represent and manipulate data; in fact, lowercase letters were not part of its repertoire. The designers of the System/360 needed more information-processing capability as well as a uniform manner in which to store both numbers and data.

EBCDIC

ASCII

Is a direct descendant of the coding schemes used for decades by teletype devices. These devices used a 5-bit code derived from the Baudot code, which was invented in the 1880s. ASCII defines codes for 32 control characters, 10 digits, 52 letters, 32 special characters, and the space character. To allow compatibility with telecommunications equipment, computer manufacturers gravitated toward the ASCII code.

Is a 16-bit alphabet that is downward compatible with ASCII and the Latin-1 character set. It has the capacity to encode the majority of characters used in every language of the world. Unicode also defines an extension mechanism that allows for the coding of an additional million characters.

Unicode

Codes for Data Recording and Transmission

The simplest data encoding method. We use this code implicitly when we say that "highs" and "lows" represent ones and zeros: ones are usually high voltage, and zeros are low voltage. Typically, high voltage is positive 3 or 5 volts; low voltage is negative 3 or 5 volts.

Non-return-to-zero (NRZ) code

Addresses part of the problem of synchronization loss. NRZI provides a transition either high to low or low to high for each binary one, and no transition for binary zero. Although NRZI eliminates the problem of dropping binary ones, we are still faced with the problem of long strings of zeros causing the receiver or reader to drift out of phase, potentially dropping bits along the way.

Non-return-to-zero-invert (NRZI) encoding
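NRZI's transition-on-one rule can be sketched as follows (illustrative Python; the function name and the choice to report the level held during each bit cell are my own):

```python
def nrzi_encode(bits, start_level=0):
    """NRZI: toggle the signal level for each binary one, hold it
    for each zero. Returns the level held during each bit cell."""
    level, out = start_level, []
    for b in bits:
        if b == 1:
            level ^= 1   # a transition signals a binary one
        out.append(level)
    return out

print(nrzi_encode([1, 0, 1, 1, 0, 0]))  # [1, 1, 0, 1, 1, 1]
```

Note how the run of trailing zeros produces no transitions at all, which is exactly the synchronization weakness described above.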

PM provides a transition for each bit, whether a one or a zero. In PM, each binary one is signaled by an “up” transition, and a binary zero with a “down” transition. Extra transitions are provided at bit cell boundaries when necessary. Phase modulation is often used in data transmission applications such as local area networks.

Phase Modulation (Manchester coding)

Is similar to phase modulation in that at least one transition is supplied for each bit cell. These synchronizing transitions occur at the beginning of each bit cell. To encode a binary one, an additional transition is provided in the center of the bit cell.

Frequency Modulation

RLL is a coding method in which block character code words such as ASCII or EBCDIC are translated into code words specially designed to limit the number of consecutive zeros appearing in the code. An RLL(d, k) code allows a minimum of d and a maximum of k consecutive zeros to appear between any pair of consecutive ones. Because RLL is written using NRZI on the disk, RLL-coded data actually occupies less space on magnetic media, because fewer flux transitions are involved.

Run length limited code

Error Detection and Correction

Checksums are used in a wide variety of coding systems, from bar codes to International Standard Book Numbers. These are self-checking codes that will quickly indicate whether the preceding digits have been misread. CRC is a type of checksum used primarily in data communications that determines whether an error has occurred within a large block or stream of information bytes. The larger the block to be checked, the larger the checksum must be to provide adequate protection. The group of error checking bits is called a syndrome.

Cyclic redundancy check
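At the heart of a CRC is modulo-2 polynomial division: the message is extended with zero check positions and the generator is XORed down its length; the remainder becomes the checksum. A minimal Python sketch (illustrative; the function name and the toy 4-bit generator are my own, not a standard CRC polynomial):

```python
def crc_remainder(message_bits, divisor_bits):
    """Modulo-2 polynomial division: append zeros for the check bits,
    then XOR the divisor down the message. The remainder is the
    checksum appended to the transmitted block."""
    bits = message_bits + [0] * (len(divisor_bits) - 1)
    for i in range(len(message_bits)):
        if bits[i] == 1:                      # divisor "goes into" here
            for j, d in enumerate(divisor_bits):
                bits[i + j] ^= d              # XOR is modulo-2 subtraction
    return bits[len(message_bits):]

# message 1001011 divided by the toy generator 1011
print(crc_remainder([1, 0, 0, 1, 0, 1, 1], [1, 0, 1, 1]))  # [1, 0, 0]
```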

Data communications channels are simultaneously more error prone and more tolerant of errors than disk systems. In data communications, it is sufficient to have only the ability to detect errors. A disk can sometimes be the sole repository of a financial transaction or other collection of nonreproducible real-time data. Storage devices and memory must therefore have the ability not only to detect but also to correct a reasonable number of errors. Error recovery coding has been studied intensively over the past century. One of the oldest and most effective codes is the Hamming code. Hamming codes are used in situations where random errors are likely to occur. With random errors, we assume each bit failure has a fixed probability of occurrence independent of other bit failures.

As mentioned above, Hamming codes use parity bits, also called check bits or redundant bits. The memory word itself consists of m bits, but r redundant bits are added to allow for error detection and/or correction.

Given an algorithm for computing check bits, it is possible to construct a complete list of legal code words. The smallest Hamming distance found among all pairs of code words in this code is called the minimum Hamming distance for the code.

Hamming Codes
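Given the complete list of legal code words, the minimum Hamming distance is found by checking every pair. An illustrative Python sketch, using a hypothetical even-parity code as the list of legal code words:

```python
from itertools import combinations

def hamming_distance(a, b):
    """Number of bit positions in which two code words differ."""
    return bin(a ^ b).count("1")

def minimum_distance(code_words):
    """Smallest pairwise Hamming distance over the whole code."""
    return min(hamming_distance(a, b)
               for a, b in combinations(code_words, 2))

# toy code: 3 data bits plus an even-parity bit (hypothetical example)
code = [0b0000, 0b0011, 0b0101, 0b0110,
        0b1001, 0b1010, 0b1100, 0b1111]
print(minimum_distance(code))  # 2: this code detects single-bit errors
```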

Hamming codes work well in situations where one can reasonably expect errors to be rare events. Fixed magnetic disk drives have error ratings on the order of 1 bit in 100 million. The 3-bit Hamming code will easily correct this type of error. However, Hamming codes are useless in situations where there is a likelihood that multiple adjacent bits will be damaged. These kinds of errors are called burst errors. RS(n, k) codes are defined using the following parameters: s = the number of bits in a character, k = the number of s-bit characters comprising the data block, and n = the number of bits in the code word. Despite the daunting algebra behind them, Reed-Solomon error-correction algorithms lend themselves well to implementation in computer hardware.

Reed-Solomon
