Transcript of Barriers of Hardware Development

Barriers of Hardware Development
New materials and ideas
Vladyslav Babych
Cracow University of Economics, Applied Informatics
Prepared for the course Modern Trends of Computer Science
3 December 2013
Table of Contents
Part I. Moore's Law and CMOS scaling barriers
Why do we need computing performance?
Moore's Law
CMOS scaling
Can it continue forever?
Part II. CMOS solutions
Multi-core Scaling and the problem of “Dark Silicon”
Application-Specific Integrated Circuits and Reconfigurable Computing
General-Purpose Graphics Processing Units
Probabilistic computing
Near-threshold computing (NTC)
Part III. Non-CMOS solutions and radically new technologies
Optical chips
Nano technologies
Magnetic Structures (e.g. spin transistors)
Biological Computation
Quantum Computation
Part I. Moore's Law and CMOS scaling barriers
Why do we need increasing computing performance?
Computing affects every imaginable aspect of our lives:
education
government
business
medicine
social interactions
entertainment
Why do we need increasing computing performance? (cont.)
Faster computers make it possible to build completely new things that were not feasible before:
smartphones
MP3 players
GPS devices
MRI scanners
Why should this pattern stop?
Moore's Law
In 1965, Gordon Moore predicted that the number of transistors on a chip would double roughly every two years.
On 13 April 2005, Gordon Moore himself stated in an interview that the law cannot be sustained indefinitely: “It can't continue forever. We have another 10 to 20 years before we reach a fundamental limit.”
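As a rough numerical illustration of the doubling rule (a minimal sketch; the starting transistor count and the time span are arbitrary placeholders, not figures from the presentation):

    # Moore's Law as a simple doubling rule: the count doubles every two years.
    def projected_transistors(start_count, years):
        return start_count * 2 ** (years / 2)

    # Example: a chip with 1 million transistors, projected 20 years ahead.
    print(projected_transistors(1_000_000, 20))   # ~1.02e9, roughly a thousand-fold increase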
CMOS Scaling
1960s - introduction of the transistor: smaller, lower cost, lower power, and greater reliability (compared to vacuum tubes)
1960s-1970s - integrated circuits (ICs)
late 1970s - bipolar transistors were replaced by NMOS (N-channel) transistors
1980s - CMOS (Complementary Metal-Oxide-Semiconductor) technology
CMOS Scaling (cont.)
When its inputs are stable, a CMOS circuit dissipates practically no energy.
A further advantage of CMOS gates is that their performance and power are completely determined by the MOS transistor properties.
In a 1974 paper, Robert Dennard showed that the MOS transistor has a set of very convenient scaling properties. This is often referred to as "Dennard scaling".
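A minimal sketch of the classical Dennard scaling rules (the scaling factor and its value are illustrative assumptions, not numbers from the presentation): when dimensions, voltage, and current all shrink by 1/k, power per transistor falls as 1/k^2 while power density stays constant.

    # Classical Dennard scaling, illustrated with an assumed per-generation factor k.
    k = 1.4                                    # assumed linear scaling factor (~sqrt(2))
    length = 1.0 / k                           # transistor dimensions shrink by 1/k
    voltage = 1.0 / k                          # supply voltage shrinks by 1/k
    current = 1.0 / k                          # drive current shrinks by 1/k
    power_per_transistor = voltage * current   # scales as 1/k^2
    area_per_transistor = length * length      # also scales as 1/k^2
    power_density = power_per_transistor / area_per_transistor
    print(power_per_transistor, power_density) # ~0.51 and 1.0: per-device power drops, density stays constant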
Can it continue forever?
Clock frequencies increased faster than the basic assumption behind Dennard scaling allowed.
The interconnecting wires between CMOS devices do not scale as well as the transistors themselves.
By the early 2000s, processors had reached power dissipation levels that were becoming difficult to handle, so processor power started to level out and single-processor performance improvements began to slow.
Can it continue forever? (cont.)
At the 130 nm node, classical "Dennard scaling" effectively broke down.
Part II. CMOS solutions
Multi-core Scaling and the problem of “Dark Silicon”
One solution to the power dissipation problem is to use multiple parallel processors.
Consumed power = energy per instruction × performance (instructions per second)
Multi-core processors reduce energy dissipation per instruction by using less aggressive processor-core designs, and at the same time use multiple processor cores to improve and scale overall chip performance.
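A minimal numerical illustration of this identity (all values are assumptions for the sake of the example, not figures from the presentation): under a fixed power budget, lowering the energy per instruction directly raises the sustainable throughput.

    # Power = energy per instruction * throughput, so throughput = power budget / energy per instruction.
    power_budget_w = 100.0            # assumed chip power budget in watts
    energy_big_core = 1e-9            # assumed energy per instruction of an aggressive core (J)
    energy_small_core = 0.4e-9        # assumed energy per instruction of a simpler core (J)

    throughput_big = power_budget_w / energy_big_core      # instructions per second
    throughput_small = power_budget_w / energy_small_core
    print(throughput_big, throughput_small)                # 1e11 vs 2.5e11 instructions/s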
Multi-core Scaling and the problem of “Dark Silicon”
Challenges:
They cannot be programmed the same way as single-core processors.
Leakage current and the problem of "Dark Silicon": the gap between the number of transistors that can be put on a chip and the number that can be used simultaneously within a given power budget.
Multi-core Scaling and the problem of “Dark Silicon”
"Dark Silicon" problem visualization: think of the chip's transistors as an array of light bulbs. Any individual light can be turned on, but not all of them simultaneously, or the chip will overheat and burn out.
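A minimal sketch of how a power budget leaves part of the chip "dark" (core count, per-core power, and budget are illustrative assumptions):

    # Fraction of cores that must stay "dark" when total power is capped.
    num_cores = 64                 # assumed number of cores on the chip
    power_per_core_w = 2.5         # assumed power draw of one active core (W)
    power_budget_w = 100.0         # assumed chip-level power budget (W)

    max_active = min(num_cores, int(power_budget_w // power_per_core_w))
    dark_fraction = 1 - max_active / num_cores
    print(max_active, dark_fraction)   # 40 active cores; 37.5% of the chip stays dark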
Multi-core Scaling and the problem of “Dark Silicon”
Conclusions
In the 2012 paper "Dark Silicon and the End of Multicore Scaling", researchers modeled this problem and drew two sets of conclusions:
Pessimistic:
huge performance improvements cannot be sustained with multi-core scaling as the main driver
at 22 nm, 21% of the chip will be "dark"; at 8 nm, as much as 50% cannot be utilized
a transition to a new technology must occur within a few years to sustain the pace of Moore's Law
Optimistic:
if energy-efficiency breakthroughs are made in process scaling and supply voltage, the potential for performance improvement through parallelism will be high.
My opinion
Neither CMOS nor chip multiprocessors can overcome the power limits facing modern computer systems for much longer. Within the next decade, the growth in computer-system performance will become limited by power and thermal requirements.
Application-Specific Integrated Circuits and Reconfigurable Computing
A solution to the energy-efficiency problem may be to create more application-optimized processing units.
Example: a digital watch, where the hardware is specifically tuned for its purpose.

Application-Specific Integrated Circuits and Reconfigurable Computing (cont.)
Researchers at Lawrence Berkeley National Laboratory, designing peta-scale supercomputers to model climate change, found that a specialized supercomputer based on highly efficient customizable embedded processors is far more efficient in terms of power cost:
a supercomputer with custom chips would consume about 2.5 MW (megawatts) of power
a supercomputer built from general-purpose AMD processors would consume about 180 MW of power
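For scale, simple arithmetic on the two figures quoted above:

    # Power ratio between the general-purpose and the application-specific design.
    general_purpose_mw = 180.0
    custom_mw = 2.5
    print(general_purpose_mw / custom_mw)   # 72.0, i.e. roughly a 70x reduction in power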
Application-Specific Integrated Circuits and Reconfigurable Computing (cont.)
However, it is very expensive to engineer such systems, which limits the range of viable market segments: it is hard to justify the initial engineering investment.
The solution: Reconfigurable Computing, a bridge between application-specific hardware and general-purpose microprocessors.
Application-Specific Integrated Circuits and Reconfigurable Computing (cont.)
Reconfigurable systems use runtime reconfiguration of the hardware in order to perform a specific task.
They combine the computational performance of custom hardware with the flexibility of a general-purpose microprocessor (the capability to re-program the underlying hardware to implement different circuits).
However, these advantages come at the cost of an expensive compilation process.
Application-Specific Integrated Circuits and Reconfigurable Computing (cont.)
Limitations:
large silicon area and performance overhead compared with Application-Specific Integrated Circuit technology
the structure of the reconfigurable fabric and the interfaces between the fabric, processor, and memory must be very efficient
a complicated and expensive compilation process
Application-Specific Integrated Circuits and Reconfigurable Computing (cont.)
Conclusion:
In the 2008 paper "The Promise of High-Performance Reconfigurable Computing", researchers reported that High-Performance Reconfigurable Computers can achieve up to four orders of magnitude improvement in performance, up to three orders of magnitude reduction in power consumption, and two orders of magnitude savings in cost and size requirements compared with contemporary microprocessors.
Reconfigurable Computing looks very promising as a remedy for many modern processor issues and serves as a bridge between general-purpose and application-specific circuits. However, it is not yet widespread, and many challenges remain to be solved.
General-Purpose Graphics Processing Units
The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring high peak arithmetic throughput and memory bandwidth.
The idea is to offload compute-intensive portions of the application to the GPU, while the remainder of the code still runs on the CPU (see the sketch below). From a user's perspective, applications simply run significantly faster.
GPUs are designed for a specific class of applications with distinct characteristics:
Computational requirements are large
Parallelism is substantial
Throughput is more important than latency
A GPU consists of thousands of smaller, more efficient cores designed for handling many tasks simultaneously.
General-purpose GPUs are also playing an increasing role in scientific computing applications (e.g. protein-folding simulations).
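A minimal sketch of the offload pattern in Python, assuming the CuPy library and a CUDA-capable GPU are available (CuPy is an illustrative choice, not a tool named in the presentation):

    import numpy as np
    import cupy as cp   # assumes CuPy and a CUDA-capable GPU are installed

    # Compute-intensive portion: a large matrix multiply, offloaded to the GPU.
    a_cpu = np.random.rand(2048, 2048).astype(np.float32)
    b_cpu = np.random.rand(2048, 2048).astype(np.float32)

    a_gpu = cp.asarray(a_cpu)          # copy inputs to GPU memory
    b_gpu = cp.asarray(b_cpu)
    c_gpu = a_gpu @ b_gpu              # the heavy arithmetic runs on the GPU

    # The rest of the program keeps running on the CPU, with the result copied back.
    c_cpu = cp.asnumpy(c_gpu)
    print(c_cpu.mean())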
General-Purpose Graphics Processing Units
Conclusion
The importance of GPU computing is rising with time; however, there are problems that need to be addressed:
finding an application that will drive the purchase of millions of general-purpose GPUs
the need for standard tools, languages, and APIs that work across GPUs from multiple vendors
Probabilistic computing
The idea is to use circuits that are manufactured to be imprecise on purpose. It is borrowed from biological systems, which achieve powerful computing using just microwatts of power.
Major goals: power savings and simplified design.
Today, a great deal of work goes into ensuring that circuits return the proper result every single time. Since chips are extremely complex and defect densities are extremely difficult to control, engineers compensate with additional circuitry that adds die size, reduces performance, and increases power consumption.
Probabilistic computing (cont.)
Solution: intentionally create imperfect designs, but place these imperfections very carefully in areas where the final output can still be controlled.
The issue with imprecision is that the chip has to be able to differentiate between a "good enough" answer and a wrong answer.
Applications where "good enough" is unacceptable: GPS navigation, scientific computation, autopilots, etc.
However, it is useful for audio/video playback, gaming, web browsing, etc. (a small simulation sketch follows below).
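A minimal software simulation of the accuracy-for-energy trade-off (purely illustrative: it mimics imprecise hardware by ignoring a few low-order bits, which is an assumed stand-in, not the mechanism of any actual probabilistic chip):

    import random

    def imprecise_add(a, b, dropped_bits=4):
        """Add two integers while ignoring a few low-order bits, as cheap hardware might."""
        mask = ~((1 << dropped_bits) - 1)
        return (a & mask) + (b & mask)

    a, b = random.randrange(1 << 16), random.randrange(1 << 16)
    exact = a + b
    approx = imprecise_add(a, b)
    # Small relative error: "good enough" for e.g. pixel values, unacceptable for navigation.
    print(exact, approx, abs(exact - approx) / max(exact, 1))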
Probabilistic computing (cont.)
At the ACM International Conference in 2012, researchers from Rice University presented the first probabilistic microprocessor prototype.
In their example, the prototype's output is compared with a conventionally rendered image (on the far left): the middle image is rendered with an allowed error rate of 0.54%, and the far-right image with 7.54%.
The chip producing the right-hand image used only 1/15 of the power of the traditional chip.
Probabilistic computing (cont.)
Conclusion
The ability to be elegantly wrong is how our brains operate: we can tolerate a low signal-to-noise ratio while still seeing patterns and events in the visual field. Duplicating some of that ability in silicon could be critical to designing future computer systems that use less power while still delivering the desired results. This technology is still in its infancy, and a lot of work remains to be done.
Near-Threshold computing
The main idea is that the processor operates at or near its threshold voltage. For example, by reducing the supply voltage from a nominal 1.1 V to 400-500 mV, you can obtain as much as a 10X energy-efficiency gain (see the worked estimate after the list below).
There are barriers to address:
a 10X or greater loss in performance
a 5X increase in performance variation
a five-orders-of-magnitude increase in functional failures of memory, as well as increased logic failures
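A rough back-of-the-envelope estimate of where the gain comes from, using only the dynamic-energy relation (switching energy roughly proportional to V^2); real designs also gain or lose from leakage and frequency effects:

    # Dynamic switching energy scales roughly with the square of the supply voltage.
    v_nominal = 1.1      # volts, nominal supply
    v_ntc = 0.45         # volts, near-threshold operation (mid-point of the 400-500 mV range)
    energy_ratio = (v_nominal / v_ntc) ** 2
    print(energy_ratio)  # ~6x from voltage scaling alone; other effects push toward the ~10x figure quoted above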
Near-Threshold computing
Conclusion
Traditional device scaling no longer provides energy-efficiency improvements; one solution to this energy crisis may be the widespread application of aggressive low-voltage operation. Energy savings on the order of 10X can be achieved with "only" a 10X degradation in performance (compared with sub-threshold operation, which suffers a roughly 500X performance decrease).
However, three main barriers have to be dealt with: performance loss, performance variation, and increased functional failures.
Part III. Non-CMOS solutions and radically new technologies
Optical chips
Researchers at the University of Colorado have developed a new technique that allows microprocessors to use light, instead of electrical wires, to communicate with transistors on a single chip.
Optical communication circuits have two main advantages:
using light has the potential to be extremely energy efficient
a single fiber-optic strand can carry a thousand different wavelengths of light at the same time, allowing many communications to be carried simultaneously in a small space and eliminating "cross talk" (a rough bandwidth sketch follows below)
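A rough illustration of the aggregate bandwidth this enables (the per-wavelength data rate is an assumed placeholder, not a figure from the presentation):

    # Aggregate bandwidth of wavelength-division multiplexing on one fiber strand.
    wavelengths = 1000                         # number of distinct wavelengths, as cited above
    gbps_per_wavelength = 10                   # assumed data rate per wavelength channel (Gb/s)
    print(wavelengths * gbps_per_wavelength)   # 10,000 Gb/s, i.e. 10 Tb/s on a single strand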
Optical chips
However, photonics has to be integrated side-by-side with the electronics to make optical communication an economically viable option.
It is unlikely that optical chips will replace traditional electrical CMOS circuits overnight. For now, photonics integrated into CMOS circuits with no process changes may provide enormous cost benefits and an advantage over traditional photonic systems.
Nano technologies
Nanowire Field-Effect Transistors (FET)
To understand the concept of the nanowire FET, it is useful to first understand how a regular FET works.
Nanowire Field-Effect Transistors (FET)
Developments in nanowire growth have led to the demonstration of a wide range of nanowire materials with precise control of composition, morphology, and electrical properties. But because of their small size, single nanowires cannot make an effective transistor on their own.
French researchers have built a "gate-all-around" transistor made of 225 nanowires, each with its own 14 nm-thick chromium layer that serves as a gate.
Nanowire Field-Effect Transistors (FET)
The proposed architecture offers several advantages:
better immunity to short channel effects
reduction of device-to-device variability
nanometer gate length patterning without the need for high-resolution lithography
This is a breakthrough technology; whether it will allow Moore's Law to live longer and fit even more transistors on a chip depends on many factors, one of them being how efficient the fabrication process and structure turn out to be.
Single-atom transistor
The smallest transistor ever built has been created using a single phosphorus atom by an international team of researchers at the University of New South Wales.

The tiny electronic device uses an individual phosphorus atom patterned between atomic-scale electrodes and electrostatic control gates as its active component.
The atom (shown at the center of the accompanying computer model) sits in a channel in a silicon crystal.
Single-atom transistor
However, the single-atom transistor has one serious limitation: it must be kept very cold, at least as cold as liquid nitrogen (-196 °C).
If a technique to keep the electrons confined is developed, building a computer that works at room temperature may be possible; however, this remains a fundamental technological challenge.
Magnetic structures
Spin transistors
Electronic systems combining computing and storage capabilities could be realized based on magnetic structures.
The control of single spins of either atoms or electrons has also proven to be a promising new way to achieve electronic functionality.
The possibility of building logic circuits with magnetic nanostructures has been demonstrated at the prototype level.
Spin-wave devices
Spin-wave devices (SWDs) are a type of magnetic logic that exploits collective spin oscillations (spin waves) for information transmission and processing.
The spin waves are generated in a magneto-electric cell driven by external voltage pulses; such a cell also acts as a detector and storage element.
The information is encoded in the initial phase of the spin wave. The result of a computation can be stored in the magnetization or converted into a voltage pulse by the output magneto-electric cells.
Spin-wave devices
Advantages:
the ability to utilize phase in addition to amplitude, allowing logic devices to be built with fewer elements than a transistor-based approach requires
nonvolatile magnetic logic circuits
parallel data processing on multiple frequencies in the same device structure, by exploiting each frequency as a distinct information channel
Prototypes operating at room temperature and at gigahertz frequencies have already been demonstrated.
Biological computation
A living cell can be viewed as an information processor that is extraordinarily efficient in the execution of its functions.
As argued in a number of studies, individual living cells such as bacteria have the attributes of a Turing machine capable of general-purpose computation. A cell can also be viewed as a universal constructor in the sense of von Neumann, because it manufactures copies of itself: in effect, a computer making computers.
Biological computation
How does the cell implement its information processing system? The cell is a very complex system:
1) Reproduction: making new cells by acquiring and processing information from internal storage (DNA) and utilizing structural building blocks and energy from nutrients.
2) Adaptation for survival: acquiring and processing information from external stimuli, with feedback from DNA.
3) Extracellular communication: sending and receiving signals to coordinate community behavior.
Biological computation
Hardware from cells
A cellular organism as an information processor: in the following example, an E. coli bacterium is used as the building block.
1) Logic hardware: many proteins in living cells have the transfer and processing of information as their primary function, and are therefore regarded as the logic elements of the E. coli processor.
2) Memory hardware: all data about the structure and operation of a living cell are stored in the long DNA molecule. DNA coding uses a base-4 (quaternary) system: adenine (A), cytosine (C), guanine (G), and thymine (T).
Each state symbol (base) on the first tape forms a pair (base pair) with a complementary state symbol on the second tape: adenine pairs with thymine, while cytosine pairs with guanine. Thus, the base pair (bp) is a natural unit of information stored in DNA.
Biological computation
Hardware from cells
One bp is equal to two bits of binary information and corresponds to approximately 0.34 nm of length along the tape (a density estimate follows below).
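A quick estimate of the linear information density implied by these two numbers:

    # Linear information density of DNA.
    bits_per_bp = 2
    nm_per_bp = 0.34
    bits_per_nm = bits_per_bp / nm_per_bp
    print(bits_per_nm)              # ~5.9 bits per nanometer of strand
    print(bits_per_nm * 1e9 / 8e9)  # ~0.74 gigabytes per meter of DNA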
Researchers from the Semiconductor Research Corporation (Research Triangle Park, USA) have made quantitative estimates for the bio-cell and a comparable Si-cell; their conclusion was that the Si-cell fundamentally cannot match the bio-cell in the density of memory and logic elements, operational speed, or operational energy.
Biological computation
Conclusion
Perhaps the design of nature's information processors can inspire radical breakthroughs in inorganic information processing. However, much research and time will be needed before computers based on this technology can be built industrially.
Quantum computation
Quantum computers are not limited to two states like regular computers; they encode information as quantum bits, or qubits, which can exist in superposition (0 and 1 at the same time). This is what gives a quantum computer its superior computing power. A number of physical objects can be used as a qubit: a single photon, a nucleus, or an electron.
The Bloch sphere is a common representation of a qubit, the fundamental building block of quantum computers.
Quantum computation
Researchers at the University of New South Wales are using the outermost electron of a phosphorus atom as a qubit. The technology for this was described in the "Single-atom transistor" slides.
How does it work?
All electrons have magnetic fields, so they are basically tiny magnets, and this property is called spin. If you place them in a magnetic field, they will align with that field.
Normally an electron sits in the lowest energy state, spin-down.
You can also put it into the other state, spin-up, but that requires some energy; it is the higher energy level.
So far this looks just like a classical bit: two states, spin-up or spin-down (1 and 0).
Quantum computation
But the funny thing about quantum objects is that they can be in both states at once.
Before we measure the spin, the electron can exist in what is called a superposition, where the coefficients (amplitudes) indicate the relative probability of finding the electron in one state or the other.
Quantum computation
Let's imagine two interacting qubits. Now there are four possible states of these electrons. You might think this is just like two classical bits: with two bits you can write 4 numbers: 00, 01, 10, and 11.
However, to describe the state of this two-spin system we need four numbers, four coefficients (amplitudes), and quantum mechanics allows superpositions of all of these states.
Quantum computation
This is the way to understand how just two qubits actually contain 4 bits of information. If you keep going, you will find that the amount of equivalent classical information contained in N qubits is 2^N classical bits.
Although the qubits can exist in any combination of states, when they are measured they must fall into one of the basis states, and all other information about their state before the measurement is lost.
The logic operations must therefore be designed in such a way that the final result is something we are able to measure, ideally a unique state (a small state-vector sketch follows below).
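A minimal numerical sketch of a two-qubit state vector, assuming NumPy is available (this only illustrates the amplitude bookkeeping, not real quantum hardware):

    import numpy as np

    # Single-qubit basis states and the Hadamard gate, which creates an equal superposition.
    zero = np.array([1.0, 0.0])
    H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

    # Put each of two qubits into superposition; the joint state has 2**2 = 4 amplitudes.
    q0 = H @ zero
    q1 = H @ zero
    state = np.kron(q0, q1)
    print(state)                      # [0.5 0.5 0.5 0.5] -> amplitudes for 00, 01, 10, 11

    # Measurement collapses the state: one basis state is observed with probability |amplitude|^2.
    probs = np.abs(state) ** 2
    outcome = np.random.choice(["00", "01", "10", "11"], p=probs)
    print(outcome)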
Quantum computation
Earlier in 2013, Google bought a quantum computer called D-Wave Two, which is reported to perform certain tasks 3,600 times faster than conventional computers. The company hopes to use it to find cures for diseases, address climate problems, and help robots better understand human speech.

Conclusion
It appears that the benefits of Moore’s law will continue for some time, aided by the advent of new materials, processes, device structures and radically new technologies.
Because CMOS is a multi-billion dollar industry, it will not be replaced overnight.
There is intense ongoing research to find alternatives to CMOS technology that have the potential to extend the benefits of Moore’s law scaling for decades into the future.
Although the road ahead is not well marked, there are many indications that there are no overwhelming barriers that would prevent progress in information processing technologies in the foreseeable future.
Thank you!