Transcript of NDM 2014

Analysis of the Effect of Core Affinity on High-Throughput Flows


Nathan Hanford
Vishal Ahuja, Matthew K. Farrens, Dipak Ghosal, Mehmet Balman, Eric Pouyoul, and Brian Tierney

Important Trends
Network speeds are scaling up, but CPU clock frequencies are not.

Background
Our Previous Work: INTIME, CAAD, and FBM
Throughput Evaluation

OProfile
End-system introspection
Much more detailed counters
Tested many counters (listed below)

[Diagrams: placement of the Network Receive Process ("Flow") and the Application in uniprocessor, multicore, multi-socket, and multi-socket multicore designs, built from Core, Socket, North Bridge, PCI, and QPI components.]
Counters tested:
Level 2 Cache Transactions
Memory Transactions Retired
Instructions Retired
Last-Level Cache Transactions
Offcore Requests
LLC Demand Misses
CPU Unhalted Clock Cycles
Cycles due to LOCK Prefixes
Hardware Interrupts
Load Blocks
Resource Stalls
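
The talk collected these counters with OProfile; as an illustration only, the same kind of measurement, here instructions retired, can also be taken on Linux through the perf_event_open(2) syscall. The event choice and process-wide scope below are assumptions for this sketch, not part of the talk.

/* Minimal sketch: count instructions retired around a region of interest.
 * Illustrative only; the talk's measurements used OProfile, not this code. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <linux/perf_event.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;  /* instructions retired */
    attr.disabled = 1;                         /* start counting explicitly */

    /* Count for the calling process, on any CPU. */
    int fd = perf_event_open(&attr, 0, -1, -1, 0);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    /* ... region of interest, e.g. a recv() loop, would go here ... */

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    long long count;
    if (read(fd, &count, sizeof(count)) == sizeof(count))
        printf("instructions retired: %lld\n", count);
    close(fd);
    return 0;
}
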
Experimental Setup
Two hosts (Intel Sandy Bridge, PCI Generation 3) connected through a router, with 40 Gbps links at the hosts, a 100 Gbps segment in the path, and an RTT of ~34 usec.
Results
Throughput
The NIC Driver Bottleneck
The Memory Hierarchy Bottleneck
The "Missing" Inter-Socket Communication Bottleneck
End-systems are scaling out to multiple cores.
No tested counters account for the low throughput performance.
There is currently no method for counting QPI utilization on these systems.
Conclusion
Current Best Practice: "Brute Force" TCP Flow Striping (see the sketch after this list)
Spatial Locality matters in scale-out systems, and is exacerbated by higher data rates
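
To make "brute force" TCP flow striping concrete, here is a minimal sender-side sketch: the payload is split across several parallel TCP connections instead of one. The receiver address 192.0.2.1, port 5001, stream count, and transfer size are placeholders, not values from the talk, and production data movers handle this far more carefully.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define NSTREAMS 4                 /* number of parallel TCP flows (placeholder) */
#define CHUNK    (1 << 20)         /* 1 MiB per send() call */
#define CHUNKS_PER_STREAM 256      /* each stream's share of the payload */

static char buf[CHUNK];            /* dummy payload */

static void *run_stream(void *arg)
{
    (void)arg;
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(5001);                      /* placeholder port */
    inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);  /* placeholder receiver */

    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock >= 0 && connect(sock, (struct sockaddr *)&dst, sizeof(dst)) == 0) {
        for (int i = 0; i < CHUNKS_PER_STREAM; i++)
            if (send(sock, buf, sizeof(buf), 0) < 0)
                break;
    }
    if (sock >= 0)
        close(sock);
    return NULL;
}

int main(void)
{
    pthread_t tid[NSTREAMS];
    /* One sender thread per stream; because each stream is a separate TCP
     * connection, receive-side steering (e.g., RSS) can spread the flows
     * across different cores. */
    for (int i = 0; i < NSTREAMS; i++)
        pthread_create(&tid[i], NULL, run_stream, NULL);
    for (int i = 0; i < NSTREAMS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
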
Future Work
Find the "missing" counters, if possible
Latency
Continued work on middleware tools
Applications
Acknowledgments
This research used resources of the ESnet Testbed, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-05CH11231.
This research was also supported by NSF grant CNS-0917315.
Related Work graphic designed by Christina Mao
Affinity, or Core Binding, refers to which core handles a particular process.
Network Receive Process --> "Flow"
Application Processes --> "Application"
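
For illustration only (this is not the experiment's actual harness), core binding for the "Application" side can be done with sched_setaffinity(2); the receive "Flow" is commonly steered by writing a CPU mask to the NIC interrupt's /proc/irq/<N>/smp_affinity entry.

/* Minimal sketch: pin the calling process (the "Application") to one core. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* Core number is supplied by the caller; 0 is just a default. */
    int core = (argc > 1) ? atoi(argv[1]) : 0;

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);

    /* Bind this process to the chosen core. */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("application pinned to core %d\n", core);

    /* The receiving application (e.g., the TCP sink) would run here. */
    return 0;
}
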
[Diagram: System Under Test — two Intel Xeon E5-2667 sockets (Socket 0 and Socket 1, Core 0 through Core 11) joined by QPI, each with its own North Bridge and PCI attachment; the NIC delivers the receive Flow to one socket while the Application is bound to a selected core.]
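
As a small sketch (not from the slides), the core-to-socket layout in the diagram above can be confirmed on Linux by reading each core's physical_package_id from sysfs.

#include <stdio.h>

int main(void)
{
    /* Walk CPUs until a topology file is missing; print core -> socket. */
    for (int cpu = 0; ; cpu++) {
        char path[128];
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/topology/physical_package_id",
                 cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break;
        int socket_id;
        if (fscanf(f, "%d", &socket_id) == 1)
            printf("core %d -> socket %d\n", cpu, socket_id);
        fclose(f);
    }
    return 0;
}
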