Transcript of NDM-2013
Wu et al., A-TFN, 2012
A Worst-Case Scenario
Shared Last-Level Cache
Flow Affinity: Core receiving the network flow
Interrupt Affinity: Core the NIC interrupts
Impact of different types of affinity
Impact of NUMA on end-to-end transfer at very high line rates
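As a concrete sketch, the affinities above (together with application affinity, defined later in the transcript) can be set from a Linux shell. The IRQ number 94, the core numbers, and the ./receiver binary are assumptions for illustration:

```shell
# Application affinity: pin the receiving application to core 2
# (./receiver is a hypothetical binary standing in for the real application)
taskset -c 2 ./receiver &

# Interrupt affinity: steer IRQ 94 (assumed to be the NIC's RX queue)
# to core 2 as well; 4 is the CPU bitmask 0b100.
echo 4 > /proc/irq/94/smp_affinity

# Flow affinity: with RPS/RFS off (as in these tests), the core that
# services the NIC interrupt also does the protocol processing for the flow.
```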
Tools and Parameters
CentOS Linux, kernel version 2.6.32-220
TCP congestion control: CUBIC, HTCP, and Reno all produced similar results
SKB (socket buffer) size: autotuned (up to 100 MB)
MTU: 9000 bytes
RPS/RFS/irqbalance remained off
netperf Version 2.6.0
Intel PCM Version 2.5.1
The End-System Bottleneck
The network is no longer the limiting factor
Tuning in Linux kernel
MTU at 9000 (“Jumbo Frames”)
SKB size left variable (kernel autotuning)
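On a recent Linux system, the tuning above might look like the following; the interface name eth2 and the exact autotuning ceiling are assumptions:

```shell
ip link set eth2 mtu 9000                  # Jumbo frames
# Let the kernel autotune socket buffers, with a ~100 MB ceiling
sysctl -w net.ipv4.tcp_rmem="4096 87380 104857600"
sysctl -w net.ipv4.tcp_wmem="4096 65536 104857600"
sysctl -w net.ipv4.tcp_congestion_control=cubic   # HTCP/Reno behaved similarly
systemctl stop irqbalance                  # keep interrupt placement static
echo 0 > /sys/class/net/eth2/queues/rx-0/rps_cpus  # RPS off
```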
Tuning in the Application
Limited knowledge of I/O architecture
Test a data transfer using every combination of flow and application affinity
netperf -T option
Capture performance data with Intel’s Performance Counter Monitor
Hold all other parameters constant
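The combinatorial sweep can be sketched as a dry run that prints one netperf invocation per (application core, flow core) pair; the 12-core count and the peer address 10.0.0.2 are assumptions, and netperf's -T local[,remote] option does the actual application pinning:

```shell
# Print one netperf command per (application core, flow core) combination.
# In the real experiment each command would be executed while Intel PCM
# records per-core counters.
sweep() {
  for app_core in $(seq 0 11); do
    for flow_core in $(seq 0 11); do
      # -T pins the local netperf process; the flow core is selected
      # separately via the NIC's IRQ/flow-steering configuration.
      echo "netperf -H 10.0.0.2 -l 30 -T ${app_core},  # flow on core ${flow_core}"
    done
  done
}
sweep
```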
Run OS images directly on the hardware (no virtualization)
Benefits of ESnet
Sharing sockets, but not cores, produces maximum throughput
Memory is utilized less on socket 1
The NIC is directly connected to this socket
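Whether the NIC is local to a given socket can be checked from sysfs; eth2 is an assumed interface name:

```shell
# NUMA node (socket) the NIC's PCIe slot is attached to; -1 means unreported
cat /sys/class/net/eth2/device/numa_node
# Cross-reference with the node/core/memory layout of the machine
numactl --hardware
```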
Run more statistical analyses on more data
Pinpoint the location of the end-system bottleneck in the receive and send processes
CAAD initiative: Automatically determine the optimal affinitization parameters
Do similar research and benchmarks on RDMA on NUMA end systems
Application Affinity: Core that runs the receiving application
Characterizing the Impact of End-System Affinities on the End-to-End Performance of High-Speed Flows
Matthew K. Farrens
CPU Clock-Network Line Rate Gap
The End-System Bottleneck
Non-Uniform Memory Access
Guy Ben-Haim, Intel, 2012
"Characterization of Input/Output Bandwidth Performance Models in NUMA Architecture for Data Intensive Applications"
"Efficient Wide Area Data Transfer Protocols for 100 Gbps Networks and Beyond"
More Related Work
"Introspective end-system modeling to optimize the transfer time of rate based protocols"
"A Transport-Friendly NIC for Multicore Multiprocessor Systems"
ESnet's 100G Testbed
System Under Test
Tan Li, Yufei Ren, Dantong Yu, Shudong Jin,
and Thomas Robertazzi
Ezra Kissel, Martin Swany, Brian Tierney,
and Eric Pouyoul
Wenji Wu, Phil DeMar, and Matt Crawford
Vishal Ahuja, Amitabha Banerjee, Matthew Farrens, Dipak Ghosal, and Giuseppe Serazzi
Cores and Sockets
Theme & Layout by Christina Mao