Characterizing the Impact of End-System Affinities on the End-to-End Performance of High-Speed Flows

Nathan Hanford

on 19 November 2013

Transcript of NDM-2013

Receive Process
Wu et al., A-TFN, 2012
End-System Affinities:
A Worst-Case Scenario

[Diagram: four cores (Core 1 through Core 4), each with its own lowest-level cache, sharing a middle-level cache and the last-level cache]
Flow Affinity: the core that receives the network flow
Interrupt Affinity: the core that the NIC interrupts
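These affinity settings can be manipulated directly from a process. As a minimal illustration (not from the slides), Python's os.sched_setaffinity, a Linux-only wrapper around sched_setaffinity(2), pins the calling process to a core, which is one way to control application affinity:

```python
import os

def pin_to_core(core):
    # Restrict the calling process to a single core (Linux-only);
    # the scheduler will then keep the receiving application there.
    os.sched_setaffinity(0, {core})
    # Return the resulting mask so the pinning can be verified.
    return os.sched_getaffinity(0)
```

Interrupt affinity, by contrast, is set outside the process, e.g. by writing a CPU mask to the NIC's /proc/irq/&lt;N&gt;/smp_affinity entry.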
Research Goals
Impact of different types of affinity
Impact of NUMA on end-to-end transfer at very high line rates
Tools and Parameters
CentOS Linux, kernel version 2.6.32-220
TCP congestion control: CUBIC, HTCP, and Reno all produced similar results
SKB size: automatic (up to 100 MB)
MTU: 9000 bytes
RPS/RFS/irqbalance remained off
netperf Version 2.6.0
Intel PCM Version 2.5.1
Performance Characteristics
The End-System Bottleneck
The network is no longer the limiting factor
Protocol processing
Acknowledging data
Tuning in the Linux kernel
MTU at 9000 (“Jumbo Frames”)
SKB Size left variable
Tuning in the Application
Zero-copy (TCP_SENDFILE)
Human involvement
Limited knowledge of I/O architecture
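As a sketch of the zero-copy path mentioned above: netperf's TCP_SENDFILE test uses sendfile(2), which Python exposes as os.sendfile; the illustration below (not from the talk) transmits a file without copying it through user-space buffers:

```python
import os
import socket
import tempfile

def zero_copy_send(path, sock):
    # Hand the file to the kernel with sendfile(2); data moves to the
    # socket without an intermediate user-space copy.
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        sent = 0
        while sent < size:
            sent += os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
    return sent
```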
Performance Tuning
Test a data transfer using every combination of flow and application affinity
netperf -T option
Capture performance data with Intel’s Performance Counter Monitor
Hold all other parameters constant
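The sweep described above lends itself to a small driver script. A hedged Python sketch (placeholder host name; flow affinity would be steered out-of-band, e.g. via /proc/irq/&lt;N&gt;/smp_affinity) that enumerates every (flow core, application core) pair and the matching netperf invocation:

```python
import itertools

def affinity_sweep(n_cores, host="receiver.example.net"):
    # One run per (flow core, application core) combination; netperf's
    # -T option binds the local netperf thread (application affinity),
    # while flow affinity is configured separately on the NIC/IRQs.
    runs = []
    for flow_core, app_core in itertools.product(range(n_cores), repeat=2):
        runs.append({
            "flow_core": flow_core,
            "app_core": app_core,
            "cmd": ["netperf", "-H", host, "-T", str(app_core)],
        })
    return runs
```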

Experimental Method
Rapidly reconfigurable
Run images directly on hardware
“Quiet” network

Benefits of ESnet
Sharing a socket, but not a core, produces maximum throughput
Memory utilization is lower on socket 1
The NIC is directly connected to this socket
Run more statistical analyses on more data
Pinpoint the location of the end-system bottleneck in the receive and send processes
CAAD initiative: Automatically determine the optimal affinitization parameters
Do similar research and benchmarks on RDMA on NUMA end systems
Future Work
Application Affinity: the core that runs the receiving application
Characterizing the Impact of

End-System Affinities

on the End-to-End Performance of
High-Speed Flows

Nathan Hanford
Vishal Ahuja
Mehmet Balman
Matthew K. Farrens
Dipak Ghosal
Eric Pouyoul
Brian Tierney
CPU Clock-Network Line Rate Gap
The End-System Bottleneck
[Diagram: the receive path from the NIC ring buffer through the middle-level cache to the receiving application]
Non-Uniform Memory Access
Guy Ben-Haim, Intel, 2012
Related Work
"Characterization of Input/Output Bandwidth Performance Models in NUMA Architecture for Data Intensive Applications"
"Efficient Wide Area Data Transfer Protocols for 100 Gbps Networks and Beyond"
More Related Work
"Introspective end-system modeling to optimize the transfer time of rate based protocols"
"A Transport-Friendly NIC for Multicore Multiprocessor Systems"

ESnet's 100G Testbed
System Under Test
Tan Li, Yufei Ren, Dantong Yu, Shudong Jin,
and Thomas Robertazzi
Ezra Kissel, Martin Swany, Brian Tierney,
and Eric Pouyoul
Wenji Wu, Phil DeMar, and Matt Crawford
Vishal Ahuja, Amitabha Banerjee, Matthew Farrens, Dipak Ghosal, and Giuseppe Serazzi
Cores and Sockets
[Diagram: Socket 0 and Socket 1, contrasting core placements on the same socket with placements on different sockets]
Theme & Layout
by Christina Mao