
GDC 2011 DirectX Day

Presentation of the notes taken during the day. It's in no way a complete summary of each presentation, just the points I found noteworthy.

Lutz Latta

on 27 April 2013


Transcript of GDC 2011 DirectX Day

DX11 Performance Gems
Slides at http://developer.amd.com/gpu_assets/GDC2011-DX11-Perf-Gems.pps
  • Multi-threaded command buffer generation, aka "Deferred Contexts" in D3D11
  • Use tessellation/domain shaders for more efficient particle shading: decide which attributes to calculate at which frequency
  • Soft particles: the depth buffer can now be bound simultaneously for "real" depth testing (as a read-only depth-stencil view) and for sampling in the shader, like on consoles

Deferred Shading Optimizations
Slides at http://developer.amd.com/gpu_assets/Deferred%20Shading%20Optimizations.pps
AMD GPUs:
  • Each RT adds to export cost
  • Avoid slow formats: R32G32B32A32, R32G32, R32, R32G32B32A32f, R32G32f, R16G16B16A16, plus R32F, R16G16 and R16 on older GPUs
  • Total export cost = (Num RTs) * (Slowest RT)
NVIDIA GPUs:
  • Each RT adds to export cost
  • RT export cost is proportional to bit depth, except:
      ◦ Less than 32 bpp is the same speed as 32 bpp
      ◦ sRGB formats are slower
      ◦ 1010102 and 111110 formats are slower than 8888
  • Total export cost = Cost(RT0) + Cost(RT1) + ...

Always make sure your light volumes are geometry-optimized!
  • Optimize for both index re-use (post-VS cache) and sequential vertex reads (pre-VS cache)
  • A common oversight for algorithmically generated meshes (spheres, cones, etc.)

Use discard to get rid of pixels that do not contribute any light, regardless of the light processing method used:
    if ( dot(vColor.xyz, 1.0) == 0 ) discard;

Firaxis LORE (Low Overhead Rendering Engine)
Slides at http://developer.amd.com/gpu_assets/Firaxis%20LORE.pps
  • D3D11 multi-threaded draw call submission
  • "Stateless" packetized rendering
      ◦ All rendering state is bundled into one packet to be submitted together
      ◦ The entire frame is queued up before draw calls are submitted

DirectX 11 Rendering in Battlefield 3
Slides at http://www.slideshare.net/DICEStudio/directx-11-rendering-in-battlefield-3
  • Full deferred renderer
  • Lighting is done in a compute shader that operates on screen-space tiles
  • Awesome "trick": switch the work domain within a single compute shader
      1. Initially, one shader thread works on one pixel
      2. Then one thread processes a subrange of the lights for the whole pixel group
      3. Then one thread works on one pixel again
    Compute shader thread-group communication for the win!
  • Terrain uses tessellation with displacement mapping
  • Parallel draw call submission with D3D11 deferred contexts
  • Several anti-aliasing options, depending on the quality settings:
      ◦ MSAA, with per-sample evaluation of the light shaders where needed, is the highest-quality option
      ◦ FXAA: a post-process similar to MLAA; see the NVIDIA example
      ◦ SRAA: another post-process, used in conjunction with MSAA; see the I3D 2011 paper

DirectCompute Accelerated Separable Filters
Slides at http://developer.amd.com/gpu_assets/DirectCompute%20Accelerated%20Separable%20Filters.pps
  • Compute shaders can communicate relatively cheaply within a thread group
  • Image processing kernels can use thread-group shared memory to avoid redundant data reads
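The shared-memory point can be illustrated with a minimal CPU-side sketch in C++. It is not code from the slides. It assumes a single horizontal pass and uses a box kernel as a stand-in for whatever separable kernel (e.g. a Gaussian) a real filter would use. The names `kGroupSize`, `kRadius`, `Fetch`, `BoxFilterNaive` and `BoxFilterShared` are hypothetical.

In the shared variant, each simulated thread group reads its tile plus a `kRadius`-wide apron from "global memory" once. Every thread then reads its taps from that copy only. For a 20-pixel row with 8-thread groups and a 5-tap kernel, that is 36 global reads instead of the naive 100.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parameters for this sketch.
constexpr int kGroupSize = 8;  // threads per thread group, one output pixel each
constexpr int kRadius = 2;     // filter radius: 2 * kRadius + 1 taps

// Clamp-to-edge fetch, standing in for a texture load with clamp addressing.
float Fetch(const std::vector<float>& row, int x) {
    assert(!row.empty());
    x = std::clamp(x, 0, static_cast<int>(row.size()) - 1);
    return row[static_cast<std::size_t>(x)];
}

// Naive horizontal box filter: every output pixel reads all of its taps
// from "global memory" (2 * kRadius + 1 reads per pixel).
std::vector<float> BoxFilterNaive(const std::vector<float>& row) {
    std::vector<float> out(row.size());
    for (int x = 0; x < static_cast<int>(row.size()); ++x) {
        float sum = 0.0f;
        for (int t = -kRadius; t <= kRadius; ++t) sum += Fetch(row, x + t);
        out[static_cast<std::size_t>(x)] = sum / (2 * kRadius + 1);
    }
    return out;
}

// Same filter, organised like a compute shader: each group first copies its
// tile plus an apron on each side into a "groupshared" array (one global read
// per texel), then every thread filters from that copy only.
std::vector<float> BoxFilterShared(const std::vector<float>& row, int* globalReads) {
    const int width = static_cast<int>(row.size());
    std::vector<float> out(row.size());
    *globalReads = 0;
    for (int groupStart = 0; groupStart < width; groupStart += kGroupSize) {
        std::array<float, kGroupSize + 2 * kRadius> shared{};
        // Phase 1: cooperative load of tile + apron into shared memory.
        for (int i = 0; i < static_cast<int>(shared.size()); ++i) {
            shared[static_cast<std::size_t>(i)] = Fetch(row, groupStart - kRadius + i);
            ++*globalReads;
        }
        // On the GPU, GroupMemoryBarrierWithGroupSync() would sit here.
        // Phase 2: thread tid needs x - kRadius .. x + kRadius, which are
        // shared[tid] .. shared[tid + 2 * kRadius].
        for (int tid = 0; tid < kGroupSize && groupStart + tid < width; ++tid) {
            float sum = 0.0f;
            for (int t = 0; t <= 2 * kRadius; ++t) sum += shared[static_cast<std::size_t>(tid + t)];
            out[static_cast<std::size_t>(groupStart + tid)] = sum / (2 * kRadius + 1);
        }
    }
    return out;
}
```

Both functions add the same taps in the same order, so their outputs match exactly. Only the number of global reads differs.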
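The Battlefield 3 "switch the work domain" trick can be modelled as sequential CPU loops in C++. This is a simplified sketch, not DICE's implementation. It assumes a 4x4 tile and point lights treated as spheres. It also assumes an orthographic space in which x/y are pixel coordinates and z is linear depth. The names `Light`, `kTile`, `Lit`, `AxisDist2` and `ShadeTile` are hypothetical.

The three loops mirror the three phases:
1. One thread per pixel reduces the tile's depth range.
2. Each thread culls its own subrange of lights against the tile's bounding box into a shared list.
3. One thread per pixel shades using only the surviving lights.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, hypothetical setup: x/y are pixel coordinates, z is linear depth.
struct Light { float x, y, z, radius; };

constexpr int kTile = 4;                 // hypothetical 4x4 tile
constexpr int kThreads = kTile * kTile;  // one thread per pixel

// Does the light's sphere contain the point (px, py, pz)?
bool Lit(const Light& l, float px, float py, float pz) {
    float dx = px - l.x, dy = py - l.y, dz = pz - l.z;
    return dx * dx + dy * dy + dz * dz <= l.radius * l.radius;
}

// Squared distance from v to the interval [lo, hi] (0 if inside).
float AxisDist2(float v, float lo, float hi) {
    float d = v < lo ? lo - v : (v > hi ? v - hi : 0.0f);
    return d * d;
}

// Shades the tile whose top-left pixel is (tx, ty); depth is row-major with width w.
// Returns, per tile pixel, how many lights affect it; *tileLightCount receives
// the size of the culled per-tile light list.
std::vector<int> ShadeTile(const std::vector<float>& depth, int w, int tx, int ty,
                           const std::vector<Light>& lights, std::size_t* tileLightCount) {
    // Phase 1: one thread per pixel, reducing the tile's min/max depth
    // (e.g. InterlockedMin/Max on groupshared uints in HLSL).
    float zMin = depth[static_cast<std::size_t>(ty * w + tx)], zMax = zMin;
    for (int t = 0; t < kThreads; ++t) {
        float z = depth[static_cast<std::size_t>((ty + t / kTile) * w + tx + t % kTile)];
        zMin = std::min(zMin, z);
        zMax = std::max(zMax, z);
    }
    // (barrier)
    // Phase 2: thread t tests lights t, t + kThreads, ... against the tile's box.
    std::vector<Light> tileLights;  // stands in for a groupshared light list
    const float x0 = static_cast<float>(tx), y0 = static_cast<float>(ty);
    for (int t = 0; t < kThreads; ++t) {
        for (std::size_t i = static_cast<std::size_t>(t); i < lights.size(); i += kThreads) {
            const Light& l = lights[i];
            float d2 = AxisDist2(l.x, x0, x0 + kTile) + AxisDist2(l.y, y0, y0 + kTile) +
                       AxisDist2(l.z, zMin, zMax);
            if (d2 <= l.radius * l.radius) tileLights.push_back(l);  // conservative test
        }
    }
    *tileLightCount = tileLights.size();
    // (barrier)
    // Phase 3: one thread per pixel again, looping only over the tile's lights.
    std::vector<int> result(kThreads, 0);
    for (int t = 0; t < kThreads; ++t) {
        int x = tx + t % kTile, y = ty + t / kTile;
        float pz = depth[static_cast<std::size_t>(y * w + x)];
        for (const Light& l : tileLights) result[static_cast<std::size_t>(t)] += Lit(l, x + 0.5f, y + 0.5f, pz) ? 1 : 0;
    }
    return result;
}
```

The sphere-versus-box test is conservative, so every light that touches a pixel in the tile survives the culling phase. The per-pixel result therefore matches a brute-force loop over all lights.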