An investigation into the techniques used in state of the art videogame shadows.
God of War III : Shadows
These are really crisp and clean shadows based on fairly standard implementation of cascaded shadow maps with PCF filtering but has some great techniques for culling non-shadowed parts of the scene. They obtain really high quality shadows by using a multipass deferred "tiling" approach where they supersample the shadows in the nearest cascade. This works really well, is flexible and is made possible by the culling techniques utilizing the PS3 depth bounds test. Very cool stuff.
Unfortunately, the depth bounds test that the God of War III shadows depend upon so heavily is an Nvidia extension. The Nvidia depth bounds test utilizes hierarchical z so it is very fast. On hardware that doesn't support the hi-near-z and hi-far-z extension it should be possible to use hi-stencil by rendering view aligned boxes.
http://www.gdcvault.com/play/1012341/God-of-War-III
Algorithm
Parallel light sources for all shadows for single matrix multiply with no w divide.
float3 shadowPos = pixelPos * posToShadowMatrix;Holy Grail = 1 filtered texel per pixel
"White Buffer" used to store full screen shadow buffer data for up to 4 lights. Shadow map cascades are no longer sampled in the opaque rendering pass. Renders each cascade's shadow map into the WB. The WB is then sampled once in the opaque pass.
Min blending is used to render the cascades to the WB in an order independent manner.
ZCull Unit is used to minimize the cost of full screen cascade passes. ZCull unit has conservative ZNear and conservative ZFar for "depth bounds test".
As the figure below illustrates, we are only interested in the areas of the screen that have depth values which fall within the z near and z far values.
So, the rendering process proceeds (for a 3 cascade shadow) as:
ZPrePass
Cascade 2 Shadow Map
Render Cascade 2 to WB
Cascade 1 Shadow Map
Render Cascade 1 to WB
Cascade 0 Shadow Map
Render Cascade 0 to WB
Opaque
This deferred approach can be used to apply different settings to each cascade. Eg: different sampling quality, different resolutions. It can also be used to tile the cascades and increase the effective resolution of the cascades. For example, we might want to double the effective resolution of cascade 0 (the nearest cascade) by rendering it as 4 x 1024 x 1024 tiles rather than just 1 x 1024 x 1024 tile.
ZPrePassThat is a lot of rendering passes that can only be fast if we apply some aggressive optimizations.
Cascade 2 Shadow Map
Render Cascade 2 to WB
Cascade 1 Shadow Map
Render Cascade 1 to WB
Cascade 0 Shadow Map (TILE 0)
Render Cascade 0 to WB (TILE 0)
Cascade 0 Shadow Map (TILE 1)
Render Cascade 0 to WB (TILE 1)
Cascade 0 Shadow Map (TILE 2)
Render Cascade 0 to WB (TILE 2)
Cascade 0 Shadow Map (TILE 3)
Render Cascade 0 to WB (TILE 3)
Opaque
Using this technique the GOW team could achieve 10 megapixel resolution (approx 3600x2800) shadows for the nearest cascade.
Optimization
In most GOW scenes shadows run at 4-5 ms.
Depth reconstruction has to go away - one pass to make fp32 depth buffer. From 5 cycles to 1 texture read.
Two speed hits for shadows:
1) Rendering casters
2) Full screen passes for receivers
Cull Optimizations
Two reasons why pixel not shadowed:
1) Not in volume made by casters
2) In caster volume but not right receiver type
Receiver type culling & volume culling
Receiver Type Culling
Two kinds of receiver type culling used in GOW:
1) Baked shadow integration. Eg: ground may use baked shadows but there will be a matching hidden caster to cast dynamic shadows on dynamic shadow receivers. Hidden casters are not visible in the world but are only rendered to the cascade shadow map. Hidden casters are rendered in unique shadow map cascades.
ZCull Unit culls stencil too.
- Can cull based on depth independent criteria.
- ZPrepass can lay down scene stencil for free.
- Same reject rate as Z.
Stencil culling is used to mask out baked shadow receivers when rendering the hidden casters.
2) Minor characters cast only on geometry marked as "background". So minor characters are not self casting. The beauty of this is the more background characters there are on screen the fewer pixels are affected by shadows. This optimization appears to make more sense when you have baked shadows.
Volume Culling
CPU based algorithm, GPU results
2D Cells from 3D Cells
No more full screen passes.
CPU processing per 2D cell. We want to know 3 things for every cell:
1) Do we need to draw it?
2) How big does it really need to be?
3) What distance to camera are shadows?
(64 cells 8x8 for GOW)
Casters and receivers in the grid. What casters hit which receivers?
For each caster, the sphere bounds are stretched in the shadow dir to form a capsule. Rather than use a capsule for visibility testing this was converted to a AABB (or parallel projected frustum).
Side view of the grid showing how z near, z far are determined for a 2D screen cell. Partial use of a cell can also be determined by projecting the bounds to the 2D cell.
Summary
These are some great techniques for achieving high quality shadows. They are more applicable to a game that uses baked environment shadows and uses static environment lighting. For a game with fully dynamic lighting and shadows these techniques may not be as useful.
No comments:
Post a Comment