FidelityFX Stochastic Screen-Space Reflections 1.4
AMD FidelityFX Stochastic Screen Space Reflections (SSSR) is a highly optimized hierarchical screen space traversal kernel for reflections. To support glossy reflections, the ray directions are randomly jittered and the result is denoised to provide a temporally and spatially stable image.
Introduction
FidelityFX Stochastic Screen-Space Reflections (or SSSR for short) is a technique which aims to produce high-quality screen-space reflections without the need to render additional reflection geometry passes and shading passes.
At its core, the algorithm uses a hierarchical depth buffer traversal kernel that ray marches through a depth surface – originally rendered from the point of view of the main camera – and processes the results into a signal which can be composited as a reflection. SSSR accounts for the roughness of the reflecting surface: by analysing the roughness, it can adjust the traversal rate from full-rate for mirror reflections all the way down to quarter-rate for glossier reflections.
Supported platforms
This release of FidelityFX Stochastic Screen-Space Reflections supports the following platforms:
- Windows 10+
- DirectX® 12
- Vulkan® 1.x
Shading language and API requirements
DirectX 12 + HLSL
- HLSL
  - `CS_6_2`
  - `CS_6_6`†
† `CS_6_6` is used on some hardware which supports 64-wide wavefronts.
Vulkan + GLSL
- Vulkan 1.x
- GLSL 4.50 with the following extensions:
  - `GL_EXT_samplerless_texture_functions`
  - `GL_KHR_shader_subgroup_basic` (controlled by `FFX_WAVE`)
  - `GL_KHR_shader_subgroup_ballot` (controlled by `FFX_WAVE`)
  - `GL_KHR_shader_subgroup_shuffle` (controlled by `FFX_WAVE`)
Note that the GLSL compiler must also support `GL_GOOGLE_include_directive` for `#include` handling used throughout the GLSL shader system.
Quick start checklist
To use SSSR you should follow the steps below:
- Double click `BuildAllNativeEffectsSolution.bat` in the `samples` directory.
- Go to the `build` directory, open the `FidelityFXparty Native SDK.sln` solution and build the solution matching your API.
- Copy the API libraries `ffx_sssr_x64.lib` and `ffx_denoiser_x64.lib` from `bin/ffx_sdk` into the folder in your project which contains third-party libraries.
- Copy the library matching the SDK backend you want to use, e.g. `bin/ffx_sdk/ffx_backend_dx12_x64.lib` for DirectX 12.
- Copy the following core API header files from `sdk/include/FidelityFX/` into your project: `host/ffx_sssr.h`, `host/ffx_types.h`, `host/ffx_error.h`, `host/ffx_util.h`, `gpu/sssr/ffx_sssr_common.h` and `gpu/sssr/ffx_sssr_resources.h`. Take care to maintain the relative directory structure at the destination of the file copying.
- Copy the header files for the API backend of your choice, e.g. for DirectX 12 you would copy `host/backends/dx12/ffx_dx12.h`. Again, take care to maintain the relative directory structure at the destination.
- Include the `host/ffx_sssr.h` header file in your codebase where you wish to interact with SSSR.
- Create a backend for your target API. E.g. for DirectX 12 you should call `ffxGetInterfaceDX12`. A scratch buffer should be allocated of the size returned by calling `ffxGetScratchMemorySizeDX12`, and the pointer to that buffer passed to `ffxGetInterfaceDX12`.
- Create an SSSR context by calling `ffxSssrContextCreate`. The parameters structure should be filled out matching the configuration of your application. See the API reference documentation for more details.
- Each frame you should call `ffxSssrContextDispatch` to launch SSSR workloads. The parameters structure should be filled out matching the configuration of your application. See the API reference documentation for more details.
- When your application is terminating (or you wish to destroy the context for another reason) you should call `ffxSssrContextDestroy`. The GPU should be idle before calling this function.
Integration guidelines
Input Resources
The following table enumerates all external inputs required by either SSSR or the FidelityFX Denoiser.
All resources are from the current rendered frame. For DirectX 12 and Vulkan applications, all input resources should be transitioned to `D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE` and `VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL` respectively before calling `ffxSssrContextDispatch`.
Name | Format | Type | Notes |
---|---|---|---|
Color buffer | | Texture | The HDR render target of the current frame containing the scene lit with direct lighting only. SSSR takes care of indirect lighting, including environment map / probe sampling fallback. |
Depth buffer | | Texture | The depth buffer for the current frame provided by the application. The data should be provided as a single floating point value, the precision of which is under the application’s control. The configuration of the depth should be communicated to SSSR via the |
Normal buffer | | Texture | The normal buffer for the current frame provided by the application in the [-1, 1] range. If your application stores normal vectors with a different range, you may use the |
Material parameters buffer | | Texture | The roughness buffer for the current frame provided by the application. By default, SSSR expects the roughness to be the perceptual / artist set roughness squared. If your Gbuffer stores the artist set roughness directly, please set the |
Motion vectors | | Texture | The 2D motion vectors for the current frame provided by the application in the [-0.5, 0.5] range, +Y is top-down. If your application renders motion vectors with a different range, you may use the |
Environment Map | | TextureCube | A texture cube used as a fallback when the surface to be shaded is rougher than a certain threshold or the screen space ray doesn’t hit the Color Buffer. |
BRDF LUT | | Texture | The BRDF look up table to be used in conjunction with the prefiltered environment map used as fallback. |
Reflections Target | | Texture | The surface to write the output of the SSSR algorithm to; it has to be cleared prior to rendering. This reflections buffer can then be composited on top of any other surface by the application in a later pass. (See the `apply_reflections` shader in the sample.) |
Depth buffer configurations
An application should inform the SSSR API of its depth buffer configuration by setting the appropriate flags during the creation of the `FfxSssrContext`. The table below contains the appropriate flags.
SSSR flag | Note |
---|---|
 | A bit indicating that the input depth buffer data provided is inverted, as in the [1..0] range. |
Providing motion vectors
Space
The SSSR algorithm in itself doesn’t require the use of motion vectors as it is not a temporal algorithm. However, in order to get temporally and spatially stable reflections, the FidelityFX Reflections Denoiser is used on the output of the SSSR algorithm. A key part of a temporal algorithm is the provision of motion vectors. The FidelityFX Reflections Denoiser accepts motion vectors in 2D which encode the motion from a pixel in the current frame to the position of that same pixel in the previous frame. The algorithm expects that motion vectors are provided by the application in the [-0.5, 0.5] range, +Y is top-down.
If your application computes motion vectors in another space – for example normalized device coordinate space – then you may use the `motionVectorScale` field of the `FfxSssrDispatchDescription` structure to instruct SSSR to adjust them to match the expected range for the denoiser. The example HLSL and C++ code below illustrates how NDC-space motion vectors can be scaled using the SSSR host API.
// GPU: Example of application NDC motion vector computation
float2 motionVector = (currentPosition.xy / currentPosition.w) - (previousPosition.xy / previousPosition.w);
// CPU: Matching SSSR motionVectorScale configuration
dispatchParameters.motionVectorScale.x = 0.5f; // Texture space is [0, 1] while NDC is [-1, 1]
dispatchParameters.motionVectorScale.y = -0.5f; // +Y is top down
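As a quick sanity check of the scale values above, the following minimal C++ sketch applies them to an NDC-space motion vector on the CPU. The `Float2` type and both helper functions are hypothetical names introduced for illustration only, and the component-wise multiply is an assumption about how the scale is applied; it is not taken from the SDK source.

```cpp
// Hypothetical 2D vector type used only for this illustration.
struct Float2 {
    float x;
    float y;
};

// Assumed behaviour: the scale is applied component-wise to the motion vector.
Float2 ScaleMotionVector(Float2 motionVector, Float2 motionVectorScale) {
    return { motionVector.x * motionVectorScale.x,
             motionVector.y * motionVectorScale.y };
}

// Scale matching the configuration above: NDC spans [-1, 1] while texture
// space spans [0, 1] (factor 0.5), and +Y points down in texture space
// (sign flip on Y).
Float2 NdcToTextureSpaceScale() {
    return { 0.5f, -0.5f };
}
```

With these values, a motion of 2.0 in NDC x (the full width of the screen) becomes 1.0 in texture space, and a motion of -1.0 in NDC y becomes +0.5.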
Host API
While it is possible to generate the appropriate intermediate resources, compile the shader code, set the bindings, and submit the dispatches yourself, it is much easier to use the SSSR host API which is provided.
To use the API, you should link the SSSR libraries (more on which ones shortly) and include the following header files from `sdk/include/FidelityFX/host`:
To use the SSSR API, you should link `ffx_sssr_x64.lib` and `ffx_denoiser_x64.lib` from the `sdk/bin/ffx_sdk` folder, which will provide the symbols for the application-facing APIs. However, the SDK’s API has a modular backend, which means that different graphics APIs and platforms may be targeted through the use of a matching backend. Therefore, you should also link the backend library matching your requirements, referencing the table below.
Target | Library name |
---|---|
DirectX® 12 | |
Vulkan® | |
Please note the modular architecture of the SDK API allows for custom backends to be implemented. See the Modular backend section in the FSR2 documentation for more details.
To begin using the API, the application should first create a `FfxSssrContext` structure. This structure should be located somewhere with a lifetime approximately matching that of your backbuffer; somewhere on the application’s heap is usually a good choice. By calling `ffxSssrContextCreate` the `FfxSssrContext` structure will be populated with the data it requires. Moreover, a number of calls will be made from `ffxSssrContextCreate` to the backend which is provided to `FfxSssrContext` as part of the `FfxSssrContextDescription` structure. These calls will perform such tasks as creating intermediate resources required by SSSR and setting up shaders and their associated pipeline state. The SSSR API does not perform any dynamic memory allocation.
`ffxSssrContextDispatch` should be called each frame. This function accepts the `FfxSssrContext` structure that was created earlier in the application’s lifetime as well as a description of the inputs and parameters to be used to compute screen space reflections. This description is provided by the application filling out a `FfxSssrDispatchDescription` structure.
Destroying the context is performed by calling `ffxSssrContextDestroy`. Please note that the GPU should be idle before attempting to call `ffxSssrContextDestroy`, and the function does not perform implicit synchronization to ensure that resources being accessed by SSSR are not currently in flight. The reason for this choice is to avoid SSSR introducing additional GPU flushes for applications that already perform adequate synchronization at the point where they might wish to destroy the `FfxSssrContext`. This allows an application to perform the most efficient possible creation and teardown of the SSSR API when required.
The technique
Algorithm structure
The SSSR algorithm is implemented in a series of stages, which are as follows:
- Hierarchical depth generation
- Tile classification
- Blue noise texture generation
- Indirect arguments generation
- Intersection
- Denoising
Hierarchical depth generation
The hierarchical depth generation pass makes use of the FidelityFX Single Pass Downsampler (SPD) to generate a pyramid of mip maps from the scene depth provided by the application. This hierarchical depth buffer is used by SSSR to accelerate the raymarching when the algorithm detects that a coarser mip can be used.
Resource Inputs
The following table contains all resources consumed by the Hierarchical depth generation stage.
Name | Format | Type | Notes |
---|---|---|---|
Depth buffer | | Texture | The depth buffer for the current frame provided by the application. The data should be provided as a single floating point value, the precision of which is under the application’s control. The configuration of the depth should be communicated to SSSR via the |
Resource outputs
The following table contains all resources produced or modified by the Hierarchical depth generation stage.
Name | Format | Type | Notes |
---|---|---|---|
Depth Hierarchy | | Texture | A pyramid of 7 mip maps generated from the scene input depth. |
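To make the idea of the depth pyramid concrete, here is a minimal CPU-side C++ sketch that builds a chain of progressively coarser depth mips. It is not the SPD implementation, which runs on the GPU in a single pass. The `DepthMip` type and `BuildDepthPyramid` function are hypothetical, and the use of a min reduction assumes a non-inverted depth buffer in which smaller values are closer to the camera; an inverted depth buffer would presumably need a max reduction instead.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU-side representation of one mip level of the depth pyramid.
struct DepthMip {
    std::size_t width = 0;
    std::size_t height = 0;
    std::vector<float> texels; // row-major, width * height entries
};

// Builds up to maxMipCount levels; level 0 is the full-resolution depth.
// Each coarser texel stores the closest depth of the 2x2 block it covers
// (assumption: smaller depth = closer, i.e. non-inverted depth).
std::vector<DepthMip> BuildDepthPyramid(DepthMip base, std::size_t maxMipCount) {
    std::vector<DepthMip> mips;
    mips.reserve(std::max<std::size_t>(maxMipCount, 1));
    mips.push_back(std::move(base));

    while (mips.size() < maxMipCount) {
        const DepthMip& src = mips.back();
        if (src.width == 1 && src.height == 1) {
            break; // cannot downsample any further
        }

        DepthMip dst;
        dst.width = std::max<std::size_t>(src.width / 2, 1);
        dst.height = std::max<std::size_t>(src.height / 2, 1);
        dst.texels.resize(dst.width * dst.height);

        for (std::size_t y = 0; y < dst.height; ++y) {
            for (std::size_t x = 0; x < dst.width; ++x) {
                // Clamp source coordinates so 1-texel-wide levels stay in range.
                const std::size_t x0 = std::min(2 * x, src.width - 1);
                const std::size_t x1 = std::min(2 * x + 1, src.width - 1);
                const std::size_t y0 = std::min(2 * y, src.height - 1);
                const std::size_t y1 = std::min(2 * y + 1, src.height - 1);
                const float top = std::min(src.texels[y0 * src.width + x0],
                                           src.texels[y0 * src.width + x1]);
                const float bottom = std::min(src.texels[y1 * src.width + x0],
                                              src.texels[y1 * src.width + x1]);
                dst.texels[y * dst.width + x] = std::min(top, bottom);
            }
        }
        mips.push_back(std::move(dst));
    }
    return mips;
}
```

Calling `BuildDepthPyramid(depth, 7)` mirrors the 7-level hierarchy listed in the table above; smaller inputs simply stop once a 1×1 level is reached.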
Tile classification
The tile classification pass scans the scene to detect which pixels require shooting rays and applying denoising. For the pixels that are too rough and don’t require raymarching it will evaluate lighting using the fallback environment map provided. Finally, this pass also extracts the roughness from the material parameters texture to use in the next passes.
Resource Inputs
The following table contains all resources consumed by the Tile Classification stage.
Name | Format | Type | Notes |
---|---|---|---|
Normal buffer | | Texture | The normal buffer for the current frame provided by the application in the [-1, 1] range. If your application stores normal vectors with a different range, you may use the |
Material parameters buffer | | Texture | The roughness buffer for the current frame provided by the application. By default, SSSR expects the roughness to be the perceptual / artist set roughness squared. If your Gbuffer stores the artist set roughness directly, please set the |
Environment Map | | TextureCube | A texture cube used as a fallback when the surface to be shaded is rougher than the |
Depth Hierarchy | | Texture | A pyramid of 7 mip maps generated from the scene input depth. |
Variance history buffer | | Texture | The variance of the luminance buffer generated by the previous frame. The values sampled from this buffer are compared to the value of the |
Resource outputs
The following table contains all resources produced or modified by the Tile Classification stage.
Name | Format | Type | Notes |
---|---|---|---|
Radiance Buffer | | Texture | A texture containing the radiance from the fallback environment probe for pixels whose roughness is greater than the |
Ray List | | Buffer | A buffer to store the list of rays to shoot in the intersection pass. The field |
Denoiser Tile List | | Buffer | A buffer containing the list of tiles to be passed to the denoiser. Each tile is 8×8 pixels and identified by its top-left corner thread id. |
Ray Counter | | Buffer | An atomic buffer used to store the number of rays to be shot in the Intersection pass along with the number of tiles to be denoised by the denoiser. |
Extracted Roughness | | Texture | A texture containing the roughness extracted from the material parameters buffer, see inputs |
Description
The Tile Classification stage is implemented in a fullscreen compute pass. For each pixel, we decide whether a ray needs to be shot based on multiple parameters and store that ray in a buffer.
If the surface’s roughness is greater than the `roughnessThreshold` parameter, the classifier will not store a ray for this pixel. Instead, it will simply evaluate the fallback environment map provided as input and store that value in the radiance buffer. By default, not every pixel in a quad will shoot a ray. The minimum number of rays shot per quad is controlled by the `samplesPerQuad` parameter. If this parameter is set lower than 4, we encode within the ray data which neighbor pixels should copy the result over. Setting the parameter `temporalVarianceGuidedTracingEnabled` to `true` lets the tile classifier dynamically increase the number of rays per quad up to 4 based on the luminance variance of the previous frame: if the variance stored in the variance history buffer is greater than the `varianceThreshold` parameter, then a ray is stored for this pixel.
This pass also determines which pixels will require denoising and stores this information in tiles of 64 pixels (8×8) in the denoiser tile list. Every pixel will use denoising unless the surface is perfectly smooth (mirror-like). This is especially useful for the pixels that did not shoot a ray but instead were tagged to copy the result from a neighboring pixel.
The ray counter buffer stores the number of rays to be shot by the Intersection pass and the number of tiles to pass to the denoiser.
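The per-pixel decision described above can be sketched in C++ as follows. This is an illustrative CPU model only, not the SDK shader: the `ClassifierParams` struct, the `PixelAction` enum and both functions are hypothetical, and the pattern that selects which pixels of a 2×2 quad shoot the base rays is an assumption made for the example.

```cpp
#include <cstdint>

// Hypothetical container for the parameters named in the text.
struct ClassifierParams {
    float roughnessThreshold;
    float varianceThreshold;
    std::uint32_t samplesPerQuad; // 1, 2 or 4
    bool temporalVarianceGuidedTracingEnabled;
};

enum class PixelAction {
    EnvironmentFallback, // too rough: evaluate the environment map instead
    ShootRay,            // store a ray for the intersection pass
    CopyFromNeighbor     // reuse the result of a ray shot elsewhere in the quad
};

// Assumed base pattern: which pixels of a 2x2 quad always shoot a ray.
bool IsBaseRayPixel(std::uint32_t x, std::uint32_t y, std::uint32_t samplesPerQuad) {
    const std::uint32_t indexInQuad = (x & 1u) | ((y & 1u) << 1u); // 0..3
    if (samplesPerQuad >= 4u) return true;
    if (samplesPerQuad == 2u) return indexInQuad == 0u || indexInQuad == 3u;
    return indexInQuad == 0u;
}

PixelAction ClassifyPixel(std::uint32_t x, std::uint32_t y,
                          float roughness, float varianceHistory,
                          const ClassifierParams& params) {
    if (roughness > params.roughnessThreshold) {
        return PixelAction::EnvironmentFallback;
    }
    if (IsBaseRayPixel(x, y, params.samplesPerQuad)) {
        return PixelAction::ShootRay;
    }
    // Variance-guided tracing: noisy pixels get their own ray.
    if (params.temporalVarianceGuidedTracingEnabled &&
        varianceHistory > params.varianceThreshold) {
        return PixelAction::ShootRay;
    }
    return PixelAction::CopyFromNeighbor;
}
```

In this model, a quad with `samplesPerQuad` set to 1 shoots a single ray in stable regions and up to four rays where the previous frame’s variance exceeds the threshold.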
Blue noise texture generation
The blue noise texture generation stage generates a 128×128 blue noise texture every frame based on the frame index and some precomputed textures.
Resource Inputs
The following table contains all resources consumed by the Blue noise texture generation stage.
Name | Format | Type | Notes |
---|---|---|---|
Sobol Buffer | | Texture | A 256×256 precomputed texture used to generate the blue noise texture. |
Scrambling Tile buffer | | Texture | A 512×256 precomputed texture used to generate the blue noise texture. |
Resource outputs
The following table contains all resources produced or modified by the Blue noise texture generation stage.
Name | Format | Type | Notes |
---|---|---|---|
Blue noise texture | | Texture | A 128×128 blue noise texture used for ray generation in a later pass. |
Indirect arguments generation
The indirect arguments generation pass makes use of the ray counter buffer filled by the Tile Classification stage to generate indirect dispatch arguments for the Intersection pass and the Denoising pass.
Resource Inputs
The following table contains all resources consumed by the Indirect arguments generation stage.
Name | Format | Type | Notes |
---|---|---|---|
Ray Counter | | Buffer | An atomic buffer used to store the number of rays to be shot in the Intersection pass along with the number of tiles to be denoised by the denoiser. |
Resource outputs
The following table contains all resources produced or modified by the Indirect arguments generation stage.
Name | Format | Type | Notes |
---|---|---|---|
Indirect Arguments | | Buffer | A buffer containing the indirect dispatch arguments for the Intersection pass and the denoiser passes. |
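The conversion from counters to dispatch arguments can be illustrated with the short C++ sketch below. The `DispatchArguments` struct and both functions are hypothetical, and the thread group size passed in by the caller (64 in the usage note) as well as the one-group-per-tile mapping for the denoiser are assumptions; the actual values are defined by the SDK shaders.

```cpp
#include <cstdint>

// Hypothetical mirror of the three group counts of an indirect dispatch.
struct DispatchArguments {
    std::uint32_t groupCountX;
    std::uint32_t groupCountY;
    std::uint32_t groupCountZ;
};

// One thread per ray: round the ray count up to whole thread groups.
DispatchArguments MakeIntersectionArguments(std::uint32_t rayCount,
                                            std::uint32_t threadsPerGroup) {
    const std::uint32_t groups = (rayCount + threadsPerGroup - 1u) / threadsPerGroup;
    return { groups, 1u, 1u };
}

// Assumed mapping: one thread group per 8x8 denoiser tile.
DispatchArguments MakeDenoiserArguments(std::uint32_t tileCount) {
    return { tileCount, 1u, 1u };
}
```

For example, 65 rays with 64 threads per group require two thread groups, the second of which is mostly idle.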
Intersection pass
The intersection pass does the actual depth buffer ray marching and radiance evaluation. This is the last pass of the algorithm pre-denoising and it outputs the reflections buffer to be composited on top of the direct lighting by the app.
Resource Inputs
The following table contains all resources consumed by the Intersection stage.
Name | Format | Type | Notes |
---|---|---|---|
Color buffer | | Texture | The HDR render target of the current frame containing the scene lit with direct lighting only. SSSR takes care of indirect lighting, including environment map / probe sampling fallback. |
Normal buffer | | Texture | The normal buffer for the current frame provided by the application in the [-1, 1] range. If your application stores normal vectors with a different range, you may use the |
Environment Map | | TextureCube | A texture cube used as a fallback when the intersection fails. |
Depth Hierarchy | | Texture | A pyramid of 7 mip maps generated from the scene input depth. |
Extracted Roughness | | Texture | A texture containing the roughness extracted from the material parameters buffer. |
Blue noise texture | | Texture | A 128×128 blue noise texture used to randomize ray generation. |
Resource outputs
The following table contains all resources produced or modified by the Intersection stage.
Name | Format | Type | Notes |
---|---|---|---|
Radiance Buffer | | Texture | A texture containing the result of the stochastic screen space reflection. |
Ray List | | Buffer | A buffer containing the list of rays to shoot filled by the Tile Classification pass. |
Ray Counter | | Buffer | A buffer containing the number of rays to be shot. |
Description
The Intersection stage is implemented as an indirect dispatch call, spawning one thread per ray. Each thread recovers the coordinates of the pixel to shoot a ray from and whether the result of that query should be copied over to some neighbors. The actual ray is then generated by importance sampling the GGX normal distribution, using a random seed from the blue noise texture generated in a previous pass.
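As an illustration of GGX importance sampling, the C++ sketch below draws a half vector from the GGX normal distribution in tangent space and reflects the view direction about it. Note that this is the classic NDF sampling routine rather than the visible-normal (VNDF) variant listed in the references, so it should not be read as the SDK’s exact method. `Vec3` and both function names are hypothetical, and the two uniform random numbers stand in for values that would come from the blue noise texture.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical 3D vector type used only for this illustration.
struct Vec3 {
    float x;
    float y;
    float z;
};

// Samples a half vector in tangent space (surface normal = +Z) from the GGX
// normal distribution. alpha is the GGX roughness (perceptual roughness
// squared); u1 and u2 are uniform random numbers in [0, 1).
Vec3 SampleGgxHalfVector(float alpha, float u1, float u2) {
    const float kTwoPi = 6.28318530718f;
    const float alphaSquared = alpha * alpha;
    const float cosTheta =
        std::sqrt((1.0f - u1) / (1.0f + (alphaSquared - 1.0f) * u1));
    const float sinTheta = std::sqrt(std::max(0.0f, 1.0f - cosTheta * cosTheta));
    const float phi = kTwoPi * u2;
    return { sinTheta * std::cos(phi), sinTheta * std::sin(phi), cosTheta };
}

// Reflects the view vector (pointing away from the surface) about the
// sampled half vector: r = 2 (v . h) h - v.
Vec3 ReflectAboutHalfVector(Vec3 view, Vec3 halfVector) {
    const float d = view.x * halfVector.x + view.y * halfVector.y +
                    view.z * halfVector.z;
    return { 2.0f * d * halfVector.x - view.x,
             2.0f * d * halfVector.y - view.y,
             2.0f * d * halfVector.z - view.z };
}
```

With `alpha` equal to zero the sampled half vector always coincides with the surface normal, which reproduces a perfect mirror reflection; larger values spread the rays into a wider lobe.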
The next step is ray marching across the hierarchical depth buffer. We first sample the depth buffer at the current mip level and use that value to generate a safe region through which we can ray march without intersecting the geometry. If the xy boundary of that region is hit, we can keep going using a coarser mip level. On the other hand, if the z boundary is crossed, we should move to a more detailed mip. Once the z boundary is crossed at the most detailed mip, the traversal stops.
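The mip-stepping logic can be illustrated with a deliberately simplified C++ sketch that marches a ray over a one-dimensional depth row instead of a 2D depth buffer. All names here are hypothetical, the row length is assumed to be a power of two, smaller depth values are assumed to be closer to the camera, and the ray is assumed to start inside the row at `x >= 0`. A real implementation would also validate the hit (for example against a surface thickness), which is omitted here.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using DepthRow = std::vector<float>;

// Builds a 1D min-depth pyramid; mips[0] is the full-resolution row.
std::vector<DepthRow> BuildMinPyramid(const DepthRow& depth) {
    std::vector<DepthRow> mips;
    mips.push_back(depth);
    while (mips.back().size() > 1) {
        const DepthRow& src = mips.back();
        DepthRow dst(src.size() / 2);
        for (std::size_t i = 0; i < dst.size(); ++i) {
            dst[i] = std::min(src[2 * i], src[2 * i + 1]);
        }
        mips.push_back(std::move(dst));
    }
    return mips;
}

struct MarchResult {
    bool hit; // true if the ray went behind the depth surface
    float x;  // position along the row where the march stopped
};

// Marches a ray starting at (x, z) whose depth changes by dzdx per texel.
MarchResult MarchHiZ(const std::vector<DepthRow>& mips, float x, float z, float dzdx) {
    const int maxLevel = static_cast<int>(mips.size()) - 1;
    const float width = static_cast<float>(mips[0].size());
    int level = 0; // start at the most detailed mip
    while (level >= 0 && x < width) {
        const float cellSize = static_cast<float>(1 << level);
        const std::size_t cell = static_cast<std::size_t>(x / cellSize);
        const float cellDepth = mips[static_cast<std::size_t>(level)][cell];
        const float exitX = (static_cast<float>(cell) + 1.0f) * cellSize;
        const float exitZ = z + (exitX - x) * dzdx;
        if (exitZ < cellDepth) {
            // xy boundary reached first: skip the cell, try a coarser mip.
            x = exitX;
            z = exitZ;
            level = std::min(level + 1, maxLevel);
        } else {
            // z boundary crossed: advance to it, then refine.
            if (z < cellDepth) {
                x += (cellDepth - z) / dzdx;
                z = cellDepth;
            }
            --level;
        }
    }
    return { level < 0, x };
}
```

Empty regions are skipped in large steps on coarse mips, and the march only descends to finer mips near a potential intersection, which is what makes the hierarchical traversal cheaper than a fixed-step linear march.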
Finally, once a hit is found, it is evaluated and given a confidence level. This level is used to interpolate between the ray marching result and a sample from the environment map, ensuring a smooth transition at the edges.
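The final interpolation amounts to a linear blend, sketched below in C++. The `Color` type and `BlendReflection` function are hypothetical, and how the confidence value itself is computed is not covered here.

```cpp
#include <algorithm>

// Hypothetical RGB color type used only for this illustration.
struct Color {
    float r;
    float g;
    float b;
};

// Blends the screen-space hit color with the environment map sample.
// confidence = 1 keeps the ray marching result, confidence = 0 falls back
// entirely to the environment map.
Color BlendReflection(Color hitColor, Color environmentColor, float confidence) {
    const float t = std::min(std::max(confidence, 0.0f), 1.0f);
    return { environmentColor.r + (hitColor.r - environmentColor.r) * t,
             environmentColor.g + (hitColor.g - environmentColor.g) * t,
             environmentColor.b + (hitColor.b - environmentColor.b) * t };
}
```

Because the confidence typically falls off towards the screen borders, reflections fade into the environment map rather than ending abruptly.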
Denoising pass
See the FidelityFX Reflections Denoiser page.
Building the sample
Prerequisites
To build the SSSR sample, please follow these instructions:
- Install the following tools:
- Install the “Desktop Development with C++” workload
- Generate the solutions:
- Open the solution from the `build` directory, compile and run.
Version history
Version | Date | Notes |
---|---|---|
1.0 | 2020-05-11 | Initial release of FidelityFX SSSR. |
1.1.0 | 2020-08-28 | Vulkan Support |
1.2.0 | 2020-11-24 | Extracted Denoiser to its own library |
1.2.1 | 2022-09-05 | Fixed issue with Vulkan |
1.3.0 | 2021-09-05 | Update to support hybrid tracing. |
1.4.0 | 2023-05-26 | FidelityFX SDK release of SSSR |
Further reading
References
- Frostbite presentations on Stochastic Screen Space Reflections – https://www.ea.com/frostbite/news/stochastic-screen-space-reflections
- EA Seed presentation on Hybrid Real-Time Rendering – https://www.ea.com/seed/news/seed-dd18-presentation-slides-raytracing
- Eric Heitz’ paper on VNDF – http://jcgt.org/published/0007/04/01/
- Eric Heitz’ paper on Blue Noise sampling – https://eheitzresearch.wordpress.com/762-2/