
DirectX®12 single shader compilation with Radeon™ GPU Analyzer (RGA) v2.9.1

GPUOpen

Background

DirectX®12 requires a complete pipeline state definition to compile a pipeline. This involves providing all of the pipeline’s shaders, defining a root signature and, for graphics pipelines, defining a subset of the graphics pipeline state. Having to prepare every element of the graphics or compute pipeline upfront made offline compilation of DirectX®12 shaders tedious, especially when all you want is to compile a single shader in isolation.

RGA v2.9.1 to the rescue

RGA v2.9.1 streamlines the shader compilation experience by allowing you to compile a single D3D12 shader. When given an incomplete DirectX®12 pipeline, RGA v2.9.1 auto-generates the missing elements for you. These can be the root signature, the graphics pipeline state subset, or even other shaders in the pipeline. In effect, any input beyond the single shader you want to compile becomes optional.

Usage example

Consider the following pixel shader:

struct VsOutput
{
    float4 pos : SV_Position;
    float2 tex_coord : TEXCOORD0;
};

Texture2D<float4> texture0 : register(t0);
SamplerState sampler0 : register(s0);

float4 PsMain(VsOutput i) : SV_Target
{
    return texture0.Sample(sampler0, i.tex_coord);
}

Previously, to compile this pixel shader you would have had to define the rest of the graphics pipeline: an accompanying vertex shader, a root signature and the subset of the graphics pipeline state.

With RGA v2.9.1, you can compile this pixel shader in isolation. The command line invocation does not change: you use the same RGA DirectX®12 command as before and simply omit the pieces of the D3D12 graphics pipeline that you do not have. In the example below, the pixel shader is compiled for gfx1100, an AMD Radeon™ RX 7000 series GPU (RDNA™ 3 architecture):

rga.exe -s dx12 -c gfx1100 --ps "dxc\single_shaders\classic_ps.hlsl" --ps-entry "PsMain" --all-model 6_0 --autogen-dir "C:\RGA-2.9.1\Generated" --isa "C:\RGA-2.9.1\Isa\Out.isa"

This produces the following output:

Building for gfx1100...
Auto-generating root signature using reflection into C:\RGA-2.9.1\Generated\rga_autogen_20240415_174858_.rootsig ... success.
Auto-generating graphics pipeline state using reflection into C:\RGA-2.9.1\Generated\rga_autogen_20240415_174858_.gpso ... success.
Auto-generating vertex shader using reflection into C:\RGA-2.9.1\Generated\rga_autogen_20240415_174858_vs.hlsl ... success.
Performing front-end compilation of vertex shader through DXC...
Front-end compilation success.
Performing front-end compilation of pixel shader through DXC...
Front-end compilation success.
Performing front-end compilation of root signature through DXC...
Front-end compilation success.
Compiling graphics pipeline...
Extracting vertex shader disassembly...
vertex shader disassembly extracted successfully.
Extracting pixel shader disassembly...
pixel shader disassembly extracted successfully.
succeeded. 

RGA detects that the vertex shader, the root signature and the graphics pipeline state are missing and auto-generates them via reflection on the pixel shader.
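If you script RGA, you may want to know where each generated file ended up. The following is a minimal Python sketch that picks the paths out of RGA's console output. It assumes the exact line format shown in the sample output above (which may change between RGA versions), and the function name autogenerated_files is hypothetical.

```python
import re

# Matches log lines such as:
#   Auto-generating root signature using reflection into C:\...\x.rootsig ... success.
# The format is taken from the sample output above and is an assumption.
AUTOGEN_LINE = re.compile(
    r"^Auto-generating (?P<what>.+?) using reflection into (?P<path>.+?) \.\.\. (?P<status>\w+)\.$"
)


def autogenerated_files(log_text: str) -> dict[str, str]:
    """Map each successfully auto-generated pipeline piece to its file path."""
    files = {}
    for line in log_text.splitlines():
        match = AUTOGEN_LINE.match(line.strip())
        if match and match.group("status") == "success":
            files[match.group("what")] = match.group("path")
    return files
```

Applied to the output above, this yields entries for "root signature", "graphics pipeline state" and "vertex shader", which is handy when you pass --autogen-dir and want to open the generated files afterwards.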

A new command line argument, --autogen-dir <folder>, lets you specify a folder in which to store the auto-generated files. If you do not specify one, these files are deleted after compilation.
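For automation, the invocation above can be assembled as an argument list and launched from Python. This is a sketch assuming rga.exe is on your PATH; rga_dx12_ps_command is a hypothetical helper that uses only the flags from the example command. Because no vertex shader, root signature or pipeline state is passed, RGA is left to auto-generate them.

```python
from typing import Optional


def rga_dx12_ps_command(
    target: str,
    ps_path: str,
    ps_entry: str,
    shader_model: str,
    isa_path: str,
    autogen_dir: Optional[str] = None,
) -> list[str]:
    """Build the argument list for compiling a single pixel shader with RGA.

    Only flags from the example invocation are used; the other pipeline
    inputs are deliberately omitted so that RGA auto-generates them.
    """
    args = [
        "rga.exe", "-s", "dx12", "-c", target,
        "--ps", ps_path, "--ps-entry", ps_entry,
        "--all-model", shader_model,
    ]
    if autogen_dir is not None:
        # Keep the auto-generated files; without this flag they are deleted.
        args += ["--autogen-dir", autogen_dir]
    args += ["--isa", isa_path]
    return args
```

Passing the result to subprocess.run(args, check=True) runs the same command as shown earlier; building a list rather than a single string avoids quoting Windows paths by hand.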

In our example, RGA automatically generates a vertex shader, a textual representation of the root signature and a .gpso file containing the subset of the graphics pipeline state. Because these files are plain text, you can inspect them when investigating compilation issues, and easily tweak them and recompile.

RGA uses reflection to ensure that all the files it auto-generates match your input pixel shader, both in terms of its interface (e.g., vertex format, attributes to interpolate, render targets) and in terms of resource bindings (buffers and textures used by the shader).

Auto-generated HLSL Vertex Shader:

// Auto-generated with Radeon GPU Analyzer (RGA).

struct VsInput
{
    float4 attribute0: POSITION0;
};

struct VsOutput
{
    float4 attribute0: SV_POSITION;
    float2 attribute1: TEXCOORD0;
};

void main(VsInput input, out VsOutput output)
{
    float4 result = float4(0.0, 0.0, 0.0, 0.0);
    result += float4(input.attribute0.xyzw);
    output.attribute0 = float4(result.xyzw);
    output.attribute1 = float2(result.xy);
}
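To make the reflection-based matching concrete, here is a minimal Python sketch of the idea; it is not RGA's implementation. It assumes the pixel shader's input signature has already been obtained through reflection as (semantic, component count) pairs, and emits a VsOutput struct plus assignments following the pattern of the generated shader above. The function vs_output_hlsl and its input format are hypothetical.

```python
SWIZZLE = "xyzw"


def vs_output_hlsl(ps_inputs: list[tuple[str, int]]) -> str:
    """Emit a VsOutput struct and its assignments for the given PS inputs.

    ps_inputs: (semantic, component_count) pairs, e.g. ("TEXCOORD0", 2).
    """
    members, assignments = [], []
    for index, (semantic, count) in enumerate(ps_inputs):
        hlsl_type = f"float{count}"
        members.append(f"    {hlsl_type} attribute{index}: {semantic.upper()};")
        # Every output is filled from the same placeholder value, truncated
        # to the component count the pixel shader expects.
        assignments.append(
            f"    output.attribute{index} = {hlsl_type}(result.{SWIZZLE[:count]});"
        )
    struct = "struct VsOutput\n{\n" + "\n".join(members) + "\n};"
    return struct + "\n\n" + "\n".join(assignments)
```

Calling vs_output_hlsl([("SV_Position", 4), ("TEXCOORD0", 2)]) reproduces the VsOutput members and the two output assignments seen above. The written values are placeholders; for offline compilation what matters is that the vertex shader's output signature matches what the pixel shader reads.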

Auto-generated text-based Root Signature:

// Auto-generated with Radeon GPU Analyzer (RGA).

#define RGA_ROOT_SIGNATURE \
    "RootFlags( ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT | DENY_HULL_SHADER_ROOT_ACCESS " \
    "| DENY_DOMAIN_SHADER_ROOT_ACCESS | DENY_GEOMETRY_SHADER_ROOT_ACCESS ), " \
    "DescriptorTable(Sampler(s0), visibility=SHADER_VISIBILITY_PIXEL), " \
    "DescriptorTable(SRV(t0), visibility=SHADER_VISIBILITY_PIXEL)"

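The same bindings can be turned into a root signature string mechanically. A minimal Python sketch, assuming you already have the shader's resource bindings as (descriptor type, register) pairs; root_signature and as_macro are hypothetical names, and the root flags are copied from the generated file above.

```python
ROOT_FLAGS = (
    "RootFlags( ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT | DENY_HULL_SHADER_ROOT_ACCESS "
    "| DENY_DOMAIN_SHADER_ROOT_ACCESS | DENY_GEOMETRY_SHADER_ROOT_ACCESS )"
)


def root_signature(bindings: list[tuple[str, str]]) -> str:
    """Give each binding its own pixel-visible descriptor table.

    bindings: (descriptor type, register) pairs, e.g. ("SRV", "t0").
    """
    tables = [
        f"DescriptorTable({kind}({register}), visibility=SHADER_VISIBILITY_PIXEL)"
        for kind, register in bindings
    ]
    return ", ".join([ROOT_FLAGS, *tables])


def as_macro(name: str, signature: str) -> str:
    """Wrap the signature in a single-line #define."""
    return f'#define {name} "{signature}"'
```

Using one table per binding keeps the sampler and the SRV in separate tables, which D3D12 requires: a descriptor table cannot mix sampler ranges with CBV/SRV/UAV ranges.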
Auto-generated .gpso file containing the subset of the graphics pipeline state:

# Auto-generated with Radeon GPU Analyzer (RGA).

# schemaVersion
1.0

# InputLayoutNumElements (the number of D3D12_INPUT_ELEMENT_DESC elements in the D3D12_INPUT_LAYOUT_DESC structure - must match the following "InputLayout" section)
1

# InputLayout ( {SemanticName, SemanticIndex, Format, InputSlot, AlignedByteOffset, InputSlotClass, InstanceDataStepRate } )
 { "POSITION", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 }

# PrimitiveTopologyType (the D3D12_PRIMITIVE_TOPOLOGY_TYPE value to be used when creating the PSO)
D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE

# NumRenderTargets (the number of formats in the upcoming RTVFormats section)
1

# RTVFormats (an array of DXGI_FORMAT-typed values for the render target formats - the number of items in the array should match the above NumRenderTargets section)
{ DXGI_FORMAT_R8G8B8A8_UNORM }

Once the missing pieces of the D3D12 graphics pipeline are auto-generated, RGA passes the pixel shader along with the auto-generated files to the AMD shader compiler, which compiles the entire pipeline.

Compilation Workflow

Upon successful compilation, you get the relevant pixel shader disassembly:

; D3D12 Shader Hash 0x30d77570b6e44f6c49553fb9ca32e72d
; API PSO Hash 0xd5b60f61c55df988
; Driver Internal Pipeline Hash 0xf9f385166a76e0d7
; -------- Disassembly --------------------
shader main
  asic(GFX11)
  type(PS)
  sgpr_count(14)
  vgpr_count(8)
  wave_size(64)
                                                            // s_ps_state in s0

  s_version     UC_VERSION_GFX11 | UC_VERSION_W64_BIT   // 000000000000: B0802006
  s_set_inst_prefetch_distance  0x0003                  // 000000000004: BF840003
  s_mov_b32     m0, s4                                  // 000000000008: BEFD0004
  s_mov_b64     s[12:13], exec                          // 00000000000C: BE8C017E
  s_wqm_b64     exec, exec                              // 000000000010: BEFE1D7E
  s_getpc_b64   s[0:1]                                  // 000000000014: BE804780
  s_waitcnt_depctr  depctr_vm_vsrc(0) & depctr_va_vdst(0) // 000000000018: BF880F83
  lds_param_load  v2, attr0.x wait_vdst:0               // 00000000001C: CE000002
  lds_param_load  v3, attr0.y wait_vdst:0               // 000000000020: CE000103
  s_mov_b32     s4, s3                                  // 000000000024: BE840003
  s_mov_b32     s5, s1                                  // 000000000028: BE850001
  s_mov_b32     s0, s2                                  // 00000000002C: BE800002
  s_load_b256   s[4:11], s[4:5], null                   // 000000000030: F40C0102 F8000000
  s_load_b128   s[0:3], s[0:1], null                    // 000000000038: F4080000 F8000000
  v_interp_p10_f32  v4, v2, v0, v2 wait_exp:1           // 000000000040: CD000104 040A0102
  v_interp_p10_f32  v0, v3, v0, v3 wait_exp:0           // 000000000048: CD000000 040E0103
                                                        s_delay_alu  instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2) // 000000000050: BF870112
  v_interp_p2_f32  v2, v2, v1, v4 wait_exp:7            // 000000000054: CD010702 04120302
  v_interp_p2_f32  v0, v3, v1, v0 wait_exp:7            // 00000000005C: CD010700 04020303
  s_and_b64     exec, exec, s[12:13]                    // 000000000064: 8BFE0C7E
  s_waitcnt     lgkmcnt(0)                              // 000000000068: BF89FC07
  image_sample  v[0:3], [v2,v0], s[4:11], s[0:3] dmask:0xf dim:SQ_RSRC_IMG_2D // 00000000006C: F06C0F05 00010002 00000000
  s_waitcnt     vmcnt(0)                                // 000000000078: BF8903F7
  v_cvt_pk_rtz_f16_f32  v0, v0, v1                      // 00000000007C: 5E000300
  v_cvt_pk_rtz_f16_f32  v2, v2, v3                      // 000000000080: 5E040702
  s_mov_b64     exec, s[12:13]                          // 000000000084: BEFE010C
  exp           mrt0, v0, v2, off, off done             // 000000000088: F8000803 00000200
  s_endpgm                                              // 000000000090: BFB00000
  s_code_end                                            // 000000000094: BF9F0000
  s_code_end                                            // 000000000098: BF9F0000
  s_code_end                                            // 00000000009C: BF9F0000
  s_code_end                                            // 0000000000A0: BF9F0000
end
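When comparing variants of a shader, it can help to pull a few headline numbers out of this listing. A minimal Python sketch, assuming the text layout shown above (header fields such as sgpr_count(14), and instruction lines ending in a // <address>: <encoding> comment); isa_summary is a hypothetical name, and the format may differ between RGA versions. The s_code_end padding after s_endpgm is not counted.

```python
import re


def isa_summary(isa_text: str) -> dict[str, int]:
    """Extract register counts and wave size, and count real instructions."""
    summary = {}
    for key in ("sgpr_count", "vgpr_count", "wave_size"):
        match = re.search(rf"\b{key}\((\d+)\)", isa_text)
        if match:
            summary[key] = int(match.group(1))
    # Instruction lines end with "// <12-digit hex address>: <encoding>".
    instructions = [
        line.split("//")[0].strip()
        for line in isa_text.splitlines()
        if re.search(r"//\s*[0-9A-F]{12}:", line)
    ]
    summary["instructions"] = sum(
        1 for inst in instructions if inst and not inst.startswith("s_code_end")
    )
    return summary
```

For the pixel shader above this reports 14 SGPRs, 8 VGPRs and a wave size of 64, which are the first numbers worth checking when a change to the shader or its pipeline affects occupancy.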

Conclusion

In summary, RGA v2.9.1 simplifies offline DirectX®12 shader compilation and analysis, making it easier to quickly investigate individual shaders.

Get the Radeon Developer Tool Suite today!

You can find out more about RGA, including links to the release binaries on GitHub and the full release notes, on our product page.

Your feedback is incredibly valuable to us and helps drive the RGA roadmap. For feature requests or feedback, get in touch on GitHub!

Adam Sawicki

Adam is a Principal Member of Technical Staff, Developer Technology Engineer, in the Game Engineering group, focusing primarily on Direct3D®12 and Vulkan® games technology and the games that use it.

Apurva Modak

Apurva Modak (www.linkedin.com/in/apurva-modak) works on creating compilation and optimization software for compute and graphics workflows with AMD’s Radeon™ GPU Analyzer. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.

Amit Ben-Moshe

Amit Ben-Moshe is a Technical Lead and a Principal Member of Technical Staff at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.
