GPUPerfAPI

The GPU Performance API (GPUPerfAPI, or GPA) is a library that provides access to GPU performance counters. It helps analyze the performance and execution characteristics of applications running on a Radeon™ GPU.

GPUPerfAPI is used by Radeon GPU Profiler, as well as several third-party tools including Microsoft PIX on Windows and RenderDoc.

Download the latest version - v3.15

This release includes the following changes:

Updated equation for MemUnitBusyCycles.
Updated description of LocalVidMemBytes.
Reduced size of static buffer when logging messages to avoid compiler warning.
Fixed an issue on some variant hardware that would prevent enabling certain hardware counters.

Benefits

Provides a standard API for accessing GPU Performance counters for both graphics and compute workloads across multiple GPU APIs.
Supports Vulkan®, DirectX® 12, DirectX® 11, OpenGL® and OpenCL™.
Supports all recent GCN™ & RDNA™-based Radeon graphics cards and APUs based on Graphics IP version 8 and newer.
Supports both Windows® and Linux.
Provides derived “public” counters based on raw HW counters.
Provides access to some raw hardware counters. See Raw Hardware Counters for more information.
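
As a rough illustration of what a derived counter built on raw hardware counters can look like, the sketch below turns two raw cycle counts into a busy percentage. The struct, field names, and formula are assumptions made for this example; they are not GPA's actual counter definitions or API.

```cpp
#include <cstdint>

// Hypothetical raw counter values sampled over one measurement interval.
// These names are illustrative and do not come from GPA.
struct RawSample {
    std::uint64_t busy_cycles;   // cycles in which the unit was doing work
    std::uint64_t total_cycles;  // cycles elapsed during the interval
};

// Derived "busy percentage": share of elapsed cycles spent busy.
// Returns 0 for an empty interval to avoid dividing by zero.
double BusyPercent(const RawSample& sample) {
    if (sample.total_cycles == 0) {
        return 0.0;
    }
    return 100.0 * static_cast<double>(sample.busy_cycles) /
           static_cast<double>(sample.total_cycles);
}
```

For example, a sample with 50 busy cycles out of 200 total cycles gives a busy percentage of 25.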

Find out more

RDNA 3: Read about our tool updates in Radeon Developer Tool Suite (RDTS)

Read this high level summary of our updates to RDTS for RDNA™ 3, including other new features and improvements, plus updates to GPUPerfAPI.

GPUPerfAPI v3.7 includes Radeon™ RX 6000 support and new raytracing counters

GPUPerfAPI v3.7 brings support for Radeon™ RX 6000 series GPUs, new raytracing counters for DirectX® Raytracing, a new scalar and instruction cache counter, and new raytracing High-Frequency counters in Microsoft® PIX.

Requirements

Supported GPUs

Radeon™ RX 7000 series
Radeon™ RX 6000 series
Radeon™ RX 5500 series and RX 5300 series
Radeon™ RX 5700 and RX 5700 XT
Radeon™ VII
Radeon™ RX Vega
Ryzen™ 7000 Series with Radeon™ 700M Series Graphics
Ryzen™ 5 4600H with Radeon™ Vega Graphics
Ryzen™ 5 2400G and Ryzen™ 3 2200G Processors with Radeon™ Vega Graphics
Radeon™ R9 Fury, Fury X and Fury Nano
Radeon™ RX 400 and RX 500
Tonga R9 285, R9 380

Supported graphics APIs

DirectX® 12
Vulkan®
DirectX® 11
OpenGL®

Supported compute APIs

OpenCL™ (on Windows)

Supported OSs

Windows® 10
Windows® 11
Linux® – Ubuntu 18.04 LTS
Linux® – Ubuntu 20.04 LTS
Linux® – Ubuntu 22.04 LTS

Version history

Version 3.14 (September 2023)

Added support for AMD Radeon RX 7700 XT and AMD Radeon RX 7800 XT graphics cards.
Added support for additional AMD Radeon 700M Series devices.
Improved support for multi-GPU systems.
Added counters back to Gfx9, Gfx10, Gfx103, and Gfx11 hardware generations. These restored counters are listed below by group:
- Timing:
  - TessellatorBusy, TessellatorBusyCycles
  - VsGsBusy, VsGsBusyCycles, VsGsTime
  - PreTessellationBusy, PreTessellationBusyCycles, PreTessellationTime
  - PostTessellationBusy, PostTessellationBusyCycles, PostTessellationTime
- VertexGeometry:
  - VsGsVerticesIn, VsGsPrimsIn, GSVerticesOut
- PreTessellation:
  - PreTessVerticesIn
- PostTessellation:
  - PostTessPrimsOut
- PrimitiveAssembly:
  - PrimitivesIn
- TextureUnit:
  - TexTriFilteringPct, TexTriFilteringCount, NoTexTriFilteringCount
  - TexVolFilteringPct, TexVolFilteringCount, NoTexVolFilteringCount
New counters added:
- MemoryCache:
  - L0TagConflictReadStalledCycles, L0TagConflictWriteStalledCycles, L0TagConflictAtomicStalledCycles

Version 3.13 (June 2023)

Add support for AMD Radeon RX 7000M series hardware.
Add support for AMD Radeon RX 7000S series hardware.
OpenCL support for AMD Radeon RX 7000 series hardware has been restored if using Adrenalin 23.3.2 or newer.
Code has been updated to C++17 language standard.
Fixed a regression that resulted in a crash on certain hardware variants.

Version 3.12 (December 2022)

Add support for Radeon™ RX 7900 XTX and 7900 XT GPUs.
GPA binary sizes have been reduced by approximately 75%.
Update PreTessellation and PostTessellation counters to report results only when tessellation is in use.

Version 3.11.1 (July 2022)

Updated to support the Adrenalin 22.7.1 driver.
Added L2CacheHit counter to OpenGL for parity with other APIs on Radeon RX 5000 Series hardware.

Version 3.11 (April 2022)

Add support for additional GPUs and APUs.
Add support for raytracing counters in Vulkan on RDNA2 (Radeon RX 6000 Series) hardware:
- RayTriTests and RayBoxTests: These counters collect the number of ray intersection tests for triangles and boxes, respectively.
- TotalRayTests: This counter collects the aggregated number of ray-box and ray-triangle intersection tests.
- RayTestsPerWave: This counter collects ray intersection test count at a more granular level – per wave.
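
The relationship between these raytracing counters can be sketched as follows. The text above says TotalRayTests aggregates the ray-box and ray-triangle tests; the per-wave figure is assumed here to be that total divided by the number of waves, which may differ from GPA's exact definition. All type and function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical container for the two per-type test counts.
struct RayTestCounts {
    std::uint64_t ray_tri_tests;  // ray-triangle intersection tests
    std::uint64_t ray_box_tests;  // ray-box intersection tests
};

// Aggregate of ray-box and ray-triangle intersection tests.
std::uint64_t TotalRayTests(const RayTestCounts& counts) {
    return counts.ray_tri_tests + counts.ray_box_tests;
}

// Assumed definition: average number of intersection tests per wave.
// Returns 0 when no waves were launched to avoid dividing by zero.
double RayTestsPerWave(const RayTestCounts& counts, std::uint64_t wave_count) {
    if (wave_count == 0) {
        return 0.0;
    }
    return static_cast<double>(TotalRayTests(counts)) /
           static_cast<double>(wave_count);
}
```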

Version 3.10 (January 2022)

Add support for additional GPUs and APUs, including AMD Radeon™ RX 6300, 6400, and 6500 series GPUs.
Redefined derived counters on GCN™ (Vega), RDNA™, and RDNA™ 2 hardware.
New entrypoint added: GpaGetDeviceGeneration.
Add support for GPA_OVERRIDE_LOG_LEVEL environment variable to increase or decrease logging output.
Fixed driver version detection in OpenGL™ and DirectX® 11.
Extensive counter validation in DirectX® 12.
Improvements made to sample applications.

Version 3.9 (July 2021)

Add support for additional GPUs and APUs, including AMD Radeon™ RX 6600 series GPUs.

Version 3.8 (April 2021)

Add support for additional GPUs and APUs, including AMD Radeon™ RX 6700 series GPUs.
Code has been updated to adhere to Google C++ Style Guide.
- New public headers have been added.
- Old headers are deprecated and will emit a compile-time message.
- Projects loading GPA will need to be recompiled, but no code changes are required unless moving to the new headers.
Improvements made to sample applications.
Updated documentation for the new code style (see also https://github.com/GPUOpen-Tools/gpu_performance_api/issues/56).

Version 3.7 (November 2020)

Add support for additional GPUs and APUs, including AMD RDNA™ 2 Radeon™ RX 6000 series GPUs.
New RT counters for DXR workloads on AMD RDNA™ 2 Radeon™ RX 6000 series GPUs:
- RayTriTests and RayBoxTests: These counters collect the number of ray intersection tests for triangles and boxes, respectively.
- TotalRayTests: This counter collects the aggregated number of ray-box and ray-triangle intersection tests.
- RayTestsPerWave: This counter collects ray intersection test count at a more granular level – per wave.
New Scalar and Instruction cache counters on AMD RDNA™ Radeon™ RX 5000 series GPUs:
- Scalar cache: ScalarCacheHit, ScalarCacheRequestCount, ScalarCacheHitCount, ScalarCacheMissCount.
- Instruction cache: InstCacheHit, InstCacheRequestCount, InstCacheHitCount, InstCacheMissCount.
Update the Vulkan® sample to remove the static link and use the system-specific Vulkan® loader.
Remove OpenCL™ support on Linux®.
Remove downloading the Vulkan® SDK by the build script.

Version 3.6 (May 2020)

Add support for additional GPUs and APUs, including AMD Ryzen™ 4000 Series APUs.
Add two new GFX10 GlobalMemory Counters for graphics using DX12 and Vulkan®: LocalVidMemBytes and PcieBytes.
Add VS2019 project support to CMake.
Restructure of GPA source layout to adhere to Google style.

Version 3.5 (December 2019)

Add support for additional GPUs and APUs, including Radeon™ 5500 and Radeon™ 5300 Series GPUs.
Add DirectX®11 sample application using GPUPerfAPI.
Add per-API static counter generation.
Reduce the size of GPUPerfAPI binaries.
Add script to package GPUPerfAPI post-build.
Remove ROCm/HSA support.
Add Unicode support in GPUPerfAPI for Linux.
Bugs Fixed:
- Fixed CMake files to respect supported build flags.
- Fixed crash when DX12 debug layer was enabled.
- Fixed an issue with loading of shader in GPA Vulkan® sample app.
- Fixed an issue in the Vulkan® build with newer Vulkan® SDKs that include the amd_shader_core_properties2 extension.
- Fixed a crash on unsupported Gfx6 and Gfx7 GPUs.

Version 3.4 (July 2019)

Add support for additional GPUs and APUs, including Radeon 5700 Series GPUs.
Add support for setting stable GPU clocks for DirectX11, OpenGL and OpenCL.
Add an OpenGL sample application that uses GPUPerfAPI.
Add basic counter validation to sample applications.
Add support for enabling individual hardware counters that make up derived counters.
Add two new GFX9 GlobalMemory Counters for graphics: LocalVidMemBytes and PcieBytes.
Reformat source code using clang-format.
Update counter documentation to contain per-hardware-generation tables.
Bugs Fixed:
- Fixed error handling in GPA_GetEnabledIndex, GPA_EnableCounterByName, and GPA_DisableCounterByName.
- Fixed an issue with Vulkan timing counters (https://github.com/GPUOpen-Tools/GPA/issues/40).
- Fixed an issue with SALUBusy counters.
- Fixed an issue with HiZQuadsCulledCount and HiZQuadsSurvivingCount counters on GFX8 GPUs.
- Fixed an issue with MemUnitBusy and MemUnitStalled counters on GFX8 GPUs.
- Fixed an issue with VSVALUBusyCycles counter on GFX9 GPUs.

Version 3.3 (December 2018)

Add support for additional GPUs and APUs.
New CMake-based build system.
Support building on Ubuntu 18.04.
ROCm/HSA: uses new librocprofiler64.so rather than deprecated libhsa-runtime-tools64.so library for performance counter collection.
Timing-based counters are now reported in nanoseconds instead of milliseconds.
New timing counter to report top-of-pipe to bottom-of-pipe duration.
GPA now builds GoogleTest libraries on the fly rather than using prebuilt binaries.
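
Code that consumed timing counters in milliseconds before this change needs a unit conversion. A minimal sketch using std::chrono follows; the function name is hypothetical, and it assumes the timing value has already been read into a double.

```cpp
#include <chrono>

// Converts a timing value reported in nanoseconds to fractional milliseconds.
// The function name is illustrative; it is not part of GPA.
double NanosecondsToMilliseconds(double nanoseconds) {
    using Nanoseconds = std::chrono::duration<double, std::nano>;
    using Milliseconds = std::chrono::duration<double, std::milli>;
    return std::chrono::duration_cast<Milliseconds>(Nanoseconds(nanoseconds)).count();
}
```

A value of 2,500,000 ns converts to 2.5 ms.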

Version 3.2 (August 2018)

Add support for additional GPUs and APUs.
Wrapped all GPA entrypoints in try/catch to ensure unhandled exceptions do not escape the GPA library.
Add VS2017 project files.
Bugs Fixed:
- Fixed https://github.com/GPUOpen-Tools/GPA/issues/18.
- Fixed support for scheduling counters on multiple sessions.
- OpenGL: Fixed a bug in GPASample cleanup.
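
The general pattern of wrapping entry points in try/catch can be sketched as below. This is not GPA's implementation: the status enumeration, the guard helper, and the entry point are hypothetical names chosen for illustration.

```cpp
#include <stdexcept>

// Hypothetical status codes; not GPA's actual enumeration.
enum class ExampleStatus { kOk, kErrorException };

// Runs `body` and converts any escaping exception into a status code,
// so nothing propagates across the library boundary.
template <typename Fn>
ExampleStatus GuardEntryPoint(Fn&& body) noexcept {
    try {
        body();
        return ExampleStatus::kOk;
    } catch (...) {
        return ExampleStatus::kErrorException;
    }
}

// Hypothetical entry point built on the guard. The internal logic may
// throw, but callers only ever see a status code.
ExampleStatus ExampleOpenContext(int device_index) {
    return GuardEntryPoint([&] {
        if (device_index < 0) {
            throw std::invalid_argument("negative device index");
        }
    });
}
```

Routing every entry point through one guard keeps the exception policy in a single place, which matters when the library is loaded by applications built with a different compiler or runtime.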

Version 3.1 (June 2018)

Add support for additional GPUs and APUs.
Usability improvements to GPAInterfaceLoader.h.
New Vulkan and DirectX 12 sample applications.
New GPA_GetSampleId entry point.
New GPA_GetVersion entry point.
Bugs Fixed:
- Fixed issues with some counters on 56CU Vega10.
- Vulkan: Fixed GPA_ContinueSampleOnCommandList.
- Vulkan: Ensure results are ready before trying to query them.
- DirectX 12: Fixed incorrect device reference counting issue.

Version 3.0 (March 2018)

Add support for additional GPUs and APUs.
Support for collecting hardware counters for Vulkan and DirectX 12 applications.
Redesigned API to support modern graphics APIs.
The documentation has been rewritten and is now available in HTML format.
New counters added:
- Cycle and count-based counters in addition to existing percentage-based counters.
- New Depth Buffer memory read/write counters.
- Additional Color Buffer memory counters.
- For graphics, several global memory counters which were previously available only in the Compute Shader stage are now available generically.
Support for setting stable GPU clocks.
Counter Group Names can now be queried separately from Counter Descriptions.
Counters now have a UUID which can be used to uniquely identify a counter.
New entry point (GPA_GetFuncTable) to retrieve a table of function pointers for all GPA entry points.
New C++ GPAInterfaceLoader.h header file provides an easy way to load and use GPA entry points.
Bugs Fixed:
- Fixed an issue with TessellatorBusy counter on many GFX8 GPUs.
- Fixed an issue with FlatVMemInsts and CSFlatVMemInsts counters on many GFX8 GPUs.
- Fixed an issue with LDSInsts counter on Vega GPUs.
- Fixed some issues with Compute Shader counters on Vega GPUs.
- Fixed incorrect counter results that some counter combinations could produce.
- Fixed incorrect counter scheduling across multiple passes when counters were enabled in a certain order.
- ROCm/HSA: Fixed a crash in GPA_OpenContext when libhsa-runtime64.so.1 can’t be found.
- ROCm/HSA: Fixed GPA not coexisting with an application that also sets the HSA_TOOLS_LIB environment variable.
- OpenGL: Fixed a crash that can occur with an incorrectly-configured OpenGL driver.
- OpenGL: Fixed some issues with OpenGL device-detection.
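
The idea behind a function-table entry point is that a client resolves one exported symbol and receives pointers to everything else. The sketch below shows that pattern in general terms only. The table layout, function names, and version values are hypothetical and do not reflect the real GPA_GetFuncTable signature or the contents of GPAInterfaceLoader.h.

```cpp
#include <cstdint>

// Hypothetical function table: one pointer per entry point.
struct ExampleFuncTable {
    std::uint32_t (*GetVersionMajor)();
    std::uint32_t (*GetVersionMinor)();
};

namespace {
// Stand-in implementations that a real library would keep internal.
std::uint32_t VersionMajorImpl() { return 3; }
std::uint32_t VersionMinorImpl() { return 0; }
}  // namespace

// Hypothetical single exported function: fills the caller's table.
// Returns false if the caller passed a null pointer.
bool ExampleGetFuncTable(ExampleFuncTable* table) {
    if (table == nullptr) {
        return false;
    }
    table->GetVersionMajor = &VersionMajorImpl;
    table->GetVersionMinor = &VersionMinorImpl;
    return true;
}
```

A client would declare an ExampleFuncTable, pass its address to ExampleGetFuncTable, and then call through the pointers, which avoids resolving each symbol by name when the library is loaded dynamically.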

