Difference Between Calling numthreads and Dispatch in a Unity Compute Shader


Question

Let's say I want to use a compute shader to run Kernel_X with thread dimensions of (8, 1, 1).

I could set it up as:

In script:

Shader.Dispatch(Kernel_X, 8, 1, 1);

In shader:

[numthreads(1,1,1)]
void Kernel_X(uint id : SV_DispatchThreadID) { ... }

Or I could set it up like this:

In script:

Shader.Dispatch(Kernel_X, 1, 1, 1);

In shader:

[numthreads(8,1,1)]
void Kernel_X(uint id : SV_DispatchThreadID) { ... }

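I understand that in both cases the dimensions come out to (8, 1, 1); however, I was wondering how the two ways of arranging the numbers actually differ from each other. My guess is that Dispatch(Kernel_X, 8, 1, 1) "runs" a 1x1x1 kernel 8 times, while numthreads(8,1,1) runs an 8x1x1 kernel once.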


Answers

Answer 1:

To understand the difference, a bit of hardware knowledge is required:

Internally, a GPU works on so-called wavefronts (NVIDIA calls them warps), which are SIMD-style processing units: a group of threads where each thread can have its own data, but all of them have to execute the exact same instruction at the exact same time. The number of threads per wavefront is hardware dependent, but is usually either 32 (NVIDIA) or 64 (AMD).

Now, with [numthreads(8,1,1)] you request a shader thread group size of 8 x 1 x 1 = 8 threads, which the hardware is free to distribute among its wavefronts. So, with 32 threads per wavefront, the hardware would schedule one wavefront per thread group, with 8 active threads in that wavefront (the other 24 threads are "inactive": they step through the same instructions, but any memory writes they make are discarded). Then, with Dispatch(1, 1, 1), you dispatch one such thread group, meaning there will be one wavefront running on the hardware.

If you used [numthreads(1,1,1)] instead, only one thread in a wavefront could be active. So, by calling Dispatch(8, 1, 1) on that kernel, the hardware would have to run 8 thread groups (= 8 wavefronts), each with just 1 of its 32 threads active. You would get the same result, but you would waste a lot more computational power.

So, in general, for optimal performance you want thread group sizes that are multiples of 32 (or 64), while calling Dispatch with numbers as low as reasonably possible.
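The utilization argument above can be checked with a little arithmetic. The following is a rough Python sketch, not GPU code. It assumes a wavefront size of 32 and that every thread group occupies whole wavefronts of its own; the helper names `waves_per_group` and `lane_utilization` are hypothetical and exist only for this illustration.

```python
import math

def waves_per_group(group_size, wave_size=32):
    # Assumption: a thread group never shares a wavefront with another group,
    # so each group rounds up to whole wavefronts.
    return math.ceil(group_size / wave_size)

def lane_utilization(group_size, group_count, wave_size=32):
    # Fraction of scheduled wavefront lanes that carry an active thread.
    active_threads = group_size * group_count
    scheduled_lanes = waves_per_group(group_size, wave_size) * group_count * wave_size
    return active_threads / scheduled_lanes

# numthreads(8,1,1) with Dispatch(1,1,1): one wavefront, 8 of 32 lanes active
print(lane_utilization(group_size=8, group_count=1))
# numthreads(1,1,1) with Dispatch(8,1,1): eight wavefronts, 1 of 32 lanes active in each
print(lane_utilization(group_size=1, group_count=8))
# numthreads(64,1,1) with Dispatch(1,1,1): two full wavefronts
print(lane_utilization(group_size=64, group_count=1))
```

Under these assumptions both setups from the question run the same 8 threads, but the second one schedules eight times as many wavefronts to do it.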

Answer 2:

The Dispatch() call determines the number of thread groups you are invoking. The call below invokes 8 x 1 x 1 = 8 groups:

Shader.Dispatch(Kernel_X, 8, 1, 1);

In the shader, the [numthreads] attribute specifies the size of each thread group. For example, this declares 8 x 1 x 1 = 8 threads for every group:

[numthreads(8,1,1)]
void Kernel_X(uint id : SV_DispatchThreadID) { }

If you want a total of 8 threads, you can invoke a single group with 8 threads per group, or 8 groups with a single thread per group. The end result is going to be the same, though the performance is not. Usually you want a thread group size that is a power of 2; on NVIDIA hardware you usually set it to at least 32, while AMD cards are optimized for at least 64 threads per group.

By the way, you usually dispatch far more than 8 threads, as it's rather pointless to write a compute shader for just 8 threads; your CPU would probably be faster. So, you may want to call:
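To see why the end result is the same, it helps to recall that in HLSL the value of SV_DispatchThreadID is SV_GroupID * numthreads + SV_GroupThreadID. Below is a small Python sketch of that mapping for a 1D dispatch; the function name `dispatch_thread_ids` is made up for this illustration and nothing here runs on the GPU.

```python
def dispatch_thread_ids(group_count, group_size):
    # Returns (group_id, group_thread_id, dispatch_thread_id) tuples for a 1D dispatch,
    # using dispatch_thread_id = group_id * group_size + group_thread_id.
    ids = []
    for group_id in range(group_count):
        for group_thread_id in range(group_size):
            dispatch_thread_id = group_id * group_size + group_thread_id
            ids.append((group_id, group_thread_id, dispatch_thread_id))
    return ids

# Dispatch(1,1,1) with numthreads(8,1,1)
one_group = dispatch_thread_ids(group_count=1, group_size=8)
# Dispatch(8,1,1) with numthreads(1,1,1)
eight_groups = dispatch_thread_ids(group_count=8, group_size=1)

print([t[2] for t in one_group])     # dispatch thread IDs seen by the kernel
print([t[2] for t in eight_groups])  # same IDs, but each one comes from a different group
```

Both setups hand the kernel the same set of `id` values (0 through 7); only the split between group ID and in-group thread ID differs.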

Shader.Dispatch(Kernel_X, Mathf.CeilToInt((float)wantedThreadNumber/wantedGroupSize), 1, 1);
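Because the group count is rounded up, the dispatch usually launches a few more threads than requested. A minimal Python sketch of that arithmetic follows; the numbers 1000 and 64 and the name `group_count` are arbitrary choices for illustration. A common way to deal with the surplus (an assumption here, not something the answer states) is to have the kernel return early when its `id` is at or beyond the element count.

```python
def group_count(wanted_threads, group_size):
    # Integer ceiling division for positive values, the same rounding-up
    # that the Mathf.CeilToInt call above performs.
    return (wanted_threads + group_size - 1) // group_size

wanted_threads = 1000   # hypothetical element count
group_size = 64         # hypothetical numthreads x dimension

groups = group_count(wanted_threads, group_size)
launched = groups * group_size
surplus = launched - wanted_threads
print(groups, launched, surplus)

# Simulated bounds guard: only IDs below wanted_threads do real work.
processed = [i for i in range(launched) if i < wanted_threads]
print(len(processed))
```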

(by TakeMeHomeCountryRoads, Bizzarrus, kefren)

References

  1. Difference Between Calling numthreads and Dispatch in a Unity Compute Shader (CC BY‑SA 2.5/3.0/4.0)

#hlsl #shader #unity3d #compute-shader





