A question for 小彭老师: why is the speedup of this GPU code so low? #8
Comments
By the way, 小彭老师, when are you going to cover CUDA Nsight?
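The README says that 小彭老师 does not answer questions about CUDA optimization, so let me answer instead. You are launching far too many blocks. The GPU has only a few dozen SMs, yet you launch more than 10,000 blocks, so just scheduling those blocks onto the SMs already costs a lot. Let each block process more data instead: for example, have each block handle 256*256 input elements, with each thread in the block handling 256 of them.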
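Issues 1 and 2 may be something your algorithm requires, so changing them could make your results wrong. Issue 3 I can fix for you:
// Grid-stride loops: each thread walks over several rows and columns,
// so the kernel stays correct for any grid size, including a small one.
__global__ void GPU_Cal(TYPE *input, TYPE *output, int width, int height, TYPE *para0, TYPE *para1,
                        TYPE *para2) {
    for (int row = threadIdx.y + blockIdx.y * blockDim.y; row < height; row += gridDim.y * blockDim.y) {
        for (int col = threadIdx.x + blockIdx.x * blockDim.x; col < width; col += gridDim.x * blockDim.x) {
            int i = row * width + col;  // flat index of the current pixel
            TYPE data = input[i];
            // Shift and scale the pixel coordinates using para2.
            TYPE x = (row - para2[0]) * para2[2];
            TYPE y = (col - para2[1]) * para2[3];
            // Numerator (para0) and denominator (para1) share the same form.
            const TYPE a = para0[0] + para0[2] * x + data * (para0[1] + para0[3] * x) + para0[4] * y + data * para0[5] * y;
            const TYPE b = para1[0] + para1[2] * x + data * (para1[1] + para1[3] * x) + para1[4] * y + data * para1[5] * y;
            output[i] = a / b;
        }
    }
}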
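The grid-stride kernel only helps if it is launched with a small, capped grid rather than one block per tile. Below is a minimal host-side C++ sketch of that idea, runnable without a GPU. `capped_blocks` and `visit_counts` are hypothetical helpers that do not appear in the original code, and the cap itself is an assumption to tune, for example a small multiple of the SM count. `visit_counts` emulates the two nested grid-stride loops on the CPU so you can check that every pixel is still processed exactly once when the grid is smaller than the image.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: blocks needed along one axis, capped at max_blocks.
// A grid-stride kernel stays correct even when the cap is hit.
inline int capped_blocks(int n, int block_dim, int max_blocks) {
    int needed = (n + block_dim - 1) / block_dim;  // ceil(n / block_dim)
    return std::max(1, std::min(needed, max_blocks));
}

// Hypothetical CPU emulation of the 2D grid-stride loops in GPU_Cal.
// (bx, by) plays the role of blockDim, (gx, gy) the role of gridDim.
// Returns how many times each element of a width x height image is visited.
std::vector<int> visit_counts(int width, int height, int bx, int by, int gx, int gy) {
    std::vector<int> counts(static_cast<std::size_t>(width) * height, 0);
    for (int block_y = 0; block_y < gy; ++block_y)
        for (int block_x = 0; block_x < gx; ++block_x)
            for (int thread_y = 0; thread_y < by; ++thread_y)
                for (int thread_x = 0; thread_x < bx; ++thread_x)
                    // Same loop bounds and strides as the kernel.
                    for (int row = thread_y + block_y * by; row < height; row += gy * by)
                        for (int col = thread_x + block_x * bx; col < width; col += gx * bx)
                            ++counts[static_cast<std::size_t>(row) * width + col];
    return counts;
}
```

On the device side, the two values from `capped_blocks` would go into the `dim3` grid passed to the `GPU_Cal<<<grid, block>>>(...)` launch. The best cap depends on the GPU and should be found by measurement.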
Test environment:
Laptop with R7-5800H, 3060, Win11, latest MSVC in Release mode.
Test results:
GPU time: 0.0018809
CPU time: 0.0048002
ratio: 2.55208
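My other CUDA programs all reach a speedup of about 10x, so why is the speedup of this one so low?
(Also, switching to float makes it much faster. Why? And if I have to use double, how should I change the code?)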