How do CUDA devices handle immediate operands?

Developer https://www.devze.com 2022-12-27 07:26 Source: web

When CUDA code with immediate (integer) operands is compiled, are the immediates held in the instruction stream, or are they placed into memory? Specifically, I'm thinking about 24- or 32-bit unsigned integer operands.

I haven't been able to find information about this in any of the CUDA documentation I've examined so far. So references to any documents on specific uarch details like this would be perfect, as I don't currently have a good model for how CUDA works at this level.

NVIDIA doesn't release any information about how the devices work at this level. There is a tool called decuda that can decompile cubins, so you can see the machine code. If I recall, immediates go into the instruction stream, at least as far as decuda is able to deduce. The problem with decuda is that it only works for CUDA 2.3 or lower. They changed the executable format to ELF in CUDA 3.0, and decuda hasn't been maintained in a long time.

The best official documentation is the PTX documentation, but that documents a virtual machine ISA, not the real device.
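One way to check this yourself is to compile a tiny kernel that uses a 24-bit and a 32-bit constant and look at what the compiler emits. The sketch below is only an illustration: the kernel name add_immediates, the helper run_add_immediates and the two constants are made up for this example. Assuming the file is saved as imm.cu, `nvcc -ptx imm.cu` writes the PTX (virtual ISA) form, and on toolkits that ship cuobjdump, `nvcc -cubin imm.cu` followed by `cuobjdump -sass imm.cubin` dumps the machine code for the chosen architecture. How the constants are encoded there depends on the architecture and compiler version, so treat whatever you see as specific to your setup.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: masks each element with a 24-bit constant and adds a
// 32-bit constant. Both literals are candidates for immediate operands.
__global__ void add_immediates(unsigned int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = (data[i] & 0x00FFFFFFu) + 0xDEADBEEFu;
    }
}

// Hypothetical host helper: runs the kernel once and returns element 1.
unsigned int run_add_immediates()
{
    const int n = 256;
    unsigned int host[n];
    for (int i = 0; i < n; ++i) {
        host[i] = 0xFF000000u | (unsigned int)i;  // top byte set, so the mask matters
    }

    unsigned int *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(unsigned int));
    cudaMemcpy(dev, host, n * sizeof(unsigned int), cudaMemcpyHostToDevice);
    add_immediates<<<(n + 127) / 128, 128>>>(dev, n);
    cudaMemcpy(host, dev, n * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    return host[1];
}

int main()
{
    printf("host[1] = 0x%08X\n", run_add_immediates());
    return 0;
}
```

In the disassembly, look for the two literals appearing directly as operands of the logic and add instructions rather than as loads from an address.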

If I recall correctly, integer division (for example) is very costly, while some floating-point operations (like sinf(..), at least in its fast intrinsic form __sinf(..)) are implemented in hardware and therefore fast.
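As a rough illustration of the two cases mentioned here, the sketch below puts a general integer division next to an equivalent shift, and the library sinf next to the __sinf intrinsic. The names cost_demo and run_cost_demo, the divisor 8 and the 1e-3 tolerance are assumptions made for this example, and it uses cudaMallocManaged (available in CUDA 6 and later) only to keep the host code short. It does not time anything; it only checks that the cheaper forms give matching results for these inputs, so any cost comparison would still need a profiler or a look at the generated code.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel comparing a costly operation with a cheaper equivalent.
__global__ void cost_demo(const unsigned int *in, unsigned int divisor,
                          unsigned int *q_div, unsigned int *q_shift,
                          float *s_lib, float *s_fast, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    unsigned int x = in[i];
    q_div[i]   = x / divisor;  // divisor is a runtime value: general integer division
    q_shift[i] = x >> 3;       // same quotient when divisor == 8, done with one shift

    float a = 0.001f * (float)x;  // small angles, where the intrinsic is most accurate
    s_lib[i]  = sinf(a);          // library version
    s_fast[i] = __sinf(a);        // fast intrinsic, reduced accuracy
}

// Hypothetical host helper: returns true if both pairs of results agree.
bool run_cost_demo()
{
    const int n = 1024;
    unsigned int *in, *q_div, *q_shift;
    float *s_lib, *s_fast;
    cudaMallocManaged(&in, n * sizeof(unsigned int));
    cudaMallocManaged(&q_div, n * sizeof(unsigned int));
    cudaMallocManaged(&q_shift, n * sizeof(unsigned int));
    cudaMallocManaged(&s_lib, n * sizeof(float));
    cudaMallocManaged(&s_fast, n * sizeof(float));

    for (int i = 0; i < n; ++i) in[i] = (unsigned int)i;

    cost_demo<<<(n + 255) / 256, 256>>>(in, 8u, q_div, q_shift, s_lib, s_fast, n);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    for (int i = 0; ok && i < n; ++i) {
        ok = (q_div[i] == q_shift[i]) && (fabsf(s_lib[i] - s_fast[i]) < 1e-3f);
    }

    cudaFree(in); cudaFree(q_div); cudaFree(q_shift);
    cudaFree(s_lib); cudaFree(s_fast);
    return ok;
}

int main()
{
    printf("results agree: %s\n", run_cost_demo() ? "yes" : "no");
    return 0;
}
```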

This talk gave me some insight: "CUDA Tricks for Computational Physics" http://physics.bu.edu/~kbarros/talks/