SM75 (Turing) Instructions
111 base instructions, 605 total variants
AL2P(1)
Unknown Op
ALD(7)
Unknown Op
ATOM(6)
Atomic Operation on Generic Memory
ATOMG(6)
Atomic Operation on Global Memory
ATOMS(6)
Atomic Operation on Shared Memory
B2R(3)
Move Barrier To Register
BMMA(1)
Bit Matrix Multiply and Accumulate
BMSK(5)
Bitfield Mask
BREV(5)
Bit Reverse
CS2R(1)
Move Special Register to Register
CSMTEST(1)
Unknown Op
DADD(5)
FP64 Add
DFMA(9)
FP64 Fused Multiply Add
DMUL(5)
FP64 Multiply
DSETP(5)
FP64 Compare And Set Predicate
ERRBAR(1)
Error Barrier
F2I(4)
Floating Point To Integer Conversion
FADD(5)
FP32 Add
FCHK(5)
Floating-point Range Check
FFMA(9)
FP32 Fused Multiply and Add
FLO(5)
Find Leading One
FMNMX(5)
FP32 Minimum/Maximum
FMUL(5)
FP32 Multiply
FOOTPRINT(8)
Unknown Op
FSEL(5)
Floating Point Select
FSET(5)
FP32 Compare And Set
FSETP(5)
FP32 Compare And Set Predicate
FSWZADD(1)
FP32 Swizzle Add
GETLMEMBASE(1)
Get Local Memory Base Address
HADD2(5)
FP16 Add
HFMA2(9)
FP16 Fused Multiply Add
HMMA(2)
Matrix Multiply and Accumulate
HMUL2(5)
FP16 Multiply
HSET2(5)
FP16 Compare And Set
HSETP2(5)
FP16 Compare And Set Predicate
I2F(3)
Integer To Floating Point Conversion
I2I(5)
Integer To Integer Conversion
I2IP(5)
Integer To Integer Conversion and Packing
IABS(5)
Integer Absolute Value
IADD3(10)
3-input Integer Addition
IDP(4)
Integer Dot Product and Accumulate
IMAD(50)
Integer Multiply And Add
IMMA(1)
Integer Matrix Multiply and Accumulate
IMNMX(5)
Integer Minimum/Maximum
IPA(10)
Unknown Op
ISBERD(1)
Unknown Op
ISETP(5)
Integer Compare And Set Predicate
LD(16)
Load from Generic Memory
LDC(4)
Load Constant
LDG(8)
Load from Global Memory
LDL(4)
Load within Local Memory Window
LDS(8)
Load within Shared Memory Window
LDSM(4)
Load Matrix from Shared Memory with Element Size Expansion
LDTRAM(2)
Unknown Op
LEA(22)
Load Effective Address
LEPC(1)
Load Effective PC
LOP3(5)
Logic Operation
MATCH(2)
Match Register Values Across Thread Group
MOV(5)
Move
MOVM(1)
Move Matrix with Transposition or Expansion
MUFU(5)
FP32 Multi Function Operation
NOP(1)
No Operation
OUT(3)
Unknown Op
P2R(5)
Move Predicate Register To Register
PIXLD(1)
Unknown Op
PLOP3(14)
Predicate Logic Operation
PMTRIG(1)
Performance Monitor Trigger
POPC(5)
Population Count
PRMT(9)
Permute Register Pair
QSPC(4)
Query Space
R2P(5)
Move Register To Predicate Register
R2UR(1)
Move from Vector Register to a Uniform Register
RPCMOV(10)
PC Register Move
S2R(1)
Move Special Register to Register
S2UR(1)
Move Special Register to Uniform Register
SEL(5)
Select Source with Predicate
SETCTAID(1)
Set CTA ID
SGXT(5)
Sign Extend
SHF(9)
Funnel Shift
SHFL(4)
Warp Wide Register Shuffle
SUATOM(8)
Atomic Op on Surface Memory
SULD(8)
Surface Load
TEX(12)
Texture Fetch
TLD(12)
Texture Load
TLD4(12)
Texture Load 4
TMML(12)
Texture MipMap Level
TXD(12)
Texture Fetch With Derivatives
TXQ(5)
Texture Query
UBMSK(2)
Uniform Bitfield Mask
UBREV(2)
Uniform Bit Reverse
UCLEA(2)
Load Effective Address for a Constant
UFLO(2)
Uniform Find Leading One
UIADD3(8)
Uniform Integer Addition
UIMAD(10)
Uniform Integer Multiplication
UISETP(4)
Integer Compare and Set Uniform Predicate
ULDC(6)
Load from Constant Memory into a Uniform Register
ULEA(8)
Uniform Load Effective Address
ULOP3(2)
Logic Operation
UMOV(2)
Uniform Move
UP2UR(2)
Uniform Predicate to Uniform Register
UPLOP3(4)
Uniform Predicate Logic Operation
UPOPC(2)
Uniform Population Count
UPRMT(2)
Uniform Byte Permute
UR2UP(2)
Uniform Register to Uniform Predicate
USEL(2)
Uniform Select
USGXT(2)
Uniform Sign Extend
USHF(3)
Uniform Funnel Shift
VABSDIFF(9)
Absolute Difference
VABSDIFF4(9)
Absolute Difference
VOTE(1)
Vote Across SIMD Thread Group
VOTEU(1)
Voting across SIMD Thread Group with Results in Uniform Destination
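Each entry above is written as MNEMONIC(n), where n is the number of variants found for that base instruction. As a quick consistency check, the following Python sketch tallies those pairs and compares them with the header totals. The names LISTING and parse_listing are illustrative only and are not part of any published tooling.

```python
import re

# MNEMONIC(variant count) pairs copied from the list above.
LISTING = """
AL2P(1) ALD(7) ATOM(6) ATOMG(6) ATOMS(6) B2R(3) BMMA(1) BMSK(5) BREV(5)
CS2R(1) CSMTEST(1) DADD(5) DFMA(9) DMUL(5) DSETP(5) ERRBAR(1) F2I(4)
FADD(5) FCHK(5) FFMA(9) FLO(5) FMNMX(5) FMUL(5) FOOTPRINT(8) FSEL(5)
FSET(5) FSETP(5) FSWZADD(1) GETLMEMBASE(1) HADD2(5) HFMA2(9) HMMA(2)
HMUL2(5) HSET2(5) HSETP2(5) I2F(3) I2I(5) I2IP(5) IABS(5) IADD3(10)
IDP(4) IMAD(50) IMMA(1) IMNMX(5) IPA(10) ISBERD(1) ISETP(5) LD(16)
LDC(4) LDG(8) LDL(4) LDS(8) LDSM(4) LDTRAM(2) LEA(22) LEPC(1) LOP3(5)
MATCH(2) MOV(5) MOVM(1) MUFU(5) NOP(1) OUT(3) P2R(5) PIXLD(1) PLOP3(14)
PMTRIG(1) POPC(5) PRMT(9) QSPC(4) R2P(5) R2UR(1) RPCMOV(10) S2R(1)
S2UR(1) SEL(5) SETCTAID(1) SGXT(5) SHF(9) SHFL(4) SUATOM(8) SULD(8)
TEX(12) TLD(12) TLD4(12) TMML(12) TXD(12) TXQ(5) UBMSK(2) UBREV(2)
UCLEA(2) UFLO(2) UIADD3(8) UIMAD(10) UISETP(4) ULDC(6) ULEA(8) ULOP3(2)
UMOV(2) UP2UR(2) UPLOP3(4) UPOPC(2) UPRMT(2) UR2UP(2) USEL(2) USGXT(2)
USHF(3) VABSDIFF(9) VABSDIFF4(9) VOTE(1) VOTEU(1)
"""


def parse_listing(text):
    """Return {mnemonic: variant_count} for every NAME(n) token in text."""
    pairs = re.findall(r"([A-Z0-9_.]+)\((\d+)\)", text)
    return {name: int(count) for name, count in pairs}


counts = parse_listing(LISTING)
base_total = len(counts)               # number of distinct base instructions
variant_total = sum(counts.values())   # sum of all per-instruction variant counts
print(base_total, variant_total)       # prints: 111 605
```

Both totals agree with the header line (111 base instructions, 605 total variants). IMAD alone accounts for 50 of the variants.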
Unfound Instructions
Our fuzzer has not found these 61 instructions. If you have a cubin that contains any of these instructions and would like to contribute it, message us at collab@sf-tensor.com.
BAR
Barrier Synchronization
BMOV
Move Convergence Barrier State
BPT
BreakPoint/Trap
BRA
Relative Branch
BREAK
Break out of the Specified Convergence Barrier
BRX
Relative Branch Indirect
BRXU
Relative Branch with Uniform Register Based Offset
BSSY
Barrier Set Convergence Synchronization Point
BSYNC
Synchronize Threads on a Convergence Barrier
CALL
Call Function
CCTL
Cache Control
CCTLL
Cache Control
CCTLT
Texture Cache Control
DEPBAR
Dependency Barrier
EXIT
Exit Program
F2F
Floating Point To Floating Point Conversion
FADD32I
FP32 Add
FFMA32I
FP32 Fused Multiply and Add
FMUL32I
FP32 Multiply
FRND
Round To Integer
HADD2_32I
FP16 Add
HFMA2_32I
FP16 Fused Multiply Add
HMUL2_32I
FP16 Multiply
IADD
Integer Addition
IADD32I
Integer Addition
IDP4A
Integer Dot Product and Accumulate
IMUL
Integer Multiply
IMUL32I
Integer Multiply
ISCADD
Scaled Integer Addition
ISCADD32I
Scaled Integer Addition
JMP
Absolute Jump
JMX
Absolute Jump Indirect
JMXU
Absolute Jump with Uniform Register Based Offset
KILL
Kill Thread
LOP
Logic Operation
LOP32I
Logic Operation
MEMBAR
Memory Barrier
MOV32I
Move
NANOSLEEP
Suspend Execution
PSETP
Combine Predicates and Set Predicate
R2B
Move Register to Barrier
RED
Reduction Operation on Generic Memory
RET
Return From Subroutine
RTT
Return From Trap
SETLMEMBASE
Set Local Memory Base Address
SHL
Shift Left
SHR
Shift Right
ST
Store to Generic Memory
STG
Store to Global Memory
STL
Store to Local Memory
STS
Store to Shared Memory
SURED
Reduction Op on Surface Memory
SUST
Surface Store
UIADD3.64
Uniform Integer Addition
ULOP
Logic Operation
ULOP32I
Logic Operation
UPSETP
Uniform Predicate Logic Operation
USHL
Uniform Left Shift
USHR
Uniform Right Shift
WARPSYNC
Synchronize Threads in Warp
YIELD
Yield Control
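To check whether a cubin you already have contains any of the unfound instructions, one option is to disassemble it (for example with cuobjdump -sass) and scan the text for these mnemonics. The Python sketch below is a rough illustration under assumptions: the line shape it expects (an address comment such as /*0010*/, an optional predicate, then the opcode) reflects typical disassembler output and may need adjusting, the sample text is made up for the demonstration, and the names UNFOUND and unfound_in_sass are hypothetical.

```python
import re

# Mnemonics from the "Unfound Instructions" list above.
UNFOUND = set("""
BAR BMOV BPT BRA BREAK BRX BRXU BSSY BSYNC CALL CCTL CCTLL CCTLT DEPBAR EXIT
F2F FADD32I FFMA32I FMUL32I FRND HADD2_32I HFMA2_32I HMUL2_32I IADD IADD32I
IDP4A IMUL IMUL32I ISCADD ISCADD32I JMP JMX JMXU KILL LOP LOP32I MEMBAR MOV32I
NANOSLEEP PSETP R2B RED RET RTT SETLMEMBASE SHL SHR ST STG STL STS SURED SUST
UIADD3.64 ULOP ULOP32I UPSETP USHL USHR WARPSYNC YIELD
""".split())

# Assumed line shape: "/*0010*/   @!P0 BRA.U 0x40 ;   /* 0x... */"
# The address comment has hex digits right after "/*"; encoding comments
# ("/* 0x... */") start with a space and are therefore skipped.
LINE_RE = re.compile(
    r"/\*[0-9a-fA-F]+\*/\s+(?:@!?U?P[0-9T]+\s+)?([A-Z][A-Z0-9_.]*)"
)


def unfound_in_sass(sass_text):
    """Return the set of unfound mnemonics that appear in disassembly text."""
    hits = set()
    for line in sass_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        parts = match.group(1).split(".")
        # Try "BRA", then "BRA.U", and so on, so that a dotted list entry
        # such as UIADD3.64 can match as well as plain base names.
        for i in range(1, len(parts) + 1):
            candidate = ".".join(parts[:i])
            if candidate in UNFOUND:
                hits.add(candidate)
    return hits


# Made-up sample in the assumed format, not real disassembler output.
sample = """
        /*0000*/                   MOV R1, c[0x0][0x28] ;
                                                         /* 0x00000a0000017a02 */
        /*0010*/              @!P0 BRA.U 0x40 ;
        /*0020*/                   STG.E.SYS [R2], R0 ;
        /*0030*/                   IADD3 R0, R0, 0x1, RZ ;
        /*0040*/                   EXIT ;
"""
print(sorted(unfound_in_sass(sample)))  # ['BRA', 'EXIT', 'STG']
```

Matching is done on the opcode token only, so IADD3 is not mistaken for the unfound IADD, and modifiers such as .E.SYS are ignored.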