4.7. Occupancy
This section describes the occupancy calculation functions of the CUDA runtime application programming interface.
Besides the occupancy calculator functions (cudaOccupancyMaxActiveBlocksPerMultiprocessor and cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags), there are also C++ only occupancy-based launch configuration functions documented in C++ API Routines module.
See cudaOccupancyMaxPotentialBlockSize ( C++ API), cudaOccupancyMaxPotentialBlockSize ( C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMem ( C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMem ( C++ API)
Functions
- __host__  __device__ cudaError_t cudaOccupancyMaxActiveBlocksPerMultiprocessor ( int* numBlocks, const void* func, int blockSize, size_t dynamicSMemSize )
 - Returns occupancy for a device function.
 - __host__ cudaError_t cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags ( int* numBlocks, const void* func, int blockSize, size_t dynamicSMemSize, unsigned int flags )
 - Returns occupancy for a device function with the specified flags.
 
Functions
- __host__  __device__ cudaError_t cudaOccupancyMaxActiveBlocksPerMultiprocessor ( int* numBlocks, const void* func, int blockSize, size_t dynamicSMemSize )
 - 
                           Returns occupancy for a device function.
Parameters
- numBlocks
 - - Returned occupancy
 - func
 - - Kernel function for which occupancy is calculated
 - blockSize
 - - Block size the kernel is intended to be launched with
 - dynamicSMemSize
 - - Per-block dynamic shared memory usage intended, in bytes
 
Returns
cudaSuccess, cudaErrorCudartUnloading, cudaErrorInitializationError, cudaErrorInvalidDevice, cudaErrorInvalidDeviceFunction, cudaErrorInvalidValue, cudaErrorUnknown,
Description
Returns in *numBlocks the maximum number of active blocks per streaming multiprocessor for the device function.
Note:Note that this function may also return error codes from previous, asynchronous launches.
See also:
cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags, cudaOccupancyMaxPotentialBlockSize ( C++ API), cudaOccupancyMaxPotentialBlockSizeWithFlags ( C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMem ( C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMemWithFlags ( C++ API), cuOccupancyMaxActiveBlocksPerMultiprocessor
 - __host__ cudaError_t cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags ( int* numBlocks, const void* func, int blockSize, size_t dynamicSMemSize, unsigned int flags )
 - 
                           Returns occupancy for a device function with the specified flags.
Parameters
- numBlocks
 - - Returned occupancy
 - func
 - - Kernel function for which occupancy is calculated
 - blockSize
 - - Block size the kernel is intended to be launched with
 - dynamicSMemSize
 - - Per-block dynamic shared memory usage intended, in bytes
 - flags
 - - Requested behavior for the occupancy calculator
 
Returns
cudaSuccess, cudaErrorCudartUnloading, cudaErrorInitializationError, cudaErrorInvalidDevice, cudaErrorInvalidDeviceFunction, cudaErrorInvalidValue, cudaErrorUnknown,
Description
Returns in *numBlocks the maximum number of active blocks per streaming multiprocessor for the device function.
The flags parameter controls how special cases are handled. Valid flags include:
- 
                                       
cudaOccupancyDefault: keeps the default behavior as cudaOccupancyMaxActiveBlocksPerMultiprocessor
 
- 
                                       
cudaOccupancyDisableCachingOverride: This flag suppresses the default behavior on platform where global caching affects occupancy. On such platforms, if caching is enabled, but per-block SM resource usage would result in zero occupancy, the occupancy calculator will calculate the occupancy as if caching is disabled. Setting this flag makes the occupancy calculator to return 0 in such cases. More information can be found about this feature in the "Unified L1/Texture Cache" section of the Maxwell tuning guide.
 
Note:Note that this function may also return error codes from previous, asynchronous launches.
See also:
cudaOccupancyMaxActiveBlocksPerMultiprocessor, cudaOccupancyMaxPotentialBlockSize ( C++ API), cudaOccupancyMaxPotentialBlockSizeWithFlags ( C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMem ( C++ API), cudaOccupancyMaxPotentialBlockSizeVariableSMemWithFlags ( C++ API), cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags