File buffer.h

This file contains the interface definition for the backends.

For normal use you should not call the functions defined in this file directly.

See also

array.h For managing buffers

kernel.h For using kernels

Typedefs

typedef struct _gpucontext gpucontext

Opaque struct for context data.

typedef struct _gpucontext_props gpucontext_props

Opaque structure that holds properties for the context.

Enums

enum ga_usefl

Flags for gpukernel_init().

It is important to specify these properly as the compilation machinery will ensure that the proper configuration is made to support the requested features or error out if the demands cannot be met.

Warning

Failure to properly specify the feature flags will in most cases result in silent data corruption (especially on ATI cards).

Values:

enumerator GA_USE_SMALL

The kernel makes use of small (size is smaller than 4 bytes) types.

enumerator GA_USE_DOUBLE

The kernel makes use of double or complex doubles.

enumerator GA_USE_COMPLEX

The kernel makes use of complex or complex doubles.

enumerator GA_USE_HALF

The kernel makes use of half-floats (also known as float16).

enumerator GA_USE_CUDA

The kernel is made of CUDA code.

enumerator GA_USE_OPENCL

The kernel is made of OpenCL code.
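
As an illustration of how the feature flags above might be derived, the sketch below assumes the flags form a bitmask combined with bitwise OR and applies the enumerator descriptions literally to each argument type. The `EX_` constants, `struct ex_type`, and both helper functions are hypothetical stand-ins; the real values and any real helper come from the library headers.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in bit values for illustration only; the real ga_usefl values
 * are defined by the library and may differ. */
enum ex_usefl {
  EX_USE_SMALL   = 1 << 0,
  EX_USE_DOUBLE  = 1 << 1,
  EX_USE_COMPLEX = 1 << 2,
  EX_USE_HALF    = 1 << 3
};

/* Hypothetical description of one kernel argument type. */
struct ex_type {
  size_t size;     /* element size in bytes */
  int is_double;   /* double or complex double */
  int is_complex;  /* complex or complex double */
  int is_half;     /* half-float (float16) */
};

/* Flags needed by a single type, following the enumerator descriptions. */
int ex_flags_for_type(struct ex_type t) {
  int flags = 0;
  if (t.size < 4)   flags |= EX_USE_SMALL;
  if (t.is_double)  flags |= EX_USE_DOUBLE;
  if (t.is_complex) flags |= EX_USE_COMPLEX;
  if (t.is_half)    flags |= EX_USE_HALF;
  return flags;
}

/* Union of the flags needed by every argument type of a kernel. */
int ex_flags_for_args(const struct ex_type *types, size_t n) {
  int flags = 0;
  size_t i;
  assert(types != NULL || n == 0);
  for (i = 0; i < n; i++)
    flags |= ex_flags_for_type(types[i]);
  return flags;
}
```

Under this reading, a complex double argument needs both the double and the complex flag, and any type narrower than 4 bytes adds the small-type flag.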

Functions

int gpu_get_platform_count(const char *name, unsigned int *platcount)

Gets information about the number of available platforms for the backend specified in name.

Parameters:
  • name – the backend name

  • platcount – will contain the number of compatible platforms on the host

Returns:

GA_NO_ERROR on success

int gpu_get_device_count(const char *name, unsigned int platform, unsigned int *devcount)

Gets information about the number of compatible devices on a specific platform of the host for the backend specified in name.

Parameters:
  • name – the backend name

  • platform – index of a platform on the host

  • devcount – will contain the number of compatible devices on the platform

Returns:

GA_NO_ERROR on success

int gpucontext_props_new(gpucontext_props **res)

Allocate and initialize an instance of gpucontext_props.

Initialization is done with default values.

Parameters:
  • res – pointer to storage space for the created object

Returns:

GA_NO_ERROR or an error code if an error occurred.

int gpucontext_props_cuda_dev(gpucontext_props *p, int devno)

Set the device number for a CUDA device.

Parameters:
  • p – properties object

  • devno – device number

Returns:

GA_NO_ERROR or an error code if an error occurred.

int gpucontext_props_opencl_dev(gpucontext_props *p, int platno, int devno)

Set the platform and device for OpenCL.

Parameters:
  • p – properties object

  • platno – platform number

  • devno – device number

Returns:

GA_NO_ERROR or an error code if an error occurred.

int gpucontext_props_sched(gpucontext_props *p, int sched)

Set the scheduling mode for the device.

Parameters:
  • p – properties object

  • sched – scheduling mode; must be one of the scheduling mode values defined by the library

Returns:

GA_NO_ERROR or an error code if an error occurred.

int gpucontext_props_set_single_stream(gpucontext_props *p)

Set single-stream mode.

All operations on the device will be serialized on a single stream. This will also disable most of the interlocking normally done between multiple streams to keep everything in order.

This mode can be faster if you don’t have a lot of device-level parallelism in your workload.

Parameters:
  • p – properties object

Returns:

GA_NO_ERROR or an error code if an error occurred.

int gpucontext_props_kernel_cache(gpucontext_props *p, const char *path)

Set the path for the kernel cache.

The cache can be shared with other running instances, even on shared drives.

Parameters:
  • p – properties object

  • path – desired location of the kernel cache

Returns:

GA_NO_ERROR or an error code if an error occurred.

int gpucontext_props_alloc_cache(gpucontext_props *p, size_t initial, size_t max)

Configure the allocation cache.

The maximum size is also a limit on the total amount of memory allocated on the device.

Parameters:
  • p – properties object

  • initial – initial size of the cache

  • max – maximum size of the cache

Returns:

GA_NO_ERROR or an error code if an error occurred.

void gpucontext_props_del(gpucontext_props *p)

Free a properties object.

This should not be called on a properties object that has been passed to gpucontext_init().

Parameters:
  • p – properties object

int gpucontext_init(gpucontext **res, const char *name, gpucontext_props *props)

Create a context on the specified device.

The passed-in properties pointer will be managed by this function and need not be freed. This means that you shouldn’t touch the properties object after passing it to this function.

Warning

This function is not thread-safe.

Parameters:
  • res – a pointer to a location that will receive the newly created context

  • name – the backend name.

  • props – a properties object for the context. Can be NULL for defaults.

Returns:

GA_NO_ERROR or an error code if an error occurred.
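
The ownership hand-off described above is easy to get wrong, so here is a minimal sketch of the calling pattern. So that it builds without the library, every `ex_` name is a hypothetical stand-in that only mirrors the documented signatures of gpucontext_props_new(), gpucontext_props_cuda_dev(), gpucontext_props_del() and gpucontext_init(); none of it is the library's implementation, and the behaviour on failure inside init is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins so the sketch compiles on its own. */
typedef struct ex_props { int devno; } ex_props;
typedef struct ex_ctx { int devno; } ex_ctx;
enum { EX_NO_ERROR = 0, EX_ERROR = 1 };

int ex_props_live = 0;  /* count of live props objects, to make leaks visible */

int ex_props_new(ex_props **res) {
  *res = calloc(1, sizeof(**res));
  if (*res == NULL) return EX_ERROR;
  ex_props_live++;
  return EX_NO_ERROR;
}

int ex_props_cuda_dev(ex_props *p, int devno) {
  if (devno < 0) return EX_ERROR;
  p->devno = devno;
  return EX_NO_ERROR;
}

void ex_props_del(ex_props *p) { free(p); ex_props_live--; }

/* Takes ownership of props, as documented for gpucontext_init(). */
int ex_context_init(ex_ctx **res, ex_props *props) {
  ex_ctx *c = malloc(sizeof(*c));
  int devno = props ? props->devno : 0;  /* NULL props means defaults */
  if (props) ex_props_del(props);
  if (c == NULL) return EX_ERROR;
  c->devno = devno;
  *res = c;
  return EX_NO_ERROR;
}

/* The pattern being illustrated. */
int ex_open_cuda_context(int devno, ex_ctx **out) {
  ex_props *p = NULL;
  int err;
  assert(out != NULL);
  err = ex_props_new(&p);
  if (err != EX_NO_ERROR) return err;
  err = ex_props_cuda_dev(p, devno);
  if (err != EX_NO_ERROR) {
    ex_props_del(p);  /* still ours: it was never handed to init */
    return err;
  }
  /* Ownership moves to init; p must not be used or freed afterwards. */
  return ex_context_init(out, p);
}
```

The caller frees the properties object only on the paths where it was never passed to the init call.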

void gpucontext_deref(gpucontext *ctx)

Dereference a context.

This removes a reference to the context and as soon as the reference count drops to zero the context is destroyed. The context can stay alive after you call this function because some objects keep a reference to their context.

Parameters:
  • ctx – a valid context pointer.

int gpucontext_property(gpucontext *ctx, int prop_id, void *res)

Fetch a context property.

The property must be a context property. The currently defined properties and their type are defined in Properties.

Parameters:
  • ctx – context

  • prop_id – property id (from Properties)

  • res – pointer to the return space of the appropriate type

Returns:

GA_NO_ERROR or an error code if an error occurred.

const char *gpucontext_error(gpucontext *ctx, int err)

Get a string describing err.

If you need to get a description of an error that occurred during context creation, call this function using NULL as the context. This version of the call is not thread-safe.

Parameters:
  • ctx – the context in which the error occurred

  • err – error code

Returns:

string description of error

gpudata *gpudata_alloc(gpucontext *ctx, size_t sz, void *data, int flags, int *ret)

Allocates a buffer of size sz in context ctx.

Buffers are reference counted internally and start with a reference count of 1.

Parameters:
  • ctx – a context pointer

  • sz – the requested size

  • data – optional pointer to host buffer

  • flags – see Allocation flags

  • ret – error return pointer

Returns:

A non-NULL pointer to a gpudata structure. This structure is intentionally opaque as its content may change according to the backend used.

void gpudata_retain(gpudata *b)

Increase the reference count to the passed buffer by 1.

Parameters:
  • b – a buffer

void gpudata_release(gpudata *b)

Release a buffer.

This will decrement the reference count of the buffer by 1. If that count reaches 0 all associated resources will be released.

Even if your application does not have any references left to a buffer it may still hang around if it is in use by internal mechanisms (kernel call, …).
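
The retain/release contract can be pictured with a toy host-side model. This is only a sketch of the general reference-counting technique, not how the backends store buffers; `ex_buf` and its functions are hypothetical, and the model is not thread-safe.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy buffer with an explicit reference count. */
typedef struct ex_buf {
  unsigned int refcnt;
  void *mem;
} ex_buf;

int ex_bufs_freed = 0;  /* lets a caller observe when storage is released */

ex_buf *ex_buf_alloc(size_t sz) {
  ex_buf *b = malloc(sizeof(*b));
  if (b == NULL) return NULL;
  b->mem = malloc(sz ? sz : 1);
  if (b->mem == NULL) { free(b); return NULL; }
  b->refcnt = 1;  /* a new buffer starts with one reference */
  return b;
}

void ex_buf_retain(ex_buf *b) { b->refcnt++; }

void ex_buf_release(ex_buf *b) {
  assert(b->refcnt > 0);
  if (--b->refcnt == 0) {  /* last reference gone: free everything */
    free(b->mem);
    free(b);
    ex_bufs_freed++;
  }
}
```

Each retain must be balanced by exactly one release, plus one release for the reference returned by the allocation itself.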

int gpudata_share(gpudata *a, gpudata *b, int *ret)

Check if two buffers may overlap.

Both buffers must have been created with the same backend.

Parameters:
  • a – first buffer

  • b – second buffer

  • ret – error return pointer

Return values:
  • 1 – The buffers may overlap

  • 0 – The buffers do not overlap.

  • -1 – An error was encountered, ret contains a detailed error code if not NULL.
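
The three-way return convention above (1, 0, or -1 with an optional error code) can be sketched on plain address ranges. How a backend actually decides overlap is not specified here; `ex_range`, `ex_share` and the `EX_` codes are hypothetical and model only the calling convention plus a standard half-open interval test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { EX_NO_ERROR = 0, EX_VALUE_ERROR = 1 };

/* Hypothetical stand-in for a buffer: a byte range [base, base + size). */
typedef struct ex_range {
  uintptr_t base;
  size_t size;
} ex_range;

/* Returns 1 if the ranges overlap, 0 if not, -1 on error.
 * On error, *ret receives a code when ret is not NULL. */
int ex_share(const ex_range *a, const ex_range *b, int *ret) {
  if (a == NULL || b == NULL) {
    if (ret != NULL) *ret = EX_VALUE_ERROR;
    return -1;
  }
  /* The end addresses must be representable. */
  assert(a->size <= UINTPTR_MAX - a->base);
  assert(b->size <= UINTPTR_MAX - b->base);
  if (a->size == 0 || b->size == 0) return 0;  /* empty ranges share nothing */
  /* Half-open intervals overlap iff each starts before the other ends. */
  return (a->base < b->base + b->size && b->base < a->base + a->size) ? 1 : 0;
}
```

Callers should test for -1 before treating the result as a boolean, since a plain truthiness check would read an error as "may overlap".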

int gpudata_move(gpudata *dst, size_t dstoff, gpudata *src, size_t srcoff, size_t sz)

Copy the content of a buffer to another.

Both buffers must be in the same context and contiguous. Additionally the buffers must not overlap otherwise the content of the destination buffer is not defined.

Parameters:
  • dst – destination buffer

  • dstoff – offset inside the destination buffer

  • src – source buffer

  • srcoff – offset inside the source buffer

  • sz – size of data to copy (in bytes)

Returns:

GA_NO_ERROR or an error code if an error occurred.
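
Since the offsets and size are given in bytes, a caller may want to validate them before issuing the copy. The sketch below models that check on host memory; `ex_range_ok`, `ex_move` and the `EX_` codes are hypothetical, and the buffer lengths are passed explicitly because this toy model has no buffer object to query.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { EX_NO_ERROR = 0, EX_VALUE_ERROR = 1 };

/* Nonzero if [off, off + sz) fits inside a buffer of `total` bytes.
 * Written so that off + sz is never computed and cannot overflow. */
int ex_range_ok(size_t total, size_t off, size_t sz) {
  return off <= total && sz <= total - off;
}

/* Host-memory model of an offset copy between two distinct buffers. */
int ex_move(unsigned char *dst, size_t dstlen, size_t dstoff,
            const unsigned char *src, size_t srclen, size_t srcoff,
            size_t sz) {
  assert(dst != NULL && src != NULL);
  if (!ex_range_ok(dstlen, dstoff, sz) || !ex_range_ok(srclen, srcoff, sz))
    return EX_VALUE_ERROR;
  /* Like the documented call, memcpy requires non-overlapping regions. */
  memcpy(dst + dstoff, src + srcoff, sz);
  return EX_NO_ERROR;
}
```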

int gpudata_transfer(gpudata *dst, size_t dstoff, gpudata *src, size_t srcoff, size_t sz)

Transfer the content of buffer across contexts.

If possible it will try to do the transfer in an efficient way using backend-specific tricks. If those fail or can’t be used, it will fall back to a copy through the host.

Parameters:
  • dst – buffer to transfer to

  • dstoff – offset in the destination buffer

  • src – buffer to transfer from

  • srcoff – offset in the source buffer

  • sz – size of the region to transfer

Returns:

GA_NO_ERROR or an error code if an error occurred.

int gpudata_read(void *dst, gpudata *src, size_t srcoff, size_t sz)

Transfer data from a buffer to memory.

The buffer and the memory region must be contiguous.

Parameters:
  • dst – destination in memory

  • src – source buffer

  • srcoff – offset inside the source buffer

  • sz – size of data to copy (in bytes)

Returns:

GA_NO_ERROR or an error code if an error occurred.

int gpudata_write(gpudata *dst, size_t dstoff, const void *src, size_t sz)

Transfer data from memory to a buffer.

The buffer and the memory region must be contiguous.

Parameters:
  • dst – destination buffer

  • dstoff – offset inside the destination buffer

  • src – source in memory

  • sz – size of data to copy (in bytes)

Returns:

GA_NO_ERROR or an error code if an error occurred.

int gpudata_memset(gpudata *dst, size_t dstoff, int data)

Set a buffer to a byte pattern.

This function acts like the C function memset() for device buffers.

Parameters:
  • dst – destination buffer

  • dstoff – offset into the destination buffer

  • data – byte value to write into the destination.

Returns:

GA_NO_ERROR or an error code if an error occurred.
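
Because the call is described as behaving like memset(), the value is presumably applied per byte rather than per element. The host-side sketch below shows what that implies for wider element types; `ex_fill_words` is a hypothetical helper around the standard memset() and does not touch a device buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill n 32-bit words the way memset() does: every byte receives the
 * low 8 bits of `data`, so the words do not end up equal to `data`. */
void ex_fill_words(uint32_t *words, size_t n, int data) {
  assert(words != NULL);
  memset(words, data, n * sizeof(*words));
}
```

Filling with 1 therefore yields words equal to 0x01010101, not 1. Only byte patterns such as 0 or 0xFF produce the same value in every element width.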

int gpudata_sync(gpudata *b)

Synchronize a buffer.

Waits for all previous reads, writes, copies and kernel calls involving this buffer to be finished.

This call is not required for normal use of the library as all exposed operations will properly synchronize amongst themselves. This call may be useful in a performance timing context to ensure that the work is really done, or before interaction with another library to wait for pending operations.

int gpudata_property(gpudata *buf, int prop_id, void *res)

Fetch a buffer property.

Can be used for buffer properties and context properties. Context properties will fetch the value for the context associated with the buffer. The currently defined properties and their type are defined in Properties.

Parameters:
  • buf – buffer

  • prop_id – property id (from Properties)

  • res – pointer to the return space of the appropriate type

Returns:

GA_NO_ERROR or an error code if an error occurred.

gpucontext *gpudata_context(gpudata *b)

gpukernel *gpukernel_init(gpucontext *ctx, unsigned int count, const char **strings, const size_t *lengths, const char *fname, unsigned int numargs, const int *typecodes, int flags, int *ret, char **err_str)

Compile a kernel.

Compile the kernel composed of the concatenated strings in strings and return a callable kernel. If lengths is NULL then all the strings must be NUL-terminated. Otherwise the strings need not be NUL-terminated, and each length must not include the final NUL byte if one is present.

If *err_str is not NULL on return, the caller must call free(*err_str) after use.

Parameters:
  • ctx – context to work in

  • count – number of input strings

  • strings – table of string pointers

  • lengths – (optional) length for each string in the table

  • fname – name of the kernel function (as defined in the code)

  • numargs – number of kernel arguments

  • typecodes – the type of each argument

  • flags – flags for compilation (see ga_usefl)

  • ret – error return pointer

  • err_str – returns pointer to debug message from GPU backend (if provided a non-NULL err_str)

Returns:

Allocated kernel structure or NULL if an error occurred. ret will be updated with the error code if not NULL.
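
To make the count, strings and lengths convention concrete, the sketch below joins source fragments the way the description implies: with lengths set to NULL each fragment is measured with strlen(), otherwise exactly lengths[i] bytes are taken and no terminator is required. `ex_join_sources` is a hypothetical helper for illustration, not what the backends run internally.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Join `count` fragments into one NUL-terminated string.
 * Returns NULL on allocation failure; the caller frees the result. */
char *ex_join_sources(unsigned int count, const char **strings,
                      const size_t *lengths) {
  size_t total = 0, pos = 0, len;
  unsigned int i;
  char *out;
  assert(strings != NULL || count == 0);
  for (i = 0; i < count; i++)
    total += lengths ? lengths[i] : strlen(strings[i]);
  out = malloc(total + 1);
  if (out == NULL) return NULL;
  for (i = 0; i < count; i++) {
    len = lengths ? lengths[i] : strlen(strings[i]);
    memcpy(out + pos, strings[i], len);  /* copies no terminator */
    pos += len;
  }
  out[pos] = '\0';
  return out;
}
```

Explicit lengths let a caller pass slices of a larger buffer, for example a shared preamble followed by a generated kernel body, without copying them first.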

void gpukernel_retain(gpukernel *k)

Retain a kernel.

Increase the reference count of the passed kernel by 1.

Parameters:
  • k – a kernel

void gpukernel_release(gpukernel *k)

Release a kernel.

Decrease the reference count of a kernel. If it reaches 0, all resources associated with k will be released.

If the reference count of a kernel reaches 0 while it is running, this call will block until completion.

int gpukernel_setarg(gpukernel *k, unsigned int i, void *a)

Set kernel argument.

Buffer arguments will not be retained and it is the responsibility of the caller to ensure that the value is still valid whenever a call is made.

Parameters:
  • k – kernel

  • i – argument index (starting at 0)

  • a – pointer to argument

Returns:

GA_NO_ERROR or an error code if an error occurred.

int gpukernel_call(gpukernel *k, unsigned int n, const size_t *gs, const size_t *ls, size_t shared, void **args)

Call a kernel.

If args is NULL, it will be assumed that the arguments have previously been set with gpukernel_setarg().

Parameters:
  • k – kernel

  • n – number of dimensions of grid/block

  • gs – grid sizes for this call (also known as global size)

  • ls – block sizes for this call (also known as local size)

  • shared – amount of dynamic shared memory to reserve

  • args – table of pointers to each argument (optional).

Returns:

GA_NO_ERROR or an error code if an error occurred.
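
A common way to choose the sizes for a one-dimensional launch is ceiling division. The sketch assumes that each entry of gs counts blocks of ls work items rather than the total number of work items; the parameter descriptions leave this open, so check it against the backend before relying on it. `ex_blocks_for` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Number of blocks of `ls` work items needed to cover `n` elements. */
size_t ex_blocks_for(size_t n, size_t ls) {
  assert(ls > 0);
  return n / ls + (n % ls != 0);  /* round up without overflowing */
}
```

For 1000 elements and a block size of 256 this gives 4 blocks, so the last block has 24 surplus work items and the kernel body must guard against indices at or beyond n.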

int gpukernel_property(gpukernel *k, int prop_id, void *res)

Fetch a property.

Can be used for kernel and context properties. The context properties will fetch the value for the context associated with the kernel. The currently defined properties and their type are defined in Properties.

Parameters:
  • k – kernel

  • prop_id – property id (from Properties)

  • res – pointer to the return space of the appropriate type

Returns:

GA_NO_ERROR or an error code if an error occurred.

gpucontext *gpukernel_context(gpukernel *k)