|
| 1 | +.. Licensed to the Apache Software Foundation (ASF) under one |
| 2 | +.. or more contributor license agreements. See the NOTICE file |
| 3 | +.. distributed with this work for additional information |
| 4 | +.. regarding copyright ownership. The ASF licenses this file |
| 5 | +.. to you under the Apache License, Version 2.0 (the |
| 6 | +.. "License"); you may not use this file except in compliance |
| 7 | +.. with the License. You may obtain a copy of the License at |
| 8 | + |
| 9 | +.. http://www.apache.org/licenses/LICENSE-2.0 |
| 10 | + |
| 11 | +.. Unless required by applicable law or agreed to in writing, |
| 12 | +.. software distributed under the License is distributed on an |
| 13 | +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| 14 | +.. KIND, either express or implied. See the License for the |
| 15 | +.. specific language governing permissions and limitations |
| 16 | +.. under the License. |
| 17 | + |
| 18 | +.. currentmodule:: pyarrow.cuda |
| 19 | + |
| 20 | +CUDA Integration |
| 21 | +================ |
| 22 | + |
| 23 | +Arrow is not limited to CPU buffers (located in the computer's main memory, |
| 24 | +also named "host memory"). It also has provisions for accessing buffers |
| 25 | +located on a CUDA-capable GPU device (in "device memory"). |
| 26 | + |
| 27 | +.. note:: |
| 28 | + This functionality is optional and must have been enabled at build time. |
| 29 | + If this is not done by your package manager, you might have to build Arrow |
| 30 | + yourself. |
| 31 | + |
| 32 | +CUDA Contexts |
| 33 | +------------- |
| 34 | + |
| 35 | +A CUDA context represents access to a particular CUDA-capable device. |
| 36 | +For example, the following creates a CUDA context for CUDA device number 0:: |
| 37 | + |
| 38 | + >>> import pyarrow as pa |
| 39 | + >>> from pyarrow import cuda |
| 40 | + >>> ctx = cuda.Context(0) |
| 41 | + |
| 42 | +CUDA Buffers |
| 43 | +------------ |
| 44 | + |
| 45 | +A CUDA buffer can be created by copying data from host memory to the memory |
| 46 | +of a CUDA device, using the :meth:`Context.buffer_from_data` method. |
| 47 | +The source data can be any Python buffer-like object, including Arrow buffers:: |
| 48 | + |
| 49 | + >>> import numpy as np |
| 50 | + >>> arr = np.arange(4, dtype=np.int32) |
| 51 | + >>> arr.nbytes |
| 52 | + 16 |
| 53 | + >>> cuda_buf = ctx.buffer_from_data(arr) |
| 54 | + >>> type(cuda_buf) |
| 55 | + pyarrow._cuda.CudaBuffer |
| 56 | + >>> cuda_buf.size # The buffer's size in bytes |
| 57 | + 16 |
| 58 | + >>> cuda_buf.address # The buffer's address in device memory |
| 59 | + 30088364544 |
| 60 | + >>> cuda_buf.context.device_number |
| 61 | + 0 |
| 62 | + |
| 63 | +Conversely, you can copy a CUDA buffer back to host memory, getting a regular |
| 64 | +CPU buffer:: |
| 65 | + |
| 66 | + >>> buf = cuda_buf.copy_to_host() |
| 67 | + >>> type(buf) |
| 68 | + pyarrow.lib.Buffer |
| 69 | + >>> np.frombuffer(buf, dtype=np.int32) |
| 70 | + array([0, 1, 2, 3], dtype=int32) |
| 71 | + |
| 72 | +.. warning:: |
| 73 | + Many Arrow functions expect a CPU buffer but will not check the buffer's |
| 74 | + actual type. You will get a crash if you pass a CUDA buffer to such a |
| 75 | + function:: |
| 76 | + |
| 77 | + >>> pa.py_buffer(b"x" * 16).equals(cuda_buf) |
| 78 | + Segmentation fault |
| 79 | + |
| 80 | +Numba Integration |
| 81 | +----------------- |
| 82 | + |
| 83 | +There is not much you can do directly with Arrow CUDA buffers from Python, |
| 84 | +but they support interoperation with `Numba <https://numba.pydata.org/>`_, |
| 85 | +a JIT compiler which can turn Python code into optimized CUDA kernels. |
| 86 | + |
| 87 | +Arrow to Numba |
| 88 | +~~~~~~~~~~~~~~ |
| 89 | + |
| 90 | +First let's define a Numba CUDA kernel operating on an ``int32`` array. Here, |
| 91 | +we will simply increment each array element (assuming the array is writable):: |
| 92 | + |
| 93 | + import numba.cuda |
| 94 | + |
| 95 | + @numba.cuda.jit |
| 96 | + def increment_by_one(an_array): |
| 97 | +     pos = numba.cuda.grid(1)  # absolute index of this thread in the 1D grid |
| 98 | +     if pos < an_array.size:  # threads past the end of the array do nothing |
| 99 | +         an_array[pos] += 1 |
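To make the role of the bounds check concrete, here is a host-only emulation in plain Python. It is only a sketch: a sequential loop stands in for the GPU threads, and ``increment_by_one_emulated`` is a hypothetical name, not part of Numba or Arrow.

```python
def increment_by_one_emulated(an_array, n_threads):
    """Emulate the kernel on the host: one loop iteration per GPU thread."""
    for pos in range(n_threads):
        # Same guard as in the kernel: surplus threads must not write
        # past the end of the array.
        if pos < len(an_array):
            an_array[pos] += 1
    return an_array


# More "threads" than elements: only the first four positions are touched.
data = increment_by_one_emulated([0, 1, 2, 3], n_threads=256)
```

Without the guard, every position from 4 upwards would be an out-of-bounds access.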
| 100 | + |
| 101 | +Then we need to wrap our CUDA buffer into a Numba "device array" with the right |
| 102 | +array metadata (shape, strides and datatype). This is necessary so that Numba |
| 103 | +can identify the array's characteristics and compile the kernel with the |
| 104 | +appropriate type declarations. |
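A buffer by itself is only a run of bytes, and the same bytes decode differently depending on the datatype assumed. The following host-only sketch illustrates this with the standard ``struct`` module. It does not use the Arrow or Numba APIs, and the variable names are purely illustrative.

```python
import struct

# Pack four int32 values into 16 raw little-endian bytes, mimicking the
# untyped contents of a buffer.
raw = struct.pack("<4i", 0, 1, 2, 3)

# Read as four int32 values, the original data comes back.
as_int32 = struct.unpack("<4i", raw)

# Read as two int64 values, the very same bytes mean something else.
as_int64 = struct.unpack("<2q", raw)
```

This is why the datatype (along with shape and strides) has to be supplied alongside the device memory.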
| 105 | + |
| 106 | +In this case the metadata can simply be taken from the original NumPy array. |
| 107 | +Note that the GPU data isn't copied, just pointed to:: |
| 108 | + |
| 109 | + >>> from numba.cuda.cudadrv.devicearray import DeviceNDArray |
| 110 | + >>> device_arr = DeviceNDArray(arr.shape, arr.strides, arr.dtype, gpu_data=cuda_buf.to_numba()) |
| 111 | + |
| 112 | +Ideally, we could define an Arrow array in CPU memory, copy it to CUDA |
| 113 | +memory without losing type information, and then invoke the Numba kernel on it |
| 114 | +without constructing the DeviceNDArray by hand. This is not yet possible. |
| 115 | + |
| 116 | +Finally we can run the Numba CUDA kernel on the Numba device array (here |
| 117 | +launched with 16 blocks of 16 threads each):: |
| 118 | + |
| 119 | + >>> increment_by_one[16, 16](device_arr) |
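The two values in the square brackets are the number of blocks and the number of threads per block, so this launch starts 256 threads for an array of 4 elements; the bounds check in the kernel keeps the surplus threads idle. As a rough sketch, the launch could instead be sized from the array length. ``launch_config`` below is a hypothetical plain-Python helper, not a Numba or Arrow function, and the default of 16 threads per block is an arbitrary choice.

```python
def launch_config(n_elements, threads_per_block=16):
    """Return (blocks_per_grid, threads_per_block) covering n_elements."""
    # Ceiling division: enough blocks that blocks * threads >= n_elements.
    blocks_per_grid = (n_elements + threads_per_block - 1) // threads_per_block
    # Always launch at least one block, even for an empty array.
    return max(blocks_per_grid, 1), threads_per_block


blocks, threads = launch_config(4)
```

With such a helper, the launch would read ``increment_by_one[blocks, threads](device_arr)``.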
| 120 | + |
| 121 | +And the results can be checked by copying back the CUDA buffer to CPU memory:: |
| 122 | + |
| 123 | + >>> np.frombuffer(cuda_buf.copy_to_host(), dtype=np.int32) |
| 124 | + array([1, 2, 3, 4], dtype=int32) |
| 125 | + |
| 126 | +Numba to Arrow |
| 127 | +~~~~~~~~~~~~~~ |
| 128 | + |
| 129 | +Conversely, a Numba-created device array can be viewed as an Arrow CUDA buffer, |
| 130 | +using the :meth:`CudaBuffer.from_numba` factory method. |
| 131 | + |
| 132 | +For the sake of example, let's first create a Numba device array:: |
| 133 | + |
| 134 | + >>> arr = np.arange(10, 14, dtype=np.int32) |
| 135 | + >>> arr |
| 136 | + array([10, 11, 12, 13], dtype=int32) |
| 137 | + >>> device_arr = numba.cuda.to_device(arr) |
| 138 | + |
| 139 | +Then we can create a CUDA buffer pointing to the device array's memory. |
| 140 | +We don't need to pass a CUDA context explicitly this time: the appropriate |
| 141 | +CUDA context is automatically retrieved and adapted from the Numba object. |
| 142 | + |
| 143 | +:: |
| 144 | + |
| 145 | + >>> cuda_buf = cuda.CudaBuffer.from_numba(device_arr.gpu_data) |
| 146 | + >>> cuda_buf.size |
| 147 | + 16 |
| 148 | + >>> cuda_buf.address |
| 149 | + 30088364032 |
| 150 | + >>> cuda_buf.context.device_number |
| 151 | + 0 |
| 152 | + |
| 153 | +Of course, we can copy the CUDA buffer back to host memory:: |
| 154 | + |
| 155 | + >>> np.frombuffer(cuda_buf.copy_to_host(), dtype=np.int32) |
| 156 | + array([10, 11, 12, 13], dtype=int32) |
| 157 | + |
| 158 | +.. seealso:: |
| 159 | + Documentation for Numba's `CUDA support <https://numba.pydata.org/numba-doc/latest/cuda/index.html>`_. |