
Commit bcfacaa

ARROW-3233: [Python] Add prose documentation for CUDA support
It will be harder to add generated API docs without requiring CUDA support on the machine building the docs.

Author: Antoine Pitrou <antoine@python.org>

Closes apache#3359 from pitrou/ARROW-3233-pyarrow-cuda-doc and squashes the following commits:

40b63f0 <Antoine Pitrou> ARROW-3233: Add prose documentation for CUDA support
1 parent 361285d commit bcfacaa

3 files changed

Lines changed: 163 additions & 0 deletions

File tree

docs/source/python/cuda.rst

Lines changed: 159 additions & 0 deletions
@@ -0,0 +1,159 @@
1+
.. Licensed to the Apache Software Foundation (ASF) under one
2+
.. or more contributor license agreements. See the NOTICE file
3+
.. distributed with this work for additional information
4+
.. regarding copyright ownership. The ASF licenses this file
5+
.. to you under the Apache License, Version 2.0 (the
6+
.. "License"); you may not use this file except in compliance
7+
.. with the License. You may obtain a copy of the License at
8+
9+
.. http://www.apache.org/licenses/LICENSE-2.0
10+
11+
.. Unless required by applicable law or agreed to in writing,
12+
.. software distributed under the License is distributed on an
13+
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
14+
.. KIND, either express or implied. See the License for the
15+
.. specific language governing permissions and limitations
16+
.. under the License.
17+
18+
.. currentmodule:: pyarrow.cuda
19+
20+
CUDA Integration
21+
================
22+
23+
Arrow is not limited to CPU buffers (located in the computer's main memory,
24+
also named "host memory"). It also has provisions for accessing buffers
25+
located on a CUDA-capable GPU device (in "device memory").
26+
27+
.. note::
28+
This functionality is optional and must have been enabled at build time.
29+
If this is not done by your package manager, you might have to build Arrow
30+
yourself.
31+
32+
CUDA Contexts
33+
-------------
34+
35+
A CUDA context represents access to a particular CUDA-capable device.
36+
For example, the following creates a CUDA context for CUDA device number 0::
37+
38+
>>> from pyarrow import cuda
39+
>>> ctx = cuda.Context(0)
40+
>>>
41+
42+
CUDA Buffers
43+
------------
44+
45+
A CUDA buffer can be created by copying data from host memory to the memory
46+
of a CUDA device, using the :meth:`Context.buffer_from_data` method.
47+
The source data can be any Python buffer-like object, including Arrow buffers::
48+
49+
>>> import numpy as np
50+
>>> arr = np.arange(4, dtype=np.int32)
51+
>>> arr.nbytes
52+
16
53+
>>> cuda_buf = ctx.buffer_from_data(arr)
54+
>>> type(cuda_buf)
55+
pyarrow._cuda.CudaBuffer
56+
>>> cuda_buf.size # The buffer's size in bytes
57+
16
58+
>>> cuda_buf.address # The buffer's address in device memory
59+
30088364544
60+
>>> cuda_buf.context.device_number
61+
0
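As a CPU-only illustration of what "buffer-like" is taken to mean here (an object exposing Python's buffer protocol), the sketch below checks a few host-side objects with ``memoryview``. The helper name ``is_buffer_like`` is hypothetical and is not part of pyarrow; the sketch does not touch the GPU.

```python
import array


def is_buffer_like(obj):
    """Return True if obj exposes the buffer protocol (hypothetical helper)."""
    try:
        memoryview(obj)
    except TypeError:
        return False
    return True


# Three host-side objects that all expose the buffer protocol.
candidates = {
    "bytes": b"\x00" * 16,
    "bytearray": bytearray(16),
    "array.array": array.array("B", range(16)),
}

# memoryview(...).nbytes reports how many bytes a copy would have to transfer.
sizes = {name: memoryview(obj).nbytes for name, obj in candidates.items()}
print(sizes)

# Plain lists and strings do not expose the buffer protocol.
print(is_buffer_like([1, 2, 3]), is_buffer_like("text"))
```

Each candidate reports 16 bytes, matching the size of the ``int32`` array used above.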
62+
63+
Conversely, you can copy a CUDA buffer back to host memory, getting a regular
64+
CPU buffer::
65+
66+
>>> buf = cuda_buf.copy_to_host()
67+
>>> type(buf)
68+
pyarrow.lib.Buffer
69+
>>> np.frombuffer(buf, dtype=np.int32)
70+
array([0, 1, 2, 3], dtype=int32)
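The ``dtype`` argument matters because a buffer is only a run of untyped bytes. The following host-only sketch uses ``ndarray.tobytes()`` as a stand-in for a buffer copied back from the device (an assumption made so that it runs without a GPU) and shows how the same 16 bytes are read with the right and the wrong element type.

```python
import numpy as np

arr = np.arange(4, dtype=np.int32)
raw = arr.tobytes()  # 16 untyped bytes, standing in for a host buffer

# Reinterpreting with the original dtype recovers the original values.
same = np.frombuffer(raw, dtype=np.int32)

# A different dtype reinterprets the same bytes as a different array.
wrong = np.frombuffer(raw, dtype=np.int16)

print(len(raw), same.tolist(), wrong.size)
```

With ``int16`` the same bytes yield eight elements instead of four, so the element type has to be tracked separately from the buffer.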
71+
72+
.. warning::
73+
Many Arrow functions expect a CPU buffer but will not check the buffer's
74+
actual type. You will get a crash if you pass a CUDA buffer to such a
75+
function::
76+
77+
>>> import pyarrow as pa
>>> pa.py_buffer(b"x" * 16).equals(cuda_buf)
78+
Segmentation fault
79+
80+
Numba Integration
81+
-----------------
82+
83+
There is not much you can do directly with Arrow CUDA buffers from Python,
84+
but they support interoperation with `Numba <https://numba.pydata.org/>`_,
85+
a JIT compiler which can turn Python code into optimized CUDA kernels.
86+
87+
Arrow to Numba
88+
~~~~~~~~~~~~~~
89+
90+
First let's define a Numba CUDA kernel operating on an ``int32`` array. Here,
91+
we will simply increment each array element (assuming the array is writable)::
92+
93+
import numba.cuda
94+
95+
@numba.cuda.jit
96+
def increment_by_one(an_array):
97+
pos = numba.cuda.grid(1)
98+
if pos < an_array.size:
99+
an_array[pos] += 1
100+
101+
Then we need to wrap our CUDA buffer into a Numba "device array" with the right
102+
array metadata (shape, strides and datatype). This is necessary so that Numba
103+
can identify the array's characteristics and compile the kernel with the
104+
appropriate type declarations.
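For reference, the three pieces of metadata mentioned here can be inspected on any NumPy array. This is a host-only sketch; the second, two-dimensional array is added purely for illustration and does not appear elsewhere in this document.

```python
import numpy as np

arr = np.arange(4, dtype=np.int32)

# shape: number of elements along each dimension
# strides: bytes to step to reach the next element along each dimension
# dtype: element type, which also fixes the element size
metadata = (arr.shape, arr.strides, arr.dtype)
print(metadata)

# Illustrative 2-D case: a row is 3 * 4 = 12 bytes long.
grid = np.zeros((2, 3), dtype=np.int32)
print(grid.shape, grid.strides, grid.dtype)
```

For the one-dimensional ``int32`` array, the stride equals the element size of 4 bytes.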
105+
106+
In this case the metadata can simply be taken from the original NumPy array.
107+
Note the GPU data isn't copied, just pointed to::
108+
109+
>>> from numba.cuda.cudadrv.devicearray import DeviceNDArray
110+
>>> device_arr = DeviceNDArray(arr.shape, arr.strides, arr.dtype, gpu_data=cuda_buf.to_numba())
111+
112+
(Ideally, we could have defined an Arrow array in CPU memory, copied it to CUDA
113+
memory without losing type information, and then invoked the Numba kernel on it
114+
without constructing the DeviceNDArray by hand; this is not yet possible.)
115+
116+
Finally we can run the Numba CUDA kernel on the Numba device array (here
117+
with a 16x16 grid size)::
118+
119+
>>> increment_by_one[16, 16](device_arr)
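In Numba's launch syntax the two bracketed values are the number of blocks and the number of threads per block, so ``[16, 16]`` starts 256 threads for a 4-element array. The pure-Python sketch below simulates that launch on the CPU to show why the kernel's bounds check is needed. The function names are hypothetical, and the loop only models the one-dimensional indexing, not real GPU execution.

```python
def blocks_needed(n_elements, threads_per_block):
    """Smallest block count whose threads cover n_elements (ceiling division)."""
    return (n_elements + threads_per_block - 1) // threads_per_block


def simulate_increment(values, blocks, threads_per_block):
    """CPU model of a 1-D launch: one loop iteration per GPU thread."""
    out = list(values)
    for pos in range(blocks * threads_per_block):
        # Same guard as in the kernel: surplus threads do nothing.
        if pos < len(out):
            out[pos] += 1
    return out


total_threads = 16 * 16
result = simulate_increment([0, 1, 2, 3], blocks=16, threads_per_block=16)
print(total_threads, result)
print(blocks_needed(4, 16), blocks_needed(1000, 256))
```

Without the ``pos < size`` guard, the 252 surplus threads would index past the end of the array. A single block of 16 threads would already cover the four elements.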
120+
121+
And the results can be checked by copying back the CUDA buffer to CPU memory::
122+
123+
>>> np.frombuffer(cuda_buf.copy_to_host(), dtype=np.int32)
124+
array([1, 2, 3, 4], dtype=int32)
125+
126+
Numba to Arrow
127+
~~~~~~~~~~~~~~
128+
129+
Conversely, a Numba-created device array can be viewed as an Arrow CUDA buffer,
130+
using the :meth:`CudaBuffer.from_numba` factory method.
131+
132+
For the sake of example, let's first create a Numba device array::
133+
134+
>>> arr = np.arange(10, 14, dtype=np.int32)
135+
>>> arr
136+
array([10, 11, 12, 13], dtype=int32)
137+
>>> device_arr = numba.cuda.to_device(arr)
138+
139+
Then we can create a CUDA buffer pointing to the device array's memory.
140+
We don't need to pass a CUDA context explicitly this time: the appropriate
141+
CUDA context is automatically retrieved and adapted from the Numba object.
142+
143+
::
144+
145+
>>> cuda_buf = cuda.CudaBuffer.from_numba(device_arr.gpu_data)
146+
>>> cuda_buf.size
147+
16
148+
>>> cuda_buf.address
149+
30088364032
150+
>>> cuda_buf.context.device_number
151+
0
152+
153+
Of course, we can copy the CUDA buffer back to host memory::
154+
155+
>>> np.frombuffer(cuda_buf.copy_to_host(), dtype=np.int32)
156+
array([10, 11, 12, 13], dtype=int32)
157+
158+
.. seealso::
159+
Documentation for Numba's `CUDA support <https://numba.pydata.org/numba-doc/latest/cuda/index.html>`_.

docs/source/python/index.rst

Lines changed: 1 addition & 0 deletions
@@ -43,6 +43,7 @@ files into Arrow structures.
4343
pandas
4444
csv
4545
parquet
46+
cuda
4647
extending
4748
api
4849
development

docs/source/python/memory.rst

Lines changed: 3 additions & 0 deletions
@@ -109,6 +109,9 @@ the buffer is garbaged-collected, all of the memory is freed:
109109
buf = None
110110
pa.total_allocated_bytes()
111111
112+
.. seealso::
113+
On-GPU buffers using Arrow's optional :doc:`CUDA integration <cuda>`.
114+
112115

113116
Input and Output
114117
================
