<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Parallel Programming with Boost.SIMD</title>
<script src="libs/jquery-1.11.3/jquery.min.js"></script>
<script src="libs/jqueryui-1.11.4/jquery-ui.min.js"></script>
<link href="libs/tocify-1.9.1/jquery.tocify.css" rel="stylesheet" />
<script src="libs/tocify-1.9.1/jquery.tocify.js"></script>
<link href="libs/bootstrap-3.3.5/css/yeti.min.css" rel="stylesheet" />
<script src="libs/bootstrap-3.3.5/js/bootstrap.min.js"></script>
<script src="libs/bootstrap-3.3.5/shim/html5shiv.min.js"></script>
<script src="libs/bootstrap-3.3.5/shim/respond.min.js"></script>
<style type="text/css">code{white-space: pre;}</style>
<link rel="stylesheet"
href="libs/highlight/textmate.css"
type="text/css" />
<script src="libs/highlight/highlight.js"></script>
<style type="text/css">
pre:not([class]) {
background-color: white;
}
</style>
<script type="text/javascript">
if (window.hljs && document.readyState && document.readyState === "complete") {
window.setTimeout(function() {
hljs.initHighlighting();
}, 0);
}
</script>
<link rel="stylesheet" href="styles.css" type="text/css" />
</head>
<body>
<style type = "text/css">
.main-container {
max-width: 940px;
margin-left: auto;
margin-right: auto;
}
code {
color: inherit;
background-color: rgba(0, 0, 0, 0.04);
}
img {
max-width:100%;
height: auto;
}
h1 {
font-size: 34px;
}
h1.title {
font-size: 38px;
}
h2 {
font-size: 30px;
}
h3 {
font-size: 24px;
}
h4 {
font-size: 18px;
}
h5 {
font-size: 16px;
}
h6 {
font-size: 12px;
}
</style>
<div class="container-fluid main-container">
<script>
$(function() {
// establish options
var options = {
selectors: "h1,h2,h3",
theme: "bootstrap3",
context: '.toc-content',
hashGenerator: function (text) {
return text.replace(/[.\/?&!#<>]/g, '').replace(/\s/g, '_').toLowerCase();
},
ignoreSelector: "h1.title",
scrollTo: 0
};
options.showAndHide = false;
options.smoothScroll = true;
// tocify
var toc = $("#TOC").tocify(options).data("toc-tocify");
});
</script>
<style type="text/css">
#TOC {
margin: 25px 0px 20px 0px;
}
@media (max-width: 768px) {
#TOC {
position: relative;
width: 100%;
}
}
.toc-content {
padding-left: 30px;
padding-right: 40px;
}
div.main-container {
max-width: 1200px;
}
div.tocify {
width: 20%;
max-width: 260px;
max-height: 85%;
}
@media (min-width: 768px) and (max-width: 991px) {
div.tocify {
width: 25%;
}
}
.tocify ul, .tocify li {
line-height: 20px;
}
.tocify-subheader .tocify-item {
font-size: 0.9em;
padding-left: 5px;
}
.tocify .list-group-item {
border-radius: 0px;
}
.tocify-subheader {
display: inline;
}
.tocify-subheader .tocify-item {
font-size: 0.95em;
padding-left: 10px;
}
</style>
<!-- setup 3col/9col grid for toc_float and main content -->
<div class="row-fluid">
<div class="col-sm-4 col-md-3">
<div id="TOC" class="tocify">
</div>
</div>
<div class="toc-content col-sm-8 col-md-9">
<div class="navbar navbar-default navbar-inverse navbar-fixed-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="/">Rcpp Parallel</a>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<li><a href="/">Home</a></li>
<li><a href="/tbb.html">Intel TBB</a></li>
<li><a href="/simd.html">Boost.SIMD</a></li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li><a href="https://github.com/RcppCore/RcppParallel">GitHub</a></li>
</ul>
</div><!--/.nav-collapse -->
</div><!--/.container -->
</div><!--/.navbar -->
<script>
// manage active state of menu based on current page
$(document).ready(function () {
// active menu
href = window.location.pathname
href = href.substr(href.lastIndexOf('/'))
$('a[href="' + href + '"]').parent().addClass('active');
});
</script>
<div id="header">
<h1 class="title">Parallel Programming with Boost.SIMD</h1>
</div>
<p><strong>IMPORTANT NOTE</strong>: Support for Boost.SIMD is currently only available in the development version of RcppParallel. You can install the development version as follows:</p>
<pre class="r"><code>devtools::install_github("RcppCore/RcppParallel")</code></pre>
<div id="introduction" class="section level2">
<h2>Introduction</h2>
<p>Modern CPUs provide extended instruction sets that accelerate certain kinds of operations. One class of these, Single Instruction, Multiple Data (SIMD) instructions, applies the same operation to several values at once, which enables vectorized code. Although modern compilers use these instructions when they can, they often cannot determine whether a particular block of code can be executed using SIMD instructions.</p>
<p><code>Boost.SIMD</code> [<a href="https://meetingcpp.com/tl_files/mcpp/slides/12/simd.pdf">PDF</a>] is a C++ header-only library that makes it possible to explicitly request the use of SIMD instructions when possible, while falling back to regular scalar operations when not. <a href="http://rcppcore.github.io/RcppParallel/"><code>RcppParallel</code></a> wraps and exposes this library for use with <code>R</code> vectors.</p>
<p>The primary abstraction that <code>Boost.SIMD</code> uses under the hood is the <code>boost::simd::pack<></code> data structure. It represents a small, contiguous pack of arithmetic values (e.g. <code>double</code>s), and comes with a host of functions that use SIMD operations on those values when possible. Although you don’t need to know these details to use the high-level functionality provided by <code>Boost.SIMD</code>, they help explain what happens behind the scenes.</p>
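<p>To make the idea of a pack concrete, here is a minimal sketch in standard C++ that does not use <code>Boost.SIMD</code> at all. <code>toy_pack</code> is a hypothetical stand-in for <code>boost::simd::pack<></code>: it stores a fixed number of values contiguously and applies <code>+</code> and <code>*</code> lane by lane. A real pack would map each such operation onto a single SIMD instruction, whereas this sketch simply loops over the lanes. The templated addition functor is modelled on the <code>add_two</code> functor used in the following example.</p>
<pre class="cpp"><code>#include <array>
#include <cstddef>

// Hypothetical stand-in for boost::simd::pack<T>: N values stored
// contiguously, with arithmetic applied lane by lane.
template <typename T, std::size_t N = 4>
struct toy_pack {
  std::array<T, N> lanes;

  // Lane-wise addition: out.lanes[i] = a.lanes[i] + b.lanes[i].
  friend toy_pack operator+(const toy_pack& a, const toy_pack& b) {
    toy_pack out{};
    for (std::size_t i = 0; i < N; ++i) out.lanes[i] = a.lanes[i] + b.lanes[i];
    return out;
  }

  // Lane-wise multiplication: out.lanes[i] = a.lanes[i] * b.lanes[i].
  friend toy_pack operator*(const toy_pack& a, const toy_pack& b) {
    toy_pack out{};
    for (std::size_t i = 0; i < N; ++i) out.lanes[i] = a.lanes[i] * b.lanes[i];
    return out;
  }
};

// A functor with a templated call operator: the same source compiles for
// T = double and for T = toy_pack<double>, since both provide operator+.
struct add_two {
  template <typename T>
  T operator()(const T& lhs, const T& rhs) const {
    return lhs + rhs;
  }
};</code></pre>
<p>For example, <code>add_two()(a, a)</code> on a pack <code>a</code> holding 1, 2, 3, 4 yields a pack holding 2, 4, 6, 8, while <code>add_two()(1.5, 2.5)</code> yields 4. One functor definition serves both cases, which is why templated functors fit SIMD code well.</p>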
<p>Here’s a quick example of how we might compute the sum of the elements of a vector using <code>Boost.SIMD</code>.</p>
<pre class="cpp"><code>// [[Rcpp::depends(RcppParallel)]]
#define RCPP_PARALLEL_USE_SIMD
#include <RcppParallel.h>
using namespace RcppParallel;
#include <Rcpp.h>
using namespace Rcpp;
// Define a functor -- a C++ class which defines a templated
// 'function call' operator -- to perform the addition of
// two pieces of data.
struct add_two {
template <typename T>
T operator()(const T& lhs, const T& rhs) const {
return lhs + rhs;
}
};
// [[Rcpp::export]]
double simd_sum(NumericVector x) {
// Pass the functor to 'simdReduce()'.
return simdReduce(x.begin(), x.end(), 0.0, add_two());
}</code></pre>
<p>Behind the scenes, <code>simdReduce()</code> takes care of iterating over our sequence, using optimized SIMD instructions over packs of numbers when possible and scalar instructions when not. Because we pass a functor with a templated call operator, <code>simdReduce()</code> can automatically choose the right instantiation depending on whether it is working with a pack or with a single value. In other words, two instantiations of the call operator will be generated in this case: one with <code>T = double</code>, and another with <code>T = boost::simd::pack<double></code>.</p>
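<p>A minimal sketch of how such a reduction can be organized follows. It is an illustration under stated assumptions, not <code>RcppParallel</code>’s actual implementation: <code>reduce_sketch</code> is a hypothetical name, the pack width of 4 is arbitrary, and a <code>std::valarray<double></code> of that width stands in for <code>boost::simd::pack<double></code>. Whole packs are combined lane by lane, the lanes of the resulting pack are then folded into a scalar, and any leftover elements are handled one at a time, so the functor’s call operator is instantiated once for the pack type and once for <code>double</code>.</p>
<pre class="cpp"><code>#include <cstddef>
#include <valarray>

// The functor from the example above.
struct add_two {
  template <typename T>
  T operator()(const T& lhs, const T& rhs) const {
    return lhs + rhs;
  }
};

// Hypothetical sketch of a pack-then-scalar reduction (not RcppParallel's
// implementation). A std::valarray<double> of 'width' values stands in for
// boost::simd::pack<double>. Assumes f is associative and commutative,
// like addition, because the order of operations is changed.
template <typename F>
double reduce_sketch(const double* begin, const double* end, double init, F f) {
  const std::size_t width = 4;
  const std::size_t n = static_cast<std::size_t>(end - begin);
  const std::size_t n_packs = n / width;
  double result = init;

  if (n_packs > 0) {
    // 1. Combine whole packs lane by lane: T = std::valarray<double>.
    std::valarray<double> acc(begin, width);
    for (std::size_t i = 1; i < n_packs; ++i) {
      std::valarray<double> chunk(begin + i * width, width);
      acc = f(acc, chunk);
    }
    // 2. Fold the lanes of the accumulated pack into the result: T = double.
    for (std::size_t lane = 0; lane < width; ++lane) {
      result = f(result, acc[lane]);
    }
  }

  // 3. Handle leftover elements that do not fill a whole pack: T = double.
  for (const double* p = begin + n_packs * width; p != end; ++p) {
    result = f(result, *p);
  }
  return result;
}</code></pre>
<p>Because the additions are regrouped into lanes, a floating-point result can differ from a strictly sequential sum in the last few bits. This reordering is also one reason compilers will not vectorize such a loop on their own unless they are allowed to reassociate floating-point operations.</p>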
<p>Let’s confirm that this produces the correct output, and run a small benchmark.</p>
<pre class="r"><code># helper function for printing microbenchmark output
printBm <- function(bm) {
summary <- summary(bm)
print(summary[, 1:7], row.names = FALSE)
}
# generate some data
data <- rnorm(1024 * 1000)
# verify that it produces the correct sum
all.equal(simd_sum(data), sum(data))</code></pre>
<pre><code>## [1] TRUE</code></pre>
<pre class="r"><code># compare results
library(microbenchmark)
bm <- microbenchmark(sum(data), simd_sum(data))
printBm(bm)</code></pre>
<pre><code>##           expr     min       lq     mean   median       uq      max
##      sum(data) 824.013 836.9370 880.5446 870.0565 909.2475 1300.552
## simd_sum(data) 416.062 421.2825 456.2859 432.6560 481.2070  595.670</code></pre>
<p>We get a noticeable gain by taking advantage of SIMD instructions here (the median time is roughly halved), although it’s worth noting that <code>simd_sum()</code> doesn’t handle <code>NA</code> and <code>NaN</code> with the same granularity as <code>R</code>’s <code>sum()</code>.</p>
</div>
<div id="simd-algorithms" class="section level2">
<h2>SIMD Algorithms</h2>
<div id="built-in-algorithms" class="section level3">
<h3>Built-In Algorithms</h3>
<p><code>Boost.SIMD</code> provides two primary abstractions for the implementation of SIMD algorithms:</p>
<table>
<thead>
<tr class="header">
<th align="left">Algorithm</th>
<th align="left">Transformation</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left"><code>boost::simd::transform()</code></td>
<td align="left"><code>vector</code> -> <code>vector</code></td>
</tr>
<tr class="even">
<td align="left"><code>boost::simd::accumulate()</code></td>
<td align="left"><code>vector</code> -> <code>scalar</code></td>
</tr>
</tbody>
</table>
<p>These functions operate like their <code>std::</code> counterparts, but expect a functor with a templated call operator. By making the call operator templated, <code>Boost.SIMD</code> can generate code using its own optimized SIMD functions when appropriate, and fall back to a default implementation (based on the types provided) when not.</p>
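<p>As a point of comparison, the sketch below uses the standard <code>std::transform()</code> and <code>std::accumulate()</code> with functors written in the templated style described above, to compute a sum of squares. With the <code>std::</code> algorithms the call operators are only ever instantiated with <code>T = double</code>; the templated style is what additionally lets <code>Boost.SIMD</code> instantiate them with a pack type. The names <code>square</code>, <code>add_two</code> and <code>sum_of_squares</code> are illustrative.</p>
<pre class="cpp"><code>#include <algorithm>
#include <numeric>
#include <vector>

// Functors with templated call operators, in the style Boost.SIMD expects.
struct square {
  template <typename T>
  T operator()(const T& x) const {
    return x * x;
  }
};

struct add_two {
  template <typename T>
  T operator()(const T& lhs, const T& rhs) const {
    return lhs + rhs;
  }
};

// Sum of squares as a transform (vector -> vector) followed by an
// accumulate (vector -> scalar). With the std:: algorithms both functors
// are instantiated only with T = double.
double sum_of_squares(const std::vector<double>& x) {
  std::vector<double> squares(x.size());
  std::transform(x.begin(), x.end(), squares.begin(), square());
  return std::accumulate(squares.begin(), squares.end(), 0.0, add_two());
}</code></pre>
<p>For the input 1, 2, 3 this returns 14.</p>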
<p><code>RcppParallel</code> augments this with its own algorithms as well, for consistency with <code>parallelFor()</code> and <code>parallelReduce()</code>:</p>
<table>
<thead>
<tr class="header">
<th align="left">Algorithm</th>
<th align="left">Transformation</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left"><code>RcppParallel::simdTransform()</code></td>
<td align="left"><code>vector</code> -> <code>vector</code></td>
</tr>
<tr class="even">
<td align="left"><code>RcppParallel::simdReduce()</code></td>
<td align="left"><code>vector</code> -> <code>scalar</code></td>
</tr>
<tr class="odd">
<td align="left"><code>RcppParallel::simdFor()</code></td>
<td align="left"><code>vector</code> -> <code>any</code></td>
</tr>
</tbody>
</table>
<p><code>simdFor()</code> is particularly useful when neither <code>transform()</code> nor <code>accumulate()</code> seems to be a good fit.</p>
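<p>The idea behind a <code>vector</code> -> <code>any</code> algorithm can be sketched without <code>Boost.SIMD</code>. The code below is a hypothetical illustration, not the signature or implementation of <code>RcppParallel::simdFor()</code>: <code>for_each_pack_sketch</code> and <code>moments</code> are illustrative names, the pack width of 4 is arbitrary, and <code>std::valarray<double></code> stands in for <code>boost::simd::pack<double></code>. The driver hands each whole pack, and then each leftover element, to a stateful functor, which here collects a sum and a sum of squares in a single pass: a two-part result that does not map directly onto one <code>transform()</code> or <code>accumulate()</code> call.</p>
<pre class="cpp"><code>#include <cstddef>
#include <valarray>

// Hypothetical stateful functor: accumulates a sum and a sum of squares.
// One overload receives a whole pack (a std::valarray<double> stand-in for
// boost::simd::pack<double>); the other receives a single leftover value.
struct moments {
  double sum = 0.0;
  double sum_sq = 0.0;

  void operator()(const std::valarray<double>& pack) {
    std::valarray<double> squares = pack * pack;  // lane-wise multiply
    sum += pack.sum();                            // fold lanes into a scalar
    sum_sq += squares.sum();
  }

  void operator()(double x) {
    sum += x;
    sum_sq += x * x;
  }
};

// Hypothetical simdFor()-like driver: hands whole packs of 'width' values
// to f, then the remaining elements one by one, and returns f with its
// accumulated state (as std::for_each does).
template <typename F>
F for_each_pack_sketch(const double* begin, const double* end, F f) {
  const std::ptrdiff_t width = 4;
  const double* p = begin;
  for (; end - p >= width; p += width) {
    f(std::valarray<double>(p, static_cast<std::size_t>(width)));
  }
  for (; p != end; ++p) {
    f(*p);
  }
  return f;
}</code></pre>
<p>For the values 1 to 6, the returned functor holds a sum of 21 and a sum of squares of 91, from which a mean and variance could be derived. Using two overloads rather than one templated call operator is a simplification for this sketch; <code>Boost.SIMD</code> code would more typically call functions that already work on both packs and scalars.</p>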
</div>
<div id="custom-algorithms" class="section level3">
<h3>Custom Algorithms</h3>
<p>To take advantage of <code>Boost.SIMD</code> in your own code, try the following steps:</p>
<ol style="list-style-type: decimal">
<li>Decompose your problem into separate, vectorizable pieces,</li>
<li>Select an appropriate algorithm provided by <code>RcppParallel</code>,</li>
<li>Write templated functors in a <code>Boost.SIMD</code>-aware way.</li>
</ol>
<p><code>Boost.SIMD</code> provides a large number of functions that have optimized specializations for packed data structures, while falling back to regular operations for scalar data. These functions are typically found in the <code>boost::simd</code> namespace. To illustrate, here’s an example of a functor that computes the square of its input, using the <code>boost::simd::sqr()</code> function:</p>
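<pre class="cpp"><code>// [[Rcpp::depends(RcppParallel)]]
#define RCPP_PARALLEL_USE_SIMD
#include <RcppParallel.h>
// A struct (members are public by default) whose templated call operator
// returns the square of its argument, whether a scalar or a pack.
struct simd_square {
  template <typename T>
  T operator()(const T& data) const {
    return boost::simd::sqr(data);
  }
};</code></pre>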
<p>A reference guide to the other functions <code>Boost.SIMD</code> provides is available <a href="http://nt2.metascale.fr/doc/html/boost_simd_functions_and_operators/reference.html">here</a>.</p>
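<p>The split between packed and scalar versions of a function can be pictured as ordinary overload resolution. In the sketch below, <code>toy::sqr</code> is a hypothetical stand-in for a function like <code>boost::simd::sqr()</code>, with one overload for scalars and one for packs (represented by <code>std::valarray<double></code>); each instantiation of the templated call operator picks the matching overload. How <code>Boost.SIMD</code> actually implements its dispatch is not shown here.</p>
<pre class="cpp"><code>#include <valarray>

// Hypothetical stand-ins for a function like boost::simd::sqr(): one
// overload for scalars and one for packs (std::valarray<double> here).
namespace toy {
inline double sqr(double x) { return x * x; }

inline std::valarray<double> sqr(const std::valarray<double>& pack) {
  return pack * pack;  // lane-wise; a real pack would use a SIMD multiply
}
}  // namespace toy

// Same shape as the simd_square functor above, calling the stand-in.
// Each instantiation of the call operator selects a sqr() overload.
struct simd_square {
  template <typename T>
  T operator()(const T& data) const {
    return toy::sqr(data);
  }
};</code></pre>
<p>With this stand-in, <code>simd_square()(3.0)</code> returns 9, and applying it to a <code>std::valarray<double></code> holding 1, 2, 3, 4 returns one holding 1, 4, 9, 16.</p>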
</div>
</div>
<div id="using-simd-in-an-r-package" class="section level2">
<h2>Using SIMD in an R Package</h2>
<p><strong>IMPORTANT NOTE</strong>: Support for Boost.SIMD is currently only available in the development version of RcppParallel. Therefore, packages using Boost.SIMD should not yet be submitted to CRAN.</p>
<div id="package-configuration" class="section level3">
<h3>Package Configuration</h3>
<p>To build an R package that uses <code>Boost.SIMD</code>, you need to make some modifications to the standard <code>RcppParallel</code> configuration. Within the <code>DESCRIPTION</code> file of your package, you need to:</p>
<ol style="list-style-type: decimal">
<li>Add the <a href="https://cran.r-project.org/package=BH"><strong>BH</strong></a> package as a <code>LinkingTo</code> dependency, and</li>
<li>Add <code>C++11</code> to the <code>SystemRequirements</code> field.</li>
</ol>
<p>For example:</p>
<pre class="yaml"><code>Imports: RcppParallel
LinkingTo: RcppParallel, BH
SystemRequirements: GNU make, C++11</code></pre>
</div>
<div id="platform-compatibility" class="section level3">
<h3>Platform Compatibility</h3>
<p><code>Boost.SIMD</code> requires a C++11 conformant compiler. This means that packages making use of SIMD features may not compile on platforms with older compilers, including Windows and RedHat/CentOS Linux. You can, however, create a package that takes advantage of <code>Boost.SIMD</code> where available and falls back to a non-SIMD implementation otherwise.</p>
<p>You can opt in to the use of <code>Boost.SIMD</code> by defining the <code>RCPP_PARALLEL_USE_SIMD</code> macro before including <code><RcppParallel.h></code>, e.g.</p>
<pre class="cpp"><code>#define RCPP_PARALLEL_USE_SIMD
#include <RcppParallel.h></code></pre>
<p>You can test for the availability of <code>Boost.SIMD</code> on a given platform using the <code>RCPP_PARALLEL_USE_SIMD</code> preprocessor macro. If the current compiler doesn’t support C++11 (as determined by <code>__cplusplus <= 199711L</code>), the macro will be undefined, even if you defined it explicitly. This allows you to write code like this:</p>
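<pre class="cpp"><code>// [[Rcpp::depends(RcppParallel)]]
#define RCPP_PARALLEL_USE_SIMD
#include <RcppParallel.h>
#include <Rcpp.h>
using namespace Rcpp;
// The macro stays defined only where Boost.SIMD is usable, so test
// whether it is defined rather than its (possibly empty) value.
#ifdef RCPP_PARALLEL_USE_SIMD
IntegerVector transformDataImpl(IntegerVector x) {
  // Implement with Boost.SIMD
  return x;  // placeholder
}
#else
IntegerVector transformDataImpl(IntegerVector x) {
  // Implement without Boost.SIMD
  return x;  // placeholder
}
#endif
// [[Rcpp::export]]
IntegerVector transformData(IntegerVector x) {
  return transformDataImpl(x);
}</code></pre>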
<p>The two <code>transformDataImpl</code> functions have the same name, but only one of them is compiled, depending on whether the target platform supports <code>Boost.SIMD</code>.</p>
<p>Note that if you conditionally compile all uses of <code>Boost.SIMD</code> within your package, you can drop <code>C++11</code> from <code>SystemRequirements</code>, since your fallback implementation means the package no longer requires it.</p>
</div>
</div>
<div id="learning-more" class="section level2">
<h2>Learning More</h2>
<p>If you want to dive deeper into <code>Boost.SIMD</code>, you can <a href="http://nt2.metascale.fr/doc/html/boost_simd.html">read the online documentation</a>, and also browse the examples <a href="https://github.com/RcppCore/RcppParallel/tree/master/inst/examples/boost-simd">here</a>.</p>
<hr />
<p>If you want to try out <code>Boost.SIMD</code> yourself, please install the development version of <a href="http://rcppcore.github.io/RcppParallel/"><code>RcppParallel</code></a> with <code>devtools::install_github("RcppCore/RcppParallel")</code>.</p>
</div>
</div>
</div>
</div>
<script>
// add bootstrap table styles to pandoc tables
$(document).ready(function () {
$('tr.header').parent('thead').parent('table').addClass('table table-condensed');
});
</script>
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
var script = document.createElement("script");
script.type = "text/javascript";
script.src = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
document.getElementsByTagName("head")[0].appendChild(script);
})();
</script>
</body>
</html>