@@ -129,6 +129,49 @@ Algorithms
    Rprop
    SGD

Many of our algorithms have various implementations optimized for performance,
readability and/or generality, so we attempt to default to the generally fastest
implementation for the current device if no particular implementation has been
specified by the user.

We have 3 major categories of implementations: for-loop, foreach (multi-tensor), and
fused. The most straightforward implementations are for-loops over the parameters with
big chunks of computation. For-looping is usually slower than our foreach
implementations, which combine parameters into a multi-tensor and run the big chunks
of computation all at once, thereby saving many sequential kernel calls. A few of our
optimizers have even faster fused implementations, which fuse the big chunks of
computation into one kernel. We can think of foreach implementations as fusing
horizontally and fused implementations as fusing vertically on top of that.
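
As a rough sketch of the distinction (not pulled from the actual optimizer code; the
tensor lists and learning rate below are made up for illustration), a plain SGD-style
update in the two styles might look like this, using ``torch._foreach_add_``, one of
the multi-tensor ops the foreach implementations are built on::

    import torch

    # Made-up parameter and gradient lists standing in for a model's tensors.
    params = [torch.randn(1024) for _ in range(10)]
    grads = [torch.randn(1024) for _ in range(10)]
    lr = 0.01

    # for-loop style: one small kernel launch per parameter.
    for p, g in zip(params, grads):
        p.add_(g, alpha=-lr)

    # foreach style: a single multi-tensor call covers the whole list at once,
    # avoiding the sequential per-parameter launches above.
    torch._foreach_add_(params, grads, alpha=-lr)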

In general, the performance ordering of the 3 implementations is fused > foreach > for-loop.
So when applicable, we default to foreach over for-loop. Applicable means the foreach
implementation is available, the user has not specified any implementation-specific kwargs
(e.g., fused, foreach, differentiable), and all tensors are native and on CUDA. Note that
while fused should be even faster than foreach, the implementations are newer and we would
like to give them more bake-in time before flipping the switch everywhere. You are welcome
to try them out though!
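
For instance, a minimal sketch of opting in explicitly (the tiny model and learning rate
here are made up; ``foreach`` and ``fused`` are the implementation-specific kwargs
mentioned above)::

    import torch

    model = torch.nn.Linear(8, 2)

    # Explicitly request the foreach (multi-tensor) implementation.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, foreach=True)

    # Or try the fused implementation, which at the time of writing expects
    # floating-point parameters on CUDA.
    if torch.cuda.is_available():
        cuda_model = torch.nn.Linear(8, 2).cuda()
        optimizer = torch.optim.Adam(cuda_model.parameters(), lr=1e-3, fused=True)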

Below is a table showing the available and default implementations of each algorithm:

.. csv-table::
    :header: "Algorithm", "Default", "Has foreach?", "Has fused?"
    :widths: 25, 25, 25, 25
    :delim: ;

    :class:`Adadelta`;foreach;yes;no
    :class:`Adagrad`;foreach;yes;no
    :class:`Adam`;foreach;yes;yes
    :class:`AdamW`;foreach;yes;yes
    :class:`SparseAdam`;for-loop;no;no
    :class:`Adamax`;foreach;yes;no
    :class:`ASGD`;foreach;yes;no
    :class:`LBFGS`;for-loop;no;no
    :class:`NAdam`;foreach;yes;no
    :class:`RAdam`;foreach;yes;no
    :class:`RMSprop`;foreach;yes;no
    :class:`Rprop`;foreach;yes;no
    :class:`SGD`;foreach;yes;no

How to adjust learning rate
---------------------------