@@ -133,100 +133,135 @@ declare module 'vscode' {
133133 */
134134 export interface SemanticTokensProvider {
135135 /**
136- * A file can contain many tokens, perhaps even hundreds of thousands tokens. Therefore, to improve
137- * the memory consumption around describing semantic tokens, we have decided to avoid allocating objects
138- * and we have decided to represent tokens from a file as an array of integers.
136+ * A file can contain many tokens, perhaps even hundreds of thousands of tokens. Therefore, to improve
137+ * the memory consumption around describing semantic tokens, we have decided to avoid allocating an object
138+ * for each token and we represent tokens from a file as an array of integers. Furthermore, the position
139+ * of each token is expressed relative to the token before it because most tokens remain stable relative to
140+ * each other when edits are made in a file.
139141 *
140142 *
141- * In short, each token takes 5 integers to represent, so a specific token i in the file consists of the following fields:
143+ * ---
144+ * In short, each token takes 5 integers to represent, so a specific token `i` in the file consists of the following fields:
142145 * - at index `5*i` - `deltaLine`: token line number, relative to the previous token
143146 * - at index `5*i+1` - `deltaStart`: token start character, relative to the previous token (relative to 0 or the previous token's start if they are on the same line)
144147 * - at index `5*i+2` - `length`: the length of the token. A token cannot be multiline.
145148 * - at index `5*i+3` - `tokenType`: will be looked up in `SemanticTokensLegend.tokenTypes`
146149 * - at index `5*i+4` - `tokenModifiers`: each set bit will be looked up in `SemanticTokensLegend.tokenModifiers`
147150 *
148151 *
152+ *
153+ * ---
154+ * ### How to encode tokens
155+ *
149156 * Here is an example for encoding a file with 3 tokens:
150157 * ```
151- * [ { line: 2, startChar: 5, length: 3, tokenType: "properties", tokenModifiers: ["private", "static"] },
158+ * { line: 2, startChar: 5, length: 3, tokenType: "properties", tokenModifiers: ["private", "static"] },
152159 * { line: 2, startChar: 10, length: 4, tokenType: "types", tokenModifiers: [] },
153- * { line: 5, startChar: 2, length: 7, tokenType: "classes", tokenModifiers: [] } ]
160+ * { line: 5, startChar: 2, length: 7, tokenType: "classes", tokenModifiers: [] }
154161 * ```
155162 *
156163 * 1. First of all, a legend must be devised. This legend must be provided up-front and capture all possible token types.
157- * For this example, we will choose the following legend which is passed in when registering the provider:
164+ * For this example, we will choose the following legend which must be passed in when registering the provider:
158165 * ```
159- * { tokenTypes: ['', 'properties', 'types', 'classes'],
160- * tokenModifiers: ['', ' private', 'static'] }
166+ * tokenTypes: ['properties', 'types', 'classes'],
167+ * tokenModifiers: ['private', 'static']
161168 * ```
162169 *
163- * 2. The first transformation is to encode `tokenType` and `tokenModifiers` as integers using the legend. Token types are looked
170+ * 2. The first transformation step is to encode `tokenType` and `tokenModifiers` as integers using the legend. Token types are looked
164171 * up by index, so a `tokenType` value of `1` means `tokenTypes[1]`. Multiple token modifiers can be set by using bit flags,
165- * so a `tokenModifier` value of `6` is first viewed as binary `0b110`, which means `[tokenModifiers[1], tokenModifiers[2]]` because
166- * bits 1 and 2 are set. Using this legend, the tokens now are:
172+ * so a `tokenModifier` value of `3` is first viewed as binary `0b00000011`, which means `[tokenModifiers[0], tokenModifiers[1]]` because
173+ * bits 0 and 1 are set. Using this legend, the tokens now are:
167174 * ```
168- * [ { line: 2, startChar: 5, length: 3, tokenType: 1, tokenModifiers: 6 }, // 6 is 0b110
169- * { line: 2, startChar: 10, length: 4, tokenType: 2, tokenModifiers: 0 },
170- * { line: 5, startChar: 2, length: 7, tokenType: 3, tokenModifiers: 0 } ]
175+ * { line: 2, startChar: 5, length: 3, tokenType: 0, tokenModifiers: 3 },
176+ * { line: 2, startChar: 10, length: 4, tokenType: 1, tokenModifiers: 0 },
177+ * { line: 5, startChar: 2, length: 7, tokenType: 2, tokenModifiers: 0 }
171178 * ```
172179 *
173- * 3. Then, we will encode each token relative to the previous token in the file:
180+ * The next step is to encode each token relative to the previous token in the file. In this case, the second token
181+ * is on the same line as the first token, so its `startChar` is made relative to the `startChar` of the first
182+ * token and becomes `10 - 5 = 5`. The third token is on a different line than the second token, so the
183+ * `startChar` of the third token is not altered:
174184 * ```
175- * [ { deltaLine: 2, deltaStartChar: 5, length: 3, tokenType: 1, tokenModifiers: 6 },
176- * // this token is on the same line as the first one, so the startChar is made relative
177- * { deltaLine: 0, deltaStartChar: 5, length: 4, tokenType: 2, tokenModifiers: 0 },
178- * // this token is on a different line than the second one, so the startChar remains unchanged
179- * { deltaLine: 3, deltaStartChar: 2, length: 7, tokenType: 3, tokenModifiers: 0 } ]
185+ * { deltaLine: 2, deltaStartChar: 5, length: 3, tokenType: 0, tokenModifiers: 3 },
186+ * { deltaLine: 0, deltaStartChar: 5, length: 4, tokenType: 1, tokenModifiers: 0 },
187+ * { deltaLine: 3, deltaStartChar: 2, length: 7, tokenType: 2, tokenModifiers: 0 }
180188 * ```
181189 *
182- * 4. Finally, the integers are organized in a single array, which is a memory friendly representation:
190+ * Finally, the 5 fields of each token are inlined into a single array, which is a memory-friendly representation:
183191 * ```
184192 * // 1st token, 2nd token, 3rd token
185- * [ 2,5,3,1,6, 0,5,4,2,0, 3,2,7,3,0 ]
193+ * [ 2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0 ]
186194 * ```
187195 *
188- * In principle, each call to `provideSemanticTokens` expects a complete representations of the semantic tokens.
189- * It is possible to simply return all the tokens at each call.
190196 *
191- * But oftentimes, a small edit in the file will result in a small change to the above delta-based represented tokens.
192- * (In fact, that is why the above tokens are delta-encoded relative to their corresponding previous tokens).
193- * In such a case, if VS Code passes in the previous result id, it is possible for an advanced tokenization provider
194- * to return a delta to the integers array.
195197 *
196- * To continue with the previous example, suppose a new line has been pressed at the beginning of the file, such that
197- * all the tokens are now one line lower, and that a new token has appeared since the last result on line 4.
198- * For example, the tokens might look like:
198+ * ---
199+ * ### How tokens change when the document changes
200+ *
201+ * Let's look at how tokens might change.
202+ *
203+ * Continuing with the above example, suppose a new line was inserted at the top of the file.
204+ * That would make all the tokens move down by one line (notice how the line has changed for each one):
199205 * ```
200- * [ { line: 3, startChar: 5, length: 3, tokenType: "properties", tokenModifiers: ["private", "static"] },
206+ * { line: 3, startChar: 5, length: 3, tokenType: "properties", tokenModifiers: ["private", "static"] },
201207 * { line: 3, startChar: 10, length: 4, tokenType: "types", tokenModifiers: [] },
202- * { line: 4, startChar: 3, length: 5, tokenType: "properties", tokenModifiers: ["static"] },
203- * { line: 6, startChar: 2, length: 7, tokenType: "classes", tokenModifiers: [] } ]
208+ * { line: 6, startChar: 2, length: 7, tokenType: "classes", tokenModifiers: [] }
204209 * ```
205- *
206- * The integer encoding of all new tokens would be:
210+ * Because positions are delta-encoded, the integer encoding of the tokens barely changes (only the first integer differs):
207211 * ```
208- * [ 3,5,3,1,6, 0,5,4,2,0, 1,3,5,1,2, 2,2,7,3,0 ]
212+ * // 1st token, 2nd token, 3rd token
213+ * [ 3,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0 ]
209214 * ```
215+ * It is possible to express these new tokens in terms of an edit applied to the previous tokens:
216+ * ```
217+ * [ 2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0 ]
218+ * [ 3,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0 ]
210219 *
211- * A smart tokens provider can return a `resultId` to `SemanticTokens`. Then, if the editor still has in memory the previous
212- * result, the editor will pass in options the previous result id at `SemanticTokensRequestOptions.previousResultId`. Only when
213- * the editor passes in the previous result id, it is safe and smart for a smart tokens provider can compute a diff from the
214- * previous result to the new result.
215- *
216- * *NOTE*: It is illegal to return `SemanticTokensEdits` if `options.previousResultId` is not set!
220+ * edit: { start: 0, deleteCount: 1, data: [3] } // replace integer at offset 0 with 3
221+ * ```
217222 *
223+ * Furthermore, let's assume that a new token has appeared on line 4:
224+ * ```
225+ * { line: 3, startChar: 5, length: 3, tokenType: "properties", tokenModifiers: ["private", "static"] },
226+ * { line: 3, startChar: 10, length: 4, tokenType: "types", tokenModifiers: [] },
227+ * { line: 4, startChar: 3, length: 5, tokenType: "properties", tokenModifiers: ["static"] },
228+ * { line: 6, startChar: 2, length: 7, tokenType: "classes", tokenModifiers: [] }
229+ * ```
230+ * The integer encoding of the tokens is:
218231 * ```
219- * [ 2,5,3,1,6, 0,5,4,2,0, 3,2,7,3,0 ]
220- * [ 3,5,3,1,6, 0,5,4,2,0, 1,3,5,1,2, 2,2,7,3,0 ]
232+ * // 1st token, 2nd token, 3rd token, 4th token
233+ * [ 3,5,3,0,3, 0,5,4,1,0, 1,3,5,0,2, 2,2,7,2,0 ]
221234 * ```
222- * and return as simple integer edits the diff:
235+ * Again, it is possible to express these new tokens in terms of an edit applied to the previous tokens:
223236 * ```
224- * { edits: [
225- * { start: 0, deleteCount: 1, data: [3] } // replace integer at offset 0 with 3
226- * { start: 10, deleteCount: 1, data: [1,3,5,1,2,2] } // replace integer at offset 10 with [1,3,5,1,2,2]
227- * ]}
237+ * [ 3,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0 ]
238+ * [ 3,5,3,0,3, 0,5,4,1,0, 1,3,5,0,2, 2,2,7,2,0 ]
239+ *
240+ * edit: { start: 10, deleteCount: 1, data: [1,3,5,0,2,2] } // replace integer at offset 10 with [1,3,5,0,2,2]
228241 * ```
229- * All indices expressed in the returned diff represent indices in the old result array, so they all refer to the previous result state.
242+ *
243+ *
244+ *
245+ * ---
246+ * ### When to return `SemanticTokensEdits`
247+ *
248+ * As the user edits a document, several edits may happen before VS Code decides to invoke the semantic tokens provider.
249+ * In principle, each call to `provideSemanticTokens` can return a full representation of the semantic tokens, and that would
250+ * be a perfectly reasonable semantic tokens provider implementation.
251+ *
252+ * However, when a language server runs in a separate process, transferring all the tokens between processes
253+ * might be slow, so VS Code allows returning the new tokens expressed in terms of multiple edits applied to the previous
254+ * tokens.
255+ *
256+ * To clearly define what "previous tokens" means, it is possible to return a `resultId` with the semantic tokens. If the
257+ * editor still has the previous result in memory, it will pass the previous `resultId` in
258+ * `SemanticTokensRequestOptions.previousResultId`. Only when the editor passes in the previous `resultId` is a semantic
259+ * tokens provider allowed to return the new tokens expressed as edits to be applied to the previous result. Even in this
260+ * case, the semantic tokens provider needs to return a new `resultId` that will identify these new tokens as a basis
261+ * for the next request.
262+ *
263+ * *NOTE 1*: It is illegal to return `SemanticTokensEdits` if `options.previousResultId` is not set.
264+ * *NOTE 2*: All edits in `SemanticTokensEdits` use indices into the old integer array, so they all refer to the previous result state.
230265 */
231266 provideSemanticTokens(document: TextDocument, options: SemanticTokensRequestOptions, token: CancellationToken): ProviderResult<SemanticTokens | SemanticTokensEdits>;
232267 }
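The four encoding steps described in the doc comment above can be condensed into one function. The sketch below is illustrative only: `Token`, `TokenLegend` and `encodeTokens` are hypothetical names chosen for this example, not part of the `vscode` API. It assumes zero-based `line`/`startChar` values and at most 31 modifiers, so that `1 << bit` fits in a 32-bit integer.

```typescript
// Hypothetical shapes that mirror the examples in the comment; these are not the vscode API types.
interface TokenLegend {
  tokenTypes: string[];
  tokenModifiers: string[];
}

interface Token {
  line: number;             // zero-based line of the token
  startChar: number;        // start character on that line
  length: number;           // a token cannot span multiple lines
  tokenType: string;        // must appear in legend.tokenTypes
  tokenModifiers: string[]; // each must appear in legend.tokenModifiers
}

function encodeTokens(tokens: Token[], legend: TokenLegend): number[] {
  // Deltas only make sense if tokens are visited in document order.
  const sorted = [...tokens].sort((a, b) => a.line - b.line || a.startChar - b.startChar);
  const data: number[] = [];
  let prevLine = 0;
  let prevStart = 0;
  for (const t of sorted) {
    // Step 2: the token type becomes its index in the legend.
    const typeIndex = legend.tokenTypes.indexOf(t.tokenType);
    if (typeIndex < 0) throw new Error(`Unknown token type: ${t.tokenType}`);
    // Step 2: each modifier sets the bit matching its index in the legend.
    let modifierBits = 0;
    for (const m of t.tokenModifiers) {
      const bit = legend.tokenModifiers.indexOf(m);
      if (bit < 0) throw new Error(`Unknown token modifier: ${m}`);
      modifierBits |= 1 << bit;
    }
    // Step 3: positions become relative to the previous token.
    const deltaLine = t.line - prevLine;
    const deltaStart = deltaLine === 0 ? t.startChar - prevStart : t.startChar;
    // Step 4: the five fields are appended to one flat array.
    data.push(deltaLine, deltaStart, t.length, typeIndex, modifierBits);
    prevLine = t.line;
    prevStart = t.startChar;
  }
  return data;
}

// The legend and the three tokens from the example in the comment.
const legend: TokenLegend = {
  tokenTypes: ['properties', 'types', 'classes'],
  tokenModifiers: ['private', 'static'],
};
const encoded = encodeTokens([
  { line: 2, startChar: 5, length: 3, tokenType: 'properties', tokenModifiers: ['private', 'static'] },
  { line: 2, startChar: 10, length: 4, tokenType: 'types', tokenModifiers: [] },
  { line: 5, startChar: 2, length: 7, tokenType: 'classes', tokenModifiers: [] },
], legend);
console.log(JSON.stringify(encoded)); // [2,5,3,0,3,0,5,4,1,0,3,2,7,2,0]
```

Sorting first is a defensive choice: the delta encoding only works if every token is compared with the one that precedes it in the file.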
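Going the other way illustrates the field layout given at the start of the comment (`deltaLine`, `deltaStart`, `length`, `tokenType` and `tokenModifiers` at indices `5*i` to `5*i+4`). This is a rough decoding sketch with hypothetical names (`decodeTokens`, `DecodedToken`, `TokenLegend`) that are not part of the `vscode` API; it assumes the same legend as the example.

```typescript
// Hypothetical legend shape, mirroring the example legend in the comment.
interface TokenLegend {
  tokenTypes: string[];
  tokenModifiers: string[];
}

interface DecodedToken {
  line: number;
  startChar: number;
  length: number;
  tokenType: string;
  tokenModifiers: string[];
}

function decodeTokens(data: readonly number[], legend: TokenLegend): DecodedToken[] {
  if (data.length % 5 !== 0) throw new Error('Token data length must be a multiple of 5');
  const tokens: DecodedToken[] = [];
  let line = 0;
  let startChar = 0;
  for (let i = 0; i < data.length; i += 5) {
    const deltaLine = data[i];
    const deltaStart = data[i + 1];
    const length = data[i + 2];
    const typeIndex = data[i + 3];
    const modifierBits = data[i + 4];
    line += deltaLine;
    // deltaStart is relative to the previous start only when both tokens share a line.
    startChar = deltaLine === 0 ? startChar + deltaStart : deltaStart;
    // Bit k set means legend.tokenModifiers[k] applies to this token.
    const tokenModifiers = legend.tokenModifiers.filter((_, k) => (modifierBits & (1 << k)) !== 0);
    tokens.push({ line, startChar, length, tokenType: legend.tokenTypes[typeIndex], tokenModifiers });
  }
  return tokens;
}

const legend: TokenLegend = {
  tokenTypes: ['properties', 'types', 'classes'],
  tokenModifiers: ['private', 'static'],
};
// The four-token array from the comment, after the new token appeared on line 4.
const decoded = decodeTokens([3,5,3,0,3, 0,5,4,1,0, 1,3,5,0,2, 2,2,7,2,0], legend);
console.log(JSON.stringify(decoded[2]));
// {"line":4,"startChar":3,"length":5,"tokenType":"properties","tokenModifiers":["static"]}
```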
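NOTE 2 in the comment says that every index in `SemanticTokensEdits` refers to the previous result. One way to honour that when applying several edits, sketched under the assumption that the edits never overlap, is to splice them in from the highest `start` downwards, so that an earlier splice never shifts the positions a later one relies on. `IntEdit` and `applyEdits` are hypothetical names; only the `{ start, deleteCount, data }` shape follows the examples in the comment. Because index 10 holds the same value (`3`) in the original array and in the line-shifted array, the two edits from the comment can also be applied together against the original three-token array:

```typescript
// Hypothetical edit shape following the `{ start, deleteCount, data }` examples in the comment.
interface IntEdit {
  start: number;       // index into the OLD integer array
  deleteCount: number; // number of old integers to remove at `start`
  data?: number[];     // integers inserted in their place
}

// Assumes the edits do not overlap; every index refers to the previous result.
function applyEdits(previous: readonly number[], edits: readonly IntEdit[]): number[] {
  const result = previous.slice();
  // Splicing from the highest start downwards keeps the lower old indices valid.
  const ordered = [...edits].sort((a, b) => b.start - a.start);
  for (const e of ordered) {
    result.splice(e.start, e.deleteCount, ...(e.data ?? []));
  }
  return result;
}

// Both edits from the comment, expressed against the original three-token array.
const original = [2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0];
const updated = applyEdits(original, [
  { start: 0, deleteCount: 1, data: [3] },
  { start: 10, deleteCount: 1, data: [1,3,5,0,2,2] },
]);
console.log(JSON.stringify(updated)); // [3,5,3,0,3,0,5,4,1,0,1,3,5,0,2,2,2,7,2,0]
```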
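A provider that receives `options.previousResultId` still has to produce edits like the ones in the examples. A very simple strategy, shown here as an assumption rather than as what VS Code or any particular language server does, is to skip the common prefix and the common suffix of the old and new integer arrays and replace everything in between with a single edit. `computeSingleEdit` and `IntEdit` are hypothetical names.

```typescript
// Hypothetical edit shape following the `{ start, deleteCount, data }` examples in the comment.
interface IntEdit {
  start: number;
  deleteCount: number;
  data: number[];
}

// Returns one edit that turns oldData into newData, or undefined if they are equal.
// Strategy: skip the common prefix and the common suffix, replace whatever lies in between.
function computeSingleEdit(oldData: readonly number[], newData: readonly number[]): IntEdit | undefined {
  const minLength = Math.min(oldData.length, newData.length);
  let prefix = 0;
  while (prefix < minLength && oldData[prefix] === newData[prefix]) {
    prefix++;
  }
  if (prefix === oldData.length && prefix === newData.length) {
    return undefined; // nothing changed
  }
  // The suffix may not overlap the prefix in either array.
  let suffix = 0;
  while (
    suffix < minLength - prefix &&
    oldData[oldData.length - 1 - suffix] === newData[newData.length - 1 - suffix]
  ) {
    suffix++;
  }
  return {
    start: prefix,
    deleteCount: oldData.length - prefix - suffix,
    data: newData.slice(prefix, newData.length - suffix),
  };
}

// The two transitions from the comment.
const before = [2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0];
const shifted = [3,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0];
const withNewToken = [3,5,3,0,3, 0,5,4,1,0, 1,3,5,0,2, 2,2,7,2,0];
console.log(JSON.stringify(computeSingleEdit(before, shifted)));       // {"start":0,"deleteCount":1,"data":[3]}
console.log(JSON.stringify(computeSingleEdit(shifted, withNewToken))); // {"start":10,"deleteCount":1,"data":[1,3,5,0,2,2]}
```

A single prefix/suffix edit is cheap to compute and already reproduces both edits from the comment; a provider that expects scattered changes could use a proper diff algorithm and return several smaller edits instead.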