Commit 7155f24

Improve JSDoc
1 parent 687853e

1 file changed

Lines changed: 87 additions & 52 deletions

src/vs/vscode.proposed.d.ts

@@ -133,100 +133,135 @@ declare module 'vscode' {
  */
  export interface SemanticTokensProvider {
  /**
- * A file can contain many tokens, perhaps even hundreds of thousands tokens. Therefore, to improve
- * the memory consumption around describing semantic tokens, we have decided to avoid allocating objects
- * and we have decided to represent tokens from a file as an array of integers.
+ * A file can contain many tokens, perhaps even hundreds of thousands of tokens. Therefore, to improve
+ * the memory consumption around describing semantic tokens, we have decided to avoid allocating an object
+ * for each token and we represent tokens from a file as an array of integers. Furthermore, the position
+ * of each token is expressed relative to the token before it because most tokens remain stable relative to
+ * each other when edits are made in a file.
  *
  *
- * In short, each token takes 5 integers to represent, so a specific token i in the file consists of the following fields:
+ * ---
+ * In short, each token takes 5 integers to represent, so a specific token `i` in the file consists of the following fields:
  * - at index `5*i` - `deltaLine`: token line number, relative to the previous token
  * - at index `5*i+1` - `deltaStart`: token start character, relative to the previous token (relative to 0 or the previous token's start if they are on the same line)
  * - at index `5*i+2` - `length`: the length of the token. A token cannot be multiline.
  * - at index `5*i+3` - `tokenType`: will be looked up in `SemanticTokensLegend.tokenTypes`
  * - at index `5*i+4` - `tokenModifiers`: each set bit will be looked up in `SemanticTokensLegend.tokenModifiers`
  *
  *
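To make the five-field layout concrete, here is a minimal TypeScript sketch that reads the flat integer array back into absolute positions. `DecodedToken` and `decodeSemanticTokens` are hypothetical names used only for illustration; they are not part of the proposed `vscode` API.

```typescript
// Hypothetical helper type: one token with absolute (not delta) positions.
interface DecodedToken {
  line: number;           // absolute line number
  startChar: number;      // absolute start character on that line
  length: number;         // tokens cannot span multiple lines
  tokenType: number;      // index into SemanticTokensLegend.tokenTypes
  tokenModifiers: number; // bit set over SemanticTokensLegend.tokenModifiers
}

// Sketch: walk the array 5 integers at a time and undo the delta encoding.
function decodeSemanticTokens(data: readonly number[]): DecodedToken[] {
  if (data.length % 5 !== 0) {
    throw new Error("semantic token data must contain a multiple of 5 integers");
  }
  const tokens: DecodedToken[] = [];
  let line = 0;
  let startChar = 0;
  for (let i = 0; i < data.length; i += 5) {
    const deltaLine = data[i];
    const deltaStart = data[i + 1];
    line += deltaLine;
    // Same line: deltaStart is relative to the previous token's start.
    // New line: deltaStart is relative to column 0.
    startChar = deltaLine === 0 ? startChar + deltaStart : deltaStart;
    tokens.push({
      line,
      startChar,
      length: data[i + 2],
      tokenType: data[i + 3],
      tokenModifiers: data[i + 4],
    });
  }
  return tokens;
}
```

Fed the final array from the encoding example in this JSDoc (`[ 2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0 ]`), the sketch yields tokens starting at (line 2, char 5), (line 2, char 10) and (line 5, char 2).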
+ *
+ * ---
+ * ### How to encode tokens
+ *
  * Here is an example for encoding a file with 3 tokens:
  * ```
- * [ { line: 2, startChar: 5, length: 3, tokenType: "properties", tokenModifiers: ["private", "static"] },
+ * { line: 2, startChar: 5, length: 3, tokenType: "properties", tokenModifiers: ["private", "static"] },
  * { line: 2, startChar: 10, length: 4, tokenType: "types", tokenModifiers: [] },
- * { line: 5, startChar: 2, length: 7, tokenType: "classes", tokenModifiers: [] } ]
+ * { line: 5, startChar: 2, length: 7, tokenType: "classes", tokenModifiers: [] }
  * ```
  *
  * 1. First of all, a legend must be devised. This legend must be provided up-front and capture all possible token types.
- * For this example, we will choose the following legend which is passed in when registering the provider:
+ * For this example, we will choose the following legend which must be passed in when registering the provider:
  * ```
- * { tokenTypes: ['', 'properties', 'types', 'classes'],
- * tokenModifiers: ['', 'private', 'static'] }
+ * tokenTypes: ['properties', 'types', 'classes'],
+ * tokenModifiers: ['private', 'static']
  * ```
  *
- * 2. The first transformation is to encode `tokenType` and `tokenModifiers` as integers using the legend. Token types are looked
+ * 2. The first transformation step is to encode `tokenType` and `tokenModifiers` as integers using the legend. Token types are looked
  * up by index, so a `tokenType` value of `1` means `tokenTypes[1]`. Multiple token modifiers can be set by using bit flags,
- * so a `tokenModifier` value of `6` is first viewed as binary `0b110`, which means `[tokenModifiers[1], tokenModifiers[2]]` because
- * bits 1 and 2 are set. Using this legend, the tokens now are:
+ * so a `tokenModifier` value of `3` is first viewed as binary `0b00000011`, which means `[tokenModifiers[0], tokenModifiers[1]]` because
+ * bits 0 and 1 are set. Using this legend, the tokens now are:
  * ```
- * [ { line: 2, startChar: 5, length: 3, tokenType: 1, tokenModifiers: 6 }, // 6 is 0b110
- * { line: 2, startChar: 10, length: 4, tokenType: 2, tokenModifiers: 0 },
- * { line: 5, startChar: 2, length: 7, tokenType: 3, tokenModifiers: 0 } ]
+ * { line: 2, startChar: 5, length: 3, tokenType: 0, tokenModifiers: 3 },
+ * { line: 2, startChar: 10, length: 4, tokenType: 1, tokenModifiers: 0 },
+ * { line: 5, startChar: 2, length: 7, tokenType: 2, tokenModifiers: 0 }
  * ```
  *
- * 3. Then, we will encode each token relative to the previous token in the file:
+ * 3. The next step is to encode each token relative to the previous token in the file. In this case, the second token
+ * is on the same line as the first token, so the `startChar` of the second token is made relative to the `startChar`
+ * of the first token, so it will be `10 - 5`. The third token is on a different line than the second token, so the
+ * `startChar` of the third token will not be altered:
  * ```
- * [ { deltaLine: 2, deltaStartChar: 5, length: 3, tokenType: 1, tokenModifiers: 6 },
- * // this token is on the same line as the first one, so the startChar is made relative
- * { deltaLine: 0, deltaStartChar: 5, length: 4, tokenType: 2, tokenModifiers: 0 },
- * // this token is on a different line than the second one, so the startChar remains unchanged
- * { deltaLine: 3, deltaStartChar: 2, length: 7, tokenType: 3, tokenModifiers: 0 } ]
+ * { deltaLine: 2, deltaStartChar: 5, length: 3, tokenType: 0, tokenModifiers: 3 },
+ * { deltaLine: 0, deltaStartChar: 5, length: 4, tokenType: 1, tokenModifiers: 0 },
+ * { deltaLine: 3, deltaStartChar: 2, length: 7, tokenType: 2, tokenModifiers: 0 }
  * ```
  *
- * 4. Finally, the integers are organized in a single array, which is a memory friendly representation:
+ * 4. Finally, the last step is to inline each of the 5 fields for a token in a single array, which is a memory-friendly representation:
  * ```
  * // 1st token, 2nd token, 3rd token
- * [ 2,5,3,1,6, 0,5,4,2,0, 3,2,7,3,0 ]
+ * [ 2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0 ]
  * ```
  *
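The four encoding steps can be sketched as a single function: look up the token type index, build the modifier bit mask, delta-encode the position, and push the five integers. This is a minimal illustration, not the VS Code implementation; `Legend`, `NamedToken` and `encodeSemanticTokens` are hypothetical names, and `Legend` only mirrors the `tokenTypes`/`tokenModifiers` shape of the example legend.

```typescript
// Hypothetical stand-in for the legend passed when registering the provider.
interface Legend {
  tokenTypes: string[];
  tokenModifiers: string[];
}

// Hypothetical input shape, matching the first example (names, absolute positions).
interface NamedToken {
  line: number;
  startChar: number;
  length: number;
  tokenType: string;
  tokenModifiers: string[];
}

function encodeSemanticTokens(tokens: readonly NamedToken[], legend: Legend): number[] {
  // Delta encoding only makes sense in document order, so sort defensively.
  const sorted = [...tokens].sort((a, b) => a.line - b.line || a.startChar - b.startChar);
  const data: number[] = [];
  let prevLine = 0;
  let prevStart = 0;
  for (const token of sorted) {
    // Step 2a: the token type is an index into legend.tokenTypes.
    const tokenType = legend.tokenTypes.indexOf(token.tokenType);
    if (tokenType < 0) throw new Error(`unknown token type: ${token.tokenType}`);
    // Step 2b: the modifier at legend index k sets bit k.
    // (Assumes at most 31 modifiers so the mask stays a non-negative 32-bit value.)
    let modifierBits = 0;
    for (const modifier of token.tokenModifiers) {
      const k = legend.tokenModifiers.indexOf(modifier);
      if (k < 0) throw new Error(`unknown token modifier: ${modifier}`);
      modifierBits |= 1 << k;
    }
    // Step 3: line relative to the previous token; start relative to the
    // previous token's start on the same line, or to column 0 on a new line.
    const deltaLine = token.line - prevLine;
    const deltaStart = deltaLine === 0 ? token.startChar - prevStart : token.startChar;
    // Step 4: inline the 5 fields into one flat array.
    data.push(deltaLine, deltaStart, token.length, tokenType, modifierBits);
    prevLine = token.line;
    prevStart = token.startChar;
  }
  return data;
}
```

With the example legend and the three example tokens, the sketch returns `[2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0]`, matching step 4. A provider that already emits tokens in document order could skip the sort.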
- * In principle, each call to `provideSemanticTokens` expects a complete representations of the semantic tokens.
- * It is possible to simply return all the tokens at each call.
  *
- * But oftentimes, a small edit in the file will result in a small change to the above delta-based represented tokens.
- * (In fact, that is why the above tokens are delta-encoded relative to their corresponding previous tokens).
- * In such a case, if VS Code passes in the previous result id, it is possible for an advanced tokenization provider
- * to return a delta to the integers array.
  *
- * To continue with the previous example, suppose a new line has been pressed at the beginning of the file, such that
- * all the tokens are now one line lower, and that a new token has appeared since the last result on line 4.
- * For example, the tokens might look like:
+ * ---
+ * ### How tokens change when the document changes
+ *
+ * Let's look at how tokens might change.
+ *
+ * Continuing with the above example, suppose a new line was inserted at the top of the file.
+ * That would make all the tokens move down by one line (notice how the line has changed for each one):
  * ```
- * [ { line: 3, startChar: 5, length: 3, tokenType: "properties", tokenModifiers: ["private", "static"] },
+ * { line: 3, startChar: 5, length: 3, tokenType: "properties", tokenModifiers: ["private", "static"] },
  * { line: 3, startChar: 10, length: 4, tokenType: "types", tokenModifiers: [] },
- * { line: 4, startChar: 3, length: 5, tokenType: "properties", tokenModifiers: ["static"] },
- * { line: 6, startChar: 2, length: 7, tokenType: "classes", tokenModifiers: [] } ]
+ * { line: 6, startChar: 2, length: 7, tokenType: "classes", tokenModifiers: [] }
  * ```
- *
- * The integer encoding of all new tokens would be:
+ * The integer encoding of the tokens does not change substantially because of the delta-encoding of positions:
  * ```
- * [ 3,5,3,1,6, 0,5,4,2,0, 1,3,5,1,2, 2,2,7,3,0 ]
+ * // 1st token, 2nd token, 3rd token
+ * [ 3,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0 ]
  * ```
+ * It is possible to express these new tokens in terms of an edit applied to the previous tokens:
+ * ```
+ * [ 2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0 ]
+ * [ 3,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0 ]
  *
- * A smart tokens provider can return a `resultId` to `SemanticTokens`. Then, if the editor still has in memory the previous
- * result, the editor will pass in options the previous result id at `SemanticTokensRequestOptions.previousResultId`. Only when
- * the editor passes in the previous result id, it is safe and smart for a smart tokens provider can compute a diff from the
- * previous result to the new result.
- *
- * *NOTE*: It is illegal to return `SemanticTokensEdits` if `options.previousResultId` is not set!
+ * edit: { start: 0, deleteCount: 1, data: [3] } // replace integer at offset 0 with 3
+ * ```
  *
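The `edit:` notation in these examples can be read as a splice on the previous integer array. A minimal sketch under that reading, where `IntEdit` and `applyEdits` are hypothetical names modelled on the `{ start, deleteCount, data }` objects shown here rather than on a declaration from this file. Since this JSDoc states that all edit offsets refer to the previous result, the sketch applies edits from the highest offset down so that lower offsets are not shifted, and it assumes the edits do not overlap.

```typescript
// Hypothetical local type modelled on the edit notation in the examples.
interface IntEdit {
  start: number;       // offset in the *previous* integer array
  deleteCount: number; // how many integers to remove at `start`
  data?: number[];     // integers to insert at `start`
}

function applyEdits(previous: readonly number[], edits: readonly IntEdit[]): number[] {
  const result = [...previous]; // leave the previous result untouched
  // Apply from the highest offset down: splicing at a high offset never moves
  // the lower offsets, so every edit can keep using old-array indices.
  const ordered = [...edits].sort((a, b) => b.start - a.start);
  for (const edit of ordered) {
    result.splice(edit.start, edit.deleteCount, ...(edit.data ?? []));
  }
  return result;
}
```

Applying `{ start: 0, deleteCount: 1, data: [3] }` to `[ 2,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0 ]` produces `[ 3,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0 ]`, the encoding after the inserted line. For very large `data` payloads, a loop-based copy would avoid engine limits on spread arguments.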
+ * Furthermore, let's assume that a new token has appeared on line 4:
+ * ```
+ * { line: 3, startChar: 5, length: 3, tokenType: "properties", tokenModifiers: ["private", "static"] },
+ * { line: 3, startChar: 10, length: 4, tokenType: "types", tokenModifiers: [] },
+ * { line: 4, startChar: 3, length: 5, tokenType: "properties", tokenModifiers: ["static"] },
+ * { line: 6, startChar: 2, length: 7, tokenType: "classes", tokenModifiers: [] }
+ * ```
+ * The integer encoding of the tokens is:
  * ```
- * [ 2,5,3,1,6, 0,5,4,2,0, 3,2,7,3,0 ]
- * [ 3,5,3,1,6, 0,5,4,2,0, 1,3,5,1,2, 2,2,7,3,0 ]
+ * // 1st token, 2nd token, 3rd token, 4th token
+ * [ 3,5,3,0,3, 0,5,4,1,0, 1,3,5,0,2, 2,2,7,2,0, ]
  * ```
- * and return as simple integer edits the diff:
+ * Again, it is possible to express these new tokens in terms of an edit applied to the previous tokens:
  * ```
- * { edits: [
- * { start: 0, deleteCount: 1, data: [3] } // replace integer at offset 0 with 3
- * { start: 10, deleteCount: 1, data: [1,3,5,1,2,2] } // replace integer at offset 10 with [1,3,5,1,2,2]
- * ]}
+ * [ 3,5,3,0,3, 0,5,4,1,0, 3,2,7,2,0 ]
+ * [ 3,5,3,0,3, 0,5,4,1,0, 1,3,5,0,2, 2,2,7,2,0, ]
+ *
+ * edit: { start: 10, deleteCount: 1, data: [1,3,5,0,2,2] } // replace integer at offset 10 with [1,3,5,0,2,2]
  * ```
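One simple way to produce edits like the two above is to strip the longest common prefix and common suffix of the old and new arrays and replace whatever remains in the middle. This is an assumption for illustration, not necessarily how VS Code or any particular language server computes edits; `IntEdit` and `computeSingleEdit` are hypothetical names.

```typescript
// Hypothetical local type modelled on the edit notation in the examples.
interface IntEdit {
  start: number;
  deleteCount: number;
  data: number[];
}

// Returns one edit turning `previous` into `next`, or undefined if they are equal.
function computeSingleEdit(previous: readonly number[], next: readonly number[]): IntEdit | undefined {
  const minLength = Math.min(previous.length, next.length);
  let prefix = 0;
  while (prefix < minLength && previous[prefix] === next[prefix]) {
    prefix++;
  }
  if (prefix === previous.length && prefix === next.length) {
    return undefined; // identical arrays: nothing to send
  }
  let suffix = 0;
  // The common suffix must not overlap the common prefix in either array.
  while (
    suffix < minLength - prefix &&
    previous[previous.length - 1 - suffix] === next[next.length - 1 - suffix]
  ) {
    suffix++;
  }
  return {
    start: prefix, // an offset into the previous array
    deleteCount: previous.length - prefix - suffix,
    data: next.slice(prefix, next.length - suffix),
  };
}
```

For both examples above the sketch reproduces the edits shown: `{ start: 0, deleteCount: 1, data: [3] }` and `{ start: 10, deleteCount: 1, data: [1,3,5,0,2,2] }`. It runs in linear time, but when changes are scattered across a file a single middle edit can be much larger than needed; this JSDoc also allows expressing the new tokens as multiple edits.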
- * All indices expressed in the returned diff represent indices in the old result array, so they all refer to the previous result state.
+ *
+ *
+ *
+ * ---
+ * ### When to return `SemanticTokensEdits`
+ *
+ * While a document is being edited, multiple edits may occur before VS Code decides to invoke the semantic tokens provider.
+ * In principle, each call to `provideSemanticTokens` can return a full representation of the semantic tokens, and that would
+ * be a perfectly reasonable semantic tokens provider implementation.
+ *
+ * However, when a language server runs in a separate process, transferring all the tokens between processes
+ * might be slow, so VS Code allows returning the new tokens expressed in terms of multiple edits applied to the previous
+ * tokens.
+ *
+ * To clearly define what "previous tokens" means, it is possible to return a `resultId` with the semantic tokens. If the
+ * editor still has the previous result in memory, it will pass the previous `resultId` in
+ * `SemanticTokensRequestOptions.previousResultId`. Only when the editor passes in the previous `resultId` is it allowed
+ * for a semantic tokens provider to return the new tokens expressed as edits to be applied to the previous result. Even in this
+ * case, the semantic tokens provider needs to return a new `resultId` that will identify these new tokens as a basis
+ * for the next request.
+ *
+ * *NOTE 1*: It is illegal to return `SemanticTokensEdits` if `options.previousResultId` is not set.
+ * *NOTE 2*: All edits in `SemanticTokensEdits` contain indices in the old integers array, so they all refer to the previous result state.
  */
  provideSemanticTokens(document: TextDocument, options: SemanticTokensRequestOptions, token: CancellationToken): ProviderResult<SemanticTokens | SemanticTokensEdits>;
  }
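The `resultId` rules in this JSDoc can be modelled as a small piece of provider-side state: hand out a fresh `resultId` on every response, and answer with edits only when the editor's `previousResultId` names the result the provider still remembers. The sketch below uses local stand-in types (`IntEdit`, `FullResult`, `EditsResult`) instead of the real `SemanticTokens`/`SemanticTokensEdits` declarations, whose members are not shown in this diff, and takes the diffing strategy as a parameter; `SemanticTokensResultCache` is a hypothetical name.

```typescript
// Local stand-ins; NOT the vscode.d.ts declarations.
interface IntEdit {
  start: number;
  deleteCount: number;
  data: number[];
}
interface FullResult {
  resultId: string;
  data: number[]; // complete token array
}
interface EditsResult {
  resultId: string;
  edits: IntEdit[]; // offsets refer to the previous result's array
}
type DiffFn = (previous: readonly number[], next: readonly number[]) => IntEdit[];

class SemanticTokensResultCache {
  private readonly diff: DiffFn;
  private counter = 0;
  // Simplifying assumption: only the most recent result is remembered.
  private lastResultId: string | undefined = undefined;
  private lastData: number[] = [];

  constructor(diff: DiffFn) {
    this.diff = diff;
  }

  respond(newData: number[], previousResultId: string | undefined): FullResult | EditsResult {
    // A new resultId is returned in every case so it can serve as the basis
    // for the next request.
    const resultId = `result-${++this.counter}`;
    // Edits are only legal when the editor sent a previousResultId, and only
    // meaningful when it names the result this cache still holds.
    const canSendEdits = previousResultId !== undefined && previousResultId === this.lastResultId;
    const previousData = this.lastData;
    this.lastResultId = resultId;
    this.lastData = newData;
    if (canSendEdits) {
      return { resultId, edits: this.diff(previousData, newData) };
    }
    return { resultId, data: newData };
  }
}
```

Falling back to full tokens whenever the ids do not match is always legal, so forgetting older results only costs bandwidth, never correctness. Remembering just one result is an assumption that keeps the sketch small.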
