### 5. Python example, doing tokenization and hyphenation of a text
Since the hyphenation APIs take one word at a time, with a limit of 300 Unicode characters per word, we need to break the text into words first and then run hyphenation on each token.
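A minimal sketch of that flow, assuming the `blingfire` pip package's `text_to_words` and `text_to_hyphenated_words_with_model` helpers and one of the hyphenation models from this repository (`laser100k.bin` here; adjust the model path for your setup):

```python
import os
from blingfire import load_model, free_model, text_to_words, text_to_hyphenated_words_with_model

# load a hyphenation model; laser100k.bin is one of the models from this repository
h = load_model(os.path.join(".", "laser100k.bin"))

s = "Like Curiosity, the Perseverance rover was built by engineers and scientists."

# break the text into tokens first, since hyphenation works one word at a time
# (each word must be at most 300 Unicode characters)
words = text_to_words(s).split(" ")

# hyphenate each token and glue the results back together
print(" ".join(text_to_hyphenated_words_with_model(h, w) for w in words))
# example output: Li-ke Cu-rios-i-ty , the Per-se-ve-rance ro-ver was built by en-gi-neers and sci...

free_model(h)
```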
Note that you can specify any other Unicode character as the hyphen that the API inserts into the output string.
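For instance, a sketch assuming the helper takes the hyphen as a `uHy` code-point argument (check the signature in your installed version of the wrapper):

```python
from blingfire import load_model, free_model, text_to_hyphenated_words_with_model

h = load_model("laser100k.bin")
# uHy is assumed to be the code point of the hyphen to insert, e.g. U+00B7 (middle dot)
print(text_to_hyphenated_words_with_model(h, "Curiosity", uHy=0x00B7))  # Cu·rios·i·ty
free_model(h)
```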
### 6. C# example, calling XLM Roberta tokenizer and getting ids and offsets
Note that everything supported in Python is supported by the C# API as well. The C# API also lends itself to parallel computation: since all models and functions are stateless, you can share the same model across threads without locks. Let's load the XLM Roberta model and tokenize a string, and for each token get its ID and its offsets in the original text.
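The C# calls themselves are in the test project linked below; as a rough sketch of the same stateless-sharing idea via the Python bindings (`xlm_roberta_base.bin` ships inside the `blingfire` package; `text_to_ids` covers the ids half of the example, while the offsets variant is shown in the C# tests):

```python
import os
from concurrent.futures import ThreadPoolExecutor

import blingfire
from blingfire import load_model, free_model, text_to_ids

# the XLM Roberta tokenization model ships inside the blingfire package
h = load_model(os.path.join(os.path.dirname(blingfire.__file__), "xlm_roberta_base.bin"))

texts = [
    "Autophobia, also called monophobia, is the specific phobia of isolation.",
    "Like Curiosity, the Perseverance rover was built by engineers and scientists.",
]

# models are stateless, so one handle can safely be shared across threads without locks
with ThreadPoolExecutor(max_workers=2) as pool:
    for ids in pool.map(lambda t: text_to_ids(h, t, 128, 0), texts):
        print(ids[:8])  # first few token ids; output is padded/truncated to max_len=128

free_model(h)
```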
See this project for more C# examples: https://github.com/microsoft/BlingFire/tree/master/nuget/test .
### 7. JavaScript example, fetching and loading model file, using the model to compute ids
The goal of the JavaScript integration is the ability to run the code in a browser together with ML frameworks such as TensorFlow.js and the FastText WebAssembly build.
Full example code can be found [here](https://github.com/microsoft/BlingFire/blob/master/wasm/example.html). Details of the API are described in the [wasm](https://github.com/microsoft/BlingFire/tree/master/wasm) folder.
### 8. Example of the difference the Bling Fire default tokenizer makes in a classification task
[This notebook](/doc/Bling%20Fire%20Tokenizer%20Demo.ipynb) demonstrates how the Bling Fire tokenizer helps in a Stack Overflow post classification problem.
### 9. Example of reaching 99% accuracy for language detection
[This document](https://github.com/microsoft/BlingFire/wiki/How-to-train-better-language-detection-with-Bling-Fire-and-FastText) describes how to improve the [FastText](https://fasttext.cc/) language detection model with Bling Fire and achieve 99% accuracy on a language detection task covering 365 languages.
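The core of that recipe, sketched here under assumptions (file names and hyperparameters below are illustrative, not taken from the wiki): run Bling Fire's tokenizer over the training text so fastText trains and predicts on consistently tokenized input.

```python
import fasttext
from blingfire import text_to_words

# pre-tokenize each training line with Bling Fire so fastText sees consistent tokens;
# input lines are assumed to look like: "__label__en Some raw text ..."
with open("train.raw", encoding="utf-8") as src, \
     open("train.tok", "w", encoding="utf-8") as dst:
    for line in src:
        label, _, text = line.rstrip("\n").partition(" ")
        dst.write(label + " " + text_to_words(text) + "\n")

# train a supervised fastText classifier on the tokenized data (hyperparameters are illustrative)
model = fasttext.train_supervised(input="train.tok", epoch=25, lr=0.5, wordNgrams=2)

# predictions should go through the same tokenizer as training data
print(model.predict(text_to_words("Bonjour tout le monde !")))
```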