Skip to content

Latest commit

 

History

History
 
 

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 

NLP Tools

This folder integrates NLP tools such as text embeddings, text-tagging models, text metrics, etc. for a variety of languages. This is useful for e.g. embedding-based WER calculations amongst other things.

  • Flair, a framework for e.g. bert embeddings, POS-tagging.
  • Spacy, a framework for NLP pipelines, from tokenization to lemmatization and beyond.
  • SacreBLEU, a standardized implementation of the BLEU metric.

Here is a record of test setup and relevant results:

$ pip install flair==0.14.0 spacy==3.8.3 sacrebleu==2.4.3
$ pytest --cov=speechbrain/integrations/nlp/ --cov-context=test --doctest-modules speechbrain/integrations/nlp/

=================== test session starts =======================
platform linux -- Python 3.12.7, pytest-8.3.4, pluggy-1.5.0
plugins: hypothesis-6.112.0, cov-6.0.0, anyio-4.6.2.post1
collected 3 items

speechbrain/integrations/nlp/bleu.py .
speechbrain/integrations/nlp/flair_embeddings.py .
speechbrain/integrations/nlp/spacy_pipeline.py .

---------- coverage: platform linux, python 3.12.7-final-0 -----------
Name                                               Stmts   Miss  Cover
----------------------------------------------------------------------
speechbrain/integrations/nlp/__init__.py               3      0   100%
speechbrain/integrations/nlp/bleu.py                  51      9    82%
speechbrain/integrations/nlp/flair_embeddings.py      27      3    89%
speechbrain/integrations/nlp/flair_tagger.py          18      9    50%
speechbrain/integrations/nlp/spacy_pipeline.py        19      1    95%
----------------------------------------------------------------------
TOTAL                                                118     22    81%