Skip to content

Latest commit

 

History

History

NLP Tools

This folder integrates NLP tools such as text embeddings, text-tagging models, text metrics, etc. for a variety of languages. This is useful for e.g. embedding-based WER calculations amongst other things.

  • Flair, a framework for e.g. bert embeddings, POS-tagging.
  • Spacy, a framework for NLP pipelines, from tokenization to lemmatization and beyond.
  • SacreBLEU, a standardized implementation of the BLEU metric.

Here is a record of test setup and relevant results:

$ pip install flair==0.14.0 spacy==3.8.3 sacrebleu==2.4.3
$ pytest --cov=speechbrain/integrations/nlp/ --cov-context=test --doctest-modules speechbrain/integrations/nlp/

=================== test session starts =======================
platform linux -- Python 3.12.7, pytest-8.3.4, pluggy-1.5.0
plugins: hypothesis-6.112.0, cov-6.0.0, anyio-4.6.2.post1
collected 3 items

speechbrain/integrations/nlp/bleu.py .
speechbrain/integrations/nlp/flair_embeddings.py .
speechbrain/integrations/nlp/spacy_pipeline.py .

---------- coverage: platform linux, python 3.12.7-final-0 -----------
Name                                               Stmts   Miss  Cover
----------------------------------------------------------------------
speechbrain/integrations/nlp/__init__.py               3      0   100%
speechbrain/integrations/nlp/bleu.py                  51      9    82%
speechbrain/integrations/nlp/flair_embeddings.py      27      3    89%
speechbrain/integrations/nlp/flair_tagger.py          18      9    50%
speechbrain/integrations/nlp/spacy_pipeline.py        19      1    95%
----------------------------------------------------------------------
TOTAL                                                118     22    81%