Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations.
Updated Dec 8, 2025 - PHP
Resources for conservation, development, and documentation of low resource (human) languages.
Speeding the availability of language resources for endangered languages. Tools such as this have the power to shift how we think about endangered languages. Rather than perceiving them as being antiquated, difficult to learn and on the brink of vanishing, we see them as modern, easily accessible for learning online in text and audio formats.
My thesis on "Open Source Code and Low Resource Languages" for an MSc in Language Science and Technology at Saarland University
A pipeline to isolate and transcribe one language in mixed-language speech
A Python module for retrieving script types of writing systems including alphabets, abjads, abugidas, syllabaries, logographs, featurals as well as Latin script codes
Weather app originally created by CodeExplained (https://github.com/CodeExplainedRepo/Weather-App-JavaScript) to which I have added translations of weather descriptions in Kouri-Vini (Louisiana Creole) and a location search bar. Please let me know if any of my translations aren't showing up correctly. Byin mèsi. :)
Repository for our paper "Nesciun Lengaz Lascià Endò: Machine Translation for Fassa Ladin".
Scottish Gaelic Spellchecker - GOC (Gaelic Orthographic Convention)
A Oneida (Canada) to English Dictionary
A Python script supporting Chamorro language preservation through the creation of a custom Chamorro-English dictionary for Kindle devices—making reading in Chamorro more accessible.
Saving endangered Indian languages with open AI innovation
Scottish Gaelic Spellchecker (Universal)
Theme for chjam'è rispondi: a Python application written with tkinter
A project to scrape and process online Chamorro language dictionaries to support language analysis and revitalization efforts. (WIP)
Digitised comparative Enggano word list from Oudemans (1889). This publication presents the unpublished Enggano word list by Francis (1870) alongside those by Boewang (1854), van de Straaten & Severijn (1855), and von Rosenberg (1855). View the data at https://github.com/engganolang/oudemans1889/blob/main/data/oudemans1889-long.csv
A Python script to scrape, process and export Chamorro Bible text from different online sources, making the text accessible for analysis, research, and digital preservation. (WIP)
A project to scrape and process Chamorro language news articles for language preservation, analysis, and learning tools development. (WIP)
A repository of R code for processing the questionnaire report, part of the "Increasing Impact for Enggano" follow-up grant funded by the Impact Division of the University of Oxford.