
cogniinsight/language-resources

 
 


Language Resources and Tools


Datasets and scripts for basic natural language and speech processing.

This is not an official Google product.

Natural Languages

Directory  Language          Available
af         Afrikaans
bn         Bengali / Bangla
hi_ur      Hindi & Urdu
is         Icelandic
jv         Javanese
lo         Lao
my         Burmese / Myanmar
si         Sinhala
xh         Xhosa
zu         Zulu

Tools

This repository includes a few tools for working with the natural language datasets. The tools are written in C++ and Python and are built with Bazel. To compile and use them, install a recent version of Bazel (release 0.2.0 or later is required).
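
Before building, it can help to confirm that the installed Bazel meets the minimum release stated above. The following is a minimal sketch, not part of this repository: the helper names (parse_version, meets_minimum, installed_bazel_version) are hypothetical, and it assumes that `bazel version` prints a line beginning with "Build label:" followed by an X.Y.Z version string.

```python
import re
import shutil
import subprocess

# Minimum Bazel release stated in this README.
MINIMUM_BAZEL = (0, 2, 0)


def parse_version(text):
    """Return the first X.Y.Z version found in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def meets_minimum(version, minimum=MINIMUM_BAZEL):
    """Return True if version (a tuple of ints) is at least the minimum."""
    return version is not None and version >= minimum


def installed_bazel_version():
    """Ask the local bazel binary for its version; return None if unavailable.

    Assumption: `bazel version` prints a line starting with "Build label:".
    """
    if shutil.which("bazel") is None:
        return None
    result = subprocess.run(
        ["bazel", "version"], capture_output=True, text=True, check=False
    )
    for line in result.stdout.splitlines():
        if line.startswith("Build label:"):
            return parse_version(line)
    return None


if __name__ == "__main__":
    version = installed_bazel_version()
    if meets_minimum(version):
        print("Bazel %s is recent enough." % ".".join(map(str, version)))
    else:
        print("Bazel 0.2.0 or later was not found on PATH.")
```

If the check passes, running `bazel build //...` from the repository root is the usual way to build every target in a Bazel workspace. Whether all targets in this repository build on a given platform is not stated here.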

License

Unless otherwise noted, all original files are licensed under the Apache License, Version 2.0.

Where specifically noted, some datasets are licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

The directory third_party/ contains third-party works, which are included under the respective licenses of the upstream projects. See third_party/README.md for further details.
