Clean parallel corpus for five low-resource Indian languages

himanshudce/Indian-Language-Dataset

Indian-Language-Dataset

Cleaned and preprocessed parallel corpus for five low-resource Indian languages.

| ID | Language  | Train  | Test | Dev  |
|----|-----------|--------|------|------|
| 1  | Tamil     | 183451 | 2000 | 1000 |
| 2  | Malayalam | 548000 | 3660 | 3000 |
| 3  | Telugu    | 75000  | 3897 | 3000 |
| 4  | Bengali   | 658000 | 3255 | 3500 |
| 5  | Urdu      | 36000  | 2454 | 2000 |
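
The split sizes listed above can be sanity-checked with a few lines of Python. This is only a minimal sketch: the counts are copied from the list above, while the names `SPLIT_SIZES` and `summarize` are hypothetical and not part of this repository. The README does not state what unit the counts use, so the sketch treats them as plain counts.

```python
# Split sizes copied from the list above: (train, test, dev) per language.
# SPLIT_SIZES and summarize are illustrative names, not part of the dataset.
SPLIT_SIZES = {
    "Tamil": (183451, 2000, 1000),
    "Malayalam": (548000, 3660, 3000),
    "Telugu": (75000, 3897, 3000),
    "Bengali": (658000, 3255, 3500),
    "Urdu": (36000, 2454, 2000),
}


def summarize(sizes):
    """Return {language: (total, train_fraction)} for each language."""
    summary = {}
    for language, (train, test, dev) in sizes.items():
        total = train + test + dev
        summary[language] = (total, train / total)
    return summary


if __name__ == "__main__":
    for language, (total, train_frac) in summarize(SPLIT_SIZES).items():
        print(f"{language:<10} total={total:>7}  train={train_frac:.1%}")
```

Comparing the totals makes the size imbalance between languages visible, which may matter when choosing sampling or batching strategies for multilingual training.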

Dataset link: https://drive.google.com/open?id=1b3h13rBwTOZRygT6ZIdk4eZ9MKmXSZJa
