Cleaned and preprocessed parallel corpus for five less-resourced Indian languages
ID & Language & Train & Test & Dev
1 & Tamil & 183451 & 2000 & 1000
2 & Malayalam & 548000 & 3660 & 3000
3 & Telugu & 75000 & 3897 & 3000
4 & Bengali & 658000 & 3255 & 3500
5 & Urdu & 36000 & 2454 & 2000
Dataset link: https://drive.google.com/open?id=1b3h13rBwTOZRygT6ZIdk4eZ9MKmXSZJa