This repo contains various Georgian-language datasets for NLP and other purposes: the entire text of "The Knight in the Panther's Skin" (vefxistyaosani.txt), Georgian aphorisms (aforizmebi.txt), first and last names of Georgian poets and writers (poetswriters.txt), baby names in Georgian (names.csv, © kids.ge), and the full Georgian alphabet (anbani.csv) with the descriptions of the letters as they appear in Unicode.
Some of these datasets were fed to a neural network (char-rnn by Andrej Karpathy) to generate fake data, such as fake-aforizmebi.txt, fake-names.txt (trained on the Georgian-origin subset of baby names) and fake-poetswriters.txt.
Name | Description | Source | Lines | URL |
---|---|---|---|---|
vefxistyaosani.csv | Labeled text of "The Knight in the Panther's Skin" | | 6678 | GET |
quotes.csv | Quotes from 184 famous people in Georgian | ka.wikiquote.org | 3683 | GET |
aforizmebi.txt | Georgian aphorisms | various sources | 132 | GET |
poetswriters.txt | First and last names of Georgian poets and writers | ka.wikipedia.org | 544 | GET |
names.csv | Baby names in Georgian with various origins | kids.ge © | 2094 | GET |
anbani.csv | Full Georgian alphabet with descriptions and char codes | unicode.org | 175 | GET |
vefxistyaosani.txt | Raw text of "The Knight in the Panther's Skin" | | 8524 | GET |
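
The plain-text datasets in the table above can be loaded with a few lines of Node.js. The sketch below assumes each `.txt` file is UTF-8 with one entry per line (check this against the actual file before relying on it); `parseLines` and `loadLines` are hypothetical helper names, not part of this repo or of anbani.js. Blank lines are dropped, so the resulting count may differ slightly from the Lines column.

```javascript
// Minimal sketch: read a plain-text dataset (e.g. aforizmebi.txt) as an array
// of entries. Assumes UTF-8 and one entry per line; helper names are hypothetical.
const fs = require("fs");

// Split raw text into trimmed, non-empty lines.
// Handles both \n and \r\n endings; trim() also strips a leading BOM (U+FEFF).
function parseLines(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Read a dataset file from disk and return its entries.
function loadLines(filePath) {
  return parseLines(fs.readFileSync(filePath, "utf8"));
}

module.exports = { parseLines, loadLines };

// Usage (path relative to a local clone of this repo):
// const aphorisms = loadLines("aforizmebi.txt");
// console.log(aphorisms.length);
```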
Name | Description | Source | Lines | URL |
---|---|---|---|---|
fake-aforizmebi.txt | Georgian aphorisms generated using char-rnn | anbani.db | 17047 | GET |
fake-poetswriters.txt | Fake names trained on the names of Georgian poets and writers | anbani.db | 2514 | GET |
fake-names.csv | Fake names trained on the Georgian-origin subset of baby names | anbani.db | 60961 | GET |
fake-vefxistyaosani.txt | Char-rnn output mimicking Shota Rustaveli (not well) | anbani.db | 26032 | GET |
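
The fake datasets were produced with char-rnn, a character-level recurrent neural network, which is too large to reproduce here. As a rough illustration of the same idea (learn which character tends to follow which, then sample one character at a time), the sketch below swaps in a much simpler technique: an order-2 character Markov chain. `train`, `generate` and the `^`/`$` padding markers are hypothetical; this is not how the files in this repo were generated, and its output will be cruder than char-rnn's.

```javascript
// Simplified stand-in for char-rnn (NOT what produced this repo's fake files):
// an order-N character Markov chain over a list of names.
// "^" and "$" are hypothetical start/end markers, assumed absent from the data.
const START = "^";
const END = "$";

// Map each N-character context to the list of characters seen right after it.
// Duplicates are kept on purpose, so frequent continuations are sampled more often.
function train(names, order = 2) {
  const table = new Map();
  for (const name of names) {
    if (name.length === 0) continue; // an empty name would only teach START -> END
    const chars = [...START.repeat(order), ...name, END];
    for (let i = order; i < chars.length; i++) {
      const context = chars.slice(i - order, i).join("");
      if (!table.has(context)) table.set(context, []);
      table.get(context).push(chars[i]);
    }
  }
  return { table, order };
}

// Sample one name character by character until the end marker is drawn.
// `random` is injectable (defaults to Math.random) so runs can be reproduced.
// Georgian letters are single UTF-16 units, so out.length counts letters.
function generate(model, random = Math.random, maxLength = 20) {
  const { table, order } = model;
  let context = [...START.repeat(order)];
  let out = "";
  while (out.length < maxLength) {
    const options = table.get(context.join(""));
    if (!options) break; // unseen context: stop early
    const next = options[Math.floor(random() * options.length)];
    if (next === END) break;
    out += next;
    context = [...context.slice(1), next];
  }
  return out;
}

// Usage with a handful of example names (illustrative input, not repo data).
const model = train(["ნინო", "ნანა", "ნატო", "თამარ", "მარიამ"]);
console.log(generate(model));
```

Output varies from run to run. A higher `order` copies the training names more faithfully; a lower one produces more novel but less plausible strings.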
Here are some of the resources you might like.
Fake Georgian text and name generation is supported by anbani.js, a multifunctional JavaScript library for working with the Georgian alphabet. Read more about the package at [anbani / anbani.js].
```bash
npm install anbani
```

```javascript
const anbani = require('anbani')

anbani.core.convert("ანბანი", "მხედრული", "ასომთავრული") // Mkhedruli → Asomtavruli
// 'ႠႬႡႠႬႨ'
anbani.lorem.names(3) // three fake full names
// ['დამერ გაშვითელი', 'სიბო ყორთელია', 'გიმოლ ვაწოშვილი']
anbani.lorem.sentences(10) // fake Georgian sentences
// 'მოეხვიდეს სიტირენ გიშიხარნი. წეითო გამიზრიან, ჰქონთავისთან გემრუფენ, უკრთებოდემნი მესმანცა მყივნე.'
```
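
`anbani.core.convert` handles many script pairs. For the single case shown above, Mkhedruli to Asomtavruli, the conversion can be sketched with plain Unicode arithmetic, because the 33 modern Mkhedruli letters (U+10D0 ა to U+10F0 ჰ) sit exactly 0x30 code points after their Asomtavruli counterparts (U+10A0 Ⴀ to U+10C0 Ⴠ). `mkhedruliToAsomtavruli` is a hypothetical helper written for illustration; it is not anbani.js's implementation and ignores archaic letters and other scripts.

```javascript
// Hypothetical helper, NOT part of anbani.js: converts the 33 modern
// Mkhedruli letters (U+10D0 ა .. U+10F0 ჰ) to Asomtavruli capitals
// (U+10A0 Ⴀ .. U+10C0 Ⴠ) using the fixed offset between the Unicode blocks.
// Every other character (spaces, punctuation, Latin, archaic letters) is kept as-is.
const MKHEDRULI_FIRST = 0x10d0; // ა
const MKHEDRULI_LAST = 0x10f0; // ჰ
const ASOMTAVRULI_OFFSET = 0x10d0 - 0x10a0; // 0x30

function mkhedruliToAsomtavruli(text) {
  let out = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp >= MKHEDRULI_FIRST && cp <= MKHEDRULI_LAST) {
      out += String.fromCodePoint(cp - ASOMTAVRULI_OFFSET);
    } else {
      out += ch;
    }
  }
  return out;
}

console.log(mkhedruliToAsomtavruli("ანბანი")); // 'ႠႬႡႠႬႨ'
```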
For other awesome Georgian datasets, visit [bumbeishvili / awesome-georgian-datasets].
The datasets are freely available for non-commercial use only. For commercial use, contact the corresponding source.