|
1 |
| -# franca |
| 1 | +# Language Detector |
2 | 2 |
|
3 |
| -Crystal port of [franc](https://github.com/wooorm/franc) |
| 3 | +Crystal port of [franc](https://github.com/wooorm/franc). |
| 4 | + |
| 5 | +It is not a state-of-the-art language identification algorithm, but it succeeds more than 90% of the time on sufficiently long text samples. |
| 6 | + |
| 7 | +It identifies the language of a text sample by extracting its trigrams (sequences of 3 characters) and comparing them to the most frequent trigrams extracted from translations of the [UDHR](https://www.un.org/en/universal-declaration-human-rights/) in each of the available languages. |
| 8 | + |
| 9 | +Language Detector returns the ISO 639-3 three-letter language code of the most probable language. |
4 | 10 |
|
5 | 11 | ## Installation
|
6 | 12 |
|
7 | 13 | 1. Add the dependency to your `shard.yml`:
|
8 | 14 |
|
9 | 15 | ```yaml
|
10 | 16 | dependencies:
|
11 |
| - franca: |
12 |
| - github: rmarronnier/franca |
| 17 | + cadmium_language_detector: |
| 18 | + github: cadmiumcr/language_detector |
13 | 19 | ```
|
14 | 20 |
|
15 | 21 | 2. Run `shards install`
|
16 | 22 |
|
17 | 23 | ## Usage
|
18 | 24 |
|
19 | 25 | ```crystal
|
20 |
| -require "franca" |
21 |
| -``` |
| 26 | +require "language_detector" |
| 27 | +
|
| 28 | +text = "Alice was published in 1865, three years after Charles Lutwidge Dodgson and the Reverend Robinson Duckworth rowed in a |
| 29 | +boat, on 4 July 1862 [4] (this popular date of the golden afternoon [5] might be a confusion or even another Alice-tale, for that |
| 30 | +particular day was cool, cloudy and rainy [6] ), up the Isis with the three young daughters of Henry Liddell (the Vice-Chancellor of Oxford University and Dean of Christ Church): Lorina Charlotte Liddell (aged |
| 31 | +13, born 1849) (Prima in the book's prefatory verse); Alice Pleasance Liddell |
| 32 | +(aged 10, born 1852) (Secunda in the prefatory verse); Edith Mary Liddell |
| 33 | +(aged 8, born 1853) (Tertia in the prefatory verse). [7] |
| 34 | +The journey began at Folly Bridge near Oxford and ended five miles away in the |
| 35 | +village of Godstow. During the trip Charles Dodgson told the girls a story that |
| 36 | +featured a bored little girl named Alice who goes looking for an adventure. The |
| 37 | +girls loved it, and Alice Liddell asked Dodgson to write it down for her. He |
| 38 | +began writing the manuscript of the story the next day, although that earliest |
| 39 | +version no longer exists. The girls and Dodgson took another boat trip a month |
| 40 | +later when he elaborated the plot to the story of Alice, and in November he |
| 41 | +began working on the manuscript in earnest." |
| 42 | +
|
| 43 | +pp LanguageDetector.new.detect(text) |
| 44 | +
|
| 45 | +# "eng" |
22 | 46 |
|
23 |
| -TODO: Write usage instructions here |
| 47 | +``` |
24 | 48 |
|
25 |
| -## Development |
26 | 49 |
|
27 |
| -TODO: Write development instructions here |
28 | 50 |
|
29 | 51 | ## Contributing
|
30 | 52 |
|
31 |
| -1. Fork it (<https://github.com/rmarronnier/franca/fork>) |
| 53 | +1. Fork it (<https://github.com/cadmiumcr/language_detector/fork>) |
32 | 54 | 2. Create your feature branch (`git checkout -b my-new-feature`)
|
33 | 55 | 3. Commit your changes (`git commit -am 'Add some feature'`)
|
34 | 56 | 4. Push to the branch (`git push origin my-new-feature`)
|
|