+* From a quick look at the data, the most readable texts seem to describe things a child already knows (for example, dinosaurs) in simple language, while the least readable ones deal with highly technical subjects (such as metalworking) and contain many technical terms, which are naturally hard for a child to read. This suggests two approaches:
+  1. Use the topic(s) of each text as an extra training feature (e.g. extracted with spaCy + Word2Vec).
+  2. Estimate word frequencies over a large, well-balanced corpus and flag words that are highly specialized and rare, possibly also accounting for their intrinsic "intension" (see *Il software del linguaggio* by R. Simone).
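The word-frequency idea can be sketched in plain Python as below. This is a minimal sketch, not a tested pipeline, and several choices are assumptions not found in the notes: a simple regex tokenizer, add-one (Laplace) smoothing so unseen words get a small probability, mean surprisal (`-log2 p`) as the rarity score, and a hypothetical `rare_threshold` cutoff for counting "rare" words. `build_frequencies` should be fed the large, well-balanced reference corpus; the two-sentence corpus in the usage block is only a toy for illustration. The topic-feature approach is not covered here.

```python
import math
import re
from collections import Counter

# Assumption: a simple lowercase alphabetic tokenizer (keeps contractions like "don't").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def build_frequencies(corpus: list[str]) -> Counter:
    """Count word occurrences over the reference (ideally well-balanced) corpus."""
    counts: Counter = Counter()
    for doc in corpus:
        counts.update(tokenize(doc))
    return counts


def rarity_features(text: str, counts: Counter,
                    rare_threshold: float = 1e-4) -> dict[str, float]:
    """Score how unusual the words of `text` are relative to the reference counts.

    rare_threshold is a hypothetical cutoff on the smoothed word probability.
    """
    total = sum(counts.values())
    vocab = len(counts)
    tokens = tokenize(text)
    if not tokens:
        return {"mean_surprisal": 0.0, "rare_fraction": 0.0}

    surprisals = []
    rare = 0
    for tok in tokens:
        # Add-one smoothing; the extra +1 in the denominator is a bucket for unseen words.
        p = (counts[tok] + 1) / (total + vocab + 1)
        surprisals.append(-math.log2(p))  # rarer word -> higher surprisal
        if p < rare_threshold:
            rare += 1

    return {
        "mean_surprisal": sum(surprisals) / len(surprisals),
        "rare_fraction": rare / len(tokens),
    }


if __name__ == "__main__":
    # Toy reference corpus for illustration only; use a large balanced corpus in practice.
    corpus = [
        "the dinosaur was big and the dinosaur ate plants",
        "the kids played in the park with the dog",
    ]
    counts = build_frequencies(corpus)
    print(rarity_features("the dinosaur ate plants", counts, rare_threshold=0.05))
    print(rarity_features("annealing tempers the austenitic steel", counts, rare_threshold=0.05))
```

With this toy corpus the metalworking sentence gets a higher mean surprisal and a higher rare-word fraction than the dinosaur sentence; both values could then be added as extra features next to the text itself.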