Reading a full forms lexicon #130

arademaker · 2021-08-11T13:32:45Z

The words command produces all pairs of upper/lower words. Do we have any command to read a file with those pairs and produce an FST from them?

mhulden · 2021-08-11T13:46:47Z

You can use read spaced-text for that; however, the required format is a little different. You need to separate symbols with spaces, and the input and output of each pair go on separate lines, with a blank line between pairs. Example:

c a t
g a t o

d o g
p e r r o

produces a transducer that maps cat to gato and dog to perro.

arademaker · 2021-08-11T14:07:43Z

Thank you, that will surely help us build a morphological analyzer out of our full-form Portuguese lexicon at https://github.com/LR-POR/MorphoBr/. Of course, such a transducer is not the perfect solution, since it captures neither the rules of the morphology nor the position classes and their respective morphemes. For example, one entry in spaced-text form:

a l e t o l o g i n h a s
a l e t o l o g i a +N +DIM +F +PL
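
A minimal sketch of how one entry could be converted into this format. It assumes each MorphoBr dict line has the form `form<TAB>lemma+TAG+TAG...` (as in the rg output later in this thread) and that `+` never occurs inside a lemma. The function name `to_spaced_text` is hypothetical and is not part of foma or MorphoBr.

```python
def to_spaced_text(dict_line: str) -> str:
    """Convert one 'form<TAB>lemma+TAG+...' line into a spaced-text pair."""
    form, analysis = dict_line.rstrip("\n").split("\t")
    # Assumption: the first '+' starts the tag sequence; the lemma has no '+'.
    lemma, *tags = analysis.split("+")
    # The surface form becomes one symbol per character.
    upper = " ".join(form)
    # The lemma is split per character; each tag stays a single symbol.
    lower = " ".join(list(lemma) + ["+" + tag for tag in tags])
    return upper + "\n" + lower + "\n"


if __name__ == "__main__":
    print(to_spaced_text("aletologinhas\taletologia+N+DIM+F+PL"))
```

Writing the returned strings to a file with one extra newline between them gives the blank line between pairs that read spaced-text expects. Forms that contain spaces or other special characters are not handled here.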

arademaker · 2023-03-20T18:06:15Z

Hi @mhulden,

foma[0]: read spaced-text all.foma
Stack full!

I got a "Stack full!" error while reading a file with 8,027,574 lines. Is there any alternative? Can I increase the stack size? The file was created according to the instructions above:

% head all.foma
a
a +N +M +SG

a s
a +N +M +PL

a z i n h o
a +N +DIM +M +SG
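
Before suspecting foma itself, the structure of such a large file can be checked with a short script. This is only a sketch: it assumes every entry in this lexicon is meant to be a two-line pair separated by a blank line, and the name `count_pairs` is hypothetical. It reads the whole text into memory, which may need adapting for a file of this size.

```python
def count_pairs(text: str) -> int:
    """Count input/output pairs; raise ValueError on a malformed block."""
    # Entries are separated by blank lines; ignore empty leftovers.
    blocks = [b for b in text.split("\n\n") if b.strip()]
    for i, block in enumerate(blocks, start=1):
        lines = block.strip("\n").split("\n")
        # Assumption: each entry is exactly one input line and one output line.
        if len(lines) != 2:
            raise ValueError(f"block {i} has {len(lines)} lines, expected 2")
    return len(blocks)


if __name__ == "__main__":
    sample = "a\na +N +M +SG\n\na s\na +N +M +PL\n\na z i n h o\na +N +DIM +M +SG\n"
    print(count_pairs(sample))
```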

arademaker · 2023-03-20T19:51:21Z

I was able to compile the spaced-text files

% ll -h *.sp
-rw-r--r--  1 ar  staff    32M Mar 20 16:25 adjectives.sp
-rw-r--r--  1 ar  staff   1.4M Mar 20 16:25 adverbs.sp
-rw-r--r--  1 ar  staff    31M Mar 20 16:25 nouns.sp
-rw-r--r--  1 ar  staff   150M Mar 20 16:25 verbs.sp

with the foma script

% cat compile-m.foma
!Copyright (C) 2023 Alexandre Rademaker

read spaced-text nouns.sp
define nouns ;
clear stack

read spaced-text verbs.sp
define verbs ;
clear stack

read spaced-text adjectives.sp
define adjs ;
clear stack

read spaced-text adverbs.sp
define advs ;
clear stack

save defined morphobr.bin

after changing the stack limit defined at https://github.com/mhulden/foma/blob/master/foma/int_stack.c#L22 to 5097152. Does that make sense?

arademaker · 2023-03-20T19:58:25Z

The only strange behaviour I got is that adjectives are not considered:

% echo "fracota" | flookup -a -i morphobr.bin
fracota	fracote+N+F+SG

ar@tenis morpho-br % rg fracota
nouns/nouns-f.dict
16878:fracota	fracote+N+F+SG
16879:fracotas	fracote+N+F+PL
16880:fracotazinha	fracote+N+DIM+F+SG
16881:fracotazinhas	fracote+N+DIM+F+PL

adjectives/adjectives-f.dict
16046:fracota	fracote+A+F+SG
16047:fracotas	fracote+A+F+PL
16048:fracotazinha	fracote+A+DIM+F+SG
16049:fracotazinhas	fracote+A+DIM+F+PL

Any idea?

mhulden · 2023-03-21T01:44:28Z

Consider doing this instead of save defined:

regex nouns | verbs | adjs | advs;
save stack morphobr.bin

(save defined saves several FSTs and flookup only loads one; with the above, you should get a single FST on the stack and save that.)
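
The effect of the union can be illustrated with a toy model in Python. This is not how foma or flookup work internally: each lexicon is modelled as a plain dictionary from surface form to a set of analyses, and the names `Lexicon`, `union`, `single` and `combined` are hypothetical.

```python
from typing import Dict, Set

Lexicon = Dict[str, Set[str]]  # surface form -> set of analyses


def union(*lexicons: Lexicon) -> Lexicon:
    """Merge several lexicons, keeping every analysis of every form."""
    merged: Lexicon = {}
    for lex in lexicons:
        for form, analyses in lex.items():
            merged.setdefault(form, set()).update(analyses)
    return merged


nouns: Lexicon = {"fracota": {"fracote+N+F+SG"}}
adjs: Lexicon = {"fracota": {"fracote+A+F+SG"}}

# Stand-in for a lookup that consults only one of the separately saved networks.
single = nouns
# Stand-in for the single network built with `nouns | adjs`.
combined = union(nouns, adjs)

if __name__ == "__main__":
    print(sorted(single["fracota"]))
    print(sorted(combined["fracota"]))
```

In this model, looking up fracota in `single` yields only the noun analysis, as in the flookup output above, while `combined` yields both the noun and the adjective analyses.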

arademaker · 2023-03-21T12:39:58Z

Thanks, it worked. The strange thing is that when I tested it with a word that is ambiguous between noun and verb, it works. Maybe the problem is that, without this explicit combination of the FSTs by disjunction, we ended up with an FST with multiple starting states, and the flookup tool tried only one?! But I was using the -a flag!

Anyway, the explicit disjunction to combine the FSTs worked fine!

arademaker mentioned this issue Aug 11, 2021: prepare release LR-POR/MorphoBr#12 (Open, 2 tasks)

arademaker closed this as completed Aug 11, 2021

arademaker mentioned this issue Aug 13, 2021: ideas for the Haskell library LR-POR/tools#7 (Open)

arademaker reopened this Mar 20, 2023

arademaker closed this as completed Mar 21, 2023

arademaker mentioned this issue Mar 21, 2023: compiling a finite-state transducer from the dict files LR-POR/MorphoBr#130 (Closed)
