Include scripts for 10-fold cross-validation #556

leoalenc · 2024-09-11T02:13:30Z

include scripts for 10-fold cross-validation of parsing with gold tokenization and gold tags
improve the scripts
include scripts for 10-fold cross-validation of raw-text parsing

Scripts for replicating the experiments in this paper:

ALENCAR, Leonel Figueiredo de. A Universal Dependencies Treebank for Nheengatu. In: GAMALLO, Pablo; CLARO, Daniela; TEIXEIRA, António J. S.; REAL, Livy; GARCÍA, Marcos; OLIVEIRA, Hugo Gonçalo; AMARO, Raquel (Eds.). Proceedings of the 16th International Conference on Computational Processing of Portuguese, PROPOR 2024, Santiago de Compostela, Galicia/Spain, 12-15 March, 2024. Stroudsburg, PA, USA: Association for Computational Linguistics, 2024. v. 2, p. 37-54. Available at: https://aclanthology.org/2024.propor-2.8.

@inproceedings{DeAlencar2024a,
  author = "de Alencar, Leonel Figueiredo",
  editor  = {Pablo Gamallo and
            Daniela Claro and
            Ant{\'{o}}nio J. S. Teixeira and
            Livy Real and
            Marcos Garc{\'{\i}}a and
            Hugo Gon{\c{c}}alo Oliveira and
            Raquel Amaro},
  title = "A {U}niversal {D}ependencies Treebank for {N}heengatu",
  booktitle = {Proceedings of the 16th International Conference on Computational Processing of Portuguese, {PROPOR} 2024, Santiago de Compostela, Galicia/Spain, 12-15 March, 2024},
  pages = "37--54",
  volume = {2},
  publisher = {Association for Computational Linguistics},
  year = {2024},
  month = {3},
  url = "https://aclanthology.org/2024.propor-2.8",
  address = {Stroudsburg, PA, USA},
  abstract="We present UD_Nheengatu-CompLin, the inaugural treebank for Nheengatu, an endangered Indigenous language of Brazil with limited digital resources. This treebank stands as the largest among Indigenous American languages in version 2.13 of the Universal Dependencies collection. The developmental version comprises 1,336 trees, encompassing 13,246 tokens and 13,374 words. In a 10-fold cross-validation experiment using UDPipe 1.2, parsing with gold tokenization and gold tags achieved a labeled attachment score (LAS) of 81.17 ± 1.02, outperforming Yauti, the rule-based analyzer employed for sentence annotation.",
  isbn = {979-8-89176-062-2},
  doi = "10.5281/zenodo.11372209"
}
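
The 10-fold setup mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the code from the commits referenced in this issue: the function names, the fixed random seed, and the use of the sample standard deviation for the "±" figure are all assumptions made for illustration.

```python
import random
import re
import statistics


def read_conllu_sentences(text):
    """Split CoNLL-U text into sentence blocks (blocks are separated by blank lines)."""
    return [block.strip() for block in re.split(r"\n\s*\n", text.strip()) if block.strip()]


def k_fold_splits(sentences, k=10, seed=42):
    """Yield (train, test) pairs; every sentence appears in exactly one test fold.

    The seed is a hypothetical default chosen only to make the split reproducible.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k roughly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test


def summarize(scores):
    """Return (mean, sample standard deviation) of per-fold scores.

    Assumption: the paper's "±" may be computed differently.
    """
    return statistics.mean(scores), statistics.stdev(scores)
```

In such a setup, each `train` list would be written to a file and used to train a parser, each `test` list would be parsed and scored, and `summarize` would be applied to the ten resulting scores.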

leoalenc · 2024-09-11T02:18:53Z

@dominickmaia, the commit includes the scripts for 10-fold cross-validation of parsing with gold tokenization and gold tags.

dominickmaia · 2024-09-11T11:27:35Z

Thank you, @leoalenc

leoalenc · 2024-09-11T16:55:58Z

On the evaluation of dependency parsing (UAS, LAS, and other metrics):

https://web.stanford.edu/~jurafsky/slp3/old_oct19/15.pdf

@inproceedings{nivre-fang-2017-universal,
    title = "{U}niversal {D}ependency Evaluation",
    author = "Nivre, Joakim  and
      Fang, Chiao-Ting",
    editor = "de Marneffe, Marie-Catherine  and
      Nivre, Joakim  and
      Schuster, Sebastian",
    booktitle = "Proceedings of the {N}o{D}a{L}i{D}a 2017 Workshop on Universal Dependencies ({UDW} 2017)",
    month = may,
    year = "2017",
    address = "Gothenburg, Sweden",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W17-0411",
    pages = "86--95",
}
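
For orientation, the two metrics named above can be sketched in a few lines of Python. This is a simplified illustration and not the evaluation code used in the experiments. It assumes gold tokenization, so gold and predicted words align one to one, and the function name is hypothetical. UAS counts words whose predicted head is correct; LAS additionally requires the correct dependency label.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as percentages.

    gold, predicted: aligned lists of (head, deprel) tuples, one per word.
    Assumes identical tokenization in both analyses.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of words")
    if not gold:
        raise ValueError("cannot score an empty analysis")
    n = len(gold)
    # UAS: the head index matches, regardless of the label
    heads_correct = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS: both the head index and the dependency label match
    both_correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * heads_correct / n, 100.0 * both_correct / n


gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "punct")]
uas, las = attachment_scores(gold, pred)  # 3 of 4 heads correct, 2 of 4 head+label pairs correct
```

Standard evaluation tooling also covers cases this sketch ignores, such as predicted tokenization that differs from the gold standard.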

leoalenc added the tools (This issue relates to Python code) and testing (Testing data and code) labels Sep 11, 2024

leoalenc self-assigned this Sep 11, 2024

leoalenc added a commit that referenced this issue Sep 11, 2024

#556

1a1bb44

leoalenc added a commit that referenced this issue Sep 11, 2024

#556

16ac8f5

leoalenc added the parsing (Issues about syntactic parsing) label Sep 11, 2024

leoalenc added a commit that referenced this issue Sep 11, 2024

#556

70f7947

leoalenc mentioned this issue Sep 24, 2024

Include statistical significance test #592

Open

2 tasks
