Include scripts for ten-fold cross-validation #556

Open
2 of 3 tasks
leoalenc opened this issue Sep 11, 2024 · 3 comments
Labels: parsing (issues about syntactic parsing), testing (testing data and code), tools (this issue relates to Python code)

Comments

@leoalenc
Contributor

leoalenc commented Sep 11, 2024

  • include scripts for ten-fold cross-validation of parsing with gold tokenization and gold tags
  • improve the scripts
  • include scripts for ten-fold cross-validation of raw-text parsing
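
As a rough illustration of the data split behind ten-fold cross-validation, here is a minimal sketch. It assumes the treebank is a single CoNLL-U file whose sentences are separated by blank lines. The function names `read_conllu_sentences` and `k_fold_splits`, the shuffling, and the seed are hypothetical choices for this example and are not taken from the scripts in the repository.

```python
import random
import re


def read_conllu_sentences(text):
    """Split CoNLL-U text into sentence blocks (blank-line separated)."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [block.strip() for block in blocks if block.strip()]


def k_fold_splits(sentences, k=10, seed=42):
    """Yield (train, test) lists; every sentence is in exactly one test fold."""
    indices = list(range(len(sentences)))
    # Shuffle with a fixed seed so that the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for test_idx in folds:
        held_out = set(test_idx)
        train = [sentences[j] for j in indices if j not in held_out]
        test = [sentences[j] for j in test_idx]
        yield train, test
```

Each `(train, test)` pair can then be written back to disk as two CoNLL-U files (sentence blocks joined by blank lines) and passed to the parser's training and evaluation steps.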

Scripts for replicating the experiments in this article:

ALENCAR, Leonel Figueiredo de. A Universal Dependencies Treebank for Nheengatu. In: GAMALLO, Pablo; CLARO, Daniela; TEIXEIRA, António J. S.; REAL, Livy; GARCÍA, Marcos; OLIVEIRA, Hugo Gonçalo; AMARO, Raquel (Eds.). Proceedings of the 16th International Conference on Computational Processing of Portuguese, PROPOR 2024, Santiago de Compostela, Galicia/Spain, 12-15 March, 2024. Stroudsburg, PA, USA: Association for Computational Linguistics, 2024. v. 2, p. 37-54. Available at: https://aclanthology.org/2024.propor-2.8.

@inproceedings{DeAlencar2024a,
  author = "de Alencar, Leonel Figueiredo",
  editor  = {Pablo Gamallo and
            Daniela Claro and
            Ant{\'{o}}nio J. S. Teixeira and
            Livy Real and
            Marcos Garc{\'{\i}}a and
            Hugo Gon{\c{c}}alo Oliveira and
            Raquel Amaro},
  title = "A {U}niversal {D}ependencies Treebank for {N}heengatu",
  booktitle = {Proceedings of the 16th International Conference on Computational Processing of Portuguese, {PROPOR} 2024, Santiago de Compostela, Galicia/Spain, 12-15 March, 2024},
  pages = "37--54",
  volume = {2},
  publisher = {Association for Computational Linguistics},
  year = {2024},
  month = {3},
  url = "https://aclanthology.org/2024.propor-2.8",
  address = {Stroudsburg, PA, USA},
  abstract="We present UD_Nheengatu-CompLin, the inaugural treebank for Nheengatu, an endangered Indigenous language of Brazil with limited digital resources. This treebank stands as the largest among Indigenous American languages in version 2.13 of the Universal Dependencies collection. The developmental version comprises 1,336 trees, encompassing 13,246 tokens and 13,374 words. In a 10-fold cross-validation experiment using UDPipe 1.2, parsing with gold tokenization and gold tags achieved a labeled attachment score (LAS) of 81.17 ± 1.02, outperforming Yauti, the rule-based analyzer employed for sentence annotation.",
  isbn = {979-8-89176-062-2},
  doi = "10.5281/zenodo.11372209"
}
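
The abstract reports LAS as a mean with a ± spread over the ten folds. A minimal sketch of that aggregation follows, assuming the spread is the sample standard deviation (the abstract does not say which measure was used). The helper `summarize_folds` is hypothetical, and the per-fold values are made up for illustration; they are not results from the paper.

```python
from statistics import mean, stdev


def summarize_folds(scores):
    """Return (mean, sample standard deviation) of per-fold scores."""
    return mean(scores), stdev(scores)


# Made-up per-fold LAS values, for illustration only (not from the paper).
example_las = [80.0, 82.0, 81.0, 79.0, 83.0, 81.5, 80.5, 82.5, 79.5, 81.0]
avg, sd = summarize_folds(example_las)
print(f"LAS = {avg:.2f} ± {sd:.2f}")
```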
@leoalenc leoalenc added the tools and testing labels Sep 11, 2024
@leoalenc leoalenc self-assigned this Sep 11, 2024
leoalenc added a commit that referenced this issue Sep 11, 2024
@leoalenc
Contributor Author

@dominickmaia, the commit includes the scripts for ten-fold cross-validation of parsing with gold tokenization and gold tags.

leoalenc added a commit that referenced this issue Sep 11, 2024
@dominickmaia
Collaborator

Thank you, @leoalenc.

@leoalenc leoalenc added the parsing Issues about syntactic parsing label Sep 11, 2024
@leoalenc
Contributor Author

On the evaluation of dependency parsing (UAS, LAS, and other metrics):

https://web.stanford.edu/~jurafsky/slp3/old_oct19/15.pdf

@inproceedings{nivre-fang-2017-universal,
    title = "{U}niversal {D}ependency Evaluation",
    author = "Nivre, Joakim  and
      Fang, Chiao-Ting",
    editor = "de Marneffe, Marie-Catherine  and
      Nivre, Joakim  and
      Schuster, Sebastian",
    booktitle = "Proceedings of the {N}o{D}a{L}i{D}a 2017 Workshop on Universal Dependencies ({UDW} 2017)",
    month = may,
    year = "2017",
    address = "Gothenburg, Sweden",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W17-0411",
    pages = "86--95",
}
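
For orientation, the two attachment scores mentioned above can be sketched as follows. This assumes gold tokenization, so gold and predicted words align one to one. UAS counts words with the correct head; LAS counts words with both the correct head and the correct dependency label. The function `attachment_scores` is a hypothetical helper for this example, not the official evaluation script, which also handles tokenization mismatches.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) in percent.

    Both arguments are aligned lists of (head, deprel) pairs, one per word.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of words")
    if not gold:
        raise ValueError("no words to evaluate")
    # UAS: the head index matches, regardless of the label.
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS: both the head index and the dependency label match.
    both_ok = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return 100.0 * head_ok / total, 100.0 * both_ok / total


# Toy four-word sentence: one wrong label and one wrong head.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "punct")]
uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.1f}, LAS = {las:.1f}")  # UAS = 75.0, LAS = 50.0
```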

leoalenc added a commit that referenced this issue Sep 11, 2024