Add spanish folder #12

Open
wants to merge 29 commits into
base: master

Conversation

sguinetti

Hello again. This PR adds Spanish number formatting. I have added several parameters to test with dates and numbers. I hope it is useful; I have tried to adapt as much as I could. If you are able to test that it works, thank you in advance.

@Stypox
Owner

Stypox commented Jul 3, 2025

Thank you! I pushed a commit that copy-pastes the files from Italian into Spanish, since Spanish is quite similar to Italian. Some things still need to be adapted, and some hardcoded values need to be changed in the Kotlin files. Here is a build of Dicio based on that commit: https://github.com/Stypox/testing-apks/releases/download/15/app-debug.apk. With it, the timer and calculator skills are now available in Spanish too. It already kind of works (see the screenshot below), although it didn't understand "one million" for some reason.

image

To make this complete you would need to:

  • write some tests under src/test/java/org/dicio/numbers/lang/es/, e.g. by copy-pasting and translating those from English or Italian (adapting them to the differences in Spanish, obviously)
  • work through the Kotlin code to fix the failing tests, adding (or removing) functions as needed

I would help you out but I don't know Spanish, though let me know if you have any question about the code 😬

@sguinetti
Author

Thank you. I made a few tweaks in the tokenizer. And I added the test folder to perform tests with Spanish. It will take a few days to complete the translation or adaptation. Hopefully I will finish it when I have some time.

@sguinetti
Author

sguinetti commented Jul 15, 2025

I asked Gemini 2.5 Pro to take on the challenge of adapting the code to follow the logic of the Spanish language. I found it interesting that it made the modifications in a matter of minutes, and that it even covered absurd cases I had not thought of (like "tres cientos" when the right word is "trescientos"). I hope this commit (in the "lang" folder) is useful for you to review and build on.

@Stypox
Owner

Stypox commented Jul 15, 2025

Thanks! This is one of the first times I see LLMs actually being useful at writing code xD. I still had to fix a few compiler errors though.

I built another APK, https://github.com/Stypox/testing-apks/releases/download/20/app-debug.apk, that will allow you to test:

  • number parsing/formatting through the calculator skill
  • duration parsing/formatting through the timer skill
  • I added a new debug skill called "Calendar" to debug the datetime parsing/formatting (code here), it currently only accepts inputs of the form "evento 17 de junio" and only prints out debug information
image

Stypox added a commit to Stypox/dicio-android that referenced this pull request Jul 15, 2025
@sguinetti
Author

sguinetti commented Jul 16, 2025

> However it didn't understand "one million" for some reason. https://github.com/Stypox/testing-apks/releases/download/15/app-debug.apk

image

I think the app does not recognize words like "million" or "billion" in the singular, but does recognize words like "millions" and "billions" in the plural.

Examples of Spanish phrases and the numbers they are parsed to:

  • "dos mil": 2000
  • "dos miles": 2
  • "dos miles más tres": 2000 + 3
  • "un millón": 1
  • "un millones": 1000000
  • "dos millones": 2000000
  • "diez millones": 10000000
  • "mil millones": 1000 + 1000000
  • "mil millón": 1000
  • "miles millones más dos": 1000000 + 2
  • "miles millones más mil": 1001000 + 1000

Perhaps the solution is to edit the multiplier block in tokenizer.json.
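
The singular/plural hypothesis above can be illustrated with a toy parser. This is only a sketch in Python, not the dicio-numbers Kotlin parser and not the real tokenizer.json format; the tables `UNITS`, `PLURAL_ONLY`, `WITH_SINGULAR` and the function `parse` are hypothetical names made up for the illustration. It assumes that unknown words are silently skipped, which is one way a missing singular form could turn "un millón" into 1.

```python
# Toy illustration only: NOT the dicio-numbers parser or its tokenizer.json format.
UNITS = {"un": 1, "uno": 1, "dos": 2, "tres": 3, "diez": 10}

# Hypothetical multiplier tables: one with only the plural form, one with both.
PLURAL_ONLY = {"mil": 1_000, "millones": 1_000_000}
WITH_SINGULAR = {**PLURAL_ONLY, "millón": 1_000_000}


def parse(phrase, multipliers):
    """Parse a small Spanish number phrase using the given multiplier table."""
    total = 0  # value of the completed "millón"/"millones" groups
    group = 0  # value accumulated since the last "millón"/"millones"
    for word in phrase.lower().split():
        if word in UNITS:
            group += UNITS[word]
        elif multipliers.get(word) == 1_000:
            # "mil" multiplies the current group ("dos mil" = 2 * 1000);
            # a bare "mil" counts as 1000
            group = (group or 1) * 1_000
        elif word in multipliers:
            # "millón"/"millones" closes the group ("mil millones" = 1000 * 10^6)
            total += (group or 1) * multipliers[word]
            group = 0
        # any other word is skipped (assumption made for this sketch)
    return total + group


print(parse("un millón", PLURAL_ONLY))       # 1
print(parse("un millón", WITH_SINGULAR))     # 1000000
print(parse("mil millones", WITH_SINGULAR))  # 1000000000
```

With the plural-only table the sketch reproduces the "un millón": 1 result from the list, and adding the singular form gives 1000000. It does not reproduce the other observed outputs (such as "mil millones" being read as a sum), so those may have a different cause in the real code.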

Owner

@Stypox Stypox left a comment

Thanks for the continued effort, here is another APK: https://github.com/Stypox/testing-apks/releases/download/21/app-debug.apk

The tests don't compile at the moment, and there are plenty of errors (e.g. some of the methods being tested only exist in the English parser and not in the Spanish one). Could you fix them (or prompt the AI to fix them)? Also, are you able to run the tests yourself, so you can iterate more easily until they pass?

@sguinetti
Author

I have asked the AI to correct several details. I have now fixed the tokenizer so that it identifies composite numbers, using the Italian one as a reference, which brought better results. I am doing the same with date_time.json.

About testing: I haven't managed to compile the APK myself yet. Also, I'm using a lighter text editor. I'm sorry.

@sguinetti sguinetti requested a review from Stypox July 17, 2025 22:50
2 participants