Weighted names and surnames #168

atnbueno · 2020-07-22T22:22:37Z

I think this is pretty universal: some names and surnames are more common than others. It would be a bit more realistic if we could assign weights to names and surnames and imitate their real distribution.

keitharm · 2022-05-16T22:10:26Z

Sounds like a good feature, but it may be a bit complicated to implement: we would need to find a source that covers the surnames and their associated weights for all the nationalities.

atnbueno · 2022-05-17T11:48:24Z

What I had in mind was a backwards compatible solution:

Solution #1: allow a number after each name/surname and, if present, use it as the weight (no number meaning weight=1)

Solution #2: allow repeated names in the lists (equivalent to integer weights) and just pick one random line

If there's no weight information, the data stays as is. If someone finds a weighted source for a particular version, they can do a PR.
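To make the two options concrete, here is a minimal Python sketch. It is not the project's code: the function names (`parse_weighted_lines`, `pick_weighted`, `pick_repeated`) are hypothetical, and it assumes one name per line with an optional trailing integer weight separated by whitespace.

```python
import random

def parse_weighted_lines(lines):
    """Solution #1: parse 'Jose Antonio 18' or 'Simon' into (name, weight) pairs.

    A line without a trailing integer gets weight 1, so unweighted lists
    keep working unchanged. (Hypothetical helper, not part of the project.)
    """
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        head, _, tail = line.rpartition(" ")
        if head and tail.isdecimal():
            entries.append((head.strip(), int(tail)))
        else:
            entries.append((line, 1))
    return entries

def pick_weighted(entries, rng=random):
    """Pick one name with probability proportional to its weight."""
    names = [name for name, _ in entries]
    weights = [weight for _, weight in entries]
    return rng.choices(names, weights=weights, k=1)[0]

def pick_repeated(lines, rng=random):
    """Solution #2: repeated lines act as integer weights, so a uniform pick suffices."""
    return rng.choice(lines)
```

With solution #1 the data files stay small but the picker has to change. With solution #2 the files grow, but any code that already picks one random line needs no modification.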

atnbueno · 2022-12-01T16:03:44Z

I've found official stats about names and surnames in Spain, and I've assembled weighted lists for male and female names, as well as surnames:

Male names                    Female names                  Surnames
Antonio           38          Maria Carmen      38          Garcia     57
Manuel            34          Maria             34          Rodriguez  36
Jose              33          Carmen            21          Gonzalez   36
Francisco         28          Ana Maria         16          Fernandez  36
David             22          Maria Pilar       15          Lopez      34
Juan              20          Laura             15          Martinez   33
Javier            19          Josefa            15          Sanchez    32
Jose Antonio      18          Isabel            15          Perez      31
Daniel            18          Maria Dolores     15          Gomez      19
Francisco Javier  17          Maria Teresa      14          Martin     19
...                           ...                           ...
Emilio Jose        1          Elizabeth          1          Asensio     1
Jose Andres        1          Meritxell          1          Reina       1
Simon              1          Desiree            1          Polo        1
Luis Antonio       1          Gregoria           1          Ojeda       1
                1000 TOTAL    Antonia Maria      1          Ramon       1
                              ...                           ...
                              Maria Manuela      1          Carrera     1
                              Mia                1          Toledo      1
                              Maria Candelaria   1          Ayala       1
                              Maria Gracia       1          Alcaraz     1
                                              1000 TOTAL    Hernando    1
                                                            ...
                                                            Mejias      1
                                                            Carvajal    1
                                                            Rosales     1
                                                            Toro        1
                                                                     1000 TOTAL

After looking at the API code, it looks like simply using a list with 38 "Antonio" lines, 34 "Manuel", etc. would work without any code change.
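As a rough illustration of that idea, the sketch below expands (name, weight) pairs into repeated lines. The helper name `expand_weighted` is hypothetical, and only the first three male names from the table above are used as sample data. It assumes the API picks one line uniformly at random, which is what I understood from reading the code.

```python
from collections import Counter

def expand_weighted(entries):
    """Turn (name, weight) pairs into a flat list with each name repeated `weight` times."""
    lines = []
    for name, weight in entries:
        lines.extend([name] * weight)
    return lines

# Sample data: the first three male names and their weights from the table.
top_male = [("Antonio", 38), ("Manuel", 34), ("Jose", 33)]
lines = expand_weighted(top_male)

# Each name now appears as many times as its weight, so a uniform
# random pick over `lines` reproduces the weighted distribution.
counts = Counter(lines)
```

The full lists would have 1000 lines each, since the weights in every column add up to 1000.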

Should I do a PR with such lists?
