Weighted names and surnames #168
Comments
Sounds like a good feature, but it may be a bit complicated to implement, since we would need a source that covers the surnames and their associated weights for all the nationalities.
What I had in mind was a backwards-compatible solution:
Solution #1: allow a number after each name/surname and, if present, use it as the weight (no number means weight = 1).
Solution #2: allow repeated names in the lists (equivalent to integer weights) and just pick one random line.
If there's no weight information, the data stays as is. If someone finds a weighted source for a particular version, they can do a PR.
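To make Solution #1 concrete, here is a minimal Python sketch of how a list with optional trailing weights could be read and sampled. It is only an illustration, not the project's code: the function names `parse_weighted_lines` and `pick_name` and the "name, space, integer" line format are assumptions made for this example.

```python
import random


def parse_weighted_lines(lines):
    """Split 'Name 38' style lines into names and integer weights.

    Hypothetical format: an optional integer after the last space is the
    weight. Lines without a number keep the default weight of 1, so
    existing unweighted lists are read unchanged.
    """
    names, weights = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        head, _, tail = line.rpartition(" ")
        if head and tail.isdecimal():
            names.append(head.strip())
            weights.append(int(tail))
        else:
            names.append(line)
            weights.append(1)
    return names, weights


def pick_name(names, weights, rng=random):
    """Pick one name with probability proportional to its weight."""
    return rng.choices(names, weights=weights, k=1)[0]


names, weights = parse_weighted_lines(["Antonio 38", "Manuel 34", "Lucas"])
print(pick_name(names, weights))
```

Because a missing number falls back to weight 1, a list with no weights at all behaves like a plain uniform pick, which is what keeps the format backwards compatible.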
I've found official stats about names and surnames in Spain, and I've assembled weighted lists for male and female names, as well as surnames:
After looking at the API code, it looks like simply using a list with 38 "Antonio" lines, 34 "Manuel" lines, etc. would work without any code change. Should I do a PR with such lists?
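The repeated-lines idea can be sketched in a few lines of Python. This assumes the API picks one line uniformly at random from each list; the helper `expand_weights` is hypothetical, and only the two counts mentioned above (38 and 34) are used.

```python
import random


def expand_weights(weighted):
    """Turn (name, integer weight) pairs into a list with repeated lines."""
    lines = []
    for name, weight in weighted:
        lines.extend([name] * weight)
    return lines


# Counts taken from the comment above; a real list would have more names.
lines = expand_weights([("Antonio", 38), ("Manuel", 34)])

# A uniform pick over the repeated lines is a weighted pick over the names.
print(random.choice(lines))
```

With these two entries the list has 72 lines, so a uniform pick returns "Antonio" with probability 38/72. The trade-off is file size: the list grows with the sum of the weights rather than the number of distinct names.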
I think this is pretty universal: some names and surnames are more common than others. It would be a bit more realistic if we could assign weights to names and surnames and imitate their real distribution.