
Adding Random probability distribution function #1862

Open
maxime4000 opened this issue Feb 20, 2023 · 7 comments
Labels
c: feature Request for new feature has workaround Workaround provided or linked m: number Something is referring to the number module p: 1-normal Nothing urgent s: waiting for user interest Waiting for more users interested in this feature

Comments

@maxime4000

maxime4000 commented Feb 20, 2023

Clear and concise description of the problem

I'm seeding a database with faker. Some fields hold arrays of a given type. I want to generate many arrays of different sizes: some empty, some with 1 element, and some with multiple elements.

Most arrays will have one element, but I also want to test edge cases, so a way to generate randomly distributed data would be nice.

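import { faker } from '@faker-js/faker';

// getFakerFunction(field) is a project-specific helper that returns one fake value for `field`.
function fakeArray(field) {
  const isEmpty = faker.datatype.boolean(); // ~50% overall: empty array
  const isOneElement = faker.datatype.boolean(); // ~25% overall: exactly one element
  const length = faker.datatype.number(100); // remaining ~25%: 0 to 100 elements
  return isEmpty
    ? []
    : isOneElement
      ? [getFakerFunction(field)]
      : Array.from({ length }, () => getFakerFunction(field));
}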

Let's say I'm faking an array of values and want some lengths to be more common than others: arrays of length 1 to 3 are common, but an array of length 100 is very rare. I would like a random probability distribution function for this.

Suggested solution

In my case, I'm looking for a random exponential distribution.

  • Length 1 => 40%
  • Length 2 => 30%
  • Length 3 => 20%
  • and so on...
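A minimal sketch of sampling a length from explicit weights like the ones listed above, assuming the list ends at length 4 with the remaining 10%. The `pickWeighted` helper and its `random` parameter are hypothetical; the `{ weight, value }` entries use the same shape as faker's `helpers.weightedArrayElement`, which covers this case in faker itself.

```typescript
// Hypothetical helper, not a faker API: returns a value with probability proportional to its weight.
type Weighted<T> = { weight: number; value: T };

function pickWeighted<T>(entries: Weighted<T>[], random: () => number = Math.random): T {
  const total = entries.reduce((sum, entry) => sum + entry.weight, 0);
  if (entries.length === 0 || total <= 0) {
    throw new Error('Need at least one entry with a positive weight');
  }
  // Scale a [0, 1) draw onto the cumulative weights and return the bucket it falls into.
  let threshold = random() * total;
  for (const entry of entries) {
    threshold -= entry.weight;
    if (threshold < 0) return entry.value;
  }
  // Guard against floating-point rounding leaving a tiny remainder.
  return entries[entries.length - 1].value;
}

// Assumed weights: the three listed above plus length 4 taking the remaining 10%.
const lengthWeights: Weighted<number>[] = [
  { weight: 40, value: 1 },
  { weight: 30, value: 2 },
  { weight: 20, value: 3 },
  { weight: 10, value: 4 },
];

const length = pickWeighted(lengthWeights); // 1 about 40% of the time, 2 about 30%, ...
```

Note that these particular weights fall off linearly (40, 30, 20, 10) rather than exponentially; any other weights can be plugged in the same way.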

The function would accept an argument like this:

type ExponentialDistributionOptions = {
	min?: number;
	max?: number;
	precision?: number;
	curveSettings: {
		deviation?: number;
		mean?: number;
		// ...
	};
};

It would generate a number following the chosen distribution.
I would expect to call faker.random.exponentialDistribution({min: 0, max: 100, curveSettings: {...}}), and the generated number would more likely be close to 0 than to 100. Out of 1000 generated values, only a few would be close to 100.
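For a genuinely exponential shape, one common technique is inverse transform sampling from an exponential distribution truncated to [min, max). The sketch below assumes a single rate parameter `lambda` in place of the proposed `curveSettings` (the exponential distribution has only one parameter, and its untruncated mean is 1 / lambda). `truncatedExponential` is an illustrative name, not a faker API.

```typescript
// Sketch: sample from an exponential distribution with rate `lambda`, truncated to [min, max).
// Function and parameter names are hypothetical, not part of faker.
function truncatedExponential(
  min: number,
  max: number,
  lambda: number,
  random: () => number = Math.random, // assumed uniform on [0, 1)
): number {
  if (!(max > min)) throw new Error('max must be greater than min');
  if (!(lambda > 0)) throw new Error('lambda must be greater than 0');
  // Probability mass of Exp(lambda) on [0, max - min); the truncated CDF is F(x) / mass.
  const mass = 1 - Math.exp(-lambda * (max - min));
  const u = random();
  // Invert the truncated CDF: x = -ln(1 - u * mass) / lambda, shifted by min.
  return min - Math.log(1 - u * mass) / lambda;
}

// Values near 0 dominate; values close to 100 are rare for lambda = 0.05.
const sample = truncatedExponential(0, 100, 0.05);
```

Raising `lambda` pushes more of the mass toward `min`; lowering it flattens the curve toward a uniform distribution over the range.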

I wouldn't limit the feature to the exponential distribution; I would also add the Gaussian, Rayleigh, and gamma distributions, etc.
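Gaussian and Rayleigh samples can also be derived from uniform random numbers. A hedged sketch: the Box-Muller transform for the normal distribution and the inverse CDF for the Rayleigh distribution. The function names are hypothetical. The gamma distribution usually needs a rejection method (for example Marsaglia and Tsang's), which is not shown here.

```typescript
// Sketches only: these functions are hypothetical, not faker APIs.

// Normal (Gaussian) sample via the Box-Muller transform.
function gaussian(mean = 0, stdDev = 1, random: () => number = Math.random): number {
  // 1 - random() lies in (0, 1], so the logarithm below is always finite.
  const u1 = 1 - random();
  const u2 = random();
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  return mean + stdDev * z;
}

// Rayleigh sample via the inverse CDF: F(x) = 1 - exp(-x^2 / (2 * sigma^2)).
function rayleigh(sigma = 1, random: () => number = Math.random): number {
  const u = random(); // assumed uniform on [0, 1)
  return sigma * Math.sqrt(-2 * Math.log(1 - u));
}
```

Both accept an injectable `random` so they could be driven by a seeded generator instead of `Math.random`.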

Alternative

No response

Additional context

I'm not sure whether this is out of scope for faker, but faker already generates data from random values. Why couldn't faker generate numbers based on the probability of each number being generated?

By the way, I'm no mathematician, so my explanation might be imprecise, but I still think faker could add some random probability distribution functions.

@maxime4000 maxime4000 added the s: pending triage Pending Triage label Feb 20, 2023
@ST-DDT
Member

ST-DDT commented Feb 20, 2023

Do you mean something like this?

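import { faker } from '@faker-js/faker';

function exponentialDistributionNumber(start = 1, stepScale = 2, stepProbability = 0.5, limit = Number.MAX_SAFE_INTEGER) {
    // Multiply the upper bound by `stepScale` for as long as a coin flip with `stepProbability` succeeds.
    let max = start;
    while (faker.datatype.boolean(stepProbability) && max < limit) {
        max *= stepScale;
    }
    // Then pick uniformly between 0 and the (capped) upper bound.
    return faker.number.int({ min: 0, max: Math.min(max, limit) });
}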
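As a worked check of the counts that follow, this sketch computes the exact probability of each result by walking the same loop: at each upper bound the loop stops with probability 1 - stepProbability (or always, once the bound has reached `limit`), and `faker.number.int({ min: 0, max })` then picks one of the max + 1 integers uniformly. It assumes integer `start` and `limit` and `stepScale > 1`; `exactDistribution` is a hypothetical helper.

```typescript
// Exact result probabilities for exponentialDistributionNumber(start, stepScale, stepProbability, limit),
// assuming integer start/limit and stepScale > 1. Hypothetical helper for checking the histogram.
function exactDistribution(start: number, stepScale: number, stepProbability: number, limit: number): number[] {
  if (!(stepScale > 1)) throw new Error('stepScale must be greater than 1');
  const probs: number[] = new Array(limit + 1).fill(0);
  let max = start;
  let reach = 1; // probability that the loop condition is evaluated with the current `max`
  for (;;) {
    // The coin flip is evaluated first; the loop exits on a failed flip, or regardless once max >= limit.
    const stop = max < limit ? reach * (1 - stepProbability) : reach;
    const upper = Math.min(max, limit);
    // faker.number.int({ min: 0, max: upper }) is uniform over the upper + 1 integers 0..upper.
    for (let v = 0; v <= upper; v++) probs[v] += stop / (upper + 1);
    if (max >= limit) break;
    reach *= stepProbability;
    max *= stepScale;
  }
  return probs;
}

const probabilities = exactDistribution(1, 2, 0.5, 100);
```

For these arguments the sketch gives about 0.3678 for both 0 and 1 and about 0.1178 for 2, i.e. roughly 367,800 and 117,800 expected hits per million runs.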
Result occurrences for 1 million runs of exponentialDistributionNumber(1, 2, 0.5, 100):

0: 367108
1: 368619
2: 117775
3: 34374
4: 34582
5: 9489
6: 9445
7: 9374
8: 9518
9: 2549
10: 2571
11: 2515
12: 2571
13: 2442
14: 2549
15: 2482
16: 2511
17: 661
18: 677
19: 656
20: 660
21: 649
22: 672
23: 684
24: 651
25: 662
26: 612
27: 659
28: 654
29: 641
30: 653
31: 653
32: 692
33: 212
34: 204
35: 195
36: 178
37: 200
38: 178
39: 192
40: 212
41: 201
42: 212
43: 219
44: 189
45: 194
46: 203
47: 203
48: 209
49: 161
50: 210
51: 200
52: 199
53: 189
54: 196
55: 175
56: 196
57: 166
58: 199
59: 188
60: 191
61: 187
62: 192
63: 193
64: 185
65: 78
66: 73
67: 73
68: 90
69: 84
70: 63
71: 83
72: 87
73: 59
74: 73
75: 65
76: 70
77: 83
78: 91
79: 88
80: 72
81: 75
82: 80
83: 61
84: 73
85: 83
86: 78
87: 78
88: 68
89: 60
90: 77
91: 94
92: 82
93: 67
94: 68
95: 79
96: 79
97: 77
98: 76
99: 90
100: 85

[Chart]

Would something like this suffice, or do you need more or something else?

@ST-DDT ST-DDT added c: feature Request for new feature p: 1-normal Nothing urgent s: awaiting more info Additional information are requested m: number Something is referring to the number module labels Feb 20, 2023
@maxime4000
Author

Interesting! Yes, something like this would suffice. It would be nice to have it implemented as an API function.

@matthewmayer
Contributor

Something similar could also be achieved with a variant of faker.helpers.arrayElement where each element of the array has a fixed, independent probability of being included in the returned values.
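A minimal sketch of that variant, assuming every element is kept independently with the same probability `p`; `includeEachWithProbability` is a hypothetical name, not an existing faker helper. With independent inclusion the resulting length follows a binomial distribution with mean `items.length * p`, so it clusters around that mean.

```typescript
// Hypothetical helper (not a faker API): keep each element independently with probability p.
function includeEachWithProbability<T>(
  items: readonly T[],
  p: number,
  random: () => number = Math.random, // assumed uniform on [0, 1)
): T[] {
  if (p < 0 || p > 1) throw new Error('p must be between 0 and 1');
  // filter visits elements in order, drawing one random number per element.
  return items.filter(() => random() < p);
}

// With 100 candidates and p = 0.02, the length averages 2; lengths near 100 are practically impossible.
const picked = includeEachWithProbability(Array.from({ length: 100 }, (_, i) => i), 0.02);
```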

@ST-DDT
Member

ST-DDT commented Feb 21, 2023

Something similar could also be achieved by having a variant of faker.helpers.arrayElement where each element of the array has a fixed independent probability of being included in the return values

Like helpers.weightedArrayElement? Not quite, but close when used to pick the length.

@ST-DDT ST-DDT added s: needs decision Needs team/maintainer decision and removed s: pending triage Pending Triage s: awaiting more info Additional information are requested labels Feb 21, 2023
@xDivisionByZerox xDivisionByZerox added this to the vFuture milestone Mar 9, 2023
@xDivisionByZerox xDivisionByZerox added s: awaiting more info Additional information are requested s: waiting for user interest Waiting for more users interested in this feature has workaround Workaround provided or linked and removed s: needs decision Needs team/maintainer decision s: awaiting more info Additional information are requested labels Mar 9, 2023
@xDivisionByZerox
Member

xDivisionByZerox commented Mar 9, 2023

Team decision

There is an existing workaround for this problem.
We are currently unsure about implementation details regarding the distribution.

If you want or need this feature, please upvote this issue.

@ST-DDT ST-DDT added s: waiting for user interest Waiting for more users interested in this feature and removed s: waiting for user interest Waiting for more users interested in this feature labels May 5, 2023
@github-actions
Contributor

github-actions bot commented May 5, 2023

Thank you for your feature proposal.

We marked it as "waiting for user interest" for now to gather some feedback from our community:

  • If you would like to see this feature be implemented, please react to the description with an up-vote (:+1:).
  • If you have a suggestion or want to point out some special cases that need to be considered, please leave a comment, so we are aware of them.

We would also like to hear about other community members' use cases for the feature to give us a better understanding of their potential implicit or explicit requirements.

We will start the implementation based on:

  • the number of votes (:+1:) and comments
  • the relevance for the ecosystem
  • availability of alternatives and workarounds
  • and the complexity of the requested feature

We do this because:

  • There are plenty of languages/countries out there and we would like to ensure that every method can cover all or almost all of them.
  • Every feature we add to faker has "costs" associated with it:
    • initial costs: design, implementation, reviews, documentation
    • running costs: awareness of the feature itself, more complex module structure, increased bundle size, more work during refactors


@ST-DDT
Member

ST-DDT commented Oct 21, 2023

Here is an improved version of the function:

import { faker, FakerError } from '@faker-js/faker';

/**
 * Generates a random number between min and max using an exponential distribution.
 * The lower bound is inclusive, but the upper bound is exclusive.
 *
 * @param options The options for generating the number.
 * @param options.min The minimum value to generate. Defaults to `0`.
 * @param options.max The maximum value to generate. Defaults to `1`.
 * @param options.bias The bias of the distribution. Must be greater than 0. Defaults to 1.
 * The lower the bias, the more likely the number will be closer to the min (0-1@0.1 -> avg: ~0.025).
 * A bias of 1 will generate the default exponential distribution (0-1@1 -> avg: ~0.202).
 * The higher the bias, the more likely the number will be closer to the max (0-1@10 -> avg: ~0.691).
 *
 * @throws If bias is less than or equal to 0.
 * @throws If max is less than min.
 */
function exponentialDistributionNumber(
  options:
    | number
    | {
        /**
         * The minimum value to generate.
         *
         * @default 0
         */
        min?: number;
        /**
         * The maximum value to generate.
         *
         * @default 1
         */
        max?: number;
        /**
         * The bias of the distribution. Must be greater than 0.
         *
         * The lower the bias, the more likely the number will be closer to the min (0-1@0.1 -> avg ~0.025).
         * A bias of 1 will generate the default exponential distribution (0-1@1 -> avg ~0.202).
         * The higher the bias, the more likely the number will be closer to the max (0-1@10 -> avg ~0.691).
         *
         * @default 1
         */
        bias?: number;
      }
) {
  if (typeof options === 'number') {
    options = { max: options };
  }

  const { min = 0, max = 1, bias = 1 } = options;

  if (bias <= 0) {
    throw new FakerError('Bias must be greater than 0');
  }

  if (max === min) {
    return min;
  }

  if (max < min) {
    throw new FakerError(`Max ${max} should be greater than min ${min}.`);
  }

  const random = faker.number.float(); // [0,1)
  const exponent = random ** (1 / bias); // [0,1)
  const range = max - min + 1; // +1 to account for x ** 0 = 1
  return min + range ** exponent - 1; // -1 to account for x ** 0 = 1
}
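A worked check of the averages in the doc comment, as a sketch under the assumption that `faker.number.float()` is uniform on [0, 1): the result is min + range ** (U ** (1 / bias)) - 1 with range = max - min + 1, so its mean is an integral over U. For bias 1 it has a closed form, (range - 1) / ln(range) + min - 1; over 0-100 that is about 20.67, roughly 0.207 of the range and close to the quoted ~0.202, while for min 0, max 1 the same formula gives 1 / ln 2 - 1 ≈ 0.443. Both helper names below are hypothetical.

```typescript
// Mean of min + range ** (U ** (1 / bias)) - 1 for U uniform on [0, 1), via the midpoint rule.
// Hypothetical helper for checking the documented averages.
function expectedValue(min: number, max: number, bias: number, steps = 100000): number {
  if (max === min) return min; // mirrors the early return in exponentialDistributionNumber
  const range = max - min + 1;
  let sum = 0;
  for (let i = 0; i < steps; i++) {
    const u = (i + 0.5) / steps; // midpoint of the i-th sub-interval
    sum += range ** (u ** (1 / bias));
  }
  return min + sum / steps - 1;
}

// Closed form for bias = 1: the integral of range ** u over [0, 1] is (range - 1) / ln(range).
function expectedValueBias1(min: number, max: number): number {
  const range = max - min + 1;
  return min + (range - 1) / Math.log(range) - 1;
}
```

The numeric version makes it easy to check other biases too; since U ** (1 / bias) grows with the bias for U in (0, 1), the mean rises monotonically with it.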

Generating 100 million values between 0 and 100:

[Chart]

Projects
None yet
Development

No branches or pull requests

4 participants