Skip to content

fix weight in interpolation#58

Merged
CamDavidsonPilon merged 5 commits intoCamDavidsonPilon:masterfrom
kunigami:fix_weight
Nov 21, 2021
Merged

fix weight in interpolation#58
CamDavidsonPilon merged 5 commits intoCamDavidsonPilon:masterfrom
kunigami:fix_weight

Conversation

@kunigami
Copy link
Contributor

This is essentially: #52 with extra simplifications.

Intuitively if p is close to t (the mid point of the previous cluster), the weight for c_i.mean should be higher. Whereas right now z1 = p - t is smaller the closer p is to t.

Note that when we do this change, the p == t + k is handled too:

z1 = p - t = t + k - t = k
z2 = t + k - p = t + k - t - k = 0

Also noting that k = z1 + z2:

c_i.mean * z1 + c_i_plus_one.mean * z2) / (z1 + z2) = c_i.mean * k / k = c+i.mean

I added a test case to show case this:

 [1, 10, 10, 10, 10, 100, 100, 100, 100, 1000]

it currently returns 132.0 for p=40.

t.batch_update([1, 1, 2, 2, 3, 4, 4, 4, 5, 5])
assert t.percentile(30) == 2
assert t.percentile(40)*3 == 7
assert t.percentile(40)*3 == 8
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh, sorry I didn't realize I would have to update this test again >.<

# Test https://github.com/CamDavidsonPilon/tdigest/issues/16
t = TDigest()
t.batch_update([62.0, 202.0, 1415.0, 1433.0])
print(t.percentile(26))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥

@CamDavidsonPilon
Copy link
Owner

Cool! Thanks @kunigami!

@CamDavidsonPilon CamDavidsonPilon merged commit bf90fb1 into CamDavidsonPilon:master Nov 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants