Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recording exceeded limits #212

Closed
wants to merge 28 commits into from
Closed

Conversation

liamrabetian
Copy link

@liamrabetian liamrabetian commented Nov 5, 2020

Hi @jsocol . I've been busy doing all these for this issue #210 .

So I've made a proxy and two handlers, one for cache and another for database backend as you said "I can definitely also see wanting to know which users had historically exceeded the limits, not just current."
You can make as many as handlers you want, if there's any!

This functionality is completely separated from the core to not produce any security flaws.

Tests for the new functionality has been added and all the tests pass.

An abstract doc has been added for the settings. I wasn't sure if there's more explaining needed on the doc as it's simple and straight forward, but if needed, let me know.

I also did some re-structuring to the tests module because the test file was getting a Mega file.

@jsocol
Copy link
Owner

jsocol commented Nov 12, 2020

Hi, sorry for not responding sooner—I wanted to say this looks like a really good direction! I gave it a read and there are probably some nuances in implementation, like avoiding accidentally running migrations for users who don't want to opt into this behavior.

It'll probably take me a couple of more weeks to do a good, thorough dive, just to set expectations. But I did see this and will follow-up and pursue it.

One global request, if you're willing: this project has preferred single quotes, and small as it might seem I'd rather that be consistent (except in cases where the quoted string contains a single quote, e.g. "'" > '\''). I noticed some places that convention was changed, and the new code generally uses double quotes.

Copy link
Contributor

@Andrew-Chen-Wang Andrew-Chen-Wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just added some of my thoughts and performance recommendations (though I think this is pretty out of date now):

migrations.CreateModel(
name='ExceededLimitRecord',
fields=[
('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note in Django 3.2 that the default is BigAutoField.

Suggested change
('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
('id', models.BigAutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),

options={
'verbose_name': 'exceeded limit record',
'verbose_name_plural': 'exceeded limit records',
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use a linter if possible since there's this weird mix of single quote vs double quotes. But I'm just nit picking.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention is to not touch the migration files! As we have consistent code signature everywhere else.

migrations.AddField(
model_name="exceededlimitrecord",
name="last_blocked_at",
field=models.DateTimeField(auto_now=True, verbose_name="Last Blocked At"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a part of the first migration?

from django.utils.translation import gettext_lazy as _


class BaseModel(models.Model):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rename this to:

Suggested change
class BaseModel(models.Model):
class BaseRatelimitLog(models.Model):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would also mean you need to regenerate the migration file

Comment on lines 36 to 43
limit_record = ExceededLimitRecord.objects.get(
user_agent=user_agent,
ip_address=client_ip_address,
username=username,
path_info=path_info,
)
limit_record.access_attempt_failures += 1
limit_record.save()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be made more efficient with F expressions and only getting the ID so we can update the record via the ID but not have to get all the values from the record (without the only).

Suggested change
limit_record = ExceededLimitRecord.objects.get(
user_agent=user_agent,
ip_address=client_ip_address,
username=username,
path_info=path_info,
)
limit_record.access_attempt_failures += 1
limit_record.save()
limit_record = ExceededLimitRecord.objects.get(
user_agent=user_agent,
ip_address=client_ip_address,
username=username,
path_info=path_info,
).only("id")
limit_record.access_attempt_failures = F("access_attempt_failures") + 1
limit_record.save()

The "only" should be checked in a test. Test case can look like:

with self.assertNumQueries(2):
    limit_record = ExceededLimitRecord.objects.get(
        user_agent=user_agent,
        ip_address=client_ip_address,
        username=username,
        path_info=path_info,
    ).only("id")
    limit_record.access_attempt_failures = F("access_attempt_failures") + 1
    limit_record.save()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion!

from ratelimit.conf import settings


def get_client_ip_address(request) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already a tool that does that so that we don't need to install an external package (i.e. ipware):

def _get_ip(request):
ip_meta = getattr(settings, 'RATELIMIT_IP_META_KEY', None)
if not ip_meta:
ip = request.META['REMOTE_ADDR']
if not ip:
raise ImproperlyConfigured(
'IP address in REMOTE_ADDR is empty. This can happen when '
'using a reverse proxy and connecting to the app server with '
'Unix sockets. See the documentation for '
'RATELIMIT_IP_META_KEY: https://bit.ly/3iIpy2x')
elif callable(ip_meta):
ip = ip_meta(request)
elif isinstance(ip_meta, str) and '.' in ip_meta:
ip_meta_fn = import_string(ip_meta)
ip = ip_meta_fn(request)
elif ip_meta in request.META:
ip = request.META[ip_meta]
else:
raise ImproperlyConfigured(
'Could not get IP address from "%s"' % ip_meta)
if ':' in ip:
# IPv6
mask = getattr(settings, 'RATELIMIT_IPV6_MASK', 64)
else:
# IPv4
mask = getattr(settings, 'RATELIMIT_IPV4_MASK', 32)
network = ipaddress.ip_network('{}/{}'.format(ip, mask), strict=False)
return str(network.network_address)

Comment on lines 30 to 33
def get_client_username(request) -> str:
if request.user.is_authenticated:
return request.user.username
return "Anonymous"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not everyone has a "username" field. You might want to look into django.contrib.auth.models where they define the Username field inside a list (specifically AbstractUser or AbstractBaseUser). I also think this should instead be based on user ID/pk instead of username to be more efficient (string lookup vs integer lookup).

If you're going to stick with username, then try this (but I'd still recommend using the integer pk AutoField):

Suggested change
def get_client_username(request) -> str:
if request.user.is_authenticated:
return request.user.username
return "Anonymous"
from django.contrib.auth.models import AnonymousUser
def get_client_username(request) -> str:
if request.user.is_authenticated:
return getattr(request.user, request.user.USERNAME_FIELD)
return AnonymousUser

Comment on lines 101 to 107
def make_cache_key(key_components):

cache_key_components = "".join(value for value in key_components.values() if value)
cache_key_digest = md5(cache_key_components.encode()).hexdigest()
cache_key = "drl-{}".format(cache_key_digest)

return cache_key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this in django_ratelimit.core?

@jsocol
Copy link
Owner

jsocol commented May 6, 2021

This probably needs a bit of rework with the conflicts that have cropped up from some other refactors.

One thing I want to avoid is baking too much optional behavior into the "core." I wonder if we could tuck some of this into something like ratelimit.ext.recorder with its own AppConfig, so that installation requires explicit opt-in like INSTALLED_APPS += ['ratelimit.ext.recorder'] and we don't end up installing unused models or causing inadvertent migrations.

@Andrew-Chen-Wang
Copy link
Contributor

Yes definitely doable. I think all you gotta do is just move the dirs and it should be fine (ref the token_blacklist app from here: https://github.com/jazzband/django-rest-framework-simplejwt/tree/master/rest_framework_simplejwt where the rest of the dir is not affiliated with any app)

@liamrabetian
Copy link
Author

This probably needs a bit of rework with the conflicts that have cropped up from some other refactors.

One thing I want to avoid is baking too much optional behavior into the "core." I wonder if we could tuck some of this into something like ratelimit.ext.recorder with its own AppConfig, so that installation requires explicit opt-in like INSTALLED_APPS += ['ratelimit.ext.recorder'] and we don't end up installing unused models or causing inadvertent migrations.

I have updated the PR and fixed the conflicts. Sadly it's hard for me to find some spare time to work on this more but, I will make some more adjustments. I appreciate any help 👍

Eike Rabetian added 2 commits June 21, 2021 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants