* Start Date: 2022-08-31
* RFC Type: decision
* RFC PR: https://github.com/getsentry/rfcs/pull/7

# Summary

This RFC proposes a file structure for 'compartmentalized services' or 'domain
boundaries'. The concept of service boundaries was introduced in RFC 0002, and
this document aims to provide more detailed guidelines for how 'services' in the
monolith would be structured as Python modules.

# Motivation

The Sentry monolith continues to grow in scope as we build new product features.
As the application has grown, the sheer number of models, endpoints, and tasks
has made it more challenging to understand how the application is
interconnected. The current code layout complicates optimizing CI and impairs
our ability to clearly delineate product boundaries within the monolith.

This RFC does not attempt to define what the boundaries and services within the
monolith should be, nor does it attempt to describe the organization of
TypeScript code.

# Background

Currently the Sentry monolith is organized as a single Django application that
follows a typical project layout organized by 'kind of class'. For example, all
models are co-located in a small number of directories, as are all endpoints and
serializers. While this repository layout has served us well, it is increasingly
hard to navigate as the application grows. At the time of writing, we have:

* ~275 endpoint modules
* 115 model modules
* 105 serializer modules

How each of these classes relates to features in Sentry is not always obvious.
A similar problem exists for tests: there is no easy way to locate all the tests
that need to run when a model class changes.

# Proposed Python Structure

As Sentry is a Django application, we can use
[Django applications](https://docs.djangoproject.com/en/4.1/ref/applications/)
as containers for application services in the future. While not every service
will need all the features of a Django application, many will.

## Django app structure

We'll use 'discover' as an example of a service module:

```
src/sentry/discover
__init__.py
app.py
urls.py
models/__init__.py
models/discoversavedquery.py
endpoints/discoverquery.py
serializers/discoverquery.py
tasks/deduplicate_things.py

tests/__init__.py
tests/models/test_discoversavedquery.py
tests/endpoints/test_discoverquery.py
tests/serializers/test_discoverquery.py
```

In addition to the Django-related modules, Celery tasks, consumers, and any other
modules can be contained within a service. A service that doesn't provide
endpoints or use models can still benefit from the proposed structure.

## Test location

Tests for a service would continue to live inside the top-level `tests`
directory. The `tests` tree would continue to mirror the service and module
structure of the application code. Sharing naming conventions should make it
simpler to automate running subsets of tests.

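To illustrate how a shared naming convention could be automated, here is a minimal sketch that maps changed source files to the test modules that mirror them. It assumes service code lives under `src/sentry/<service>/...` and tests under `tests/sentry/<service>/...` with a `test_` prefix on the file name; the `tests/sentry` prefix and the helper names are hypothetical, not an existing Sentry tool.

```python
from pathlib import PurePosixPath
from typing import List, Optional

# Hypothetical roots: service code under src/sentry, mirrored tests under tests/sentry.
SRC_ROOT = PurePosixPath("src/sentry")
TEST_ROOT = PurePosixPath("tests/sentry")


def mirrored_test_path(source: str) -> Optional[PurePosixPath]:
    """Map a source module to the test module that mirrors it, or None if there is none."""
    path = PurePosixPath(source)
    if path.suffix != ".py" or path.name == "__init__.py":
        return None
    try:
        relative = path.relative_to(SRC_ROOT)
    except ValueError:
        return None  # Not application code under src/sentry.
    # Same service + module path, with a test_ prefix on the file name.
    return TEST_ROOT / relative.parent / f"test_{relative.name}"


def select_tests(changed: List[str]) -> List[str]:
    """Return the sorted, de-duplicated test modules to run for a set of changed files."""
    targets = {mirrored_test_path(p) for p in changed}
    return sorted(str(t) for t in targets if t is not None)
```

A CI job could feed `select_tests` the output of a diff against the base branch and pass the result to pytest.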
## Formal entry points

Service modules would use `__init__.py` to define the interface they present to
the rest of the Sentry monolith. Having the public interface of a service
formally defined limits the amount of entanglement the rest of the application
can create.

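One way a service could make its interface explicit is to list it in `__all__` inside `__init__.py`. The RFC does not prescribe this convention; the sketch below assumes it and shows how tooling could read a service's declared interface with the standard-library `ast` module, then check a `from sentry.<service> import ...` statement against it. The `discover` contents and the helper names are hypothetical.

```python
import ast
from typing import List, Set

# Hypothetical src/sentry/discover/__init__.py: re-export the public interface
# and declare it explicitly in __all__.
DISCOVER_INIT = '''
from .models.discoversavedquery import DiscoverSavedQuery
from .query import run_saved_query

__all__ = ["DiscoverSavedQuery", "run_saved_query"]
'''


def public_interface(init_source: str) -> Set[str]:
    """Return the names listed in a module-level __all__ (empty set if absent)."""
    tree = ast.parse(init_source)
    for node in tree.body:
        if isinstance(node, ast.Assign) and any(
            isinstance(t, ast.Name) and t.id == "__all__" for t in node.targets
        ):
            # literal_eval only accepts literals, so __all__ must be a plain list/tuple of strings.
            return set(ast.literal_eval(node.value))
    return set()


def undeclared_imports(importer_source: str, service: str, interface: Set[str]) -> List[str]:
    """List names imported via `from sentry.<service> import ...` that are not in its interface."""
    bad = []
    for node in ast.walk(ast.parse(importer_source)):
        if isinstance(node, ast.ImportFrom) and node.module == f"sentry.{service}":
            bad.extend(a.name for a in node.names if a.name not in interface)
    return bad
```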
## Importing service internals is not allowed

An important change from the present application structure is that modules
outside of a service's scope would be *disallowed* from importing modules inside
a service. Modules outside of a service boundary may only import the top-level
service.

Disallowing cross-service internal imports could be enforced with the
[flake8-import-graph](https://pypi.org/project/flake8-import-graph/) extension.

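The rule itself is simple enough to sketch. The checker below is not `flake8-import-graph` and does not use its configuration; it is a hypothetical standard-library illustration, assuming services are the direct children of the `sentry` package listed in a `SERVICES` set. It flags any absolute import of `sentry.<service>.<something>` from a module that lives outside `<service>`.

```python
import ast
from typing import List, Optional

# Hypothetical set of service packages directly under the `sentry` package.
SERVICES = {"discover", "projects"}


def service_of(module: str) -> Optional[str]:
    """Return the service a dotted module path belongs to, or None if it is outside all services."""
    parts = module.split(".")
    if len(parts) >= 2 and parts[0] == "sentry" and parts[1] in SERVICES:
        return parts[1]
    return None


def imported_modules(source: str) -> List[str]:
    """Collect absolute module paths named by import statements, in source order."""
    modules = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Only the module path is checked; `from sentry.discover import models`
            # (a submodule imported by name) would need extra resolution to catch.
            modules.append(node.module)
    return modules


def boundary_violations(importer: str, source: str) -> List[str]:
    """Return imports in `source` that reach into another service's internals."""
    own = service_of(importer)
    violations = []
    for module in imported_modules(source):
        target = service_of(module)
        # Anything deeper than sentry.<service> is an internal of that service.
        reaches_inside = len(module.split(".")) > 2
        if target is not None and target != own and reaches_inside:
            violations.append(module)
    return violations
```

Relative imports are skipped because they stay within the importing package; a maintained linter would be the better long-term home for a rule like this.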
# Options Considered

Another approach to this would be to put 'services' inside the directories of
each 'kind'. Again using discover as an example:

```
src/sentry
endpoints/discover/discover_query.py
models/discover/discoversavedquery.py
tests/models/discover/test_discoversavedquery.py
```

This approach dilutes the consistency benefits and requires significantly more
complex import-graph rules. It also does not improve local development or offer
benefits to CI subsetting.

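To make the import-graph point concrete: with service-first paths the owning service is always the second path segment, while with kind-first paths a rule has to know every 'kind' directory before it can locate the service. The helpers below are a hypothetical illustration of that difference; the `KINDS` set is an assumption, not an inventory of Sentry's directories.

```python
from typing import Optional

# Hypothetical 'kind' directories used by the alternative, kind-first layout.
KINDS = {"endpoints", "models", "serializers", "tasks"}


def service_service_first(module: str) -> Optional[str]:
    """sentry.<service>.<kind>.<module>: the service is always the second segment."""
    parts = module.split(".")
    return parts[1] if len(parts) > 2 and parts[0] == "sentry" else None


def service_kind_first(module: str) -> Optional[str]:
    """sentry.<kind>.<service>.<module>: finding the service depends on knowing the kinds."""
    parts = module.split(".")
    if len(parts) > 3 and parts[0] == "sentry" and parts[1] in KINDS:
        return parts[2]
    return None  # Unknown kind: the rule cannot tell which service owns this module.
```

Every new kind of module (consumers, for example) would have to be added to the kind-first rule before it could be enforced.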
# Drawbacks

This approach will require moving **most** of the application source code
around. We currently store classpaths in several locations in the database. We
may need to use data migrations to update these paths or maintain aliases for
compatibility.

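The alias option could look roughly like the sketch below: a table mapping old dotted paths to new ones, consulted before importing. The `CLASSPATH_ALIASES` table and `resolve_classpath` helper are hypothetical; to keep the example runnable, it aliases an invented path to a standard-library class rather than to a real Sentry model.

```python
import importlib

# Hypothetical table of stored (old) classpaths -> their post-move locations.
CLASSPATH_ALIASES = {
    # e.g. "sentry.models.discoversavedquery.DiscoverSavedQuery":
    #      "sentry.discover.models.discoversavedquery.DiscoverSavedQuery",
    "legacy.collections.OrderedDict": "collections.OrderedDict",
}


def resolve_classpath(path: str) -> type:
    """Import and return the class at a dotted path, following aliases for moved classes."""
    # Follow chained aliases (a class may move more than once); guard against cycles.
    seen = set()
    while path in CLASSPATH_ALIASES:
        if path in seen:
            raise ValueError(f"alias cycle at {path}")
        seen.add(path)
        path = CLASSPATH_ALIASES[path]
    module_name, _, class_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

An alias table keeps old rows readable without touching the data; a data migration would instead rewrite the stored paths once, after which the table could be dropped.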
Co-locating tests with application code is a potentially contentious change.

I'm unsure how this line relates to the current draft. I don't see that co-located tests are recommended above. Unless you're calling this directory structure co-location, but this isn't the first thing people think of when they hear "co-located tests".
Test co-location usually means (something equivalent to):
I personally like that model, but as you note it's not commonplace in Python.

I would propose to separate out the co-location discussion about tests into a separate proposal and to remove it from the scope of this one. I think the test location discussion has a high bikeshedding potential, but underpinning that largely philosophical debate are good reasons for one or the other, which deserve to be discussed independently. Either way, I agree that we need a proposal for where tests go, but we can have that independently of the service boundaries.

This is a vestigial change from a previous rev. I'll take it out.
Agree that this could be a big bikeshed.

While co-located tests work with pytest's test discovery, they are an
unconventional approach in Python projects.

# Unresolved questions

* What 'services' would we need to add to the application?
* What do we do with models and logic that are shared by many endpoints/domains?
  Examples of this include rate limiting and models like Organization and
  Project.

There must be no "common" or "utils" or "misc" directory, or you'll eventually get a fresh new cobweb of code with unclear ownership. Under the current proposal, as written, each (kind of, or owner of) shared code must be given a name and a directory at to your examples
It's tempting to combine project and organization into a single "core" or "sentry" service, but then there's no clarity as to what should be added to (or rejected from) that service. Keep in mind that the "project" service will not contain just the one file but all the endpoints, serializers, tasks, and tests that are specific to "project". And any proposed additions to this service will have a very clear admission criterion: is it actually about "project", or does it have some other scope? But that's just my two cents :)

💯 agree that there should be no
Yes, we would need really clear boundaries on what goes into the 'core' service, should it be created. Defining what the services are and where the lines in the sand should be should be a separate discussion, in my opinion.

The point that I tried to make was that "core" by definition has no clear boundary. Even if you make a document about its boundary, that doc is liable to be "improved" or ignored. Whereas a "project" service does have a clear boundary by definition. Note: I'm just trying to be extra clear. I don't actually have a strong opinion on this.

Thank you for the clarification. I agree that 'core' will be a magnet for increasing scope. 'projects' could be a good container for organization, project, and the related settings those resources have.

I think we should try to prevent a service's code from directly accessing the models and tables of another service by doing joins across service boundaries. Joining DB tables across modules would defy the idea of creating boundaries.
When referencing the model class, this is no different from preventing a service from accessing the internals of another one, though with Django you can run raw SQL queries. Do we allow that to begin with, and if so, should we prevent it?

I agree. I considered recommending not exporting Django models from a service. The problem I came up against, though, was that Python's type hints want an importable class to use in annotations, which would generally need to be the model 😢