ActiveRecord Refactoring
We intend to clean up some of the internals of Active Record. If you’re interested in helping out, add your ideas here.
- To allow hooks for pluggable cache strategies
- SQL SELECT table_name.id CONDITIONS
- Less DB traffic ( single column only )
- [pk1, pk2, pk3] => Cache Lookups via Memcached
Maybe? What else would we gain?
For one, the need to call reload all over the place would mostly disappear and the stale data bugs those calls to reload represent would go away.
It is also highly valuable as an object-level cache. You can pull records from the db/memcache in the most efficient way possible and then use them naturally, say through associations, without having to worry about i/o performance characteristics.
There are two approaches I’m researching.
- A simple hash, like datamapper, using a simple block approach for clearing it after a request, like the query cache.
- A weak hash implementation where no explicit clearing is needed; the garbage collector handles it for us (or not: the DataMapper guys have been talking about weakrefs on IRC)
See one basic implementation in the active_record_context plugin. It’s been serving me very well in Lighthouse.
I second the hash per request implementation. The major problem I ran into implementing a weak hash was record instances living across requests, which led to unpredictably stale data once you had more than one process. We ended up flushing the map between each request.
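To make the hash-per-request idea concrete, here is a minimal sketch. `IdentityMap`, `find_record`, `Record`, and `LOADS` are hypothetical names invented for this example; this is not the API of the active_record_context plugin or of DataMapper, and the block passed to `fetch` stands in for a real database read.

```ruby
# Hypothetical sketch of a per-request identity map; not an existing Rails API.
class IdentityMap
  class << self
    # Enable the map for the duration of the block, then throw it away,
    # much like the query cache is scoped to a block.
    def use
      previous = Thread.current[:identity_map]
      Thread.current[:identity_map] = {}
      yield
    ensure
      Thread.current[:identity_map] = previous
    end

    # Return the cached instance for (klass, id), or load it with the block.
    # Outside of a `use` block nothing is cached.
    def fetch(klass, id)
      map = Thread.current[:identity_map]
      return yield if map.nil?
      map[[klass, id]] ||= yield
    end
  end
end

# Stand-in for a database-backed model (an assumption for illustration).
Record = Struct.new(:id, :name)
LOADS = Hash.new(0)

def find_record(id)
  IdentityMap.fetch(Record, id) do
    LOADS[id] += 1                 # count simulated database hits
    Record.new(id, "record #{id}")
  end
end

IdentityMap.use do
  a = find_record(1)
  b = find_record(1)
  puts a.equal?(b)                 # both lookups return the same instance
end
puts LOADS[1]                      # number of simulated loads for id 1
```

Because the map is discarded when the block exits, nothing survives into the next request, which sidesteps the cross-request staleness described above at the cost of re-loading records on every request.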
with_scope is currently implemented in about 15 separate places, as is the meaning of nested scopes. It’d be nice to tidy up the implementation to something a little more coherent:
- Defined semantics for scope merging
- A single location for all of the implementations.
This will probably lead to some hints as to a much nicer query api too.
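As a sketch of what "defined semantics in a single location" might look like, the following puts the merge rules in one method. The `Scope` class and the rules themselves (AND the conditions, innermost limit and order win) are assumptions chosen for illustration, not what with_scope actually does today.

```ruby
# Hypothetical sketch: one place that defines how two scopes combine.
# The rules here (AND the conditions, innermost limit/order wins) are
# assumptions for illustration, not with_scope's actual semantics.
class Scope
  attr_reader :conditions, :limit, :order

  def initialize(conditions: [], limit: nil, order: nil)
    @conditions = conditions.dup.freeze
    @limit = limit
    @order = order
  end

  # Merge an inner (more deeply nested) scope into this one.
  def merge(inner)
    Scope.new(
      conditions: conditions + inner.conditions,
      limit: inner.limit || limit,
      order: inner.order || order
    )
  end

  # Fold any number of nested scopes, outermost first.
  def self.nest(*scopes)
    scopes.reduce(Scope.new) { |outer, inner| outer.merge(inner) }
  end
end

outer  = Scope.new(conditions: ["active = 1"], limit: 10)
inner  = Scope.new(conditions: ["age > 21"], limit: 5)
merged = Scope.nest(outer, inner)
p merged.conditions
p merged.limit
```

Whatever rules end up being chosen, having them in one `merge` method means nested scopes behave the same everywhere they are used.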
All that SQL string manipulation is kinda lame. It’d be nice to hide it all behind a library of some sort, or at least some simple ‘Query’ classes.
- Move all SQL logic out of associations (rely on with_scope) so associations can be extracted to ActiveModel (pipe dream)
Rafis23 adds: I don’t see why this is a pipe dream. Associations are just data relations expressible in Arel. They do not rely on a database at all, so their Ruby implementation should not be tied to a database.
Refactor and extract extractable things, such as validations/errors, observers and callbacks, into includeable modules (shameless self promote: http://github.com/leethal/rails/tree/activemodel_cleanup/activemodel)
Nick’s crazy ideas
I think what I’d like to see happen to AR is more radical, in that I’d like to experiment with new ideas…
I’d like to “generalize” named_scope to underlie all AR querying. Not implementation-wise, but spirit-wise: I think all queries should be composable and “generative”:
User.all.conditions(…).limit(…)
This would lead to query objects as mentioned above, and could provide the basis for the DM strategy of “strategic eager loading”. I also think we should embrace eager loading and throw away ALL left outer join code. The few edge cases we’re supporting right now are not worth the enormous complexity wrt/ handling limits and offsets and so on…
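A rough sketch of what "generative" could mean in practice: each call returns a new query object instead of mutating the receiver, and SQL is only built at the end. The `Query` class here is hypothetical and deliberately naive (raw SQL fragments, no quoting or bind variables); it is not an existing ActiveRecord class.

```ruby
# Hypothetical sketch of a "generative" query object: every call returns a
# new Query, so partial queries can be reused and composed freely.
class Query
  def initialize(table, wheres = [], limit = nil)
    @table = table
    @wheres = wheres
    @limit = limit
  end

  # Add a condition; multiple conditions are ANDed together.
  def conditions(sql_fragment)
    Query.new(@table, @wheres + [sql_fragment], @limit)
  end

  def limit(count)
    Query.new(@table, @wheres, Integer(count))
  end

  # SQL is only generated at the very end, in one place.
  def to_sql
    sql = "SELECT * FROM #{@table}"
    sql += " WHERE " + @wheres.map { |w| "(#{w})" }.join(" AND ") unless @wheres.empty?
    sql += " LIMIT #{@limit}" if @limit
    sql
  end
end

users  = Query.new("users")
adults = users.conditions("age >= 18")
puts adults.limit(10).to_sql
puts users.to_sql   # the base query is untouched by the calls above
```

Since intermediate queries are plain values, something like `adults` can be passed around and refined further, which is the property named_scope already gives you for a subset of queries.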
Yehuda has argued against the current connection pooling implementation in AR saying that it is not ideal in a threaded environment. This requires more investigation to confirm, but it’d be nice if we could throw out the connection adapters to be replaced by Data Objects, which allegedly solves this problem among others.
I am totally sold on DM’s support for multiple datasources. ActiveModel is a good thought but the problem is not just supporting validations and callbacks outside of a db context, but having a uniform query interface. I think there is tremendous value in this, and though I recognize the concerns that DHH and Michael have wrt/ different performance profiles of different datasources, I feel that these can be addressed independently. I had a chance to audit both Twitter’s and Friends for Sale’s codebases recently. The extensive use of denormalized memcaching (used as a write-through cache by FFS) was impressive. Memcache is a database! We should be able to just put ARs into memcache.
I’d like to “generalize” the idea of associations. Named_scope is in some use-cases a band-aid around the fact that not all associations are simple has_many’s and so forth. It’d be nice to say something like has_relation … and express any arbitrary generative query. This is NOT the same thing as find_by_sql, as the latter cannot be paginated, composed with named_scopes, etc.
In fact, I think all subclasses of AR::Base should be able to set arbitrary queries instead of just the table. In some sense, STI is already the start of this. The “table” for Image < Asset is the query “select * from assets where type = ‘Image’”. Now consider Acts as Paranoid, a straightforward variation on this theme. Now generalize this to support arbitrary queries, even perhaps joins. Is this an intractable problem in the presence of writes? No, in effect you just need to support the :create and :find with_scopes as you already have.
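Here is a toy sketch of that idea, with an in-memory array standing in for the assets table. The `base_scope` macro and the `ROWS` constant are hypothetical names made up for this example; the point is only that one declaration can drive both the :find side (filtering reads) and the :create side (filling in attributes on writes).

```ruby
# Hypothetical sketch: a class is backed by a "base scope" rather than just a
# table name. The scope filters reads (:find) and is applied to writes
# (:create). ROWS is an in-memory stand-in for the assets table.
class Asset
  ROWS = []

  class << self
    # Attributes every row of this class must have, e.g. type: "Image".
    def base_scope(attrs = nil)
      @base_scope = attrs if attrs
      @base_scope || {}
    end

    # :find side: only rows matching the base scope are visible.
    def all
      ROWS.select { |row| base_scope.all? { |key, value| row[key] == value } }
    end

    # :create side: scoped attributes are written along with the row.
    def create(attrs)
      row = attrs.merge(base_scope)
      ROWS << row
      row
    end
  end
end

class Image < Asset
  base_scope type: "Image"      # like STI: where type = 'Image'
end

class LiveAsset < Asset
  base_scope deleted_at: nil    # in the spirit of Acts as Paranoid
end

Image.create(name: "logo.png")
Asset.create(name: "notes.txt", type: "Document")
p Image.all.size
p Asset.all.size
```

Equality-only scopes are the easy case; scopes involving joins or non-equality conditions would need more thought on the write side than this sketch gives them.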
I’d also like to generalize the way that validations work. I have become a big fan of using observers rather than callbacks in all but the most trivial cases. The nice thing about observers is that you can bolt different observers on in different contexts; so it’s easy to do things like creating a user through the web site sends a confirmation email, but through the admin web site sends no confirmation email. I’d like validations too to be publish/subscribe in the way observers are. And, to put this very vaguely (the idea isn’t fully formed), I’d like to use this as the basis of making ActsAsStateMachine a fundamental part of ActiveRecord. Sometimes the worst (most complex and hard-to-follow) AR code I see is code that would benefit from AaSM.
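A loose sketch of the publish/subscribe idea, covering both the observer case and the validation case. Everything here is hypothetical: the `User` class is a plain Ruby object, `subscribe` and the `:validate` / `:after_create` event names are invented for the example, and listeners are attached per instance just to keep it short.

```ruby
# Hypothetical sketch of publish/subscribe hooks that are bolted on per
# context rather than declared once in the model.
class User
  attr_reader :email, :errors

  def initialize(email)
    @email = email
    @errors = []
    @subscribers = Hash.new { |hash, event| hash[event] = [] }
  end

  # Register a listener for an event such as :validate or :after_create.
  def subscribe(event, &listener)
    @subscribers[event] << listener
    self
  end

  # Publish :validate first; only publish :after_create if no errors were added.
  def save
    publish(:validate)
    return false unless errors.empty?
    publish(:after_create)
    true
  end

  private

  def publish(event)
    @subscribers[event].each { |listener| listener.call(self) }
  end
end

sent = []
require_email = ->(user) { user.errors << "email is blank" if user.email.to_s.empty? }
confirmation  = ->(user) { sent << user.email }

# Public web site: validate and send a confirmation email.
site_user = User.new("a@example.com")
site_user.subscribe(:validate, &require_email).subscribe(:after_create, &confirmation)
site_user.save

# Admin site: same validation, no confirmation email.
admin_user = User.new("b@example.com").subscribe(:validate, &require_email)
admin_user.save
p sent
```

The model itself knows nothing about email delivery or which validations apply; each context decides what to attach.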
In terms of simple code maintainability, I feel that the test suite is creaking under its own weight. I was pleasantly surprised when playing around with various AR refactorings how extensive and useful the test coverage is. But time and again, the organization of the tests, the way many tests are written with dozens of assertions and assertions in loops, the use of fixtures, etc. all keep biting me.
In general, I think the AR code would benefit from being broken out into more files; 2000-line files are hard to manage, and things just aren’t grouped conceptually. It’d be nice to deprecate all of the various aliased methods in base.rb.
As for the associations code, I’ve had this idea that an AR class is in effect a repository; it represents a relation and can read from and write to that relation. In this sense, associations are the same thing as an AR class; no extra code or class hierarchy necessary. If AR classes support arbitrary queries (instead of simple set_table_name), then an association literally needs no more infrastructure, not an extra line of code beyond the implementation of the macros. To handle some of the proxy magic (such as supporting class methods on the AR class), we could potentially use subclassing rather than method_undefinition and delegation, though I’m uncertain about this last idea.
It’d be nice to use dirty tracking to have cascaded saves work for new and non-new objects.
Finally, unrelated to AR, the biggest hole in the Rails framework right now is that there is not One Preferred Way of doing asynchronous background jobs. These are so common for image resizing, SMS delivery, Facebook API crap. Every codebase I look at has a completely different way of doing it, and it’s always a huge amount of infrastructure (queues and daemons) that takes weeks to make robust. It’s a big part of the puzzle of why FFS scaled well and why Twitter scales as far as it does.
Rafis23 adds: I agree with Nick. To clarify, an association isn’t even an AR class, it’s an ActiveModel class. It, like every other query, is simply a relation. These relations should be coded at the object level, in Ruby (i.e. Arel). They are relational algebra concepts independent of a database.