Question about locking #152

cantino · 2015-04-30T22:42:13Z

Hey Matthew, thanks for this great project!

We would like to use closure_tree in our product to provide multi-level task lists, but we have concerns around the locking mechanisms. Specifically, it looks like a global advisory lock would be obtained on the Task model whenever _ct_before_destroy is called. I believe this means that only one process could delete a Task at a time? Our Task table is very high volume and we can't realistically lock the full table around deletes. Are the_ct_before_destroy and rebuild! processes themselves non-multi-process-safe, or are these wrapped to protect find_or_create_by_path (which we don't need)? Since maintenance is performed by ActiveRecord callbacks, I'd expect them to be consistent because they live within the ActiveRecord transaction.

If you can help me better understand why the locking is needed, I'd be happy to write up a more detailed description for the README.

The text was updated successfully, but these errors were encountered:

mceachen · 2015-05-01T02:42:34Z

My use case is read, not write heavy, so I hadn't really thought of optimizing the lock. It's true that the entire hierarchy table doesn't need to be locked for every edit. If two edits happen concurrently in separate sub trees, those shouldn't need to block. If two concurrent edits happen in the same tree, though, they should block during rebuild. You could change the lock name to include the root node id—this would block edits in a given subtree, but not block edits otherwise. You might want to try this out, but think about it, concurrency is really hard to test and reason through.

cantino · 2015-05-01T18:45:35Z

Thanks @mceachen. What I'm not clear about is, if edits are happening inside of the ActiveRecord commit transaction, why do we need a table lock at all?

mceachen · 2015-05-02T02:39:59Z

Transactions don't prevent concurrent writes to a table. The rebuild method does bulk update calls.

cantino · 2015-05-07T04:08:05Z

Thanks for clarifying, @mceachen. I think we're going to use a materialized path approach instead since we have a write-heavy application.

mceachen · 2015-05-07T04:31:44Z

Just for reference, I wrote this gem precisely because materialized paths
don't scale well. That approach requires a table lock per hierarchy
change, and for random edits, you're updating half the rows in your table
per insert. If they aren't locking, concurrent edits are not being done
safely.

It depends on your write patterns, of course. Benchmark different
approaches by replaying production traffic logs, if you can.

cantino · 2015-05-07T17:53:59Z

Thanks @mceachen, I really appreciate your time and insights, and am concerned that I'm missing something important for the scalability of our efforts.

We're updating the materialized paths of any sub-tasks based on the change of a task's parent_id. The update is a single SQL statement:

self.class.where("materialized_path like ?", old_path_string + "%").update_all ["materialized_path = REPLACE(materialized_path, ?, ?)", old_path_string, path_string]

If this is happening inside of a Rails ActiveRecord transaction that also contains the change to parent_id itself, why would we have concurrency issues? I could imagine deadlock issues, but I don't see how the data could end up in an inconsistent state. However, I readily admit that I may not understand transactions as well as I should.

/cc @randycoulman, @shirish-pampoorickal

mceachen · 2015-05-07T18:47:59Z

Bah, I mispoke last night and confused materialized paths with nested sets. Nested sets have horrible performance on write time (as the ordinal for affected nodes have to be updated). The issue with materialized paths is that you're leveraging string indexes to do the work, so writes can be fine, but read performance suffers as the table size grows. (The impact obviously depends on the rdbms, hardware, and data you're using—just try it out in the repl and create an example dataset with 1mm rows, and see how it works).

cantino · 2015-05-07T19:18:07Z

Gotcha. That's a valid concern, and we will need to keep an eye on our read performance. Thanks for all your help!

mceachen closed this as completed May 8, 2015

mceachen added the Question label Jun 13, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question about locking #152

Question about locking #152

cantino commented Apr 30, 2015

mceachen commented May 1, 2015 via email

cantino commented May 1, 2015

mceachen commented May 2, 2015 via email

cantino commented May 7, 2015

mceachen commented May 7, 2015

cantino commented May 7, 2015

mceachen commented May 7, 2015 via email

cantino commented May 7, 2015

Question about locking #152

Question about locking #152

Comments

cantino commented Apr 30, 2015

mceachen commented May 1, 2015 via email

cantino commented May 1, 2015

mceachen commented May 2, 2015 via email

cantino commented May 7, 2015

mceachen commented May 7, 2015

cantino commented May 7, 2015

mceachen commented May 7, 2015 via email

cantino commented May 7, 2015