Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Tree#count_recursive #464

Merged
merged 16 commits into from
Mar 9, 2015
6 changes: 6 additions & 0 deletions ext/rugged/rugged.h
Original file line number Diff line number Diff line change
Expand Up @@ -147,6 +147,12 @@ struct rugged_cb_payload
int exception;
};

struct rugged_treecount_cb_payload
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be in the external header because it's only used for the recursive count. Can you move it next to the function? That'll make the implementation easier to follow.

(the other payloads are here because they are used by different files)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely. I actually moved it in here as that's where it looked like all of the other payloads were defined. I'll move it back.

{
unsigned int count;
unsigned int limit;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're using -1 as a "no limit" signifier below, so either this has to go back to a signed type (because otherwise payload->limit >= 0 will always be true) or you can special case 0 instead.

Personally, I would probably make these both ssize_t values. If ssize_t isn't easily usable from within Rugged for portability reasons, you could make both simple int values, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally correct, was just chatting about that with @brianmario. I think just int is probably fine. What are the portability concerns with ssize_t?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just know in libgit2 we used to avoid it in public headers because they way you include the typedef for it can vary from platform to platform. (Interestingly, it looks like it had recently crept in by way of the new include/git2/sys/stream.h header.)

};

struct rugged_remote_cb_payload
{
VALUE progress;
Expand Down
58 changes: 58 additions & 0 deletions ext/rugged/rugged_tree.c
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,63 @@ static VALUE rb_git_tree_entrycount(VALUE self)
return INT2FIX(git_tree_entrycount(tree));
}

static int rugged__treecount_cb(const char *root, const git_tree_entry *entry, void *data)
{
struct rugged_treecount_cb_payload *payload = data;

if (payload->limit >= 0 && payload->count >= payload->limit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add braces here? I know we generally don't use braces for single-line bodies, but I feel this looks a bit weird because of the last else condition which spans multiple lines.

return -1;
else if(git_tree_entry_type(entry) == GIT_OBJ_TREE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Add a whitespace after if.

return 0;
else {
++(payload->count);
return 1;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately in the libgit2 git_tree_walk API, there is no maximum depth check and the implementation is recursive, so there are some theoretical DOS vectors here. I think probably the best bet is to fix it at the libgit2 level and either reimplement git_tree_walk using the git_iterator_for_tree API or simply add a depth check internal to the API. (It would be nice to reflect the depth as a parameter to the callback, but that would be a breaking API change, so probably not worth it.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good point. Rather than just counting blob, I could count all tree entries (including subtrees). This means that providing a limit will give protection from at least one more DOS vector. Would that cover the case you're thinking of?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regrettably, I think not. The issue is actually hard to compensate for except at the libgit2 level (without introducing a waste of time scanning path data) - namely that you can create a 100,000 level deep tree and not hit your walk limit, but burn up a huge amount of stack space internal to libgit2. Let me look over there a bit...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw MRI cannot stack overflow because it protects the end of the stack in the VM and will gracefully raise a StackOverflow exception into Ruby-land if a C extension overflows... So I'm not sure I'd consider this an attack vector.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thank you @vmg for bringing actual knowledge to the table. 😽


/*
* call-seq:
* tree.count_recursive(limit=nil) -> count
*
* `limit` - The maximum number of blobs to the count in the repository.
* Rugged will stop walking the tree after `limit` items to avoid long
* execution times.
*
* Return the number of blobs (up to the limit) contained in the tree and
* all subtrees.
*/
static VALUE rb_git_tree_entrycount_recursive(int argc, VALUE* argv, VALUE self)
{
git_tree *tree;
Data_Get_Struct(self, git_tree, tree);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to move this statement below other variable declarations, as this generates a warning. (See the travis build logs).

int error;
struct rugged_treecount_cb_payload payload;
VALUE rbLimit;

if (rb_scan_args(argc, argv, "01", &rbLimit) == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove the if condition and change this to just:

rb_scan_args(argc, argv, "01", &rbLimit);

rb_scan_args will set rbLimit to nil if it was not passed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only use caps for class and module names, so rbLimit should be rb_limit here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arthurschreiber Coulda sworn I tried that, but that's way better. I'll switch to that.

@vmg Still getting used to the C-ext conventions. Thanks.

rbLimit = Qnil;

if (NIL_P(rbLimit))
payload.limit = -1;
else if (TYPE(rbLimit) != T_FIXNUM)
rb_raise(rb_eTypeError, "limit must be a Fixnum");
else
payload.limit = FIX2INT(rbLimit);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could maybe be simplified to something like:

if (!NIL_P(rbLimit)) {
  Check_Type(rbLimit, T_FIXNUM);
  payload.limit = FIX2INT(rbLimit);
}

Check_Type will raise an ArgumentError for you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh cool. I'll switch it to this. Didn't know about Check_Type.


payload.count = 0;

error = git_tree_walk(tree, GIT_TREEWALK_PRE, &rugged__treecount_cb, (void *)&payload);

if(error && giterr_last()->klass == GITERR_CALLBACK) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat. 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Add a whitespace after if.

giterr_clear();
error = 0;
}

rugged_exception_check(error);

return INT2FIX(payload.count);
}

/*
* call-seq:
* tree[e] -> entry
Expand Down Expand Up @@ -813,6 +870,7 @@ void Init_rugged_tree(void)
*/
rb_cRuggedTree = rb_define_class_under(rb_mRugged, "Tree", rb_cRuggedObject);
rb_define_method(rb_cRuggedTree, "count", rb_git_tree_entrycount, 0);
rb_define_method(rb_cRuggedTree, "count_recursive", rb_git_tree_entrycount_recursive, -1);
rb_define_method(rb_cRuggedTree, "length", rb_git_tree_entrycount, 0);
rb_define_method(rb_cRuggedTree, "get_entry", rb_git_tree_get_entry, 1);
rb_define_method(rb_cRuggedTree, "get_entry_by_oid", rb_git_tree_get_entry_by_oid, 1);
Expand Down
3 changes: 3 additions & 0 deletions test/tree_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,9 @@ def test_read_tree_data
assert_equal @oid, @tree.oid
assert_equal :tree, @tree.type
assert_equal 3, @tree.count
assert_equal 6, @tree.count_recursive
assert_equal 5, @tree.count_recursive(5)
assert_equal 6, @tree.count_recursive(10)
assert_equal "1385f264afb75a56a5bec74243be9b367ba4ca08", @tree[0][:oid]
assert_equal "fa49b077972391ad58037050f2a75f74e3671e92", @tree[1][:oid]
end
Expand Down