Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Record Datastore metrics for flatfs and leveldb #1163

Merged
merged 5 commits into from
May 2, 2015
Merged

Record Datastore metrics for flatfs and leveldb #1163

merged 5 commits into from
May 2, 2015

Conversation

tv42
Copy link
Contributor

@tv42 tv42 commented Apr 29, 2015

Inspirational command lines:

curl -s http://localhost:5001/debug/vars | json-prettyprint
go tool pprof http://localhost:5001/debug/pprof/profile

@jbenet jbenet added the status/in-progress In progress label Apr 29, 2015
@jbenet
Copy link
Member

jbenet commented Apr 29, 2015

LGTM will merge when tests finish

@tv42
Copy link
Contributor Author

tv42 commented Apr 29, 2015

Okay I see the test failures, will do something about that..

@tv42
Copy link
Contributor Author

tv42 commented Apr 29, 2015

I used the sequence counter in the name because then real world use leaves the metrics names clean, and only tests and weird situations suffer. Normal runs have just one FSRepo, so normal runs get datastore.blocks.Put.latency.P95 and not datastore.blocks<unguessable junk>.Put.latency.P95 or something. I'm not happy with it either, but it seemed to have least downsides for actual use. If packageLock goes away reposCreated can still exist as an atomic counter.

It's in the FSRepo because that's where the leveldb and flatfs are opened, where mount.New is, and all knowledge of the shape of the datastore; that lets me put metrics at the right locations, yet only do the work once and not in all of leveldb, flatfs, etc.

@tv42
Copy link
Contributor Author

tv42 commented Apr 29, 2015

The travis failures above are:

Error: routing: not found
not ok 5 - cat that entry on a third node
No output has been received in the last ... minutes, this potentially indicates a stalled build or something wrong with the build itself.
--- FAIL: TestBootstrap (2.14s)
    dht_test.go:314: connecting 30 dhts in a ring
    dht_test.go:77: dial attempt failed
03:37:16.571 CRITI   boguskey: TestBogusPrivateKey.Sign -- this better be a test! key.go:54
FAIL

Those seem unrelated.

@cryptix
Copy link
Contributor

cryptix commented Apr 29, 2015

Changes look 💎 clean to me. Also tested it locally and found something juicy already... :)

@jbenet
Copy link
Member

jbenet commented Apr 29, 2015

Normal runs have just one FSRepo, so normal runs get datastore.blocks.Put.latency.P95 and not datastore.blocks<unguessable junk>.Put.latency.P95 or something.

~~This is not reusable outside of ipfs. it breaks abstraction. I can't use multiple metrics datastores in places. ~~ nevermind, it is. the prefix works.

Also, it is not correct that "Normal runs have just one FSRepo". I use tools that embed multiple ipfs nodes in one process.

This is not the right place. this should be done where the metrics are used: https://github.com/jbenet/go-datastore/tree/master/measure nevermind, datastore/metrics has the prefix. we just need to use the right prefix.

then again, it may be that prefix is complex to use (this here is a good example), and it is the right thing to fix it over in https://github.com/jbenet/go-datastore/tree/master/measure -- to make sure that every metrics datastore does not have the ability to clash with another. (i.e. either remove the prefix, or use an atomic counter there).

this reminds me of linux's insistence that we use XXX with tmpfiles.

I'm not happy with it either, but it seemed to have least downsides for actual use. If packageLock goes away reposCreated can still exist as an atomic counter.

Then an atomic counter may be the right thing. I really want to not rely on that package lock. ideally we can get rid of it now.

@jbenet
Copy link
Member

jbenet commented Apr 29, 2015

updated comment o/

@tv42
Copy link
Contributor Author

tv42 commented Apr 29, 2015

By "normal" I mean purely what my experience has been so far: ipfs daemon, ipfs add, etc.

I am not happy with the numerical uniqueness suffix thing. At all. But I couldn't come up with anything better, and it makes the tests pass.

As far as I can see, the only unique ID FSRepo currently has is the pathname of the config file. I really don't want to put that in the metrics identifier.

The only nicer solution I can see right now is somehow giving names of some sort to every FSRepo instance you open. Then your tools that embed multiple ipfs nodes in one process can name then in an understandable way (or fall back to numbers or hashes, with the hope that there is some meaning to the user there).

I'm really open to proposals on how you want this done. My constraints are:

  1. metrics are a global (per-process) namespace, by virtue of being visible under the same /debug/vars path
  2. the library this is using, and just overall simplicity, assumes that the metrics name is a flat string, with no real structure (it looks like foo.bar.baz, but there are no attributes etc)
  3. I would like to keep the simple case (ipfs daemon) very simple, to encourage their use

How do we proceed from here?

@jbenet
Copy link
Member

jbenet commented May 1, 2015

merged ipfs/go-datastore#20

@jbenet
Copy link
Member

jbenet commented May 1, 2015

(looks like this needs rebasing too?)

@tv42
Copy link
Contributor Author

tv42 commented May 1, 2015

And two commits on top, re-vendoring ipfs/go-datastore#20 and calling the Close. I have them waiting, will get to it later today.

@tv42
Copy link
Contributor Author

tv42 commented May 1, 2015

Alright this seems good to go, the Travis error is https://travis-ci.org/ipfs/go-ipfs/jobs/60868636#L2773-L2803 with t0101-iptb-name.sh saying "Error: routing: not found" and so on -- that seems unrelated. make test_sharness_short passes locally.

return mux, nil
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this something we want to enable all the time? or does it not doing anything unless you access it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're just handlers for /debug/vars and /debug/pprof, nothing happens unless somebody sends a HTTP request for those. I'm not sure I understand the questions.

@jbenet
Copy link
Member

jbenet commented May 2, 2015

This LGTM.

jbenet added a commit that referenced this pull request May 2, 2015
Record Datastore metrics for flatfs and leveldb
@jbenet jbenet merged commit fab5e91 into master May 2, 2015
@jbenet jbenet deleted the ds-metrics branch May 2, 2015 20:16
@jbenet jbenet removed the status/in-progress In progress label May 2, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants