-
Notifications
You must be signed in to change notification settings - Fork 98
Implement monitoring for registry values #184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Associated pull request to distributed-process-tests: haskell-distributed/distributed-process-tests#15 |
@facundominguez please recheck my code when you'll have time. |
then do liftMX $ unmonitorAsync mref | ||
mxSetLocal $! pid `Map.delete` hm | ||
else mxSetLocal $ Map.insert pid (mref,i') hm | ||
_ -> return () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternatives of this case can be rewritten to use Map.insertLookupWithKey
and Map.updateLookupWithKey
to do a single traversal. The effectful computation can be decided by examining the result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately both of those will not work here.
insertLookupWithKey
will not work as I have to havea
that is(MonitorRef, Int)
and I need to createMonitorRef
only if there is no value.updateLookupWithKey
returns either old value if value was deleted or changed value, so I have no chance
to understand what happens as value(mon,1)
can refer either to changed value or to removed.
This is looking good, just working through the tests. |
LGTM provided I didn't break it with my edits. Better to squash everything into one patch before merging. |
Yes, also I'm planning to merge this after #199. To make integration process more clean. |
Prepared for a merge. |
-- When a remote process is registered, the agent starts monitoring it until it | ||
-- is unregistered or the monitor notification arrives. | ||
-- | ||
-- The agen keeps the amount of labels associated with each registered remote |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
Can we get a link to the actual issue? I have no idea what DP-100 means. Nor the context for why this needs to go in d-p as opposed to where the source code originally came from (where?). Update the title of this PR accordingly. "Dp 100" means nothing. |
This is issue #212 |
nid <- getSelfNode | ||
-- For each process the map associates the 'MonitorRef' used to monitor it and | ||
-- the amount of labels associated with it. | ||
mxAgent registryMonitorAgentId (Map.empty :: Map ProcessId (MonitorRef, Int)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any downside to simply having as many MonitorRef
's as there are registered labels, rather than reference counting? I don't think the memory saving (if any) is worth it, seeing as multiple labels for the same ProcessId
is not the common case. Dropping the reference counting would make the code simpler, I reckon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that will work, the differences are memory footprint and complexity of the MxUnRegistered
case, that will be slightly higher (we will need traverse the structure and remove MonitorRef
). MxRegistered
will be simplified, another one will be more or less the same.
So I'm open to this change.
@facundominguez, any thoughts here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced the rewriting is worth the trouble. We already have a reasonable PR. Your call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
complexity of
MxUnRegistered
case, that will be slightly higher (we will need traverse the structure and removeMonitorRef
).
I don't see it. The following is untested but compiles:
case ev of
MxRegistered pid _
| processNodeId pid /= nid -> do
mref <- liftMX $ monitor pid
mxUpdateLocal (Map.insert pid mref)
MxUnRegistered pid _
| processNodeId pid /= nid -> do
mrefs <- mxGetLocal
forM_ (pid `Map.lookup` mrefs) $ \mref -> do
liftMX $ unmonitorAsync mref
mxUpdateLocal (Map.delete pid)
_ -> return ()
I expect the memory usage to be pretty much the exactly the same as with reference counting: monitors are cheap, and registering a process under multiple labels is uncommon anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code above do not solve the problem that original version intended to solve, i.e. if process is registered multiple times but unregistered only once then it will be removed from map, but it shouldn't. Also I'm not terribly happy about "leaking" mrefs. I have to check code in order to understand if its safe to do. Also here we have a monitor call for each registered process (I can agree that it's a rare case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops - late at night pasted wrong code snippet. I meant to paste this one:
case ev of
MxRegistered pid label
| processNodeId pid /= nid -> do
mref <- liftMX $ monitor pid
mxUpdateLocal (Map.insert label mref)
MxUnRegistered pid label
| processNodeId pid /= nid -> do
mrefs <- mxGetLocal
forM_ (label `Map.lookup` mrefs) $ \mref -> do
liftMX $ unmonitorAsync mref
mxUpdateLocal (Map.delete label)
_ -> return ()
i.e. as the previous comment was saying, one monitor ref per label, not per process. The only difference with the previous snippet is the key of the map. Everything else is the same. BUT, there is a complication, as you point out. It's just that it's not in the MxUnregistered
case but in the ProcessMonitorNotification
case, which when written out becomes something like:
mxSink $ \(ProcessMonitorNotification mref _ _) -> do
mxUpdateLocal $ Map.filter (== mref)
mxReady
I assume you meant to write ProcessMonitorNotification
instead of MxUnregistered
. The code is just as simple, but note the linear operation, which I also assume is what you were referring to. It's the same linear operation that's already here, over the exact same set of labels. Processes shouldn't be overusing the registry to register thousands of processes, so that's just about ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about Monoid m => Map ProcesaId (m MonitorRef)
in that case complexity is in MxUnregister
because delete is linear or logarithmic there. So I think I was correct but thought about different solution in design space.
About this code we need to check that reregistration MxUnregistred
then MxRegistered
exactly in this order, otherwise MxRegistered
case is not correct, but I hope that ordering and existence of messages are provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, if events get dropped then even the current code will have trouble. Events on the management bus shouldn't get dropped.
If we must assume that messages do get reordered (because the Management API documentation claims to provide no guarantees), or dropped or duplicated, then I'm not sure that using the Management API for building this functionality is a great option after all.
As @facundominguez says, not worth agonizing over the implementation of this function, since it's basically correct as-is. Though on the assumption that the Management API does provide some guarantees. @hyperthunk could you confirm what those are precisely?
Implement registry agent. Registry agent is an agent that monitors remote processes that were added to registry and removes them from registry in case if they will die. This semantics is required for sound implementation of remote nodes stored in registry.
I've addressed comment about BangPatterns, about guarantees that we have, it seems that a. messages couldn't be lost, b. MxUnregistered message will be missing on reregistering: So I want to introduce test for that issue and check if it's a case, or I just miss some bits in code. |
@qnikst I don't think this is a test issue. It's a design issue. We can't assume make assumptions based on the current implementation of an API - we can only make the assumptions that the documentation for that API allows us to make. This PR is currently blocked on feedback from @hyperthunk about the intended guarantees of the Management API. Actually this question should probably be its own Github issue, with this PR marked as blocked on that. |
@qnikst though, rereading your comment - it may be that you've uncovered a bug, independent of the design level discussion about the guarantees of the Management API. |
Guys... Can this be merged once I've resolved #241, which I plan to address today? Also, the tests will need migrating (and master rebased onto this branch, so we can see if travis is happy)... |
Yes, I can rebase everything on the top of the master and check that CI passes, then this PR may be merged. Any objections? |
Could this either, be folded into Other than that, no objection. |
Really we do not have a common policy about where agents should be put. I'm ok with moving that to C.D.P.Node. |
In this case - isn't this a "service process" like any other? If so it should just go where the other processes are. |
Technically it's not a ordinary process, but I agree with this reasoning. |
I'm okay with it living in Node.hs too, or being internal. The code in Node.hs could get a bit cluttered if we keep adding to it. Note that the implementation of the other service processes is currently scattered - the simple built in logger (that backs |
Ok I have something to add to this... I am working on improving and fixing the management event bus. Currently some events aren't firing, or could potentially fire in the wrong order (I.e., receive before send - although we don't guarantee causal ordering, for MxSend and MxReceived in the case of two local processes, we can guarantee it, and there I believe we should endeavour to do so). Anyway, in the process of pulling that path together, I realise that now the management event bus is designed the way it is, the complex trace controller process can be rewritten as a management agent, and we will remove a lot of garbage from the code base by doing that, and simplify the code a lot. So I wonder if we should have |
CI failed, but only for GHC 7.1 - do we really want to support that far back? Also, I suspect we need to rebase with master here. |
There's no such thing as 7.1. I think this a case of writing |
Closed in favor of #322 |
Implement remote Process monitoring as Agent.