
Implement monitoring for registry values #184


Closed · qnikst wants to merge 1 commit

Conversation

@qnikst (Contributor) commented Feb 25, 2015

Implement remote process monitoring as a management agent.

@qnikst (Contributor, Author) commented Feb 25, 2015

Associated pull request to distributed-process-tests: haskell-distributed/distributed-process-tests#15

@qnikst (Contributor, Author) commented Feb 25, 2015

@facundominguez, please recheck my code when you have time.

@qnikst mentioned this pull request on Feb 25, 2015
        then do liftMX $ unmonitorAsync mref
                mxSetLocal $! pid `Map.delete` hm
        else mxSetLocal $ Map.insert pid (mref,i') hm
    _ -> return ()
Contributor:

The alternatives of this case can be rewritten to use Map.insertLookupWithKey and Map.updateLookupWithKey to do a single traversal. The effectful computation can be decided by examining the result.

Contributor (Author):

Unfortunately, neither of those will work here.

  1. insertLookupWithKey will not work because the value has to be a (MonitorRef, Int), and I need to create the MonitorRef only if there is no existing value.
  2. updateLookupWithKey returns either the old value (if the entry was deleted) or the changed value, so I have no way to tell what happened: a value of (mon, 1) can refer either to a changed entry or to a removed one (see the sketch below).
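
A minimal sketch of point 2 (hypothetical stand-in types, not code from this PR): with a reference-counted map, Data.Map.updateLookupWithKey returns the same value whether the entry was decremented or deleted because its count was 1, so the caller cannot tell whether it still has to release the monitor.

    import qualified Data.Map.Strict as Map
    import Data.Map.Strict (Map)

    type Pid = Int   -- stand-in for ProcessId in this sketch
    type Ref = Int   -- stand-in for MonitorRef

    -- Decrement the label count for a pid, deleting the entry at zero.
    decrement :: Pid -> Map Pid (Ref, Int) -> (Maybe (Ref, Int), Map Pid (Ref, Int))
    decrement = Map.updateLookupWithKey step
      where
        step _ (ref, 1) = Nothing            -- last label: drop the entry
        step _ (ref, n) = Just (ref, n - 1)  -- otherwise just decrement

    -- decrement 42 (Map.fromList [(42, (7, 1))]) == (Just (7, 1), Map.empty)
    -- decrement 42 (Map.fromList [(42, (7, 2))]) == (Just (7, 1), Map.fromList [(42, (7, 1))])
    -- Both calls return Just (7, 1), so the result alone does not say whether
    -- the MonitorRef 7 must now be unmonitored.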

@hyperthunk (Member):

This is looking good, just working through the tests.

@facundominguez (Contributor):

LGTM provided I didn't break it with my edits.

Better to squash everything into one patch before merging.

@qnikst (Contributor, Author) commented Jun 19, 2015

Yes. Also, I'm planning to merge this after #199, to make the integration process cleaner.

@qnikst (Contributor, Author) commented Jun 19, 2015

Prepared for a merge.

-- When a remote process is registered, the agent starts monitoring it until it
-- is unregistered or the monitor notification arrives.
--
-- The agen keeps the amount of labels associated with each registered remote
Contributor:

Typo: "agen" should be "agent".

@mboes (Contributor) commented Jun 19, 2015

Can we get a link to the actual issue? I have no idea what DP-100 means. Nor the context for why this needs to go in d-p as opposed to where the source code originally came from (where?). Update the title of this PR accordingly. "Dp 100" means nothing.

@qnikst changed the title from "Dp 100" to "Implement monitoring for registry values" on Jun 19, 2015
@qnikst (Contributor, Author) commented Jun 19, 2015

This is issue #212

nid <- getSelfNode
-- For each process the map associates the 'MonitorRef' used to monitor it and
-- the amount of labels associated with it.
mxAgent registryMonitorAgentId (Map.empty :: Map ProcessId (MonitorRef, Int))
Contributor:

Any downside to simply having as many MonitorRefs as there are registered labels, rather than reference counting? I don't think the memory saving (if any) is worth it, seeing as multiple labels for the same ProcessId is not the common case. Dropping the reference counting would make the code simpler, I reckon.

Contributor (Author):

Yes, that will work. The differences are the memory footprint and the complexity of the MxUnRegistered case, which will be slightly higher (we will need to traverse the structure and remove the MonitorRef). MxRegistered will be simplified, and the other case will stay more or less the same.
So I'm open to this change.

@facundominguez, any thoughts here?

Contributor:

I'm not convinced the rewriting is worth the trouble. We already have a reasonable PR. Your call.

Contributor:

> the complexity of the MxUnRegistered case, which will be slightly higher (we will need to traverse the structure and remove the MonitorRef).

I don't see it. The following is untested but compiles:

        case ev of
          MxRegistered pid _
            | processNodeId pid /= nid -> do
              mref <- liftMX $ monitor pid
              mxUpdateLocal (Map.insert pid mref)
          MxUnRegistered pid _
            | processNodeId pid /= nid -> do
              mrefs <- mxGetLocal
              forM_ (pid `Map.lookup` mrefs) $ \mref -> do
                liftMX $ unmonitorAsync mref
                mxUpdateLocal (Map.delete pid)
          _ -> return ()

I expect the memory usage to be pretty much exactly the same as with reference counting: monitors are cheap, and registering a process under multiple labels is uncommon anyway.

Contributor (Author):

The code above does not solve the problem that the original version intended to solve: if a process is registered multiple times but unregistered only once, it will be removed from the map, but it shouldn't be. Also, I'm not terribly happy about "leaking" mrefs; I have to check the code to understand whether that's safe to do. And here we have a monitor call for each registration (I agree that it's a rare case).

Contributor:

Oops - late at night I pasted the wrong code snippet. I meant to paste this one:

        case ev of
          MxRegistered pid label
            | processNodeId pid /= nid -> do
              mref <- liftMX $ monitor pid
              mxUpdateLocal (Map.insert label mref)
          MxUnRegistered pid label
            | processNodeId pid /= nid -> do
              mrefs <- mxGetLocal
              forM_ (label `Map.lookup` mrefs) $ \mref -> do
                liftMX $ unmonitorAsync mref
                mxUpdateLocal (Map.delete label)
          _ -> return ()

i.e., as the previous comment was saying, one monitor ref per label, not per process. The only difference from the previous snippet is the key of the map; everything else is the same. BUT there is a complication, as you point out. It's just that it's not in the MxUnRegistered case but in the ProcessMonitorNotification case, which, when written out, becomes something like:

        mxSink $ \(ProcessMonitorNotification mref _ _) -> do
          mxUpdateLocal $ Map.filter (/= mref)
          mxReady

I assume you meant to write ProcessMonitorNotification instead of MxUnRegistered. The code is just as simple, but note the linear operation, which I also assume is what you were referring to. It's the same linear operation that's already here, over the exact same set of labels. Processes shouldn't be overusing the registry to register thousands of processes, so that's just about OK.

Contributor (Author):

I was thinking about Monoid m => Map ProcessId (m MonitorRef); in that case the complexity is in the MxUnRegistered case, because the delete there is linear or logarithmic. So I think I was correct, but I had a different solution in the design space in mind (a rough sketch follows below).
About this code: we need to check that re-registration fires MxUnRegistered and then MxRegistered in exactly that order, otherwise the MxRegistered case is not correct; but I hope that the ordering and delivery of these messages are guaranteed.
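
A rough sketch (hypothetical helper, not code from this PR) of the Map ProcessId (m MonitorRef) alternative, using a plain list for m: removing one particular MonitorRef is the linear step, and the surrounding Map update is logarithmic, which is the extra cost in the MxUnRegistered case.

    import qualified Data.Map.Strict as Map
    import Data.Map.Strict (Map)
    import Data.List (delete)

    -- Remove one specific monitor reference for a pid; the list 'delete' is
    -- the linear step, the Map update around it is logarithmic.
    removeRef :: (Ord pid, Eq mref)
              => pid -> mref -> Map pid [mref] -> Map pid [mref]
    removeRef pid mref = Map.update step pid
      where
        step refs = case delete mref refs of
                      []    -> Nothing      -- last reference gone: drop the key
                      refs' -> Just refs'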

Contributor:

Well, if events get dropped then even the current code will have trouble. Events on the management bus shouldn't get dropped.

If we must assume that messages can get reordered, dropped, or duplicated (because the Management API documentation claims to provide no guarantees), then I'm not sure that using the Management API to build this functionality is a great option after all.

As @facundominguez says, it's not worth agonizing over the implementation of this function, since it's basically correct as-is - though that's on the assumption that the Management API does provide some guarantees. @hyperthunk, could you confirm precisely what those are?

Implement registry agent. The registry agent monitors remote processes
that were added to the registry and removes them from the registry if
they die.

These semantics are required for a sound implementation of remote nodes
stored in the registry.
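
As an illustration only, here is a rough sketch of how such an agent could be wired together from the pieces quoted in this thread, using the label-keyed variant (one MonitorRef per registered label). The module name, agent id string, use of unmonitor instead of the PR's unmonitorAsync, and the unregister call in the notification handler are all assumptions; this is not the PR's actual code.

    module RegistryMonitorSketch (registryMonitorAgent) where

    import Control.Distributed.Process
    import Control.Distributed.Process.Management
    import Control.Monad (forM_)
    import qualified Data.Map.Strict as Map
    import Data.Map.Strict (Map)

    -- NB: this sketch assumes MxUnRegistered precedes MxRegistered when a
    -- label is re-registered, which is exactly the guarantee questioned
    -- later in this thread.
    registryMonitorAgent :: Process ProcessId
    registryMonitorAgent = do
      nid <- getSelfNode
      mxAgent (MxAgentId "sketch.registry.monitor")   -- hypothetical agent id
              (Map.empty :: Map String MonitorRef)    -- label -> MonitorRef
        [ -- Track (un)registration of remote processes only.
          mxSink $ \ev -> case ev of
            MxRegistered pid label
              | processNodeId pid /= nid -> do
                  mref <- liftMX $ monitor pid
                  mxUpdateLocal (Map.insert label mref)
                  mxReady
            MxUnRegistered pid label
              | processNodeId pid /= nid -> do
                  mrefs <- mxGetLocal
                  forM_ (Map.lookup label mrefs) $ \mref -> do
                    liftMX $ unmonitor mref           -- the PR uses unmonitorAsync
                    mxUpdateLocal (Map.delete label)
                  mxReady
            _ -> mxReady
          -- When a monitored remote process dies, remove its labels from the
          -- registry and from the agent's state.
        , mxSink $ \(ProcessMonitorNotification mref _ _) -> do
            mrefs <- mxGetLocal
            forM_ [l | (l, r) <- Map.toList mrefs, r == mref] $ \label -> do
              liftMX $ unregister label
              mxUpdateLocal (Map.delete label)
            mxReady
        ]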
@qnikst (Contributor, Author) commented Jun 22, 2015

I've addressed the comment about BangPatterns. About the guarantees we have: it seems that (a) messages can't be lost, and (b) the MxUnRegistered message will be missing on re-registration:

https://github.com/haskell-distributed/distributed-process/blob/master/src/Control/Distributed/Process/Node.hs#L857-L874

So I want to introduce a test for that issue and check whether that's really the case, or whether I'm just missing some bits in the code.

@mboes (Contributor) commented Jun 22, 2015

@qnikst I don't think this is a test issue. It's a design issue. We can't make assumptions based on the current implementation of an API - we can only make the assumptions that the documentation for that API allows us to make. This PR is currently blocked on feedback from @hyperthunk about the intended guarantees of the Management API. Actually, this question should probably be its own GitHub issue, with this PR marked as blocked on it.

@mboes (Contributor) commented Jun 22, 2015

@qnikst though, rereading your comment - it may be that you've uncovered a bug, independent of the design-level discussion about the guarantees of the Management API.

@qnikst (Contributor, Author) commented Jun 22, 2015

Blocked on #240 and #241.

@qnikst added the blocked label on Jun 22, 2015
@hyperthunk (Member):

@mboes is it just the blocking issues that you believe are a bug in the Management API? I'm going to try and address these with @qnikst.

@hyperthunk (Member):

Guys... Can this be merged once I've resolved #241, which I plan to address today?

Also, the tests will need migrating (and master rebased onto this branch, so we can see if travis is happy)...

@qnikst (Contributor, Author) commented Mar 3, 2016

Yes, I can rebase everything on top of master and check that CI passes; then this PR can be merged. Any objections?

@mboes (Contributor) commented Mar 3, 2016

Could this either be folded into C.D.P.Node (ideally - do we really need a whole new module just for one definition?) or moved to the Internal.* namespace? AFAIU this agent is just yet another "service process" internal to the Node Controller.

Other than that, no objection.

@qnikst (Contributor, Author) commented Mar 3, 2016

We don't really have a common policy about where agents should be put. I'm OK with moving this to C.D.P.Node.

@mboes (Contributor) commented Mar 3, 2016

In this case - isn't this a "service process" like any other? If so it should just go where the other processes are.

@qnikst (Contributor, Author) commented Mar 3, 2016

Technically it's not an ordinary process, but I agree with this reasoning.

@hyperthunk (Member):

I'm okay with it living in Node.hs too, or being internal. The code in Node.hs could get a bit cluttered if we keep adding to it, though. Note that the implementation of the other service processes is currently scattered: the simple built-in logger (that backs say) resides in Node.hs, but all the management, tracing, and debugging code resides elsewhere.

@hyperthunk (Member):

OK, I have something to add to this... I am working on improving and fixing the management event bus. Currently some events aren't firing, or could potentially fire in the wrong order (i.e., receive before send). Although we don't guarantee causal ordering, for MxSend and MxReceived in the case of two local processes we can guarantee it, and there I believe we should endeavour to do so.

Anyway, in the process of pulling that patch together, I realised that, now that the management event bus is designed the way it is, the complex trace controller process can be rewritten as a management agent. We would remove a lot of garbage from the code base by doing that, and simplify the code a lot.

So I wonder if we should have Control.Distributed.Process.Internal.SystemProcesses (or multiple modules beneath that?), put the say logger and your registry-monitoring agent in there, and export startSystemProcesses from it, so that the node controller remains ignorant and unaffected in future?

@hyperthunk (Member):

CI failed, but only for GHC 7.1 - do we really want to support that far back? Also, I suspect we need to rebase with master here.

@mboes (Contributor) commented Mar 11, 2016

There's no such thing as 7.1. I think this is a case of writing - 7.10 instead of - '7.10' in travis.yml... (YAML reads the unquoted 7.10 as the number 7.1.)

@hyperthunk (Member):

OK @mboes, that makes sense now. @qnikst, I also need to fix the reregister semantics so that it fires an unregistered event followed by a registered event.

@qnikst (Contributor, Author) commented Nov 12, 2018

Closed in favor of #322

@qnikst closed this on Nov 12, 2018