
Implement monitoring for registry values #184


Closed · qnikst wants to merge 1 commit

Conversation

@qnikst (Contributor) commented Feb 25, 2015

Implement remote process monitoring as a management agent.

@qnikst (Contributor, Author) commented Feb 25, 2015

Associated pull request to distributed-process-tests: haskell-distributed/distributed-process-tests#15

@qnikst (Contributor, Author) commented Feb 25, 2015

@facundominguez, please recheck my code when you have time.

@qnikst mentioned this pull request on Feb 25, 2015
        then do liftMX $ unmonitorAsync mref
                mxSetLocal $! pid `Map.delete` hm
        else mxSetLocal $ Map.insert pid (mref,i') hm
    _ -> return ()
Contributor:

The alternatives of this case can be rewritten to use Map.insertLookupWithKey and Map.updateLookupWithKey to do a single traversal. The effectful computation can be decided by examining the result.

Contributor (Author):

Unfortunately, neither of those will work here.

  1. insertLookupWithKey will not work because the value has to be a (MonitorRef, Int), and I need to create the MonitorRef only if there is no existing value.
  2. updateLookupWithKey returns either the old value (if the entry was deleted) or the changed value, so I have no way to tell what happened: a value of (mon, 1) can refer either to a changed entry or to a removed one (see the sketch below).
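
A minimal sketch of point 2 (hypothetical stand-in types, not code from this PR): with a reference-counted map, Data.Map.updateLookupWithKey returns the same value whether the entry was decremented or deleted because its count was 1, so the caller cannot tell whether it still has to release the monitor.

    import qualified Data.Map.Strict as Map
    import Data.Map.Strict (Map)

    type Pid = Int   -- stand-in for ProcessId in this sketch
    type Ref = Int   -- stand-in for MonitorRef

    -- Decrement the label count for a pid, deleting the entry at zero.
    decrement :: Pid -> Map Pid (Ref, Int) -> (Maybe (Ref, Int), Map Pid (Ref, Int))
    decrement = Map.updateLookupWithKey step
      where
        step _ (ref, 1) = Nothing            -- last label: drop the entry
        step _ (ref, n) = Just (ref, n - 1)  -- otherwise just decrement

    -- decrement 42 (Map.fromList [(42, (7, 1))]) == (Just (7, 1), Map.empty)
    -- decrement 42 (Map.fromList [(42, (7, 2))]) == (Just (7, 1), Map.fromList [(42, (7, 1))])
    -- Both calls return Just (7, 1), so the result alone does not say whether
    -- the MonitorRef 7 must now be unmonitored.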

@hyperthunk (Member):

This is looking good, just working through the tests.

@facundominguez (Contributor):

LGTM provided I didn't break it with my edits.

Better to squash everything into one patch before merging.

@qnikst (Contributor, Author) commented Jun 19, 2015

Yes. Also, I'm planning to merge this after #199, to make the integration process cleaner.

@qnikst (Contributor, Author) commented Jun 19, 2015

Prepared for a merge.

-- When a remote process is registered, the agent starts monitoring it until it
-- is unregistered or the monitor notification arrives.
--
-- The agen keeps the amount of labels associated with each registered remote
Contributor:

Typo: "agen" should be "agent".

@mboes (Contributor) commented Jun 19, 2015

Can we get a link to the actual issue? I have no idea what DP-100 means. Nor the context for why this needs to go in d-p as opposed to where the source code originally came from (where?). Update the title of this PR accordingly. "Dp 100" means nothing.

@qnikst changed the title from "Dp 100" to "Implement monitoring for registry values" on Jun 19, 2015
@qnikst (Contributor, Author) commented Jun 19, 2015

This is issue #212

nid <- getSelfNode
-- For each process the map associates the 'MonitorRef' used to monitor it and
-- the amount of labels associated with it.
mxAgent registryMonitorAgentId (Map.empty :: Map ProcessId (MonitorRef, Int))
Contributor:

Any downside to simply having as many MonitorRefs as there are registered labels, rather than reference counting? I don't think the memory saving (if any) is worth it, seeing as multiple labels for the same ProcessId is not the common case. Dropping the reference counting would make the code simpler, I reckon.

Contributor (Author):

Yes, that will work. The differences are the memory footprint and the complexity of the MxUnRegistered case, which will be slightly higher (we will need to traverse the structure and remove the MonitorRef). MxRegistered will be simplified, and the other case will stay more or less the same.
So I'm open to this change.

@facundominguez, any thoughts here?

Contributor:

I'm not convinced the rewriting is worth the trouble. We already have a reasonable PR. Your call.

Contributor:

> the complexity of the MxUnRegistered case, which will be slightly higher (we will need to traverse the structure and remove the MonitorRef).

I don't see it. The following is untested but compiles:

        case ev of
          MxRegistered pid _
            | processNodeId pid /= nid -> do
              mref <- liftMX $ monitor pid
              mxUpdateLocal (Map.insert pid mref)
          MxUnRegistered pid _
            | processNodeId pid /= nid -> do
              mrefs <- mxGetLocal
              forM_ (pid `Map.lookup` mrefs) $ \mref -> do
                liftMX $ unmonitorAsync mref
                mxUpdateLocal (Map.delete pid)
          _ -> return ()

I expect the memory usage to be pretty much exactly the same as with reference counting: monitors are cheap, and registering a process under multiple labels is uncommon anyway.

Contributor (Author):

The code above does not solve the problem that the original version intended to solve: if a process is registered multiple times but unregistered only once, it will be removed from the map, but it shouldn't be. Also, I'm not terribly happy about "leaking" mrefs; I have to check the code to understand whether that's safe to do. And here we have a monitor call for each registration (I agree that it's a rare case).

Contributor:

Oops - late at night I pasted the wrong code snippet. I meant to paste this one:

        case ev of
          MxRegistered pid label
            | processNodeId pid /= nid -> do
              mref <- liftMX $ monitor pid
              mxUpdateLocal (Map.insert label mref)
          MxUnRegistered pid label
            | processNodeId pid /= nid -> do
              mrefs <- mxGetLocal
              forM_ (label `Map.lookup` mrefs) $ \mref -> do
                liftMX $ unmonitorAsync mref
                mxUpdateLocal (Map.delete label)
          _ -> return ()

i.e., as the previous comment was saying, one monitor ref per label, not per process. The only difference from the previous snippet is the key of the map; everything else is the same. BUT there is a complication, as you point out. It's just that it's not in the MxUnRegistered case but in the ProcessMonitorNotification case, which, when written out, becomes something like:

        mxSink $ \(ProcessMonitorNotification mref _ _) -> do
          mxUpdateLocal $ Map.filter (/= mref)
          mxReady

I assume you meant to write ProcessMonitorNotification instead of MxUnRegistered. The code is just as simple, but note the linear operation, which I also assume is what you were referring to. It's the same linear operation that's already here, over the exact same set of labels. Processes shouldn't be overusing the registry to register thousands of processes, so that's just about OK.

Contributor (Author):

I was thinking about Monoid m => Map ProcessId (m MonitorRef); in that case the complexity is in the MxUnRegistered case, because the delete there is linear or logarithmic. So I think I was correct, but I had a different solution in the design space in mind (a rough sketch follows below).
About this code: we need to check that re-registration fires MxUnRegistered and then MxRegistered in exactly that order, otherwise the MxRegistered case is not correct; but I hope that the ordering and delivery of these messages are guaranteed.
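
A rough sketch (hypothetical helper, not code from this PR) of the Map ProcessId (m MonitorRef) alternative, using a plain list for m: removing one particular MonitorRef is the linear step, and the surrounding Map update is logarithmic, which is the extra cost in the MxUnRegistered case.

    import qualified Data.Map.Strict as Map
    import Data.Map.Strict (Map)
    import Data.List (delete)

    -- Remove one specific monitor reference for a pid; the list 'delete' is
    -- the linear step, the Map update around it is logarithmic.
    removeRef :: (Ord pid, Eq mref)
              => pid -> mref -> Map pid [mref] -> Map pid [mref]
    removeRef pid mref = Map.update step pid
      where
        step refs = case delete mref refs of
                      []    -> Nothing      -- last reference gone: drop the key
                      refs' -> Just refs'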

Contributor:

Well, if events get dropped then even the current code will have trouble. Events on the management bus shouldn't get dropped.

If we must assume that messages can get reordered, dropped, or duplicated (because the Management API documentation claims to provide no guarantees), then I'm not sure that using the Management API to build this functionality is a great option after all.

As @facundominguez says, it's not worth agonizing over the implementation of this function, since it's basically correct as-is - though that's on the assumption that the Management API does provide some guarantees. @hyperthunk, could you confirm precisely what those are?

Implement registry agent. The registry agent monitors remote processes
that were added to the registry and removes them from the registry if
they die.

These semantics are required for a sound implementation of remote nodes
stored in the registry.
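
As an illustration only, here is a rough sketch of how such an agent could be wired together from the pieces quoted in this thread, using the label-keyed variant (one MonitorRef per registered label). The module name, agent id string, use of unmonitor instead of the PR's unmonitorAsync, and the unregister call in the notification handler are all assumptions; this is not the PR's actual code.

    module RegistryMonitorSketch (registryMonitorAgent) where

    import Control.Distributed.Process
    import Control.Distributed.Process.Management
    import Control.Monad (forM_)
    import qualified Data.Map.Strict as Map
    import Data.Map.Strict (Map)

    -- NB: this sketch assumes MxUnRegistered precedes MxRegistered when a
    -- label is re-registered, which is exactly the guarantee questioned
    -- later in this thread.
    registryMonitorAgent :: Process ProcessId
    registryMonitorAgent = do
      nid <- getSelfNode
      mxAgent (MxAgentId "sketch.registry.monitor")   -- hypothetical agent id
              (Map.empty :: Map String MonitorRef)    -- label -> MonitorRef
        [ -- Track (un)registration of remote processes only.
          mxSink $ \ev -> case ev of
            MxRegistered pid label
              | processNodeId pid /= nid -> do
                  mref <- liftMX $ monitor pid
                  mxUpdateLocal (Map.insert label mref)
                  mxReady
            MxUnRegistered pid label
              | processNodeId pid /= nid -> do
                  mrefs <- mxGetLocal
                  forM_ (Map.lookup label mrefs) $ \mref -> do
                    liftMX $ unmonitor mref           -- the PR uses unmonitorAsync
                    mxUpdateLocal (Map.delete label)
                  mxReady
            _ -> mxReady
          -- When a monitored remote process dies, remove its labels from the
          -- registry and from the agent's state.
        , mxSink $ \(ProcessMonitorNotification mref _ _) -> do
            mrefs <- mxGetLocal
            forM_ [l | (l, r) <- Map.toList mrefs, r == mref] $ \label -> do
              liftMX $ unregister label
              mxUpdateLocal (Map.delete label)
            mxReady
        ]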
@qnikst (Contributor, Author) commented Jun 22, 2015

I've addressed the comment about BangPatterns. About the guarantees we have: it seems that (a) messages can't be lost, and (b) the MxUnRegistered message will be missing on re-registration:

https://github.com/haskell-distributed/distributed-process/blob/master/src/Control/Distributed/Process/Node.hs#L857-L874

So I want to introduce a test for that issue and check whether that's really the case, or whether I'm just missing some bits in the code.

@mboes (Contributor) commented Jun 22, 2015

@qnikst I don't think this is a test issue. It's a design issue. We can't make assumptions based on the current implementation of an API - we can only make the assumptions that the documentation for that API allows us to make. This PR is currently blocked on feedback from @hyperthunk about the intended guarantees of the Management API. Actually, this question should probably be its own GitHub issue, with this PR marked as blocked on it.

@mboes (Contributor) commented Jun 22, 2015

@qnikst though, rereading your comment - it may be that you've uncovered a bug, independent of the design-level discussion about the guarantees of the Management API.

@qnikst (Contributor, Author) commented Jun 22, 2015

Blocked on #240 and #241.

@qnikst added the blocked label on Jun 22, 2015
@hyperthunk (Member):

@mboes is it just the blocking issues that you believe are a bug in the Management API? I'm going to try and address these with @qnikst.

@hyperthunk (Member):

Guys... Can this be merged once I've resolved #241, which I plan to address today?

Also, the tests will need migrating (and master rebased onto this branch, so we can see if travis is happy)...

@qnikst (Contributor, Author) commented Mar 3, 2016

Yes, I can rebase everything on top of master and check that CI passes; then this PR can be merged. Any objections?

@mboes (Contributor) commented Mar 3, 2016

Could this either be folded into C.D.P.Node (ideally - do we really need a whole new module just for one definition?) or moved to the Internal.* namespace? AFAIU this agent is just yet another "service process" internal to the Node Controller.

Other than that, no objection.

@qnikst (Contributor, Author) commented Mar 3, 2016

We don't really have a common policy about where agents should be put. I'm OK with moving this to C.D.P.Node.

@mboes (Contributor) commented Mar 3, 2016

In this case - isn't this a "service process" like any other? If so it should just go where the other processes are.

@qnikst (Contributor, Author) commented Mar 3, 2016

Technically it's not an ordinary process, but I agree with this reasoning.

@hyperthunk (Member):

I'm okay with it living in Node.hs too, or being internal. The code in Node.hs could get a bit cluttered if we keep adding to it, though. Note that the implementation of the other service processes is currently scattered: the simple built-in logger (that backs say) resides in Node.hs, but all the management, tracing, and debugging code resides elsewhere.

@hyperthunk (Member):

OK, I have something to add to this... I am working on improving and fixing the management event bus. Currently some events aren't firing, or could potentially fire in the wrong order (i.e., receive before send). Although we don't guarantee causal ordering, for MxSend and MxReceived in the case of two local processes we can guarantee it, and there I believe we should endeavour to do so.

Anyway, in the process of pulling that patch together, I realised that, now that the management event bus is designed the way it is, the complex trace controller process can be rewritten as a management agent. We would remove a lot of garbage from the code base by doing that, and simplify the code a lot.

So I wonder if we should have Control.Distributed.Process.Internal.SystemProcesses (or multiple modules beneath that?), put the say logger and your registry-monitoring agent in there, and export startSystemProcesses from it, so that the node controller remains ignorant and unaffected in future?

@hyperthunk (Member):

CI failed, but only for GHC 7.1 - do we really want to support that far back? Also, I suspect we need to rebase with master here.

@mboes (Contributor) commented Mar 11, 2016

There's no such thing as 7.1. I think this is a case of writing - 7.10 instead of - '7.10' in travis.yml... (YAML reads the unquoted 7.10 as the number 7.1.)

@hyperthunk (Member):

OK @mboes, that makes sense now. @qnikst, I also need to fix the reregister semantics so that it fires an unregistered event followed by a registered event.

@qnikst (Contributor, Author) commented Nov 12, 2018

Closed in favor of #322

@qnikst closed this on Nov 12, 2018