aggregator and client refactor by LucasTrg · Pull Request #578 · epfml/disco

LucasTrg · 2023-05-03T12:18:11Z

In this PR, we separate the aggregation logic from the communication logic by splitting Clients into Client, Aggregator and Preparator.
This will allow us to develop more advanced aggregation techniques, different encryption schemes, etc., as well as different network topologies in the future.
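
To illustrate the split described above, here is a minimal sketch of a client that only handles communication and delegates all aggregation to an injected aggregator. Every name here (`Aggregator`, `MeanAggregator`, `DecentralizedClient`, `onReceive`, `finishRound`) is hypothetical and chosen for illustration; this is not the actual disco API, and the Preparator is left out because the description does not say what it does.

```typescript
// Hypothetical sketch: names and signatures are assumptions, not the disco API.
type NodeID = string
type Weights = number[]

// The aggregator owns the "what do we do with contributions" logic.
interface Aggregator<T> {
  add (from: NodeID, contribution: T): void
  aggregate (): T
}

// One possible strategy: element-wise mean of all stored contributions.
class MeanAggregator implements Aggregator<Weights> {
  private readonly contributions = new Map<NodeID, Weights>()

  add (from: NodeID, contribution: Weights): void {
    this.contributions.set(from, contribution)
  }

  aggregate (): Weights {
    const all = Array.from(this.contributions.values())
    if (all.length === 0) throw new Error('no contributions to aggregate')
    const size = all[0].length
    const mean = new Array<number>(size).fill(0)
    for (const c of all) {
      if (c.length !== size) throw new Error('contribution size mismatch')
      for (let i = 0; i < size; i++) mean[i] += c[i] / all.length
    }
    return mean
  }
}

// The client only moves payloads around; it never inspects how they are combined.
class DecentralizedClient {
  private readonly aggregator: Aggregator<Weights>

  constructor (aggregator: Aggregator<Weights>) {
    this.aggregator = aggregator
  }

  onReceive (from: NodeID, payload: Weights): void {
    this.aggregator.add(from, payload)
  }

  finishRound (): Weights {
    return this.aggregator.aggregate()
  }
}

const demoClient = new DecentralizedClient(new MeanAggregator())
demoClient.onReceive('peer-a', [2, 4])
demoClient.onReceive('peer-b', [4, 0])
const aggregated = demoClient.finishRound()
```

Under this assumption, swapping in a different aggregation technique means passing another `Aggregator` implementation to the client, with no change to the communication code.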

martinjaggi · 2023-05-23T14:26:37Z

server/tests/client/decentralized.spec.ts

-test('cleartext', clients.decentralized.ClearText)
-test('secure', clients.decentralized.SecAgg)
+test('cleartext', clients.decentralized.Base)
+test('secure', clients.decentralized.Base)


do both tests still work if called on the same class?

I should actually delete the 'secure' one for the time being. As @swaglid explained to us, the current sec_agg is not working properly. My goal is to get this PR merged quickly enough for him to begin re-implementing secure aggregation.

the secure decentralized one should be fine, it has unit tests and end2end tests. would be better to keep it or migrate it during the refactor (maybe walid can help?)

did you check with him how it would have to be modified to become compatible with the new aggregator concept here (does the one here include decentralized and federated actually?)

server/tests/client/decentralized.spec.ts

martinjaggi · 2023-05-23T14:27:54Z

discojs/discojs-core/src/default_tasks/index.ts

 export { titanic } from './titanic'
 export { simpleFace } from './simple_face'
 export { geotags } from './geotags'
+export { mnistBandit } from './mnist_bandit'


would you mind having the bandit later in a separate PR, after the refactor (the refactor is bigger)

That's a good idea. It'll make the testing a little bit easier and reduce the scope of the PR a little

discojs/discojs-core/src/client/decentralized/base.ts

discojs/discojs-core/src/aggregator/mean_agg.ts

discojs/discojs-core/src/aggregator/bandit_agg.ts

discojs/discojs-core/src/aggregator/base.ts

discojs/discojs-core/src/client/decentralized/base.ts

discojs/discojs-core/src/training/trainer/distributed_trainer.ts

discojs/discojs-core/src/types.ts

server/tests/end_to_end/decentralized.spec.ts

s314cy · 2023-05-24T10:07:01Z

thanks very much for the changes! let's try and fix the CI this week so that we can tackle the merge conflicts the next one :)

there still remain a few changes to make server-side, right? such as making use of the aggregator classes

LucasTrg

Overall LGTM, looking forward to testing the edge cases we might run into with the bandit!

discojs/discojs-core/src/aggregator/bandit.ts

discojs/discojs-core/src/aggregator/base.ts

discojs/discojs-core/src/aggregator/index.ts

discojs/discojs-core/src/aggregator/robust.ts

LucasTrg · 2023-06-29T08:31:58Z

discojs/discojs-core/src/async_informant.ts

+      this._averageNumberOfParticipants = this.totalNumberOfParticipants / this.round
+      this._totalNumberOfParticipants += this.currentNumberOfParticipants
+    } else {
+      this._round = this.aggregator.round


Not quite sure about this edge case: could we get a round number bigger than the number of aggregation rounds?
Maybe some comments in the code would help

this is here because the update method is called once per communication round, meaning possibly multiple times during a single training round, but the stats are over training rounds

but yes the code is unclear 😁
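
The distinction explained above (the update is called once per communication round, while the statistics are per training round) can be sketched as follows. This is a hypothetical illustration, not the code from `async_informant.ts`: the class `RoundStats`, its fields, and the `update` signature are assumptions made for the example.

```typescript
// Hypothetical sketch: not the actual AsyncInformant implementation.
class RoundStats {
  round = 0 // last training round that was counted
  totalNumberOfParticipants = 0
  averageNumberOfParticipants = 0

  // Called once per communication round, so possibly several times
  // while the aggregator is still in the same training round.
  update (aggregatorRound: number, currentNumberOfParticipants: number): void {
    if (aggregatorRound <= this.round) {
      // Same training round as before: the stats were already counted.
      return
    }
    // A new training round started: count its participants exactly once.
    this.round = aggregatorRound
    this.totalNumberOfParticipants += currentNumberOfParticipants
    this.averageNumberOfParticipants = this.totalNumberOfParticipants / this.round
  }
}

const stats = new RoundStats()
stats.update(1, 3) // first communication round of training round 1
stats.update(1, 3) // second communication round of training round 1, ignored
stats.update(2, 5) // training round 2
```

In this sketch the average is taken over training rounds, so repeated calls within a single training round do not skew it.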

LucasTrg · 2023-06-29T08:35:14Z

discojs/discojs-core/src/async_informant.ts

This makes me think that there is a small distinction that we did not capture for participants. Namely, the participation graph can be directed ("I give you my model, but you did not give me yours"), and I don't think the current implementation would fit.
It's not a hard requirement yet, more of a food-for-thought comment

you are right, the current code allows one to define "what" kind of contribution the client should send to its neighbors, but not "who" it should send its contributions to

similarly, the client will expect a contribution (possibly many) from every active neighbor, with an eventual non-failing timeout in the case where no contribution was received, which is not ideal because it adds peer idleness

both issues can be fully handled by the aggregator by keeping lists of desirable nodes to receive from/send to (subsets of active nodes), in order to differ from the set of nodes usually kept in sync between the client and aggregator

do you need these changes for bandit? would you like to make these changes yourself? if not, I'll make the changes in a separate PR

I think for the moment I can work around this discrepancy by sending empty payloads to the nodes that were not selected for the current round.
We can clean it up if the bandit turns out to be a promising project
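
The two ideas from this thread can be sketched side by side: the empty-payload workaround, and separate send-to and receive-from sets that are subsets of the active nodes. The function names `planPayloads` and `isRoundComplete` and the `Payload` type are hypothetical; nothing here is taken from the disco code base.

```typescript
// Hypothetical sketch: names are assumptions, not the disco API.
type NodeID = string
type Payload = number[]

// Workaround: every active node still gets a message, but nodes that
// were not selected for this round receive an empty payload.
function planPayloads (
  active: Set<NodeID>,
  sendTo: Set<NodeID>,
  contribution: Payload
): Map<NodeID, Payload> {
  const plan = new Map<NodeID, Payload>()
  active.forEach((node) => {
    plan.set(node, sendTo.has(node) ? contribution : [])
  })
  return plan
}

// Directed alternative: only wait for the nodes we expect to hear from,
// instead of waiting (until timeout) for every active neighbor.
function isRoundComplete (receiveFrom: Set<NodeID>, received: Set<NodeID>): boolean {
  let complete = true
  receiveFrom.forEach((node) => {
    if (!received.has(node)) complete = false
  })
  return complete
}

const plan = planPayloads(new Set(['a', 'b', 'c']), new Set(['a']), [1, 2])
const done = isRoundComplete(new Set(['a']), new Set(['a', 'b']))
```

With a `receiveFrom` set, a node would no longer need to idle until the timeout for neighbors it does not expect a contribution from.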

discojs/discojs-core/src/default_tasks/index.ts

discojs/discojs-core/src/default_tasks/mnist_bandit.ts

martinjaggi

amazing work, and excellent tests added also. thanks a lot. only minor comments above

discojs/discojs-core/src/aggregator/base.ts

martinjaggi · 2023-07-03T16:09:41Z

discojs/discojs-core/src/aggregator/base.ts

+
+  protected informant?: AsyncInformant<T>
+  /**
+   * The result promise which, on resolve, will contain the current aggregation result.


want to say if it's a model (params) or model difference, or could be either?

it can be anything really, since the class is generic, but in the case of the subclasses, it can either be the model weights or model weights difference

currently, I'm pretty sure the weights passed to the client are the entire model, which will go back to model difference in the future polishing PR(s): write docs, support a directed communication graph, re-include DP (includes model diff & clipping), re-write byzantine-robustness, etc.
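
As a side note on weights versus weight differences: for a plain mean, the two are interchangeable, because averaging the differences and applying the result to the shared starting point gives the same model as averaging the weights directly. The sketch below checks this on a toy example. The helper names `mean`, `difference` and `applyDifference` are hypothetical, and the equivalence is only claimed for a plain mean over clients that start from the same base; it need not hold once clipping or other non-linear steps are involved.

```typescript
// Hypothetical sketch: helper names are assumptions, not the disco API.
type Weights = number[]

// Element-wise mean of equally sized vectors.
function mean (contributions: Weights[]): Weights {
  if (contributions.length === 0) throw new Error('nothing to aggregate')
  const size = contributions[0].length
  const out = new Array<number>(size).fill(0)
  for (const c of contributions) {
    if (c.length !== size) throw new Error('size mismatch')
    for (let i = 0; i < size; i++) out[i] += c[i] / contributions.length
  }
  return out
}

// Difference between locally updated weights and the shared starting point.
function difference (updated: Weights, base: Weights): Weights {
  return updated.map((w, i) => w - base[i])
}

// Apply an aggregated difference back onto the starting point.
function applyDifference (base: Weights, diff: Weights): Weights {
  return base.map((w, i) => w + diff[i])
}

const baseModel: Weights = [1, 2]
const localModels: Weights[] = [[2, 4], [4, 0]]

// Aggregating full weights...
const fromWeights = mean(localModels)
// ...matches aggregating differences and applying them to the base.
const fromDifferences = applyDifference(
  baseModel,
  mean(localModels.map((w) => difference(w, baseModel)))
)
```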

discojs/discojs-core/src/aggregator/base.ts

discojs/discojs-core/src/aggregator/get.ts

martinjaggi · 2023-07-03T16:24:00Z

discojs/discojs-core/src/aggregator/base.ts

+   */
+  log (step: AggregationStep, from?: client.NodeID): void {
+    switch (step) {
+      case AggregationStep.ADD:


add might be misinterpreted as addition, but here you mean more like save or register?

it corresponds to the aggregator's add method and will have a dedicated docstring in #580 so it'll make more sense then :)

martinjaggi · 2023-07-03T16:25:19Z

discojs/discojs-core/src/aggregator/base.ts

+      case AggregationStep.ADD:
+        console.log(`> Adding contribution from node ${from ?? '"unknown"'} for round (${this.communicationRound}, ${this.round})`)
+        return
+      case AggregationStep.UPDATE:


update meaning?

it corresponds to a node overriding its previous contribution with a new one, and will be clearer in #580 with its dedicated docstring
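
The two answers above (ADD registers a contribution, UPDATE overrides a node's previous one) can be sketched with a small store keyed by node. Only the step names `ADD` and `UPDATE` come from the diff; the `ContributionStore` class, its `add` return value, and the modeling of `AggregationStep` as a constant object are assumptions made for this example.

```typescript
// Hypothetical sketch: only the ADD/UPDATE step names come from the PR.
type NodeID = string

const AggregationStep = { ADD: 'ADD', UPDATE: 'UPDATE' } as const
type AggregationStep = typeof AggregationStep[keyof typeof AggregationStep]

class ContributionStore<T> {
  private readonly contributions = new Map<NodeID, T>()

  // Registers (saves) a contribution; it does not sum anything.
  // Returns which step happened so the caller can log it.
  add (from: NodeID, contribution: T): AggregationStep {
    const step = this.contributions.has(from)
      ? AggregationStep.UPDATE // the node overrides its previous contribution
      : AggregationStep.ADD // first contribution from this node
    this.contributions.set(from, contribution)
    return step
  }

  get (from: NodeID): T | undefined {
    return this.contributions.get(from)
  }

  get size (): number {
    return this.contributions.size
  }
}

const store = new ContributionStore<number[]>()
const firstStep = store.add('peer-a', [1, 1])
const secondStep = store.add('peer-a', [2, 2])
```

In this sketch a second contribution from the same node replaces the first one, so each node counts at most once when the stored contributions are later aggregated.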

s314cy self-requested a review May 3, 2023 12:40

s314cy added the feature (New feature or request), discojs (Related to Disco.js) and decentralized (For the decentralized setting) labels May 3, 2023

martinjaggi reviewed May 23, 2023

martinjaggi changed the title ~~Bandit aggregator and client refactor~~ aggregator and client refactor May 23, 2023

s314cy previously requested changes May 24, 2023

s314cy force-pushed the bandit-agg branch from 7425f56 to 0c0f38a Compare June 5, 2023 11:26

s314cy force-pushed the bandit-agg branch from ea18705 to 6c5b2ea Compare June 12, 2023 11:58

s314cy force-pushed the bandit-agg branch from 16d91c8 to 5433bb2 Compare June 20, 2023 12:14

s314cy marked this pull request as ready for review June 28, 2023 13:36

s314cy force-pushed the bandit-agg branch from c2350bb to abd5b22 Compare June 29, 2023 08:46

s314cy requested a review from martinjaggi June 29, 2023 09:22

s314cy mentioned this pull request Jun 29, 2023

Adding the skin_mnist task to the default tasks #579

Merged

LucasTrg commented Jun 29, 2023

s314cy force-pushed the bandit-agg branch from abd5b22 to c851dca Compare July 3, 2023 13:48

martinjaggi requested changes Jul 3, 2023

s314cy force-pushed the bandit-agg branch from c851dca to f3c2482 Compare July 4, 2023 12:45

LucasTrg and others added 2 commits July 4, 2023 14:46

discojs+server: add aggregagtor class

abe1262

web-client: match disco lib changes

1040c77

s314cy force-pushed the bandit-agg branch from f3c2482 to 1040c77 Compare July 4, 2023 12:46

martinjaggi approved these changes Jul 4, 2023

s314cy merged commit 7f59232 into develop Jul 4, 2023

s314cy deleted the bandit-agg branch July 4, 2023 20:16

s314cy mentioned this pull request Sep 11, 2023

Byzantine robust aggregator #597

Open
