Skip to content

Comments

aggregator and client refactor#578

Merged
s314cy merged 2 commits intodevelopfrom
bandit-agg
Jul 4, 2023
Merged

aggregator and client refactor#578
s314cy merged 2 commits intodevelopfrom
bandit-agg

Conversation

@LucasTrg
Copy link
Collaborator

@LucasTrg LucasTrg commented May 3, 2023

In the following PR, we separate the aggregation logic to the communication one by splitting Clients into Client, Aggregator and Preparator.
This will allow to develop more advanced aggregation techniques, different encryption scheme, etc... as well as different network topologies in the future.

@s314cy s314cy self-requested a review May 3, 2023 12:40
@s314cy s314cy added feature New feature or request discojs Related to Disco.js decentralized For the decentralized setting labels May 3, 2023
test('cleartext', clients.decentralized.ClearText)
test('secure', clients.decentralized.SecAgg)
test('cleartext', clients.decentralized.Base)
test('secure', clients.decentralized.Base)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do both tests still work if called on the same?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should actually delete the 'secure' one for the time being. As @swaglid explained to us, the current sec_agg is not working properly. My goal is to get this PR merged quick enough for him to begin re-implementing secure aggregation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the secure decentralized one should be fine, it has unit tests and end2end tests. would be better to keep it or migrate it during the refactor (maybe walid can help?)

did you check with him how it would have to be modified to become compatible with the new aggregator concept here (does the one here include decentralized and federated actually?)

export { titanic } from './titanic'
export { simpleFace } from './simple_face'
export { geotags } from './geotags'
export { mnistBandit } from './mnist_bandit'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you mind having the bandit later in a separate PR, after the refactor (the refactor is bigger)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. It'll make the testing a little bit easier and reduce the scope of the PR a little

@martinjaggi martinjaggi changed the title Bandit aggregator and client refactor aggregator and client refactor May 23, 2023
s314cy
s314cy previously requested changes May 24, 2023
@s314cy
Copy link
Contributor

s314cy commented May 24, 2023

thanks very much for the changes! let's try and fix the CI this week so that we can tackle the merge conflicts the next one :)

there still remains a few changes to make server-side, right? such as making use of the aggregator classes

Copy link
Collaborator Author

@LucasTrg LucasTrg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, looking forwrd to test the edge case we might run into wth the bandit !

this._averageNumberOfParticipants = this.totalNumberOfParticipants / this.round
this._totalNumberOfParticipants += this.currentNumberOfParticipants
} else {
this._round = this.aggregator.round
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure of this edge case, or could we get a number of round bigger than the number of aggregation round ?
Maybe some comments in the code could help

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is here because the update method is called once per communication round, meaning possibly multiple times during a single training round, but the stats are over training rounds

but yes the code is unclear 😁

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is making me think that there is a small distinction that we did not capture for participants. Namely, the participation graph can be directed ("I give you my model, but you did not give me yours"), and I don't think this current implementation would fit.
It's not a hard requirement yet, more of a food or thoughts comment

Copy link
Contributor

@s314cy s314cy Jun 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, the current code allows one to define "what" kind of contribution the client should send to its neighbors, but not "who" it should send its contributions to

similarly, the client will expect a contribution (possibly many) from every active neighbor, with an eventual non-failing timeout in the case where no contribution was received, which is is not ideal because it adds peer idleness

both issues can be fully handled by the aggregator by keeping lists of desirable nodes to receive from/send to (subsets of active nodes), in order to differ from the set of nodes usually kept in sync between the client and aggregator

do you need these changes for bandit? would you like to make these changes yourself? if not, I'll make the changes in a separate PR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for the moment I can keep up with this discrepancy by sending empty payloads to the nodes that were not selected for the current round.
We can clean it up if the bandit happens to be a promising project

Copy link
Member

@martinjaggi martinjaggi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amazing work, and excellent tests added also. thanks a lot. only minor comments above


protected informant?: AsyncInformant<T>
/**
* The result promise which, on resolve, will contain the current aggregation result.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

want to say if it's a model (params) or model difference, or could be either?

Copy link
Contributor

@s314cy s314cy Jul 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can be anything really, since the class is generic, but in the case of the subclasses, it can either be the model weights or model weights difference

currently, I'm pretty sure the weights passed to the client are the entire model, which will go back to model difference in the future polishing PR(s): write docs, support a directed communication graph, re-include DP (includes model diff & clipping), re-write byzantine-robustness, etc.

*/
log (step: AggregationStep, from?: client.NodeID): void {
switch (step) {
case AggregationStep.ADD:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add might be maybe misinterpreted as addition potentially, but here you mean more like save or register?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it corresponds to the aggregator's add method and will have a dedicated docstring in #580 so it'll make more sense then :)

case AggregationStep.ADD:
console.log(`> Adding contribution from node ${from ?? '"unknown"'} for round (${this.communicationRound}, ${this.round})`)
return
case AggregationStep.UPDATE:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update meaning?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it corresponds to a node overriding its previous contribution with a new one, and will be clearer in #580 with its dedicated docstring

@s314cy s314cy merged commit 7f59232 into develop Jul 4, 2023
@s314cy s314cy deleted the bandit-agg branch July 4, 2023 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

decentralized For the decentralized setting discojs Related to Disco.js feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants