Some FL experiment (learning) parameters not propagated from config file #38

Closed
AbeleMM opened this issue May 21, 2022 · 4 comments · Fixed by #39

AbeleMM commented May 21, 2022

Bug Report

Current Behavior
The values of some learning parameters (e.g., clients per round and epochs) provided in an experiment's config are seemingly not propagated correctly to the orchestrator (and, subsequently, to the federator). Instead, the defaults from FedLearningConfig in fltk/util/learning_config.py (e.g., clients_per_round: int = 2 and epochs: int = 1) always appear to be used. The issue might also affect other parameters, although I have not tested all of them.
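
For illustration, here is a minimal sketch of the suspected failure mode. The two field defaults are taken from fltk/util/learning_config.py, but the loader function is hypothetical and not the actual fltk parsing code:

```python
from dataclasses import dataclass, fields


@dataclass
class FedLearningConfig:
    # Defaults as in fltk/util/learning_config.py
    clients_per_round: int = 2
    epochs: int = 1


def load_learning_params(raw: dict) -> FedLearningConfig:
    # Hypothetical loader: if the camelCase keys from the JSON config are never
    # mapped onto the snake_case dataclass fields, nothing is passed to the
    # constructor and the defaults above are silently used.
    known = {f.name for f in fields(FedLearningConfig)}
    return FedLearningConfig(**{k: v for k, v in raw.items() if k in known})


raw = {"clientsPerRound": 1, "epochsPerRound": 3}  # keys as in example_arrival_config.json
cfg = load_learning_params(raw)
print(cfg.clients_per_round, cfg.epochs)  # prints "2 1": the defaults, not the configured values
```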

Input Code
Given configs/federated_tasks/example_arrival_config.json:

[
  {
    "type": "federated",
    "jobClassParameters": {
      "networkConfiguration": {
        "network": "FashionMNISTCNN",
        "lossFunction": "CrossEntropyLoss",
        "dataset": "mnist"
      },
      "systemParameters": {
        "dataParallelism": null,
        "configurations": {
          "Master": {
            "cores": "1000m",
            "memory": "1Gi"
          },
          "Worker": {
            "cores": "750m",
            "memory": "1Gi"
          }
        }
      },
      "hyperParameters": {
        "default": {
          "batchSize": 128,
          "testBatchSize": 128,
          "learningRateDecay": 0.0002,
          "optimizerConfig": {
            "type": "SGD",
            "learningRate": 0.01,
            "momentum": 0.1
          },
          "schedulerConfig": {
            "schedulerStepSize": 50,
            "schedulerGamma": 0.5,
            "minimumLearningRate": 1e-10
          }
        },
        "configurations": {
          "Master": null,
          "Worker": {
            "batchSize": 500,
            "optimizerConfig": {
              "learningRate": 0.05
            },
            "schedulerConfig": {
              "schedulerStepSize": 2000
            }
          }
        }
      },
      "learningParameters": {
        "totalEpochs": 5,
        "rounds": 1,
        "epochsPerRound": 3,
        "cuda": false,
        "clientsPerRound": 1,
        "dataSampler": {
          "type": "uniform",
          "qValue": 0.07,
          "seed": 42,
          "shuffle": true
        },
        "aggregation": "FedAvg"
      },
      "experimentConfiguration": {
        "randomSeed": [
          89
        ],
        "workerReplication": {
          "Master": 1,
          "Worker": 1
        }
      }
    }
  }
]

Run helm install flearner charts/orchestrator --namespace test -f charts/fltk-values-abel.yaml --set-file orchestrator.experiment=./configs/federated_tasks/example_arrival_config.json,orchestrator.configuration=./configs/example_cloud_experiment.json

Expected behavior/code
The values from the given config should be correctly reflected in the config_dict of fltk/core/distributed/orchestrator.py and in self.config of fltk/core/federator.py after their initialization.
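
As a quick sanity check, the learning parameters below (read straight from the JSON file, not through fltk's parsing code) are what the orchestrator and federator should end up holding:

```python
import json

# Print the learning parameters from the example config for manual comparison;
# this only inspects the JSON file, not fltk's internal configuration objects.
with open("configs/federated_tasks/example_arrival_config.json") as fp:
    job = json.load(fp)[0]

# These values (e.g. clientsPerRound=1, epochsPerRound=3, rounds=1) should show
# up in the orchestrator's config_dict and in the federator's self.config.
print(job["jobClassParameters"]["learningParameters"])
```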

JMGaljaard (Owner) commented

Hi @AbeleMM, thank you for the report; indeed, this should not be the case. I have created a branch, 38-loading-configuration-parameters, which you can pull and use to resolve the issue.

Note, however, that there may still be some issues, as I am busy writing a test suite for configuration object parsing.

AbeleMM (Author) commented May 21, 2022

Thanks for looking into it!

JMGaljaard (Owner) commented

@AbeleMM It should be fully resolved now. In addition, losses are now parsed properly, and a typo that broke instantiation has been fixed.

I have also added an (admittedly somewhat hacky) test case for both data_parallel and federated learning experiments; a rough sketch of the federated check is included below.

Note that the Jinja templates require some changes as well.
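
Roughly, the federated part of that check looks like this (path and key names follow the example config from this issue; the real test uses the project's own configuration loaders, so treat this as an approximation):

```python
import json

import pytest

CONFIGS = [
    "configs/federated_tasks/example_arrival_config.json",
    # a data_parallel example config would be listed here as well
]


@pytest.mark.parametrize("path", CONFIGS)
def test_learning_parameters_come_from_file(path):
    # The values asserted here come from the config file and must be carried
    # through to the orchestrator/federator instead of falling back to the
    # FedLearningConfig defaults.
    with open(path) as fp:
        learning = json.load(fp)[0]["jobClassParameters"]["learningParameters"]
    assert learning["clientsPerRound"] == 1
    assert learning["epochsPerRound"] == 3
```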

JMGaljaard linked a pull request on May 24, 2022 that will close this issue.
AbeleMM (Author) commented May 24, 2022

Got it. Thanks for the update!

AbeleMM closed this as completed on May 24, 2022.