Async simulation and FedBuff aggregation #197

Merged on Feb 10, 2023 (6 commits)

Contributor

@ewenw ewenw commented Feb 1, 2023

Implements the async simulation mode with FedBuff using device traces without needing a constant arrival parameter, following the min-heap method described here.
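For readers unfamiliar with the approach, here is a minimal sketch of an event-driven async loop built on a min-heap. It is not the code in this PR: `simulate_async`, the `durations` list standing in for a device trace, and the reuse of trace entries via `cycle` are all assumptions made for illustration. The idea it shows is that client completion times come straight from per-client durations, so no constant arrival rate is needed, and the server aggregates whenever the buffer fills.

```python
import heapq
from itertools import cycle


def simulate_async(durations, max_concurrency, buffer_size, total_rounds):
    """Hypothetical sketch: trace-driven async simulation with a FedBuff-style buffer.

    durations: per-client training+communication times (stand-in for a device trace).
    Returns a list of (round, virtual_clock, staleness_of_aggregated_updates).
    """
    trace = cycle(durations)   # assumption: each entry is treated as a fresh client
    heap = []                  # min-heap of (finish_time, seq, start_round)
    clock, round_idx, seq = 0.0, 0, 0
    buffered = []              # staleness of updates waiting to be aggregated
    log = []

    def dispatch():
        # Start one client now; it finishes at clock + its traced duration.
        nonlocal seq
        heapq.heappush(heap, (clock + next(trace), seq, round_idx))
        seq += 1

    for _ in range(max_concurrency):
        dispatch()

    while round_idx < total_rounds:
        # The earliest-finishing client defines the next event on the virtual clock.
        finish_time, _, start_round = heapq.heappop(heap)
        clock = finish_time
        buffered.append(round_idx - start_round)  # rounds elapsed since it started
        if len(buffered) == buffer_size:
            round_idx += 1                        # buffer full: aggregate, new round
            log.append((round_idx, clock, buffered))
            buffered = []
        dispatch()                                # keep max_concurrency clients busy
    return log


log = simulate_async([1.0, 2.0, 3.0, 10.0], max_concurrency=2,
                     buffer_size=2, total_rounds=2)
print(log)  # [(1, 2.0, [0, 0]), (2, 5.0, [1, 0])]
```

In this toy run the slow 10-second client never blocks a round, and the second aggregation includes one update that is a round stale.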

Benchmark comparison for FEMNIST (sync vs. async with FedBuff):

Sync / Async

Round: 100 / 180
Virtual clock: 27,618s / 18,913s
Top_5 eval accuracy: 0.914 / 0.92

Params: 5 clients per round, model = resnet18, max_concurrency=10

The results are consistent with the hypothesis that the async scheduling system increases the number of rounds that can be completed within the same amount of virtual clock time, and improves straggler tolerance. They also show the effect of aggregating stale updates from previous rounds, resulting in more rounds before convergence.
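As a quick sanity check on the claim about rounds per unit of virtual time, the arithmetic can be worked out from the figures reported above. This is only a back-of-the-envelope sketch: the variable and function names are made up for illustration, and it assumes the two runs are comparable end to end even though their final accuracies differ slightly.

```python
# Figures copied from the FEMNIST comparison above.
runs = {
    "sync": {"rounds": 100, "virtual_clock_s": 27_618},
    "async": {"rounds": 180, "virtual_clock_s": 18_913},
}


def seconds_per_round(run):
    """Average virtual-clock seconds spent per aggregation round."""
    return run["virtual_clock_s"] / run["rounds"]


sync_spr = seconds_per_round(runs["sync"])
async_spr = seconds_per_round(runs["async"])
# Fraction of total virtual time saved by the async run.
clock_saving = 1 - runs["async"]["virtual_clock_s"] / runs["sync"]["virtual_clock_s"]

print(f"sync:  {sync_spr:.1f} s/round")                  # sync:  276.2 s/round
print(f"async: {async_spr:.1f} s/round")                 # async: 105.1 s/round
print(f"virtual clock reduction: {clock_saving:.1%}")    # virtual clock reduction: 31.5%
```

So the async run needs 80% more rounds but each round costs well under half the virtual time, for a net reduction of roughly a third.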

Checks

  • I've included any doc changes needed for https://fedscale.readthedocs.io/en/latest/
  • I've made sure the following tests are passing.
  • Testing Configurations
    • Dry Run (20 training rounds & 1 evaluation round)
    • Cifar 10 (20 training rounds & 1 evaluation round)
    • Femnist (20 training rounds & 1 evaluation round)

@ewenw ewenw marked this pull request as ready for review February 1, 2023 14:57
Member

@fanlai0990 fanlai0990 left a comment

Thank you Ewen. It looks good to me. @AmberLJC @IKACE Can you please run some tests? Thanks.

Contributor

IKACE commented Feb 9, 2023

Thank you so much for your contribution Ewen! Just one small thing:

  • The model_zoo config in benchmark/configs/fedbuff_femnist/conf.yml seems to cause an input/output mismatch error in training (see below). I can verify that commenting it out, as benchmark/configs/femnist/conf.yml does, solves the issue.

Contributor Author

ewenw commented Feb 10, 2023

> Thank you so much for your contribution Ewen! Just one small thing:
>
>   • The model_zoo config in benchmark/configs/fedbuff_femnist/conf.yml seems to cause an input/output mismatch error in training (see below). I can verify that commenting it out, as benchmark/configs/femnist/conf.yml does, solves the issue.

Thanks for catching this. I just commented it out.

Contributor

IKACE commented Feb 10, 2023

> Thanks for catching this. I just commented it out.

Great! It looks perfect to me now. Thanks again!!

@fanlai0990 fanlai0990 merged commit 34e07aa into SymbioticLab:master Feb 10, 2023
3 participants