Add EvolvedAttention: A transformer-based neural network strategy for Prisoner's Dilemma #1471
Base branch: dev
Conversation
Force-pushed from 8022169 to a2f8e23
Force-pushed from a2f8e23 to 927371b
Hi @moderouin, thanks for your contribution! Have you run the tests locally? I'm wondering if the new tests are very slow or if the issue is with GitHub's CI.
The tests run within 20 minutes on my PC with Ubuntu 22.04.5 LTS and also pass on Windows and macOS. However, I'm able to reproduce the CI bug within a Docker container using …
It seems like the error was caused by forking processes on Linux. I now use `spawn` instead to avoid deadlocks. @marcharper
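For reference, a minimal sketch of the kind of fix involved (the actual PR change may differ; `square` and the pool size are illustrative):

```python
import multiprocessing


def square(x):
    return x * x


if __name__ == "__main__":
    # "spawn" starts each worker in a fresh interpreter instead of forking,
    # which avoids deadlocks that can occur when the parent holds locks
    # (e.g. in native thread pools) at fork time on Linux.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, range(8)))
```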
How does the strategy perform? Have you run any tournaments?
In a tournament with 10 repetitions against all strategies not classified as `long_run_time`, it ranked first by median score, with a score close to that of EvolvedLookerUp2_2_2.
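For context, a hedged sketch of that kind of experiment with the Axelrod library (the number of turns and the exact filter are assumptions, not necessarily the settings used here):

```python
import axelrod as axl

# Round-robin tournament with 10 repetitions against every strategy
# not classified as long_run_time; settings here are illustrative.
players = [s() for s in axl.filtered_strategies({"long_run_time": False})]
tournament = axl.Tournament(players, turns=200, repetitions=10)
results = tournament.play()
print(results.ranked_names[:5])  # top five by median score
```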
Can you tell us more about how you trained it?
The strategy was trained over multiple rounds of tournaments against all strategies plus the current EvolvedAttention network itself. After each tournament, the network is trained to reproduce the moves of the best-performing strategy from that tournament.
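A minimal sketch of such an imitation step, assuming PyTorch; `model`, `histories`, and `teacher_moves` are hypothetical names, not the PR's actual training code:

```python
import torch.nn.functional as F


def imitation_step(model, optimizer, histories, teacher_moves):
    """One gradient step toward reproducing the best performer's moves.

    histories: (batch, seq) encoded game histories
    teacher_moves: (batch,) moves of the best strategy, 0 = C, 1 = D
    """
    logits = model(histories)                      # (batch, 2)
    loss = F.cross_entropy(logits, teacher_moves)  # behaviour-cloning loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```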
I'm also currently working on a second version with the same architecture, but incorporating an actor-critic approach, with policy and value heads on top of the base network, trained using PPO.
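Roughly along these lines (a hypothetical sketch; `base` and the hidden size are assumptions):

```python
import torch.nn as nn


class ActorCritic(nn.Module):
    """Policy and value heads on top of a shared base network, for PPO."""

    def __init__(self, base: nn.Module, hidden: int = 64):
        super().__init__()
        self.base = base                         # e.g. the attention encoder
        self.policy_head = nn.Linear(hidden, 2)  # logits for C / D
        self.value_head = nn.Linear(hidden, 1)   # state-value estimate

    def forward(self, x):
        h = self.base(x)                         # (batch, hidden)
        return self.policy_head(h), self.value_head(h)
```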
Are there any adjustments I should make to this strategy? @marcharper
No, I just haven't had a chance to review thoroughly, and we require two maintainer reviews. It's not surprising that it does so well if you trained against all the other strategies (it might be overfit), but that's not a blocker to including it. For future training runs, try using just the short-runtime strategies, which has worked fine for the other ML strategies and saves a lot of computation time.
Looks good overall, some minor comments. PTAL and thanks for the contribution!
@@ -29,7 +29,7 @@ class TestMatchOutcomes(unittest.TestCase):
         ),
         turns=integers(min_value=1, max_value=20),
     )
-    @settings(max_examples=5)
+    @settings(max_examples=5, deadline=None)
If these tests are slow, perhaps it's better to lower `max_examples`. @drvinceknight wdyt?
Yeah, even if we go down to 2 that's not a bad idea.
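For concreteness, the suggested tweak would look something like this (the test body is a placeholder, not the real test):

```python
from hypothesis import given, settings, strategies as st


@settings(max_examples=2, deadline=None)  # fewer examples, no per-example deadline
@given(turns=st.integers(min_value=1, max_value=20))
def test_match_outcomes(turns):
    assert 1 <= turns <= 20  # placeholder; the real test plays a match
```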
@@ -28,6 +28,7 @@ deps =
     isort
     black
+    numpy==1.26.4
@drvinceknight is there a reason we've fixed the versions here?
It was done here to avoid issues with NumPy 2.0: #1446
@jsafyan PTAL, you're more of a transformer expert than I.
@marcharper Do you have an idea why this test fails in the last check? It seems like it's not related to my strategy.
Description
This PR introduces the `EvolvedAttention` strategy, a novel approach that uses a transformer neural network with self-attention mechanisms to make decisions. The strategy analyzes the game history through attention patterns to determine optimal moves.
Features
Technical Implementation
Performance Considerations
The neural network is relatively complex but runs efficiently on modern hardware. The strategy balances analytical depth with reasonable computational requirements.
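To make the architecture concrete, here is a hedged sketch of how game history might feed a small self-attention model; all names, sizes, and the token encoding are illustrative assumptions, not the PR's actual implementation:

```python
import torch
import torch.nn as nn


class AttentionPlayer(nn.Module):
    """Toy transformer over joint-history tokens (CC, CD, DC, DD -> 0..3)."""

    def __init__(self, d_model: int = 64, max_len: int = 200):
        super().__init__()
        self.embed = nn.Embedding(4, d_model)      # 4 joint outcomes per turn
        self.pos = nn.Embedding(max_len, d_model)  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 2)          # logits for C / D

    def forward(self, tokens):                     # tokens: (batch, seq)
        positions = torch.arange(tokens.size(1), device=tokens.device)
        h = self.encoder(self.embed(tokens) + self.pos(positions))
        return self.head(h[:, -1])                 # decide from the last turn
```

In a sketch like this, deciding from the final token's representation keeps inference to a single forward pass per move.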