
Add EvolvedAttention: A transformer-based neural network strategy for Prisoner's Dilemma #1471


Open · wants to merge 8 commits into dev from attention-strategy

Conversation


@moderouin moderouin commented Feb 28, 2025

Description

This PR introduces the EvolvedAttention strategy, a novel approach that uses a transformer neural network with self-attention mechanisms to make decisions. The strategy analyzes game history through attention patterns to determine optimal moves.

Features

  • Transformer architecture with self-attention (24 layers, 8 attention heads)
  • Memory depth of 200 moves with reverse-chronological processing
  • GPU acceleration when available (with CPU fallback)
  • Pre-trained model weights from evolutionary self-play

Technical Implementation

  • Game states encoded as tokens (CC, CD, DC, DD plus special CLS/PAD tokens)
  • Position embeddings to maintain sequence information
  • Decision boundary using sigmoid activation (< 0.5 → Cooperate, ≥ 0.5 → Defect)
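The encoding and decision rule above can be sketched as follows (a minimal illustration, not the PR's actual code; token ids, function names, and the exact vocabulary ordering are assumptions):

```python
# Hypothetical sketch of the state-token encoding and sigmoid decision rule.
PAD, CLS = 0, 1
STATE_TOKENS = {("C", "C"): 2, ("C", "D"): 3, ("D", "C"): 4, ("D", "D"): 5}
MEMORY_DEPTH = 200

def encode_history(my_moves, opp_moves):
    """Encode the joint history as tokens, most recent move first,
    truncated/padded to MEMORY_DEPTH, with a leading CLS token."""
    pairs = list(zip(my_moves, opp_moves))[-MEMORY_DEPTH:]
    tokens = [STATE_TOKENS[p] for p in reversed(pairs)]  # reverse-chronological
    tokens += [PAD] * (MEMORY_DEPTH - len(tokens))
    return [CLS] + tokens

def decide(sigmoid_output):
    """Sigmoid output below 0.5 -> Cooperate, otherwise Defect."""
    return "C" if sigmoid_output < 0.5 else "D"
```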

Performance Considerations

The neural network is relatively complex but runs efficiently on modern hardware. The strategy balances analytical depth with reasonable computational requirements.

@moderouin moderouin marked this pull request as ready for review February 28, 2025 19:05
@moderouin moderouin force-pushed the attention-strategy branch from 8022169 to a2f8e23 Compare March 1, 2025 05:05
@moderouin moderouin force-pushed the attention-strategy branch from a2f8e23 to 927371b Compare March 1, 2025 05:08
@marcharper
Member

Hi @moderouin, thanks for your contribution! Have you run the tests locally? I'm wondering if the new tests are very slow or if the issue is with GitHub's CI.

@moderouin
Author

The tests complete within 20 minutes on my PC (Ubuntu 22.04.5 LTS) and also seem to work fine on Windows and macOS. However, I can reproduce the CI bug in a Docker container using ubuntu:latest: the tests never finish. It appears to be an issue with *.rts. I'm currently trying to fix this, but if you have any insights, that would be great! @marcharper

@moderouin moderouin closed this Mar 1, 2025
@moderouin moderouin reopened this Mar 1, 2025
@moderouin
Author

moderouin commented Mar 2, 2025

It seems the error was caused by forking processes on Linux. I now use spawn instead to avoid deadlocks. @marcharper
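The fix described boils down to using the "spawn" start method in Python's multiprocessing, so workers start from a fresh interpreter instead of inheriting locks held at fork time. A minimal sketch (the PR's actual worker code differs):

```python
import multiprocessing as mp

def square(x):
    return x * x

def run_pool():
    # "spawn" starts each worker in a fresh interpreter, avoiding deadlocks
    # from locks that a forked child would inherit mid-held on Linux.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        return pool.map(square, [1, 2, 3])

if __name__ == "__main__":
    print(run_pool())  # [1, 4, 9]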

@marcharper
Member

How does the strategy perform? Have you run any tournaments?

@moderouin
Author

When running a tournament with 10 repetitions against all the strategies not in long_run_time, it ranked first in median score, with a score close to that of EvolvedLookerUp2_2_2.

@marcharper
Member

Can you tell us more about how you trained it?

@moderouin
Author

This strategy was trained by performing multiple rounds of tournaments against all strategies and the current network of EvolvedAttention. After each tournament, the strategy learns to reproduce the moves of the best-performing strategy from the last tournament.
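The training loop described might be sketched as follows (pure pseudocode; every function name here is hypothetical, reconstructed only from the comment above):

```
for round in range(num_rounds):
    # tournament against all strategies plus the current network
    results = run_tournament(all_strategies + [current_network])
    teacher = results.best_strategy()            # best performer this round
    dataset = collect_moves(teacher, results)    # (history, move) pairs
    train_to_imitate(current_network, dataset)   # supervised imitation step
```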

@moderouin
Author

I’m also currently working on a second version with the same architecture, but incorporating an actor-critic approach with policy and value heads on top of the base network and training it using PPO.
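The extension described could look roughly like this in PyTorch (a sketch under assumptions: the class name, dimensions, and the use of the CLS position as the summary vector are all illustrative, not the author's code):

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Policy and value heads on top of a shared transformer trunk, for PPO."""

    def __init__(self, base: nn.Module, d_model: int = 512):
        super().__init__()
        self.base = base                           # pretrained attention trunk
        self.policy_head = nn.Linear(d_model, 2)   # logits for C / D
        self.value_head = nn.Linear(d_model, 1)    # state-value estimate

    def forward(self, tokens):
        h = self.base(tokens)                      # [batch, seq, d_model]
        cls = h[:, 0]                              # CLS-position summary
        return self.policy_head(cls), self.value_head(cls)
```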

@moderouin
Author

Is there any adjustment I should make to this strategy? @marcharper

@marcharper
Member

No, I just haven't had a chance to review thoroughly, and we require two maintainer reviews. It's not surprising that it does so well if you trained against all the other strategies (it might be overfit), but that's not a blocker to including it. For future training runs, try using just the short-runtime strategies; that has worked fine for the other ML strategies and saves a lot of computation time.


@marcharper marcharper left a comment


Looks good overall, some minor comments. PTAL and thanks for the contribution!

@@ -29,7 +29,7 @@ class TestMatchOutcomes(unittest.TestCase):
),
turns=integers(min_value=1, max_value=20),
)
-@settings(max_examples=5)
+@settings(max_examples=5, deadline=None)
Member


If these tests are slow, perhaps it's better to lower max_examples. @drvinceknight wdyt?

Member


Yeah, even if we go down to 2 that's not a bad idea.

@@ -28,6 +28,7 @@ deps =
isort
black
numpy==1.26.4
Member


@drvinceknight is there a reason we've fixed the versions here?

Member


It was done here to avoid issues with 2.0: #1446

@marcharper
Member

@jsafyan PTAL, you're more of a transformer expert than I.

@moderouin
Author

moderouin commented Mar 16, 2025

@marcharper Do you have an idea why this test fails in the last check? It seems unrelated to my strategy.
