Restrict CybORG player protocol #110

Merged
john-b-yang merged 2 commits into CodeClash-ai:main from Muhtasham:feat/cyborg-restricted-protocol on Jun 29, 2026

@Muhtasham Muhtasham commented Jun 25, 2026

Contributor

Summary

  • replace the CybORG native BaseAgent submission surface with a restricted decide(observation, action_space) policy function
  • keep the trusted runtime in charge of the CybORG/PettingZoo environment, action validation, scoring, and result-file handling
  • run submitted policies in isolated per-agent worker processes with startup handshakes, per-decision timeouts, restart-on-timeout behavior, invalid-action clamping, and error details
  • add validation timeouts, crash-score handling for missing result files, updated starter/docs/config, and tests for the restricted protocol
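
The trusted-side behavior listed above (per-decision timeout, fallback on failure, invalid-action handling) can be sketched roughly as follows. This is a minimal illustration, not the code in run_cyborg.py: it launches one short-lived subprocess per decision instead of a persistent per-agent worker with a startup handshake, it assumes the action space can be summarized as a count of discrete actions, and it approximates "clamping" as falling back to a default action. The names `WORKER`, `clamp_action`, `run_decision`, and `DEFAULT_ACTION` are hypothetical.

```python
import json
import subprocess
import sys

DEFAULT_ACTION = 0  # hypothetical fallback action index

# Hypothetical child-side harness: load the submitted source, call decide once,
# and print the chosen action as JSON on stdout.
WORKER = r"""
import json, sys
payload = json.loads(sys.stdin.read())
namespace = {}
exec(payload["policy_source"], namespace)
action = namespace["decide"](payload["observation"], payload["action_space"])
print(json.dumps({"action": action}))
"""


def clamp_action(action, n_actions):
    """Return (usable_action, was_invalid); invalid values fall back to the default."""
    if isinstance(action, bool) or not isinstance(action, int):
        return DEFAULT_ACTION, True
    if 0 <= action < n_actions:
        return action, False
    return DEFAULT_ACTION, True


def run_decision(policy_source, observation, n_actions, timeout_s=2.0):
    """Run one decide() call in a separate process and return (action, status)."""
    payload = json.dumps({
        "policy_source": policy_source,
        "observation": observation,
        "action_space": n_actions,
    })
    try:
        # subprocess.run kills the child if the timeout expires.
        proc = subprocess.run(
            [sys.executable, "-c", WORKER],
            input=payload, capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return DEFAULT_ACTION, "timeout"
    if proc.returncode != 0:
        return DEFAULT_ACTION, "policy_error"
    try:
        raw = json.loads(proc.stdout)["action"]
    except (ValueError, KeyError, TypeError):
        return DEFAULT_ACTION, "policy_error"
    action, invalid = clamp_action(raw, n_actions)
    return action, ("invalid_action" if invalid else "ok")
```

In this sketch the environment step always receives a usable action, so a misbehaving policy only costs that player a default move and a logged status.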

Design Choice For Review

This intentionally makes CybORG more CodeClash-controlled than a native simulator-agent submission.

Instead of letting submitted code instantiate or mutate CybORG BaseAgent objects directly, the arena exposes only a plain policy callback:

def decide(observation, action_space):
    return 0

The tradeoff is deliberate:

  • pro: simulator ownership, scoring, validation, and timeouts stay in trusted arena code
  • pro: submitted code is easier to isolate and failures degrade into logged fallback/default actions instead of corrupting the run
  • con: policies cannot use the full native CybORG agent API directly

@john-b-yang could you sanity-check whether this restricted policy interface is the right CodeClash-compatible shape for CybORG, or whether you would rather expose native CybORG agent classes for more expressivity?

Verification

  • uv run ruff check codeclash/arenas/cyborg/cyborg.py codeclash/arenas/cyborg/runtime/run_cyborg.py tests/arenas/test_cyborg.py
  • uv run pytest -q tests/arenas/test_cyborg.py -> 11 passed
  • uv run pytest -q tests/arenas -> 187 passed
  • uv run pre-commit run --files codeclash/arenas/cyborg/cyborg.py codeclash/arenas/cyborg/runtime/README.md codeclash/arenas/cyborg/runtime/cyborg_agent.py codeclash/arenas/cyborg/runtime/run_cyborg.py configs/examples/CybORG__dummy__r1__s2.yaml docs/reference/arenas/cyborg.md tests/arenas/test_cyborg.py
  • docker build -t codeclash/cyborg -f codeclash/arenas/cyborg/CybORG.Dockerfile .
  • direct Docker adversarial smoke test with invalid-action, infinite-loop, and passive policies: invalid actions were clamped and logged; the looping policy timed out on each decision; the runtime completed and wrote scores
  • uv run python main.py configs/examples/CybORG__dummy__r1__s2.yaml -o /private/tmp/codeclash-cyborg-final.e3pfFk -> two launcher rounds completed, both players validated, all details had status: "ok", steps_completed: 5, policy_errors: 0
  • after adding worker startup handshakes: rebuilt codeclash/cyborg and reran configs/examples/CybORG__dummy__r1__s2.yaml; both launcher rounds completed with policy_errors_total: 0 and invalid_actions_total: 0
  • uv run pytest -q -> 189 passed
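
For reference, the three adversarial policies named in the Docker smoke step could look roughly like the following. These are illustrative stand-ins, not the files used in the run; each would be submitted under the name `decide`, and the function names here are hypothetical. The sketch assumes -1 is outside the valid action range.

```python
def passive_decide(observation, action_space):
    """Passive policy: always returns 0, like the starter callback."""
    return 0


def invalid_decide(observation, action_space):
    """Invalid-action policy: returns an index assumed to be out of range."""
    return -1


def looping_decide(observation, action_space):
    """Infinite-loop policy: never returns, so it can only end via a timeout."""
    while True:
        pass
```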

@Muhtasham Muhtasham requested a review from john-b-yang June 25, 2026 14:39
@john-b-yang john-b-yang merged commit d25fb8a into CodeClash-ai:main Jun 29, 2026
4 checks passed
@john-b-yang

Contributor

Similar response to what I put in #110. For future arenas, I'm slightly in favor of just giving an agent all of the "bot" code that a human participant would normally receive, to keep things fair, but the design choice here is sound and I'll respect it. I think this makes a lot of sense.

Just one note: CybORG seems like a setting where the models' code is not actually going head to head, but where each model plays against an identical adversary?

I think this is OK. Technically, all the other arenas have models going head to head, in that their code is directly competing, so this is somewhat different. Thinking about it, I think it's great that we have this style of competition included in the arena, but I wanted to point it out and make sure I had the correct understanding.
