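This is a game-playing experiment based on PPO (Proximal Policy Optimization) reinforcement learning.
- 64-point scoring format: the game evolves from "sudden death" into a "rent-collecting marathon", so the AI must learn to keep control of the board over time.
- Rent acceleration light: when the opponent's scoring efficiency rises, a global warning light turns on, prompting the AI to interfere.
- 5-Tokens economy: unlocks funds so that attack and defense become more aggressive.
- Four-stage curriculum learning: from "kindergarten" to the "Hall of Assassins", automatically evolving the AI's behavior.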
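- environment.py: the physical laws and scoring logic of the Six Ngaau (六爻) universe.
- model.py: the PPO neural network "brain" with a 23-dimensional observation input.
- train.py: the four-stage training loop and curriculum learning logic.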
Any implementation of the Six Ngaau Protocol must adhere to the following startup sequence to ensure synchronized "Awakening":
- Standby State: Infinite loop of text: 目ざめる (wake up).
- Trigger: The initiation command.
- Execution: Instantaneous feedback: HENSHIN! GOOD MORNING PLAYER.
Welcome to the Blue Lock of Logic. If you can't see all 14 million possibilities at T=0, you don't belong here.
This document defines the core logic, resource constraints, and signaling mechanics of the Six Ngaau competitive environment.
- The Space: A fixed 6-bit binary coordinate system corresponding to the 64 Hexagrams.
- Mutual Distinctness: At the start (T=0), the Global State ($S$), Player A's Goal ($G_a$), and Player B's Goal ($G_b$) must be pairwise distinct ($S \neq G_a$, $S \neq G_b$, $G_a \neq G_b$).
- Initial Fog (Turn 1 Silence): During the first turn, all signal feedback (Lights) is forced to the "OFF" state to ensure initial information asymmetry.
- Token Supply: Each player receives +3 Tokens at the start of their turn. Tokens are cumulative and have no upper limit.
- Mandatory Action: A player MUST perform exactly 2 bit-flips per turn.
- Exponential Cost Function: The cost for flipping the same bit multiple times within a single turn scales exponentially. For the $n$-th flip of bit $i$ in a single turn: $$Cost(n) = 2^{n-1}$$
- Turn Execution Sequence:
  - Odd Turns (1, 3, 5...): Sequence follows $A \rightarrow B \rightarrow A \rightarrow B$.
  - Even Turns (2, 4, 6...): Sequence follows $B \rightarrow A \rightarrow B \rightarrow A$.
- Scoring Logic: A player earns 1 point for every bit in the Global State ($S$) that matches their private Goal State ($G$).
- Victory Condition: The first player to accumulate 64 points (through consecutive turn scoring) or reach a full 6-bit match wins.
- Surrender: After completing their mandatory actions each turn, a player may choose the "Surrender" option to concede.
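The rules above can be sketched in a few small Python helpers. This is only an illustrative sketch, not the code in environment.py: every function name here is hypothetical, states are assumed to be stored as integers in the range 0 to 63, and the repeat counter behind $Cost(n)$ is assumed to be tracked per player per turn (the rules do not say whether it is shared between players).

```python
import random

N_BITS = 6  # 6-bit coordinate system, 2**6 = 64 hexagrams


def sample_initial_states(rng):
    """Draw S, G_a and G_b as three pairwise-distinct 6-bit integers."""
    s, g_a, g_b = rng.sample(range(2 ** N_BITS), 3)
    return s, g_a, g_b


def flip_cost(n):
    """Token cost of the n-th flip (1-indexed) of the same bit in one turn."""
    if n < 1:
        raise ValueError("n is 1-indexed")
    return 2 ** (n - 1)


def turn_cost(bits):
    """Total token cost of one player's flips in a single turn.

    Assumption: repeats are counted per player, not shared across players.
    """
    counts = {}
    total = 0
    for bit in bits:
        counts[bit] = counts.get(bit, 0) + 1
        total += flip_cost(counts[bit])
    return total


def flip_bit(state, bit):
    """Return the state with the given bit index (0..5) toggled."""
    return state ^ (1 << bit)


def turn_order(turn):
    """Flip order within a turn: A leads on odd turns, B on even turns."""
    return ["A", "B", "A", "B"] if turn % 2 == 1 else ["B", "A", "B", "A"]


def matching_bits(state, goal):
    """Points earned: number of bit positions where state and goal agree."""
    return N_BITS - bin(state ^ goal).count("1")


def has_won(total_score, state, goal):
    """Victory check: 64 accumulated points or a full 6-bit match."""
    return total_score >= 64 or state == goal
```

For instance, `turn_cost([2, 2])` is 1 + 2 = 3 tokens because the second flip of the same bit costs double, while `turn_cost([0, 1])` is 2 tokens. Flipping the same bit twice also returns the state to where it started, so the cheaper option is usually two different bits.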
The system provides binary feedback based on score progression:
- Trigger (Delta Light): This light activates ONLY if $Score_{current} > Score_{previous}$.
- T=1 Constraint: Since there is no prior score for comparison, the Delta Light remains OFF during the first turn.
- Trigger: This light remains constantly ON if a player's $Score \geq 32$.
- T=1 Constraint: As the score cannot logically reach 32 in the initialization turn, this light remains OFF during the first turn.
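A minimal sketch of the two light rules, written as pure functions of the turn number and the scores. The function names are hypothetical, and `threshold_light` is a placeholder name for the second light, which this section does not name. Both functions force OFF on turn 1, matching the T=1 constraints.

```python
SCORE_THRESHOLD = 32  # threshold taken from the rule above


def delta_light(turn, score_current, score_previous):
    """ON only if the score rose since the previous turn; always OFF on turn 1."""
    if turn == 1:
        return False
    return score_current > score_previous


def threshold_light(turn, score):
    """ON while the score is at least 32; always OFF on turn 1.

    'threshold_light' is a placeholder name, not one used by the source.
    """
    if turn == 1:
        return False
    return score >= SCORE_THRESHOLD
```

Keeping the turn-1 check explicit, rather than relying on a missing previous score, makes the "Initial Fog" behavior visible in the code itself.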
Architect: Gatekeeper-64 Status: Logical Integrity Verified.