QuantumScribe
OpenEnv RL Environment for LLM Quantum Error Correction

Problem
Quantum processors need decoders that map stabilizer syndromes to Pauli corrections without measuring data qubits directly. PyMatching is a strong classical baseline (minimum-weight perfect matching via sparse blossom on detector graphs). DeepMind's AlphaQubit showed that a transformer-based decoder can beat it on hard cases, but at the cost of large-scale TPU training. The META RL Phase 2 track asked for an OpenEnv environment in which an off-the-shelf LLM learns to decode from verifiable physics rewards.
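A toy sketch of what "syndrome to correction" means, using a 3-qubit bit-flip repetition code instead of the surface code and a hand-written lookup table instead of PyMatching. Both are simplifications for illustration, and CHECKS, LOOKUP, syndrome and decode are hypothetical names. The decoder only ever sees parity-check outcomes, never the data bits.

```python
# Toy example: 3-qubit bit-flip repetition code, not the surface code.
# Each check measures the parity of two neighbouring data qubits (like Z0Z1, Z1Z2),
# so it reveals where an X error sits without reading any data bit itself.
CHECKS = [(0, 1), (1, 2)]

# Lookup-table decoder: the minimum-weight X correction for each syndrome.
LOOKUP = {
    (0, 0): set(),
    (1, 0): {0},
    (1, 1): {1},
    (0, 1): {2},
}


def syndrome(x_errors: set[int]) -> tuple[int, ...]:
    """Parity of X errors on each check's support (1 means the check fired)."""
    return tuple(sum(q in x_errors for q in pair) % 2 for pair in CHECKS)


def decode(s: tuple[int, ...]) -> set[int]:
    """Map a syndrome to an X correction without ever seeing the errors."""
    return set(LOOKUP[s])


if __name__ == "__main__":
    for error in [set(), {0}, {1}, {2}, {0, 1}]:
        correction = decode(syndrome(error))
        residual = error ^ correction  # net X frame after applying the correction
        print(sorted(error), syndrome(error), sorted(correction), sorted(residual))
```

The last case shows why decoding is hard: a weight-2 error on qubits 0 and 1 produces the same syndrome as a single error on qubit 2, so the minimum-weight guess completes a logical flip (residual on all three qubits). Matching decoders make the same minimum-weight bet on much larger graphs.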
Approach
Built QuantumScribe (Qubit-Medic): a FastAPI OpenEnv server over Stim's surface_code:rotated_memory_z circuits with SI1000 noise. The LLM emits a terminal Pauli frame (X_ERRORS / Z_ERRORS), and five independent rewards score it: logical correction, final-round syndrome consistency, Jaccard overlap with PyMatching's frame, format compliance, and a pymatching_beat bonus that pays out only when the model is right and PyMatching is wrong.
The curriculum runs L1_warmup → L2_target → L3_stretch. Training is LoRA SFT on PyMatching labels, followed by GRPO with diversity-focused rollouts (temperature 1.2) so that reward variance within each rollout group does not collapse.
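The terminal frame format can be checked with a small parser. This is a sketch assuming each completion ends with lines such as X_ERRORS: [0, 3] and Z_ERRORS: []; the actual QuantumScribe grammar, and how strictly it treats extra text, may differ, so parse_pauli_frame and format_reward are hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical grammar: "X_ERRORS: [0, 3]" and "Z_ERRORS: []", one line each.
FRAME_LINE = re.compile(r"^(X_ERRORS|Z_ERRORS):\s*\[([^\]]*)\]$")


def parse_pauli_frame(text: str, num_qubits: int) -> dict[str, set[int]] | None:
    """Parse the terminal Pauli frame; return None if it is malformed."""
    frame: dict[str, set[int]] = {}
    for raw in text.strip().splitlines():
        match = FRAME_LINE.match(raw.strip())
        if match is None:
            continue  # non-frame lines (e.g. reasoning) are ignored in this sketch
        pauli = match.group(1)[0]  # "X" or "Z"
        if pauli in frame:
            return None  # repeated label: treated as format spam
        tokens = [t.strip() for t in match.group(2).split(",") if t.strip()]
        if not all(t.isdecimal() and int(t) < num_qubits for t in tokens):
            return None  # non-integer or out-of-range qubit index
        frame[pauli] = {int(t) for t in tokens}
    # Both labels must be present for the frame to count as well-formed.
    return frame if set(frame) == {"X", "Z"} else None


def format_reward(text: str, num_qubits: int) -> float:
    """1.0 for a well-formed frame with both labels, else 0.0."""
    return 1.0 if parse_pauli_frame(text, num_qubits) is not None else 0.0
```

Rejecting repeated labels and out-of-range qubit indices keeps the format reward from being farmed by emitting several candidate frames in one completion.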
At a glance
Logical correction (GRPO): 96.4%
Base Qwen (same prompt): 92.0%
Exact match with PyMatching: 73.4%
PyMatching beat-rate: 0% (disclosed)
Training: Colab T4 · ~3 h
Model: Qwen2.5-3B + LoRA
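The three headline rates can be computed from per-shot eval records, as in this sketch. EvalRecord and summarize are hypothetical names, and the beat-rate denominator (all shots here) is an assumption; it could equally be normalized over only the shots PyMatching gets wrong.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    model_correct: bool       # model's frame preserved the logical observable
    pymatching_correct: bool  # PyMatching's frame preserved it
    exact_match: bool         # model frame identical to PyMatching's frame


def summarize(records: list[EvalRecord]) -> dict[str, float]:
    """Fraction of shots for each headline metric (denominator: all shots)."""
    n = len(records)
    if n == 0:
        raise ValueError("no eval records")
    return {
        "logical_correction": sum(r.model_correct for r in records) / n,
        "exact_match_pymatching": sum(r.exact_match for r in records) / n,
        "pymatching_beat_rate": sum(
            r.model_correct and not r.pymatching_correct for r in records
        ) / n,
    }
```

Logical correction can exceed exact match because two different frames that differ by a stabilizer are still logically equivalent.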
Tech decisions
Five independent verifiable rewards
A single scalar reward is easy for GRPO to game; decomposing it into independent Stim/PyMatching checks blocks empty-output collapse, PyMatching mimicry, and format spam by construction.
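Three of the reward components can be sketched as independent functions over qubit-index sets. The check layout, the fractional (rather than all-or-nothing) consistency score, and the empty/empty Jaccard convention are assumptions, and the function names are hypothetical.

```python
def syndrome_consistency(
    frame: set[int], checks: list[tuple[int, ...]], observed: list[int]
) -> float:
    """Fraction of checks whose parity under the predicted frame matches the observed bit."""
    predicted = [len(frame.intersection(support)) % 2 for support in checks]
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(checks)


def jaccard(pred: set[int], ref: set[int]) -> float:
    """Overlap between the model's frame and PyMatching's frame."""
    if not pred and not ref:
        return 1.0  # both predict "no errors": treated as full agreement
    return len(pred & ref) / len(pred | ref)


def pymatching_beat(model_correct: bool, pymatching_correct: bool) -> float:
    """Bonus only when the model is right and PyMatching is wrong."""
    return 1.0 if model_correct and not pymatching_correct else 0.0
```

An empty frame scores 0 on syndrome consistency whenever any check fired, and copying PyMatching maximizes Jaccard overlap but never earns the pymatching_beat bonus; that separation is what the decomposition relies on.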
GRPO on top of offline-label SFT
SFT alone is capped at imitating PyMatching; RL needs on-policy rollouts against syndromes sampled from the environment to sharpen format and logical correction.
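GRPO scores each rollout relative to the other rollouts sampled for the same prompt. A minimal sketch of that group-normalized advantage, using the population standard deviation and a small epsilon (implementations differ on both), shows why identical rewards across a group give no learning signal.

```python
import statistics


def group_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """GRPO-style advantages: normalize each rollout's reward within its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some trainers use sample std
    return [(r - mean) / (std + eps) for r in rewards]
```

If every completion in a group earns the same reward, every advantage is zero, which is why sampling at a higher temperature (1.2 in this project) to keep rollouts diverse matters.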
OpenEnv HTTP contract
Same submission pattern as InferenceGym: typed reset/step, a deployable Docker Space, and a trainer that can swap between a local and a remote client.
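The sketch below is not OpenEnv's actual client API. It only illustrates the pattern of a typed reset/step contract in which the trainer depends on an interface, so an in-process environment and an HTTP client for the deployed Space are interchangeable. Class names, payload shapes, the prompt text, and the /reset and /step paths are all assumptions.

```python
from __future__ import annotations

import json
import urllib.request
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class Observation:
    prompt: str  # syndrome rendered as text for the LLM (hypothetical shape)


@dataclass
class StepResult:
    observation: Observation
    reward: float
    done: bool


class EnvClient(Protocol):
    def reset(self) -> Observation: ...
    def step(self, action: str) -> StepResult: ...


class LocalClient:
    """In-process stand-in; a real env would sample a Stim syndrome on reset."""

    def reset(self) -> Observation:
        return Observation(prompt="SYNDROME: 0 1 1 0")  # hypothetical prompt

    def step(self, action: str) -> StepResult:
        ok = "X_ERRORS:" in action and "Z_ERRORS:" in action  # toy reward
        return StepResult(self.reset(), 1.0 if ok else 0.0, done=True)


class RemoteClient:
    """Same interface over HTTP; endpoint paths and JSON shapes are assumptions."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")

    def _post(self, path: str, payload: dict) -> dict:
        req = urllib.request.Request(
            self.base_url + path,
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    def reset(self) -> Observation:
        return Observation(**self._post("/reset", {}))

    def step(self, action: str) -> StepResult:
        data = self._post("/step", {"action": action})
        return StepResult(Observation(**data["observation"]), data["reward"], data["done"])


def rollout(env: EnvClient, policy: Callable[[str], str]) -> float:
    """One-step episode: the trainer only depends on the EnvClient interface."""
    obs = env.reset()
    return env.step(policy(obs.prompt)).reward
```

Swapping LocalClient for RemoteClient("https://...") changes nothing in rollout, which is the property that lets the same trainer run against a local server or the deployed Space.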
Stim + PyMatching (not a custom simulator)
Aligns with the AlphaQubit/Willow literature and provides logical_correction ground truth that the model cannot fake.
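logical_correction can be scored without trusting the model: the simulator records whether the logical observable actually flipped, and the model's frame only has to predict that bit. A sketch, assuming the observable is the product of Z on a hypothetical logical_support set of data qubits, so X errors there flip it. In a Stim-backed environment the actual flip would come from the sampler's observable output (e.g. compile_detector_sampler().sample(..., separate_observables=True)), which the model never sees.

```python
def predicted_observable_flip(x_frame: set[int], logical_support: set[int]) -> int:
    """Parity of predicted X flips on the qubits that make up the logical Z observable."""
    return len(x_frame & logical_support) % 2


def logical_correction(x_frame: set[int], logical_support: set[int], actual_flip: int) -> float:
    """1.0 if the frame's predicted observable flip matches the simulator's flip bit."""
    return 1.0 if predicted_observable_flip(x_frame, logical_support) == actual_flip else 0.0
```

This mirrors how PyMatching itself is scored: it predicts observable flips, and a shot counts as correct when the prediction matches the sampled flip.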
Honest pymatching_beat reporting
The primary eval shows the model matching PyMatching rather than beating it; reporting the 0% beat-rate keeps portfolio claims defensible for reviewers.