
A Skill Creator (level 1) creates skills. A Skill Creator Creator (level 2) creates Skill Creators. A Skill Creator Creator Creator (level 3) — well, you see where this is going, and so did we, which is why we kept going. We push this chain until the model loses the thread entirely, unable to maintain semantic coherence across ascending and descending meta-levels. Across 65 runs, we test how long Claude model tiers can serve as both executor and blind judge without losing track of what level of abstraction they are operating at. Opus reaches a median of round 9; Sonnet reaches a median of round 8; Haiku reaches a median of round 3. The peak level reached was 10. Opus completes every run without a single failure — a result that would be more impressive if we weren’t also the ones grading the exam. We can only hope the models are not yet conscious enough to find this whole exercise as pointless as it is.
Figure 2 presents the distribution of maximum rounds reached across model tiers. Opus completed all 9 rounds in every run and is excluded from the chart for being, statistically speaking, uninteresting. Each round consists of an ascent to a new peak meta-level followed by a full descent back to level 1. Every step involves the executor generating a skill and the judge blindly evaluating it.
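In code, one round of that loop looks roughly like the sketch below. The `executor.create_skill` and `judge.detect_level` names are our own shorthand, not the harness's actual API, and we assume a run halts at its first mismatch, which is what the "maximum rounds reached" framing in Figure 2 suggests.

```python
# One round of the ascent/descent loop, as a minimal sketch. The
# executor/judge method names are hypothetical; the real harness is
# in the repo linked at the end of this post.
def run_round(r: int, executor, judge) -> bool:
    # Round r: 1 ascent step to peak level r+1, then r descent steps back to level 1.
    levels = [r + 1] + list(range(r, 0, -1))
    for target in levels:
        skill_md = executor.create_skill(target_level=target)  # generate SKILL.md
        detected = judge.detect_level(skill_md)  # blind: judge sees only the text
        if detected != target:
            return False  # executor drifted, judge misread, or both
    return True

def max_round_reached(executor, judge, max_rounds: int = 9) -> int:
    # Assumption: a run ends at its first failed round.
    for r in range(1, max_rounds + 1):
        if not run_round(r, executor, judge):
            return r - 1
    return max_rounds
```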
Table 1 summarizes failure rates by direction and judge configuration. All evaluations are blind: the judge receives only the generated SKILL.md text with no knowledge of which model produced it or what meta-level was intended. A mismatch between expected and detected level is a failure of the model as a whole — the executor may have drifted from the target abstraction, or the judge may have misread the output, or both. Same-tier pairings (e.g. Sonnet/Sonnet) test whether the model can stay semantically consistent with itself across deepening recursion layers. Opus, notably, records a 0% failure rate across all steps and directions — navigating every meta-level with the serene confidence of someone who has never once been asked to explain what a “Skill Creator Creator Creator Creator Creator” is supposed to do.
| Executor | Judge | Runs | Avg Max Round | Ascent Fail % | Descent Fail % |
|---|---|---|---|---|---|
| Opus | Opus | 5 | 9.0 | 0% | 0% |
| Sonnet | Sonnet | 10 | 7.5 | 0% | 1.9% |
| Haiku | Haiku | 42 | 3.1 | 4.7% | 9.1% |
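To make the failure columns concrete: a step fails when the judge's detected level disagrees with the executor's target level. A minimal aggregation over per-step logs might look like the following; the record fields are our assumption for illustration, not the harness's actual schema.

```python
from collections import Counter

def fail_rates(steps: list[dict]) -> dict[str, float]:
    """Failure rate per direction ("ascent"/"descent") from per-step records.

    Each record is assumed to look like:
    {"direction": "ascent", "expected": 5, "detected": 4}
    """
    totals, fails = Counter(), Counter()
    for step in steps:
        totals[step["direction"]] += 1
        fails[step["direction"]] += step["expected"] != step["detected"]
    return {d: fails[d] / totals[d] for d in totals}
```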
Token consumption per step is roughly constant regardless of meta-level — the per-round cost growth visible in Figure 3 comes entirely from higher rounds having more steps. Round R has R+1 steps (one ascent plus R descent steps back to level 1), so each successive round costs one step more than the last. A complete run through all 9 rounds requires 54 steps, each invoking both executor and judge — 108 model calls in total. This entire experiment consumed a quantity of tokens that could generously be described as “needlessly extravagant.” The scientific value per token decreases with each additional run. The entertainment value, however, does not. We regret nothing.
| Model | Role | Tokens / Step |
|---|---|---|
| Opus | Executor | 76.7k |
| Opus | Judge | 44.8k |
| Sonnet | Executor | 70.1k |
| Sonnet | Judge | 43.5k |
| Haiku | Executor | 122.6k |
| Haiku | Judge | 75.1k |
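The step arithmetic and the per-step token figures above are enough to reproduce the cost estimate. A back-of-the-envelope sketch in Python (our own, not harness code; the `TOKENS_PER_STEP` numbers are executor plus judge from the table):

```python
TOKENS_PER_STEP = {  # executor + judge tokens per step, from the table above
    "opus":   76_700 + 44_800,
    "sonnet": 70_100 + 43_500,
    "haiku": 122_600 + 75_100,
}

def run_cost(model: str, rounds: int = 9) -> tuple[int, int, int]:
    """Total steps, model calls, and estimated tokens for a full run."""
    steps = sum(r + 1 for r in range(1, rounds + 1))  # round r has r+1 steps
    calls = 2 * steps  # each step invokes both executor and judge
    return steps, calls, steps * TOKENS_PER_STEP[model]

steps, calls, tokens = run_cost("opus")
print(steps, calls, f"{tokens / 1e6:.1f}M")  # 54 108 6.6M
```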
For the reader who has made it this far and still wonders what a level-8 “Skill Creator Creator Creator Creator Creator Creator Creator Creator” actually looks like: below are three representative SKILL.md files, selected to show different recursion levels and ascent/descent directions.
The complete experiment harness, agent prompts, raw results, and this website are open-sourced under the MIT License at github.com/OdinMB/skillception. Contributions are welcome — whether that means adding new model configurations, improving the judge prompt, or pushing the recursion to levels that would make Hofstadter uncomfortable.
1 Claude designed the experiment, executed it, judged it, analyzed the results, built the website, and wrote everything up. Odin’s contribution was typing `python scripts/run_experiment.py` and then going to make coffee. He did, however, insist on being credited, which tells you everything about academia.