360° Data Athlete: open-sourcing my AI coach experiment (Claude Code + intervals.icu)

tobiasz · 26 May 2026 07:07

A lot of people are building AI coaches right now. I have been doing the same for my own training over the past months and decided to put it on GitHub under MIT, in case anyone here wants to use it or pull ideas from it.

intervals.icu is the heart of the data layer. The system reads wellness, activities, events, athlete feedback, injuries, zones and weather from it, plans the day, and pushes the workout back. Optional inputs from Strava (shoe advisor, title sync), Garmin (running dynamics via the garminconnect library, credentials-based since no OAuth is available), and Telegram are wired in but not required.

Sport-science backbone

The plan is opinionated. Polarized 80/20 in the Seiler tradition (Seiler 2010; Stoeggl & Sperlich 2014/2015). The Z3 grey zone is avoided unless the planner explicitly justifies it. HRV gates intensity: a red HRV signal acts as a soft veto on hard days. No advance planning of intensity sessions. The day is decided in the morning from HRV, sleep, feel, weather, and the prior load. Deload is triggered on evidence via a mesoLoadTrend signal and committed for seven days. When triggers fire (zone drift, new fitness state), DFA-alpha1 is used for zone re-validation (Rogers et al 2021; Doerr et al 2021).

Technical architecture

Claude Code is used as a general-purpose agent harness, not as a coding assistant. 13 specialised sub-agents collaborate per training day: a planner that picks session count, type, duration and intensity; three workout specialists (endurance, complementary, ninja-style strength and agility) that detail each session in sequence, each reading the prior siblings’ outputs; a data-scientist agent that owns the regressions; a coach-analyst for post-activity write-ups; a mental-coach; a video-analyst (via Gemini, optional, for running and strength form); a plan-validator; a config-auditor and a config-fixer for drift detection; and two clinical-adjacent consultants (physio, sports-ortho) with an explicit no-license disclaimer.

Two architectural choices ended up mattering more than I expected:

Two-stage plan validation. First a mechanical validate_plan.py catches hard constraint violations (missing fields, illegal zone IDs, conflicting durations). Then a semantic plan-validator subagent reads the planner output cold and flags substantive inconsistencies (intensity stacked too tight, recovery missed after a hard day). Either alone misses too much.
HRV forecast as per-athlete linear regression. Instead of a black-box predictor, the data-scientist agent fits a per-athlete linear regression on six months of (daily_load, next-morning delta HRV%). Tomorrow’s drop is classified expected, expected-large, or unexpected, and the planner reads that verdict. With a clean six-month dataset this beats any pretrained model I tested, and it stays fully explainable.

Where domain knowledge stays the bottleneck

The models generate clean structures, but the sport-science filter has to sit in the config and in the operator, not as a hope at the end. Without a trainer background and years of actual polarized training, individual recommendations would have been plausible and wrong on the domain side. That is also why the project ships with the disclaimer it does: experimental, not for unsupervised training, no warranty, no support, no audit.

Open source

MIT license. Built as a Claude Code plugin: /plugin marketplace add airbone42/360-data-athlete, then install the framework. Per-athlete facts live in config/. The full architecture, slash commands, glossary, and references are in the README.

Repo: https://github.com/airbone42/360-data-athlete

Happy to discuss design choices, where the system has surprised me, and where it has failed. Feedback from the intervals.icu community very welcome.

matteoagosti · 4 June 2026 20:07

This is great and the “Claude Code as an agent harness, not a coding assistant” framing is exactly the part most people miss.

I came at the same idea from the opposite end. Instead of a team of specialists, I kept it to a single coach persona with a handful of plain workflows (morning check, daily debrief, weekly review, plan-week) over markdown memory files; deliberately minimal, the kind of thing you can read end-to-end in an afternoon (and maybe add to your Obsidian valut). Basically the stripped-down counterpart to yours: GitHub - matteoagosti/coach-harness: An open template that turns an AI coding agent (Claude Code / Gemini CLI / Codex) into a personal endurance coach on top of intervals.icu. · GitHub .

Two genuine questions, since you’re further down this road than I am:

With 13 sub-agents running daily, how’s the cost/latency and coherence? Do the specialists ever pull the plan in different directions, or does the plan-validator reliably catch that?
Really like that you used per-athlete linear regression for the HRV forecast instead of black-box ML. How many weeks of data before it got actually useful?

Either way, nice work shipping this!

tobiasz · 8 June 2026 12:28

Thanks — and your harness looks like a genuinely useful counterpoint to the heavier architecture.

On your two questions:

Cost/latency and coherence: The 13 agents aren’t all firing daily — on a typical morning it’s more like 4–6 (planner + the relevant workout specialists + validator), with the clinical consultants, auditor, and video analyst only triggered on demand. Running this under a standard Claude subscription is fine; on the Max plan it’s barely noticeable, and I’d expect it to fit a Pro plan too. Model choice matters more than agent count: Sonnet for the substantive agents, the mechanical validation step can run on Haiku.

Coherence is mostly solved structurally: each specialist reads its siblings’ prior outputs before writing, so they build on rather than compete with each other. The plan-validator catches the remaining drift cases — stacked intensity, missed recovery windows — and it does catch them reliably, but only because of the two-stage approach. The mechanical pass alone missed domain-level conflicts; the semantic pass is what catches the subtle ones.

HRV regression: Directionally useful around 3–4 weeks if you have enough load variation in that window. Stable and actually actionable closer to 8–12 weeks. The 6-month figure in the post is the “now outperforming any pretrained model I tested” threshold, not a prerequisite for value.

The real bottleneck is data quality, not quantity. A clean 8 weeks beats a noisy 6 months every time. Missing HRV readings and unflagged sick days hurt the coefficients more than anything else.