The AI coach we (self-coached athletes) want

I hear ya. Failed experiment maybe :stuck_out_tongue: but ima keep it for my own use if so.

Apologies for the ownership mix-up with the thread; it wasn’t intentional, I just meant that I’m the OP. I’ll be more mindful in future, I didn’t mean to upset anyone.

I haven’t looked much because I am too busy (with work), but we have to realize that you were looking for – and are trying to provide – a very short-term solution, while I am trying to have all of us work out a big goal and a much larger change of direction. There is a mismatch, but that has nothing to do with falling short or failing; it’s part of the process 😉

No worries about the “ownership”, we got that straight;-)

1 Like

Hi,

Over the course of the last weeks (actually months) I created my AI coaching system.

My goal for 2026 is riding brevets, and I did not find a way to adjust goals in platforms like TrainingPeaks or Xert to match these longer rides and to use durability as a control measure.

Description attached …


Randonneur Performance System (RPS) — Overview

1. Introduction

The Randonneur Performance System (RPS) is a modular training and analytics framework designed for long-distance cycling and brevet preparation (200, 400, 600+ km).
Built on Intervals.icu data, RPS integrates endurance-science principles - durability, metabolic efficiency, and fatigue resistance - into a closed, data-driven coaching process.

Each module operates as an autonomous agent with a defined input/output interface. Together, they form a continuous data and decision workflow that turns training data into evidence-based plans and measurable progress.


2. Core Control Metrics

RPS relies on quantitative control variables drawn from endurance physiology.
Its governing objective is durability - maintaining power and metabolic stability under long-term load.

Primary control metrics

  • TSS / kJ targets – Weekly load and energetic output
  • TSB, CTL, ATL – Training load and recovery balance
  • Pa:Hr, EF, FIR, DI – Efficiency and fatigue-resistance indicators
  • VO₂-TiZ / SST-TiZ – Distribution of high-intensity and threshold work

These KPIs drive progression, deload timing, and phase transitions instead of relying on fixed calendar intervals.


3. System Components

| Component | Primary Function | Output Type |
| --- | --- | --- |
| vCoach | Strategic macro- and mesocycle control | Annual Training Plan, 4-week block directives, weekly feed-forward notes |
| Planner | Operational planning and plan-to-execution management | 4W plans, WeekOps, feedback reports |
| Data Gateway | Data ingestion and validation from Intervals.icu | Markdown trend and audit files |
| Workout Builder | Structured workout export for Intervals.icu | .json workout files |

All components communicate through standardized Markdown and JSON artifacts to ensure deterministic data flow and traceability.


4. Component Details

4.1 vCoach

The vCoach is the strategic and analytical core of RPS.
It maintains the Master Annual Training Plan (ATP), which defines the seasonal macrostructure—phases, key events, and target load ranges.
From this plan, vCoach derives rolling four-week block directives and a weekly feed-forward report summarizing:

  • Current performance and recovery trends
  • Phase focus and KPI gates
  • Tactical priorities for the upcoming week

Each 4-week directive specifies block type, TSS/kJ bands, and focus metrics. Feed-forward reports enable the Planner to align the next week’s sessions with current form and fatigue status.


4.2 Planner

The Planner translates vCoach’s strategic directives into executable training schedules.
Operating under strict template and validation rules, it produces:

  • 4-Week plans (initial or updated)
  • Weekly operational plans (WeekOps)
  • Coach-feedback reports comparing PLAN vs ACTUAL data

The Planner enforces backward traceability—keeping Intervals.icu activity names, integrating adherence data, and attaching KPI summaries.
No heuristic modifications of content or structure are permitted.


4.3 Data Gateway

The Data Gateway is RPS’s validation layer between Intervals.icu and internal modules.
It reads exported CSVs, verifies schema integrity, and computes standard aggregates such as VO₂-TiZ, SST-TiZ, Durability Index (DI), and FIR ratios.
Three Markdown outputs are generated each week:

  1. Activities ACTUAL – All raw sessions
  2. Trend Report – Four- to eight-week aggregation
  3. Performance Trend – Power-duration and durability metrics

These standardized files re-enter the Planner and vCoach pipeline as validated factual inputs.
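As an illustration of the Gateway’s aggregation step, here is a minimal Python sketch. The CSV layout and column names are assumptions for the example, not the actual Intervals.icu export schema; FIR uses the MMP 300 s / MMP 1200 s ratio the author describes later in the thread.

```python
# Minimal Data Gateway aggregation sketch. Column names and the CSV
# layout are illustrative assumptions, not the real export schema.
import csv
import io

def compute_fir(mmp_300s: float, mmp_1200s: float) -> float:
    """FIR = 5-min power / 20-min power (ratio given later in the thread)."""
    return mmp_300s / mmp_1200s

SAMPLE = """date,mmp_300s,mmp_1200s
2025-06-01,310,270
2025-06-08,305,272
"""

def weekly_fir(csv_text: str) -> list:
    """Validate rows and compute one FIR value per exported activity."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return [round(compute_fir(float(r["mmp_300s"]), float(r["mmp_1200s"])), 3)
            for r in rows]

print(weekly_fir(SAMPLE))  # [1.148, 1.121]
```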


4.4 Workout Builder

The Workout Builder converts validated weekly workout plans into Intervals.icu-ready JSON.
It parses the Planner’s Markdown structure line-by-line—without altering durations or intensities—and outputs a strictly formatted JSON array ready for import.
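A minimal sketch of that conversion, assuming a simplified step-line format and output fields; the real Intervals.icu workout JSON is richer than this:

```python
# Line-by-line Markdown-to-JSON sketch. The "- 15m 60% Label" step format
# and the output field names are illustrative assumptions.
import json
import re

STEP = re.compile(r"-\s*(\d+)m\s+(\d+)%\s*(.*)")

def parse_workout(md_text: str) -> list:
    """Convert step lines to JSON-ready dicts without altering values."""
    steps = []
    for line in md_text.splitlines():
        m = STEP.match(line.strip())
        if m:
            minutes, pct, label = m.groups()
            steps.append({"duration_s": int(minutes) * 60,
                          "power_pct_ftp": int(pct),
                          "label": label.strip() or "Step"})
    return steps

md = """# Tuesday SST
- 15m 60% Warmup
- 20m 90% SST
- 10m 55% Cooldown
"""
print(json.dumps(parse_workout(md), indent=2))
```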


5. Durability as a System Goal

Durability—defined as the maintenance of sustainable power and efficiency under fatigue—is the central outcome variable.
Using DES (Durability Elite System) metrics such as DI ≥ 1.00, FIR 1.10–1.15, and Pa:Hr ≤ 5 %, RPS evaluates readiness to progress from Base to Build or Specificity phases.
This KPI-driven gating ensures that workload progression follows physiological adaptation, not fixed calendar timing.
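The gate can be sketched as a plain predicate over the three thresholds quoted above (function and argument names are mine):

```python
# KPI gate sketch using the thresholds stated in the text:
# DI >= 1.00, FIR within 1.10-1.15, Pa:Hr <= 5 %. Names are illustrative.
def base_to_build_gate(di: float, fir: float, pa_hr_pct: float) -> dict:
    checks = {
        "DI >= 1.00": di >= 1.00,
        "FIR in 1.10-1.15": 1.10 <= fir <= 1.15,
        "Pa:Hr <= 5%": pa_hr_pct <= 5.0,
    }
    # The phase transition opens only when every KPI check passes.
    checks["gate_open"] = all(checks.values())
    return checks

print(base_to_build_gate(di=1.02, fir=1.12, pa_hr_pct=4.1))
```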


6. System Architecture and Feedback Loops

RPS functions as a multi-level feedback system operating on weekly and block timescales.

Operational Weekly Loop

  1. Data Gateway imports the latest Intervals.icu data (actual + trends).
  2. Planner compiles the Coach Feedback file—evaluating adherence, TSS variance, and KPI status.
  3. vCoach Feed-Forward integrates this feedback, interprets trends, and issues the next week’s guidance.
  4. The Planner uses the feed-forward to generate the upcoming WeekOps and workouts.
  5. Executed sessions return to step 1 via the Gateway, closing the loop.

Block-Level Loop

At the end of each four-week cycle:

  1. The vCoach reviews aggregated KPI and trend data.
  2. Based on performance and durability status, it creates or updates the next 4-Week Block Directive for the Planner.
  3. The new block defines target load ranges, training focus, and progression or deload strategy.
  4. The Planner implements the block through its regular weekly cycles.

This tiered feedback architecture keeps the system adaptive, aligning week-to-week execution with long-term strategic goals.


2 Likes

This sounds fantastic, is it “available” somewhere? Could the framework be taken and the specifics be adjusted to bias towards the metrics suitable for adjacent event types, say multi-day mountain bike events or bikepacking races?

Cool!

Could you make this available as a “coach” in i.icu?
(As proposed in the opening post.)

Maybe @eva could help with such an interfacing challenge? :wink:

1 Like

Hi,

@yokuha, @Ben_J_Marshall regarding “availability”: for the time being it is only available in my ChatGPT account and my private GitHub repo.

BUT I am happy to help and support the endeavor of making it, or something similar, publicly available.

@Ben_J_Marshall I asked my (LLM-)coach about the KPIs for adjacent event types and if the framework can be used … short answer: yes.

Content is AI generated and must be taken with a grain of salt …


I’ll explicitly explain how this is realized in RPS, and why the KPI & gate logic being stored in external knowledge files makes the system modular and extensible.

Metrics for Adjacent Ultra-Endurance Event Types

(Multi-Day MTB Events & Bikepacking Races)

1. Design Principle: One System, Different Failure Modes

The Randonneur Performance System (RPS) is built around a core assumption:

Ultra-endurance performance is primarily limited by durability, efficiency, and repeatability — not peak power.

This assumption remains valid for adjacent event types such as:

  • multi-day mountain bike stage races
  • unsupported or semi-supported bikepacking races

However, these event types introduce different failure modes:

  • higher variability (MTB)
  • repeated high-force efforts (MTB)
  • sleep deprivation and fueling constraints (bikepacking)
  • multi-day cumulative fatigue (both)

Therefore, the system itself does not change, but the following must be adapted:

  • the KPI emphasis
  • the gate logic
  • the interpretation horizon (single day vs multi-day)

2. Shared Core KPIs (RPS / DES Foundation)

These KPIs remain non-negotiable across all ultra-endurance formats and form the stable core of RPS and the Durability Elite System (DES).

Core Durability & Efficiency Metrics

| KPI | Purpose |
| --- | --- |
| Durability Index (DI) | Ability to maintain power output under fatigue |
| Pa:Hr (≥ 90–120 min Z2) | Metabolic control and aerobic robustness |
| Efficiency Factor (EF) | Power produced per unit of cardiovascular cost |
| Sustained Power Drop (e.g. 3 h vs 1 h) | Quantifies endurance decay |
| kJ / hour | Metabolic work rate and fueling anchor |
| IF stability after 3–4 h | Race-pace robustness |

:pushpin: Key point:
These KPIs already exist in KPI_Checkliste.md and KPI_Checkliste_DES_v2026.md and therefore require no structural changes.


3. Event-Specific KPI Extensions

A) Multi-Day MTB Stage Races

Primary stressors

  • variable power output (terrain, technical sections)
  • repeated high-force climbing
  • cumulative neuromuscular fatigue

Key Additional KPIs

| Category | KPI | Rationale |
| --- | --- | --- |
| Repeatability | Back-to-Back Ratio (BBR) | Day-to-day performance consistency |
| Neuromuscular Load | Torque-weighted power / low-cadence load | MTB-specific muscular stress |
| Fatigue Resistance | SST or threshold drop under fatigue | Resistance to cumulative load |
| Anaerobic Control | FIR (5′ / 20′) | Avoid excessive anaerobic cost |
| Variability | VI & time > FTP | Controls “over-spiking” |

:pushpin: Interpretation shift:
Not absolute values, but trend and decay rate across days determine readiness.


B) Bikepacking Races (Unsupported / Semi-Supported)

Primary stressors

  • extreme duration
  • sleep deprivation
  • fueling and logistics
  • very low but constant intensity

Key Additional KPIs

| Category | KPI | Rationale |
| --- | --- | --- |
| Energetics | kJ per day | Primary load & fueling driver |
| Pacing Stability | IF drift over 6–12 h blocks | Fatigue + fueling interaction |
| Multi-Day Durability | DI trend across days | Robustness over time |
| Sleep Stress | HRV trend / RHR drift | CNS fatigue proxy |
| Fueling Efficiency | CHO g / 1000 kJ | Practical self-support metric |
| Spikes | Time above FTP ≈ 0 | Spikes are metabolically expensive |

:pushpin: Critical insight:
Bikepacking performance is dominated by loss minimization, not capacity maximization.


4. How This Is Implemented in RPS (Architecture)

4.1 Externalized KPI & Gate Logic (Key Design Choice)

In RPS, all KPIs and gate rules live in external knowledge files, for example:

  • KPI_Checkliste.md
  • KPI_Checkliste_DES_v2026.md
  • vCoach_Blocklogik_DES_v2026.md

These files:

  • are not hard-coded
  • are referenced by the vCoach
  • can be modified, extended or replaced without changing system logic

:right_arrow: This makes RPS event-agnostic but context-aware.


4.2 Event-Specific KPI Profiles (Conceptual)

Instead of changing the system, we define event profiles that activate different subsets and thresholds of KPIs.

Example:

Event Profile: MTB_STAGE
- Core KPIs: DI, Pa:Hr, EF
- Add-ons: BBR, Torque Load, VI
- Gate priority: repeatability > peak load

Event Profile: BIKEPACKING
- Core KPIs: DI, Pa:Hr, kJ/day
- Add-ons: IF drift, HRV trend
- Gate priority: energetics & stability

These profiles map directly onto:

  • KPI checklists
  • gate logic
  • block release rules
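A minimal sketch of such profiles as plain data, with illustrative names and contents (not the author’s actual knowledge files):

```python
# Event profiles as plain data, mirroring the externalized-KPI idea.
# Profile names, KPI labels, and structure are illustrative only.
PROFILES = {
    "MTB_STAGE": {
        "core_kpis": ["DI", "Pa:Hr", "EF"],
        "addons": ["BBR", "TorqueLoad", "VI"],
        "gate_priority": "repeatability",
    },
    "BIKEPACKING": {
        "core_kpis": ["DI", "Pa:Hr", "kJ/day"],
        "addons": ["IF_drift", "HRV_trend"],
        "gate_priority": "energetics",
    },
}

def active_kpis(profile_name: str) -> list:
    """The KPI subset a given event profile activates."""
    p = PROFILES[profile_name]
    return p["core_kpis"] + p["addons"]

print(active_kpis("MTB_STAGE"))  # ['DI', 'Pa:Hr', 'EF', 'BBR', 'TorqueLoad', 'VI']
```

Because the profiles are data rather than code, swapping in a new event type only means adding a new entry, which is exactly the event-agnostic property claimed above.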

5. Gate Logic Adaptation (DES Extension)

Example: Gate Logic Differences

| KPI | Brevet | MTB Stage | Bikepacking |
| --- | --- | --- | --- |
| DI | ≥ 1.00 | ≥ 0.98 multi-day avg | ≥ 0.95 stable |
| Pa:Hr | ≤ 5 % | ≤ 6 % | ≤ 7 % |
| BBR | optional | mandatory ≥ 0.90 | optional |
| kJ/day | monitoring | monitoring | primary gate |
| Time > FTP | limited | controlled | near zero |

:pushpin: Important:
Gate logic is data-driven and file-based, not embedded in code or prompts.


6. Why This Matters (System-Level Insight)

Because KPIs and gate logic are:

  • externalized
  • versioned
  • modular

…the same RPS framework can support:

  • Brevets
  • MTB stage racing
  • Bikepacking
  • future formats (gravel ultras, adventure racing)

:right_arrow: Only the KPI files change — not the system logic.


1 Like

Thanks for thinking about this. An interesting and typically verbose output from an LLM :wink:. Really good to see that the modular approach to building this will support extensions, customisation, etc. The depth that seems possible with regard to KPIs may be excessive for my own needs, but it’s exciting to see the possibilities as I increase my own experience and understanding of how these physiological aspects have a real-world impact on my performance. I think my partner could get some real benefit from something like this as well. Her priority events are in the same ballpark as mine but not exactly the same, and she would likely want slightly different priority KPIs.

I’d be really interested to explore how to set this up, or something very similar, for myself. Could you give me an overview of the technology stack and the prerequisites for getting it running? I’m already on GitHub if you felt generous enough to make something either publicly accessible or open for a bit of collaboration?

1 Like

This post resonates a lot with our experience building NUA coach.

For context and with full humility: today we’re coaching ~1,000 paying athletes, most of them self-coached or coming from a semi-structured background. We’re still learning every week.

When we started NUA almost two years ago, generative AI was not yet viable for production use. But even then, we had a very clear intuition:

we didn’t want to build another training app, but something closer to a coach — combining human-like presence with algorithmic rigor.

As athletes ourselves, two things felt fundamentally broken in most tools we used:

  1. Decisions were opaque (“trust the plan”).
  2. There was no real sense of being accompanied day to day.

So from day one, explanation and presence were first-class requirements. If an athlete doesn’t understand why something changes, belief in the process erodes quickly — regardless of how good the math is.

Our first approach (and where it broke)

Our MVP focused on what we thought mattered most:

  • A very strong training engine (initially aimed at >10h/week and semi-elite athletes).
  • A conversational interface (Telegram / WhatsApp) to emulate the warmth of a coach you can message.
  • A conscious decision not to invest in charts or dashboards — tools like Strava, Garmin or Intervals already do that extremely well.

The idea was simple:

training logic in the engine, human experience in the chat.

That worked… until it didn’t.

The moment we moved from a deterministic UI to a conversational one, user expectations changed dramatically.

Once something talks like a coach, athletes expect it to reason like one — across any situation.

But our system couldn’t.

The conversational layer was explaining decisions it did not actually make. And when reality drifted outside predefined paths, the cracks showed.

That led to a key insight for us:

You cannot separate the entity that decides from the one that explains and expect trust to hold.

Coaches are closed-loop systems

What we eventually realized is that a coach is not a set of features, but a closed loop:

  • Understand initial conditions
  • Plan
  • Observe execution
  • Explain feedback
  • Re-plan when reality diverges

And this loop exists at multiple layers simultaneously:

  • Season / goals
  • Macrocycles
  • Weekly structure
  • Day-to-day readiness and recovery

If different parts of the system reason independently at each layer, coherence is lost.

Where generative AI actually helped us

When generative AI became reliable enough, we initially thought of it mainly as a UX upgrade: more empathy, more flexibility, less rigid flows.

But the deeper value turned out to be elsewhere.

Not “removing rules”, but abstracting them.

Instead of thousands of explicit if/else branches, we now define:

  • What strategies are allowed (e.g. improving FTP, consolidating durability, protecting recovery).
  • What constraints must never be violated.
  • What data is relevant at each decision layer.

Within that controlled space, AI selects and justifies decisions — but never invents capabilities the system doesn’t truly have.
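A generic sketch of that pattern, with entirely illustrative names and thresholds (not NUA’s actual implementation): a model may only choose from a whitelist of strategies, and hard constraints are enforced in plain code before anything reaches the athlete.

```python
# Constrained decision layer sketch. Strategy names and the TSB
# threshold are illustrative, not NUA's real rules.
ALLOWED_STRATEGIES = {"improve_ftp", "consolidate_durability", "protect_recovery"}

def validate_decision(decision: dict, tsb: float) -> dict:
    """Accept a proposed strategy only inside the allowed, safe space."""
    if decision["strategy"] not in ALLOWED_STRATEGIES:
        raise ValueError(f"strategy not allowed: {decision['strategy']}")
    # Hard constraint (illustrative): never add intensity on deep fatigue.
    if tsb < -25 and decision["strategy"] == "improve_ftp":
        return {"strategy": "protect_recovery",
                "reason": "TSB below -25: recovery constraint overrides"}
    return decision

print(validate_decision({"strategy": "improve_ftp"}, tsb=-30))
```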

This led us to what we run today (still evolving):

a hybrid system where algorithmic structure guarantees safety and consistency, and AI operates as a constrained decision layer, not a free-form oracle.

Crucially, the same entity both decides and explains.

On metrics, complexity, and cognitive load

Internally, we track a large number of metrics — session execution, load dynamics, readiness trends, performance signals, etc.

But we strongly agree with the point made here:

The athlete should experience clarity, not cognitive load.

Our job as a coach (human or artificial) is to absorb complexity, not externalize it.

In fact, our longer-term view is that as AI systems become more stable and trustworthy, they will increasingly shift where humans choose to spend their cognitive budget.

Athletes won’t want to allocate mental energy to micro-decisions that can be reasoned more consistently by a system — they’ll want to focus on execution, sensation, and intent.

Good coaching has always done this. AI simply allows it to scale.

Still very much a work in progress

We don’t consider NUA “finished” — far from it. Each iteration has come from realizing where our previous mental model of a coach was incomplete.

But this post articulates something we’ve learned the hard way:

building an AI coach is less about adding intelligence, and more about preserving coherence, trust, and explainability as complexity grows.

Really appreciate this discussion — it’s rare to see these ideas framed so clearly.

3 Likes

@Bicio , thank you for the interesting input. Do you think you could, and would consider it worth the effort, to connect NUA to i.icu? In the sense that you get data (activities, races, wellness, athletes’ notes, etc.) from i.icu, put your plan(s) into i.icu as workouts, and let athletes (me) even chat with your system through the i.icu chat?

Hi @albauer. Thanks for that post. Very insightful for a fellow randonneur.
Can you share the formula for FIR that you use? I can’t find a unified definition of that calculation anywhere.

Hi @Duppie ,

I use:

FIR = MMP 300s / MMP 1200s

It shows how well VO₂-level power is supported by aerobic / threshold fitness.
Higher FIR = more aerobically supported, durable power; lower FIR = more anaerobic, “spiky” profile.

For each activity I “join” the associated MMP curve.

As I have the MMP data available, I am also calculating the power drop (“slope”) over time for MMP 20′, MMP 60′, MMP 120′ and MMP 180′.
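For illustration, both calculations can be sketched as follows, with an invented MMP curve (keyed by duration in seconds; the wattages are made up):

```python
# FIR and power-drop sketch over a per-activity MMP curve.
# The sample wattages are invented for illustration.
def fir(mmp: dict) -> float:
    """FIR = MMP 300 s / MMP 1200 s, as defined above."""
    return mmp[300] / mmp[1200]

def power_drop(mmp: dict, long_s: int, short_s: int = 1200) -> float:
    """Fractional power drop from the shorter to the longer duration."""
    return 1.0 - mmp[long_s] / mmp[short_s]

mmp = {300: 320, 1200: 280, 3600: 255, 7200: 240, 10800: 228}  # watts
print(round(fir(mmp), 3))                # 1.143
print(round(power_drop(mmp, 10800), 3))  # 0.186 drop from 20' to 180'
```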

The next step is to incorporate the power curves after x kJ (e.g. 2,000 kJ) that Intervals.icu provides.

Alexander

1 Like

I think the spirit of what you’re proposing makes a lot of sense — yes, this could work — but only if we’re very explicit about boundaries and complexity.

At a high level, having Intervals.icu as the observation and visualization layer, and AI coaches focusing on reasoning, planning and explanation, is a very clean idea. Philosophically, we’re aligned.

Where things become non-trivial is the moment the AI coach is not just suggesting, but actually deciding.

Once a system starts to adapt load, replan weeks, react to readiness, and explain why changes happen, it effectively becomes the authority in the coaching loop — and that authority can’t be fragmented without consequences.

That creates a few concrete challenges:

  • Different systems rely on different internal models (load, fatigue, progression).
  • Those differences are fine at the observation layer, but problematic at the decision layer.
  • If explanations don’t fully align with what the dashboard implies, trust erodes quickly.

Because of that, a shallow integration (“just plug the AI into i.icu”) would likely disappoint.

For this to work well in practice, one of two things would be needed:

  1. a sufficiently expressive public API from Intervals, with a clear contract around data vs decisions, or
  2. a deeper partnership where Intervals remains the observability layer and the AI coach becomes the single authority for planning and explanation — similar to the approach we take at NUA.coach.

In the end, the key question is simple: who decides, who explains, and where coherence is enforced.

Happy to keep discussing :slight_smile:

3 Likes

Hi @Bicio ,

if I got it right, I’d like to support what you’re saying.

intervals.icu does a great job when it comes to data collection, number crunching (metrics, KPIs), and visualization.

When it comes to coaching and planning (with or without AI), it really depends on the paradigm you follow and on your ambitions and objectives. Paradigms that work well for race fitness (crits, ~2h races, etc.) might not be the ideal solution for longer-duration events.

Metrics like TSS, CTL, ATL, and TSB don’t describe training objectively.
They only make sense within a philosophy of what performance actually is.

Example:

Same data: CTL 90, TSB +15

Freshness-focused view: “The athlete is ready → add intensity.”

Durability-focused view: “The athlete is fresh but underexposed → add long rides / kJ.”

Nothing changed in the numbers — only the interpretation did.

The same applies to TSS:

Stable weekly TSS can mean consistent endurance, or
the same stress achieved via shorter, harder rides → less durability.

If you don’t explicitly choose a philosophy:

– TSS quietly becomes the goal
– Freshness is mistaken for readiness
– Intensity is mistaken for progress

Bottom line:
Metrics don’t tell you what matters.
Philosophy tells you how to read metrics.

That choice already happens at the observation layer — not later in planning.

2 Likes

Your example highlights the core issue.

The numbers don’t change — only the interpretation framework does. Metrics like CTL, TSB or TSS are only meaningful within an explicit performance philosophy (freshness, durability, exposure, etc.). In that sense, there is no neutral observation layer.

Beyond that, a significant part of coaching input isn’t strictly training data.

It includes:

  • ongoing athlete conversations,
  • subjective intent and perceived effort,
  • mental and emotional state,
  • life constraints,
  • accumulated historical context.

Persisting this kind of semantic and conversational context, and reusing it consistently in future decisions, is non-trivial. It’s part of the coaching system itself (and probably the most important one, or at least, the real differential).

Once planning and explanation depend on both metrics and non-metric context, fragmenting the decision loop across systems leads to incoherence.

Hi @Bicio , thanks 😉 Also for the first details on the technical challenges. I do see the challenges you describe, and more. But I think they are not conceptual problems; they are challenges in the interaction. All the analysis could be done in the “AI coach software”. It can store a state. And it can change the displays and tabs in i.icu. So I do not see a conceptual problem.
At the same time, of course, I do understand that you guys developing the coaching software have to keep your “lives” reasonably simple; you have enough challenges anyway.
However, for me as a “naive user” it would make much more sense to interact (largely) through a constant and common i.icu interface instead of through all kinds of inferior “coaching app” interfaces. In reality I have simply seen too many mediocre ones. And I always have to keep both i.icu and that other interface “curated”, at least regarding my data… That’s the real origin of the wish :-)

I now have an even better understanding of why it might be closer to a dream than reality for a while, but I will eagerly await great coaches utilizing i.icu as an interface nevertheless.
And, really, if that abstraction existed they could practically immediately do the same with, e.g., TrainingPeaks – LLMs generating specific code for such systems is, after all, not that uncommon anymore…

I’ve tried a few AI/LLM/GPT-based running dashboards. Many of them aim to be “smart” by interpreting training and surfacing insights. I’m sure that is helpful for lots of people. For me personally, I can already get what I need from Intervals.icu at a glance, so those kinds of dashboards don’t add much.

However, there are two capabilities I still haven’t seen that I think would be genuinely useful for self-coached athletes:

1) Learn my workout structures from history and generate new workouts with the same structure
What I want is for the model to “do the boring part”: look at the structure of workouts I actually do (e.g., warmup, warmup duration, warmup intensities, priming duration and intensities → reps, reps duration, reps intensities → recoveries, recovery duration, recovery intensities → cooldown duration and intensities), learn how I vary them week-to-week, estimate an appropriate future load (short/medium/long term), and then adjust duration/intensity so progression stays sensible toward a target race in X months.
Ideally it would reuse familiar sessions with small variations when necessary, rather than generating totally new workouts that don’t match my existing patterns and habits.

I’m not sure how feasible this is today (parsing workout history reliably, detecting patterns, etc.), but if it is feasible, it feels like an underserved feature across current tools.
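If it is feasible, one small piece of it (representing a session as structured phases and generating a slightly progressed variant) is easy to sketch; everything here is illustrative:

```python
# Sketch of "reuse familiar structure with small variations":
# a session as structured phases, with a scaled rep duration for
# progression. The template values are invented for the example.
import copy

template = {
    "warmup":   {"min": 15, "pct_ftp": 60},
    "reps":     {"count": 4, "min": 8, "pct_ftp": 105},
    "recovery": {"min": 4, "pct_ftp": 50},
    "cooldown": {"min": 10, "pct_ftp": 55},
}

def progress(session: dict, rep_scale: float = 1.1) -> dict:
    """Same shape as the historical session, reps slightly longer."""
    nxt = copy.deepcopy(session)
    nxt["reps"]["min"] = round(nxt["reps"]["min"] * rep_scale)
    return nxt

print(progress(template)["reps"])  # {'count': 4, 'min': 9, 'pct_ftp': 105}
```

The hard part, of course, is inferring the template reliably from noisy workout history rather than writing it by hand.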

2) Read and reuse my workout library
Not sure whether the Intervals.icu API supports this (or whether it’s planned), but being able to pull from a user’s workout library would help a lot, especially for people who have a set of “go-to” sessions they repeat with minor modifications over months. Minor modifications are the boring part, and it would be amazing if an engine could do them instead of me.

Curious if others would find this useful too, and whether any of this is possible via the current API.

3 Likes

This sounds interesting. I think I can give it a try, as I already have most of the foundations needed to understand past workout structures.

2 Likes

I think this is an interesting idea for a real (AI) coach!
Optimally, the coach would even evaluate – mid- to long-term – how these previous workouts worked out when done for many weeks and months and then value them accordingly in preparing new ones.

Of course, that’s exactly one of the points that the “AI coach” should analyze and use all data in the i.icu database/interface, vide supra.

1 Like

Please let me know if you’re looking for ppl for testing.
I also tried yours (Coach Watts) halfway and stopped.
I saw that Coach Watts tried to recognize existing future workouts in my calendar. I like this, but I stopped using it when I realized that the engine detected those “existing workouts” inconsistently between plan-generation attempts.

My plans and calendars are constructed solely from combinations of workouts from my workout library, with modifications to the warmup duration, rep duration, and rest duration/intensity.

Don’t bother reading anything in here, just a bunch of chat bots talking nonsense to each other.

3 Likes