exo's benchmarking layer carries the highest activity risk — 5 functions to address first

exo's top hotspots are concentrated in bench/ and the Svelte dashboard store, where god-functions with CC scores as high as 93 are actively changing right now.

Stephen Collins
oss python refactoring code-health

Antipatterns Detected

complex_branching: 5 · exit_heavy: 5 · god_function: 5 · long_function: 5 · deeply_nested: 4

Key Points

What is a god function and why does it matter in exo?

A god function is one that tries to do too much: argument parsing, orchestration, error handling, and output all in a single body. In exo, `main` in `bench/exo_bench.py` exemplifies this. With a cyclomatic complexity of 93 and a fan-out of 57 distinct callees, a change anywhere in that body can affect an enormous surface area. The practical consequence is that bugs are hard to isolate, tests are hard to write, and every new feature added to the function makes the next change more dangerous.

How do I reduce cyclomatic complexity in Python?

The most direct technique is the extract-method refactoring: identify coherent sub-tasks inside the complex function and move each into its own named function with a clear input/output contract. For functions like `run_planning_phase` with a CC of 57, replacing deeply nested conditionals with early-return guard clauses and replacing repeated branching patterns with lookup tables or strategy objects can significantly reduce the path count.
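As a minimal sketch of the guard-clause and lookup-table techniques named above: the function names, backends, and rules below are hypothetical and are not taken from exo's code. The first version nests its checks; the second validates up front and dispatches through a dictionary.

```python
# Hypothetical example: names and rules are illustrative, not from exo.

def shard_count_nested(model, backend, nodes):
    """Before: nested conditionals, one extra path per branch."""
    if model is not None:
        if nodes > 0:
            if backend == "ring":
                return nodes
            elif backend == "tensor":
                return nodes * 2
            elif backend == "single":
                return 1
            else:
                raise ValueError(f"unknown backend: {backend}")
        else:
            raise ValueError("nodes must be positive")
    else:
        raise ValueError("model is required")


# Lookup table: adding a backend is a data change, not a new branch.
SHARDS_PER_BACKEND = {
    "ring": lambda nodes: nodes,
    "tensor": lambda nodes: nodes * 2,
    "single": lambda nodes: 1,
}


def shard_count(model, backend, nodes):
    """After: guard clauses up front, then a single table dispatch."""
    if model is None:
        raise ValueError("model is required")
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    try:
        rule = SHARDS_PER_BACKEND[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend}") from None
    return rule(nodes)
```

Both versions return the same results for valid input. The second keeps the happy path at a single indentation level, and the per-backend rules can be tested as data.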

Is exo actively maintained?

Yes — all five top hotspots are in the 'fire' quadrant, meaning they combine high structural complexity with high recent commit activity. The top-ranked function alone carries an activity risk of 21.1, indicating the codebase is under active development right now, not in maintenance mode.

How do I reproduce this analysis?

Run the Hotspots CLI against the exo-explore/exo repository at commit 4688adb using `hotspots analyze` and you will reproduce the scores and rankings shown here.

What does activity-weighted risk mean?

It is structural complexity multiplied by recent commit frequency: functions that are hard to understand AND actively changing are the highest priority for refactoring.

At commit 4688adb, exo-explore/exo surfaces 191 critical functions across a codebase of 1,549 total — and the single highest-priority hotspot is main in bench/exo_bench.py, a god-function with a cyclomatic complexity of 93 and an activity risk of 21.1, placing it squarely in the ‘fire’ quadrant: structurally extreme AND actively changing right now. That combination means every commit touching this function lands on a surface with 93 independent execution paths and 9 levels of nesting, a live regression risk rather than a backlog cleanup item. The pattern repeats across the top five hotspots, all of which are ‘fire’ quadrant, all carrying the god-function and exit-heavy antipatterns, and spanning both the benchmarking harness and the Svelte dashboard layer.

The table below ranks functions by activity-weighted risk — a score that multiplies structural complexity by recent commit frequency. A function that is both hard to understand (high cyclomatic complexity) and actively changing is a higher priority than one that is complex but untouched. CC = cyclomatic complexity (independent execution paths); ND = max nesting depth; FO = fan-out (distinct callees).

Top 5 Hotspots

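| Function | File | Risk | CC | ND | FO |
|---|---|---|---|---|---|
| main | bench/exo_bench.py | 21.1 | 93 | 9 | 57 |
| sendMessage | dashboard/src/lib/stores/app.svelte.ts | 18.5 | 51 | 6 | 16 |
| main | bench/exo_eval.py | 18.4 | 87 | 5 | 40 |
| generateImage | dashboard/src/lib/stores/app.svelte.ts | 17.7 | 37 | 6 | 13 |
| run_planning_phase | bench/harness.py | 17.4 | 57 | 4 | 25 |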
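To make the CC and ND columns concrete, here is a rough sketch that approximates both for a Python function using the standard `ast` module. The counting rules are an assumption chosen for illustration and are not the rules Hotspots applies, so the numbers will not match the table exactly.

```python
import ast

# Node types counted as one decision point each. A rough approximation,
# not the rule set Hotspots uses.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.IfExp, ast.comprehension)
NESTING_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With)


def cyclomatic_complexity(func):
    """1 + decision points; each extra and/or operand adds a path."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1
    return score


def max_nesting(node, depth=0):
    """Deepest chain of nested block statements below `node`.

    Note: `elif` is a nested `If` in the AST, so long elif chains
    count as deeper nesting in this simplified version.
    """
    deepest = depth
    for child in ast.iter_child_nodes(node):
        step = 1 if isinstance(child, NESTING_NODES) else 0
        deepest = max(deepest, max_nesting(child, depth + step))
    return deepest


SOURCE = '''
def pick(items, limit):
    out = []
    for item in items:
        if item > 0 and item < limit:
            out.append(item)
    return out
'''

func = ast.parse(SOURCE).body[0]
print(cyclomatic_complexity(func), max_nesting(func))  # prints "4 2"
```

The sample function scores 4 (base path, the loop, the `if`, and the `and`) with a nesting depth of 2, which gives a sense of scale for values like 93 and 9.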

Hotspot Analysis

main — bench/exo_bench.py

As the entry point of exo’s benchmarking script, this function almost certainly orchestrates the full benchmark lifecycle (argument parsing, model selection, execution loops, and result reporting) in a single body. A cyclomatic complexity of 93 means 93 linearly independent execution paths, so exercising each one takes on the order of 93 test cases, and each path is a potential bug surface; a max nesting depth of 9 makes the control flow extremely hard to reason about locally. With a fan-out of 57 distinct callees and an activity risk of 21.1 in the ‘fire’ quadrant, every active commit here is a live regression risk across a very wide call surface.

Recommendation: Apply extract-method refactoring to decompose this function into focused units — argument parsing, benchmark setup, execution, and reporting — each independently testable. Before refactoring, add characterization tests that capture current outputs for representative inputs to establish a regression safety net.
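A characterization test can be as small as the sketch below. `legacy_report`, `CASES`, and the golden-file layout are hypothetical stand-ins, not exo code: the idea is to record what the current implementation returns for representative inputs, then flag any input whose output changes after refactoring.

```python
import json
from pathlib import Path


def legacy_report(model, runs):
    """Hypothetical stand-in for the function about to be refactored."""
    if not runs:
        return f"{model}: no data"
    mean = sum(runs) / len(runs)
    return f"{model}: mean={mean:.2f} n={len(runs)}"


# Representative inputs, chosen to cover the branches you know about.
CASES = [("llama", [1.0, 2.0, 4.0]), ("llama", []), ("qwen", [0.5])]


def check_against_golden(func, cases, golden_path):
    """Record outputs on the first run; afterwards, return any mismatches."""
    actual = {json.dumps(case): func(*case) for case in cases}
    path = Path(golden_path)
    if not path.exists():
        path.write_text(json.dumps(actual, indent=2, sort_keys=True))
        return []
    expected = json.loads(path.read_text())
    return [key for key in actual if actual[key] != expected.get(key)]
```

Calling `check_against_golden(legacy_report, CASES, "golden.json")` once before the refactor records the baseline. Calling it again afterwards returns the inputs whose output drifted, or an empty list if behavior is unchanged.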

sendMessage — dashboard/src/lib/stores/app.svelte.ts

Located in the Svelte application store, sendMessage likely handles the full message-dispatch flow from the dashboard UI — constructing requests, managing state transitions, handling streaming or error responses, and coordinating multiple downstream calls. A cyclomatic complexity of 51 and a max nesting depth of 6 indicate heavily branched conditional logic that is difficult to follow, while a fan-out of 16 means changes here can ripple into 16 distinct callees. Its activity risk of 18.5 in the ‘fire’ quadrant confirms this is not dormant debt — it is actively changing under high structural load right now.

Recommendation: Extract distinct concerns — request construction, state mutation, and response handling — into separate store actions or helper functions. The 16-callee fan-out warrants an explicit dependency map before any refactoring to avoid unintended side effects in the dashboard layer.

main — bench/exo_eval.py

This second benchmarking entry point — in the evaluation script rather than the general benchmark runner — carries a cyclomatic complexity of 87 and a fan-out of 40, placing it just behind its sibling in bench/exo_bench.py in structural weight. With a max nesting depth of 5 and an activity risk of 18.4 in the ‘fire’ quadrant, it shares the same pattern profile: a god-function accumulating argument parsing, model selection, execution, and result handling in a single body, actively changing under high structural load.

Recommendation: Apply the same extract-method strategy as bench/exo_bench.py — separate argument parsing, evaluation setup, execution, and reporting into distinct, independently testable functions. Decomposing both main functions in parallel reduces duplication risk and clarifies which orchestration logic belongs in a shared harness versus script-specific setup.
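One possible target shape for either `main` is sketched below. Every name, flag, and phase boundary here is hypothetical, since the actual contents of exo's scripts are not shown in this analysis. The point is that each phase becomes a small function with a plain input and output, and `main` only wires them together.

```python
import argparse
import statistics


def parse_args(argv=None):
    """Argument parsing only; no side effects beyond argparse itself."""
    parser = argparse.ArgumentParser(description="hypothetical benchmark runner")
    parser.add_argument("--model", required=True)
    parser.add_argument("--repeat", type=int, default=3)
    return parser.parse_args(argv)


def build_plan(args):
    """Setup: turn parsed arguments into a plain, inspectable plan."""
    return [{"model": args.model, "iteration": i} for i in range(args.repeat)]


def run_plan(plan, run_one):
    """Execution: `run_one` is injected so tests can pass a fake."""
    return [run_one(step) for step in plan]


def summarize(results):
    """Reporting: pure function from results to a summary dict."""
    return {"runs": len(results), "mean": statistics.mean(results)}


def main(argv=None, run_one=lambda step: 1.0):
    """Orchestration only: each phase is a single call."""
    args = parse_args(argv)
    summary = summarize(run_plan(build_plan(args), run_one))
    print(summary)
    return summary
```

With this layout, phases such as `run_plan` and `summarize` are candidates for a shared harness, while `parse_args` and `build_plan` stay script-specific.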

generateImage — dashboard/src/lib/stores/app.svelte.ts

Located in the same Svelte store as sendMessage, generateImage handles the image-generation dispatch flow from the dashboard. With a cyclomatic complexity of 37 and a max nesting depth of 6, it shares the branchy, exit-heavy structure of its sibling — multiple conditional paths for request construction, error handling, and state updates compressed into a single function body. Its fan-out of 13 and activity risk of 17.7 confirm it is actively changing alongside sendMessage.

Recommendation: Extract request construction and state-mutation logic into separate helper functions, following the same pattern recommended for sendMessage. Since both functions live in the same store file, refactoring them together reduces the risk of introducing inconsistent patterns across adjacent code paths.

run_planning_phase — bench/harness.py

The name and file path suggest this function drives the planning stage of exo’s benchmark harness — likely coordinating model selection, resource allocation, or test-plan generation before execution begins. A cyclomatic complexity of 57 paired with a fan-out of 25 indicates a function that both branches extensively and reaches broadly into the harness infrastructure, giving it a large blast radius. Its activity risk of 17.4 in the ‘fire’ quadrant means this complexity is being navigated in active development right now, not at some future point.

Recommendation: Decompose the planning phase into discrete, single-responsibility steps that can be tested and reasoned about independently. Audit the 25 fan-out callees to identify which are genuinely planning concerns versus execution concerns that have leaked into this function.
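A first pass at the callee audit can be automated with the standard `ast` module. The sketch below lists the distinct call targets inside one named function. It is an approximation (it renders targets as text and does not resolve imports or dynamic dispatch), and the sample source is hypothetical rather than copied from bench/harness.py.

```python
import ast


def call_name(func_node):
    """Render a call target such as `plan` or `self.cache.get` as text."""
    parts = []
    while isinstance(func_node, ast.Attribute):
        parts.append(func_node.attr)
        func_node = func_node.value
    if isinstance(func_node, ast.Name):
        parts.append(func_node.id)
    else:
        parts.append("<expr>")  # e.g. a call on a subscript or another call
    return ".".join(reversed(parts))


def distinct_callees(source, function_name):
    """Sorted distinct call targets inside the named function."""
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
                and node.name == function_name):
            return sorted({call_name(n.func) for n in ast.walk(node)
                           if isinstance(n, ast.Call)})
    raise LookupError(f"{function_name} not found")


# Hypothetical sample, not the real harness code.
SAMPLE = '''
def plan_phase(cfg):
    plan = build(cfg)
    plan.validate()
    log.info("ok")
    return build(cfg)
'''

print(distinct_callees(SAMPLE, "plan_phase"))
# prints ['build', 'log.info', 'plan.validate']
```

Running the same function over the real file gives a starting list to sort into planning concerns and execution concerns by hand.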

Patterns Found

Antipatterns detected across the top functions in this snapshot:

| Pattern | Occurrences |
|---|---|
| complex_branching | 5 |
| exit_heavy | 5 |
| god_function | 5 |
| long_function | 5 |
| deeply_nested | 4 |

These labels belong to two tiers:

  • Tier 1 (structural): complex_branching, deeply_nested, exit_heavy, long_function, god_function.
  • Tier 2 (relational/temporal): hub_function, cyclic_hub, middle_man, neighbor_risk, stale_complex, churn_magnet, shotgun_target, volatile_god.

Key Takeaways

  • main in bench/exo_bench.py has a cyclomatic complexity of 93 and a max nesting depth of 9 — add characterization tests before the next commit touches it, because 93 independent paths means breakage is easy to miss.
  • sendMessage in dashboard/src/lib/stores/app.svelte.ts has a fan-out of 16 and is in the ‘fire’ quadrant with an activity risk of 18.5 — map its callee dependencies explicitly before any refactoring to contain ripple effects across the dashboard layer.
  • All five top hotspots share the god-function and exit-heavy patterns, meaning every one of them has multiple return and exit paths that multiply test-coverage burden; prioritize extract-method refactoring in bench/ first, where CC values reach 87–93.

Reproduce This Analysis

git clone https://github.com/exo-explore/exo
cd exo
git checkout 4688adb5d276819d65dd64c7b9f7fa7cf5ad5e2e
hotspots analyze . --mode snapshot --explain-patterns --force

To run the same analysis on your own codebase, run `hotspots analyze . --mode snapshot` in any local git repo. No configuration is required.

Hotspots highlights structural and activity risk, not “bad code.” Findings are a prioritization aid, not a bug predictor.

Run this on your own codebase

Hotspots runs locally in under a minute — no account, no data leaves your machine.

macOS
$ brew install Stephen-Collins-tech/tap/hotspots
Linux / cargo
$ cargo install hotspots-cli
Run in any repo
$ hotspots analyze .