graphify's extract layer carries the highest activity risk — 2 functions to address first

Two god functions in graphify/extract.py — _extract_generic (CC 66, ND 9) and extract_sql (CC 44, ND 9) — are both structurally extreme and actively changing, making them live regression risks at commit 0ca8d3d.

Stephen Collins
Tags: oss, python, refactoring, code-health
Activity Risk: 22.52 (Low)
Hottest Function: _extract_generic

Antipatterns Detected

complex_branching (5), deeply_nested (5), exit_heavy (5), god_function (5), long_function (5), hub_function (1)

Key Points

What is a god function and why does it matter in graphify?

A god function is one that has accumulated so many responsibilities that it controls or coordinates a disproportionate share of the system's behavior, typically evidenced by very high cyclomatic complexity, deep nesting, and large fan-out all occurring together. In graphify, five of the top hotspots carry the god_function pattern, with _extract_generic being the clearest example: a cyclomatic complexity of 66, nine levels of nesting, and 64 distinct outbound calls mean this single function reaches into a large fraction of the codebase. The practical problems are threefold: it is nearly impossible to unit-test in isolation because so many dependencies must be satisfied, any change risks breaking its interaction with one of the 64 callees in a non-obvious way, and reasoning about a bug means tracking up to 66 independent execution paths. In a file that is being actively committed to (_extract_generic was touched 4 times in the last 30 days), that complexity translates directly into regression risk on every merge.

How do I reduce cyclomatic complexity in Python?

The most effective first technique is extract-method refactoring: identify coherent clusters of branching logic within the function and move each cluster into its own named function with a clear single responsibility, which directly reduces the path count in the original function. A cyclomatic complexity above 15 is a reasonable threshold for scheduling a refactor; above 30 it warrants immediate attention; _extract_generic at CC 66 and extract_sql at CC 44 are both well into urgent territory. A concrete first step is to run a coverage tool against the existing test suite to identify which branches currently lack tests, then write those tests as a safety net before touching the structure — this is especially important for extract_sql, which has been modified 3 times in the last 30 days without any apparent reduction in complexity. Once tests exist, decompose-conditional refactoring — replacing complex nested conditionals with guard clauses or strategy objects — can flatten nesting depth alongside reducing cyclomatic complexity.
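The two techniques named above, extract-method and guard clauses, can be sketched on a toy example. Everything here is hypothetical: `describe_node_nested`, `describe_node`, and the node dictionaries are illustrative stand-ins and are not taken from graphify's source. The sketch only shows the shape of the change: early returns replace nested `if`/`else` ladders, and each branch cluster moves into its own small helper.

```python
from typing import Callable, Dict, Optional


# "Before" (hypothetical): validation, dispatch, and formatting in one nested body.
def describe_node_nested(node: Optional[dict]) -> str:
    if node is not None:
        if "kind" in node:
            if node["kind"] == "function":
                if node.get("name"):
                    return f"def {node['name']}"
                else:
                    return "def <anonymous>"
            else:
                if node["kind"] == "class":
                    return f"class {node.get('name', '<anonymous>')}"
                else:
                    return "unknown"
        else:
            return "unknown"
    else:
        return "unknown"


# "After": one small helper per kind (extract-method).
def _describe_function(node: dict) -> str:
    return f"def {node.get('name') or '<anonymous>'}"


def _describe_class(node: dict) -> str:
    return f"class {node.get('name', '<anonymous>')}"


_DESCRIBERS: Dict[str, Callable[[dict], str]] = {
    "function": _describe_function,
    "class": _describe_class,
}


def describe_node(node: Optional[dict]) -> str:
    # Guard clauses: reject unusable input first, so the main path stays flat.
    if node is None or "kind" not in node:
        return "unknown"
    describer = _DESCRIBERS.get(node["kind"])
    if describer is None:
        return "unknown"
    return describer(node)
```

The refactored `describe_node` has a maximum nesting depth of one, and each helper can be unit-tested without constructing inputs for the other branches.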

Is graphify actively maintained?

Yes, the data indicates active development: 394 of 681 functions sit in the fire quadrant, meaning they combine structural complexity with recent commit activity, so the activity spans well beyond extract.py. The top two hotspots tell the same story: _extract_generic was touched 4 times in the last 30 days and was last changed 0 days ago, and extract_sql was touched 3 times in the last 30 days and last changed just 1 day ago. graphify is clearly being developed at pace, which is precisely what makes its structural complexity a near-term regression risk rather than a dormant backlog item.

How do I reproduce this analysis?

The analysis was produced by the hotspots CLI (available at github.com/hotspots-dev/hotspots) against commit 0ca8d3d of safishamsi/graphify. To reproduce it, run `git checkout 0ca8d3d` in a local clone of the repository, then execute `hotspots analyze . --mode snapshot --explain-patterns --force` from the repo root. The same command works on any local git repository without additional configuration, so you can run it against your own codebase to get an equivalent risk breakdown.

What does activity-weighted risk mean?

Activity-weighted risk multiplies structural complexity — derived from cyclomatic complexity, nesting depth, and fan-out — by a measure of recent commit frequency, so functions that are both hard to understand and actively changing score the highest. A function with cyclomatic complexity of 80 that has not been touched in two years scores much lower than one with cyclomatic complexity of 20 touched every week, because the complex-but-dormant function presents lower near-term regression risk. In graphify, _extract_generic scores an activity risk of 22.52 precisely because its structural complexity (CC 66, ND 9, FO 64) is combined with 4 touches in the last 30 days — that combination means developers are actively modifying code that is already at the edge of human comprehensibility. This prioritization helps teams focus refactoring effort where it reduces the probability of introducing bugs right now, not just where the code looks complicated in the abstract.
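The idea of multiplying structure by activity can be illustrated with a toy score. This is not the hotspots formula, which the article does not spell out: the weights in `structural_score` and the `log1p` activity factor are assumptions chosen only to show the ordering described above, and the sketch will not reproduce the 22.52 figure.

```python
import math


def structural_score(cc: int, nd: int, fo: int) -> float:
    """Hypothetical structural score: an arbitrary weighted sum of CC, ND, and FO."""
    return cc + 2.0 * nd + 0.5 * fo


def activity_weighted_risk(cc: int, nd: int, fo: int, touches_30d: int) -> float:
    """Scale structure by an assumed activity factor that grows with recent touches."""
    activity = math.log1p(touches_30d)  # zero touches gives a factor of 0.0
    return structural_score(cc, nd, fo) * activity


# A complex but dormant function versus a simpler one touched about weekly.
dormant = activity_weighted_risk(cc=80, nd=4, fo=10, touches_30d=0)
active = activity_weighted_risk(cc=20, nd=4, fo=10, touches_30d=4)
```

Under these assumed weights the dormant function scores below the active one despite having four times the cyclomatic complexity, which is the prioritization effect the paragraph describes.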

The top risk in safishamsi/graphify is concentrated in a single file: graphify/extract.py hosts the two most dangerous functions in the codebase, both in the ‘fire’ quadrant — structurally extreme and actively changing right now. _extract_generic leads with an activity risk score of 22.52, paired with a cyclomatic complexity of 66 and fan-out of 64, meaning every active commit is modifying a function with 66 independent execution paths and 64 outbound call dependencies simultaneously. Across 681 total functions — 182 of them critical — the entire codebase shows no debt-quadrant functions at all: 394 functions sit in the fire quadrant, signaling that active development is broadly outpacing structural cleanup.

The table below ranks functions by activity-weighted risk — a score that multiplies structural complexity by recent commit frequency. A function that is both hard to understand (high cyclomatic complexity) and actively changing is a higher priority than one that is complex but untouched. CC = cyclomatic complexity (independent execution paths); ND = max nesting depth; FO = fan-out (distinct callees).
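For readers who want a feel for what CC, ND, and FO measure, the following is a rough approximation using Python's standard `ast` module (Python 3.9+ for `ast.unparse`). It is a sketch under simplifying assumptions, not the algorithm hotspots uses: `rough_metrics` and `SAMPLE` are hypothetical names, CC is approximated as one plus the number of decision points, and fan-out is counted as distinct call expressions by their source text.

```python
import ast

# Node types treated as decision points and as nesting levels (an assumption).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp, ast.comprehension)
_NESTING_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With)


def _depth(node: ast.AST, current: int = 0) -> int:
    """Return the deepest control-structure nesting below `node`."""
    deepest = current
    for child in ast.iter_child_nodes(node):
        # An `elif` parses as an If inside `orelse`; do not count it as deeper.
        is_elif = isinstance(node, ast.If) and isinstance(child, ast.If) and node.orelse == [child]
        step = 1 if isinstance(child, _NESTING_NODES) and not is_elif else 0
        deepest = max(deepest, _depth(child, current + step))
    return deepest


def rough_metrics(source: str, func_name: str) -> dict:
    """Approximate CC, ND, and FO for one function in `source`."""
    tree = ast.parse(source)
    func = next(
        n for n in ast.walk(tree)
        if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)) and n.name == func_name
    )
    cc = 1
    callees = set()
    for node in ast.walk(func):
        if isinstance(node, _BRANCH_NODES):
            cc += 1
        elif isinstance(node, ast.BoolOp):
            cc += len(node.values) - 1  # each extra and/or operand adds a path
        elif isinstance(node, ast.Call):
            callees.add(ast.unparse(node.func))
    return {"cc": cc, "nd": _depth(func), "fo": len(callees)}


SAMPLE = '''
def classify(items):
    out = []
    for item in items:
        if item > 0 and item % 2 == 0:
            out.append(str(item))
        elif item < 0:
            out.append(repr(item))
    return out
'''

metrics = rough_metrics(SAMPLE, "classify")
```

For `SAMPLE`, this counts one loop, two `if`/`elif` tests, and one `and`, giving a CC of 5, a nesting depth of 2, and three distinct callees (`out.append`, `str`, `repr`).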

Top 5 Hotspots

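| Function | File | Risk | CC | ND | FO |
|---|---|---|---|---|---|
| _extract_generic | graphify/extract.py | 22.5 | 66 | 9 | 64 |
| extract_sql | graphify/extract.py | 20.4 | 44 | 9 | 40 |
| _rebuild_code | graphify/watch.py | 20.0 | 128 | 6 | 82 |
| main | graphify/main.py | 19.7 | 828 | 5 | 228 |
| extract_astro | graphify/extract.py | 18.8 | 62 | 6 | 28 |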

Hotspot Analysis

_extract_generic — graphify/extract.py

Based on its name and location, _extract_generic likely serves as the central dispatch or fallback extraction routine for graphify’s data pipeline — the function other extractors defer to when no specialized handler applies. Its metrics are extreme across every dimension: cyclomatic complexity of 66 means 66 independent execution paths, max nesting depth of 9 means logic is buried nine control-structure levels deep, and fan-out of 64 means it directly calls 64 distinct functions, making it both a god function and a hub function in the dependency graph. With 4 commits touching it in the last 30 days and last changed 0 days ago, this is a live regression risk: developers are actively modifying a function that requires tracking 66 branching paths and 64 callees simultaneously, and its exit-heavy pattern means test coverage must account for a large number of distinct return paths.

Recommendation: Add characterization tests that pin the current output for representative inputs before any refactoring begins, then apply extract-method decomposition to break the function into focused sub-handlers — each addressing one extraction case — targeting a cyclomatic complexity below 15 per extracted unit. Given the fan-out of 64, map the callee graph first to identify which sub-groups of calls can be encapsulated together.
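A characterization test of the kind recommended here might look like the sketch below. `legacy_extract` is a hypothetical stand-in for the function under refactoring (its behavior and signature are invented for illustration), and `GOLDEN` is computed inline only to keep the example self-contained; in practice the snapshot would be recorded once from the unmodified code and committed as a file.

```python
import json


def legacy_extract(record: dict) -> dict:
    """Hypothetical stand-in for a complex function whose behavior we want to pin."""
    if not record:
        return {"nodes": [], "edges": []}
    name = record.get("name", "?")
    deps = record.get("deps") or []
    return {"nodes": [name, *deps], "edges": [[name, d] for d in deps]}


# Representative inputs, ideally one per major branch of the real function.
REPRESENTATIVE_INPUTS = {
    "empty": {},
    "leaf": {"name": "a"},
    "with_deps": {"name": "a", "deps": ["b", "c"]},
}


def record_snapshot() -> str:
    """Serialize current outputs deterministically so they can be diffed."""
    results = {key: legacy_extract(value) for key, value in REPRESENTATIVE_INPUTS.items()}
    return json.dumps(results, sort_keys=True, indent=2)


# Assumption: in a real suite this is loaded from a committed snapshot file.
GOLDEN = record_snapshot()


def test_behavior_is_unchanged() -> None:
    # Fails if any refactoring step changes the output for a pinned input.
    assert record_snapshot() == GOLDEN
```

The test makes no claim that the pinned output is correct, only that it is unchanged, which is what makes it a safe net for structural refactoring.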

extract_sql — graphify/extract.py

extract_sql, also in graphify/extract.py, almost certainly handles extraction from SQL data sources and shares the same structural pathology as _extract_generic: cyclomatic complexity of 44, max nesting depth of 9, and fan-out of 40. It was touched 3 times in the last 30 days and last changed 1 day ago, firmly placing it in the fire quadrant alongside its neighbor. The deeply_nested and complex_branching patterns together mean that SQL-specific branching logic, likely covering different query shapes, schema variations, or error conditions, is tangled across nine levels of nesting with 44 paths to reason about. Every active change is therefore a likely source of regressions, particularly wherever test coverage of those paths is thin.

Recommendation: Decompose extract_sql using extract-method to separate SQL-dialect or schema-variant handling into dedicated functions, reducing each branch cluster to a single decision point. Prioritize adding regression tests for the most commonly exercised SQL paths before the next commit, since the function has already been modified 3 times this month.

_rebuild_code — graphify/watch.py

_rebuild_code sits in graphify’s file-watching layer and likely orchestrates the recompilation or re-processing cycle triggered by filesystem change events — a coordination function that routes each watched file through parsing, extraction, and output steps. Its cyclomatic complexity of 128 is the highest in the top five after main, representing 128 independent execution paths; max nesting depth of 6 and fan-out of 82 place it firmly in god_function and complex_branching territory. The combination of CC 128 and FO 82 means the function must simultaneously track a large branching tree and 82 outbound call dependencies — every new file type or watch event added to the system increases both counts. Its activity risk score of 20.0 reflects active commit pressure on this structural load.

Recommendation: Decompose _rebuild_code by file type or event class: create a dedicated handler function for each category of change the watcher processes, then reduce _rebuild_code to a thin dispatcher that identifies the event and delegates. Given the fan-out of 82, map callee clusters by responsibility first — parsing calls, output calls, and error-handling calls will likely form natural extraction boundaries — so that each extracted handler captures a coherent set of dependencies.
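The thin-dispatcher shape described in this recommendation can be sketched with a lookup table keyed by file suffix. The handler names, the suffixes, and the string return values are all hypothetical; the article does not say which event or file categories `_rebuild_code` actually handles.

```python
from pathlib import Path
from typing import Callable, Dict


# One focused handler per category of change (hypothetical categories).
def _rebuild_python(path: Path) -> str:
    return f"reparsed python: {path.name}"


def _rebuild_markdown(path: Path) -> str:
    return f"reindexed docs: {path.name}"


def _skip(path: Path) -> str:
    return f"ignored: {path.name}"


HANDLERS: Dict[str, Callable[[Path], str]] = {
    ".py": _rebuild_python,
    ".md": _rebuild_markdown,
}


def rebuild(path: Path) -> str:
    """Thin dispatcher: identify the category, then delegate."""
    handler = HANDLERS.get(path.suffix.lower(), _skip)
    return handler(path)
```

Adding a new file type then means adding one handler and one table entry, rather than another branch inside the dispatcher.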

main — graphify/main.py

main is graphify’s command-line entry point and is the most structurally extreme function in the codebase: a cyclomatic complexity of 828 means 828 independent execution paths, and fan-out of 228 means it directly calls 228 distinct functions, a count equal to roughly a third of graphify’s 681 functions. Max nesting depth of 5 is comparatively low, suggesting the complexity is lateral rather than vertical: a wide conditional tree routing CLI arguments through many distinct execution pathways rather than deeply nested logic. As the entry point, main accumulates complexity with every new command or flag added to the CLI, and its activity risk score of 19.7 reflects active modification under this structural load.

Recommendation: Decompose main by CLI subcommand: extract each top-level command or mode into its own handler function with a clear single responsibility, reducing main to an argument-parsing dispatcher. This can be done incrementally — one subcommand per refactoring session — and each extracted handler becomes independently testable. Prioritize extracting the most frequently modified subcommands first to reduce regression risk on the code paths currently being changed.
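One common way to get this dispatcher shape in Python is `argparse` subparsers with `set_defaults`, sketched below. The subcommands `build` and `query`, their arguments, and the handler bodies are hypothetical and are not graphify's actual CLI; the sketch only shows how `main` can shrink to parsing plus delegation.

```python
import argparse
from typing import List, Optional


# One handler per subcommand; each is independently testable.
def cmd_build(args: argparse.Namespace) -> int:
    print(f"building graph from {args.path}")
    return 0


def cmd_query(args: argparse.Namespace) -> int:
    print(f"querying for {args.term}")
    return 0


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="example")
    sub = parser.add_subparsers(dest="command", required=True)

    build = sub.add_parser("build")
    build.add_argument("path")
    build.set_defaults(handler=cmd_build)  # attach the handler to the parsed args

    query = sub.add_parser("query")
    query.add_argument("term")
    query.set_defaults(handler=cmd_query)
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    """Thin entry point: parse arguments, then delegate to the chosen handler."""
    args = build_parser().parse_args(argv)
    return args.handler(args)
```

Because each extracted handler takes a plain `Namespace`, it can be called directly in tests without going through the command line, which supports the one-subcommand-per-session migration suggested above.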

extract_astro — graphify/extract.py

extract_astro handles extraction for Astro-format sources and shares a file with the top two hotspots, making graphify/extract.py the single largest concentration of structural risk in the codebase. Its metrics (cyclomatic complexity of 62, max nesting depth of 6, and fan-out of 28) are extreme in absolute terms: CC 62 is close to _extract_generic’s 66 and higher than extract_sql’s 44, even though its nesting depth and fan-out are lower than both neighbors. The high CC likely reflects the branching involved in parsing a format with many valid structural variations, and ND 6 means logic is buried six control-structure levels deep in places. The complex_branching and deeply_nested patterns apply here as they do to _extract_generic and extract_sql. Its activity risk score of 18.8 reflects active modification alongside the two higher-ranked functions.

Recommendation: Apply the same extract-method approach as for _extract_generic and extract_sql: identify structural variants of the Astro format as natural extraction boundaries and move each into a dedicated parsing function. Treat extract_astro as part of the same joint refactoring workstream as its neighbors — all three functions share the module and likely share parsing utilities, so changes to shared helpers should be tested against all three call sites simultaneously.

Patterns Found

Antipatterns detected across the top functions in this snapshot:

| Pattern | Occurrences |
|---|---|
| complex_branching | 5 |
| deeply_nested | 5 |
| exit_heavy | 5 |
| god_function | 5 |
| long_function | 5 |
| hub_function | 1 |

These labels belong to two tiers:

  • Tier 1 (structural): complex_branching, deeply_nested, exit_heavy, long_function, god_function.
  • Tier 2 (relational/temporal): hub_function, cyclic_hub, middle_man, neighbor_risk, stale_complex, churn_magnet, shotgun_target, volatile_god.

Key Takeaways

  • _extract_generic (CC 66, ND 9, FO 64) is the single highest-priority refactoring target in the repo — write characterization tests before the next commit touches it, given it has already been modified 4 times in 30 days.
  • extract_sql (CC 44, ND 9, FO 40) shares the same file and the same structural pathology, and was changed 3 times in the last 30 days — treating these two functions as a joint refactoring workstream will reduce the blast radius of changes to graphify/extract.py as a whole.
  • With 394 of 681 functions in the fire quadrant and zero in the debt quadrant, graphify’s risk profile is dominated by active churn on complex code: the priority is not catching up on stale debt but preventing regressions in code being changed right now.

Reproduce This Analysis

git clone https://github.com/safishamsi/graphify
cd graphify
git checkout 0ca8d3d9f755f85d7f7e83fb3ed98cd20a5d7be2
hotspots analyze . --mode snapshot --explain-patterns --force

To run the same analysis on your own codebase, run `hotspots analyze . --mode snapshot` in any local git repo; no configuration is required.

Hotspots highlights structural and activity risk, not “bad code.” Findings are a prioritization aid, not a bug predictor.

Run this on your own codebase

Hotspots runs locally in under a minute — no account, no data leaves your machine.

macOS
$ brew install Stephen-Collins-tech/tap/hotspots
Linux / cargo
$ cargo install hotspots-cli
Run in any repo
$ hotspots analyze .