The top risk in safishamsi/graphify is concentrated in a single file: graphify/extract.py hosts the two most dangerous functions in the codebase, both in the ‘fire’ quadrant — structurally extreme and actively changing right now. _extract_generic leads with an activity risk score of 22.52, paired with a cyclomatic complexity of 66 and fan-out of 64, meaning every active commit is modifying a function with 66 independent execution paths and 64 outbound call dependencies simultaneously. Across 681 total functions — 182 of them critical — the entire codebase shows no debt-quadrant functions at all: 394 functions sit in the fire quadrant, signaling that active development is broadly outpacing structural cleanup.
The table below ranks functions by activity-weighted risk — a score that multiplies structural complexity by recent commit frequency. A function that is both hard to understand (high cyclomatic complexity) and actively changing is a higher priority than one that is complex but untouched. CC = cyclomatic complexity (independent execution paths); ND = max nesting depth; FO = fan-out (distinct callees).
Top 5 Hotspots
| Function | File | Risk | CC | ND | FO |
|---|---|---|---|---|---|
| _extract_generic | graphify/extract.py | 22.5 | 66 | 9 | 64 |
| extract_sql | graphify/extract.py | 20.4 | 44 | 9 | 40 |
| _rebuild_code | graphify/watch.py | 20.0 | 128 | 6 | 82 |
| main | graphify/main.py | 19.7 | 828 | 5 | 228 |
| extract_astro | graphify/extract.py | 18.8 | 62 | 6 | 28 |
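To make the ND and FO columns concrete, here is a minimal sketch of how the two metrics could be approximated for a Python function using the standard `ast` module. It is not how Hotspots computes them: the set of node types counted as a nesting level, the name-based callee counting, and the `SAMPLE` source are all assumptions made for illustration.

```python
import ast

# Control-flow nodes counted as one nesting level each (an assumption;
# the Hotspots tool may count a different set, and an `elif` is treated
# here as an `if` nested inside the `else` branch).
NESTING_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With)


def max_nesting_depth(node: ast.AST, depth: int = 0) -> int:
    """Return the deepest level of nested control structures under node."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
        deepest = max(deepest, max_nesting_depth(child, child_depth))
    return deepest


def fan_out(func: ast.AST) -> int:
    """Count distinct callee names invoked anywhere inside func."""
    callees = set()
    for node in ast.walk(func):
        if isinstance(node, ast.Call):
            target = node.func
            if isinstance(target, ast.Name):
                callees.add(target.id)      # plain call: log(x)
            elif isinstance(target, ast.Attribute):
                callees.add(target.attr)    # method call: item.busy()
    return len(callees)


# Hypothetical function used only to exercise the two helpers.
SAMPLE = '''
def handle(items):
    for item in items:
        if item:
            while item.busy():
                log(item)
        else:
            log(None)
    return len(items)
'''

func = ast.parse(SAMPLE).body[0]
print(max_nesting_depth(func), fan_out(func))  # prints: 3 3
```

The sample nests `for`, `if`, and `while` three levels deep and calls three distinct names (`busy`, `log`, `len`), so both helpers report 3. Counting callees by name alone merges different functions that share a name, which is one reason a real tool's numbers may differ.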
Hotspot Analysis
_extract_generic — graphify/extract.py
Based on its name and location, _extract_generic likely serves as the central dispatch or fallback extraction routine for graphify’s data pipeline — the function other extractors defer to when no specialized handler applies. Its metrics are extreme across every dimension: cyclomatic complexity of 66 means 66 independent execution paths, max nesting depth of 9 means logic is buried nine control-structure levels deep, and fan-out of 64 means it directly calls 64 distinct functions, making it both a god function and a hub function in the dependency graph. With 4 commits touching it in the last 30 days and last changed 0 days ago, this is a live regression risk: developers are actively modifying a function that requires tracking 66 branching paths and 64 callees simultaneously, and its exit-heavy pattern means test coverage must account for a large number of distinct return paths.
Recommendation: Add characterization tests that pin the current output for representative inputs before any refactoring begins, then apply extract-method decomposition to break the function into focused sub-handlers — each addressing one extraction case — targeting a cyclomatic complexity below 15 per extracted unit. Given the fan-out of 64, map the callee graph first to identify which sub-groups of calls can be encapsulated together.
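A characterization test records what the code does today, right or wrong, so that a refactor can be checked against it. The sketch below shows the idea with a JSON snapshot. `legacy_extract`, `SAMPLES`, and the helper names are hypothetical stand-ins; the real signature and return type of `_extract_generic` are not known from this report.

```python
import json


def legacy_extract(source: str) -> dict:
    """Hypothetical stand-in for a complex function such as _extract_generic."""
    tokens = source.split()
    return {"count": len(tokens), "first": tokens[0] if tokens else None}


# Representative inputs, ideally chosen to exercise different branches.
SAMPLES = ["", "def f(): pass", "SELECT 1"]


def record_baseline(fn, samples) -> str:
    """Capture current behaviour as a JSON snapshot (run once, then commit)."""
    return json.dumps({s: fn(s) for s in samples}, sort_keys=True)


def check_against_baseline(fn, samples, baseline: str) -> list:
    """Return the inputs whose output differs from the recorded snapshot."""
    expected = json.loads(baseline)
    return [s for s in samples if fn(s) != expected[s]]


baseline = record_baseline(legacy_extract, SAMPLES)
# After a refactor, an empty list means behaviour is unchanged on these inputs.
print(check_against_baseline(legacy_extract, SAMPLES, baseline))  # prints: []
```

In practice the snapshot would be written to a file under version control and the check would run in the test suite, so any commit that changes output on a pinned input fails visibly.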
extract_sql — graphify/extract.py
extract_sql, also in graphify/extract.py, almost certainly handles extraction from SQL data sources and shares the same structural pathology as _extract_generic: cyclomatic complexity of 44, max nesting depth of 9, and fan-out of 40. It was touched 3 times in the last 30 days and last changed 1 day ago, firmly placing it in the fire quadrant alongside its neighbor. The deeply_nested and complex_branching patterns together mean that SQL-specific branching logic (likely covering different query shapes, schema variations, or error conditions) is tangled across nine levels of nesting with 44 paths to reason about, so every active change carries a high regression risk unless tests already cover the affected paths.
Recommendation: Decompose extract_sql using extract-method to separate SQL-dialect or schema-variant handling into dedicated functions, reducing each branch cluster to a single decision point. Prioritize adding regression tests for the most commonly exercised SQL paths before the next commit, since the function has already been modified 3 times this month.
_rebuild_code — graphify/watch.py
_rebuild_code sits in graphify’s file-watching layer and likely orchestrates the recompilation or re-processing cycle triggered by filesystem change events — a coordination function that routes each watched file through parsing, extraction, and output steps. Its cyclomatic complexity of 128 is the highest in the top five after main, representing 128 independent execution paths; max nesting depth of 6 and fan-out of 82 place it firmly in god_function and complex_branching territory. The combination of CC 128 and FO 82 means the function must simultaneously track a large branching tree and 82 outbound call dependencies — every new file type or watch event added to the system increases both counts. Its activity risk score of 20.0 reflects active commit pressure on this structural load.
Recommendation: Decompose _rebuild_code by file type or event class: create a dedicated handler function for each category of change the watcher processes, then reduce _rebuild_code to a thin dispatcher that identifies the event and delegates. Given the fan-out of 82, map callee clusters by responsibility first — parsing calls, output calls, and error-handling calls will likely form natural extraction boundaries — so that each extracted handler captures a coherent set of dependencies.
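The thin-dispatcher shape described above can be sketched as a lookup table from file type to handler. Every name here (`rebuild`, `HANDLERS`, the `rebuild_*` handlers) and the choice of file suffix as the dispatch key are assumptions for illustration; the actual categories `_rebuild_code` handles are not listed in this report.

```python
from pathlib import Path
from typing import Callable, Dict


def rebuild_python(path: Path) -> str:
    """Hypothetical handler for Python sources."""
    return f"reparsed python: {path.name}"


def rebuild_sql(path: Path) -> str:
    """Hypothetical handler for SQL sources."""
    return f"reparsed sql: {path.name}"


def rebuild_unknown(path: Path) -> str:
    """Fallback for file types with no dedicated handler."""
    return f"skipped: {path.name}"


# One entry per category of change; adding a file type adds a row here
# instead of a new branch inside the dispatcher.
HANDLERS: Dict[str, Callable[[Path], str]] = {
    ".py": rebuild_python,
    ".sql": rebuild_sql,
}


def rebuild(path: Path) -> str:
    """Thin dispatcher: identify the file type, then delegate."""
    handler = HANDLERS.get(path.suffix.lower(), rebuild_unknown)
    return handler(path)


print(rebuild(Path("models/schema.sql")))  # prints: reparsed sql: schema.sql
```

With this shape the dispatcher keeps a single decision point regardless of how many file types exist, and each handler carries only the callees it needs.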
main — graphify/main.py
main is graphify’s command-line entry point and is the most structurally extreme function in the codebase: a cyclomatic complexity of 828 means 828 independent execution paths, and fan-out of 228 means it directly calls 228 distinct callees, a number equal to roughly a third of graphify’s 681 functions. Max nesting depth of 5 is comparatively low, suggesting the complexity is lateral rather than vertical: a wide conditional tree routing CLI arguments through many distinct execution pathways rather than deeply nested logic. As the entry point, main accumulates complexity with every new command or flag added to the CLI, and its activity risk score of 19.7 reflects active modification under this structural load.
Recommendation: Decompose main by CLI subcommand: extract each top-level command or mode into its own handler function with a clear single responsibility, reducing main to an argument-parsing dispatcher. This can be done incrementally — one subcommand per refactoring session — and each extracted handler becomes independently testable. Prioritize extracting the most frequently modified subcommands first to reduce regression risk on the code paths currently being changed.
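One common way to get an argument-parsing dispatcher in Python is `argparse` subparsers with `set_defaults`, sketched below. The subcommands `build` and `watch`, their arguments, and the handler names are hypothetical; graphify's real CLI surface is not described in this report, and it may not use `argparse` at all.

```python
import argparse


def cmd_build(args: argparse.Namespace) -> str:
    """Hypothetical handler for a 'build' subcommand."""
    return f"build {args.path}"


def cmd_watch(args: argparse.Namespace) -> str:
    """Hypothetical handler for a 'watch' subcommand."""
    return f"watch {args.path} every {args.interval}s"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tool")
    sub = parser.add_subparsers(dest="command", required=True)

    build = sub.add_parser("build")
    build.add_argument("path")
    build.set_defaults(handler=cmd_build)  # attach the handler to the subcommand

    watch = sub.add_parser("watch")
    watch.add_argument("path")
    watch.add_argument("--interval", type=float, default=1.0)
    watch.set_defaults(handler=cmd_watch)
    return parser


def main(argv=None) -> str:
    """Parse arguments, then delegate to whichever handler was attached."""
    args = build_parser().parse_args(argv)
    return args.handler(args)


print(main(["watch", "src", "--interval", "2"]))  # prints: watch src every 2.0s
```

Each handler can be unit-tested by passing an `argparse.Namespace` directly, and moving one subcommand at a time into this structure fits the incremental approach recommended above.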
extract_astro — graphify/extract.py
extract_astro handles extraction for Astro-format sources and shares a file with the top two hotspots, making graphify/extract.py the single largest concentration of structural risk in the codebase. Its metrics — cyclomatic complexity of 62, max nesting depth of 6, and fan-out of 28 — are extreme in absolute terms even if lower than its neighbors: CC 62 reflects the branching complexity inherent in parsing a format with many valid structural variations, and ND 6 means logic is buried six control-structure levels deep in places. The complex_branching and deeply_nested patterns apply here as they do to _extract_generic and extract_sql. Its activity risk score of 18.8 reflects active modification alongside the two higher-ranked functions.
Recommendation: Apply the same extract-method approach as for _extract_generic and extract_sql: identify structural variants of the Astro format as natural extraction boundaries and move each into a dedicated parsing function. Treat extract_astro as part of the same joint refactoring workstream as its neighbors — all three functions share the module and likely share parsing utilities, so changes to shared helpers should be tested against all three call sites simultaneously.
Patterns Found
Antipatterns detected across the top functions in this snapshot:
| Pattern | Occurrences |
|---|---|
| complex_branching | 5 |
| deeply_nested | 5 |
| exit_heavy | 5 |
| god_function | 5 |
| long_function | 5 |
| hub_function | 1 |
These labels belong to two tiers — Tier 1 (structural): complex_branching, deeply_nested, exit_heavy, long_function, god_function. Tier 2 (relational/temporal): hub_function, cyclic_hub, middle_man, neighbor_risk, stale_complex, churn_magnet, shotgun_target, volatile_god.
Key Takeaways
- _extract_generic (CC 66, ND 9, FO 64) is the single highest-priority refactoring target in the repo — write characterization tests before the next commit touches it, given it has already been modified 4 times in 30 days.
- extract_sql (CC 44, ND 9, FO 40) shares the same file and the same structural pathology, and was changed 3 times in the last 30 days — treating these two functions as a joint refactoring workstream will reduce the blast radius of changes to graphify/extract.py as a whole.
- With 394 of 681 functions in the fire quadrant and zero in the debt quadrant, graphify’s risk profile is dominated by active churn on complex code — the priority is not catching up on stale debt but slowing down regressions in code being changed right now.
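The fire and debt quadrants referred to above combine two questions: is the function structurally complex, and is it being changed now? The sketch below illustrates that split. The cut-offs `COMPLEX_AT` and `ACTIVE_AT`, the catch-all `"other"` label, and the two sample functions `stale_parser` and `tiny_helper` are assumptions for illustration; only the figures for `_extract_generic` and `extract_sql` come from this report, and the tool's actual thresholds are not stated here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FunctionStats:
    name: str
    cyclomatic: int
    commits_30d: int


# Hypothetical cut-offs for illustration only.
COMPLEX_AT = 15
ACTIVE_AT = 1


def quadrant(fn: FunctionStats) -> str:
    """Classify a function by structural complexity and recent activity."""
    is_complex = fn.cyclomatic >= COMPLEX_AT
    is_active = fn.commits_30d >= ACTIVE_AT
    if is_complex and is_active:
        return "fire"   # complex and changing now
    if is_complex:
        return "debt"   # complex but untouched
    return "other"      # not complex; label is a placeholder


functions = [
    FunctionStats("_extract_generic", 66, 4),  # figures from the report
    FunctionStats("extract_sql", 44, 3),       # figures from the report
    FunctionStats("stale_parser", 40, 0),      # hypothetical
    FunctionStats("tiny_helper", 2, 5),        # hypothetical
]

counts = Counter(quadrant(f) for f in functions)
print(counts["fire"], counts["debt"], counts["other"])  # prints: 2 1 1
```

A codebase with zero debt-quadrant functions, as reported here, would produce no `"debt"` results under any such split: every complex function has seen recent commits.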
Reproduce This Analysis
git clone https://github.com/safishamsi/graphify
cd graphify
git checkout 0ca8d3d9f755f85d7f7e83fb3ed98cd20a5d7be2
hotspots analyze . --mode snapshot --explain-patterns --force
To run the same analysis on your own codebase, run hotspots analyze . --mode snapshot in any local git repo — no configuration required.
Hotspots highlights structural and activity risk — not “bad code.” Findings are a prioritization aid, not a bug predictor.