Scrapling’s highest-priority refactoring targets sit inside two files, scrapling/spiders/engine.py and scrapling/engines/_browsers/_stealth.py, where cyclomatic complexity (CC) scores of 41 and 42, fan-out values of 26 and 25, and recent commit activity combine to create live regression risk. The top-ranked function, crawl, is both structurally complex and actively changing, so this is not a backlog cleanup item. Of Scrapling’s 391 functions, 36 are rated critical and 32 fall in the fire quadrant (structurally complex and actively changing), which signals that the engine and browser abstraction layers deserve immediate attention.
The table below ranks functions by activity-weighted risk — a score that multiplies structural complexity by recent commit frequency. A function that is both hard to understand (high cyclomatic complexity) and actively changing is a higher priority than one that is complex but untouched. CC = cyclomatic complexity (independent execution paths); ND = max nesting depth; FO = fan-out (distinct callees).
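To make the CC definition concrete, here is a minimal sketch that approximates cyclomatic complexity as one plus the number of decision points, using Python's standard `ast` module. The set of node types counted, the function name `approx_cyclomatic_complexity`, and the `SAMPLE` snippet are all illustrative assumptions; the Hotspots tool's exact counting rules are not described in this report and may differ.

```python
import ast

# Node types treated as decision points in this simplified count.
# This list is an assumption; real tools may also count comprehensions,
# match cases, assert statements, and so on.
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approx_cyclomatic_complexity(source: str) -> int:
    """Return 1 + the number of decision points found in `source`."""
    tree = ast.parse(source)
    complexity = 1
    for node in ast.walk(tree):
        if isinstance(node, _DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds one extra path per boolean operator.
            complexity += len(node.values) - 1
    return complexity


# Hypothetical sample function, not taken from Scrapling.
SAMPLE = '''
def classify(status, retries):
    if status == 200:
        return "ok"
    if status in (301, 302) and retries > 0:
        return "redirect"
    for _ in range(retries):
        if status >= 500:
            return "retry"
    return "fail"
'''

print(approx_cyclomatic_complexity(SAMPLE))  # 6
```

The sample scores 6: the base path, three `if` statements, one `and`, and one `for` loop. A function scoring 41 under a comparable rule has roughly forty such decision points.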
Top 5 Hotspots
| Function | File | Risk | CC | ND | FO |
|---|---|---|---|---|---|
| crawl | scrapling/spiders/engine.py | 18.6 | 41 | 7 | 26 |
| fetch | scrapling/engines/_browsers/_stealth.py | 16.3 | 42 | 5 | 25 |
| fetch | scrapling/engines/_browsers/_controllers.py | 16.2 | 40 | 5 | 24 |
| parse | scrapling/core/shell.py | 15.9 | 62 | 3 | 30 |
| _cloudflare_solver | scrapling/engines/_browsers/_stealth.py | 15.8 | 34 | 5 | 17 |
Hotspot Analysis
crawl — scrapling/spiders/engine.py
As the central orchestration point in Scrapling’s spider engine, crawl almost certainly coordinates request dispatch, response handling, and spider lifecycle decisions — the kind of function that touches everything. Its cyclomatic complexity of 41 means 41 independent execution paths, each a required test case and a potential regression surface. With a fan-out of 26, a max nesting depth of 7, and 6 commits in the last 30 days, this is a fire-quadrant function: it is structurally complex and actively changing right now, making every commit a live regression risk across a broad call graph.
Recommendation: Add characterization tests covering the dominant execution paths before any further changes, then extract the deeply nested branching blocks (ND 7) into named sub-functions to bring CC below 15 and reduce the blast radius of future edits.
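A characterization test records what the code does today and fails when a refactor changes that behavior. The sketch below shows the idea on a hypothetical stand-in, `legacy_decide`; it is not Scrapling's crawl, whose real signature and branches are not shown in this report. The input grid and the helper names `snapshot` and `GOLDEN` are likewise assumptions for illustration.

```python
import itertools
import json


def legacy_decide(status: int, depth: int, seen: bool) -> str:
    """Hypothetical stand-in for one branchy decision inside a crawl loop."""
    if seen:
        return "skip"
    if status >= 500:
        return "retry" if depth < 3 else "drop"
    if status in (301, 302):
        return "follow"
    return "parse" if status == 200 else "drop"


def snapshot(func) -> dict:
    """Record func's output for every combination of representative inputs."""
    grid = itertools.product([200, 302, 404, 503], [0, 3], [False, True])
    return {json.dumps(args): func(*args) for args in grid}


# Step 1: capture current behavior once (in practice, saved to a file
# and committed alongside the tests).
GOLDEN = snapshot(legacy_decide)


def test_behavior_unchanged(candidate=legacy_decide):
    # Step 2: after each refactor, the candidate must reproduce the snapshot.
    assert snapshot(candidate) == GOLDEN
```

With the snapshot in place, each nested block can be extracted into a named sub-function and checked against `GOLDEN` after every step.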
fetch — scrapling/engines/_browsers/_stealth.py
fetch in the stealth browser engine likely handles the full fetch lifecycle for browser-based, anti-detection requests, a surface that must juggle headers, timing, proxy routing, and response parsing simultaneously. Its CC of 42 is the highest among the browser-engine functions in the table and second in the table only to parse (62), one point above crawl (41). Its fan-out of 25 signals broad coupling across 25 distinct callees, and its multiple exit paths add test-coverage burden. It sits in the fire quadrant and was last changed 17 days ago, meaning structural complexity and active development are colliding in real time.
Recommendation: Map the 25 fan-out callees to identify which are shared with crawl — overlapping dependencies between these two god-functions compound blast-radius risk; decoupling shared concerns into dedicated service objects will reduce both CC and cross-function coupling.
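One rough way to map shared callees is to collect the call names inside each function with Python's `ast` module and intersect the sets. The sketch below is an approximation under stated assumptions: it matches by name only (so two different methods named `get` would be treated as the same callee), and the `SAMPLE` source with its `build_request`, `log_event`, `schedule`, and `render` helpers is hypothetical, not Scrapling code.

```python
import ast


def callee_names(source: str, func_name: str) -> set[str]:
    """Collect the names called inside the function `func_name` in `source`."""
    tree = ast.parse(source)
    names: set[str] = set()
    for node in ast.walk(tree):
        is_func = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if not (is_func and node.name == func_name):
            continue
        for call in ast.walk(node):
            if isinstance(call, ast.Call):
                target = call.func
                if isinstance(target, ast.Name):
                    names.add(target.id)    # plain call: foo()
                elif isinstance(target, ast.Attribute):
                    names.add(target.attr)  # method call: obj.foo()
    return names


# Hypothetical source; in practice, read each module's text from disk.
SAMPLE = '''
def crawl(spider):
    req = build_request(spider)
    log_event("crawl")
    return schedule(req)

def fetch(url):
    req = build_request(url)
    log_event("fetch")
    return render(req)
'''

shared = callee_names(SAMPLE, "crawl") & callee_names(SAMPLE, "fetch")
print(sorted(shared))  # ['build_request', 'log_event']
```

Because crawl and the stealth fetch live in different files, a real run would call `callee_names` once per file and intersect the results. Name-based matching can over-report, so the output is a starting list to review by hand rather than a definitive dependency map.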
fetch — scrapling/engines/_browsers/_controllers.py
The fetch function in the controllers browser engine almost certainly implements the HTTP fetch lifecycle for standard (non-stealth) browser-based requests — a parallel surface to its counterpart in the stealth engine, with its own set of routing, header management, and response-handling branches. Its CC of 40 places it just below the stealth fetch, with a fan-out of 24 and max nesting depth of 5 confirming broad coupling across the browser abstraction layer. Together, the two fetch implementations share structural patterns (god_function, exit_heavy) and likely share callees — a coupling that amplifies the blast radius of changes to either.
Recommendation: Treat the controllers and stealth fetch functions as a paired refactoring target. Identify shared callees first, extract them into a common browser-agnostic service, and then reduce each fetch independently — this approach eliminates duplicated complexity in a single pass.
parse — scrapling/core/shell.py
parse in the shell module carries the highest cyclomatic complexity in the dataset at 62: that is 62 independent execution paths, each a required test case and a potential regression surface. Its fan-out of 30 is also the highest in the table, meaning parse depends on 30 distinct callees and a change to any of them can alter its behavior. Despite this structural load, its nesting depth of 3 suggests the complexity is driven by wide branching rather than deep nesting, likely many parsing strategies, format handlers, or fallback cases evaluated in sequence. The low nesting depth is the one structural positive; the CC and fan-out together make this the highest-priority refactoring target for pure structural debt.
Recommendation: Before touching parse, write a characterization test suite covering its dominant output types. Then apply extract-method refactoring to each distinct parsing strategy or format branch, targeting CC below 20 and bringing each sub-function into independent testability.
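Wide, shallow branching of the kind described above often lends itself to a dispatch table, where each branch becomes a small named handler. The sketch below illustrates the pattern on a made-up value parser; `parse_before`, `parse_after`, `HANDLERS`, and the four formats are assumptions for illustration and say nothing about what Scrapling's parse actually handles.

```python
from typing import Callable


def parse_before(kind: str, raw: str):
    """Wide, shallow branching: each new format adds another branch."""
    if kind == "int":
        return int(raw)
    elif kind == "float":
        return float(raw)
    elif kind == "bool":
        return raw.strip().lower() in {"1", "true", "yes"}
    elif kind == "list":
        return [item.strip() for item in raw.split(",")]
    raise ValueError(f"unknown kind: {kind}")


# After extract-method: one small, independently testable handler per format.
def _parse_bool(raw: str) -> bool:
    return raw.strip().lower() in {"1", "true", "yes"}


def _parse_list(raw: str) -> list[str]:
    return [item.strip() for item in raw.split(",")]


HANDLERS: dict[str, Callable[[str], object]] = {
    "int": int,
    "float": float,
    "bool": _parse_bool,
    "list": _parse_list,
}


def parse_after(kind: str, raw: str):
    """Single lookup replaces the if/elif chain; behavior is unchanged."""
    try:
        handler = HANDLERS[kind]
    except KeyError:
        raise ValueError(f"unknown kind: {kind}") from None
    return handler(raw)
```

The dispatcher's own CC stays constant as formats are added, and each handler can be covered by its own tests. Whether this shape fits the real function depends on how independent its branches turn out to be.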
_cloudflare_solver — scrapling/engines/_browsers/_stealth.py
From its name and location in the stealth browser engine, _cloudflare_solver almost certainly implements the challenge-solving logic for bypassing Cloudflare bot detection — a flow that by nature requires many conditional branches to handle different challenge types, timeouts, and fallback strategies. Its CC of 34 and fan-out of 17 confirm this complexity, and with 5 levels of nesting it is difficult to reason about in isolation. Critically, it sits in the debt quadrant: it has not been touched in 62 days, meaning this structural complexity is dormant for now but carries high blast radius when the next development push arrives.
Recommendation: Before the next feature work on Cloudflare handling, add a characterization test suite covering the branching outcomes so the existing behavior is locked down; then extract each challenge-type handler into its own function to reduce CC to a manageable level.
Patterns Found
Antipatterns detected across the top functions in this snapshot:
| Pattern | Occurrences |
|---|---|
| exit_heavy | 8 |
| complex_branching | 7 |
| deeply_nested | 7 |
| god_function | 7 |
| long_function | 6 |
These labels belong to two tiers — Tier 1 (structural): complex_branching, deeply_nested, exit_heavy, long_function, god_function. Tier 2 (relational/temporal): hub_function, cyclic_hub, middle_man, neighbor_risk, stale_complex, churn_magnet, shotgun_target, volatile_god.
Key Takeaways
- crawl in scrapling/spiders/engine.py has been touched 6 times in 30 days with a CC of 41 and fan-out of 26 — add characterization tests before the next commit to prevent silent regressions across its 26 callees.
- parse in scrapling/core/shell.py has a cyclomatic complexity of 62 and a fan-out of 30 — the highest values in the dataset for both metrics — making it the largest structural debt item in the codebase and the highest-priority target for extract-method refactoring.
- _cloudflare_solver and fetch share the same file (scrapling/engines/_browsers/_stealth.py), with CCs of 34 and 42 respectively and overlapping god-function and exit-heavy patterns; treating them as a paired refactoring target will reduce the stealth engine’s overall risk profile more efficiently than addressing either alone.
Reproduce This Analysis
git clone https://github.com/D4Vinci/Scrapling
cd Scrapling
git checkout a2285436e15952b8cf1cdafa9892210de84d4ac8
hotspots analyze . --mode snapshot --explain-patterns --force
To run the same analysis on your own codebase, run hotspots analyze . --mode snapshot in any local git repo — no configuration required.
Hotspots highlights structural and activity risk, not “bad code.” Findings are a prioritization aid, not a bug predictor.