AI-generated pull requests contain up to 2.74x more security vulnerabilities than human-written code, and they tend to arrive in patches far too large for standard review processes to absorb. That combination is why PR size has become a governance variable, not just a developer preference. Most engineering teams inherited review processes built for human-scale output. AI coding agents produce faster, larger, and more defect-prone patches. A formal PR size policy is the lowest-overhead control teams can add right now.
Why Agent-Generated PRs Break Standard Review
More than one in five code reviews on GitHub now involves an agent, with agent-powered review growing 10x in under a year. Agentic PR volume is scaling faster than review infrastructure can adapt.
The core problem is that agents optimize for task completion, not reviewability. A prompt to implement an authentication module typically produces a single coherent block, often spanning hundreds of lines across a dozen files, rather than the incremental slices a human developer would produce across multiple commits. Reviewers calibrate differently at scale: a 20-line diff gets read carefully; a 600-line diff across 15 files gets skimmed. That drop in attention is where security issues pass undetected. A 2025 empirical study of code review agents at scale found that agentic PR volumes are growing well ahead of teams' capacity to review them with appropriate depth.
The quality data makes the case for intervention. Beyond the vulnerability rate, AI-co-authored PRs contain 1.7x more issues overall than human-written ones, including nearly 2x the rate of error-handling gaps. When those defects are buried inside a large patch, a reviewer's natural tendency to treat a sizable diff as inherently complete (the agent clearly thought it through) works against finding them. Size policy is not about slowing agents down. It is about keeping review effective.
What a Size Policy Needs to Define
A useful PR size policy covers five elements. Each needs to be machine-enforceable, not just documented in a wiki.
Line-count threshold: Set a soft limit at 400 changed lines and a hard limit at 800. Below the soft limit, the PR proceeds through the normal review queue. Between the soft and hard limits, a second reviewer sign-off is required before merge. Above the hard limit, the PR must be split or approved through the exception protocol. Thresholds should be tunable per repository: a large refactor in a low-risk internal tool warrants different treatment than one touching an authentication service.
File-count cap: Limit agent-generated PRs to 20 changed files. More than 20 files almost always signals a scope problem: the agent solved more than it was asked to, or the prompt was not scoped tightly enough. A file-count gate catches cross-cutting changes that a line-count check can miss.
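The two size thresholds and the file cap can be expressed as a small policy object. The sketch below is a minimal illustration, not a reference implementation: it assumes PRs between the soft and hard line limits need a second reviewer, PRs above the hard line limit or the 20-file cap are blocked until split or granted an exception, and the `auth-service` entry is a hypothetical example of per-repository tuning.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SizePolicy:
    """Size thresholds from the policy text; tunable per repository."""
    soft_lines: int = 400
    hard_lines: int = 800
    max_files: int = 20

# Hypothetical per-repository override: stricter limits for a sensitive service.
REPO_POLICIES = {
    "auth-service": SizePolicy(soft_lines=200, hard_lines=400, max_files=10),
}

def policy_for(repo: str) -> SizePolicy:
    """Fall back to the default policy when a repository has no override."""
    return REPO_POLICIES.get(repo, SizePolicy())

def size_tier(changed_lines: int, changed_files: int, policy: SizePolicy) -> str:
    """Classify a PR as 'normal', 'second-review', or 'blocked'."""
    if changed_lines > policy.hard_lines or changed_files > policy.max_files:
        return "blocked"        # split the PR or log an approved exception
    if changed_lines > policy.soft_lines:
        return "second-review"  # needs an extra reviewer sign-off
    return "normal"
```

With these values, a 550-line PR touching 12 files lands in `second-review` for a repository on the default policy but is `blocked` in `auth-service`, which is the kind of per-repository difference the threshold rule calls for.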
Atomic scope rule: One PR addresses one concern. Agent-generated PRs that mix feature code with refactors, configuration changes, or dependency upgrades need to be split before review. Agents tend to apply opportunistic cleanup beyond the stated task, a useful behaviour that becomes a governance problem when the cleanup is not independently reviewable.
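Part of the atomic scope rule can be checked mechanically from the list of changed paths. The sketch below assumes a hypothetical mapping from file names and extensions to concerns; it flags PRs that mix code with dependency manifests or configuration files, but it cannot tell a refactor from feature code, so that split still depends on the scope statement and the reviewer.

```python
from pathlib import PurePosixPath

# Hypothetical heuristics: adjust the names and suffixes to your repository layout.
DEPENDENCY_FILES = {
    "package.json", "package-lock.json", "yarn.lock", "pnpm-lock.yaml",
    "requirements.txt", "poetry.lock", "pyproject.toml",
    "go.mod", "go.sum", "Cargo.toml", "Cargo.lock", "pom.xml", "build.gradle",
}
CONFIG_SUFFIXES = {".yml", ".yaml", ".toml", ".ini", ".cfg"}

def concern_of(path: str) -> str:
    """Map a changed path to a coarse concern; dependency manifests win over config."""
    p = PurePosixPath(path)
    if p.name in DEPENDENCY_FILES:
        return "dependencies"
    if p.suffix in CONFIG_SUFFIXES:
        return "configuration"
    return "code"  # feature code, refactors and tests all land here

def concerns_touched(paths: list[str]) -> set[str]:
    """More than one concern in the result suggests the PR should be split."""
    return {concern_of(path) for path in paths}
```

A PR whose paths map to `{"code", "dependencies"}` would be sent back for splitting before review, while code plus its tests stays a single concern.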
Exception protocol: Define who can approve an override and require a logged justification. Unlogged exceptions are how size policies erode. Each exception is a data point: if one category of change generates most overrides, the thresholds need adjusting, not bypassing.
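An exception log only helps if each entry records who approved it and why. In this minimal sketch, an in-memory list of JSON lines stands in for whatever append-only store a team actually uses, and the approver set and category names are hypothetical; counting entries per category shows which kinds of change keep triggering overrides.

```python
import json
from collections import Counter
from datetime import datetime, timezone

# Hypothetical: the people allowed to approve size-policy overrides.
APPROVERS = {"alice", "bob"}

def log_exception(log: list[str], pr_number: int, approver: str,
                  category: str, justification: str) -> dict:
    """Validate one override and append it to the log as a JSON line."""
    if approver not in APPROVERS:
        raise PermissionError(f"{approver} is not allowed to approve size exceptions")
    if not justification.strip():
        raise ValueError("a size exception needs a written justification")
    record = {
        "pr": pr_number,
        "approver": approver,
        "category": category,
        "justification": justification.strip(),
        "approved_at": datetime.now(timezone.utc).isoformat(),
    }
    log.append(json.dumps(record))
    return record

def overrides_by_category(log: list[str]) -> Counter:
    """Count overrides per category; a dominant category suggests retuning thresholds."""
    return Counter(json.loads(line)["category"] for line in log)
```

Calling `overrides_by_category(log).most_common(1)` during a periodic review points at the category whose thresholds most likely need adjusting.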
Agent-generated label: Tag every AI-generated PR with a machine-readable label such as ai-generated. Selective enforcement (stricter size checks for agent-authored code, lighter ones for a human-authored hotfix) avoids friction for teams running mixed workflows and makes policy audit straightforward.
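A CI job can read the label from the webhook payload and choose limits accordingly. The sketch below assumes a GitHub `pull_request` event, whose payload lists label objects under `pull_request.labels`; the looser limits for human-authored PRs are hypothetical placeholders, since the policy only says those checks should be lighter.

```python
import json

AGENT_LABEL = "ai-generated"

# Agent limits follow the policy text; human limits are hypothetical placeholders.
AGENT_LIMITS = {"soft_lines": 400, "hard_lines": 800, "max_files": 20}
HUMAN_LIMITS = {"soft_lines": 1000, "hard_lines": 2000, "max_files": 50}

def labels_from_event(event: dict) -> list[str]:
    """Extract label names from a pull_request webhook payload."""
    return [label["name"] for label in event.get("pull_request", {}).get("labels", [])]

def limits_for(labels: list[str]) -> dict[str, int]:
    """Apply the strict limits only to PRs tagged as agent-generated."""
    return AGENT_LIMITS if AGENT_LABEL in labels else HUMAN_LIMITS

def load_event(path: str) -> dict:
    """Load the event payload, e.g. the file GitHub Actions names in GITHUB_EVENT_PATH."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

In a workflow step, `limits_for(labels_from_event(load_event(os.environ["GITHUB_EVENT_PATH"])))` would select the limits for the current PR; the same label also lets an audit count how many merged PRs were agent-authored.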
Three Enforcement Mechanisms That Work
Policy without enforcement is commentary. These three mechanisms operate at the repository level and require no external tooling beyond a standard CI pipeline:
CI status check: A lightweight job that counts changed lines and files every time a PR is opened or updated. If either count exceeds its hard limit, the check fails and, once marked as a required status check, blocks merge until the PR is resized or an exception is approved and logged. Because it only counts the diff, it runs in seconds and adds no meaningful latency to the pipeline for compliant PRs.
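A size check of this kind needs little more than `git diff --numstat`, which prints added and deleted line counts per file (with `-` in place of counts for binary files). The sketch below is one possible implementation, assuming the CI runner has fetched enough history for the three-dot diff against the base branch to find a merge base; the function names are hypothetical.

```python
import subprocess

HARD_LINES = 800  # hard line limit from the size policy
MAX_FILES = 20    # file-count cap from the size policy

def parse_numstat(output: str) -> tuple[int, int]:
    """Return (changed_lines, changed_files) from `git diff --numstat` output."""
    changed_lines = 0
    changed_files = 0
    for row in output.splitlines():
        if not row.strip():
            continue
        added, deleted, _path = row.split("\t", 2)
        changed_files += 1
        if added != "-":  # binary files report "-" instead of line counts
            changed_lines += int(added) + int(deleted)
    return changed_lines, changed_files

def exceeds_hard_limit(changed_lines: int, changed_files: int) -> bool:
    return changed_lines > HARD_LINES or changed_files > MAX_FILES

def check(base_ref: str = "origin/main") -> int:
    """Diff HEAD against its merge base with base_ref and return a CI exit code."""
    diff = subprocess.run(
        ["git", "diff", "--numstat", f"{base_ref}...HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    changed_lines, changed_files = parse_numstat(diff)
    print(f"{changed_lines} changed lines across {changed_files} files")
    if exceeds_hard_limit(changed_lines, changed_files):
        print("Over the hard size limit: split the PR or log an approved exception.")
        return 1
    return 0
```

In a CI step, calling `sys.exit(check("origin/main"))` turns a non-zero return into a failed check. Counting added plus deleted lines is one reasonable definition of "changed lines"; teams that exclude generated or vendored files would filter paths before summing.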
PR template with scope field: Require every AI-generated PR to complete a scope statement before review begins: what does this PR change, and what does it explicitly not change? Forcing that definition surfaces over-broad patches before a reviewer has to diagnose them. A template on its own only pre-fills the description; to enforce it, pair the template with a CI check that fails when the scope fields are empty, and make that check required in branch protection so the PR cannot merge until they are completed.
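One way to make the scope statement enforceable is a CI job that parses the PR description and fails when either section is empty; marking that job as a required status check then blocks merge. The sketch assumes hypothetical `## ` section headings in the template, and treats a missing description (GitHub's `pull_request.body` can be `null` when left empty) the same as empty sections.

```python
import re
from typing import Optional

# Hypothetical headings: they must match the sections in your PR template.
REQUIRED_SECTIONS = ("What this PR changes", "What this PR does not change")

def section_text(body: str, heading: str) -> str:
    """Return the text under '## <heading>', up to the next '## ' heading."""
    pattern = rf"^##\s+{re.escape(heading)}\s*$(.*?)(?=^##\s|\Z)"
    match = re.search(pattern, body, flags=re.MULTILINE | re.DOTALL)
    if match is None:
        return ""
    # Ignore HTML comments left behind as placeholder hints by the template.
    return re.sub(r"<!--.*?-->", "", match.group(1), flags=re.DOTALL).strip()

def missing_scope_sections(body: Optional[str]) -> list[str]:
    """List required sections that are absent or empty; an empty list means pass."""
    text = body or ""
    return [h for h in REQUIRED_SECTIONS if not section_text(text, h)]
```

Stripping HTML comments matters because templates often leave hint text such as `<!-- list anything out of scope -->`, which would otherwise count as a filled-in section.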
Tiered review routing: Route PRs above the soft line-count limit automatically to a senior engineer or security reviewer rather than the default assignee. Small PRs continue moving at normal velocity. Large agent-generated patches receive proportionate scrutiny. When size correlates with agent authorship (and it will), the routing logic becomes part of a defensible AI governance posture.
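The routing rule reduces to a small function over the diff size and the changed paths. In this sketch the soft limit comes from the policy, while the team names and the sensitive path prefixes are hypothetical; the returned names could then be requested as reviewers through the code host's API, such as GitHub's endpoint for requesting pull request reviewers.

```python
SOFT_LINES = 400  # soft line limit from the size policy

# Hypothetical reviewer teams and sensitive paths; substitute your own.
DEFAULT_REVIEWERS = ["team-default"]
SENIOR_REVIEWERS = ["team-senior"]
SECURITY_REVIEWERS = ["team-security"]
SENSITIVE_PREFIXES = ("src/auth/", "src/payments/")

def route_reviewers(changed_lines: int, paths: list[str]) -> list[str]:
    """Default reviewers below the soft limit; senior (plus security) above it."""
    if changed_lines <= SOFT_LINES:
        return list(DEFAULT_REVIEWERS)
    reviewers = list(SENIOR_REVIEWERS)
    if any(path.startswith(SENSITIVE_PREFIXES) for path in paths):
        reviewers += SECURITY_REVIEWERS
    return reviewers
```

Keeping the rule a pure function makes it easy to log each routing decision alongside the PR's size and label, which is what turns the routing into auditable governance rather than an informal habit.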
Most teams set PR size limits for velocity reasons: smaller diffs review faster and merge cleaner. With AI coding agents in the pipeline, the same limits serve a second purpose: preventing the review fatigue that allows security-relevant changes to slip through without genuine scrutiny. The engineering process that worked at human-scale output is already underpowered for what agents routinely produce. A size policy is a one-time configuration that pays off on every PR your agents open from here on.
re-entry.ai shows engineering teams how patch size, file exposure, and change complexity are distributed across their AI coding workflows today, giving full visibility into PR risk before it reaches production.