bpo-30620: remove bogus/dead lines from dedent #2064
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
textwraphas almost 100% test coverage. I've submitted PRs closing 2 of the remaining 3 gaps (bpo-30591 and bpo-30603). Just one small segment left to go! But, studying that segment, I discovered it's bogus logic. It can never execute.Below is a deductive proof of this claim, plus the results of empirical checks. The associated PR deletes those two spurious lines.
The key
textwrap.dedent()code in question, with logical names superimposed. Other than names, exactly as found in textwrap in the CPython repo.Deductive Proof
All of the conditions that could lead to the Clause 4 loop
elseare precluded. Here are the steps to that conclusion, usingtypingnotation.margin == Noneto start, and by the end of the first iteration of the main loop, has narrowed frommargin: str | Nonetomargin: str(namely,indents[0]).indents: List[str]because it's assigned fromre.findall() -> List[str].indent: strbecause it's assigned iteratively viaforfromindents: List[str].marginorindentare prefixes of one another.marginnorindentis a prefix of the other.margin != '' and indent != ''. Both contain characters and have a non-zero length. Had either been an empty string, it would have been considered a legitimate (zero-length) prefix of the other, and handled by either Clause 2 or Clause 3.s.startswith('') == Truefor anys: str.marginandindentare both non-empty strings,zipwill pairwise iterate down their respective characters to the length of the shortest. Let's call that pointkwherek = min(len(margin), len(indent)). The Clause 4 loop will iterateithroughrange(k), settingx == margin[i] and y == indent[i]at each step.xandymust differ. If they did not, eithermarginwould be a proper prefix ofindent, orindentwould be a proper prefix ofmargin, which we have already established would handled in Clause 2 or 3, and would not enter Clause 4. So there will always be an0 <= i < ksuch thatx != y. The loop will therefore always exit viabreak. Because the loop exit viabreak, theelseof the Clause 4forcan never be executed.forloopelseclause can be triggered is exhaustion of the iteration. Either (a) its iterable is empty at the outset, so theelseis immediately triggered, or (b) the iterable is non-empty and the loop naturally exhausts it through iteration without abreak. We have shown that neither (a) nor (b) can happen. (a) cannot happen because neithermarginnorindentare empty strings. There will always be some characters to co-iterate over. But neither can (b) happen because we've shown that there must be some point at whichxandydiffer (else one would have been a substring, and previously handled prior to Clause 4).else: margin = margin[:len(indent)]is dead code, which onlly appears to be live code. It can never execute.Randomized and Empirical Testing
Humbly remembering that even apparent proofs are occasionally faulty, and not wanting to be caught out or surprised by any magical or oddball cases that escaped the above logic, I ran two kinds of empirical verification.
I added a
raise RuntimeErrorto the suspectelse.First I randomly generated texts with varying amounts and composition of leading whitespace, and called
dedent()on those, looking to trigger the exception. After 500 million attempts, the exception was never triggered.I separately ran
dedenton the contents of text-ish files from my development system. These include numerous files from Web (HTML, CSS, JS, SVG, Less, SCSS, ...), JavaScript, Java, Python, Perl, PHP, Ruby, SQL, XML, Unix (shell, config, logging...), data and configuration (JSON, YAML, INI, ...), authoring (Markdown, RST, TXT...), C, and DevOps communities.None of those >1.2 million files triggered the exception.
It's impossible to absolutely "prove a negative" through empirical means. One can always imagine testing against a larger random or actual corpus. But that such large bodies of likely suspects never jiggled the tripwire adds confidence that the suspect code lines simply never execute under any input conditions, just as the proof says.
Conclusion
Having tried constructively to identify any cases where the
elsein the Clause 4 loop would execute, I've deductively shown it never will. Arguing in the alternative, I tried large bodies of both randomized and empirical data to find a counter-example; after extensive tests, no counter-examples were found. I therefore assert it is dead code, can never be tested or executed, and should be removed.