bpo-30620: remove bogus/dead lines from dedent #2064

jonathaneunice · 2017-06-10T04:17:34Z

textwrap has almost 100% test coverage. I've submitted PRs closing 2 of the remaining 3 gaps (bpo-30591 and bpo-30603). Just one small segment left to go! But, studying that segment, I discovered it's bogus logic. It can never execute.

Below is a deductive proof of this claim, plus the results of empirical checks. The associated PR deletes those two spurious lines.

The key textwrap.dedent() code in question, with logical names superimposed. Other than names, exactly as found in textwrap in the CPython repo.

Deductive Proof

All of the conditions that could lead to the Clause 4 loop else are precluded. Here are the steps to that conclusion, using typing notation.

margin == None to start, and by the end of the first iteration of the main loop, has narrowed from margin: str | None to margin: str (namely, indents[0]).
indents: List[str] because it's assigned from re.findall() -> List[str].
indent: str because it's assigned iteratively via for from indents: List[str].
Clause 2 and Clause 3 definitively handle the cases where margin or indent are prefixes of one another.
Therefore by Clause 4, neither margin nor indent is a prefix of the other.
By Clause 4, margin != '' and indent != ''. Both contain characters and have a non-zero length. Had either been an empty string, it would have been considered a legitimate (zero-length) prefix of the other, and handled by either Clause 2 or Clause 3. s.startswith('') == True for any s: str.
Because margin and indent are both non-empty strings, zip will pairwise iterate down their respective characters to the length of the shortest. Let's call that point k where k = min(len(margin), len(indent)). The Clause 4 loop will iterate i through range(k), setting x == margin[i] and y == indent[i] at each step.
At some point, x and y must differ. If they did not, either margin would be a proper prefix of indent, or indent would be a proper prefix of margin, which we have already established would handled in Clause 2 or 3, and would not enter Clause 4. So there will always be an 0 <= i < k such that x != y. The loop will therefore always exit via break. Because the loop exit via break, the else of the Clause 4 for can never be executed.
The only way a for loop else clause can be triggered is exhaustion of the iteration. Either (a) its iterable is empty at the outset, so the else is immediately triggered, or (b) the iterable is non-empty and the loop naturally exhausts it through iteration without a break. We have shown that neither (a) nor (b) can happen. (a) cannot happen because neither margin nor indent are empty strings. There will always be some characters to co-iterate over. But neither can (b) happen because we've shown that there must be some point at which x and y differ (else one would have been a substring, and previously handled prior to Clause 4).
Ergo the Clause 4 else: margin = margin[:len(indent)] is dead code, which onlly appears to be live code. It can never execute.

Randomized and Empirical Testing

Humbly remembering that even apparent proofs are occasionally faulty, and not wanting to be caught out or surprised by any magical or oddball cases that escaped the above logic, I ran two kinds of empirical verification.

I added a raise RuntimeError to the suspect else.

First I randomly generated texts with varying amounts and composition of leading whitespace, and called dedent() on those, looking to trigger the exception. After 500 million attempts, the exception was never triggered.

I separately ran dedent on the contents of text-ish files from my development system. These include numerous files from Web (HTML, CSS, JS, SVG, Less, SCSS, ...), JavaScript, Java, Python, Perl, PHP, Ruby, SQL, XML, Unix (shell, config, logging...), data and configuration (JSON, YAML, INI, ...), authoring (Markdown, RST, TXT...), C, and DevOps communities.
None of those >1.2 million files triggered the exception.

It's impossible to absolutely "prove a negative" through empirical means. One can always imagine testing against a larger random or actual corpus. But that such large bodies of likely suspects never jiggled the tripwire adds confidence that the suspect code lines simply never execute under any input conditions, just as the proof says.

Conclusion

Having tried constructively to identify any cases where the else in the Clause 4 loop would execute, I've deductively shown it never will. Arguing in the alternative, I tried large bodies of both randomized and empirical data to find a counter-example; after extensive tests, no counter-examples were found. I therefore assert it is dead code, can never be tested or executed, and should be removed.

remove bogus/dead lines from dedent

e7b014c

the-knights-who-say-ni added the CLA signed label Jun 10, 2017

Mariatta approved these changes Jun 12, 2017

View reviewed changes

Mariatta added the type-feature A feature request or enhancement label Jun 13, 2017

Mariatta merged commit 7a80183 into python:master Jun 16, 2017

jonathaneunice deleted the fix-issue-30620 branch June 16, 2017 02:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

bpo-30620: remove bogus/dead lines from dedent #2064

bpo-30620: remove bogus/dead lines from dedent #2064

Uh oh!

jonathaneunice commented Jun 10, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

bpo-30620: remove bogus/dead lines from dedent #2064

bpo-30620: remove bogus/dead lines from dedent #2064

Uh oh!

Conversation

jonathaneunice commented Jun 10, 2017

Deductive Proof

Randomized and Empirical Testing

Conclusion

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants