Optimization batch 6: make full use of exact renames #842
newren wants to merge 2 commits into gitgitgadget:temporary/ort-perf-measurements
b63e932 to 580ba9a
/submit
Submitted as pull.842.git.1612331345.gitgitgadget@gmail.com
580ba9a to 7ae9460
/submit
Submitted as pull.842.v2.git.1612382628.gitgitgadget@gmail.com
On the Git mailing list, Junio C Hamano wrote:
On the Git mailing list, Elijah Newren wrote:
On the Git mailing list, Junio C Hamano wrote:
On the Git mailing list, Jeff King wrote:
On the Git mailing list, Elijah Newren wrote:
This patch series was integrated into seen via git@72384d7.
7ae9460 to a0b6dc9
diffcore_rename() had some code to avoid having destination paths that already had an exact rename detected from being re-checked for other renames. Source paths, however, were re-checked because we wanted to allow the possibility of detecting copies. But if copy detection isn't turned on, then this merely amounts to attempting to find a better-than-exact match, which naturally ends up being an expensive no-op. In particular, copy detection is never turned on by the merge machinery.

For the testcases mentioned in commit 557ac03 ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows:

                     Before                  After
    no-renames:      14.263 s ±  0.053 s    14.119 s ±  0.101 s
    mega-renames:  5504.231 s ±  5.150 s  1802.044 s ±  0.828 s
    just-one-mega:  158.534 s ±  0.498 s    51.391 s ±  0.028 s

Signed-off-by: Elijah Newren <newren@gmail.com>
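The skip described in this commit message can be illustrated with a minimal sketch. The struct `src_entry`, its `exact_match_found` field, and the function `count_sources_to_compare()` are hypothetical names made up for illustration; they are not the identifiers used in diffcore-rename.c. The sketch only shows the idea that, with copy detection off, a source that already has an exact rename has nothing left to gain from inexact comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a rename source; not git's struct. */
struct src_entry {
	const char *path;
	int exact_match_found; /* nonzero once an exact rename used this source */
};

/*
 * Count the sources that would still go through inexact (similarity)
 * comparison. Without copy detection, a source that already has an exact
 * rename cannot find a better match, so it is skipped.
 */
size_t count_sources_to_compare(const struct src_entry *src, size_t nr,
				int detecting_copies)
{
	size_t i, n = 0;

	assert(src != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		if (!detecting_copies && src[i].exact_match_found)
			continue; /* already matched exactly; nothing to gain */
		n++;
	}
	return n;
}
```

With copy detection on, every source stays eligible, since an exactly renamed source might also be the origin of a copy.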
We have to look at each entry in rename_src a total of rename_dst_nr times. When we're not detecting copies, any exact renames or ignorable rename paths will just be skipped over. While checking that these can be skipped over is a relatively cheap check, it's still a waste of time to do that check more than once, let alone rename_dst_nr times. When rename_src_nr is a few thousand times bigger than the number of relevant sources (such as when cherry-picking a commit that only touched a handful of files, but from a side of history that has different names for some high level directories), this time can add up.

First make an initial pass over the rename_src array and move all the relevant entries to the front, so that we can iterate over just those relevant entries.

For the testcases mentioned in commit 557ac03 ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows:

                     Before                  After
    no-renames:      14.119 s ±  0.101 s    13.815 s ±  0.062 s
    mega-renames:  1802.044 s ±  0.828 s  1799.937 s ±  0.493 s
    just-one-mega:   51.391 s ±  0.028 s    51.289 s ±  0.019 s

Signed-off-by: Elijah Newren <newren@gmail.com>
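The initial pass described above amounts to an in-place compaction of the source array. The following is a rough sketch under stated assumptions: `struct rename_source`, its `has_exact_rename` and `ignorable` flags, and `compact_relevant_sources()` are hypothetical names chosen for illustration, and the real patch may organize this differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an entry of rename_src. */
struct rename_source {
	const char *path;
	int has_exact_rename; /* already paired by exact rename detection */
	int ignorable;        /* path that rename detection may ignore */
};

/*
 * Move the entries that still need inexact comparison to the front of the
 * array, preserving their order, and return how many there are. Entries at
 * index >= the returned count must no longer be used. When copies are being
 * detected, every source stays relevant and the array is left untouched.
 */
size_t compact_relevant_sources(struct rename_source *src, size_t nr,
				int detecting_copies)
{
	size_t i, kept = 0;

	assert(src != NULL || nr == 0);
	if (detecting_copies)
		return nr;

	for (i = 0; i < nr; i++) {
		if (src[i].has_exact_rename || src[i].ignorable)
			continue; /* checked once here instead of per destination */
		if (kept != i)
			src[kept] = src[i];
		kept++;
	}
	return kept;
}
```

A caller would then loop over only the first `kept` entries for each destination, so the skip check runs once per source rather than rename_dst_nr times.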
The pull request has 411 commits. The max allowed is 30. Please split the patch series into multiple pull requests. Also consider squashing related commits.
a0b6dc9 to dd6595b
/submit
Submitted as pull.842.v3.git.1613288101.gitgitgadget@gmail.com
This series makes full use of exact renames: when an exact rename is found and copy detection is not turned on, it removes not only the destination pair but the source pair as well.
Changes since v2:
Cc: Derrick Stolee <dstolee@microsoft.com>
Cc: Jonathan Tan <jonathantanmy@google.com>
Cc: Taylor Blau <me@ttaylorr.com>
Cc: Junio C Hamano <gitster@pobox.com>
Cc: Jeff King <peff@peff.net>
Cc: Karsten Blees <blees@dcon.de>
Cc: Derrick Stolee <stolee@gmail.com>
Cc: Elijah Newren <newren@gmail.com>