Fix problematic transaction handling#7183
Conversation
Coverage Report for CI Build 25801484257Coverage decreased (-0.06%) to 85.648%Details
Uncovered Changes
Coverage Regressions5464 previously-covered lines in 98 files lost coverage.
Coverage Stats
💛 - Coveralls |
There was a problem hiding this comment.
Based on my reading of the code, I think there's something problematic with this change (pun not intended but taken anyway), unless I've missed something.
When the miner selects from the mempool, it applies a 30s execution time budget per transaction:
When the signer's node validates a block proposal, it applies a 60s execution time budget for the whole block:
Three scenarios:
1) The miner is using its default setting of max 5 seconds per mempool sweep. In that case, it seems your budget-reset change seems to be no-op, because if we've blown past 30 seconds, we've certainly blown past 5.
2) The miner has a high enough max_miner_time_ms for the budget reset to matter, but it's below the signer's block time budget. In this case, I think we're fine (although a miner second might not be a signer second, so the line is blurry).
3) The max_miner_time_ms is even higher. Let's say it's 70 seconds. In that case, the miner might create a block with these transactions:
- Tx 1: 29 seconds
- Tx 2: 29 seconds
- Tx 3: 3 seconds
It would then propose the block to the signers, which would identify Tx 3 as problematic and, via #6964, the miner will drop Tx 3 even though it's the only nice transaction in the bunch.
If I'm right, we might still say that that's fine, I just want us to be explicit about it. If I'm wrong, at least I'll learn something 😅
The 5 second mempool sweep is only checked in between transactions, while the 30 second limit is checked during execution, so it can interrupt the processing of a single transaction.
This is a kind of implicit dance between the miner and signers. The wall-clock time limit is not a part of consensus, though it should be approximately enforced by the cost limits. A miner could potentially mine a valid block that takes 10 minutes to process if there was an issue in the cost tracking, but the signers would reject it because of their own wall-clock time limits. I think this is reasonable.
A transaction should only be marked as problematic if the time for that individual transaction exceeds the limit, not the total time spent on the block. The path for feedback from the signer might be different. I'll take a closer look at that again now. |
|
Yeah, so you're right. The signer will mark the transaction as problematic if it just happened to be the one executing while the deadline was reached. I'm thinking we should change that signer behavior, so that a transaction is only marked problematic if it alone exceeds a timeout. |
|
|
|
I changed the signer behavior (and fixed the integration test) in f928a96. |
benjamin-stacks
left a comment
There was a problem hiding this comment.
LGTM with one concern, left a comment there.
Might be fine, just want to make sure we're explicitly good with it.
This branch does two things:
When a transaction hits the execution time limit, mark it as problematic
When a transaction is problematic, revert the cost accumulation from that transaction