Split queries to avoid overflowing device buffers #502
Conversation
I don't think it's a typo, as the usage data is actually provided by the schedule module. Could you try enabling the debug mode? Alternatively, you could use the script in #404. Also, what is the exact model of your device? The US variant has been known to have some issues (see #451); maybe we need to add the schedule to the skiplist, too?
My exact model is KL125(US). I have two bulbs that I am testing with. One of the bulbs (x.x.x.64) becomes unresponsive much more easily than the other (x.x.x.73). I've attached create_module_fixture.py outputs for the following: 64, 73 with the PR change, plus the 'kasa -d' output for each bulb. I notice that:
Here's some debug output without my change. The bulb becomes unresponsive very quickly.
Looking at the response payload for your 64, it is very close to 4K, so your guess that it has something to do with usage times may very well be the reason for this behavior. Could you try the linked PR just to confirm? It doesn't change any behavior, but it will print out the expected and read payload lengths.
Force-pushed fc76bdc to 1a38b97
The returned buffer I am getting is 4961 bytes. It is definitely the 'fetch all' query that crashes it. So we're overflowing their output buffer asking for so much info at once. I pushed up a PR that splits the module update list to avoid overflowing the buffer. This seems to be working well for me so far (though I haven't updated test cases). |
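For context on where these payload byte counts come from: the classic Kasa/TP-Link smart home TCP protocol frames each message with a 4-byte big-endian length prefix followed by an XOR "autokey" ciphertext (initial key 171). A minimal sketch of that framing follows; the function names are my own, not python-kasa's API:

```python
import struct

def encrypt(plaintext: bytes) -> bytes:
    """Frame a plaintext JSON payload for the Kasa TCP protocol:
    4-byte big-endian length prefix, then XOR-autokey ciphertext."""
    key = 171  # well-known initial key for the TP-Link smart home protocol
    out = bytearray(struct.pack(">I", len(plaintext)))
    for b in plaintext:
        key = key ^ b       # each ciphertext byte becomes the next key
        out.append(key)
    return bytes(out)

def decrypt(ciphertext: bytes) -> bytes:
    """Reverse the XOR autokey (length prefix already stripped)."""
    key = 171
    out = bytearray()
    for b in ciphertext:
        out.append(key ^ b)
        key = b             # the previous ciphertext byte is the next key
    return bytes(out)
```

Comparing the length prefix the device announces against the bytes actually read is how an "expected vs. read payload length" mismatch, like the one reported above, can be detected.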
The KL-125(US) bulb appears to crash if a request is made to it which produces an output larger than 4096 bytes. This commit splits up the module update based on the expected response size to avoid making a request which will overflow the device's output buffer.
I've been trying to probe buffer sizes before crash and have noticed a couple things while trying to construct queries to generate different sized responses. These are all about request sizes, not the responses.
Thanks for the update, interesting findings which will make the fix easier, I suppose. Apparently it does not affect all devices, but capping the request sizes for bulbs is a much nicer solution than trying to estimate the potential response sizes. On the repeated requests, do they also cause the device to drop from the network? If not, that might just be a way to protect the device from DoS.
Unfortunately I think the 4k output buffer is still a thing. The 'input buffer' is just another way it becomes unresponsive.
This seems to differ as well. If I overflow the output buffer on the KL-125 it still responds to pings, but I get 'connection reset by peer' or simply time out when trying to connect to it. Spamming it with lots of connections quickly takes it off the network. When querying at 1/s, it's pretty random when it stops responding. Sometimes I feel like I see a pattern, but then more testing breaks the pattern. 1/s usually doesn't take it off the network.
More testing. It turns out that my test query itself was causing issues. I was padding the request with entries like { "0": None } up to the target size, and it turns out the bulb doesn't like this much. After processing approximately 720 "None" entries it crashes: I could send { "0": None } approx 720 times before crashing, or { "0": None, "1": None } approx 360 times before crashing. I wonder if maybe it is keeping some sort of internal error log that gets overflowed? Padding out my requests to { "0": { "0": None } } helps a lot; I can spam it indefinitely without a crash. Takeaways:
I think that the 'output estimate/split requests' is the only thing that is going to be needed for this. I'll try to get it cleaned up and tests updated in the next few days |
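The padding experiment described above could be sketched roughly like this; `build_padded_query` and its key names are hypothetical, illustrating how a request might be grown to a target encoded size:

```python
import json

def build_padded_query(real_query: dict, target_size: int) -> dict:
    """Pad a query with dummy top-level keys until its JSON encoding
    reaches at least target_size bytes. Purely a test-harness sketch."""
    query = dict(real_query)
    i = 0
    while len(json.dumps(query)) < target_size:
        # Nested padding like {"0": {"0": None}} was reported above to be
        # tolerated far better than long runs of flat None entries.
        query[f"pad{i}"] = {"0": None}
        i += 1
    return query
```

With a sketch like this, the same real query can be re-sent at increasing target sizes to probe where a device's input-buffer limit sits.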
Force-pushed a4cd72a to 2c108b3
I'm hoping with this change that several of the other 'fixes' may end up unnecessary. I might try another PR later to try removing previous possibly-related fixes for testing. |
I tested this on my
I think this patchset is good to go from my perspective. |
LGTM with minor changes, glad to hear it's working for @brianthedavis too!
@cobryan05 would you mind adding some explicit tests for the new SmartDevice and Module API properties, and removing the skiplist functionality? Also, please update the PR description to include the reasoning and findings leading to this change (or even add a word about it to the docs) for future developers.
The KL-125(US) bulb appears to crash if a request is made to it which produces an output larger than 4096 bytes. This commit splits up the module update based on the expected response size to avoid making a request which will overflow the device's output buffer.
Move the maximum response size from a 'Protocol' property to a 'Device' property, defaulting to 16K. Some bulbs, e.g. the KL-125US, are known to have smaller buffers, so bulbs are set to 4K.
Ran linter, fixed response deep-copying that was making tests unhappy, updated call count in emeter test.
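The commit messages above describe moving the maximum response size to a device-level property. A rough sketch of that shape (class and property names follow the discussion, but this is not the exact python-kasa implementation):

```python
class SmartDevice:
    """Sketch: the maximum expected response payload lives on the device,
    not on the protocol, so subclasses can override it per device type."""

    DEFAULT_MAX_RESPONSE_SIZE = 16 * 1024  # 16K default for most devices

    @property
    def max_response_payload_size(self) -> int:
        return self.DEFAULT_MAX_RESPONSE_SIZE


class SmartBulb(SmartDevice):
    """Bulbs such as the KL125-US are known to crash past ~4096 bytes
    of output, so the bulb subclass caps the limit at 4K."""

    @property
    def max_response_payload_size(self) -> int:
        return 4 * 1024
```

The query-splitting logic can then consult `device.max_response_payload_size` without knowing which concrete device class it is talking to.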
Force-pushed 2c108b3 to 775bd5a
This commit removes the module skip list as it seems this workaround is no longer necessary when splitting up large queries.
Force-pushed 775bd5a to 2712af8
rytilahti left a comment
Great, thanks @cobryan05! 🥇
Several Kasa devices seem to have fairly strict buffer size limitations on incoming and outgoing data transfers.
Testing on the KL125-US and HL103 has shown that sending a request larger than ~768 bytes will immediately crash the device. Additionally, a query that generates a response larger than ~4096 bytes will crash the KL125-US. I was unable to generate a large enough response to test the HL103.
The KL125-US will only return such large responses when its monthly usage stats have been populated. This means that a new bulb works fine, but after a month of data collection the bulb breaks the 4K limit and starts to crash.
To work around this issue, an estimated worst-case response size is calculated before sending a request by summing up all modules' estimated response sizes. If the estimated size is greater than the device's max_response_payload_size, the query is split into multiple queries.
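The splitting described above can be sketched as a greedy batching pass over the modules. This is a simplified illustration under assumed names (`split_queries`, a module-to-size mapping), not the actual python-kasa implementation:

```python
def split_queries(module_sizes: dict, max_payload: int) -> list:
    """Greedily batch modules so that each batch's summed estimated
    response size stays at or under max_payload. module_sizes maps a
    module name to its estimated worst-case response size in bytes."""
    batches, current, current_size = [], [], 0
    for module, size in module_sizes.items():
        # Start a new batch if adding this module would overflow the cap;
        # an oversized single module still gets a batch of its own.
        if current and current_size + size > max_payload:
            batches.append(current)
            current, current_size = [], 0
        current.append(module)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each resulting batch is then sent as its own request, so no single response should exceed the device's output buffer.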
This PR implements splitting queries that are expected to have large responses, and removes the module 'skip list', a previous workaround that avoided the crash by simply reducing the number of modules queried; it is no longer necessary.
This PR does not attempt to address the "input buffer size limit." Thus far this limit has not been an issue.