-
Notifications
You must be signed in to change notification settings - Fork 8.1k
Fix <img /> detection regex in web cmdlets #12099
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
...hell.Commands.Utility/commands/utility/WebCmdlet/Common/BasicHtmlWebResponseObject.Common.cs
Show resolved
Hide resolved
|
@PoshChan please retry macos |
|
@vexx32, successfully started retry of |
|
We have 6 Regex in the file. It would nice to add tests. xUnit could be most stable. |
|
There are no existing tests except the test to verify the last change didn't regress. We should add tests, but as this is regression, and we are early in 7.1, my question is have you manually tested it. But to port it to 7.0, we need to add tests to cover at least a couple of cases. here is the test I added: PowerShell/test/powershell/Modules/Microsoft.PowerShell.Utility/WebCmdlets.Tests.ps1 Line 1904 in 920b671
|
|
@TravisEz13 yep I manually tested it with regex101. On a page that contains a bunch of images (literally just a page about travel blogs with a bunch of pictures), here's the breakdown: "Pathological" regex from the test:
Currently-used regex in the codebase:
Proposed new regex in this PR:
There may be a more effective test case available, but I just grabbed the first thing I could think of 😅 |
|
for it to be pathological, you need to test it with the pathological case (lots of spaces after |
|
@vexx32 if you aren't planning on adding the cases, can you open an issue to add the cases and port both PR's back to 7.0 after the cases have been added. |
|
I don't mind adding the cases, but I'm not really clear on where / how these cases are defined (would I need to add something to that Get-WebListenerUrl function?), or exactly what additional cases you're looking to have added here? Tests for something like |
|
Is there a compliance analyzer tool for Regex? |
|
@vexx32 I think @TravisEz13 says that we could add tests like existing one for rest 5 Regex. |
|
@iSazonov we are supposed to use a tool in azure. It's public, but to use it free, we are supposed to use an internal version. To use it, we need a test case (or a few) where the content going to the regex is in a file (separate file per case). The tool is basically and AI and will vary the content of the file until all codepaths have been hit. I'd really appreciate if someone added a case like this. We could remove the pathological test cases if we had cases like this and those cases are unreliable. |
|
@vexx32 for this (#12099 (comment)) new case, we don't need to add a large number of spaces. |
|
@TravisEz13 Do you ask to create one HTML test file per Regex from this 5 Regex in the PR? Should it be full featured HTML page or simple string like |
|
@iSazonov
|
|
@TravisEz13 The docs say that we could have up to 1000 seed files. I think we could create a list of sites based on different engines, grab pages from them to use as seed files. I believe it will be more productive and native than manually create HTML files. |
|
@iSazonov The tool may allow that, but during our testing internet access is disabled. So, the test would fail. (It may not be that way today, but it will be in the future.). The whole purpose of this tool is that it does the variation so that you don't have to produce every input. |
|
@TravisEz13 My suggestion is not to access Internet at test time. My suggestion is:
|
|
@iSazonov in theory that works, but there are legal issues with that. Sharepoint is fine, but the other sites we may need to get the sites permission. |
|
Opened #12138; feel free to continue that discussion there, as it doesn't look like this will be a simple decision of what tests to add for this PR. 🙂 |
|
@vexx32 Can you add a test like you were suggesting here: #12099 (comment) minus the spaces (that's already covered) |
|
Happy to, but I'm not not clear on how cases are added here, as it looks like the content generation is handled elsewhere. Can you point me to where that is? |
|
Here is the commit where I added the last test case and tho WebListener "Controller" that the test cases uses: c64a28e |
@TravisEz13 If I understand right you will make the tests internally so the site list will be not public. We are not going to change, share or distribute the pages. We are not going to download any artifacts like pictures - we need only html. We are not going to check these sites and distribute any information about it - we are going to test only that our code works correctly on the data. |
| contentType = "text/html; charset=utf8"; | ||
| body = "<img" + (new string(' ', dosLength)); | ||
| break; | ||
| case "img-attribute": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't actually a DOS... can you just add a comment explaining that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TravisEz13 Yeah I was torn on that myself. I could also just add a new Controller class here instead, since we plan on adding future cases to test the actual regex patterns' capability to parse what we expect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to. I'll leave it up to you, but at least add a comment that this is not a DOS case, but you didn't want to create a new controller for a case that was so similar.
| $pathologicalRatio | Should -BeGreaterThan 5 | ||
| } | ||
|
|
||
| It 'correctly parses an image with id, class, and src attributes' { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And you should probably move this to a new context too.
|
I used fixing a conflict in your PR to partially fix a mistake I made... Brought a test back, but pended it. |
|
🎉 Handy links: |
|
🎉 Handy links: |
# Conflicts: # test/powershell/Modules/Microsoft.PowerShell.Utility/WebCmdlets.Tests.ps1
PR Summary
In c64a28e, the img tag regex was changed to
<img\s+[^\s>]*>to avoid a DoS issue.This regex will not match common image tags that contain more than just the
srcattribute (e.g.,<img src="$url" class="className"/>Fix is to adjust the regex to not preclude spaces after other content in the tag.
PR Context
Resolves #12089
PR Checklist
.h,.cpp,.cs,.ps1and.psm1files have the correct copyright headerWIP:or[ WIP ]to the beginning of the title (theWIPbot will keep its status check atPendingwhile the prefix is present) and remove the prefix when the PR is ready.