Skip to content

Conversation

@robertbrignull
Copy link
Contributor

@robertbrignull robertbrignull commented Jul 7, 2020

This is only possible solution to double-globs in path filters. We make use of the LGTM_INDEX_FILTERS environment variable, as well as the two others that we already use, because LGTM_INDEX_FILTERS does support **.

Merge / deployment checklist

  • Confirm this change is backwards compatible with existing workflows.
  • Confirm the readme has been updated if necessary.

@robertbrignull
Copy link
Contributor Author

I made a test repo for this change at https://github.com/dsp-testing/robertbrignull-filter-test and you can see the changes that switches to this branch makes at https://github.com/dsp-testing/robertbrignull-filter-test/pull/1/checks?check_run_id=845738229

@robertbrignull
Copy link
Contributor Author

@aibaars, would you be able to give this a review? Can also ask someone else if you'd rather not.

// It does not understand double-globs because that would require it to
// traverse the entire file tree to determine which files are matched.
if (config.paths.length !== 0) {
core.exportVariable('LGTM_INDEX_INCLUDE', config.paths.join('\n'));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LGTM_INDEX_INCLUDE and LGTM_INDEX_EXCLUDE properties don't support any form of wildcards. I think you should only put proper paths into these fields. The * symbol is not allowed in paths on Windows, so you might even get an exception.

I just remembered that there is some good documentation about these settings in the javascript autobuilder: https://github.com/github/codeql/blob/bfb734e1d74df1ae810feb7198cb271b767c4b9a/javascript/extractor/src/com/semmle/js/extractor/AutoBuild.java#L80

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Makes sense to me. I'll filter those out of here.

I'd also forgotten that the JS extractor is public so thanks for that link.


// Checks that a paths of paths-ignore entry is valid, possibly modifying it
// to make it valid, or if not possible then throws an error.
export function validateAndSanitisePath(originalPath: string, propertyName: string, configFile: string): string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also check for the special characters mentioned in https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#filter-pattern-cheat-sheet that are not supported by the underlying project layout implementation. Just report an error telling that these are not allowed, to avoid users getting confused why their perfectly sensible pattern string does not match any files.


// Builds an environment variable suitable for LGTM_INDEX_INCLUDE or LGTM_INDEX_EXCLUDE
function buildIncludeExcludeEnvVar(paths: string[]): string {
return paths.filter(p => p.indexOf('**') === -1).join('\n');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should filter out paths containing any kind of special matching character mentioned in https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#filter-pattern-cheat-sheet .

For windows I'd also paths containing any the forbidden characters mentioned in https://stackoverflow.com/a/31976060 . Otherwise things will crash at some point with a less friendly error message.

@aibaars
Copy link
Contributor

aibaars commented Jul 9, 2020

This looks fine to me. Two more suggestions to add more checks to improve or avoid error messages.

// Check for any characters that are illegal in path names
if (process.platform === 'win32' && path.match(illegalWindowsCharactersRegex)) {
core.warning('"' + path + '" contains an invalid character and will be ignored. ' +
'The characteres <, >, :, ", !, |, ?, * are forbidden to use in paths on windows.');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

characteres -> characters

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are no longer filtering *, so best to adjust the error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved this to analysis-paths.ts and kept it as not containing * because that is checked for separately on all platforms.

}

// Check for any characters that are illegal in path names
if (process.platform === 'win32' && path.match(illegalWindowsCharactersRegex)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant for this check to be performed when adding a path to the exclude/include env vars. The paths in those env variables will be interpreted as path strings by the autobuilders and should therefore only contain valid path characters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, it should be fine here too as you no longer fail on *.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved this check to when we construct the env vars. After some thought I agree it should be here and the right thing to do is to ignore it. Since the path can't legally exist anyway there should be no effect from not adding it to the include/exclude env vars.

I'd also prefer to not output any kind of error/warning about this because as explained above it shouldn't affect anything as the path legally can't exist anyway.


// Characters that are supported by filters in workflows, but not by us.
// See https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#filter-pattern-cheat-sheet
const filterPatternCharactersRegex = /.*[\?\+\[\]!].*/;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might also want to exclude \ . I think excluding windows-style path separators would be helpful to ensure patterns mean the right thing cross platform.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to not do any special handling for \ for now. We can add that in later. This PR is getting complicated enough as it is and I don't understand exactly what you're proposing. I recommend once this PR is in it would be easiest for you to make the change you're suggesting for \/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest not allowing \ symbols at all. I don't know what the project layouts do with windows path separators and a windows user can just write /. I think it is better to be restrictive in what we accept now, and perhaps make things more liberal later if there is any use (which I doubt).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the point that windows users can use \ instead, and it's nice to be restrictive in the start. I do note that linux paths can legitimately contain \ characters (although it is rather unlikely) so we would be actively preventing people from using filters with files with those characters.

Ok, I'll just ban them. I don't anticipate many problems if any from file legitimately containing this character, so let's hope it's fine. As you say we can make it less restrictive in the future and that's easier than going the other way.


// Builds an environment variable suitable for LGTM_INDEX_INCLUDE or LGTM_INDEX_EXCLUDE
function buildIncludeExcludeEnvVar(paths: string[]): string {
return paths.filter(p => p.indexOf('**') === -1).join('\n');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return paths.filter(p => p.indexOf('**') === -1).join('\n');
return paths.filter(p => p.indexOf('*') === -1).join('\n');

Paths with either * or ** make no sense in the include/exclude env variables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. You're right and this now checks for any instances of *.

@robertbrignull
Copy link
Contributor Author

Ok, I think this is finally ready. I've addressed all the comments from yesterday, I've just had another look through, and my tests over at https://github.com/dsp-testing/robertbrignull-filter-test are working as expected.

@aibaars, do you want to give this another sanity check? Otherwise I will merge today in a little bit.

while (path.charAt(0) === '/') {
path = path.substring(1);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose empty paths "" are fine and would be interpreted as ".", but might be good to check this and add a test case for both JS and Python analysis runs with such a path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested manually and including / seems to be fine.

@aibaars
Copy link
Contributor

aibaars commented Jul 10, 2020

Looks good to me.

@robertbrignull
Copy link
Contributor Author

I'm going to add some more filters to the integration. Can't believe I forgot to do that until now. Then I'll merge this.

@robertbrignull robertbrignull force-pushed the lgtm_filters branch 2 times, most recently from d5d447f to 64ca715 Compare July 10, 2020 14:13
@robertbrignull
Copy link
Contributor Author

Seems while javascript is fine, the python autobuilder code borks if you include certain characters like *, ?, or < anywhere in one of the filters. I'm pretty sure this happens on the main branch too so this PR doesn't make things any worse.

@robertbrignull
Copy link
Contributor Author

Also seems the python case throws an error if the INCLUDE env var includes paths to files, because it only accepts directories.

@robertbrignull
Copy link
Contributor Author

Turns out adding / is fine on javascript but causes an error for python. So I take it back and let's ban that path. Also now turns out there's basically nothing useful I can add an integration tests as everything even slightly interesting causes errors for python.

Regardless, this PR is going to be merged as I've done enough testing now manually. In the future this situation needs to be sorted out to bring the two languages more in line because currently the behaviour is unacceptably different.

@robertbrignull
Copy link
Contributor Author

I don't believe this PR will make any configs that were previous valid to be invalid, so from that point of view this is a safe change. There is a reasonable chance that we will find bugs or otherwise want to make changes to this area, but we should be able to response and address those problems when we find them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants