Require rebuild if any codegen targets are outdated #4053
hdgarrood merged 3 commits into purescript:master from
Conversation
Fixes purescript#3911, purescript#3914: prevents an issue where a module would be considered up-to-date if only some of the requested codegen targets were up-to-date (as opposed to all of them).
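The fixed check can be sketched as follows - a Python sketch with hypothetical names, not the compiler's actual Haskell code: a module counts as up to date only if every requested codegen target exists and is at least as new as the newest input.

```python
def module_up_to_date(inputs, requested_targets, exists, mtime):
    """Hypothetical sketch: a module is fresh only if *all* requested
    codegen targets exist and are no older than the newest input."""
    newest_input = max(mtime(path) for path in inputs)
    # The bug being fixed: a single fresh target could mask stale or
    # missing ones. Here every requested target must pass the check.
    return all(exists(t) and mtime(t) >= newest_input
               for t in requested_targets)

# Example with made-up timestamps: index.js is older than the source
# file, so the module must be rebuilt even though externs.json is fresh.
mtimes = {"Module.purs": 10, "externs.json": 11, "index.js": 9}
print(module_up_to_date(["Module.purs"], ["externs.json", "index.js"],
                        lambda p: p in mtimes, mtimes.get))  # → False
```

With only `externs.json` requested, the same inputs would report up to date, which is exactly the per-target behaviour the fix requires.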
Thanks for the review! Apparently the "fixes #xyz" doesn't count if it's not in the original PR description >:(
I'm reasonably confident that this change does fix the issue; I think it's just that the tests are flaky.
What OS(es) are you testing on? It's a little suspicious that the macOS and Linux builds are both failing precisely these two new tests and no others, and I can't see anything in the tests themselves which would make them less reliable than the others in the suite. (I wonder if there's something like an OS-specific issue with reading file timestamps.)
I'm testing on Linux, and both new tests seem to pass about half the time. In CI, different OSes will be using different filesystems - e.g. I'd guess the Linux CI box is using ext4, and the Windows CI box is probably using NTFS - so I think we do expect some OS-specific behaviour here as a result, e.g. because different filesystems have different timestamp resolutions.
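The resolution point can be made concrete with a small Python illustration (the timestamps here are made up): two writes less than a second apart are distinguishable at nanosecond resolution, but collapse to the same value on a filesystem that only stores whole seconds.

```python
def truncate_mtime(mtime_ns, resolution_ns):
    """Truncate a nanosecond timestamp to a coarser filesystem resolution."""
    return mtime_ns - (mtime_ns % resolution_ns)

t1 = 1_600_000_000_123_456_789  # first write (made-up nanosecond timestamp)
t2 = 1_600_000_000_987_654_321  # second write, under a second later

# At nanosecond resolution the ordering is visible...
assert truncate_mtime(t1, 1) < truncate_mtime(t2, 1)
# ...but at one-second resolution both writes get the same timestamp,
# so an mtime comparison can no longer tell them apart.
assert truncate_mtime(t1, 1_000_000_000) == truncate_mtime(t2, 1_000_000_000)
```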
Actually, until I can figure out why the tests are failing, I think I'm going to say that I'm only somewhat confident this works.
Ah, no, I think your confidence is valid, and you're absolutely right that it's a timestamp resolution thing. I added some code to the tests to log output file modification times and submitted that to Travis, and in the cases when it failed, the timestamps were exactly equal - either because, as on macOS, the filesystem was only recording with one-second resolution, or because, as on Linux, the timestamps appeared to be equal down to the nanosecond, which I can only assume is because the kernel has a limited time resolution. But regardless of the cause, based on its view of the world the changed code was correctly responding that nothing needed rebuilding.

In all of these tests, we're deterministically setting the modification times of the inputs, but not of the outputs, and these are the first tests to rely on the latter. Can we just fudge the modification times of
I think we do have to be able to gracefully handle the cases where timestamps are exactly equal, unfortunately: when I was trying to figure out what was going on here, I found it was relatively common that, when running the compiler normally, the externs.json, index.js, and docs.json files all had the exact same timestamp. Experimentally, for me (ext4 on an SSD on Linux), it appears that I do at least have microsecond resolution locally. In fact I am slightly surprised that the timestamps being equal is causing the tests to fail, because I deliberately used

Incidentally, I don't think these are the first tests to rely on output timestamps - we compare output timestamps in order to ensure that downstream modules are recompiled if upstream modules change, so I think the two
The problem isn't that the timestamps are equal after the first compile; it's that the timestamps are equal after the second compile, between a file touched only by the first and a file recreated by the second. Your code does correctly handle the case where a single compile generates all its outputs with equal timestamps; in that case, everything appears to be up to date. This only causes an issue if a second compile then overwrites some of those outputs, making the others not up to date, but in a way that is invisible because the second compile happened within the same tick as the first.

Outside of tests, I think that's an extremely unlikely scenario; and if you wanted to handle it correctly anyway, I think you'd have to abandon modification times altogether, instead embedding high-resolution timestamps, generation counters, or input hashes in the files themselves, or in a cache database. You might be right about
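One of the alternatives mentioned above - a generation counter embedded in the outputs - can be sketched like this (the names and structure are hypothetical, not a proposal for the actual compiler): each compile stamps the targets it writes, and a module is only up to date if every target carries the same stamp, which stays correct even when two compiles land in the same clock tick.

```python
import itertools

_generations = itertools.count(1)

def compile_targets(store, targets):
    """Pretend-compile: stamp each written target with this run's generation."""
    gen = next(_generations)
    for t in targets:
        store[t] = gen

def all_same_generation(store, targets):
    """Up to date only if every target was written by the same compile."""
    gens = {store.get(t) for t in targets}
    return None not in gens and len(gens) == 1

outputs = {}
compile_targets(outputs, ["externs.json", "index.js"])  # first compile
compile_targets(outputs, ["externs.json"])              # second, same tick
# Even if both compiles ran within one clock tick, the mismatch is visible:
print(all_same_generation(outputs, ["externs.json", "index.js"]))  # → False
```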
Ah! Thanks. Your analysis - both of why these new tests are flaky and of why the existing ones aren't - is persuasive to me. In that case, I think adding a
Adding |
Tasty and hspec are both testing frameworks: they are meant to satisfy the same need, and they're not really intended to be used together. See e.g. this comment from the tasty-hspec Haddocks:

> hspec and tasty serve similar purposes; consider using one or the other.
>
> However, in a pinch, this module allows you to run an hspec Spec as a tasty TestTree.

I'd like to switch back to using hspec throughout for two reasons:

1. With hspec, failures are summarised at the end of the test output. Tasty reports failures inline, which can be very annoying, especially in CI, as you then have to search through the logs to find the failure. Displaying failures at the end seems like a much more sensible default to me.
2. We get to take advantage of HasCallStack in hspec-expectations to help narrow down where a failure is coming from. This is particularly helpful in tests like the ones in TestMake.hs where we run something like ``someAction `shouldReturn` someResult`` more than once in a particular test, as without the source location from HasCallStack it's not immediately clear which assertion is failing. (This would have been useful in purescript#4053.)

In purescript#2848, when tasty was first introduced, one of the main benefits we were getting was that it would let us run individual tests via CLI options. This is still possible with this PR: hspec has a `--match` CLI option for this. There's no `--accept` flag for golden tests, but instead you can set an `HSPEC_ACCEPT` environment variable, or simply delete the files you want to regenerate.
Description of the change
Fixes #3911, #3914: prevents an issue where a module would be considered
up-to-date if only some of the requested codegen targets were up-to-date
(as opposed to all of them).
Checklist: