Require rebuild if any codegen targets are outdated #4053
hdgarrood merged 3 commits into purescript:master from
Conversation
Fixes purescript#3911, purescript#3914: prevents an issue where a module would be considered up-to-date if only some of the requested codegen targets were up-to-date (as opposed to all of them).
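The fixed check can be sketched as follows - a Python sketch with hypothetical names, not the compiler's actual Haskell code: a module counts as up to date only if every requested codegen target exists and is at least as new as the newest input.

```python
def module_up_to_date(inputs, requested_targets, exists, mtime):
    """Hypothetical sketch: a module is fresh only if *all* requested
    codegen targets exist and are no older than the newest input."""
    newest_input = max(mtime(path) for path in inputs)
    # The bug being fixed: a single fresh target could mask stale or
    # missing ones. Here every requested target must pass the check.
    return all(exists(t) and mtime(t) >= newest_input
               for t in requested_targets)

# Example with made-up timestamps: index.js is older than the source
# file, so the module must be rebuilt even though externs.json is fresh.
mtimes = {"Module.purs": 10, "externs.json": 11, "index.js": 9}
print(module_up_to_date(["Module.purs"], ["externs.json", "index.js"],
                        lambda p: p in mtimes, mtimes.get))  # → False
```

With only `externs.json` requested, the same inputs would report up to date, which is exactly the per-target behaviour the fix requires.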
Thanks for the review! Apparently the "fixes #xyz" doesn't count if it's not in the original PR description >:(
I'm reasonably confident that this change does fix the issue; I think it's just that the tests are flaky.
What OS(es) are you testing on? It's a little suspicious that the macOS and Linux builds are both failing precisely these two new tests and no others, and I can't see anything in the tests themselves which would make them less reliable than the others in the suite. (I wonder if there's something like an OS-specific issue with reading file timestamps.)
I'm testing on Linux, and both new tests seem to pass about half the time. In CI, different OSes will be using different filesystems - e.g. I'd guess the Linux CI box is using ext4, and the Windows CI box is probably using NTFS - so I think we do expect some OS-specific behaviour here as a result, e.g. because different filesystems have different timestamp resolutions.
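The resolution point can be made concrete with a small Python illustration (the timestamps here are made up): two writes less than a second apart are distinguishable at nanosecond resolution, but collapse to the same value on a filesystem that only stores whole seconds.

```python
def truncate_mtime(mtime_ns, resolution_ns):
    """Truncate a nanosecond timestamp to a coarser filesystem resolution."""
    return mtime_ns - (mtime_ns % resolution_ns)

t1 = 1_600_000_000_123_456_789  # first write (made-up nanosecond timestamp)
t2 = 1_600_000_000_987_654_321  # second write, under a second later

# At nanosecond resolution the ordering is visible...
assert truncate_mtime(t1, 1) < truncate_mtime(t2, 1)
# ...but at one-second resolution both writes get the same timestamp,
# so an mtime comparison can no longer tell them apart.
assert truncate_mtime(t1, 1_000_000_000) == truncate_mtime(t2, 1_000_000_000)
```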
Actually, until I can figure out why the tests are failing, I think I'm going to say that I'm only somewhat confident this works.
Ah, no, I think your confidence is valid, and you're absolutely right that it's a timestamp resolution thing. I added some code to the tests to log output file modification times and submitted that to Travis, and in the cases when it failed, the timestamps were exactly equal - either because, as on macOS, the filesystem was only recording with one-second resolution, or because, as on Linux, the timestamps appeared to be equal down to the nanosecond, which I can only assume is because the kernel has a limited time resolution. But regardless of the cause, based on its view of the world the changed code was correctly responding that nothing needed rebuilding.

In all of these tests, we're deterministically setting the modification times of the inputs, but not of the outputs, and these are the first tests to rely on the latter. Can we just fudge the modification times of
I think we do have to be able to gracefully handle the cases where timestamps are exactly equal, unfortunately: when I was trying to figure out what was going on here, I found it was relatively common that, when running the compiler normally, the externs.json, index.js, and docs.json files all had the exact same timestamp. Experimentally, for me (ext4 on an SSD on Linux), it appears that I do at least have microsecond resolution locally. In fact I am slightly surprised that the timestamps being equal is causing the tests to fail, because I deliberately used

Incidentally, I don't think these are the first tests to rely on output timestamps - we compare output timestamps in order to ensure that downstream modules are recompiled if upstream modules change, so I think the two
The problem isn't that the timestamps are equal after the first compile; it's that the timestamps are equal after the second compile, between a file touched only by the first and a file recreated by the second. Your code does correctly handle the case where a single compile generates all its outputs with equal timestamps; in that case, everything appears to be up to date. This only causes an issue if a second compile then overwrites some of those outputs, making the others not up to date, but in a way that is invisible because the second compile happened within the same tick as the first.

Outside of tests, I think that's an extremely unlikely scenario; and if you wanted to handle it correctly anyway, I think you'd have to abandon modification times altogether, instead embedding high-resolution timestamps, generation counters, or input hashes in the files themselves, or in a cache database. You might be right about
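One of the alternatives mentioned above - a generation counter embedded in the outputs - can be sketched like this (the names and structure are hypothetical, not a proposal for the actual compiler): each compile stamps the targets it writes, and a module is only up to date if every target carries the same stamp, which stays correct even when two compiles land in the same clock tick.

```python
import itertools

_generations = itertools.count(1)

def compile_targets(store, targets):
    """Pretend-compile: stamp each written target with this run's generation."""
    gen = next(_generations)
    for t in targets:
        store[t] = gen

def all_same_generation(store, targets):
    """Up to date only if every target was written by the same compile."""
    gens = {store.get(t) for t in targets}
    return None not in gens and len(gens) == 1

outputs = {}
compile_targets(outputs, ["externs.json", "index.js"])  # first compile
compile_targets(outputs, ["externs.json"])              # second, same tick
# Even if both compiles ran within one clock tick, the mismatch is visible:
print(all_same_generation(outputs, ["externs.json", "index.js"]))  # → False
```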
Ah! Thanks. Your analysis - both of why these new tests are flaky and of why the existing ones aren't - is persuasive to me. In that case, I think adding a
Adding |
Tasty and hspec are both testing frameworks: they are meant to satisfy the same need, and they're not really intended to be used together. See e.g. this comment from the tasty-hspec Haddocks:

> hspec and tasty serve similar purposes; consider using one or the other.
>
> However, in a pinch, this module allows you to run an hspec Spec as a tasty TestTree.

I'd like to switch back to using hspec throughout for two reasons:

1. With hspec, failures are summarised at the end of the test output. Tasty reports failures inline, which can be very annoying, especially in CI, as you then have to search through the logs to find the failure. Displaying failures at the end seems like a much more sensible default to me.
2. We get to take advantage of HasCallStack in hspec-expectations to help narrow down where a failure is coming from. This is particularly helpful in tests like the ones in TestMake.hs where we run something like ``someAction `shouldReturn` someResult`` more than once in a particular test, as without the source location from HasCallStack it's not immediately clear which assertion is failing. (This would have been useful in purescript#4053.)

In purescript#2848, when tasty was first introduced, one of the main benefits we were getting was that it would let us run individual tests via CLI options. This is still possible with this PR: hspec has a `--match` CLI option for this. There's no `--accept` flag for golden tests, but instead you can set an `HSPEC_ACCEPT` environment variable, or simply delete the files you want to regenerate.
Description of the change
Fixes #3911, #3914: prevents an issue where a module would be considered
up-to-date if only some of the requested codegen targets were up-to-date
(as opposed to all of them).
Checklist: