Skip to content

Float compiler-synthesized function applications#3915

Merged
JordanMartinez merged 1 commit intopurescript:masterfrom
rhendric:cse-for-typeclass-dicts
May 18, 2022
Merged

Float compiler-synthesized function applications#3915
JordanMartinez merged 1 commit intopurescript:masterfrom
rhendric:cse-for-typeclass-dicts

Conversation

@rhendric
Copy link
Copy Markdown
Member

@rhendric rhendric commented Aug 3, 2020

This commit implements a common subexpression elimination pass during
CoreFn optimization. The optimizer only targets Apps that the compiler
has created, meaning: applying functions to typeclass dictionaries,
accessing superclasses from a typeclass dictionary, or creating new
instances of IsSymbol. Furthermore, the pass only moves subexpressions
if they can be moved to an outer function or if the subexpression
appears more than once inside the scope to which it would be moved.


This is a start at what I'm told is the uncontroversial part of #687. Here's some before-and-after:

(before)

var ap = function (dictMonad) {
    return function (f) {
        return function (a) {
            return Control_Bind.bind(dictMonad.Bind1())(f)(function (f$prime) {
                return Control_Bind.bind(dictMonad.Bind1())(a)(function (a$prime) {
                    return Control_Applicative.pure(dictMonad.Applicative0())(f$prime(a$prime));
                });
            });
        };
    };
};

(after)

var ap = function (dictMonad) {
    var bind = Control_Bind.bind(dictMonad.Bind1());
    var pure = Control_Applicative.pure(dictMonad.Applicative0());
    return function (f) {
        return function (a) {
            return bind(f)(function (f$prime) {
                return bind(a)(function (a$prime) {
                    return pure(f$prime(a$prime));
                });
            });
        };
    };
};

I don't know many public PureScript projects that have sufficiently long-running automated tests to be suitable for benchmarking, but purescript-bytestrings is one. On that project, this patch (backported to the 0.13.x branch) increased total compilation time by about 2% and decreased test execution time by about 20%. I would be happy to run the same benchmarks on any other project you suggest.

In order to verify that the optimization happens, a new (small) suite of tests in tests/purs/optimize has been added. In this folder, *.purs files are compiled and the result is compared to the contents of the corresponding .out.js file. These tests are potentially more expensive than usual in terms of maintenance burden, because they are sensitive to any change in the compiled output, but they can be used to prevent regressions in optimizations that are too subtle to be tested by other means.

@rhendric rhendric force-pushed the cse-for-typeclass-dicts branch from 7f0cf1e to 3d0d6d9 Compare August 3, 2020 22:10
@natefaubion
Copy link
Copy Markdown
Contributor

One potential benchmark might be the one in purescript-run
https://github.com/natefaubion/purescript-run/blob/0946aefaaa360910d72a298d702d72af07b1adcb/test/Bench.purs

@rhendric
Copy link
Copy Markdown
Member Author

rhendric commented Aug 4, 2020

Thanks! I will try that as soon as I fix the TCO regression in purescript-free that this revealed. 😁

@rhendric rhendric force-pushed the cse-for-typeclass-dicts branch from 3d0d6d9 to 65e90ff Compare August 4, 2020 17:44
@rhendric rhendric changed the title Lift compiler-synthesized function applications Float compiler-synthesized function applications Aug 4, 2020
@rhendric
Copy link
Copy Markdown
Member Author

rhendric commented Aug 4, 2020

I could not measure a significant time difference compiling purescript-run with 0.13.8 versus 0.13.8 plus this patch.

Running Bench.purs produced the following output:

0.13.8:

Transformers (monomorphic/trampoline)
mean   = 105.21 ms
stddev = 33.67 ms
min    = 82.96 ms
max    = 352.80 ms
Transformers (monomorphic/identity)
mean   = 46.71 ms
stddev = 14.51 ms
min    = 37.98 ms
max    = 127.36 ms
Transformers (mtl/trampoline)
mean   = 182.59 ms
stddev = 35.99 ms
min    = 153.22 ms
max    = 415.49 ms
Transformers (mtl/identity)
mean   = 100.16 ms
stddev = 24.06 ms
min    = 75.20 ms
max    = 196.17 ms
Run (free)
mean   = 28.73 ms
stddev = 31.97 ms
min    = 21.52 ms
max    = 299.17 ms

0.13.8 + patch:

Transformers (monomorphic/trampoline)
mean   = 62.52 ms
stddev = 33.37 ms
min    = 52.73 ms
max    = 376.43 ms
Transformers (monomorphic/identity)
mean   = 19.94 ms
stddev = 4.69 ms
min    = 9.62 ms
max    = 40.80 ms
Transformers (mtl/trampoline)
mean   = 99.94 ms
stddev = 22.32 ms
min    = 85.75 ms
max    = 279.82 ms
Transformers (mtl/identity)
mean   = 36.07 ms
stddev = 9.28 ms
min    = 20.93 ms
max    = 68.23 ms
Run (free)
mean   = 31.56 ms
stddev = 41.24 ms
min    = 20.69 ms
max    = 304.61 ms

Everything is significantly faster except for Run, which looks like it's about the same.

@natefaubion
Copy link
Copy Markdown
Contributor

natefaubion commented Aug 4, 2020

I wouldn't expect an improvement for Run. It's benchmark is not going to allocate dictionaries, since they are totally monomorphic. The MTL dictionaries are allocated at every bind, however, so you get a drastic speedup due to reduced allocation.

In particular this brings mtl/identity into the same park as Run, which indicates that it's overhead is no longer in dictionary allocation. This is pretty inline with comparing freer libraries with mtl in Haskell without inlining optimizations.

@rhendric rhendric force-pushed the cse-for-typeclass-dicts branch from 65e90ff to 4bf7699 Compare September 27, 2020 17:55
@rhendric rhendric force-pushed the cse-for-typeclass-dicts branch from 4bf7699 to 33b66bc Compare April 8, 2021 23:42
@nwolverson
Copy link
Copy Markdown
Contributor

I tried out this change with the purerl backend, together with some caching of typeclass dictionary creation there was a significant performance benefit.

@rhendric rhendric force-pushed the cse-for-typeclass-dicts branch from 33b66bc to 3898203 Compare May 28, 2021 02:58
nwolverson added a commit to purerl/purerl that referenced this pull request Jun 9, 2021
- To use memoization, define (eg in rebar.config) PURERL_MEMOIZE (eg to true)
  and enable memoization parse transform.
- purs change to hoist compiler synthesised function applications
  (purescript/purescript#3915) required to make much use of this
@rhendric rhendric force-pushed the cse-for-typeclass-dicts branch 3 times, most recently from 416b160 to 56d6837 Compare June 23, 2021 00:40
@JordanMartinez
Copy link
Copy Markdown
Contributor

The Windows CI build failed with this error:

  purescript    > Failures:
  purescript    > 
  purescript    >   tests\TestCompiler.hs:201:7: 
  purescript    >   1) compiler, Optimization examples, 'Monad.purs' should compile to expected output
  purescript    >        Test output differed from 'tests\purs\optimize\Monad.out.js'; got:
  purescript    >        "use strict";
  purescript    >        var Control_Applicative = require("../Control.Applicative/index.js");
  purescript    >        var Control_Bind = require("../Control.Bind/index.js");
  purescript    >        var Monad = function (Applicative0, Bind1) {
  purescript    >            this.Applicative0 = Applicative0;
  purescript    >            this.Bind1 = Bind1;
  purescript    >        };
  purescript    >        var liftM1 = function (dictMonad) {
  purescript    >            var bind = Control_Bind.bind(dictMonad.Bind1());
  purescript    >            var pure = Control_Applicative.pure(dictMonad.Applicative0());
  purescript    >            return function (f) {
  purescript    >                return function (a) {
  purescript    >                    return bind(a)(function (a$prime) {
  purescript    >                        return pure(f(a$prime));
  purescript    >                    });
  purescript    >                };
  purescript    >            };
  purescript    >        };
  purescript    >        var ap = function (dictMonad) {
  purescript    >            var bind = Control_Bind.bind(dictMonad.Bind1());
  purescript    >            var pure = Control_Applicative.pure(dictMonad.Applicative0());
  purescript    >            return function (f) {
  purescript    >                return function (a) {
  purescript    >                    return bind(f)(function (f$prime) {
  purescript    >                        return bind(a)(function (a$prime) {
  purescript    >                            return pure(f$prime(a$prime));
  purescript    >                        });
  purescript    >                    });
  purescript    >                };
  purescript    >            };
  purescript    >        };
  purescript    >        module.exports = {
  purescript    >            Monad: Monad,
  purescript    >            liftM1: liftM1,
  purescript    >            ap: ap
  purescript    >        };
  purescript    > 
  purescript    >   To rerun use: --match "/compiler/Optimization examples/'Monad.purs' should compile to expected output/"
  purescript    > 
  purescript    >   tests\TestCompiler.hs:201:7: 
  purescript    >   2) compiler, Optimization examples, 'Symbols.purs' should compile to expected output
  purescript    >        Test output differed from 'tests\purs\optimize\Symbols.out.js'; got:
  purescript    >        "use strict";
  purescript    >        var Data_Symbol = require("../Data.Symbol/index.js");
  purescript    >        var Record_Unsafe = require("../Record.Unsafe/index.js");
  purescript    >        var Type_Proxy = require("../Type.Proxy/index.js");
  purescript    >        var get = function (dictIsSymbol) {
  purescript    >            var reflectSymbol = Data_Symbol.reflectSymbol(dictIsSymbol);
  purescript    >            return function (dictCons) {
  purescript    >                return function (l) {
  purescript    >                    return function (r) {
  purescript    >                        return Record_Unsafe.unsafeGet(reflectSymbol(l))(r);
  purescript    >                    };
  purescript    >                };
  purescript    >            };
  purescript    >        };
  purescript    >        var get1 = get(new Data_Symbol.IsSymbol(function () {
  purescript    >            return "foo";
  purescript    >        }))();
  purescript    >        var foo = Type_Proxy["Proxy"].value;
  purescript    >        var g = function (n) {
  purescript    >            return get1(foo)({
  purescript    >                foo: n,
  purescript    >                bar: 42
  purescript    >            });
  purescript    >        };
  purescript    >        var f = function (n) {
  purescript    >            return get1(foo)({
  purescript    >                foo: n
  purescript    >            });
  purescript    >        };
  purescript    >        module.exports = {
  purescript    >            get: get,
  purescript    >            foo: foo,
  purescript    >            f: f,
  purescript    >            g: g
  purescript    >        };
  purescript    > 
  purescript    >   To rerun use: --match "/compiler/Optimization examples/'Symbols.purs' should compile to expected output/"
  purescript    > 
  purescript    > Randomized with seed 881715853
  purescript    > 

@rhendric
Copy link
Copy Markdown
Member Author

Yeah, I'm trying to figure out why. My current guess is line endings?

@rhendric rhendric force-pushed the cse-for-typeclass-dicts branch 2 times, most recently from 6a2eb56 to a272d41 Compare June 23, 2021 03:13
@rhendric
Copy link
Copy Markdown
Member Author

I've updated this PR to take advantage of the renamer improvements in #4096; the CSE pass now floats definitions all the way to the outermost scope, when possible, which improves sharing and decreases the number of IIFEs.

I've also finally added a few tests that verify that the optimization happens by comparing compiler output to golden files. Other optimizations can also be tested in this suite, in the future.

@JordanMartinez
Copy link
Copy Markdown
Contributor

Nice!

@@ -0,0 +1 @@
*.out.js -text
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, for anyone who's curious, is my solution to the tests/purs/optimize tests failing on Windows. Git has some sort of heuristic for determining whether a file is text—I suspect it's based on the extension—and if it believes a file is text, by default (at least on the GitHub runners?) it will normalize the newlines in that file to the operating system's default when the file is checked out, and map them back on commit. (Obviously, this breaks tests if it happens to golden test files.) .gitattributes files can be used to tell Git, among other things, that certain files are not text and therefore shouldn't have this behavior.

There are other approaches—we could use git config to disable newline conversion globally on GitHub runners, for example, but that wouldn't fix things for developers on Windows. I'm refreshing some rather old memories about all this now so I don't know if this is the best possible solution, but it seems pretty good.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're doing the same thing for the golden layout tests here:

(Maybe consider also adding -merge here)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh thanks, that makes me feel better about this. Must not be extension-based then; probably it's sniffing out non-printable characters or something.

I don't think -merge is needed here; I think these files will merge pretty cleanly.

Copy link
Copy Markdown
Member

@kritzcreek kritzcreek Jun 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -merge is to make sure Git doesn't automatically merge when pulling. Golden files always need to be accepted wholesale, trying to "merge" them by combining lines from multiple branches is basically always wrong.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the important question whether the changes made to the files have good locality? If branch A changes the way imports are encoded in JS, and branch B changes the way that TCO loops get compiled, both of those branches change the compiled JS output but in different parts of the file, and Git's merge algorithm will probably successfully produce the golden file which should result from the combination of the features, right?

@JordanMartinez
Copy link
Copy Markdown
Contributor

Just FYI. There are 3 merge conflicts here. The first two are obvious, but the third one touches code I'm not familiar with. So I didn't resolve the conflict here.

@rhendric rhendric force-pushed the cse-for-typeclass-dicts branch from cdaa8bc to 0eab8ce Compare July 26, 2021 06:31
@rhendric
Copy link
Copy Markdown
Member Author

Thanks; sorted!

purefunctor pushed a commit to vtrl/purescript that referenced this pull request Nov 12, 2021
@rhendric rhendric marked this pull request as draft February 27, 2022 05:19
@rhendric
Copy link
Copy Markdown
Member Author

Work on this is paused pending a resolution to #4179.

@rhendric
Copy link
Copy Markdown
Member Author

Hmm, that's a thornier case, and decent evidence that I should rethink how I'm approaching this aspect of the problem.

Update on this thread: This PR has (passing) tests for both of the cases @nwolverson submitted. Having gotten #4283 in first meant that I no longer need to change anything in this PR about how binding groups work; I'm pretty sure it was my previously-necessary tinkering with the binding groups desugaring that caused this class of issues. So I think this thread is addressed—although I don't know if @nwolverson will be able to confirm until the purerl backend is compliant with #4283.

Glob >=0.10.1 && <0.11,
haskeline >=0.8.2 && <0.9,
language-javascript ==0.7.0.0,
lens >=4.19.2 && <4.20,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tangential to the PR, but I thought I'd still bring it up. We already depend on microlens. By adding yet another lens library as a dependency, it makes the binary larger. I'm assuming that it's being added here because microlens doesn't suffice. And if that's the case, then could we refactor the codebase to remove microlens so that we only depend on one lens library?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

monoidal-containers (which I added in #4283 before realizing this fact) already depends on lens; if lens and its dependencies (most of which we were already pulling in even before monoidal-containers) result in too much binary bloat, I'll need to inline the parts of monoidal-containers we're using. I don't actually know how much of a difference in binary size we're talking about here, with all the optimizations on; do you?

We can absolutely replace the various microlens usages with lens—I've done so in a local branch—but I figured I'd submit that in a separate PR if we merge this deciding we want to keep lens.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment comes mainly from reading a blogpost a couple of years ago pointing out an issue like this: a project depends on a small lens library but transitively depends on all the other lens libraries such that the author just wondered why one didn't just depend on lens and call it a day. (Or something like that).

Could you clarify for me what the difference between microlens and lens is (if any) besides microlens just being smaller? For example, do they use the same underlying encoding for optics?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding, based on little more than a skimming of microlens's README and some unattributable background impressions, is that the implementation of microlens is more or less directly lifted from lens; the main if not only difference is how much API and instances are provided and how many dependencies are taken.


instance Ord k => Semigroup (PluralityMap k) where
PluralityMap l <> PluralityMap r =
PluralityMap $ M.mapWithKey (\k -> (|| k `M.member` r)) l `M.union` r
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, is this

PluralityMap $ (M.mapWithKey (\k -> (|| k `M.member` r)) l) `M.union` r

or this?

PluralityMap $ M.mapWithKey (\k -> (|| k `M.member` r)) (l `M.union` r)

I believe it's the second one.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's the first, actually. Prefix function application has higher precedence than all operators, including backticked identifiers.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄 Maybe we should add parenthesis here just to ensure clarity? Though I don't think there's any difference between the two.

Copy link
Copy Markdown
Member Author

@rhendric rhendric May 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the only difference would be performance, if any. I think I'd have to fight with hlint about those parentheses but I'll try and see.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Oh, I shouldn't have tried to comment on this while going to bed. There is an important difference between the two: if l = PluralityMap M.empty and r = PluralityMap $ M.singleton 0 False, then in the first, l <> r == PluralityMap $ M.singleton 0 False, but in the second, l <> r == PluralityMap $ M.singleton 0 True.)

Copy link
Copy Markdown
Member

@purefunctor purefunctor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to have optimization tests for recursive computation in type classes, like so:

module Main where

import Prelude

import Prim.Row as R
import Prim.RowList as RL
import Type.Prelude (class IsSymbol, Proxy(..), reflectSymbol)

class FindKeysAux :: forall k. RL.RowList k -> Constraint
class FindKeysAux a where
  findKeysAux :: Proxy a -> Array String

instance FindKeysAux RL.Nil where
  findKeysAux _ = []

else instance (IsSymbol l, FindKeysAux r) => FindKeysAux (RL.Cons l t r) where
  findKeysAux _ = [ reflectSymbol (Proxy :: Proxy l) ] <> findKeysAux (Proxy :: Proxy r)

findKeys :: forall r rl. RL.RowToList r rl => FindKeysAux rl => Proxy r -> Array String
findKeys _ = findKeysAux (Proxy :: Proxy rl)

findKeys1 = findKeys (Proxy :: Proxy (a :: Unit))
findKeys2 = findKeys (Proxy :: Proxy (a :: Unit, b :: Unit))
findKeys3 = findKeys (Proxy :: Proxy (a :: Unit, b :: Unit, c :: Unit))
findKeys4 = findKeys (Proxy :: Proxy (a :: Unit, b :: Unit, c :: Unit, d :: Unit))
findKeys5 = findKeys (Proxy :: Proxy (a :: Unit, b :: Unit, c :: Unit, d :: Unit, e :: Unit))
findKeys6 = findKeys (Proxy :: Proxy (a :: Unit))
findKeys7 = findKeys (Proxy :: Proxy (a :: Unit, b :: Unit))
findKeys8 = findKeys (Proxy :: Proxy (a :: Unit, b :: Unit, c :: Unit))
findKeys9 = findKeys (Proxy :: Proxy (a :: Unit, b :: Unit, c :: Unit, d :: Unit))
findKeys10 = findKeys (Proxy :: Proxy (a :: Unit, b :: Unit, c :: Unit, d :: Unit, e :: Unit))

@rhendric rhendric added this to the v0.15.2 milestone May 11, 2022
This commit implements a common subexpression elimination pass during
CoreFn optimization. The optimizer only targets `App`s that the compiler
has created, meaning: applying functions to typeclass dictionaries,
accessing superclasses from a typeclass dictionary, or creating new
instances of `IsSymbol`. Furthermore, the pass only moves subexpressions
if they can be moved to an outer function or if the subexpression
appears more than once inside the scope to which it would be moved.
@rhendric rhendric force-pushed the cse-for-typeclass-dicts branch from 7e39d09 to 283c057 Compare May 15, 2022 23:04
@JordanMartinez
Copy link
Copy Markdown
Contributor

@rhendric Should we pull the trigger and merge this? @purefunctor approved after your latest changes, so it's technically valid. Or are you still testing this or waiting for others to review it?

@rhendric
Copy link
Copy Markdown
Member Author

I'm on board with merging. I figure getting a prerelease of this out early in the release cycle is probably our best strategy for getting it tested by more people before it shows up in a release.

@JordanMartinez
Copy link
Copy Markdown
Contributor

I'm on board with merging. I figure getting a prerelease of this out early in the release cycle is probably our best strategy for getting it tested by more people before it shows up in a release.

Agreed. We might want to make a Discourse announcement about this as well. That will get more people to try it out.

@JordanMartinez JordanMartinez merged commit d721f48 into purescript:master May 18, 2022
@rhendric rhendric deleted the cse-for-typeclass-dicts branch May 18, 2022 18:20
@rhendric
Copy link
Copy Markdown
Member Author

Rad, I'll write that up.

@nwolverson
Copy link
Copy Markdown
Contributor

I've updated purerl to work with the latest, and I'm happy to say I see no problems in our projects - and it enables a signifcant performance improvement. Happy to see this merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants