Poor groupBy performance

Running this code:

``` java
import fj.data.List;
import fj.data.TreeMap;

public class GroupByPerf
{
 public static void main(String[] args)
 {
 System.out.println("warmup");
 for (int i = 0; i < 10; i++)
 {
 testIt(5000);

 }

 System.out.println("real run");
 for (int i = 0; i < 100000; i++)
 {
 testIt(50000);

 }
 }

 public static void testIt(int size)
 {
 List<Shrubbery> sh = List.range(0, size).map(x -> new Shrubbery(x%20, String.valueOf(x)));
 long now = System.currentTimeMillis();
 TreeMap<Long, List<Shrubbery>> grouped = sh.groupBy(x -> x.id);
 System.out.println("duration " + (System.currentTimeMillis()-now));
 }

 public static class Shrubbery
 {
 public final long id;
 public final String name;

 public Shrubbery(long id, String name)
 {
 this.id = id;
 this.name = name;
 }
 }
```

, yields ~350 millisecond run time for each iteration post the warmup phase, on my machine.

I did a flight recording with jmc, a large portion of the time is spent rebalancing the underlying set, or otherwise in the set.

I took a shot at a naive hashmap implementation, of course the semantic of the method is changed, but they are still grouped:

``` java
private static <A, B> HashMap<A, List> groupBy(List list, F<B,A> keyF)
 {
 HashMap<A, List.Buffer> res = HashMap.hashMap();
 list.foreachDoEffect(x-> {
 A key = keyF.f(x);
 Option<List.Buffer> entry = res.get(key);
 if(entry.isSome())
 {
 entry.some().snoc(x);
 }
 else
 {
 res.set(key, List.Buffer.empty().snoc(x));
 }
 });

 return res.map(x->x, x->x.toList());
 }
```

The performance of the above code is roughly on par with javautil Stream. (1-2 millis per run)

Do you think should be at least a variant of the groupBy method which does not use a red-black tree? I am sure there are good reasons the signature is as it is, but also I think 300ms for 50000 elements is not something one will use in production.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Poor groupBy performance #288

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Poor groupBy performance #288

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions