Efficient intersection of two List<String> in Java?

Question

Question is simple:

I have two lists:

List<String> columnsOld = DBUtils.GetColumns(db, TableName);
List<String> columnsNew = DBUtils.GetColumns(db, TableName);

And I need to get the intersection of these. Is there a quick way to achieve this?

@Ungeheuer that doesn't work if you want duplicates only to be included if they are in both lists. – xeruf, commented Jan 8, 2020 at 13:13

Paolo Fulgoni · Accepted Answer · 2014-07-10 07:04:45Z

133

You can use the retainAll method:

columnsOld.retainAll(columnsNew);
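Keep in mind that retainAll modifies the list it is called on. If both original lists are still needed afterwards, one option is to intersect a copy. A minimal sketch, assuming plain JDK collections (the class name, method name, and sample column names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RetainAllDemo {
    // Returns a new list; neither argument is modified.
    static List<String> intersection(List<String> first, List<String> second) {
        List<String> result = new ArrayList<>(first); // defensive, modifiable copy
        result.retainAll(second);                     // keep only elements also in 'second'
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical column names, not taken from the question.
        List<String> columnsOld = Arrays.asList("id", "name", "email");
        List<String> columnsNew = Arrays.asList("id", "email", "created_at");
        System.out.println(intersection(columnsOld, columnsNew)); // [id, email]
    }
}
```

The copy also helps when the source list is fixed-size or unmodifiable, where an in-place retainAll could fail.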

edited Jul 10, 2014 at 7:04

Paolo Fulgoni

answered Mar 8, 2010 at 11:17

Roman

3 Comments

Topper Harley Over a year ago

Note: for this to work with objects other than String, you of course need to implement equals and hashCode.

John Kugelman Over a year ago

The code is simple but the algorithmic complexity is poor: O(n×m), versus O(n+m) for the set version. With two million-item lists it's the difference between trillions of operations and millions of operations.

razor Over a year ago

If you use retainAll on Lists, it runs in O(n^2).

John Kugelman · Accepted Answer · 2020-05-08 11:38:36Z

30

Using Google's Guava library:

Sets.intersection(Sets.newHashSet(setA), Sets.newHashSet(setB))

Note: This is much more efficient than naively doing the intersection with two lists: it's O(n+m), versus O(n×m) for the list version. With two million-item lists it's the difference between millions of operations and trillions of operations.
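If Guava is not on the classpath, roughly the same idea can be sketched with the JDK alone. This is an illustration rather than what Guava does internally; the class and method names are assumptions, and LinkedHashSet is used only so the result keeps the first list's order:

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SetIntersection {
    // Expected O(n + m): one pass to build each set, one pass of hash lookups.
    static <T> Set<T> intersection(List<T> first, List<T> second) {
        Set<T> result = new LinkedHashSet<>(first); // drops duplicates, keeps first's order
        result.retainAll(new HashSet<>(second));    // each contains() is O(1) on average
        return result;
    }
}
```

Note that the result is a Set, so duplicates in either list collapse to a single element.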

edited May 8, 2020 at 11:38

John Kugelman

answered Mar 28, 2013 at 14:27

Serhii Shevchyk

bjornhol · Accepted Answer · 2010-03-08 12:24:03Z

20

Since retainAll won't touch the argument collection, this would be faster:

List<String> columnsOld = DBUtils.GetColumns(db, TableName); 
List<String> columnsNew = DBUtils.GetColumns(db, TableName); 

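// Walk columnsNew backwards so removing by index does not shift unvisited elements.
for (int i = columnsNew.size() - 1; i >= 0; --i) {
    String str = columnsNew.get(i);
    // remove(Object) returns false when str is no longer present in columnsOld.
    if (!columnsOld.remove(str)) {
        columnsNew.remove(i); // no match left, so drop the element at this index
    }
}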

The intersection will be the values left in columnsNew. Removing already compared values from columnsOld will reduce the number of comparisons needed.

answered Mar 8, 2010 at 12:24

bjornhol

4 Comments

Roman Over a year ago

But your code definitely should be extracted to a separate method, because it's absolutely unclear from this code what it does. And I also wouldn't refuse an additional unit test for this code.

bjornhol Over a year ago

Agreed, good method separation, naming and unit tests are always rule number one.

Calon Over a year ago

Shouldn't this method add the elements that can not be found in the columnsOld to the columnsNew? It looks like those elements will be missing in the result.

Bogdan Calmac Over a year ago

The optimization of removing columns from columnsOld might actually make no difference (the remove has itself a cost) or even be slower in cases like ArrayList where a remove shifts the elements.

Gigas · Accepted Answer · 2013-01-05 22:45:52Z

8

How about:

private List<String> intersect(List<String> A, List<String> B) {
    List<String> rtnList = new LinkedList<>();
    for(String dto : A) {
        if(B.contains(dto)) {
            rtnList.add(dto);
        }
    }
    return rtnList;
}

answered Jan 5, 2013 at 22:45

Gigas

2 Comments

juan2raid Over a year ago

If B contains elements which are not contained in A, there is no need to iterate over those elements because we are trying to find all elements in both A and B.

razor Over a year ago

This is O(n^2)! You should use contains on a Set.

user_3380739 · Accepted Answer · 2021-03-27 04:07:28Z

4

Use retainAll if you don't care about the number of occurrences; otherwise use N.intersection:

a = N.asList(12, 16, 16, 17, 19);
b = N.asList(16, 19, 107);
a.retainAll(b); // [16, 16, 19]
N.println(a);

a = N.asList(12, 16, 16, 17, 19);
b = N.asList(16, 19, 107);
a = N.intersect(a, b);
N.println(a); // [16, 19]

N is a utility class in abacus-common.
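For comparison, the occurrence-aware behaviour (each element kept at most as many times as it appears in both lists) can be sketched with the JDK alone by counting what remains available in the second list. This is not abacus-common's implementation, just one way to reproduce the [16, 19] result above; the class and method names are made up:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MultisetIntersection {
    // Keeps each element min(count in a, count in b) times, in a's order.
    static <T> List<T> intersect(List<T> a, List<T> b) {
        Map<T, Integer> remaining = new HashMap<>();
        for (T item : b) {
            remaining.merge(item, 1, Integer::sum); // count occurrences in b
        }
        List<T> result = new ArrayList<>();
        for (T item : a) {
            Integer count = remaining.get(item);
            if (count != null && count > 0) {
                result.add(item);
                remaining.put(item, count - 1);     // consume one occurrence from b
            }
        }
        return result;
    }
}
```

With a = [12, 16, 16, 17, 19] and b = [16, 19, 107], the second 16 is skipped because b contains only one, so the method returns [16, 19]. Both passes are linear, so the expected cost is O(n + m).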

edited Mar 27, 2021 at 4:07

answered Nov 9, 2016 at 0:20

user_3380739

Deutro · Accepted Answer · 2014-09-11 15:09:24Z

3

There is a nice way to do this in one line with streams, and it also works for two lists of different types, which is not possible with the containsAll method, AFAIK:

columnsOld.stream().filter(c -> columnsNew.contains(c)).collect(Collectors.toList());

An example for lists with different types: if there is a relation between foo and bar, and you can get a bar object from a foo, then you can modify your stream:

List<foo> fooList = new ArrayList<>(Arrays.asList(new foo(), new foo()));
List<bar> barList = new ArrayList<>(Arrays.asList(new bar(), new bar()));

fooList.stream().filter(f -> barList.contains(f.getBar())).collect(Collectors.toList());
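A fuller sketch of the different-types case, under some assumptions: Foo and Bar below are hypothetical stand-ins for the foo and bar classes, Bar implements equals and hashCode (without them, contains only matches the very same instance), and the lookups go through a HashSet instead of the list:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class CrossTypeIntersection {
    // Hypothetical value type; equality is based on the name.
    static final class Bar {
        final String name;
        Bar(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof Bar && ((Bar) o).name.equals(name);
        }
        @Override public int hashCode() { return name.hashCode(); }
    }

    // Hypothetical type that refers to a Bar.
    static final class Foo {
        private final Bar bar;
        Foo(Bar bar) { this.bar = bar; }
        Bar getBar() { return bar; }
    }

    // Keeps every Foo whose Bar also occurs in 'bars'.
    static List<Foo> foosWithBarIn(List<Foo> foos, List<Bar> bars) {
        Set<Bar> lookup = new HashSet<>(bars); // O(1) average lookups
        return foos.stream()
                .filter(f -> lookup.contains(f.getBar()))
                .collect(Collectors.toList());
    }
}
```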

answered Sep 11, 2014 at 15:09

Deutro

3 Comments

Андрей Щеглов Over a year ago

The c -> columnsNew.contains(c) lambda can be written more concisely as a method reference: columnsNew::contains.

Aaron_H Over a year ago

Won't this run in O(n^2) time though?

razor Over a year ago

This is O(n^2)! You should use contains on a Set.

Ravi Sanwal · Accepted Answer · 2016-05-06 23:56:45Z

3

If you put the second list in a set, say a HashSet, and iterate over the first list, checking each element for presence in the set and removing it if absent, the first list will end up holding the intersection you need. This is much faster than retainAll or contains on a list, because set lookups are O(1) on average. The point is to use a set instead of a list. firstList.retainAll(new HashSet<>(secondList)) will also work.
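The approach described above could look something like this. It is a sketch, assuming the first list is modifiable (for example an ArrayList); the class and method names are made up:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class InPlaceIntersection {
    // Removes from 'first' every element that does not occur in 'second'.
    // Order and duplicates of the surviving elements in 'first' are preserved.
    static <T> void retainCommon(List<T> first, Collection<T> second) {
        Set<T> lookup = new HashSet<>(second);          // built once, O(m)
        first.removeIf(item -> !lookup.contains(item)); // one pass over 'first'
    }
}
```

For example, calling retainCommon on a modifiable list [a, b, b, c] with [b, c, d] leaves [b, b, c] in the first list.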

answered May 6, 2016 at 23:56

Ravi Sanwal

Dheeraj Sachan · Accepted Answer · 2019-04-09 08:09:16Z

1

Use org.apache.commons.collections4.ListUtils#intersection.

answered Apr 9, 2019 at 8:09

Dheeraj Sachan

Mišo Stankay · Accepted Answer · 2022-05-26 08:33:58Z

0

With the Java 8 Stream API (and Java 9 List.of()) you can do the following:

List<Integer> list1 = List.of(1, 1, 2, 2);
List<Integer> list2 = List.of(2, 2, 3, 3);

List<Integer> intersection = list1.stream()
    .filter(list2::contains)
    .distinct()
    .collect(Collectors.toList());
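As other comments in this thread point out, list2::contains is a linear scan for every element of list1. A variant of the same stream that looks elements up in a HashSet instead, as a sketch (the class and method names are assumptions):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class StreamIntersection {
    // Distinct elements of list1 that also occur in list2, in list1's order.
    static <T> List<T> intersection(List<T> list1, List<T> list2) {
        Set<T> lookup = new HashSet<>(list2); // O(1) average membership test
        return list1.stream()
                .filter(lookup::contains)
                .distinct()
                .collect(Collectors.toList());
    }
}
```

For the lists above, [1, 1, 2, 2] and [2, 2, 3, 3], this returns [2].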

answered May 26, 2022 at 8:33

Mišo Stankay
