83

The question is simple:

I have two lists:

List<String> columnsOld = DBUtils.GetColumns(db, TableName);
List<String> columnsNew = DBUtils.GetColumns(db, TableName);

And I need to get the intersection of these. Is there a quick way to achieve this?

2 Comments

@JohnnyCoder seriously? Commented Jun 3, 2015 at 13:33
@Ungeheuer that doesn't work if you want duplicates to be included only if they are in both lists. Commented Jan 8, 2020 at 13:13

9 Answers

133

You can use the retainAll method:

columnsOld.retainAll(columnsNew);
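
One thing worth spelling out: retainAll modifies the list it is called on. If both original lists are still needed afterwards, a copy can be intersected instead. A minimal sketch, assuming plain java.util lists; the class name, the helper name `intersection`, and the sample column names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RetainAllCopyDemo {

    /** Returns the elements of a that also occur in b, leaving both inputs unchanged. */
    public static <T> List<T> intersection(List<T> a, List<T> b) {
        List<T> result = new ArrayList<>(a); // copy first, because retainAll mutates its receiver
        result.retainAll(b);                 // drops every element that b does not contain
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical column names, standing in for the DBUtils.GetColumns results.
        List<String> columnsOld = Arrays.asList("id", "name", "email");
        List<String> columnsNew = Arrays.asList("id", "email", "phone");

        System.out.println(intersection(columnsOld, columnsNew)); // [id, email]
    }
}
```

The copy also helps when the source list is fixed-size or unmodifiable, where calling retainAll directly would throw.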

3 Comments

Note: for this to work with other objects than String, you need of course to implement equals and hashCode.
The code is simple but the algorithmic complexity is poor: O(n×m), versus O(n+m) for the set version. With two million-item lists it's the difference between trillions of operations and millions of operations.
If you use retainAll on Lists, it runs in O(n^2).
30

Using Google's Guava library:

Sets.intersection(Sets.newHashSet(setA), Sets.newHashSet(setB))

Note: This is much more efficient than naively doing the intersection with two lists: it's O(n+m), versus O(n×m) for the list version. With two million-item lists it's the difference between millions of operations and trillions of operations.
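
If pulling in Guava is not an option, roughly the same O(n+m) idea can be sketched with the JDK alone. This is an illustration rather than an equivalent of Sets.intersection (which returns a view, while this builds a new set); the class and method names are made up:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SetIntersectionDemo {

    /** Returns the distinct elements present in both collections, in a's iteration order. */
    public static <T> Set<T> intersection(Collection<T> a, Collection<T> b) {
        Set<T> result = new LinkedHashSet<>(a); // removes duplicates, keeps a's order
        result.retainAll(new HashSet<>(b));     // each membership test is an expected O(1) hash lookup
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical column names for illustration.
        List<String> columnsOld = Arrays.asList("id", "name", "email", "id");
        List<String> columnsNew = Arrays.asList("email", "id", "phone");

        System.out.println(intersection(columnsOld, columnsNew)); // [id, email]
    }
}
```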

20

Since retainAll won't touch the argument collection, this would be faster:

List<String> columnsOld = DBUtils.GetColumns(db, TableName); 
List<String> columnsNew = DBUtils.GetColumns(db, TableName); 

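// Walk columnsNew backwards so removals don't shift the indices still to be visited.
for (int i = columnsNew.size() - 1; i >= 0; --i) {
    String str = columnsNew.get(i);
    // remove() returns false when str is not in columnsOld, so it is not in the intersection.
    if (!columnsOld.remove(str)) {
        columnsNew.remove(i); // remove by index: no second search for str
    }
}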

The intersection will be the values left in columnsNew. Removing already compared values from columnsOld will reduce the number of comparisons needed. Note that both lists are modified in the process.

4 Comments

But your code definitely should be extracted into a separate method, because it's absolutely unclear from this code what it does. And I also wouldn't say no to an additional unit test for this code.
Agree, good method separation, naming and unit tests are always rule number one.
Shouldn't this method add the elements that can not be found in the columnsOld to the columnsNew? It looks like those elements will be missing in the result.
The optimization of removing columns from columnsOld might actually make no difference (the remove has itself a cost) or even be slower in cases like ArrayList where a remove shifts the elements.
8

How about

private List<String> intersect(List<String> A, List<String> B) {
    List<String> rtnList = new LinkedList<>();
    for(String dto : A) {
        if(B.contains(dto)) {
            rtnList.add(dto);
        }
    }
    return rtnList;
}

2 Comments

If B contains elements which are not contained in A, there is no need to iterate over those elements because we are trying to find all elements in both A and B.
This is O(n^2)! You should use contains on a Set.
4

Use retainAll if you don't care about the number of occurrences; otherwise use N.intersection:

a = N.asList(12, 16, 16, 17, 19);
b = N.asList(16, 19, 107);
a.retainAll(b); // [16, 16, 19]
N.println(a);

a = N.asList(12, 16, 16, 17, 19);
b = N.asList(16, 19, 107);
a = N.intersect(a, b);
N.println(a); // [16, 19]

N is a utility class in abacus-common.
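
For anyone who wants the occurrence-aware behaviour without an extra dependency, here is a minimal JDK-only sketch. It is not abacus-common's implementation, just one way to get the [16, 19] result shown above; the class and method names are made up:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultisetIntersectionDemo {

    /** Keeps each element of a at most as many times as it occurs in b. */
    public static <T> List<T> intersect(List<T> a, List<T> b) {
        // Count how many times each element of b is still available for matching.
        Map<T, Integer> remaining = new HashMap<>();
        for (T item : b) {
            remaining.merge(item, 1, Integer::sum);
        }

        List<T> result = new ArrayList<>();
        for (T item : a) {
            Integer count = remaining.get(item);
            if (count != null && count > 0) {
                result.add(item);
                remaining.put(item, count - 1); // consume one occurrence from b
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(12, 16, 16, 17, 19);
        List<Integer> b = Arrays.asList(16, 19, 107);

        System.out.println(intersect(a, b)); // [16, 19]
    }
}
```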

3

There is a nice way to do this with streams in one line of code. It also lets you intersect two lists that are not of the same type, which is not possible with the retainAll method afaik:

columnsOld.stream().filter(c -> columnsNew.contains(c)).collect(Collectors.toList());

An example for lists with different types: if you have a relation between foo and bar and you can get a bar object from a foo, then you can modify your stream:

List<foo> fooList = new ArrayList<>(Arrays.asList(new foo(), new foo()));
List<bar> barList = new ArrayList<>(Arrays.asList(new bar(), new bar()));

fooList.stream().filter(f -> barList.contains(f.getBar())).collect(Collectors.toList());

3 Comments

The c -> columnsNew.contains(c) lambda can be rewritten more concisely as a method reference: columnsNew::contains.
Won't this run in O(n^2) time though?
This is O(n^2)! You should use contains on a Set.
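
Following up on the comments above, the stream version can keep its shape and still avoid the quadratic lookups if the second list is copied into a HashSet first. A minimal sketch; the class name, method name, and sample data are made up:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class StreamIntersectionDemo {

    /** Returns the elements of a that also occur in b, in a's order, using hash lookups. */
    public static <T> List<T> intersection(List<T> a, List<T> b) {
        Set<T> lookup = new HashSet<>(b); // built once, in O(m)
        return a.stream()
                .filter(lookup::contains) // expected O(1) per element instead of a scan of b
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical column names for illustration.
        List<String> columnsOld = Arrays.asList("id", "name", "email");
        List<String> columnsNew = Arrays.asList("email", "id");

        System.out.println(intersection(columnsOld, columnsNew)); // [id, email]
    }
}
```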
3

If you put the second list in a set, say a HashSet, and just iterate over the first list, checking for presence in the set and removing elements that are not present, your first list will eventually hold the intersection you need. This will be way faster than retainAll or contains on a list. The emphasis here is on using a set instead of a list: lookups are O(1). firstList.retainAll(new HashSet<>(secondList)) will also work.
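
The in-place approach described above might look roughly like this. It is only a sketch; the class name, the helper name `retainCommon`, and the sample data are made up, and the first list has to be modifiable:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class InPlaceIntersectionDemo {

    /** Removes from first every element that does not occur in second. */
    public static <T> void retainCommon(List<T> first, Collection<T> second) {
        Set<T> lookup = new HashSet<>(second);          // O(1) expected lookups
        first.removeIf(item -> !lookup.contains(item)); // drop everything missing from the set
    }

    public static void main(String[] args) {
        // Wrapped in ArrayList because Arrays.asList alone returns a fixed-size list.
        List<String> firstList = new ArrayList<>(Arrays.asList("id", "name", "email"));
        List<String> secondList = Arrays.asList("email", "id", "phone");

        retainCommon(firstList, secondList);
        System.out.println(firstList); // [id, email]
    }
}
```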

1

Use org.apache.commons.collections4.ListUtils#intersection.

0

With the Java 8 Stream API (and Java 9 List.of()) you can do the following:

List<Integer> list1 = List.of(1, 1, 2, 2);
List<Integer> list2 = List.of(2, 2, 3, 3);

List<Integer> intersection = list1.stream()
    .filter(list2::contains)
    .distinct()
    .collect(Collectors.toList()); 
