
Let's say I have the following data structure:

   var someList = new List<int>[] {
                      new List<int>() {-1,-1,2},
                      new List<int>() {-1,0,1}
                  };

I need to be able to avoid:

someList.Add(new int[] {-1,1,0});

since [-1,0,1] is already in the set and I cannot allow duplicate sets.

Values can come in any order and I still need to ensure that there are no duplicates. So [0,-1,1] or [1,-1,0] are basically the same set.

This data structure will receive thousands of entries.

What do you think is the best way to ensure uniqueness?

I thought about using a dictionary; however, I am not quite sure, since it would increase memory use and probably processing time, and I wonder if that is avoidable.
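The dictionary idea I had in mind looks roughly like this: build a canonical key by sorting each set, and only add the set if its key is new. This is just a sketch; TryAddSet and the string key are illustrative, not an existing API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // Canonical keys of the sets seen so far (sketch only).
        var seen = new HashSet<string>();
        var someList = new List<List<int>>();

        // Illustrative helper: sorts a copy of the set so that
        // [-1,0,1] and [0,-1,1] produce the same key.
        bool TryAddSet(List<int> set)
        {
            var key = string.Join(",", set.OrderBy(x => x));
            if (!seen.Add(key))
                return false; // duplicate set, skip it
            someList.Add(set);
            return true;
        }

        Console.WriteLine(TryAddSet(new List<int> { -1, 0, 1 }));  // True
        Console.WriteLine(TryAddSet(new List<int> { 0, -1, 1 }));  // False
    }
}
```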

Comments:

  • HashSet eliminates duplicates. As for not being quite sure about memory and processing time: benchmark and profile. If your sets and/or your elements are always small there are many ways to optimize this in terms of storage and/or processing time, but premature optimization is the root of all evil. Commented Jun 30, 2021 at 18:06
  • Instead of declaring someArray as a list, use a Set<T> of a custom wrapper around List<int> and define an Equals method there so you can tell duplicate elements apart. Commented Jun 30, 2021 at 18:08
  • Thanks @JeroenMostert I will certainly try it Commented Jun 30, 2021 at 18:11
  • Also, @Dalorzo your someArray is an array of Lists, so it has no Add method. Commented Jun 30, 2021 at 18:16
  • By the way, arrays don't have an Add method. You can replace items at an index, but an array of length 2 will always have exactly 2 items. Commented Jun 30, 2021 at 18:27

1 Answer

HashSet guarantees uniqueness, but lists, arrays and hash sets are compared by reference, not by their elements' values, so you should implement your own comparer, like the one below. We order the elements first to guarantee that their order doesn't matter.

class SequenceComparer : IEqualityComparer<IEnumerable<int>>
{
    public bool Equals(IEnumerable<int> a, IEnumerable<int> b)
    {
        if (a == null && b == null)
            return true;

        if (a == null || b == null)
            return false;

        return a.OrderBy(x => x).SequenceEqual(b.OrderBy(x => x));
    }

    public int GetHashCode(IEnumerable<int> a)
    {
        var hashCode = new HashCode();
        foreach (var el in a.OrderBy(x => x))
        {
            hashCode.Add(el);
        }
        return hashCode.ToHashCode();
    }
}

So, you can do like the following:

var comparer = new SequenceComparer();
var someArray = new HashSet<List<int>>(comparer) {
                    new List<int>() { -1, -1, 2 },
                    new List<int>() { -1, 0, 1 }
                };
someArray.Add(new List<int>() { -1, 1, 0 }); // returns false: duplicate of {-1, 0, 1}

4 Comments

Your use of HashCode.Combine isn't how it should be done - you should have var result = new HashCode(); ... result.Add(el);...return result.ToHashCode();
@NetMage Thanks! Fixed.
if speed is critical you could even create a wrapper for List and pre-calculate the hash code. That would avoid some iterators and some operations. I've not benchmarked, though...
It may also pay to make the comparer for List<int> and compare the lengths of a and b before calling SequenceEqual, which I don't believe will be able to optimize after OrderBy, assuming they are not all length 3.
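The last two comments can be combined into one sketch: a wrapper that sorts once, caches the hash code, and compares lengths before falling back to SequenceEqual. The class name and shape are illustrative assumptions, not code from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative wrapper: the hash is computed once in the constructor,
// and Equals short-circuits on length before comparing elements.
class IntSet : IEquatable<IntSet>
{
    private readonly int[] sorted;
    private readonly int hash;

    public IntSet(IEnumerable<int> values)
    {
        sorted = values.OrderBy(x => x).ToArray();
        var hc = new HashCode();
        foreach (var el in sorted)
            hc.Add(el);
        hash = hc.ToHashCode();
    }

    public bool Equals(IntSet other)
    {
        if (other == null || sorted.Length != other.sorted.Length)
            return false; // cheap length check before element comparison
        return sorted.SequenceEqual(other.sorted);
    }

    public override bool Equals(object obj) => Equals(obj as IntSet);
    public override int GetHashCode() => hash;
}
```

With this wrapper, a plain HashSet<IntSet> needs no custom comparer: adding new IntSet(new[] { 0, -1, 1 }) after new IntSet(new[] { -1, 0, 1 }) returns false.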
