I want to add a string to an array of strings, but I want to make sure the string is unique before inserting it. I searched and found many approaches, but I want something faster than checking every array element for a duplicate before each insert, so I decided to do the following:

  1. Get the string (a URL from a URL-mining project, which may return thousands of URLs, some of them duplicated because pages cross-reference each other).
  2. Take the ASCII value of each character in the URL, multiply it by the character's index, and add the results up (this is meant to give each URL a unique identifier).
  3. Use the value from step 2 as the index in the array at which to insert the URL.
  4. The problem now is that the array has to be dynamic (how do I resize it depending on the number of URLs I'm mining?).
  5. The array will be sparse (that is, it will contain many nulls); is there an efficient way to get only the cells that have values?
  6. The code below computes the position for a string.
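using System;
using System.Text;

int index = 1;      // 1-based position of the current character
int position = 0;   // weighted sum intended as the array index
string s = Console.ReadLine();
byte[] ASCIIValues = Encoding.ASCII.GetBytes(s);

foreach (byte b in ASCIIValues)
{
    position += b * index;
    index++;
    Console.WriteLine(b);   // debug output: the ASCII value of each character
}
Console.WriteLine(position);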
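One thing worth checking before using this value as an array index is whether two different strings can produce the same position. The following is only a minimal sketch that reuses the loop above; the names `WeightedSumDemo` and `Position` are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Text;

public static class WeightedSumDemo
{
    // Same computation as the loop above:
    // sum of (ASCII value * 1-based character position).
    public static int Position(string s)
    {
        int position = 0;
        int index = 1;
        foreach (byte b in Encoding.ASCII.GetBytes(s))
        {
            position += b * index;
            index++;
        }
        return position;
    }

    public static void Main()
    {
        // 'a' = 97, 'c' = 99, 'd' = 100
        Console.WriteLine(Position("ad")); // 97*1 + 100*2 = 297
        Console.WriteLine(Position("cc")); // 99*1 +  99*2 = 297
    }
}
```

Both calls print 297, so two different strings can land in the same cell. The weighted sum on its own therefore cannot serve as a unique identifier; some form of collision handling, or a collection that does this internally, would still be needed.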
Comments
  • Why not use a list (List<T>)? You can always check list.Contains(value) before calling .Add(value). Commented Feb 8, 2017 at 9:15
  • Try HashSet<T>. Commented Feb 8, 2017 at 9:17
  • Or just use a hash algorithm (SHA256) and insert the hash into a dictionary. You can check whether the same hash already exists before inserting. Commented Feb 8, 2017 at 9:30
  • You can try DateTime.Now.ToString(); it will always be unique. Commented Feb 8, 2017 at 9:33
  • I will try to use this technique. Commented Feb 8, 2017 at 9:37

2 Answers

As mentioned in the comments, a HashSet<string> is the collection to use here. It represents a set of unique values and has O(1) lookup on average. You just loop over the strings you want to insert and add each one to the set; if a string is already in there, it will not be added again.

var set = new HashSet<string>();
foreach(var s in strings)
   set.Add(s);
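To make this a bit more concrete, here is a minimal sketch under some assumptions: the class `UniqueUrls`, the method `Collect`, and the example URLs are hypothetical and only stand in for whatever the mining code produces. It relies on `HashSet<string>.Add` returning `false` when the value is already present.

```csharp
using System;
using System.Collections.Generic;

public static class UniqueUrls
{
    // Collects the distinct strings from the input and counts how many
    // were rejected as duplicates (Add returns false for those).
    public static HashSet<string> Collect(IEnumerable<string> urls, out int duplicates)
    {
        var set = new HashSet<string>();
        duplicates = 0;
        foreach (var url in urls)
        {
            if (!set.Add(url))
                duplicates++;
        }
        return set;
    }

    public static void Main()
    {
        // Hypothetical sample input; the first URL appears twice.
        var mined = new[] { "http://a.example/", "http://b.example/", "http://a.example/" };

        int duplicates;
        HashSet<string> unique = Collect(mined, out duplicates);

        Console.WriteLine(unique.Count); // 2
        Console.WriteLine(duplicates);   // 1

        // Enumerating the set visits only stored values; there are no empty cells.
        foreach (var url in unique)
            Console.WriteLine(url);
    }
}
```

This also covers points 4 and 5 of the question: the set grows as items are added, and enumerating it yields only the values it actually contains.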

1 Comment

Thanks Magnus, I optimized my code with HashSet and it has been working fine so far, although I have not tested it on a huge number of elements yet.
I used a Dictionary and managed to solve it. Please check my code at the link below:

Hashset handling to avoid stuck in loop during iteration

Although I used a procedure that merges two dictionaries and makes sure there are no duplicates, my code sometimes throws an error saying that an attempt was made to add a duplicate key.

I found the code below somewhere and it works fine. In the linked question above, I add and remove entries while iterating.

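using System.Collections;

// Copies every entry of firstDict into secondDict. Keys that already exist
// in secondDict are overwritten only when bReplaceIfExists is true.
public static void Add2Dic(IDictionary firstDict, IDictionary secondDict, bool bReplaceIfExists)
{
    foreach (object key in firstDict.Keys)
    {
        if (!secondDict.Contains(key))
            secondDict.Add(key, firstDict[key]);
        else if (bReplaceIfExists)
            secondDict[key] = firstDict[key];
    }
}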
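For comparison, a generic variant of the same merge could look like the sketch below. The names `DictionaryMerge` and `Merge` are hypothetical. It writes through the indexer, which overwrites an existing key instead of throwing, whereas `Dictionary<TKey, TValue>.Add` throws an `ArgumentException` when the key is already present. Whether that is the source of the duplicate-key error mentioned above cannot be determined from the code shown here.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryMerge
{
    // Copies entries from source into target. Keys already present in target
    // are overwritten only when replaceIfExists is true.
    public static void Merge<TKey, TValue>(
        IDictionary<TKey, TValue> source,
        IDictionary<TKey, TValue> target,
        bool replaceIfExists)
    {
        foreach (KeyValuePair<TKey, TValue> pair in source)
        {
            if (replaceIfExists || !target.ContainsKey(pair.Key))
                target[pair.Key] = pair.Value; // indexer set does not throw on an existing key
        }
    }

    public static void Main()
    {
        var first = new Dictionary<string, int> { { "a", 1 }, { "b", 2 } };
        var second = new Dictionary<string, int> { { "b", 20 }, { "c", 30 } };

        Merge(first, second, false);

        Console.WriteLine(second.Count); // 3
        Console.WriteLine(second["b"]);  // 20 (kept, because replaceIfExists is false)
    }
}
```

Note that `source` and `target` are assumed to be different instances; modifying a dictionary while enumerating it is not safe.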
