I ran across a question today while browsing. The person asking wanted to know how to find duplicate entries in an int array, and the answer that someone else quickly gave was to make nested for loops to iterate over an array of integers. My first thought was a little different than that, so I looked around to see if it had been answered other places in the same way. Nested for loops seems to be the most common suggestion. I also found a few that related to strings that use a HashSet, which I have used to identify duplicates before. I wondered which way is really faster, which is where this quest begins.

I was tempted to go into some Big O notation ramblings, but it would take longer than I have here to analyze the methods fully, so let’s skip that and see what happens when we write some code and run it. It’s more fun to do an experiment anyway.

I intentionally duplicated code and logic in each section so each test would be as similar as possible, and I add the duplicates that are identified to an ArrayList<String> so I will have taken some action on it. It does add to the elapsed time, but it adds the same time for each test. When the sample size is small, you can print the duplicates ArrayList to see what was found. It’s probably not a good idea to do that when you start with 10,000 elements though.

## Sorted vs. Nested Loop:

The results were as I expected. The relative overhead of sorting the arrays is very high when there are only a few values to search. The efficiency, in my test environment, evens out at around 145 elements. When there are fewer than 145, the nested loop is faster than the sorted array, but when there are more, the sorted array with sequential search method is faster than the nested loop. The same holds true for Strings. If you have over 100,000 elements in the int test, the difference is more than 6 seconds in favor of the sorted method.

## HashSet Method:

For strings, the HashSet method is the clear winner in every case. Int arrays are a slightly different story. To put the number into the HashSet, in my test, I converted it to a String first. When the array contains 15000 or less elements, the HashSet is the same or slower than sorted, but when the array is larger, the HashSet method is quite a bit faster than either of the other two. Just for fun I tried one with 1,000,000 int elements, and the sorted array much faster than the HashSet.

## Conclusion:

If speed is your only concern, and you have some strict rules that your array size conforms to, then you can tune your code to match. Don’t go overboard with this though. For many programmers, 1.9ms is not something you really need to worry about. Some of us work in very high transaction rate systems, and every millisecond counts. We still need to weigh that against memory usage, so saving 1.9ms while burning a Megabyte of memory might not be the right way to go. Every application is going to be a little different, so do your homework and experiment a little. Take a look at the results and code I added below. If you have a different way that works better, I’d love to hear from you. I’d love to see what this looks like when it is multi-threaded.

## Results with 10 elements:

Random numbers generated = 10

sorted numbers duration = 558311

loopCounter = 10

unsorted numbers duration = 11733

loopCounter = 45

unsorted numbers in HashSet duration = 39600

loopCounter = 10

sorted strings duration = 494267

loopCounter = 10

unsorted strings duration = 26400

loopCounter = 45

unsorted strings in HashSet duration = 14178

loopCounter = 10

## Results with 100 elements:

Random numbers generated = 100

sorted numbers duration = 651689

loopCounter = 100

unsorted numbers duration = 387688

loopCounter = 4950

unsorted numbers in HashSet duration = 379867

loopCounter = 100

sorted strings duration = 997333

loopCounter = 100

unsorted strings duration = 871689

loopCounter = 4950

unsorted strings in HashSet duration = 276711

loopCounter = 100

## Results with 1,000 elements:

Random numbers generated = 1000

sorted numbers duration = 1973155

loopCounter = 1000

unsorted numbers duration = 23475470

loopCounter = 499500

unsorted numbers in HashSet duration = 3872978

loopCounter = 1000

sorted strings duration = 7550401

loopCounter = 1000

unsorted strings duration = 32994137

loopCounter = 499500

unsorted strings in HashSet duration = 954311

loopCounter = 1000

## Results with 10,000 elements:

Random numbers generated = 10000

sorted numbers duration = 15986179

loopCounter = 10000

unsorted numbers duration = 106388102

loopCounter = 49995000

unsorted numbers in HashSet duration = 21584447

loopCounter = 10000

sorted strings duration = 28177604

loopCounter = 10000

unsorted strings duration = 1442832050

loopCounter = 49995000

unsorted strings in HashSet duration = 2630711

loopCounter = 10000

## Results with 100,000 elements:

Random numbers generated = 100000

sorted numbers duration = 104342102

loopCounter = 100000

unsorted numbers duration = 6736310411

loopCounter = 704982704

unsorted numbers in HashSet duration = 59437652

loopCounter = 100000

sorted strings duration = 267270212

loopCounter = 100000

unsorted strings duration = 167903382832

loopCounter = 704982704

unsorted strings in HashSet duration = 28475825

loopCounter = 100000

## Here is the code:

/** * test the duration of different ways of finding duplicates in string and int arrays. */ public void duplicates(int randomNumberCount){ long startTime = 0; long endTime = 0; int loopCounter = 0; ArrayList duplicates = new ArrayList(); Random random = new Random(System.currentTimeMillis()); int[] randomNumbers = new int[randomNumberCount]; for(int count = 0; count < randomNumberCount; count++){ randomNumbers[count] = random.nextInt(randomNumberCount); } System.out.println("Random numbers generated = " + randomNumberCount); /******************************************************** * Find a duplicate int * sort the array, then check for sequential duplicates. ********************************************************/ int[] sortedNumbers = Arrays.copyOf(randomNumbers, randomNumberCount); startTime = System.nanoTime(); Arrays.sort(sortedNumbers); for (int i = 0; i < sortedNumbers.length; i++) { int testVal = sortedNumbers[i]; int tempPos = i + 1; boolean foundDuplicate = false; loopCounter++; while(tempPos < sortedNumbers.length && testVal == sortedNumbers[tempPos]){ foundDuplicate = true; duplicates.add("" + testVal); tempPos ++; loopCounter++; } if(foundDuplicate){ i = tempPos-1; } } endTime = System.nanoTime(); System.out.println("sorted numbers duration = " + (endTime - startTime)); System.out.println("loopCounter = " + loopCounter); loopCounter = 0; //System.out.println(duplicates.toString()); duplicates.clear(); /******************************************************** * Find a duplicate int * Use nested for loops to find duplicates. ********************************************************/ int[] unSortedNumbers = Arrays.copyOf(randomNumbers, randomNumberCount); startTime = System.nanoTime(); for (int i = 0; i < unSortedNumbers.length; i++) { boolean foundDuplicate = false; for (int j = i+1; j < unSortedNumbers.length; j++) { if(i != j && unSortedNumbers[i] == unSortedNumbers[j]){ foundDuplicate = true; } loopCounter++; } if(foundDuplicate){ duplicates.add("" + unSortedNumbers[i]); } } endTime = System.nanoTime(); System.out.println("unsorted numbers duration = " + (endTime - startTime)); System.out.println("loopCounter = " + loopCounter); loopCounter = 0; //System.out.println(duplicates.toString()); duplicates.clear(); /********************************************************************* * Find a duplicate int * Make each int a string, and add it to a HashSet to find duplicates *********************************************************************/ //use a hashset to find duplicate ints. HashSet numbersHashSet = new HashSet(); startTime = System.nanoTime(); for (int number : unSortedNumbers) { if(!numbersHashSet.add("" + number)){ duplicates.add("" + number); } loopCounter++; } endTime = System.nanoTime(); System.out.println("unsorted numbers in HashSet duration = " + (endTime - startTime)); System.out.println("loopCounter = " + loopCounter); loopCounter = 0; //System.out.println(duplicates.toString()); duplicates.clear(); /************************************************** * Make random strings out of numbers **************************************************/ String[] randomStrings = new String[randomNumberCount]; for(int i = 0; i < randomNumberCount; i++){ randomStrings[i] = "" + random.nextInt(randomNumberCount); } String[] sortedWords = Arrays.copyOf(randomStrings, randomNumberCount); /******************************************************** * Find a duplicate strings * sort the array, then check for sequential duplicates ********************************************************/ startTime = System.nanoTime(); Arrays.sort(sortedWords); for (int i = 0; i < sortedWords.length; i++) { String testVal = sortedWords[i]; int tempPos = i + 1; boolean foundDuplicate = false; loopCounter++; while(tempPos < sortedWords.length && testVal.equals(sortedWords[tempPos])){ foundDuplicate = true; tempPos ++; loopCounter++; } if(foundDuplicate){ duplicates.add(testVal); i = tempPos-1; } } endTime = System.nanoTime(); System.out.println("sorted strings duration = " + (endTime - startTime)); System.out.println("loopCounter = " + loopCounter); loopCounter = 0; //System.out.println(duplicates.toString()); duplicates.clear(); /******************************************************** * Find a duplicate int * Use nested for loops to search for duplicate strings. ********************************************************/ String[] unSortedWords = Arrays.copyOf(randomStrings, randomNumberCount); startTime = System.nanoTime(); for (int i = 0; i < unSortedWords.length; i++) { boolean foundDuplicate = false; for (int j = i+1; j < unSortedWords.length; j++) { if(i != j && unSortedWords[i].equals(unSortedWords[j])){ foundDuplicate = true; } loopCounter++; } if(foundDuplicate){ duplicates.add("" + unSortedWords[i]); } } endTime = System.nanoTime(); System.out.println("unsorted strings duration = " + (endTime - startTime)); System.out.println("loopCounter = " + loopCounter); loopCounter = 0; //System.out.println(duplicates.toString()); duplicates.clear(); /******************************************************** * Find a duplicate strings * add each string to a HashSet to check for duplicates. ********************************************************/ HashSet stringHashSet = new HashSet(); startTime = System.nanoTime(); for (String string : unSortedWords) { if(!stringHashSet.add(string)){ duplicates.add(string); } loopCounter++; } endTime = System.nanoTime(); System.out.println("unsorted strings in HashSet duration = " + (endTime - startTime)); System.out.println("loopCounter = " + loopCounter); loopCounter = 0; //System.out.println(duplicates.toString()); duplicates.clear(); }