Thursday, June 3, 2010

Unique elements

Take an array and return same array with only unique elements in it in o(n) .


  1. Hold a dictionary from element -> boolean
    Go over the array.
    Mark each element in the dictionary as true if this is its first appearance.
    Mark it as false if non-unique (i.e., element already existed in the dictionary)
    Go over the array again and remove each element with false in the dictionary.

  2. Instead of boolean, you can hold the index of the first location of this element.
    If you find an element that is already in the dictionary, remove it and remove the element in the stored location and mark it non-unique element. Next time you find this element only remove the one you just found.

  3. Dictionaries usually have O(log n) performance...

  4. If you assume 'n' to be asymptotically larger than the number of digits present in the largest number in the array, one could potentially use radix sort to sort these numbers in O(kn) time where k is number of digits in the largest number. It is then O(n) to remove duplicates by walking the sorted array; thus overall running time is still O(n)? Would that be an accceptable solution?

  5. Employ a hash table with 2n slots and use linear (or quadratic) probing for colission resolution.

  6. Is removing the element of an array O(1)? You can also use a HashSet instead of a Map and go through the array just once, adding the elements into the hashset or removing them from the array if they already exist in the HashSet.

  7. How about using a bloom filter (for very large n)? On one parse, remove already hashed elements. space complexity is O(1), the size of the filter.

  8. hashset = new HashSet
    array foreach {
    insert into hashset _
    return hashset

  9. Some assumptions:

    1) if the input array is sorted, then you can solve the problem with a single pass, skipping duplicated that are in sequence

    2) if the elements are between 0 and k, take an array C, and fill it as follows: C[I[n]] = C[I[n]]]++, where I is the input array and C is initialized to 0s.

    C will have k slots.

    Now, for each value in I you have a counting of the number of times it appears, so you could delete (set to null) the slots in A having values such that C[A[i]] > 1 (i : 0 to size(A)).

    This solution is similar to the one proposed by Danny, where the dictionary is the array C and the hash function is given by A[i] (with no costs to be computed).

  10. In the case that the values of the input array are < size of array, another solution could be:

    for (int i = 0; i < a.length; i++) {

    if (a[i] == -1) continue;
    if (i == a[i]) continue;

    if (a[i] == a[a[i]]) {
    a[i] = -1;
    else {
    swap(i, a[i]);
    i = i - 1;