Using Javascript to find values which are not common to two or more datasets.

There are tons of options for finding duplicates and for finding unique values; even off the shelf products like Microsoft Excel can do this for you.

I am not trying to do either of those tasks. I am trying to find values which are not common across two or more arrays or collections. For example in the following data sets …

a = [1, 2, 3, 4, 5];

b = [2, 3, 4, 5, 6, 7];

c = [4, 5, 6, 7, 8, 9];

… I am trying to find the values 1, 8 and 9 (as they appear in the data but are not common to two or more data sets.

Or in another example, I am trying to find values not common across exactly two datasets. For example in the following 2 data sets …

b = [2, 3, 4, 5, 6, 7];

c = [4, 5, 6, 7, 8, 9];

… I am trying to find the values 2, 3, 8 and 9.

Note: I am not trying to find the values which are common to all datasets eg 2, 3, 4, 5, 6, 7, 8, 9 and am not trying to find a distinct list of values eg 2, 3, 4, 5, 6, 7, 8, 9. Don’t care about duplicates and so forth. I am only trying to find rogue values which are not common across two or more data sets.

Let’s create an collection of the data sets …

abc = [a, b, c];

… and send Javascript/ JQuery to the rescue …

jQuery.each(abc, function(k, v){
    for (i = 0; i < v.length; i++){
    console.log(v[i]);
    }
});

… this produces 1,2, 3, 4, 5, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 8, 9 (linearly printing the values of each dataset).

You might notice that I used some traditional Javascript for the inner for loop. This was so that you could differentiate between the different loops (inner and outer). Another way to write this (using Jquery for both loops) would be …

jQuery.each(abc, function(k, v){
    jQuery.each(abc[k], function(k2, v2){
        console.log(v2);
    });
});

… obviously this also produces 1,2, 3, 4, 5, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 8, 9 (linearly printing the values of each dataset).

At this point all we have to do is test if values are in the arrays. JQuery has a great feature called “inArray” eg

jQuery.inArray(whatToLookFor, whereToLook);

Here is an implementation of inArray which produces the right output …

unique = [];
jQuery.each(abc, function(k, v){
    jQuery.each(abc[k], function(k2, v2){
    found = [];
    console.log(v2);
    for (i = 0; i < abc.length; i++){
        if (jQuery.inArray(v2, abc[i]) !== -1){
        console.log(v2 + " found in array " + i);
        found.push(v2);
        }
    }
    if (found.length === 1){
        unique.push(v2);
    }
    });
});

… [1,8, 9]

Now, if you look closely you will see that we are writing to the unique collection variable in the event that the found array equals 1 (in other words if the value we are on is only found once in the entire data set then that value is unique and not common across all data sets). This way of searching is inefficient because in the event we have a unique value we are looking through the entire data set to prove that it is in there once (of course each value is in the data sets at least once).

Improving efficiency would involve stepping over the entire data set which houses the particular number we are on[processing|searching for]. This is because we already know it exists (we are processing it … so of course it exists).

This new and improved way only looks across all other data sets (other than the one which houses the value we are processing) in search for a mate. If it does not find a match then the value is a loner a rogue value which is not common to two or more data sets.

unique = [];
jQuery.each(abc, function(k, v){
     jQuery.each(abc[k], function(k2, v2){
     found = [];
     console.log(v2);
     for (i = 0; i < abc.length; i++){
         if (k !== i){
             if (jQuery.inArray(v2, abc[i]) !== -1){
             console.log(v2 + " found in array " + i);
             found.push(v2);
             }
         }
     }
     if (found.length === 0){
         unique.push(v2);
     }
     });
});

Result: [1,8, 9]

Please comment if you:

  • Can make this code more efficient
  • Can point to a pre-written Javascript library which does this same task
  • Have any questions
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s