Thursday, September 27, 2012

Averaging Multidimensional Data for D3 Part 2

In my previous tutorial, I showed you conceptually how to average the data such that the 3 pieces of data in each row are averaged together and only 1 bar and text is shown per row for each average. This time I'm actually going to backtrack to the code I was using in the tutorial before that so that I can show you another way of averaging the data.

This time we're not going to average the 3 pieces of data in each row (we'll leave those alone); instead, we're going to average together all of the rows that share the same "Site Type" and "Media Type" values, so that each unique combination ends up as a single row.


The best way that I've found to do this is to walk through the rows in order while keeping a running sum and a row count. Every time the "Site Type" and/or "Media Type" value changes, I calculate the averages for the group that just ended, store the result as a single row, and reset the sums for the next group.
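Because this approach compares each row only with the group currently being summed, rows that belong together have to sit next to each other in the file. If you aren't sure that's the case for your data, one option is to sort the rows by both columns first. Here's a minimal sketch of that idea; the function name "sortByGroupKeys" and the sample rows are made up for illustration and aren't part of the tutorial code:

```javascript
// Hypothetical helper: returns a sorted copy of the rows so that rows with the
// same "Site Type" and "Media Type" end up next to each other.
function sortByGroupKeys(rows)
{
 return rows.slice().sort(function(a, b)
 {
  if (a["Site Type"] != b["Site Type"])
  {
   return a["Site Type"] < b["Site Type"] ? -1 : 1;
  }
  if (a["Media Type"] != b["Media Type"])
  {
   return a["Media Type"] < b["Media Type"] ? -1 : 1;
  }
  return 0;
 });
}

// Made-up rows in which the two "News / Video" rows are not adjacent.
var rows = [
 { "Site Type": "News", "Media Type": "Video" },
 { "Site Type": "Blog", "Media Type": "Video" },
 { "Site Type": "News", "Media Type": "Video" },
 { "Site Type": "Blog", "Media Type": "Audio" }
];

var sorted = sortByGroupKeys(rows);

sorted.forEach(function(r)
{
 console.log(r["Site Type"] + " / " + r["Media Type"]);
});
// Blog / Audio
// Blog / Video
// News / Video
// News / Video
```

Keep in mind that sorting also changes the order in which the rows show up in the graph, so this only helps if that's acceptable for your data.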

Hopefully that makes sense to you... If so, then the code will make perfect sense - if not, then hopefully the code will clear things up. Here's our final graph so you know what it looks like beforehand:

http://thecodingwebsite.com/tutorials/d3/average/d3average2.html

Since 2 tutorials ago (where I am taking the unchanged code from), I have only changed a bit of code within the "d3.csv" call and nothing more. So, I am only going to show that section of code:

  d3.csv("data.csv", function(d)
  {
   unfilteredData = new Array();
   
   var nextRow = { "Site Type": "", "Media Type": "", "Sum 1": 0, "Sum 2": 0, "Sum 3": 0, "Count": 0 };
   
   d.forEach(function(r, i)
   {
    if (r["Site Type"] == nextRow["Site Type"] && r["Media Type"] == nextRow["Media Type"])
    {
     //Increase the count for the number of rows of data for this graphical row.     
     ++nextRow["Count"];
     
     //Add this row's data to the graphical row's sum.
     nextRow["Sum 1"] = parseFloat(nextRow["Sum 1"]) + parseFloat(r["Data 1"]);
     nextRow["Sum 2"] = parseFloat(nextRow["Sum 2"]) + parseFloat(r["Data 2"]);
     nextRow["Sum 3"] = parseFloat(nextRow["Sum 3"]) + parseFloat(r["Data 3"]);
    }
    else
    {
     if (nextRow["Site Type"] != "")
     {
      var count = parseInt(nextRow["Count"]);
      
      //Calculate the average.
      nextRow["Data 1"] = parseFloat(nextRow["Sum 1"]) / count;
      nextRow["Data 2"] = parseFloat(nextRow["Sum 2"]) / count;
      nextRow["Data 3"] = parseFloat(nextRow["Sum 3"]) / count;
      
      unfilteredData.push(nextRow);
      
      //Create a new object for the next graphical row of data.
      nextRow = { "Site Type": "", "Media Type": "", "Sum 1": 0, "Sum 2": 0, "Sum 3": 0, "Count": 0 };
     }
     
     //Set up the starting information for this graphical row (lacking
     //any cumulative sum additions and counts added in the future).
     nextRow["Site Type"] = r["Site Type"];
     nextRow["Media Type"] = r["Media Type"];
     nextRow["Sum 1"] = r["Data 1"];
     nextRow["Sum 2"] = r["Data 2"];
     nextRow["Sum 3"] = r["Data 3"];
     nextRow["Count"] = 1;
    }
   });
   
   if (nextRow["Data 1"] == undefined)
   {
    var count = parseInt(nextRow["Count"]);
    
    //Calculate the average.
    nextRow["Data 1"] = parseFloat(nextRow["Sum 1"]) / count;
    nextRow["Data 2"] = parseFloat(nextRow["Sum 2"]) / count;
    nextRow["Data 3"] = parseFloat(nextRow["Sum 3"]) / count;
    
    unfilteredData.push(nextRow);
   }
   
   maxValue = d3.max(unfilteredData, function(d)
   {
    return Math.max(d["Data 1"], d["Data 2"], d["Data 3"]);
   });
   
   hMultiplier = width / maxValue;
   barHeight = height / (unfilteredData.length * 3) - verticalSpacing;
   barYMultiplier = barHeight + verticalSpacing;
   
   refilter("all");
  });

Just like I did in the previous averaging tutorial, I'm running a "forEach" loop on the passed data array so that I can run some code on each row of data:

   unfilteredData = new Array();
   
   var nextRow = { "Site Type": "", "Media Type": "", "Sum 1": 0, "Sum 2": 0, "Sum 3": 0, "Count": 0 };
   
   d.forEach(function(r, i)
   {
       //* Note: There is code inside here! *
   });

That's pretty self explanatory. Let's take a look at what I did inside the "forEach" loop.

I check for one condition: whether or not the "Site Type" and "Media Type" values both match those of the "cumulative graphical row" ("nextRow"). When they do:

    if (r["Site Type"] == nextRow["Site Type"] && r["Media Type"] == nextRow["Media Type"])
    {
     //Increase the count for the number of rows of data for this graphical row.     
     ++nextRow["Count"];
     
     //Add this row's data to the graphical row's sum.
     nextRow["Sum 1"] = parseFloat(nextRow["Sum 1"]) + parseFloat(r["Data 1"]);
     nextRow["Sum 2"] = parseFloat(nextRow["Sum 2"]) + parseFloat(r["Data 2"]);
     nextRow["Sum 3"] = parseFloat(nextRow["Sum 3"]) + parseFloat(r["Data 3"]);
    }

I increase the stored count of how many rows of data make up the current graphical row and add the current row's 3 data values to that graphical row's 3 sums. Note: I use "parseFloat" here just like I did in the previous tutorial to make sure that the values are treated as numbers rather than strings when they're summed (with strings, "+" would join the text together instead of adding the values).
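To see why the conversion matters, here's a small sketch with made-up values. Every field read from a CSV file arrives as a string, and "+" on two strings joins them instead of adding them:

```javascript
// Hypothetical values, as they would arrive from a CSV file (always strings).
var sum = "1.5";
var data = "2.25";

// Without parseFloat, "+" concatenates the two strings.
var joined = sum + data;

// With parseFloat, both values become numbers and are added.
var added = parseFloat(sum) + parseFloat(data);

console.log(joined); // "1.52.25"
console.log(added);  // 3.75
```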

Right now, all I'm doing is keeping track of the number of rows and the sum of all of their 3 data values. The actual averaging (dividing each sum by the count) will occur later.

Next, when the "Site Type" and/or "Media Type" values don't match, I run this code:

    else
    {
     if (nextRow["Site Type"] != "")
     {
      var count = parseInt(nextRow["Count"]);
      
      //Calculate the average.
      nextRow["Data 1"] = parseFloat(nextRow["Sum 1"]) / count;
      nextRow["Data 2"] = parseFloat(nextRow["Sum 2"]) / count;
      nextRow["Data 3"] = parseFloat(nextRow["Sum 3"]) / count;
      
      unfilteredData.push(nextRow);
      
      //Create a new object for the next graphical row of data.
      nextRow = { "Site Type": "", "Media Type": "", "Sum 1": 0, "Sum 2": 0, "Sum 3": 0, "Count": 0 };
     }
     
     //Set up the starting information for this graphical row (lacking
     //any cumulative sum additions and counts added in the future).
     nextRow["Site Type"] = r["Site Type"];
     nextRow["Media Type"] = r["Media Type"];
     nextRow["Sum 1"] = r["Data 1"];
     nextRow["Sum 2"] = r["Data 2"];
     nextRow["Sum 3"] = r["Data 3"];
     nextRow["Count"] = 1;
    }

First, I check that the "Site Type" value of "nextRow" isn't empty - this is to make sure that I don't add the initial "nextRow" (which is empty because it hasn't been filled with any data yet) to the "unfilteredData" array.

If it's not the first row of data, I calculate the average (using parseInt and parseFloat again!), store the result in the 3 data properties of the row, and add the row to the "unfilteredData" array. Then I replace "nextRow" with a fresh, empty object for the next group of data.
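The reason for assigning a brand new object to "nextRow" (rather than just overwriting its properties) is that "push" stores a reference to the object, not a copy of it. A minimal sketch with made-up values shows the difference:

```javascript
// Hypothetical example values; "Site Type" is the only column used here.

// Case 1: the same object is reused after it has been pushed.
var reused = [];
var row = { "Site Type": "Blog" };
reused.push(row);
row["Site Type"] = "Forum"; // modifies the object that is already in the array

// Case 2: a new object is assigned after the push.
var fresh = [];
var row2 = { "Site Type": "Blog" };
fresh.push(row2);
row2 = { "Site Type": "Forum" }; // the pushed object is left untouched

console.log(reused[0]["Site Type"]); // "Forum"
console.log(fresh[0]["Site Type"]);  // "Blog"
```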

Regardless of whether or not it's the first row of data, I then store the current row's values in the "nextRow" object, either for later accumulation or for adding to the "unfilteredData" array on its own in the near future (if it's the only row of data with that combination of "Site Type" and "Media Type").

There's one final piece of code that I added:

   if (nextRow["Data 1"] == undefined)
   {
    var count = parseInt(nextRow["Count"]);
    
    //Calculate the average.
    nextRow["Data 1"] = parseFloat(nextRow["Sum 1"]) / count;
    nextRow["Data 2"] = parseFloat(nextRow["Sum 2"]) / count;
    nextRow["Data 3"] = parseFloat(nextRow["Sum 3"]) / count;
    
    unfilteredData.push(nextRow);
   }

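This code runs after the "forEach" loop has finished. It checks whether the average hasn't been calculated yet (in which case nothing is stored in the "Data 1" value of "nextRow"). This should always be true, because "nextRow" is replaced with a fresh object right after every average that's calculated inside the loop. So the last group of data finally has its average calculated and is added to the "unfilteredData" array as well. (One thing to be aware of: if the file had no rows of data at all, this would add a single empty row whose averages are "NaN", so the check really assumes that there's at least one row.)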

Why did I have to do this? Inside the "forEach" loop, I only add the previous cumulative graphical row when I detect a difference in the "Site Type" and/or "Media Type" values (as I have undoubtedly reiterated too many times already in this tutorial :). This means that the last group never gets added inside the loop, so that last section of code takes care of it.
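Since the averaging code appears twice (once inside the loop and once after it), one way to keep the two copies in sync is to move it into a small helper. This is only a sketch of that idea, not the code behind the graph above: the function names "finishGroup" and "averageGroups" and the sample rows are made up, and it works on a plain array of row objects rather than inside the "d3.csv" call.

```javascript
// Hypothetical helper: turns the running sums of one group into averages
// and appends the finished row to the output array.
function finishGroup(group, output)
{
 if (group === null || group["Count"] === 0) return; // nothing collected yet

 for (var n = 1; n <= 3; ++n)
 {
  group["Data " + n] = group["Sum " + n] / group["Count"];
 }

 output.push(group);
}

// Averages consecutive rows that share "Site Type" and "Media Type".
function averageGroups(rows)
{
 var output = [];
 var group = null;

 rows.forEach(function(r)
 {
  var sameGroup = group !== null &&
   r["Site Type"] == group["Site Type"] &&
   r["Media Type"] == group["Media Type"];

  if (!sameGroup)
  {
   //First-element case: this does nothing while "group" is still null.
   finishGroup(group, output);

   group = { "Site Type": r["Site Type"], "Media Type": r["Media Type"],
             "Sum 1": 0, "Sum 2": 0, "Sum 3": 0, "Count": 0 };
  }

  ++group["Count"];
  group["Sum 1"] += parseFloat(r["Data 1"]);
  group["Sum 2"] += parseFloat(r["Data 2"]);
  group["Sum 3"] += parseFloat(r["Data 3"]);
 });

 //Last-element case: the final group is still pending after the loop.
 finishGroup(group, output);

 return output;
}

// Made-up sample rows (strings, as a CSV parser would deliver them).
var sample = [
 { "Site Type": "Blog", "Media Type": "Video", "Data 1": "1", "Data 2": "2", "Data 3": "3" },
 { "Site Type": "Blog", "Media Type": "Video", "Data 1": "3", "Data 2": "4", "Data 3": "5" },
 { "Site Type": "News", "Media Type": "Audio", "Data 1": "10", "Data 2": "20", "Data 3": "30" }
];

var averaged = averageGroups(sample);

console.log(averaged.length);       // 2
console.log(averaged[0]["Data 1"]); // 2
console.log(averaged[1]["Data 3"]); // 30
```

With the helper in place, the first-element and last-element cases are both handled by the same function, and an empty array of rows simply produces an empty result.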


Now you know of a second type of averaging of data that involves cumulative math, and you should be able to predict, detect, and avoid first/last-element-type problems when dealing with this kind of coding. Good luck! ;)
