Saturday, September 15, 2012

Averaging Multidimensional Data for D3

Building on the example from my previous tutorial, I am going to modify only a few aspects of the graph. My goal for this tutorial is to show how to code the graph so that the 3 pieces of data per row are averaged together into 1 piece of data per row.

Here's the end result and its code:

http://thecodingwebsite.com/tutorials/d3/average/d3average.html

<html>

<head>

<script type="text/javascript" src="d3.v2.min.js"></script>

<script type="text/javascript">

 var graphHeight = 1800;
 
 var xOffset = 95;
 var yOffset = 15;
 var rightPadding = 50;
 var bottomPadding = 5;
 var width = window.innerWidth - xOffset - rightPadding;
 var verticalSpacing = 10;
 var height = graphHeight - yOffset - bottomPadding;
 
 var unfilteredData;
 
 var svg;
 var maxValue, hMultiplier, barHeight, barYMultiplier;
 
 window.onload = function()
 {
  //Create the SVG graph.
  svg = d3.select("body").append("svg").attr("width", "100%").attr("height", graphHeight);
  
  d3.csv("data.csv", function(d)
  {
   unfilteredData = new Array();
   
   d.forEach(function(r, i)
   {
    var nextRow = { "Site Type": r["Site Type"], "Media Type": r["Media Type"], "Data": (parseFloat(r["Data 1"]) + parseFloat(r["Data 2"]) + parseFloat(r["Data 3"])) / 3.0, "Index": i };
    
    unfilteredData.push(nextRow);
   });
   
   maxValue = d3.max(unfilteredData, function(d)
   {
    return d["Data"];
   });
   
   hMultiplier = width / maxValue;
   barHeight = height / unfilteredData.length - verticalSpacing;
   barYMultiplier = barHeight + verticalSpacing;
   
   refilter("all");
  });
 };
 
 function refilter(filterCategory)
 {
  //Declare data locally so it does not become an implicit global variable.
  var data;
  
  if (filterCategory == "all")
  {
   data = unfilteredData;
  }
  else
  {
   data = unfilteredData.filter(function(d)
   {
    return (d["Site Type"] == filterCategory);
   });
  }
  
  var firstIndex = data[0]["Index"];
  
  d3.select("#showing").text(filterCategory);
  
  
  
  svg.selectAll("g").remove();
  
  
  
  //Add data to the graph.
  var dataAdd = svg.selectAll("g").data(data);
  
  var dataEnter = dataAdd.enter().append("g");
  
  dataEnter.append("rect").attr("x", xOffset);
  dataEnter.append("text").attr("font-size", 10);
  
  dataAdd.selectAll("rect").attr("y", function(d)
  {
   return (d["Index"] - firstIndex) * barYMultiplier + yOffset;
  }).attr("height", barHeight).transition().duration(1000).attr("width", function(d)
  {
   return d["Data"] * hMultiplier + 5;
  }).attr("stroke", "gray").attr("fill", function(d)
  {
   if (d["Media Type"] == "Section 3")
   {
    return "blue";
   }
   else if (d["Media Type"] == "Section 5")
   {
    return "green";
   }
   else
   {
    return "black";
   }
  });
  
  dataAdd.selectAll("text").text(function(d)
  {
   return d["Site Type"] + ", " + d["Media Type"];
  }).attr("x", 0).attr("y", function(d)
  {
   return (d["Index"] - firstIndex) * barYMultiplier + yOffset + 6;
  }).attr("fill", function(d)
  {
   if (d["Media Type"] == "Section 3")
   {
    return "blue";
   }
   else if (d["Media Type"] == "Section 5")
   {
    return "green";
   }
   else
   {
    return "black";
   }
  });
 }

</script>

</head>

<body>

<a href="javascript:refilter('all');">All Categories</a>   <a href="javascript:refilter('Category 1');">Category 1</a>   <a href="javascript:refilter('Category 2');">Category 2</a>   <a href="javascript:refilter('Category 3');">Category 3</a>   <a href="javascript:refilter('Category 4');">Category 4</a>   <a href="javascript:refilter('Category 5');">Category 5</a>   <a href="javascript:refilter('Category 6');">Category 6</a>   

Currently showing: <span id="showing"></span>

</body>

</html>

By now you should know the drill: what did I change and why?

Well, a couple of the changes follow directly from the fact that the end result is different. For example:

var graphHeight = 1800;

I changed the graph's height from 5000 to 1800 because there will be only 1/3 as many rectangles and pieces of text: each group of 3 pieces of data is now averaged into one.

I'll get to the less obvious ones later. For now, let's take a look at the actual process of averaging the data:

unfilteredData = new Array();

d.forEach(function(r, i)
{
 var nextRow = { "Site Type": r["Site Type"], "Media Type": r["Media Type"], "Data": (parseFloat(r["Data 1"]) + parseFloat(r["Data 2"]) + parseFloat(r["Data 3"])) / 3.0, "Index": i };
 
 unfilteredData.push(nextRow);
});

maxValue = d3.max(unfilteredData, function(d)
{
 return d["Data"];
});

hMultiplier = width / maxValue;
barHeight = height / unfilteredData.length - verticalSpacing;

Rather than simply setting unfilteredData equal to the data read in from the CSV file, I go through each row of data (so that I can access all 3 pieces of data at once).

I then make sure to parse (convert) each piece of data to a floating point number (a number with a decimal) before adding them together. If I didn't, the 3 pieces of data would be joined together as strings, e.g. "5", "6", and "7" would become "567" rather than 18. If you're unfamiliar with this concept, read up on how JavaScript's + operator treats strings.
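
Here's a minimal sketch of that difference. The sample row is made up for illustration; it only mimics the fact that values read from a CSV file arrive as strings.

```javascript
// Hypothetical sample row: every value is a string, as it would be after reading a CSV file.
var row = { "Data 1": "5", "Data 2": "6", "Data 3": "7" };

// Without parsing, "+" joins the strings together.
var concatenated = row["Data 1"] + row["Data 2"] + row["Data 3"];

// With parseFloat, "+" adds numbers.
var summed = parseFloat(row["Data 1"]) + parseFloat(row["Data 2"]) + parseFloat(row["Data 3"]);

console.log(concatenated);       // "567" (a string)
console.log(summed);             // 18 (a number)
console.log(concatenated / 3.0); // 189, because "567" is converted to the number 567 before dividing
console.log(summed / 3.0);       // 6
```

Notice that the unparsed version doesn't even fail: it quietly produces a wrong "average", which is why this kind of bug is easy to miss.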

I make sure to rebuild each row as a new object (nextRow) before adding it to the resulting array of objects (unfilteredData) that I will use later to draw the graph. This way, the "Site Type" and "Media Type" values can still be retrieved later if desired.
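
The same step can be sketched as a standalone function that doesn't need D3 or a CSV file. The function name averageRows and the sample rows are my own inventions for illustration; it uses map instead of forEach and push, but the row it builds is the same.

```javascript
// Hypothetical rows shaped like parsed CSV data (every value is a string).
var sampleRows = [
  { "Site Type": "Category 1", "Media Type": "Section 3", "Data 1": "1", "Data 2": "2", "Data 3": "3" },
  { "Site Type": "Category 2", "Media Type": "Section 5", "Data 1": "10", "Data 2": "20", "Data 3": "30" }
];

// Build one averaged row per input row, keeping the labels and recording the row's position.
function averageRows(rows) {
  return rows.map(function (r, i) {
    return {
      "Site Type": r["Site Type"],
      "Media Type": r["Media Type"],
      "Data": (parseFloat(r["Data 1"]) + parseFloat(r["Data 2"]) + parseFloat(r["Data 3"])) / 3.0,
      "Index": i
    };
  });
}

var averagedRows = averageRows(sampleRows);
console.log(averagedRows[0]["Data"]);  // 2
console.log(averagedRows[1]["Data"]);  // 20
console.log(averagedRows[1]["Index"]); // 1
```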

Several things change now that there is only one piece of data per "row". For one, you can see above that finding the maximum value is simplified a bit (only "Data" needs to be used rather than finding the maximum value of "Data 1", "Data 2", and "Data 3"). I also no longer have to multiply "unfilteredData.length" by 3, as there is only 1 piece of data per row.
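
To make the arithmetic concrete, here's a small sketch with made-up numbers. The width, height, and data values are assumptions chosen so the results come out round, and Math.max stands in for d3.max so that it runs without D3.

```javascript
// Hypothetical averaged rows (only the "Data" field matters here).
var unfilteredData = [{ "Data": 2 }, { "Data": 5 }, { "Data": 20 }];

// Made-up layout values, not the ones from the graph above.
var width = 600;
var height = 90;
var verticalSpacing = 10;

// Plain JavaScript stand-in for d3.max(unfilteredData, function(d) { return d["Data"]; }).
var maxValue = Math.max.apply(null, unfilteredData.map(function (d) { return d["Data"]; }));

var hMultiplier = width / maxValue;                                   // pixels per unit of data
var barHeight = height / unfilteredData.length - verticalSpacing;    // one bar per row now
var barYMultiplier = barHeight + verticalSpacing;                     // distance between bar tops

console.log(maxValue);       // 20
console.log(hMultiplier);    // 30
console.log(barHeight);      // 20
console.log(barYMultiplier); // 30
```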


Other than that, there was only 1 other problem I had to address: D3's indexing wasn't working properly after this switch. What I'm referring to is that when I would have e.g. "function(d, i)", the index ("i") didn't give me the row number. I didn't track down the exact cause at the time, but the most likely explanation is that dataAdd.selectAll("rect") creates a nested selection: each "g" becomes its own group containing a single "rect", and D3 counts "i" within each group, so it is always 0.

So, I made a workaround method. You may have noticed that in addition to "Site Type", "Media Type", and "Data", I also set "Index" equal to "i" (the index value of the forEach function). Now I can tell which row of data I'm working with!


For example, further below I have this code:

.attr("x", 0).attr("y", function(d)
{
 return (d["Index"] - firstIndex) * barYMultiplier + yOffset + 6;
})

However, you're probably wondering why I subtract "firstIndex" and what "firstIndex" even is. Good for you! :)

A problem occurs when a category is chosen other than the "all" category or the 1st category: all of the rectangles and text are placed lower on the page because of my "Index" solution. We want them moved up. In order to do that, I simply store the index of the first piece of data in my filtered array beforehand:

var firstIndex = data[0]["Index"];

and use that to negatively offset every index so the graph is moved back up to the top.
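
Here's a small sketch of that offset in plain JavaScript, without any drawing. The helper name barYPositions, the sample rows, and the layout values are all made up for illustration; the empty-array check is an extra guard that the graph code above doesn't have.

```javascript
// Made-up layout values.
var barYMultiplier = 30;
var yOffset = 15;

// Hypothetical averaged rows, each carrying its original position in "Index".
var allRows = [
  { "Site Type": "Category 1", "Index": 0 },
  { "Site Type": "Category 1", "Index": 1 },
  { "Site Type": "Category 2", "Index": 2 },
  { "Site Type": "Category 2", "Index": 3 }
];

// Compute the y position of each bar, shifted so the first row sits at the top.
function barYPositions(rows) {
  if (rows.length === 0) {
    return []; // nothing to draw, and rows[0] would be undefined
  }
  var firstIndex = rows[0]["Index"];
  return rows.map(function (d) {
    return (d["Index"] - firstIndex) * barYMultiplier + yOffset;
  });
}

var category2 = allRows.filter(function (d) {
  return d["Site Type"] == "Category 2";
});

console.log(barYPositions(category2)); // [15, 45] rather than [75, 105] without the offset
```

One thing to keep in mind: this only closes the gap at the top. If the rows of a category weren't next to each other in the data, there would still be gaps between the bars.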


There you have it! Now you can average your data together before the graph is even drawn for the first time, and you can fix weird indexing problems! :D
