How to Visualize Data Sets With D3.js and Crossfilter – Line Chart

D3.js is a JavaScript library for data-driven DOM manipulation. I know that is a generic sentence, but when you look at the sheer number of examples in its repository, it starts to make sense. I’m sure most of you reading this article have already heard of D3.js. So what’s the point of this article? It’s intended for people who have never used D3.js and want to create something interesting. Throughout the article, I’m assuming a basic knowledge of HTML, CSS, JavaScript, and web development. If you are just interested in the demo, or you want to know if this article is worth it, check it out here.

What do we need to do before we start this D3.js demo?

Before we can visualize anything, we need some data to visualize. For this example, I picked the Olympic history of athletes dataset from kaggle.com. The dataset is a .csv file of about 41MB, which could be too big for the browser to handle comfortably. But in my opinion, it’s better to start with some interesting data and not worry about the size of the dataset. That way, when you see the end result, you feel more of an accomplishment, and you can optimize afterwards. Premature optimization is the root of all evil, as they say. So what do we actually need to start? You honestly need only one .html file, one .css file, and one .js file. To speed up the development process, I’ve used Webpack to bundle the .js files and compile the .scss files to .css. In the end, it doesn’t matter much for this example; it’s just my personal preference, and you can poke around my Webpack setup in the example repository. If you don’t want to use any of that, you can simply paste the following code at the bottom of your .html file.

<script src="https://d3js.org/d3.v5.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/crossfilter2/1.4.6/crossfilter.js"></script>
<script src="dist/main.js"></script>

The first script is the D3.js library, and the second is the Crossfilter library, which we use for filtering and grouping data. Crossfilter is not strictly necessary here, but once you get the hang of it, it provides a lot of functionality that helps us get to visualizations quicker. The last one is the script we are going to write throughout this tutorial. I won’t go through the CSS I used, since that would be too detailed. You can always download my dist/main.css from the repository, or create your own .css file for this example. As I said in the intro, I assume that the person reading this post has a basic understanding of web development, which means you have already created the index.html file.

Let’s start visualizing using D3.js

Before we do anything with the data, we need to create a basic HTML structure that will hold all of our charts. For this example, you can just copy the following code into your index.html file, between the body tags.

<div class="c-charts">
    <div class="c-chart" id="yearAthletesCountChart">
        <h3>Number of athletes per year (seasons combined)</h3>

    </div>
    <div class="c-chart" id="weightHeightChart">

    </div>
    <div class="c-chart" id="ageYearChart">

    </div>
    <div class="c-chart c-chart--half" id="medalsPieChart">

    </div>
    <div class="c-chart c-chart--half" id="medalsByCountryPieChart">

    </div>
</div>

These will serve as containers for the charts that we will create with D3.js. But they are empty for now. So how do we fill them? Well, the first step is to get the actual data from the .csv file into our code. Copy this code into the index.js file.

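// assets/js/index.js
// d3 and crossfilter are globals loaded through the script tags above
import DashboardComponent from './DashboardComponent';
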
(function () {
  d3.dsv(',', '/assets/data/athlete_events.csv', function (row) {
    return {
      name: row['Name'],
      sex: row['Sex'],
      age: row['Age'] !== 'NA' ? +row['Age'] : null,
      height: row['Height'] !== 'NA' ? +row['Height'] : null,
      weight: row['Weight'] !== 'NA' ? +row['Weight'] : null,
      team: row['Team'],
      games: row['Games'],
      year: +row['Year'],
      season: row['Season'],
      sport: row['Sport'],
      event: row['Event'],
      medal: row['Medal']
    };
  }).then(function (athletes) {
    let dashboard = new DashboardComponent(athletes);
    dashboard.init();
  });
})();

So what does this code do? d3.dsv() loads the data from the .csv file located at the path given in the second parameter. The first parameter is the delimiter of the .csv file. The last parameter is a callback which receives each raw parsed row of the .csv file as an object, and its return value is the plain JavaScript object we will use throughout our code. As you might have noticed, there are plus operators before some of the row attributes. The thing is, when D3.js parses a .csv file, everything inside of it is a string. So if we want to turn numbers saved as strings into a JavaScript Number, we use the unary plus operator, which casts the string to a Number. For age, height and weight, we added an extra check for athletes who don’t have that data (marked as NA in the file), so those values become null.
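The conversion rules above can be tried out without loading the whole .csv file. The following is only a minimal sketch: the helper name toNumberOrNull and the sample row are made up for illustration and are not part of the original code.

```javascript
// Hypothetical helper: converts a CSV string to a Number, or to null for 'NA'.
function toNumberOrNull (value) {
  return value !== 'NA' ? +value : null;
}

// A made-up raw row, shaped the way the row callback receives it:
// every value is still a string at this point.
const rawRow = { Name: 'Sample Athlete', Age: '28', Height: 'NA', Weight: '58', Year: '2016' };

const parsed = {
  name: rawRow['Name'],
  age: toNumberOrNull(rawRow['Age']),
  height: toNumberOrNull(rawRow['Height']),
  weight: toNumberOrNull(rawRow['Weight']),
  year: +rawRow['Year']
};

console.log(parsed);
// Without the 'NA' check, the unary plus would produce NaN instead of null.
console.log(+'NA');
```

Storing null instead of NaN makes the missing values easy to spot and skip later, for example when a chart should ignore athletes without a recorded height.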

After every row is loaded into one big array, we pass it to what I call DashboardComponent. If you are not using Webpack, you can simply create a function which takes the array of data as a parameter.

Let’s make a simple line chart

What is a line chart? We have two axes, one vertical and one horizontal, and between them there is a line that connects all the points. Really simple. Before we create the chart, we need to filter the data. When I say filter, I mean tie the number of athletes to a year. For that, we are going to use Crossfilter, which is really helpful for filtering and grouping data in large arrays. But just to be clear, Crossfilter is here only because we want to get to visualizations quickly; you can do all this with normal array filtering, or however you choose.
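For comparison, here is roughly what the plain-array alternative could look like. It is only a sketch: the function name countAthletesPerGames and the sample records are made up, and the result shape (a key plus a value holding year, season and count) simply mirrors what we will build with Crossfilter.

```javascript
// Made-up sample records with only the fields needed for grouping.
const athletes = [
  { name: 'A', year: 1988, season: 'Summer' },
  { name: 'B', year: 1988, season: 'Summer' },
  { name: 'C', year: 1988, season: 'Winter' },
  { name: 'D', year: 2016, season: 'Summer' }
];

// Hypothetical helper: counts rows per Olympic Games (year + season).
function countAthletesPerGames (rows) {
  const groups = new Map();
  for (const row of rows) {
    const key = row.year + row.season; // e.g. "1988Summer"
    if (!groups.has(key)) {
      groups.set(key, { year: row.year, season: row.season, count: 0 });
    }
    groups.get(key).count++;
  }
  // Turn the Map into an array of { key, value } pairs, sorted by year and season.
  return Array.from(groups, ([key, value]) => ({ key, value }))
    .sort((a, b) => a.value.year - b.value.year || a.value.season.localeCompare(b.value.season));
}

const perGames = countAthletesPerGames(athletes);
console.log(perGames);
```

This works fine for a single chart. Where Crossfilter starts to pay off is when several charts share the same data and filters on one of them should update the others.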

So how do we group the array of athletes with Crossfilter so that we get years on one axis, and the number of athletes in that year on the other?

First, we create DashboardComponent that in the beginning looks like this:

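// assets/js/DashboardComponent.js
import NumberOfAthletesPerYearChart from './charts/NumberOfAthletesPerYearChart';

export default class DashboardComponent {
  constructor (athletes) {
    // We don't store the raw array, only the Crossfilter instance built from it
    this.athletes = crossfilter(athletes);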
  }

  init () {
    this.initYearAndAthletesCountChart();
  }

  initYearAndAthletesCountChart () {
    let numberOfAthletesPerYearChart = new NumberOfAthletesPerYearChart(this.athletes);
    numberOfAthletesPerYearChart.render();
  }
}

DashboardComponent doesn’t do anything by itself, except create a new NumberOfAthletesPerYearChart object and call its method to render the chart. So the next step is to create NumberOfAthletesPerYearChart. Since we are passing the whole Crossfilter instance to the component, it makes sense to create the class and its constructor first.

// assets/js/charts/NumberOfAthletesPerYearChart.js
export default class NumberOfAthletesPerYearChart {
  constructor (athletes) {
    this.athletesDimension = athletes.dimension(function (athlete) {
      return athlete.year + athlete.season;
    });
    this.createGroupFromDimension();
    this.chartContainer = d3.select('#yearAthletesCountChart');
    this.chart = null; // This will hold chart SVG Dom element reference
    this.chartWidth = 960; // Width in pixels
    this.chartHeight = 400; // Height in pixels
    this.margin = 50; // Margin in pixels
    this.chartHeightWithoutMargin = this.chartHeight - this.margin;
    this.countScale = null;
    this.yearScale = null;
    this.tooltipContainer = null;
  }

  createGroupFromDimension() {
    // ... TODO
  }
}

We aren’t storing the Crossfilter instance itself in the chart; instead, we first create a dimension which we are going to use to aggregate our data. The dimension we are using identifies the Olympic Games, and consists of the year and the season (summer or winter). A dimension just means that we are selecting one attribute (either an actual attribute or a derived one) to be used as the basis for filtering and grouping of data. The second statement in the constructor creates the group based on the dimension, which we will cover a little bit later. After that, we select the chart container with D3, which is just a DOM element from index.html. We initialize a few properties to null, and we will fill them in as we go through the example. We explicitly set chartWidth, chartHeight, margin and chartHeightWithoutMargin, so we have those settings defined in one place. Usually, you would calculate those values dynamically from the available viewport before the rendering of the chart starts, but we’ll ignore that part for now, so we can focus only on building our dashboard.
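One detail of the dimension accessor is easy to miss: year is a Number and season is a string, so the plus operator concatenates them into a string key. A small sketch with a made-up record (the function name gamesKey is only for illustration):

```javascript
// The same expression the constructor passes to athletes.dimension().
function gamesKey (athlete) {
  return athlete.year + athlete.season;
}

const sample = { year: 1988, season: 'Summer' }; // made-up record

console.log(gamesKey(sample));        // number + string concatenates
console.log(typeof gamesKey(sample));

// An equivalent, more explicit spelling (an alternative, not the original code):
const explicitKey = `${sample.year}${sample.season}`;
console.log(explicitKey === gamesKey(sample));
```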

Currently, every entry in our dimension has this JSON structure:

{
  age: 28,
  event: "Fencing Women's epee, Team",
  games: "2016 Summer",
  height: 168,
  medal: "Bronze",
  name: "Olga Aleksandrovna Kochneva",
  season: "Summer",
  sex: "F",
  sport: "Fencing",
  team: "Russia",
  weight: 58,
  year: 2016
}

So what is that group we were talking about above? Basically, the whole point of Crossfilter is to enable easier grouping and filtering of data. We want to get a smaller array of grouped data, where each element’s key is the name of the Olympic Games in question, and its value holds the number of athletes competing in those Games. It looks like this:

{
  key: "1988Summer",
  value: {
    count: 12037,
    season: "Summer",
    year: 1988
  }
}

This brings the array of roughly 271k entries down to just 51 elements, which we will use to display data in our chart. How do we shrink that array to a group? Well, we could simply use:

// Easier option, but not what we want
this.numberOfAthletesPerYearGroup = this.athletesDimension.group().reduceCount();

This returns an object for each group that looks like this:

{
  key: "1988Summer",
  value: 12037
}

But that is not what we want. The built-in reduceCount() function returns only the count, which we could use, but we also need the year and season in the value of each grouped result. So we need to create custom grouping logic in our createGroupFromDimension() method.

  // assets/js/charts/NumberOfAthletesPerYearChart.js
  createGroupFromDimension () {
    this.numberOfAthletesPerYearGroup = this.athletesDimension.group()
      .reduce(
        // reduceAdd()
        (output, input) => {
          output.count++;
          output.year = input.year;
          output.season = input.season;
          return output;
        },
        // reduceRemove()
        (output, input) => {
          --output.count;
          output.year = input.year;
          output.season = input.season;
          return output;
        },
        // reduceInitial()
        () => {
          return {year: null, season: null, count: 0};
        }
      )
      .order(function (p) {
        return p.count;
      });
  }

To create the group, we first call the group() method on our dimension. After that, we need to provide a way to reduce the big array to a smaller array of groups. For that we use the reduce() method, to which we pass three callbacks. The first is reduceAdd(), which controls what happens when a record is added to a group. The second is reduceRemove(), which controls what happens when a record is removed from a group. The last callback is reduceInitial(), which creates an empty group value when a new group is needed. The last call in the chain is the order() method, where we order groups by count; top() then returns them in descending order. After we create the group, we can always console.log the result of top(Infinity), called either on the dimension or on the group, to see the resulting array. So our final group array will look like this:

[
  {
    "key": "2000Summer",
    "value": {
      "year": 2000,
      "season": "Summer",
      "count": 13821
    }
  },
  {
    "key": "1996Summer",
    "value": {
      "year": 1996,
      "season": "Summer",
      "count": 13780
    }
  },
  // ... The rest of elements
]
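To get a feeling for how the three callbacks work together, we can call them by hand on a few made-up records. This is only a simplified model of what happens to a single group; Crossfilter itself decides when to call each callback, and the sample records here are hypothetical.

```javascript
// The three callbacks, with the same logic as in createGroupFromDimension()
const reduceAdd = (output, input) => {
  output.count++;
  output.year = input.year;
  output.season = input.season;
  return output;
};
const reduceRemove = (output, input) => {
  --output.count;
  output.year = input.year;
  output.season = input.season;
  return output;
};
const reduceInitial = () => ({ year: null, season: null, count: 0 });

// Made-up records that all belong to the "1988Summer" group
const records = [
  { year: 1988, season: 'Summer', sex: 'F' },
  { year: 1988, season: 'Summer', sex: 'M' },
  { year: 1988, season: 'Summer', sex: 'M' }
];

// 1. A new group value starts from reduceInitial()
let group = reduceInitial();

// 2. Every record that falls into the group goes through reduceAdd()
for (const record of records) {
  group = reduceAdd(group, record);
}
const countAfterAdd = group.count;

// 3. When a record is filtered out (for example by a filter on another
//    dimension, such as sex), it goes through reduceRemove()
group = reduceRemove(group, records[0]);
const countAfterRemove = group.count;

console.log(countAfterAdd, countAfterRemove);
```

We don’t filter anything in this post, so reduceRemove() is not exercised by our chart yet, but it has to be there so the counts stay correct once filtering is added.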

So far, we have explained what the constructor of NumberOfAthletesPerYearChart does. If you remember, after that we call the render() method on the component. It’s very simple and self-describing, so we will just go on to explain each of the 5 methods it calls.

// assets/js/charts/NumberOfAthletesPerYearChart.js
render () {
  this.createSvg();
  this.initScales();
  this.drawAxes();
  this.drawLine();
  this.drawPoints();
}

So to draw a chart on the screen, we first need to create an SVG DOM element which will hold the chart.

  // assets/js/charts/NumberOfAthletesPerYearChart.js
  createSvg () {
    this.chart = this.chartContainer
      .append('svg')
      .attr('width', this.chartWidth)
      .attr('height', this.chartHeight);
  }

Since we already have the chartContainer reference, we simply use the D3 method append('svg') to add an SVG DOM element to chartContainer, and then set the attributes of that SVG based on the values we defined in the constructor of NumberOfAthletesPerYearChart. The SVG will be empty for now, but soon we will fill it with chart elements. The next thing we want to draw on screen are the axes of our chart, with years on the horizontal axis and the number of athletes on the vertical axis. Before we do that, we will set up one of the best features of D3: scales. A D3 scale is a function which allows us to translate values from a domain to some other arbitrary range. In our use case, that range will be the number of pixels available for the chart. That way we have one simple way to always determine where on our chart, for example, the year 1994 belongs.

  // assets/js/charts/NumberOfAthletesPerYearChart.js
  initScales () {
    // TODO potentially unsafe, if top() returns []
    let maxCount = this.numberOfAthletesPerYearGroup.top(1)[0];
    let chartWidth = +this.chart.attr('width') - this.margin;
    let chartHeight = +this.chart.attr('height') - this.margin;

    this.countScale = d3.scaleLinear().domain([0, maxCount.value.count]).range([chartHeight, this.margin]);
    // TODO We are hardcoding years for now
    this.yearScale = d3.scaleLinear().domain([1896, 2018]).range([this.margin, chartWidth]);
  }

To create a scale, we first need the minimum and maximum values of our domain, which come from our data, and the minimum and maximum values of our range, which in this case are the pixel bounds within which we can draw inside the chart. For the vertical axis, the number of athletes axis, we know the minimum value is 0. Since the group is ordered by the number of athletes and top() returns it in descending order, we just need the first element, and it holds the maximum count. Chart width and chart height are our range values, but we subtract the margin from them so the chart has some space near the edges and the axes are shown in full. After we get through this example, you can play around with the value of margin and see how it affects the chart layout.

We create the number of athletes scale with the d3.scaleLinear().domain().range() method chain. For the domain, we pass 0 and the maximum count of athletes, and for the range, we pass the chart height and the margin. It’s important to note that the first element of the range array is the height, which is the bigger number, because (0,0) in the SVG coordinate system is in the top left corner. The second element is the margin of our chart, so when the axis is drawn, it runs from the bottom left corner to the top left corner. You can always play around with switching the order of those two values to see how it affects the drawing of the chart. For the year scale, we use already known values for the domain, since it’s easy to find out in what year the Olympic Games started. For the range, we set the scale to start at the left margin and end at the chartWidth variable, which we calculated a few lines before. Now we can get to the actual drawing of the axes.
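If scales still feel a bit magical, it may help to see the arithmetic behind a linear one. The function below is a hand-written stand-in, not D3’s implementation (a real d3.scaleLinear() offers much more, such as ticks), and the name linearScale is made up. The numbers are the ones from this example: a 960 by 400 chart, a margin of 50, and 13821 as the largest count from the group array shown earlier.

```javascript
// Hypothetical stand-in for a linear scale: maps a domain interval onto a range interval.
function linearScale ([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

const margin = 50;
const width = 960 - margin;  // 910, as computed in initScales()
const height = 400 - margin; // 350

// Year scale: from the left margin to the right edge
const yearScale = linearScale([1896, 2018], [margin, width]);
// Count scale: the range is flipped because SVG y grows downwards
const countScale = linearScale([0, 13821], [height, margin]);

console.log(yearScale(1896));   // 50
console.log(yearScale(1957));   // 480
console.log(yearScale(2018));   // 910
console.log(countScale(0));     // 350
console.log(countScale(13821)); // 50
```

Notice how a count of 0 lands at y = 350 (the bottom of the chart) and the maximum count at y = 50 (near the top), which is exactly why the range is written as height first, margin second.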

  
  // assets/js/charts/NumberOfAthletesPerYearChart.js
  drawAxes () {
    let countAxis = d3.axisLeft(this.countScale);
    let yearAxis = d3.axisBottom(this.yearScale);

    this.chart
      .append('g')
      .attr('class', 'c-axis')
      .attr('transform', 'translate(' + this.margin + ', 0)')
      .call(countAxis);

    this.chart
      .append('g')
      .attr('class', 'c-axis')
      .attr('transform', 'translate(0, ' + this.chartHeightWithoutMargin + ')')
      .call(yearAxis);
  }

In the first two lines of the method, we create both D3 axes by using the d3.axisLeft() and d3.axisBottom() methods. The only difference between them is orientation. Since the place where the axis will be drawn is decided by attributes we set later, orientation only determines whether the axis line will be vertical or horizontal, and whether ticks and text will be drawn before or after the line. You can read more about customization of the axis generator here. To actually draw the generated axes on our chart, we first use the already familiar append() function of our chart and append a group element. You don’t need to append a group, but it’s helpful to organize your elements in groups. Especially if your chart starts to get really complicated, it could save you a lot of trouble later. To both of those groups we add a class for styling and a transform attribute, which sets where the axis will be drawn. Lastly, we call the axis on that group, so the axis is actually drawn inside it. And finally, after all this writing of mine, we get to the meat: something is drawn on the screen. It could look something like this (depending on the .css file you are using). It’s not much yet, but it’s something to start with.

What’s left to do? Our chart line and our chart points. Let’s start with the chart line.

  
// assets/js/charts/NumberOfAthletesPerYearChart.js
  drawLine () {
    let line = d3.line()
      .x((d) => {
        return this.yearScale(d.value.year);
      })
      .y((d) => {
        return this.countScale(d.value.count);
      });

    this.numberOfAthletesPerYearGroup.order((d) => {
      return d.year;
    });

    this.chart
      .append('g')
      .attr('class', 'c-line')
      .append('path')
      .attr('d', line(this.numberOfAthletesPerYearGroup.top(Infinity)));
  }

First, we create d3.line(), which is simply a definition of a function that will take input data and assign x and y values to each element in an array. The good thing about d3.line() is that it can provide various kinds of interpolation (curves) between the data points. We will not use that in this example, but it’s always available for customization. After that, we order the group by the year attribute, so that the points are connected in chronological order and the line doesn’t jump back and forth across the chart. And lastly, we append another group to the chart and append a path element to it. For the path to show anything, we need to set its d attribute, which is the result of our line function. We just pass our group data to the line, and then we can execute the code to see the result.
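The d attribute is just a string of drawing commands. To demystify it a little, here is a hand-rolled approximation of what a line generator with straight segments produces. The helper name straightLinePath and the pixel positions are made up, and a real d3.line() supports curves and more on top of this.

```javascript
// Hypothetical helper: builds an SVG path string through the given points,
// using straight segments ("M" = move to, "L" = line to).
function straightLinePath (points, x, y) {
  return points
    .map((p, i) => (i === 0 ? 'M' : 'L') + x(p) + ',' + y(p))
    .join('');
}

// Made-up, already-scaled pixel positions standing in for the scale results
const points = [
  { px: 50, py: 350 },
  { px: 480, py: 120 },
  { px: 910, py: 60 }
];

const d = straightLinePath(points, (p) => p.px, (p) => p.py);
console.log(d); // M50,350L480,120L910,60
```

This also shows why the order of the points matters: the path simply connects them in the order they appear in the array.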

Let’s get to drawing the points.

  // assets/js/charts/NumberOfAthletesPerYearChart.js
  drawPoints () {
    this.chart
      .append('g')
      .attr('class', 'c-points')
      .selectAll('circle')
      .data(this.numberOfAthletesPerYearGroup.top(Infinity))
      .enter()
      .append('circle')
      .attr('cx', (d) => {
        return this.yearScale(d.value.year);
      })
      .attr('cy', (d) => {
        return this.countScale(d.value.count);
      })
      .attr('r', '5');
  }

First, we append another group to the SVG, and then we select all circle elements inside it. After that, we use the D3 method chain data().enter(), where we pass in the array of our data. Since the group contains no circles yet, everything chained after enter() runs N times, where N is the length of the array we provided. In this example, N = 51, so we will append 51 circles, whose cx and cy positions are calculated from the scales we defined earlier. For each circle, we set r, the radius, to 5 pixels. When we execute this code, we will get a chart that looks like this:

We finally have our chart. Maybe it looks different in your browser if you didn’t use my .css file, but it looks like a chart. We are almost done with our first example. One thing is still missing, though: we have points, but we don’t know what those points mean. It would help a lot if we had a tooltip that shows us what each point means. For that, we can use the following code:

  drawPoints () {
    // ... Part before this line stays the same
      .attr('r', '5')

      .on('mouseover', (d) => {
        this.showTooltip(
          d.value.year + ' ' + d.value.season + ': ' + d.value.count,
          d3.event.pageX,
          d3.event.pageY
        );
      })
      .on('mouseout', (d) => {
        this.hideTooltip();
      });
  }

  createTooltipIfDoesntExist () {
    if (this.tooltipContainer !== null) {
      return;
    }

    this.tooltipContainer = this.chartContainer
      .append('div')
      .attr('class', 'c-tooltip');
  }

  showTooltip (content, left, top) {
    this.createTooltipIfDoesntExist();

    this.tooltipContainer
      .html(content)
      .style('left', left + 'px')
      .style('top', top + 'px');

    this.tooltipContainer
      .transition()
      .duration(200)
      .style('opacity', 1);
  }

  hideTooltip () {
    this.createTooltipIfDoesntExist();

    this.tooltipContainer
      .transition()
      .duration(500)
      .style('opacity', 0);
  }

First, we need to add mouseover and mouseout event handlers to each point we added to the chart. In mouseover we need to show the tooltip, for which we use the showTooltip() method. Its first argument is the content, the text that will be displayed inside the tooltip. The other two arguments are the left and top positions of the tooltip, which are set as CSS properties. This means that for this implementation to display as it should, the tooltip container needs the position: absolute; CSS style. I’m aware this maybe isn’t the best practice, because it mixes presentation and logic inside the script that generates the tooltip, but I hope you all will find strength in your heart to forgive me for that sin. The left and top values are taken from the d3.event object, which is a special object in D3 v5 that contains the event data, and should only be used inside event handlers, since that’s the only time it holds data. If you noticed, both showTooltip() and hideTooltip() call createTooltipIfDoesntExist(), which simply checks if tooltipContainer exists and, if not, creates it.

And that’s it, we have a fully functioning line chart. When I started writing this, I thought this would be a small post about creating a line chart. Along the way, I added a bigger dataset, and then I went into Crossfilter territory to group large data sets. I hope you now have at least a basic idea of how to visually show grouped data from ~271k rows. You can always check out the demo to see how it works in action. I will cover creating the rest of the dashboard, and adding interactivity, in future posts. Stay tuned.

Useful links

Dashboard Demo

Official D3.js reference

Crossfilter API Reference

Github repository

Marko Zovko