Two charts on the same dimension will not filter each other. More precisely, a group will not observe filters on its own dimension. This is the design of crossfilter:
Note: a grouping intersects the crossfilter's current filters, except for the associated dimension's filter. Thus, group methods consider only records that satisfy every filter except this dimension's filter. So, if the crossfilter of payments is filtered by type and total, then group by total only observes the filter by type.
https://github.com/square/crossfilter/wiki/API-Reference#dimension_group
The assumption is that you don't want to remove data from the current chart when you filter within it. (Instead, dc.js will draw filtered-out data in grey on the filtering chart, but it's still there.)
If you want to have two charts tracking the same data and filtering each other, create a duplicate dimension and give each chart its own dimension and group. Except with range/focus charts, you almost always want each chart to have its own dimension, with its group created from the dimension.
Although most dc.js methods chain, some do not chain to the same object. `xAxis` returns a d3 axis object, which is not the chart. If you access the axis objects of a chart, do it last or do it on a separate line:
var chart = dc.barChart(...).this(...).that(...);
var xAxis = chart.xAxis().tickFormat(...).ticks(...);
var yAxis = chart.yAxis().tickFormat(...).ticks(...);
There really is no magic - when you filter a chart, it sets the filter on the corresponding dimension object. Then the chart broadcasts a redraw message to the other charts in the chart group using the Chart Registry. Then all charts in the chart group pull new data from their crossfilter groups and animate from the old data to the new data.
(These are two separate meanings of the word "group". A crossfilter group is really a grouping or binning of the data. A chart group is a set of charts that respond to each other. Usually a chart group is associated with a crossfilter instance and a dataset.)
In almost all cases, a dimension is write-only and a group is read-only. The only exception in standard dc.js usage is that the data table pulls ungrouped data directly from its dimension.
As far as crossfilter is concerned, a bin still exists if its value is zero - and dc.js will happily draw empty bins. See remove empty bins for a "fake group" that will remove these bins dynamically, causing the domain to get smaller.
dc.js uses Crossfilter's generic group reduce to let you specify initialize, add, and remove functions for custom aggregation of groups. Typically, these are anonymous inline functions with field names hardcoded, but you can instead use a closure to return such a function with custom parameters (thanks @jefffriesen):
// create functions to generate averages for any attribute
function reduceAddAvg(attr) {
return function(p,v) {
if (_.isLegitNumber(v[attr])) {
++p.count;
p.sums += v[attr];
p.averages = (p.count === 0) ? 0 : p.sums/p.count; // guard against dividing by zero
}
return p;
};
}
function reduceRemoveAvg(attr) {
return function(p,v) {
if (_.isLegitNumber(v[attr])) {
--p.count;
p.sums -= v[attr];
p.averages = (p.count === 0) ? 0 : p.sums/p.count;
}
return p;
};
}
function reduceInitAvg() {
return {count:0, sums:0, averages:0};
}
...
var group = dim.group().reduce(reduceAddAvg(attr), reduceRemoveAvg(attr), reduceInitAvg);
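To see what these reducers do without a full crossfilter setup, you can apply them to a few rows by hand. A minimal sketch, substituting a plain numeric check for the `_.isLegitNumber` underscore mixin used above (the `speed` rows here are made-up sample data):

```javascript
// Plain stand-in for _.isLegitNumber (an underscore mixin in the original)
function isLegitNumber(x) {
  return typeof x === 'number' && !isNaN(x);
}
function reduceAddAvg(attr) {
  return function(p, v) {
    if (isLegitNumber(v[attr])) {
      ++p.count;
      p.sums += v[attr];
      p.averages = p.sums / p.count;
    }
    return p;
  };
}
function reduceInitAvg() {
  return {count: 0, sums: 0, averages: 0};
}

// simulate crossfilter adding three rows to one bin;
// the non-numeric row is skipped by the guard
var rows = [{speed: 10}, {speed: 20}, {speed: 'oops'}];
var bin = rows.reduce(reduceAddAvg('speed'), reduceInitAvg());
console.log(bin); // {count: 2, sums: 30, averages: 15}
```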
Or, check out Ethan Jewett's reductio library.
There are lots of ways to do reductions with crossfilter. Let's cover the three most common cases here:
- Reducing a few known values per row
- Reducing rows that contain a single value but a different value per row
- Reducing rows that each contain multiple values
All of these use the general form of group.reduce
If there are just a few known values from known fields (a couple sums and counts, say, or a sum and color) it's straightforward to reduce the values into an object directly:
// reduce the total of field1 into sum1 and the total of field2 into sum2
var group = dimension.group().reduce(
function(p, v) { // add
p.sum1 += v.field1;
p.sum2 += v.field2;
return p;
},
function(p, v) { // remove
p.sum1 -= v.field1;
p.sum2 -= v.field2;
return p;
},
function() { // init
return {sum1: 0, sum2: 0};
}
);
This is simple because we know the exact fields we create, and we can properly initialize them to zero.
Say that each row has fields `type` and `value`. `type` determines the name of the value to which `value` should contribute.
var group = dimension.group().reduce(
function(p, v) { // add
p[v.type] = (p[v.type] || 0) + v.value;
return p;
},
function(p, v) { // remove
p[v.type] -= v.value;
return p;
},
function() { // initial
return {};
});
This reduces the sum of any field `type` it finds; if you want a count, use `1` instead of `v.value`.
Note that each time we add a row to the reduction, we have to default the field named `v.type` to zero. Otherwise, we would be adding a number to `undefined`, which produces `NaN`. It's not necessary to do this in the remove function, because crossfilter will only remove a row after it has been added.
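You can check this behavior directly by applying the add function to a few sample rows by hand (the `cash`/`card` payment rows here are hypothetical):

```javascript
// the add function from above
function addRow(p, v) {
  p[v.type] = (p[v.type] || 0) + v.value;
  return p;
}

// made-up payment rows
var rows = [
  {type: 'cash', value: 10},
  {type: 'card', value: 25},
  {type: 'cash', value: 5}
];
var bin = rows.reduce(addRow, {}); // crossfilter starts from the init object
console.log(bin); // {cash: 15, card: 25}

// without the (p[v.type] || 0) default, the first add would compute:
console.log(undefined + 10); // NaN
```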
Here we use the reusable reduce function pattern from above, to avoid tying the functions to global variables:
function reduceFieldsAdd(fields) {
return function(p, v) {
fields.forEach(function(f) {
p[f] += v[f];
});
return p;
};
}
function reduceFieldsRemove(fields) {
return function(p, v) {
fields.forEach(function(f) {
p[f] -= v[f];
});
return p;
};
}
function reduceFieldsInitial(fields) {
return function() {
var ret = {};
fields.forEach(function(f) {
ret[f] = 0;
});
return ret;
};
}
var fields = ['a', 'b', 'c'...]; // whatever fields you need
var group = dimension.group().reduce(reduceFieldsAdd(fields), reduceFieldsRemove(fields), reduceFieldsInitial(fields));
As above, if you want a count instead of a sum, use `1` instead of `v[f]`.
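A quick simulation of how crossfilter applies these reusable reducers, over a couple of hand-made rows (the fields `a`/`b` are arbitrary sample names):

```javascript
function reduceFieldsAdd(fields) {
  return function(p, v) {
    fields.forEach(function(f) { p[f] += v[f]; });
    return p;
  };
}
function reduceFieldsRemove(fields) {
  return function(p, v) {
    fields.forEach(function(f) { p[f] -= v[f]; });
    return p;
  };
}
function reduceFieldsInitial(fields) {
  return function() {
    var ret = {};
    fields.forEach(function(f) { ret[f] = 0; });
    return ret;
  };
}

var fields = ['a', 'b'];
var rows = [{a: 1, b: 2}, {a: 3, b: 4}];

// add both rows, starting from the initial object
var bin = rows.reduce(reduceFieldsAdd(fields), reduceFieldsInitial(fields)());
console.log(bin); // {a: 4, b: 6}

// remove the first row again, as crossfilter does when it's filtered out
bin = reduceFieldsRemove(fields)(bin, rows[0]);
console.log(bin); // {a: 3, b: 4}
```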
Set a breakpoint on the chart initialization, after the groups are created. Run group.all()
in the debug console and see whether the keys and values make sense. In particular, take a look at whether the value member of each item in the array matches what the accessors (which usually take those array items as input, not just the value part) expect.
(E.g. to completely remove empty groups, or create a cumulative line chart or bar chart.) There are two ways to do this.
- One way is to use the `.data()` function. However, this currently won't work with charts that use `.data()` internally, which is most of them; see #584.
- Another way is to create a "fake group". The idea is to wrap the original group from crossfilter in another object which will first fetch the results from the original group and then do something to them: add bins, remove bins, manipulate keys or values.
dc.js uses a very limited part of the crossfilter API - in fact, for the most part it only uses `dimension.filter()` (`filterRange()` etc.) and `group.all()`.
This means you can easily change the way dc.js pulls data, in order to change the shape or values.
Just create an object with an `.all()` method and pass this "fake group" to your chart where you would have passed the original group, and your chart will read from it instead.
In some cases, you may need to implement other methods.
The `dataTable` uses `dimension.top()` and `dimension.bottom()`, and it can be used with a group, so the fake group may need these as well.
Prior to 2.1.2, capped charts also used `group.top()`.
Some fake group generation functions are shown below. Each takes a group and produces a fake group which you pass to dc.js instead of the original group.
Add them to your usual crossfilter code like this:
var ndx = crossfilter(...);
var dim = ndx.dimension(...);
var group = dim.group(...);
...
var filtered_group = remove_empty_bins(group); // or filter_bins, or whatever
chart.dimension(dim)
    .group(filtered_group)
    ...
Some examples of "fake groups" follow. You can find many more by searching on Stack Overflow.
function remove_empty_bins(source_group) {
return {
all:function () {
return source_group.all().filter(function(d) {
//return Math.abs(d.value) > 0.00001; // if using floating-point numbers
return d.value !== 0; // if integers only
});
}
};
}
function filter_bins(source_group, f) {
return {
all:function () {
return source_group.all().filter(function(d) {
return f(d.value);
});
}
};
}
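Since these fake groups only ever call `.all()` on their source, you can try them out against any object with an `.all()` method - no crossfilter required. A quick check with made-up bins (functions repeated so the snippet stands alone):

```javascript
function remove_empty_bins(source_group) {
  return {
    all: function() {
      return source_group.all().filter(function(d) { return d.value !== 0; });
    }
  };
}
function filter_bins(source_group, f) {
  return {
    all: function() {
      return source_group.all().filter(function(d) { return f(d.value); });
    }
  };
}

// stand-in for a crossfilter group
var mock_group = { all: function() {
  return [{key: 'a', value: 2}, {key: 'b', value: 0}, {key: 'c', value: 5}];
}};

console.log(remove_empty_bins(mock_group).all());
// [{key:'a',value:2}, {key:'c',value:5}]
console.log(filter_bins(mock_group, function(v) { return v > 2; }).all());
// [{key:'c',value:5}]
```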
function ensure_group_bins(source_group) { // (source_group, bins...)
var bins = Array.prototype.slice.call(arguments, 1);
return {
all:function () {
var result = source_group.all().slice(0), // copy original results (we mustn't modify them)
found = {};
result.forEach(function(d) {
found[d.key] = true;
});
bins.forEach(function(d) {
if(!found[d])
result.push({key: d, value: 0});
});
return result;
}
};
};
This takes a d3 time interval as the second parameter, e.g. `d3.timeHour`.
function fill_intervals(group, interval) {
return {
all: function() {
var orig = group.all().map(kv => ({key: new Date(kv.key), value: kv.value}));
var target = interval.range(orig[0].key, orig[orig.length-1].key);
var result = [];
for(var oi = 0, ti = 0; oi < orig.length && ti < target.length;) {
if(orig[oi].key <= target[ti]) {
result.push(orig[oi]);
if(orig[oi++].key.valueOf() === target[ti].valueOf())
++ti;
} else {
result.push({key: target[ti], value: 0});
++ti;
}
}
if(oi<orig.length)
Array.prototype.push.apply(result, orig.slice(oi));
if(ti<target.length)
Array.prototype.push.apply(result, target.slice(ti).map(t => ({key: t, value: 0})));
return result;
}
};
}
The same technique works for integer keys, with a fill value and stride:
function fill_ints(group, fillval, stride = 1) {
return {
all: function() {
var orig = group.all();
var target = d3.range(orig[0].key, orig[orig.length-1].key, stride);
var result = [];
for(var oi = 0, ti = 0; oi < orig.length && ti < target.length;) {
if(orig[oi].key <= target[ti]) {
result.push(orig[oi]);
if(orig[oi++].key === target[ti])
++ti;
} else {
result.push({key: target[ti], value: fillval});
++ti;
}
}
if(oi<orig.length)
Array.prototype.push.apply(result, orig.slice(oi));
if(ti<target.length)
result = [...result, ...target.slice(ti).map(t => ({key: t, value: fillval}))];
return result;
}
};
}
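A self-contained check of `fill_ints`, with `d3.range` shimmed locally so it runs without d3, and a stand-in group that has a gap between keys 1 and 4:

```javascript
// minimal shim for d3.range so this sketch is self-contained
var d3 = { range: function(start, stop, step) {
  var r = [];
  for (var x = start; x < stop; x += step) r.push(x);
  return r;
}};

function fill_ints(group, fillval, stride = 1) {
  return {
    all: function() {
      var orig = group.all();
      var target = d3.range(orig[0].key, orig[orig.length-1].key, stride);
      var result = [];
      for (var oi = 0, ti = 0; oi < orig.length && ti < target.length;) {
        if (orig[oi].key <= target[ti]) {
          result.push(orig[oi]);
          if (orig[oi++].key === target[ti])
            ++ti;
        } else {
          result.push({key: target[ti], value: fillval});
          ++ti;
        }
      }
      if (oi < orig.length)
        Array.prototype.push.apply(result, orig.slice(oi));
      if (ti < target.length)
        result = [...result, ...target.slice(ti).map(t => ({key: t, value: fillval}))];
      return result;
    }
  };
}

// stand-in group with missing keys 2 and 3
var mock_group = { all: function() {
  return [{key: 1, value: 10}, {key: 4, value: 7}];
}};

console.log(fill_ints(mock_group, 0).all());
// [{key:1,value:10}, {key:2,value:0}, {key:3,value:0}, {key:4,value:7}]
```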
function remove_bins(source_group) { // (source_group, bins...)
var bins = Array.prototype.slice.call(arguments, 1);
return {
all:function () {
return source_group.all().filter(function(d) {
return bins.indexOf(d.key) === -1;
});
}
};
}
One way to compare filtered against unfiltered values is to copy and freeze the values when the group is created:
function static_copy_group(group) {
var all = group.all().map(kv => ({key: kv.key, value: kv.value}));
return {
all: function() {
return all;
}
}
}
Say we have a few groups we want to stack, but they have different X values, so the stack mixin won't display them properly as they are.
function combine_groups() { // (groups...)
var groups = Array.prototype.slice.call(arguments);
return {
all: function() {
var alls = groups.map(function(g) { return g.all(); });
var gm = {};
alls.forEach(function(a, i) {
a.forEach(function(b) {
if(!gm[b.key]) {
gm[b.key] = new Array(groups.length);
for(var j=0; j<groups.length; ++j)
gm[b.key][j] = 0;
}
gm[b.key][i] = b.value;
});
});
var ret = [];
for(var k in gm)
ret.push({key: k, value: gm[k]});
return ret;
}
};
}
The stacks can be accessed by index:
var combined = combine_groups(group1, group2, ...);
chart
.group(combined, "1", function(d) { return d.value[0]; })
.stack(combined, "2", function(d) { return d.value[1]; })
...
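A self-contained check of `combine_groups` with two stand-in groups that have partially overlapping keys:

```javascript
function combine_groups() { // (groups...)
  var groups = Array.prototype.slice.call(arguments);
  return {
    all: function() {
      var alls = groups.map(function(g) { return g.all(); });
      var gm = {};
      alls.forEach(function(a, i) {
        a.forEach(function(b) {
          if (!gm[b.key]) {
            gm[b.key] = new Array(groups.length);
            for (var j = 0; j < groups.length; ++j)
              gm[b.key][j] = 0;
          }
          gm[b.key][i] = b.value;
        });
      });
      var ret = [];
      for (var k in gm)
        ret.push({key: k, value: gm[k]});
      return ret;
    }
  };
}

// stand-in groups; 'y' only appears in the second group
var g1 = { all: function() { return [{key: 'x', value: 1}]; } };
var g2 = { all: function() { return [{key: 'x', value: 2}, {key: 'y', value: 3}]; } };

console.log(combine_groups(g1, g2).all());
// [{key:'x', value:[1,2]}, {key:'y', value:[0,3]}]
```

One caveat of this implementation: the bins are collected into a plain object, so non-string keys (numbers, dates) will come back as strings.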
Sometimes crossfilter groups with floating-point values don't cancel out to zero when the same values are added and then removed. This can cause strange artifacts, like negative bars when there are no negative numbers, and the "blank" color not showing for ordinal colors.
In mathematical terms, floating point numbers are not associative or distributive, so e.g.
1 + .2 - 1 - .2 === -5.551115123125783e-17
This fake group will "snap" values to zero when they get close:
function snap_to_zero(source_group) {
return {
all:function () {
return source_group.all().map(function(d) {
return {key: d.key,
value: (Math.abs(d.value)<1e-6) ? 0 : d.value};
});
}
};
}
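A quick check with a stand-in group exhibiting exactly the drift shown above:

```javascript
function snap_to_zero(source_group) {
  return {
    all: function() {
      return source_group.all().map(function(d) {
        return {key: d.key,
                value: (Math.abs(d.value) < 1e-6) ? 0 : d.value};
      });
    }
  };
}

// stand-in group: the 'a' bin holds the floating-point residue -5.55e-17
var drifted = { all: function() {
  return [{key: 'a', value: 1 + .2 - 1 - .2}, {key: 'b', value: 3}];
}};

console.log(snap_to_zero(drifted).all());
// [{key:'a', value:0}, {key:'b', value:3}]
```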
(thanks Xavier Dutoit!)
function accumulate_group(source_group) {
return {
all:function () {
var cumulate = 0;
return source_group.all().map(function(d) {
cumulate += d.value;
return {key:d.key, value:cumulate};
});
}
};
}
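For example, with a stand-in group of made-up monthly counts:

```javascript
function accumulate_group(source_group) {
  return {
    all: function() {
      var cumulate = 0;
      return source_group.all().map(function(d) {
        cumulate += d.value;
        return {key: d.key, value: cumulate};
      });
    }
  };
}

// hypothetical monthly counts
var monthly = { all: function() {
  return [{key: 'Jan', value: 3}, {key: 'Feb', value: 5}, {key: 'Mar', value: 2}];
}};

console.log(accumulate_group(monthly).all());
// [{key:'Jan',value:3}, {key:'Feb',value:8}, {key:'Mar',value:10}]
```

Note that `cumulate` is reset inside `.all()`, so the running total is recomputed from scratch on every redraw, keeping it correct as filters change.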
Sometimes you may need to sort your bins manually. In particular, the line chart can get messed up if you need an ordering different from the natural order of keys, which is what crossfilter will provide through `.all()`. So here is `sort_group`:
function sort_group(group, order) {
return {
all: function() {
var g = group.all(), map = {};
g.forEach(function(kv) {
map[kv.key] = kv.value;
});
return order.map(function(k) {
return {key: k, value: map[k]};
});
}
};
};
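For example, forcing month-name keys into calendar order with a stand-in group:

```javascript
function sort_group(group, order) {
  return {
    all: function() {
      var g = group.all(), map = {};
      g.forEach(function(kv) { map[kv.key] = kv.value; });
      return order.map(function(k) {
        return {key: k, value: map[k]};
      });
    }
  };
}

// stand-in group whose natural key order is alphabetical, not calendar
var mock_group = { all: function() {
  return [{key: 'Feb', value: 2}, {key: 'Jan', value: 1}, {key: 'Mar', value: 3}];
}};

console.log(sort_group(mock_group, ['Jan', 'Feb', 'Mar']).all());
// [{key:'Jan',value:1}, {key:'Feb',value:2}, {key:'Mar',value:3}]
```

Keys in `order` that are missing from the source will come back with `undefined` values, so the order array should cover exactly the group's keys.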
Round group values to a quantum.
function round_group(group, q) {
return {
all: () => group.all().map(kv => ({key: kv.key, value: Math.floor(kv.value/q)*q}))
};
}
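For example, rounding values down to multiples of 5 with a stand-in group:

```javascript
function round_group(group, q) {
  return {
    all: () => group.all().map(kv => ({key: kv.key, value: Math.floor(kv.value/q)*q}))
  };
}

var mock_group = { all: function() {
  return [{key: 'a', value: 17}, {key: 'b', value: 23}];
}};

console.log(round_group(mock_group, 5).all());
// [{key:'a',value:15}, {key:'b',value:20}]
```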
Since it expects a dimension instead of a group, the data table uses `.top()` or `.bottom()` to fetch data.
Also, prior to 2.1.2 capped charts used `group.top()`.
If you are using a fake group and you get an error saying that one of these methods is not defined, you can add the method to the fake group. These should return items in sorted order, just like crossfilter's `group.top()` and `dimension.bottom()` do; `.all()` returns bins ordered by their key.
Here is an example expanding `remove_empty_bins` with `.top()`. The process is similar for other fake groups:
function remove_empty_bins(source_group) {
function non_zero_pred(d) {
//return Math.abs(d.value) > 0.00001; // if using floating-point numbers
return d.value !== 0; // if integers only
}
return {
all: function () {
return source_group.all().filter(non_zero_pred);
},
top: function(n) {
return source_group.top(Infinity)
.filter(non_zero_pred)
.slice(0, n);
}
};
}
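A sketch of how this behaves, using a stand-in source whose `top()` returns bins in descending value order (as crossfilter's `group.top()` does):

```javascript
function remove_empty_bins(source_group) {
  function non_zero_pred(d) { return d.value !== 0; }
  return {
    all: function() {
      return source_group.all().filter(non_zero_pred);
    },
    top: function(n) {
      return source_group.top(Infinity)
        .filter(non_zero_pred)
        .slice(0, n);
    }
  };
}

// stand-in source group: .top() sorts by descending value
var mock_group = {
  all: function() {
    return [{key: 'a', value: 0}, {key: 'b', value: 5}, {key: 'c', value: 2}];
  },
  top: function(n) {
    return this.all().slice()
      .sort(function(x, y) { return y.value - x.value; })
      .slice(0, n);
  }
};

console.log(remove_empty_bins(mock_group).top(1));
// [{key:'b', value:5}]
```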
Sometimes it's useful to create a "fake groupAll" object, for example to pass to dc.numberDisplay. This is an object with a `.value()` method that dynamically computes some value in response to changes in the filters - the same signature as crossfilter's groupAll objects.
For example, here is a unique count fake groupAll which takes a group and returns the number of non-zero bins in the group:
function unique_count_groupall(group) {
return {
value: function() {
return group.all().filter(kv => kv.value).length;
}
};
}
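Again this only reads `.all()`, so a stand-in group is enough to see it work:

```javascript
function unique_count_groupall(group) {
  return {
    value: function() {
      return group.all().filter(kv => kv.value).length;
    }
  };
}

// stand-in group: one empty bin, two non-empty
var mock_group = { all: function() {
  return [{key: 'a', value: 0}, {key: 'b', value: 2}, {key: 'c', value: 1}];
}};

console.log(unique_count_groupall(mock_group).value()); // 2
```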
The `dataTable` is the only chart that fetches data from a crossfilter dimension. As noted in the docs, you can pass a group as the `dimension` parameter if you want to display aggregated data instead of raw data rows.
The only catch is that, without a little help, this will only support descending order, since a group supports `.top()` but not `.bottom()`. So you may get an error like
_chart.dimension(...).bottom is not a function
Here is a "fake dimension" you can use to wrap the group and provide `.bottom()` in order to use ascending order:
function reversible_group(group) {
return {
top: function(N) {
return group.top(N);
},
bottom: function(N) {
return group.top(Infinity).slice(-N).reverse();
}
};
}
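A quick check with a stand-in group whose `top()` returns bins in descending value order, the way crossfilter's does:

```javascript
function reversible_group(group) {
  return {
    top: function(N) {
      return group.top(N);
    },
    bottom: function(N) {
      return group.top(Infinity).slice(-N).reverse();
    }
  };
}

// stand-in group: top() sorts by descending value
var mock_group = { top: function(N) {
  return [{key: 'b', value: 5}, {key: 'c', value: 2}, {key: 'a', value: 1}].slice(0, N);
}};

console.log(reversible_group(mock_group).bottom(2));
// [{key:'a',value:1}, {key:'c',value:2}]
```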
Note: reductio does min, max, and median out of the box, so if that's all you need, you should use reductio. This section is for cases where you need something more complicated, where you don't want the dependency, or where you want to understand how this works.
In order to calculate the minimum, maximum, or median, among other things, you need to maintain an array of all the rows in each bin.
There is no way around this. For example, you might think that to calculate the maximum, all you need to do is see whether each added row's value is greater than the current maximum. But what do you do when that row is removed? Do you know if there were multiple rows with that value? What was the second-to-maximum value to restore? What should you do once that value is removed? Etc.
Crossfilter does not provide access to the underlying rows in each bin. It probably could do this, but it doesn't. So you'll need to keep track of the arrays of rows yourself.
The best way to do this is to maintain each array sorted on some unique key, so that you can remove the entry for a row when you see a reduceRemove for it. (Or you can maintain an array of just the values that you need for your metric, but we won't show that here.)
This code shows how to maintain an array of the rows themselves, inside your reduce functions. Since JavaScript uses references for objects, nothing is copied and this is reasonably efficient. It's also the most general solution, allowing multiple metrics to be calculated for each row.
function groupArrayAdd(keyfn) {
var bisect = d3.bisector(keyfn);
return function(elements, item) {
var pos = bisect.right(elements, keyfn(item));
elements.splice(pos, 0, item);
return elements;
};
}
function groupArrayRemove(keyfn) {
var bisect = d3.bisector(keyfn);
return function(elements, item) {
var pos = bisect.left(elements, keyfn(item));
if(pos < elements.length && keyfn(elements[pos])===keyfn(item))
elements.splice(pos, 1);
return elements;
};
}
function groupArrayInit() {
return [];
}
Give these functions a key function which provides a unique key, and they will return a function you can use for your custom reduction:
var runAvgGroup = runDimension.group().reduce(groupArrayAdd(exptKey), groupArrayRemove(exptKey), groupArrayInit);
Then you'll provide accessors which actually calculate the metric:
function medianSpeed(kv) {
return d3.median(kv.value, speedValue);
}
rowChart.valueAccessor(medianSpeed)
Complete example here.
Crossfilter runs in the browser and the practical limit is somewhere around half a million to a million rows of data. If you are binning your data properly (so that you aren't drawing thousands of bars, lines, or dots), the drawing is usually not the bottleneck. The bottleneck is usually the download of large data files and the memory usage of large data sets. It depends on the complexity of the row too (number of columns, data types).
If the data size is okay but it's hurting the interactivity of your page, you can try crossfilter-async to put crossfilter in a webworker.
If, however, you start hitting hard limits, you may want to consider a server-side solution. The response time will not be quite as good, but if your data is that big, then network latency is probably less of a problem than processing the data.
Here are some third-party solutions for using dc.js with a server-based data store. Note: these will probably require some modification of your dc.js configuration. To our knowledge, there is currently no drop-in replacement. If you run into trouble, the dc.js users group is probably the best place to ask (in addition to any forums associated with the projects themselves).
- Mongo solution, by Blair Nilsson. dc-mongo-client, dc-mongo-server users group announcement
- Smartfilter, by Darshit Shah, a Node-based crossfilter replacement.
- Ziggy Jonsson's server-side crossfilter
- Nanocubes and nanofilter, from AT&T Research, a server tuned for aggregations in multiple dimensions.
Make sure you are using the data within the callback of the data-loading function. These functions return immediately, and the data will not be defined outside the callback. The callback will be called once the data has been fetched and parsed.
Check that you have set .xUnits
on your chart. The parameter should correspond to the X scale and the reduction keys of the chart's group.
- if the X scale is one of the d3 continuous scales, `xUnits` should be `dc.units.integers` or `dc.units.fp`
- if the X scale is a d3 ordinal scale, `xUnits` should be `dc.units.ordinal`
- if the X scale is a d3 time scale, `xUnits` should be one of the d3 time interval ranges
(This is a general JavaScript question, but it comes up a lot with d3 & dc because a lot of function objects are used.)
If you refer to a variable outside the body of the function, you are using the variable by reference, not by value. So when the function is run, it will have the current value of the variable, not the value the variable had when you created the function:
var a = []
for(var i = 0; i < 10; ++i) a.push(function() { return i;});
a.map(function(f) { return f(); }); // returns [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
The easiest way to fix this is often to use Array.forEach instead of a for loop:
var a = []
d3.range(10).forEach(function(i) { a.push(function() { return i;}); });
a.map(function(f) { return f(); }); // returns [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
This works because the intervening function captures the current value in a local variable.
You can also use an auxiliary function:
var a = []
function helper(i) { return function() { return i; }; };
for(var i = 0; i < 10; ++i) a.push(helper(i));
a.map(function(f) { return f(); }); // returns [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Or to keep the code in one place, you can use an IIFE:
var a = []
for(var i = 0; i < 10; ++i)
a.push(function(i) {
return function() { return i; };
}(i));
a.map(function(f) { return f(); }); // returns [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
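If you can target ES2015 or later, `let` sidesteps the problem entirely, because it creates a fresh binding for each loop iteration:

```javascript
var a = [];
// `let` gives each iteration its own `i`, so each closure captures a distinct value
for (let i = 0; i < 10; ++i) a.push(function() { return i; });
console.log(a.map(function(f) { return f(); }));
// [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```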
Often a chart will work with ordinal or linear scales, but not when using `d3.time` scales.
The usual reason for this is that the dates need to be parsed into JavaScript Date objects in order to be used with d3 time scales.
Before passing your data to crossfilter, do something like this:
data.forEach(function(d) {
d.date = new Date(d.date);
});