Skip to content

Conversation

@karwa
Copy link
Contributor

@karwa karwa commented Jan 29, 2020

This still needs some cleaning-up. It introduces a lot of new things:

  • Adds a Heatmap<T> plot, which can plot any sequence of any type, by means of an adapter.
  • Introduces a functional-style API. For example, the way you create a heatmap is like this:
    let data: [[Float]] = median_daily_temp_boston_2012
    let heatmap = data.plots
      .heatmap(interpolator: .linear) {
        $0.plotTitle.title = "Maximum daily temperatures in Boston, 2012"
        $0.plotLabel.xLabel = "Day of the Month"
        $0.colorMap = ColorMap.fiveColorHeatMap.lightened(by: 0.35)
        $0.showGrid = true
        $0.grid.color = Color.gray.withAlpha(0.65)
    }

Notice that the plot is created from the data (data.plots.heatmap(....))

  • Introduces color maps and more advanced multi-stop gradients.
  • Adds some well-known color maps from matplotlib (I believe we're allowed to do this, so long as we acknowledge where they come from)
  • Allows plots to request a smaller plot size via the AdjustsPlotSize protocol, which helps to reduce aliasing.
  • Adds support for PlotElements to GraphLayout (an object which gets laid-out and takes up space), and remodels axis labels to be PlotElements. This will allow us to support other items like color bars (which matplotlib can do)

Some things I know I need to work on before this is ready to merge:

  • I don't feel the name PlotElement is very descriptive. Maybe LayoutElement? Can anybody suggest a better name?
  • Same goes for Interpolator. Perhaps Mapping or HeatmapAdapter would be better? Note that users almost never need to write this name.
  • What about the functional-style API? I'd be interested what people think about this.
  • The plot title and axes markers should be natively stored as Label, so we shouldn't need to convert them.
  • Other missing features as noted in heatmap.swift (e.g. axis labels)
  • More tests

@karwa
Copy link
Contributor Author

karwa commented Jan 29, 2020

Test failures are expected. The plots are now slightly larger because of changes to the way the area is calculated.

@odmir
Copy link
Contributor

odmir commented Jan 30, 2020

I have to say I took a peek last year at your work on this in your fork and I got pretty excited about this!

I think I like LayoutElement better, maybe another alternative could be LayoutComponent?
What do you think about Normaliser instead of Interpolator, would it make sense? Since usually when you say you are normalising some data, what you do is map it to values between 0 and 1 or some other definition of normalisation.
I like the functional approach, I think though we should eventually simplify the setting of labels and stuff by making things like plotTitle and plotLabel unnecessary to the end user, by using nice setters directly on the graphs.

Collection slicing by filter closure?

What do you mean by this?

Overall I really like it! 🎉

@karwa
Copy link
Contributor Author

karwa commented Jan 31, 2020

I think I like LayoutElement better, maybe another alternative could be LayoutComponent?

LayoutComponent is nice.

What do you think about Normaliser instead of Interpolator, would it make sense? Since usually when you say you are normalising some data, what you do is map it to values between 0 and 1 or some other definition of normalisation.

I also have a prototype of a bar-graph which is generic to any data. For example, if you have a collection of Department objects, you can make a BarGraph<[Department]> directly and set an adapter for the property you want to graph via key-paths (e.g. \.employees.count).

Removing the constraint that the data is a number and replacing it with an abstract way to get a number, we can allow the graphs to use higher-level data types. This has some really nice side-effects for things like data coordinates and formatting axes markers (because we're now a graph of Departments instead of a graph of Ints).

In order to make this work, every plot that is generic like this will need an adapter. Because of that, I made an Adapters namespace and renamed the type Adapters.Heatmap.

One downside of this generic-ness is that to do things like stacked datasets and multiple series, we need to create enormous chains of generic types. For example:

StackedBarGraph<StackedBarGraph<StackedBarGraph<BarGraph<[Int]>, Data>, [Float]>, LazyMapSequence<(ClosedRange<Int>), MyStruct>>

Nobody wants to write that, so it basically forces us to use the functional-style API. It's worth pointing about that SwiftUI also has this issue, but uses the non-standard "function builders" DSL instead. Maybe that would also be something for us one day.

I like the functional approach, I think though we should eventually simplify the setting of labels and stuff by making things like plotTitle and plotLabel unnecessary to the end user, by using nice setters directly on the graphs.

Definitely. That will be source-breaking though, so maybe it's good if we tag a stable release before doing that.

Collection slicing by filter closure?

This is about the heatmap's functional API. We have one for 2D collections, and one for any collection, which slices rows of a fixed width. This refers to adding another variant, which slices rows according to some line-breaking closure.

I've since dropped this idea. It's good to keep the list of overloads down, so users are not overwhelmed. I'm not even 100% sure about the fixed-width slicing version; it's cool, but maybe it would be better as a general-purpose extension on Collection instead.

@KarthikRIyer
Copy link
Owner

I like LayoutComponent.

I do like the functional API, but I think that there will be people who aren't comfortable with the functional approach(including me 😕 ). Is there a way to support both kinds of API?

.heatmap(interpolator: .linear) {
I find interpolator here to be more intuitive, so I think as long as the argument label remains the same we can use HeatmapAdapter.

@KarthikRIyer
Copy link
Owner

Ohh... just saw the comments in the code. Would mapping be better as an argument label?

@karwa
Copy link
Contributor Author

karwa commented Jan 31, 2020

I do like the functional API, but I think that there will be people who aren't comfortable with the functional approach(including me 😕 ). Is there a way to support both kinds of API?

Sure. It would be more code, but I think it's possible. I'm not entirely sure how it would work with plots that compose multiple datasets, like BarGraph, but I'm starting to think it may be doable.

One benefit of the initialiser approach is better integration with SwiftUI. SwiftUI has these protocols NSViewRepresentable and UIViewRepresentable, which allow ordinary objects to become part of the view hierarchy. This means you can do something like:

            HStack {
                Text("SwiftPlot meets SwiftUI").font(.headline)
                Heatmap(values) {
                    $0.plotTitle.title = "\(values.count) values"
                    $0.colorMap = ColorMap.fiveColorHeatMap.lightened(by: 0.35)
                    $0.showGrid = true
                }.frame(height: 600)
            }.padding(10)

Would mapping be better as an argument label?

mapping makes a lot of sense, sure.

@odmir
Copy link
Contributor

odmir commented Jan 31, 2020

This refers to adding another variant, which slices rows according to some line-breaking closure.

That idea is interesting, but could you explain more about the use cases? Like I imagine it would be relatively easy to to inadvertently slice a collection where the resulting rows are of different lengths, what would happen then, can we still make a HeatMap out of that? or am I misinterpreting this?

@karwa
Copy link
Contributor Author

karwa commented Feb 1, 2020

@odmir rows of different lengths are fine. The use-cases aren't that convincing though, and the standard library already provides this through a .split function, so I removed it.

@odmir
Copy link
Contributor

odmir commented Feb 2, 2020

Ok, cool, lately I've been playing around with your branch because I had an idea I wanted to try out, so now I know in much more detail how the Heatmap works and understand what you mean much better!

@odmir
Copy link
Contributor

odmir commented Feb 2, 2020

Ok so now I'd like to tell you about that idea I had.

I thought the Heatmap handling of different types of data could be improved by instead of depending on SeriesType being two nested sequences, it could be any type conforming to a protocol that I'll call Heatmappable.
This protocol would require of conforming types that they provide a width, height and a method that constructs a type conforming to another protocol which I'll call HeatmappableIterator (it does not conform to the stdlib IteratorProtocol).
This "iterator" protocol only has one required method that returns the row, column and the value at those coordinates when called. The expected behaviour is that upon instantiation of a type that conforms to this protocol, we can repeatedly request new values and their coordinates until every value is exhausted. There is no requirement on the order the values must be returned, although it should be reasonable that no two values for the same coordinates should be given, but the other way around is ok: for this type to never return values for some coordinates. We would require, though, that every conforming type returns a Float value between 0 and 1.

After having this in place we could create types conforming to these protocols that provide the functionality of "iterating" over 1D data or 2D data. One step further is to move the Mapping related logic from the Heatmap to the types that conform to these protocols that need to interpolate values or get specific properties out of them with KeyPaths.

This approach would have several advantages:

  • By moving the Mapping logic as well as the logic for calculating the coordinates and getting the values, from Heatmap into other types we achieve a much simpler Heatmap implementation;
  • For the same reason, we should no longer be traversing a 2D sequence in the layout method (improved performance in case of a lot of drawing and re-layout with big/slow datasets);
  • By removing the requirement that the data storage inside the Heatmap should be two nested Sequences, we remove the necessity to convert other data types into our own which could be good for speed, specially since we wouldn't need to possibly do multiple small copies for each row of data or worse depending on the original data;
  • We don't limit the order in which the data may come in. It may come in in the order that is preferable for the original data, where it could be the most performant/convenient.

Maybe there are other advantages I didn't think about...
I've already said these types must return Float values between 0 and 1, but unless their objective is solely to plug some data type into the Heatmap, most of these types should accept other conforming types as input and expect the received values to also be Floats ranging from 0 to 1.

This opens up some really interesting possibilities and gives us a lot of flexibility:

  • We could create a conforming type that does a mosaic of the data coming from four other conforming types. Imagine that each of those four types get data from the same data source but each of them looks at a different property using KeyPaths;
  • We could do something similar but interleaving the data from different other conforming types;
  • We could aggregate data: like make one square in this type correspond to some transformation on 4 squares of another type;
  • We could make types that simply transform the data before it arrives on the Heatmap;
  • We could have types that generate the data on the fly from a closure given at initialisation;
  • The data could come from something like a stack of frames that, by repeatedly requesting the Heatmap to redraw, an animation could be shown/video/interactive viewing of the different frames in that stack, ie: using buttons in some kind of ui;
  • Imagine the data we have contains 3 different channels, we could have a type for each channel to get them to values between 0 and 1. Then we could have another type that grabs the values from 0 to 1 from each of those 3 channels and outputs them in a way that using the correct colour gradient we could see each channel represented as if it they were the RGB colour channels of an image;
  • All of this should be extremely composable so we could mix and match to our preference depending on the processing of the data and the final look we want.

If there is data that is very expensive to get or if we have a big tree of these types connected and the performance is starting to suffer, we could even do such things as create a conforming type that caches requested squares and that can be plugged anywhere that would make sense into this "tree".

The most commonly used types or "trees" of types could be created by static methods on the data directly like in extensions to the SequencePlots type so that the user has a nice experience using these for the common cases.

Ok I'm sorry if you've gone through all that and got tired of all these shenanigans. I really think this approach opens up a lot of possibilities and I'd like to hear your opinions on this 😁.

As I said before, I wanted to see how this would look like implemented, so I branched from the last commit in this draft and started implementing it just to get an idea. I checked and it draws the same graphs as the tests before, including the SVG's. If you'd like to see what it looks like, it is in my branch: odmir:heatmap_alternate. If you feel like following this approach, feel free to just copy it over to your branch!

@karwa @KarthikRIyer

@karwa
Copy link
Contributor Author

karwa commented Feb 2, 2020

It's an interesting idea. The thing is, if we reduce the Heatmap down to the most basic things that it needs, it needs a 2D dataset. Even if it used a HeatMappable abstraction instead, it would still ask for 2 dimensions; you would just be using an adapter.

So it turns out these kinds of data transformations aren't really related to the Heatmap itself - they are generic Collection-y things. It's similar to the idea of the filter closure - in theory, you could use that to slice any kind of data in any way... and that's why it's in the standard library. I even think the fixed-width slicing could maybe be promoted to a generic collection algorithm. This is the thing - do we really want to implement all that logic again and again for every 2D plot? Not really.

I'm not sure that the HeatMappable abstraction really gains us much in practice. You can build mosaics and aggregates and everything else using generic collection algorithms:

    let hm = (-20..<20).lazy.map { x in
        (-20..<20).lazy.map { y -> Int in
            return (x * x) + (y * y)
        }
    }.plots.heatmap()
    try renderAndVerify(hm)

At the same time, I do think it's important that we use standard protocols like Sequence and Collection. That means users can directly graph things like Array, but also important types like Tensor (which can also be 2D), and slices of the above types. All of them will already conform to Sequence and Collection. That's not a data storage requirement, really - you can make a trivial wrapper around almost anything and allow it to conform to Sequence. And if there are cool data-layout features that you like in 3rd-party libraries, like mosaics or random distribution generators or whatever, they'll almost certainly also be using the standard protocols.

Plots are not usually drawn repeatedly without the data having changed. Even in SwiftUI, it only triggers when a dataset stored as a @State property changes. I've been thinking about ways to isolate size-independent data, but that's an internal optimisation between the plots and the GraphLayout.

@odmir
Copy link
Contributor

odmir commented Feb 2, 2020

Thank you for opening my eyes... 😅 I was quite blinded thinking that the approach in this draft did not allow for the same things as my approach, I missed that it is trivial to make other types conform to Sequence and thus my performance arguments around reallocating memory and copying over data don't hold true. Actually, after submitting the last comment I rethought about the applications I wrote about and it sounded very similar to stuff you would want to do to things like tensors and similar to operations that CNN's and such already use today! In my branch I created two implementing types that take generic 1D sequences and 2D sequences so any data type that meet those requirements would be usable as is, but given the above realisation, there would be no advantages to my approach. I can see why all these applications actually make much more sense in its own library geared towards data treatment like TensorFlow, they should not at all be in Heatmap itself.

I do think though we should eventually figure out a way to not do unnecessary work in layout every time we want to draw, for applications where the data never changes but the graph is resized.

Anyways, I think this was a good exercise for me to tinker more with generics, PAT's and with general API design and underlying architecture. Thanks for the time and sorry if I made you waste a lot of time reading through my last comment and/or code in the branch!

@odmir
Copy link
Contributor

odmir commented Feb 2, 2020

Designing API's and architecting code is really new to me and I can see that I really need more experience. Thank you for helping me through that, it has been a really interesting experience!

@odmir
Copy link
Contributor

odmir commented Feb 2, 2020

By the way, do you think all the extensions on SequencePlots should be moved to their own file? Right now I think it can get a bit messy with extensions on Heatmap itself on the same long file.

karwa added 16 commits February 3, 2020 04:32
… to allow generating heatmaps of custom data-structures
… on Interpolator. This allows us to Heatmap sequences of arbitrary types (for example, using a KeyPath to an int or float to grade them).

- Moved Interpolator to its own file. It should possibly be given a new name, too.
- Added convenience functions for generating a Heatmap from arbitrary RACs or 2D Sequences. This allows for some really cool things.
…cense appears to allow - "derivative works" only need to mention how they changed matplotlib, which I did)
Added some geometry helpers for Rect.
…lts in a very tiny (like, sub-pixel) loss to accuracy, but results in a better image with sharper gridlines.
…iasing artefacts but shrinks the plot-size considerably.

Next step will be, then, to allow data layout to change the plot-size. That's not going to be fun.
…n to determine the row locations, but doesn't require copying in to a RAC (which would also be O(n), *and* consume memory to boot).

Allows direct heat-mapping of Strings (is that useful?). It would be nice if all charts did this and namespaced it under ".plots" or something (see ".lazy" in the Swift stdlib).
- Limit rounding on heatmap to elements with size > 1
- Added internal AdjustsPlotSize protocol for making slight changes to the plot's border rect. We need to think about this a bit more and how it affects the other plot chrome.
- GraphLayout can now draw its grid as part of the foreground, *over* the data.
- GraphLayout can now align marker labels between axis markers, instead of directly at them.
- GraphLayout now draws its border correctly. Before, it would overlap the plot's area slightly and data at the edges could bleed in to the border.
- Marker ticks and labels are now aware of the plot's border thickness.
- Marker thickness can be set separately of the border thickness. For instance, you can still have markers if the border size is 0.
@karwa
Copy link
Contributor Author

karwa commented Feb 4, 2020

OK, so I've found the source of the weird rendering with the heatmap: sometimes AGG considers the markers to have size 9, and sometimes size 10. This is because some of them descend slightly below the baseline. I believe this doesn't affect the other examples because their markers happen to use longer strings.

We should actually align all the markers by baseline, but we don't calculate or keep that information today. Doing that would improve the quality of our plots substantially, but it would be a lot of work and this patch is already big enough. That can be left as an improvement for the future - actually, Heatmap is doing nothing wrong. It just exposed a bug somewhere else.

By the way, the .swift-format file is from TensorFlow. It happens to match the style for most of this project (4 space indenting), so I've included it and run the formatting tool (only on files created in this PR).

So I think that's all the feedback addressed. @KarthikRIyer, anything to add?

@odmir
Copy link
Contributor

odmir commented Feb 4, 2020

Nice, thank you for the format file as well!

Copy link
Owner

@KarthikRIyer KarthikRIyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've started going through the changes and it'll take a little more time for me to complete that. The changes I've gone through till now look really good and interesting! There are probably some more things I mightn't understand because I'm just a basic swift user.

Also, how do you use the swift-format file? I've used this before https://github.com/apple/swift-format

@KarthikRIyer
Copy link
Owner

var cgColor: CGColor {
            if #available(OSX 10.15, iOS 13.0, *) {
                return CGColor(
                    srgbRed: CGFloat(r),
                    green: CGFloat(g),
                    blue: CGFloat(b),
                    alpha: CGFloat(a))
            } else {
                var tuple = (CGFloat(r), CGFloat(g), CGFloat(b), CGFloat(a))
                return withUnsafePointer(to: &tuple) { tupPtr in
                    return tupPtr.withMemoryRebound(to: CGFloat.self, capacity: 4) { floatPtr in
                        return CGColor(colorSpace: RGBColorSpace, components: floatPtr)!
                    }
                }
            }
        }

Why do we have two methods to get CGColor?

@karwa
Copy link
Contributor Author

karwa commented Feb 6, 2020

I've started going through the changes and it'll take a little more time for me to complete that. The changes I've gone through till now look really good and interesting! There are probably some more things I mightn't understand because I'm just a basic swift user.

Feel free to ask if there's anything that's unclear.

Also, how do you use the swift-format file? I've used this before https://github.com/apple/swift-format

Yes, it's the same tool; the file is just stores the configuration for the project (so you can just run swift-format -i ./Sources/SwiftPlot/xxx.swift from the project root, and it detects the settings). It's really helpful for me, because I usually use 2-space indentation, but this project uses 4-space. As a result, it's kind of mixed over the project. I think that and the line-length are the only configuration settings that actually change anything for us.

Why do we have two methods to get CGColor?

There's only one computed property, but it branches depending on the SDK version, The first one (with SRGB values) is the only one available on some platforms (e.g. tvOS). But it's also quite new, so older platforms don't have it and have to use the colorspace/components initialiser as a fallback.

@KarthikRIyer
Copy link
Owner

KarthikRIyer commented Feb 11, 2020

There's only one computed property, but it branches depending on the SDK version, The first one (with SRGB values) is the only one available on some platforms (e.g. tvOS). But it's also quite new, so older platforms don't have it and have to use the colorspace/components initialiser as a fallback.

@karwa I understood this but was wondering why we needed withMemoryRebound one place and not the other.

@karwa
Copy link
Contributor Author

karwa commented Feb 11, 2020

@KarthikRIyer The fallback initialiser requires a C-array of components (i.e. a pointer to the first element). See: https://developer.apple.com/documentation/coregraphics/cgcolor/1455927-init

@KarthikRIyer
Copy link
Owner

Ahh I see. I was thinking about https://developer.apple.com/documentation/coregraphics/cgcolor/1455631-init which is available 10.5+ but didn't notice that the other init was available 10.3+.

@karwa
Copy link
Contributor Author

karwa commented Feb 11, 2020

Yes, but the main motivation is for iOS, tvOS, etc. The 10.5+ initialiser isn't available at all on, say, iOS 12.

@KarthikRIyer
Copy link
Owner

Understood @karwa .

I went through the rest of the code and have got a gist of what's going on.
Thanks a lot! Merging.

Also do you think it's a good time to mark a stable release (after merging the bracket annotation of course).

@KarthikRIyer KarthikRIyer merged commit ba39eff into KarthikRIyer:master Feb 11, 2020
@karwa
Copy link
Contributor Author

karwa commented Feb 11, 2020

@KarthikRIyer yes, I think it would be a great idea to mark a stable release!

There are some areas where we could make the API simpler to use or add features, but not in a compatible way. Having a stable version means existing code can keep working, even if later releases modify the API.

For example, we use Label to represent the axes labels for layout, and they each have an independent size/color, because it gets used in multiple places and they all measure and draw independently from each other. The existing API combines the size and color of the axis labels, so users couldn't make a red x-axis label and a blue y-axis label, or have them each be different sizes if they wanted to, even though we can support that.

Similarly, ScatterPlot could use the new ColorMap, but not in a way that preserves its current API.

@KarthikRIyer
Copy link
Owner

Great! Let's release once @WilliamHYZhang's PR is merged.

@KarthikRIyer
Copy link
Owner

KarthikRIyer commented Feb 11, 2020

https://gist.github.com/KarthikRIyer/72074b291ccf83c07e0ce9579041e477

Here's a draft of the release notes for our next release. Could you'll review it?

Also @karwa @boronhub @Qwerty71 could you please let me know your full names so that I can add them to the release notes? Also if it's fine with you could you share your twitter handles? I'd like to mention you'll in the post for the release.

@anigasan @boronhub @karwa @odmir @Qwerty71 @WilliamHYZhang

@KarthikRIyer
Copy link
Owner

Also @karwa for the iOS/tvOS/watchOS support, would it be possible for you to provide an image with the plot on all three platforms? The current image just shows iOS and watchOS.

@WilliamHYZhang
Copy link
Contributor

The link seems to be broken on my end.

@KarthikRIyer
Copy link
Owner

KarthikRIyer commented Feb 11, 2020

Link updated @WilliamHYZhang.

Is there a way to share draft releases?

@WilliamHYZhang
Copy link
Contributor

Thanks @KarthikRIyer, draft release looks awesome!

@areebg9
Copy link
Contributor

areebg9 commented Feb 11, 2020

Hi! The changelogs look brilliant! The project as a whole looks very promising!

My name is Areeb Gani and my twitter is @areebg9.

I'm also planning on finishing the polar plot and other github-related activities soon; I've been busy with many math competitions recently so I haven't found time to do much programming in the last 2 weeks.

@boronhub
Copy link
Contributor

@KarthikRIyer this looks great, I'm really interested to see how this project develops!
My nameis Dhruv Baronia.

@karwa
Copy link
Contributor Author

karwa commented Feb 14, 2020

@KarthikRIyer My full name is Karl Wagner

I'll make a image showing tvOS support later today. Also I could clean up and release a version of the AppKit/UIKit/SwiftUI bindings.

I think it's better to keep those bindings separate for now, because they're still kind-of experimental (haven't been tested thoroughly) and there are a number of things I would like to change with the interface.

@KarthikRIyer
Copy link
Owner

KarthikRIyer commented Feb 16, 2020

This seems good. I could add a link to the bindings release in the release notes as well as the docs, for now.

@KarthikRIyer
Copy link
Owner

I'll make a image showing tvOS support later today. Also I could clean up and release a version of the AppKit/UIKit/SwiftUI bindings.

@karwa any updates on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants