masked_select: Filtering tensors with a boolean mask tensor #400

brentp · 2019-11-04T16:29:16Z

Given a 2D array/tensor, how would I write the Arraymancer equivalent of this numpy expression:

a[:,a.mean(axis=0) > 0.5] = -1

I think it's probably some combination of the map/apply/fold_inline, but it's not obvious how to do that.
Once I understand, I can open a PR with some common examples like this if it's helpful.


brentp · 2019-11-12T16:49:59Z

I have come up with this:

import arraymancer

var T = randomTensor[float32](8, 3, 1'f32)
let m = T.mean(axis=0).broadcast(T.shape)

apply2_inline(T, m, if x > y: x else: -1)
echo T

which seems reasonable. not sure if there's a simpler way.
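For comparison, here is a sketch that stays closer to the numpy expression, which overwrites whole columns whose mean exceeds 0.5, whereas the snippet above compares each element with its column mean. It uses only the calls already shown (mean, broadcast, apply2_inline); the 0.5 threshold comes from the original question, and the float32 literals are an assumption made to keep both branches the same type.

```nim
import arraymancer

var T = randomTensor[float32](8, 3, 1'f32)
# Column means have shape [1, 3]; broadcasting lets every row see its column's mean.
let m = T.mean(axis=0).broadcast(T.shape)

# x is the element of T, y is the mean of the column that element belongs to.
apply2_inline(T, m, if y > 0.5'f32: -1'f32 else: x)
echo T
```

After this runs, each column should be either untouched or filled entirely with -1.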

mratsim · 2019-11-13T11:14:05Z

So what's happening is that Numpy accepts a Tensor of booleans as a mask for selecting things.

Your solution works for your case because you don't need to discard/filter the values. However, implementing filtering with a Tensor of bool would also be very useful for dataframes and analysis-related work.

For filtering, I see the following difficulties:

* Implementing the algorithm: it may be easier to use PyTorch's masked_select as a reference, but while PyTorch is not as complex as Numpy, its codebase is very complex at the moment because its 3 backends are mixed together: C Torch, ATen and C10.
* Updating the indexing macro to accept a Tensor of bool and dispatch to the masked_select proc(s). This may require multiple dispatches/allocations if 2 dimensions are sliced at the same time.
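To make the first point concrete, here is a minimal sketch of what a flat masked select has to do, written against plain sequences rather than tensors. The name maskedSelectSketch is hypothetical and this is not Arraymancer's or PyTorch's implementation; it only illustrates the count-then-copy shape of the algorithm, which is needed because the output length is unknown until the mask has been scanned.

```nim
# Hypothetical helper for illustration only: flat masked select on sequences.
proc maskedSelectSketch[T](data: openArray[T], mask: openArray[bool]): seq[T] =
  doAssert data.len == mask.len, "mask and data must have the same length"
  # Pass 1: count the selected elements so the output is allocated once.
  var count = 0
  for keep in mask:
    if keep: inc count
  result = newSeqOfCap[T](count)
  # Pass 2: copy the elements whose mask entry is true, preserving order.
  for i, keep in mask:
    if keep: result.add data[i]

when isMainModule:
  let values = [0.1, 0.9, 0.4, 0.7]
  let mask = [false, true, false, true]
  echo maskedSelectSketch(values, mask)
```

A real tensor version would additionally have to walk strided, possibly non-contiguous storage, which is where most of the complexity mentioned above comes from.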

brentp · 2019-11-13T17:01:05Z

Thanks for considering it. For my additional 0.02...
I guess that indexing with a tensor of booleans or of indices, as in numpy, is a fundamental feature. For other things, though, I think I'd be fine just writing the 2-3 lines myself if I had a better handle on the map* and apply* functions.

To that end, it'd be nice if the broadcasting was done auto-magically (as in numpy) as well.

mratsim · 2019-11-13T22:03:48Z

Broadcasting is done auto-magically with .+ and the rest of the dot-operator family. I intentionally keep 2 different sets of operators, because in Numpy it's a pain when you did not mean to broadcast and end up with a silent error to debug.

Arraymancer/src/tensor/operators_broadcasted.nim

Lines 25 to 28 in 94efff3

    
           proc `.+`*[T: SomeNumber|Complex[float32]|Complex[float64]](a, b: Tensor[T]): Tensor[T] {.noInit,inline.} = 
        
             ## Broadcasted addition for tensors of incompatible but broadcastable shape. 
        
             let (tmp_a, tmp_b) = broadcast2(a, b) 
        
             result = tmp_a + tmp_b

brentp · 2019-11-21T23:36:55Z

I am also having trouble just using the map/apply stuff. For example, I can't work out how to change this (which either segfaults or runs endlessly):

import arraymancer

var T = randomTensor[float32](250, 17384, 1'f32)
let m = T.map_inline():
  if x < 0: 1'f32 else: 0'f32

echo m.shape
echo m
echo m.mean(axis=0)

Vindaar · 2020-03-31T14:23:35Z

@brentp It's been a while since your last post, so maybe you've noticed this after posting at some point (or you encountered a real bug):

I believe the reason you're seeing the code run endlessly is simply that the tensor you create is huge and arraymancer's printing is pretty slow (and doesn't just cut off after a fixed N elements).
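One way to check this, sketched under the assumption that plain range slicing such as m[0..4, 0..4] is available in the installed version, is to print only the shape and a small corner of the result instead of all 250 × 17384 = 4,346,000 elements:

```nim
import arraymancer

var T = randomTensor[float32](250, 17384, 1'f32)
let m = T.map_inline():
  if x < 0: 1'f32 else: 0'f32

echo m.shape                 # cheap: prints only the dimensions
echo m[0..4, 0..4]           # a 5x5 corner instead of the full tensor
echo m.mean(axis=0).shape    # shape of the column means, not their values
```

If this returns promptly, the slowdown was in printing rather than in map_inline.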

mratsim · 2020-04-01T12:33:31Z

Note that a masked_select implementation is planned soon.

mratsim · 2020-04-05T17:43:25Z

Tentative implementation and names at #429

If you have suggestions on proc names and descriptions to limit confusion, especially for

proc masked_axis_fill[T](t: var Tensor[T], mask: Tensor[bool], axis: int, value: T)
  ## Take a 1D mask,
  ## iterate on t along the axis and fill the slice of t with `value`
  ## if mask[current_iteration_index] is true

proc masked_fill_along_axis[T](t: var Tensor[T], mask: Tensor[bool], axis: int, value: T)
  ## Take an N-D mask. Its dimension along the axis must be 1.
  ## Iterate on t along the axis.
  ##   On each slice of t, apply masked_fill

I'm taking them (let's have the naming discussion in the PR).
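To illustrate the semantics described for the first proc, here is a 2D stand-in written on seq[seq[T]] with the standard library only. The name maskedAxisFillSketch and the nested-seq representation are assumptions made for illustration; this is not the code from the PR.

```nim
# Hypothetical 2D illustration of the proposed masked_axis_fill semantics.
proc maskedAxisFillSketch[T](t: var seq[seq[T]], mask: openArray[bool],
                             axis: int, value: T) =
  doAssert axis == 0 or axis == 1, "this sketch only handles 2D data"
  for i in 0 ..< t.len:
    # The 1D mask must have one entry per index along the chosen axis.
    doAssert mask.len == (if axis == 0: t.len else: t[i].len)
    for j in 0 ..< t[i].len:
      # k is the position along `axis`; a true mask entry fills that whole slice.
      let k = if axis == 0: i else: j
      if mask[k]: t[i][j] = value

when isMainModule:
  var a = @[@[1, 2, 3], @[4, 5, 6]]
  # axis = 1 plays the role of numpy's a[:, mask] = -1
  a.maskedAxisFillSketch([false, true, false], axis = 1, value = -1)
  echo a
```

With axis = 1 the mask selects columns, which corresponds to the numpy expression in the original question.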

* index_select should use SomeInteger not SomeNumber
* Overload index_select for arrays and sequences
* Masked Selector overload for openarrays
* Add masked overload for regular arrays and sequences
* Initial support of Numpy fancy indexing: index select
* Fix broadcast operators from #429 using deprecated syntax
* Stash dispatcher, working with types in macros is a minefield nim-lang/Nim#14021
* Masked indexing: closes #400, workaround nim-lang/Nim#14021
* Test for full masked fancy indexing
* Add index_fill
* Tensor mutation via fancy indexing
* Add tests for index mutation via fancy indexing
* Fancy indexing: supports broadcasting a value to a masked assignation
* Detect wrong mask or tensor axis length
* masked axis assign value test
* Add masked assign of broadcastable tensor
* Tag for changelog [skip ci]

mratsim changed the title ~~docs on common operations~~ masked_select: Filtering tensors with a boolean mask tensor Nov 13, 2019

mratsim added the key feature label Nov 13, 2019

mratsim mentioned this issue Apr 5, 2020

[RFC] Masked select and Masked fill #429

Merged

mratsim added a commit that referenced this issue Apr 19, 2020

Masked indexing: closes #400, workaround nim-lang/Nim#14021

46d32d9

mratsim mentioned this issue Apr 19, 2020

Implement Numpy fancy indexing #434

Merged

7 tasks

mratsim closed this as completed in #434 Apr 19, 2020
