
Extra Encoders (Image, Video, sound, ...) #259

Open
4 of 15 tasks
breznak opened this issue Feb 9, 2019 · 6 comments
Labels
community encoder question Further information is requested

Comments

@breznak
Member

breznak commented Feb 9, 2019

For some experiments, I'd like to set up encoders/extra/{vision,audio,...}/
with specialized encoders for multiple modalities.

There existed special repos for this, but basically each of these combined:

  • an encoder
  • a normal HTM (plus the hassle of setting up two repos)
  • the experiment

I think it'd help our community if we provided an all-around baseline.
What do you think?

EDIT:

Domains:

@breznak breznak added question Further information is requested community encoder labels Feb 9, 2019
@breznak breznak mentioned this issue Feb 9, 2019
@ctrl-z-9000-times
Collaborator

ctrl-z-9000-times commented Feb 16, 2019

Maybe also a Grid Cell encoder, which converts a Cartesian coordinate into a grid-cell like encoding.
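A grid-cell-like encoder could look roughly like the sketch below (this is an illustration of the idea, not htm.core's actual `GridCellEncoder` API; the function name and parameters are made up). Each module tiles the plane periodically at its own spatial scale, and the input point activates exactly one cell per module:

```python
import numpy as np

def grid_cell_encode(x, y, n_modules=4, cells_per_axis=8,
                     base_scale=10.0, scale_ratio=1.5):
    """Encode a 2D Cartesian point as a binary vector using grid-cell-like
    modules. Each module wraps the plane with its own period (scale), so
    nearby points share bits in the coarse modules while distant points
    differ; sparsity is fixed at one active cell per module."""
    bits = []
    for m in range(n_modules):
        scale = base_scale * scale_ratio ** m          # spatial period of this module
        # Position within the module's tile, mapped onto a small cell grid.
        i = int((x % scale) / scale * cells_per_axis)
        j = int((y % scale) / scale * cells_per_axis)
        module = np.zeros((cells_per_axis, cells_per_axis), dtype=np.uint8)
        module[i, j] = 1
        bits.append(module.ravel())
    return np.concatenate(bits)

sdr = grid_cell_encode(3.2, 7.9)
print(sdr.size, int(sdr.sum()))   # 256 4  (4 modules of 64 cells, one active each)
```

The multiple scales are what make the code unique over a large area even though each module by itself is periodic.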

@ctrl-z-9000-times
Collaborator

The original NuPIC had a "time & date" encoder. It's not critical, but it might be nice to have. I think the motivation for this encoder is that many anomalies correlate with the time of day or the day of the week.

HTM school video for date/time encoder:
https://discourse.numenta.org/t/htm-school-episode-6-datetime-encoding/892
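The core trick of a date/time encoder is that time attributes are cyclic: 23:00 should overlap with 00:00. A minimal sketch of one such cyclic field (not NuPIC's actual `DateEncoder` API; sizes and names here are illustrative):

```python
import numpy as np
from datetime import datetime

def encode_hour_of_day(t: datetime, size=24, active=5):
    """Cyclically encode the hour of day as a contiguous, wrapping run of
    `active` bits in a `size`-bit ring, so hours near midnight overlap
    heavily instead of being maximally distant."""
    sdr = np.zeros(size, dtype=np.uint8)
    start = int(t.hour / 24 * size)
    for k in range(active):
        sdr[(start + k) % size] = 1   # wrap around the ring
    return sdr

a = encode_hour_of_day(datetime(2019, 2, 9, 23))
b = encode_hour_of_day(datetime(2019, 2, 10, 0))
print(int((a & b).sum()))   # 4 of 5 active bits shared between 23:00 and 00:00
```

A full date/time encoder would concatenate several such fields (hour, weekday, weekend flag, season), each sized by how strongly it should influence similarity.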

@ctrl-z-9000-times
Collaborator

I wrote a vision encoder which could be added to this repo. It needs some cleanup & modification before it will be ready, and I don't know when I will get around to it. Currently it is a research prototype.

It is written in Python and uses OpenCV. The OpenCV library has functions which do log-polar transforms and parvo-/magno-cellular transforms. OpenCV works well and looks pretty well researched. I wrote an encoder which converts the processed images into sparse distributed representations (SDRs). My encoders are less well researched.
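For readers unfamiliar with the log-polar step: it resamples the image so the centre of gaze is covered densely and the periphery coarsely, roughly like retinal sampling. A slow but self-contained numpy sketch of the idea (OpenCV's `cv2.warpPolar` with the log flag does this efficiently; the function below is illustrative only):

```python
import numpy as np

def log_polar(img, out_rho=64, out_theta=64):
    """Resample a grayscale image onto a log-polar grid centred on the
    image middle: the sampling radius grows exponentially with the row
    index, so resolution is highest at the centre of gaze."""
    h, w = img.shape
    cy, cx = h / 2, w / 2
    max_r = min(cy, cx)
    out = np.zeros((out_rho, out_theta), dtype=img.dtype)
    for i in range(out_rho):
        r = max_r ** (i / (out_rho - 1))          # exponential radius: 1 .. max_r
        for j in range(out_theta):
            a = 2 * np.pi * j / out_theta
            y, x = int(cy + r * np.sin(a)), int(cx + r * np.cos(a))
            if 0 <= y < h and 0 <= x < w:
                out[i, j] = img[y, x]
    return out

img = np.arange(128 * 128, dtype=np.float32).reshape(128, 128)
print(log_polar(img).shape)   # (64, 64)
```

A nice side effect of the representation: rotation and scaling of the input become translations of the output, which a translation-tolerant encoder can exploit.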

Here are some images showing the log-polar & parvo/magno-cellular transforms.

[Images: the same scene shown as the raw field of view, its parvo-cellular transform, and its magno-cellular transform.]

Here are some statistics about the SDR-encoded output of the eye, after viewing the dataset which contains the above image.

Parvo SDR( 250 250 1 )
    Sparsity Min/Mean/Std/Max 0.197344 / 0.200787 / 0.00118844 / 0.205056
    Activation Frequency Min/Mean/Std/Max 0 / 0.20079 / 0.29597 / 1
    Entropy 0.458866
    Overlap Min/Mean/Std/Max 0.355327 / 0.833063 / 0.0761267 / 0.956605
Magno SDR( 250 250 1 )
    Sparsity Min/Mean/Std/Max 0.19864 / 0.203853 / 0.00151922 / 0.208624
    Activation Frequency Min/Mean/Std/Max 0 / 0.203854 / 0.182725 / 0.595165
    Entropy 0.76574
    Overlap Min/Mean/Std/Max 0.00212632 / 0.848003 / 0.133877 / 0.981023
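The summaries above (sparsity, activation frequency, overlap) can be reproduced for any batch of binary SDRs with a few lines of numpy; this sketch mirrors their definitions rather than htm.core's actual `Metrics` class:

```python
import numpy as np

def sdr_stats(sdrs):
    """For a batch of flat binary SDRs, return per-SDR sparsity, per-bit
    activation frequency, and pairwise overlap (shared active bits as a
    fraction of the larger SDR's active count)."""
    sdrs = np.asarray(sdrs, dtype=np.float64)
    sparsity = sdrs.mean(axis=1)      # fraction of active bits in each SDR
    act_freq = sdrs.mean(axis=0)      # how often each bit fires across the batch
    overlaps = []
    for i in range(len(sdrs)):
        for j in range(i + 1, len(sdrs)):
            shared = (sdrs[i] * sdrs[j]).sum()
            overlaps.append(shared / max(sdrs[i].sum(), sdrs[j].sum()))
    return sparsity, act_freq, np.array(overlaps)

rng = np.random.default_rng(0)
batch = (rng.random((10, 1000)) < 0.2).astype(np.uint8)  # ~20% dense, like the Parvo SDR
sparsity, freq, ov = sdr_stats(batch)
print(round(float(sparsity.mean()), 2))
```

Note that for random 20%-dense codes the expected pairwise overlap is about 0.2; the much higher mean overlaps reported above (~0.83–0.85) say that consecutive views of the dataset produce strongly correlated encodings.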

@breznak
Member Author

breznak commented Sep 19, 2019

This is amazing!!

parvo/magno-cellular transforms.

I'll need to refresh my knowledge, this is a good starter:
https://foundationsofvision.stanford.edu/chapter-5-the-retinal-representation/#visualinformation

I quote:

When cells in the parvocellular layers of a monkey’s lateral geniculate nucleus are destroyed, performance deteriorates on a variety of tasks, such as color discrimination and pattern detection. Since the parvocellular pathway includes more than seventy percent of the retinal ganglion cells, perhaps this result is not terribly surprising. When cell bodies in the magnocellular layers are destroyed, many visual performances are unaffected.

What conclusion can we draw from these lesion studies? The information carried by the neurons in the parvocellular pathway provides the best information in the low temporal and high spatial frequency components of the image.

= image classification type of tasks

Performance on motion tasks and other tasks that require this information is better when the magnocellular pathway signal is available.

= video processing in vision tasks. motion detection/tracking.

Here are some statistics [...] after viewing the dataset

So you needed to produce several SDRs from the image.

  • do you use some saccadic movements (to generate a few centers of focus)?
  • or is this used on an "animation" (a sequence of related images)?

I am wondering how this kind of biologically plausible encoding would fare on "stupid" classification datasets like MNIST, CIFAR10, etc.
Or we'd have to extend to real-world tasks: object recognition in video...

I'll be reading more chapters on vision, please share your retina code when you have time, even if it's not ready yet. Thank you

@ctrl-z-9000-times
Collaborator

https://foundationsofvision.stanford.edu/chapter-5-the-retinal-representation/#visualinformation

That looks like a good source. My model is wrong with regard to many details, including the relative densities of magnocellular and parvocellular neurons.

do you use some saccadic movements (to generate a few centers of focus)

No, this is TODO. Currently I move the eye by a small random amount between each compute cycle. Controlling where the eye looks is an open issue, and it involves action selection and motor control.

please share your retina code

I'm keeping my latest work on the eye encoder here: https://github.com/ctrl-z-9000-times/sdr_algorithms/blob/master/eye.py I don't know when I will have time to work on it further.

@breznak
Member Author

breznak commented Sep 20, 2019

My model is wrong with regards to many details, including

yes, according to the literature, these modifications could apply:

  • parvo-/magno-cellular neurons in roughly a 70:30 proportion ( = this is the "retina")
  • force abstraction by ~1:10 compression (retina-to-optic-nerve neuron counts); the SP could do that
  • then the (much larger) visual cortex starts (HTM: SP+TM)
  • Q:
    • for image classification, use only parvo cells?
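One hedged way the "SP could do that" compression step might look, stripped of learning and boosting: a fixed sparse random projection followed by k-winners-take-all. All names and parameters below are illustrative, not htm.core's `SpatialPooler` API:

```python
import numpy as np

def kwinners_compress(sdr, out_size, sparsity=0.02, seed=42):
    """Compress a binary SDR to `out_size` bits (e.g. a ~10:1 reduction,
    like the retina-to-optic-nerve bottleneck) with a fixed sparse random
    projection plus a k-winners-take-all step: the core of a Spatial
    Pooler, minus learning and boosting."""
    rng = np.random.default_rng(seed)
    weights = rng.random((out_size, sdr.size)) < 0.1    # sparse random connections
    overlaps = weights.astype(np.float64) @ sdr          # feed-forward overlap scores
    k = max(1, int(out_size * sparsity))
    out = np.zeros(out_size, dtype=np.uint8)
    out[np.argsort(overlaps)[-k:]] = 1                   # keep the k best-driven cells
    return out

retina = (np.random.default_rng(0).random(10000) < 0.2).astype(np.uint8)
nerve = kwinners_compress(retina, out_size=1000)
print(nerve.size, int(nerve.sum()))   # 1000 20
```

A real SP would additionally learn its connections so that the surviving bits capture recurring input structure rather than arbitrary random projections.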

move the eye by a small random amount between each compute cycle. [...] involves action selection and motor control.

would the saccadic moves wrongly trigger the movement/magno cells? Or would that help to encode "I moved the eye/focus, so the 'move' is caused by the movement of the sensor, not by movement of the objects in the scene"?

  • I'm considering an initial, trivial use case: 28x28 MNIST images.
    • use just 1 focus point? (problem with bounding/location)
    • use 5 fixed saccades (always visiting the same coordinates, so this would be consistent across different images)
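The fixed-saccade idea above could be sketched like this; the schedule (centre plus the four quadrant centres) and names are hypothetical, chosen only so every image yields the same, comparable patch sequence:

```python
import numpy as np

# Hypothetical fixed saccade schedule for 28x28 MNIST: the image centre
# plus the four quadrant centres, visited in the same order for every
# image so the resulting SDR sequence is comparable across digits.
SACCADES = [(14, 14), (7, 7), (7, 21), (21, 7), (21, 21)]

def fixed_saccade_patches(img, radius=6):
    """Return the (2*radius) x (2*radius) patch around each fixed focus
    point; each patch would then be fed to the eye encoder in turn."""
    patches = []
    for (y, x) in SACCADES:
        patches.append(img[y - radius:y + radius, x - radius:x + radius])
    return patches

img = np.zeros((28, 28), dtype=np.uint8)
patches = fixed_saccade_patches(img)
print(len(patches), patches[0].shape)   # 5 (12, 12)
```

With radius 6 all five patches stay inside the 28x28 frame, which sidesteps the bounding/location problem mentioned for the single-focus-point variant.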

eye.py I don't know when I will have time to work on it further.

cool! Would you please make just an initial PR with eye.py (or other necessities) when you have time? I'd like to play with it next week, and I'll try to adapt it to the current state of htm.core. I ask for this so that you author the file and get the (c) for those lines :) I'll then continue making modifications to it.

Btw, reviewing that repo of yours, we're pretty much synced, aren't we? ae, CP, SDR... are more or less here 👍. eye.py is the only missing piece. Or is there something else significant?
