
Enable pickle tests #821

Merged: 24 commits merged into master from enable_tests, May 5, 2021

Conversation

@breznak (Member) commented Jun 2, 2020

  • all mark.skip() tests fixed
  • only remaining sparse_link_test.py with all tests skipped. Remove?
  • fixed serialization for ScalarEncoder
  • needed to relax ScalarEncoder's tests for params

Fixes #160



// pickle
py_ScalarEnc.def(py::pickle(
breznak (Member Author) commented:

added pickle for ScalarEncoder
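To illustrate what this binding enables, a minimal round-trip sketch from the Python side; the module paths and parameter names follow my understanding of htm.core's Python API and should be treated as assumptions rather than the PR's actual test code:

```python
import pickle

from htm.bindings.sdr import SDR
from htm.encoders.scalar_encoder import ScalarEncoder, ScalarEncoderParameters

p = ScalarEncoderParameters()
p.minimum, p.maximum = 0.0, 100.0
p.size = 400        # choose one size-like parameter ...
p.activeBits = 21   # ... and one sparsity-like parameter; the rest is derived

enc = ScalarEncoder(p)
enc2 = pickle.loads(pickle.dumps(enc))  # round trip through the new binding

# both encoders should produce identical encodings for the same input
a, b = SDR(p.size), SDR(p.size)
enc.encode(33.3, a)
enc2.encode(33.3, b)
assert a.sparse.tolist() == b.sparse.tolist()
```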

A collaborator commented:

Thanks for implementing pickle and adding some unit tests!

Resolved review threads (outdated) on: bindings/py/tests/encoders/scalar_encoder_test.py, src/htm/encoders/ScalarEncoder.cpp, src/htm/encoders/ScalarEncoder.hpp
(commit) … so the param checks for mutually incompatible params are back in place; change serialization to store only a compatible set of params.
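As a hedged illustration of what that commit means: ScalarEncoder accepts only one of the mutually derivable sizing parameters and computes the others itself, so serializing all of them (user-set and derived) and feeding them back triggers the incompatibility check. The field names here are assumptions based on htm.core's API and the test output below:

```python
from htm.encoders.scalar_encoder import ScalarEncoder, ScalarEncoderParameters

p = ScalarEncoderParameters()
p.minimum, p.maximum = 0.0, 1.0
p.activeBits = 21
p.radius = 0.1337   # set radius OR resolution OR size, not several at once

enc = ScalarEncoder(p)
# enc.parameters now also holds the derived resolution and size; storing only
# the user-set subset and re-deriving the rest on load keeps the check happy
```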
@breznak (Member Author) commented Jun 2, 2020

For some reason, the precision of the deserialized params changes?

[ RUN      ] ScalarEncoder.Serialization
/home/mmm/devel/HTM/htm-community/nupic.cpp/src/test/unit/encoders/ScalarEncoderTest.cpp:212: Failure
The difference between p1.radius and p2.radius is 3.3990147783241609e-05, which exceeds 1.0f / 100000, where
p1.radius evaluates to 0.13370000000000001,
p2.radius evaluates to 0.13373399014778325, and
1.0f / 100000 evaluates to 9.9999997473787516e-06.
/home/mmm/devel/HTM/htm-community/nupic.cpp/src/test/unit/encoders/ScalarEncoderTest.cpp:212: Failure
The difference between p1.radius and p2.radius is 3.3990147783241609e-05, which exceeds 1.0f / 100000, where
p1.radius evaluates to 0.13370000000000001,
p2.radius evaluates to 0.13373399014778325, and
1.0f / 100000 evaluates to 9.9999997473787516e-06.
/home/mmm/devel/HTM/htm-community/nupic.cpp/src/test/unit/encoders/ScalarEncoderTest.cpp:210: Failure
The difference between p1.resolution and p2.resolution is 0.00062156862745099684, which exceeds 1.0f / 100000, where
p1.resolution evaluates to 0.13370000000000001,
p2.resolution evaluates to 0.13307843137254902, and
1.0f / 100000 evaluates to 9.9999997473787516e-06.
/home/mmm/devel/HTM/htm-community/nupic.cpp/src/test/unit/encoders/ScalarEncoderTest.cpp:212: Failure
The difference between p1.radius and p2.radius is 0.021133333333334114, which exceeds 1.0f / 100000, where
p1.radius evaluates to 4.5458000000000007,
p2.radius evaluates to 4.5246666666666666, and
1.0f / 100000 evaluates to 9.9999997473787516e-06.
[  FAILED  ] ScalarEncoder.Serialization (0 ms)

@breznak (Member Author) commented Jun 4, 2020

only remaining sparse_link_test.py with all tests skipped. Remove?

@dkeeney what should we do with the sparse linking? Is it used at all? None of the tests in that file are run.

The difference between p1.radius and p2.radius is 3.3990147783241609e-05, which exceeds 1.0f / 100000, where
p1.radius evaluates to 0.13370000000000001,
p2.radius evaluates to 0.13373399014778325, and
1.0f / 100000 evaluates to 9.9999997473787516e-06.

I'm also starting to see random failures in the float tests, which seem unrelated to any of the new changes?

@dkeeney commented Jun 4, 2020

only remaining sparse_link_test.py with all tests skipped. Remove?

Hmmm, I did not know this test was there. This is a test of NetworkAPI from Python. This should work unless we broke the API without knowing it.

I will take a look at it. Let's make that a separate PR.

@breznak (Member Author) commented Jun 5, 2020

@dkeeney if you want, could you take this and/or #820 please?

@dkeeney commented Jun 5, 2020

@dkeeney if you want, could you take this and/or #820 please?

Yes, I had already started looking at this one.

@dkeeney commented Jun 6, 2020

I ran into a problem.

The sparse_link_test.py test was written before we introduced the SDR object. It tests the ability to pass both sparse and dense arrays. It used a setSparseOutput() function that no longer exists, which is why it failed.

My first impression was that we could just delete this file. However, looking around, I did not find any other tests or examples of NetworkAPI Region implementations in Python. So I thought I should modify this test to use the SDR object for the sparse cases.

It was then that I discovered a design issue. The SDR object is not easy to obtain in the Python Region implementation. Let me explain:

  • When the engine runs it calls compute( ) on each region.
  • PyBindRegion.cpp is the C++ stand-in for all Python-implemented regions. The compute() function on this class maps the inputs and outputs into tables of numpy arrays and calls guardedCompute() on PyRegion.py, which is the base class for all Python-implemented regions.
  • PyRegion.py calls the virtual function compute(self, inputs, outputs) which all Python implementations must implement.

The problem is that the SDR object is converted to a dense numpy array before being passed to Python. Therefore Python implemented regions do not have access to the SDR object itself.
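A hedged sketch of the consequence: the compute() override only ever sees dense numpy arrays, so a Python region that wants sparse indices has to rebuild an SDR by hand (the region and buffer names are illustrative, not the actual test code):

```python
import numpy as np
from htm.bindings.sdr import SDR

class MySparseRegion:  # in practice this would derive from the PyRegion base class
    def compute(self, inputs, outputs):
        dense_in = inputs["dataIn"]            # always a dense numpy array here
        sdr = SDR(dense_in.size)               # rebuild an SDR for sparse access
        sdr.dense = dense_in.astype(np.uint8)  # SDR expects binary 0/1 values
        outputs["dataOut"][:] = sdr.dense      # write into the fixed buffer
```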

So, the task now becomes how to provide access to the SDR object for the Python implemented region without breaking the API.

Open to suggestions.

@breznak (Member Author) commented Jun 7, 2020

So I thought I should modify this test to use the SDR object for the sparse cases

That's a good idea, at least as an example of a PyRegion.

It was then that I discovered a design issue. The SDR object is not easy to obtain in the Python Region implementation.

If we just wanted to get the SDR, we can easily create it*. But I guess you're after the deeper functionality.

*minimal sparse-test:

from htm.bindings.sdr import SDR

sdr = SDR(100)
sdr.dense = dense_output  # the dense array the py region returned
print(sdr.sparse)
PyRegion.py calls the virtual function compute(self, inputs, outputs) which all Python implementations must implement.

The problem is that the SDR object is converted to a dense numpy array before being passed to Python. Therefore Python implemented regions do not have access to the SDR object itself.

I assume these arrays in compute() are dense, not sparse? Is this the original NAPI design, or functionality newly added by us?
If it's the former, we have to keep dense arrays; otherwise I'd consider breaking the API and mandating sparse (numpy) arrays there.

So, the task now becomes how to provide access to the SDR object for the Python implemented region without breaking the API

If we only want, say, syntactic sugar, we can add PyRegion.compute(self, sdr, sdr), which would internally convert to and call the dense-array version. This would allow people to use SDRs from Python, which I think is the recommended way, as SDR is really easy to work with in Py.

If we wanted a "proper" sdr/sparse implementation, we would also need to provide internal sparse versions of compute(), or break the API. Since we don't have any reports of "slow NAPI usage" or a validated speed impact of dense vs. sparse compute(), I'd stick with the simpler "add sugar for compute(sdr, sdr)" method.

What do you think?
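A hedged sketch of the compute(sdr, sdr) sugar proposed above: a base-class helper wraps the dense buffers into SDRs and dispatches to an SDR-flavored override, leaving the dense API untouched. All names here are hypothetical:

```python
from htm.bindings.sdr import SDR

class PyRegionSugar:
    """Hypothetical helper mixed into the PyRegion base class."""

    def compute(self, inputs, outputs):
        # wrap each dense numpy buffer in an SDR for the user code
        in_sdrs = {name: self._to_sdr(arr) for name, arr in inputs.items()}
        out_sdrs = {name: self._to_sdr(arr) for name, arr in outputs.items()}
        self.compute_sdr(in_sdrs, out_sdrs)
        # copy the SDR contents back into the fixed dense buffers
        for name, arr in outputs.items():
            arr[:] = out_sdrs[name].dense

    @staticmethod
    def _to_sdr(dense_arr):
        sdr = SDR(dense_arr.size)
        sdr.dense = dense_arr  # assumes a binary 0/1 buffer
        return sdr

    def compute_sdr(self, inputs, outputs):
        raise NotImplementedError  # region authors override this instead
```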

@dkeeney commented Jun 7, 2020

Before the invention of the SDR, compute() just passed a dictionary of numpy arrays, indexed by input or output name. It was up to the app to know whether an array contained sparse or dense data. These numpy arrays are mapped onto fixed data buffers that are created when the Region is created.

Just changing the API to pass SDRs for all buffers will not work in the general case, because the underlying data may not be an SDR (for example, the input to an encoder).

In C++ Region implementations we avoided this problem because the call is just compute(), with no arguments. The constructor receives a pointer to the Region object, from which you obtain references to your inputs and outputs. The Python implementations do not have this pointer.

I am thinking we have two options that would not break the API:

  1. Pass the Region pointer to Python in the region impl constructor. I think there is a way to tell Python not to try to manage the pointer (never delete it). We cannot use a shared pointer here because it would create a circular reference. The Python region impl can then decide how to obtain its buffers: either use the arrays provided, or ignore them and use the new pointer to obtain its buffers.

  2. Alternatively, we could pass numpy arrays in compute(self, inputs, outputs) if the spec says a buffer should be an array, and pass SDR objects when the spec says it should be an SDR. The region impl would just need to know which to expect. Since all pre-SDR code would not contain SDR in the spec, we would not break any old code.

I am thinking option 2 is the better of the two choices. It avoids doing work that might never be used (converting the SDR to a dense numpy array), and it may be the easiest to implement.
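From the region author's point of view, option 2 might look like the following hedged sketch: the engine passes an SDR where the Spec declares one and a plain numpy array otherwise, so the implementation simply knows what to expect (names are illustrative):

```python
import numpy as np
from htm.bindings.sdr import SDR

class MixedRegion:
    def compute(self, inputs, outputs):
        bits = inputs["bits"]      # declared as SDR in the spec -> SDR object
        values = inputs["values"]  # declared as an array -> plain numpy array
        assert isinstance(bits, SDR) and isinstance(values, np.ndarray)
        # sparse access is free for SDR inputs, no dense conversion needed
        outputs["sum"][0] = values[bits.sparse].sum()
```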

I am surprised that nobody else ran into this problem before. Or...maybe nobody is using NetworkAPI in Python.

@breznak (Member Author) commented Jun 7, 2020

Alternatively, we could pass numpy arrays in compute(self, inputs, outputs) if the spec says a buffer should be an array, and pass SDR objects when the spec says it should be an SDR.

I also like this option better, as it makes it straightforward what the code intends to do (the input is an SDR/nparray), rather than dealing with the (circular) pointer situation.

I am surprised that nobody else ran into this problem before. Or...maybe nobody is using NetworkAPI in Python.

Might be good to do a survey of what people use (if anything?).

@Zbysekz Zbysekz closed this Jun 26, 2020
@Zbysekz Zbysekz deleted the enable_tests branch June 26, 2020 06:57
@breznak breznak restored the enable_tests branch June 26, 2020 07:07
@breznak breznak reopened this Jun 26, 2020
@dkeeney left a comment:

Don't we still use the eigen library?

@breznak (Member Author) commented Jul 9, 2020

Don't we still use the eigen library?

We do. And there's no actual change to that line (likely the GUI's mistake in displaying changes; I resolved this via the web GUI). Thanks for spotting it, though.

(commit) … as the new version 1.10.0 causes build problems: libgtest.a is being built in a different path, and I don't want to deal with that in this PR.
(commit) float/double lose precision when stored in a string format (JSON, ...); this does not happen with binary, which is why we hadn't reproduced it yet. Here I try setting the precision for both the file & string methods, but RapidJSON fails.
@breznak (Member Author) commented Sep 3, 2020

The new tests (serialization to string/JSON, not binary) uncovered a new bug in the precision of stored floating-point values.
I found related USCiLab/cereal#202
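The failures above are consistent with floats being written to text with too few significant digits. A self-contained illustration (Python for brevity; the C++ digits10/max_digits10 behavior is analogous):

```python
x = 0.1 + 0.2           # 0.30000000000000004; needs 17 digits to print exactly
s15 = f"{x:.15g}"       # 15 significant digits, like std::numeric_limits<double>::digits10
s17 = f"{x:.17g}"       # 17 significant digits, like max_digits10
print(float(s15) == x)  # False: the text round trip lost precision
print(float(s17) == x)  # True: 17 digits round-trip a double exactly
```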

I attempted a fix, but I'm hitting some RapidJSON bug.

unknown file: Failure
C++ exception with description "rapidjson internal assertion failure: type == kStringType" thrown in the test body.

CC @dkeeney @Zbysekz #874

break;
}
// avoid rounding errors in de/serialization; note digits10 is 15 significant
// decimal digits for double (max_digits10 = 17 is needed for an exact
// double -> text -> double round trip)
out.precision(std::numeric_limits<double>::digits10);
breznak (Member Author) commented:

I moved the out.precision() call from saveToFile() here to save(), where it should apply to both the file & string methods.
...but it causes a problem:

C++ exception with description "rapidjson internal assertion failure: type == kStringType" thrown in the test body.

@dkeeney commented Sep 3, 2020

I have hunted for this bug before without success. I was going to look again when I complete my current project.
The error message is caused by RapidJSON getting the key/data calls out-of-sequence. Cereal generates those sequences for the std:: objects it serializes. What I don't know is whether RapidJSON has a bug, Cereal has a bug, or we are giving Cereal something incorrect.

@breznak (Member Author) commented Sep 3, 2020

The error message is caused by RapidJSON getting the key/data calls out-of-sequence.

As a fallback, we might just disable JSON serialization and keep only binary (+ XML)?

What I don't know is whether RapidJSON has a bug, Cereal has a bug, or we are giving Cereal something incorrect.

I'm starting to worry that Cereal development is losing traction 😞

@dkeeney commented Sep 3, 2020

As a fallback, we might just disable JSON serialization and keep only binary (+ XML)?

The problem is we use this to create JSON for some functions (other than save/load), so we cannot turn it off.

@breznak (Member Author) commented Sep 3, 2020

The problem is we use this to create JSON for some functions

So try using another JSON library for those use cases? Ideally Cereal could use different backends, but I guess we don't have that luxury.

@ctrl-z-9000-times (Collaborator) left a comment:

I got most of this PR to work, and I think we should merge what currently works.

The outstanding issues that are not fixed:

  • Cleanup in src/htm/types/Serializable.hpp
  • Still disabled testcase: sparse_link_test.py

@dkeeney commented May 5, 2021

Cool.
I am working on some of the other modules so that they all have saveToFile(file, fmt) and loadFromFile(file, fmt) accessible from Python. We should be able to merge when I get these finished.
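For reference, a hedged sketch of the Python-facing calls dkeeney describes; the exact format-name string ("JSON" below) is an assumption:

```python
# save and restore an encoder via the serialization API (independent of pickle)
enc.saveToFile("scalar_enc.json", "JSON")

enc2 = ScalarEncoder(p)  # p: the same ScalarEncoderParameters used for enc
enc2.loadFromFile("scalar_enc.json", "JSON")
```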

@breznak (Member Author) commented May 5, 2021

Thanks @ctrl-z-9000-times for fixing the problem with precision in ScalarEnc, #821 (comment)

We should be able to merge when I get these finished.

Perfect, @dkeeney 👍 I'm good to merge this, so please merge whenever it suits the work you have in progress.

@dkeeney dkeeney merged commit 5a25f9b into master May 5, 2021
@dkeeney dkeeney deleted the enable_tests branch May 5, 2021 19:03

Merging this pull request may close: Finish pickle serialization for Pybind modules (#160).