
Enable pickle tests #821

Merged: 24 commits merged into master from enable_tests, May 5, 2021

Conversation

@breznak (Member) commented Jun 2, 2020

  • all mark.skip() tests fixed
  • only remaining sparse_link_test.py with all tests skipped. Remove?
  • fixed serialization for ScalarEncoder
  • needed to relax ScalarEncoder's tests for params

Fixes #160



// pickle
py_ScalarEnc.def(py::pickle(
breznak (Member Author) commented:

added pickle for ScalarEncoder
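To illustrate what this binding enables, a minimal round-trip sketch from the Python side; the module paths and parameter names follow my understanding of htm.core's Python API and should be treated as assumptions rather than the PR's actual test code:

```python
import pickle

from htm.bindings.sdr import SDR
from htm.encoders.scalar_encoder import ScalarEncoder, ScalarEncoderParameters

p = ScalarEncoderParameters()
p.minimum, p.maximum = 0.0, 100.0
p.size = 400        # choose one size-like parameter ...
p.activeBits = 21   # ... and one sparsity-like parameter; the rest is derived

enc = ScalarEncoder(p)
enc2 = pickle.loads(pickle.dumps(enc))  # round trip through the new binding

# both encoders should produce identical encodings for the same input
a, b = SDR(p.size), SDR(p.size)
enc.encode(33.3, a)
enc2.encode(33.3, b)
assert a.sparse.tolist() == b.sparse.tolist()
```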

A collaborator commented:

Thanks for implementing pickle and adding some unit tests!

Resolved review threads (outdated) on: bindings/py/tests/encoders/scalar_encoder_test.py, src/htm/encoders/ScalarEncoder.cpp, src/htm/encoders/ScalarEncoder.hpp
(commit) … so the param checks for mutually incompatible params are back in place; change serialization to store only a compatible set of params.
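As a hedged illustration of what that commit means: ScalarEncoder accepts only one of the mutually derivable sizing parameters and computes the others itself, so serializing all of them (user-set and derived) and feeding them back triggers the incompatibility check. The field names here are assumptions based on htm.core's API and the test output below:

```python
from htm.encoders.scalar_encoder import ScalarEncoder, ScalarEncoderParameters

p = ScalarEncoderParameters()
p.minimum, p.maximum = 0.0, 1.0
p.activeBits = 21
p.radius = 0.1337   # set radius OR resolution OR size, not several at once

enc = ScalarEncoder(p)
# enc.parameters now also holds the derived resolution and size; storing only
# the user-set subset and re-deriving the rest on load keeps the check happy
```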
@breznak (Member Author) commented Jun 2, 2020

For some reason, the precision of the deserialized params changes?

[ RUN      ] ScalarEncoder.Serialization
/home/mmm/devel/HTM/htm-community/nupic.cpp/src/test/unit/encoders/ScalarEncoderTest.cpp:212: Failure
The difference between p1.radius and p2.radius is 3.3990147783241609e-05, which exceeds 1.0f / 100000, where
p1.radius evaluates to 0.13370000000000001,
p2.radius evaluates to 0.13373399014778325, and
1.0f / 100000 evaluates to 9.9999997473787516e-06.
/home/mmm/devel/HTM/htm-community/nupic.cpp/src/test/unit/encoders/ScalarEncoderTest.cpp:212: Failure
The difference between p1.radius and p2.radius is 3.3990147783241609e-05, which exceeds 1.0f / 100000, where
p1.radius evaluates to 0.13370000000000001,
p2.radius evaluates to 0.13373399014778325, and
1.0f / 100000 evaluates to 9.9999997473787516e-06.
/home/mmm/devel/HTM/htm-community/nupic.cpp/src/test/unit/encoders/ScalarEncoderTest.cpp:210: Failure
The difference between p1.resolution and p2.resolution is 0.00062156862745099684, which exceeds 1.0f / 100000, where
p1.resolution evaluates to 0.13370000000000001,
p2.resolution evaluates to 0.13307843137254902, and
1.0f / 100000 evaluates to 9.9999997473787516e-06.
/home/mmm/devel/HTM/htm-community/nupic.cpp/src/test/unit/encoders/ScalarEncoderTest.cpp:212: Failure
The difference between p1.radius and p2.radius is 0.021133333333334114, which exceeds 1.0f / 100000, where
p1.radius evaluates to 4.5458000000000007,
p2.radius evaluates to 4.5246666666666666, and
1.0f / 100000 evaluates to 9.9999997473787516e-06.
[  FAILED  ] ScalarEncoder.Serialization (0 ms)

@breznak (Member Author) commented Jun 4, 2020

only remaining sparse_link_test.py with all tests skipped. Remove?

@dkeeney what should we do with the sparse linking? Is it used at all? None of the tests in that file are run.

The difference between p1.radius and p2.radius is 3.3990147783241609e-05, which exceeds 1.0f / 100000, where
p1.radius evaluates to 0.13370000000000001,
p2.radius evaluates to 0.13373399014778325, and
1.0f / 100000 evaluates to 9.9999997473787516e-06.

I'm also starting to see random failures in the float tests, which seem unrelated to any of the new changes?

@dkeeney commented Jun 4, 2020

only remaining sparse_link_test.py with all tests skipped. Remove?

Hmmm, I did not know this test was there. This is a test of NetworkAPI from Python. This should work unless we broke the API without knowing it.

I will take a look at it. Let's make that a separate PR.

@breznak (Member Author) commented Jun 5, 2020

@dkeeney if you want, could you take this and/or #820 please?

@dkeeney commented Jun 5, 2020

@dkeeney if you want, could you take this and/or #820 please?

Yes, I had already started looking at this one.

@dkeeney commented Jun 6, 2020

I ran into a problem.

The sparse_link_test.py test was written before we introduced the SDR object. It tests the ability to pass both sparse and dense arrays. It used a setSparseOutput() function that no longer exists, which is why it failed.

My first impression was that we could just delete this file. However, looking around, I did not find any other tests or examples of NetworkAPI Region implementations in Python. So I thought I should modify this test to use the SDR object for the sparse cases.

It was then that I discovered a design issue. The SDR object is not easy to obtain in the Python Region implementation. Let me explain:

  • When the engine runs it calls compute( ) on each region.
  • PyBindRegion.cpp is the C++ stand-in for all Python-implemented regions. The compute() function on this class maps the inputs and outputs into tables of numpy arrays and calls guardedCompute() on PyRegion.py, which is the base class for all Python-implemented regions.
  • PyRegion.py calls the virtual function compute(self, inputs, outputs) which all Python implementations must implement.

The problem is that the SDR object is converted to a dense numpy array before being passed to Python. Therefore Python implemented regions do not have access to the SDR object itself.
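A hedged sketch of the consequence: the compute() override only ever sees dense numpy arrays, so a Python region that wants sparse indices has to rebuild an SDR by hand (the region and buffer names are illustrative, not the actual test code):

```python
import numpy as np
from htm.bindings.sdr import SDR

class MySparseRegion:  # in practice this would derive from the PyRegion base class
    def compute(self, inputs, outputs):
        dense_in = inputs["dataIn"]            # always a dense numpy array here
        sdr = SDR(dense_in.size)               # rebuild an SDR for sparse access
        sdr.dense = dense_in.astype(np.uint8)  # SDR expects binary 0/1 values
        outputs["dataOut"][:] = sdr.dense      # write into the fixed buffer
```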

So, the task now becomes how to provide access to the SDR object for the Python implemented region without breaking the API.

Open to suggestions.

@breznak (Member Author) commented Jun 7, 2020

So I thought I should modify this test to use the SDR object for the sparse cases

That's a good idea, at least as an example of a PyRegion.

It was then that I discovered a design issue. The SDR object is not easy to obtain in the Python Region implementation.

If we just wanted to get the SDR, we can easily create it*. But I guess you're after the deeper functionality.

*minimal sparse-test:

from htm.bindings.sdr import SDR

sdr = SDR(100)
sdr.dense = dense_output  # the dense array the py region returned
print(sdr.sparse)
PyRegion.py calls the virtual function compute(self, inputs, outputs) which all Python implementations must implement.

The problem is that the SDR object is converted to a dense numpy array before being passed to Python. Therefore Python implemented regions do not have access to the SDR object itself.

I assume these arrays in compute() are dense, not sparse? Is this the original NAPI design, or functionality newly added by us?
If it's the former, we have to keep dense arrays; otherwise I'd consider breaking the API and mandating sparse (numpy) arrays there.

So, the task now becomes how to provide access to the SDR object for the Python implemented region without breaking the API

If we only want, say, syntactic sugar, we can add PyRegion.compute(self, sdr, sdr), which would internally convert to and call the dense-array version. This would allow people to use SDRs from Python, which I think is the recommended way, as SDR is really easy to work with in Py.

If we wanted a "proper" sdr/sparse implementation, we would also need to provide internal sparse versions of compute(), or break the API. Since we don't have any reports of "slow NAPI usage" or a validated speed impact of dense vs. sparse compute(), I'd stick with the simpler "add sugar for compute(sdr, sdr)" method.

What do you think?
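A hedged sketch of the compute(sdr, sdr) sugar proposed above: a base-class helper wraps the dense buffers into SDRs and dispatches to an SDR-flavored override, leaving the dense API untouched. All names here are hypothetical:

```python
from htm.bindings.sdr import SDR

class PyRegionSugar:
    """Hypothetical helper mixed into the PyRegion base class."""

    def compute(self, inputs, outputs):
        # wrap each dense numpy buffer in an SDR for the user code
        in_sdrs = {name: self._to_sdr(arr) for name, arr in inputs.items()}
        out_sdrs = {name: self._to_sdr(arr) for name, arr in outputs.items()}
        self.compute_sdr(in_sdrs, out_sdrs)
        # copy the SDR contents back into the fixed dense buffers
        for name, arr in outputs.items():
            arr[:] = out_sdrs[name].dense

    @staticmethod
    def _to_sdr(dense_arr):
        sdr = SDR(dense_arr.size)
        sdr.dense = dense_arr  # assumes a binary 0/1 buffer
        return sdr

    def compute_sdr(self, inputs, outputs):
        raise NotImplementedError  # region authors override this instead
```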

@dkeeney commented Jun 7, 2020

Before the invention of the SDR, compute() just passed a dictionary of numpy arrays, indexed by input or output name. It was up to the app to know whether an array contained sparse or dense data. These numpy arrays are mapped onto fixed data buffers that are created when the Region is created.

Just changing the API to pass SDRs for all buffers will not work in the general case, because the underlying data may not be an SDR (for example, the input to an encoder).

In C++ Region implementations we avoided this problem because the call is just compute(), with no arguments. The constructor receives a pointer to the Region object, from which you obtain references to your inputs and outputs. The Python implementations do not have this pointer.

I am thinking we have two options that would not break the API:

  1. Pass the Region pointer to Python in the region impl constructor. I think there is a way to tell Python not to try to manage the pointer (never delete it). We cannot use a shared pointer here because it would create a circular reference. The Python region impl can then decide how to obtain its buffers: either use the arrays provided, or ignore them and use the new pointer to obtain its buffers.

  2. Alternatively, we could pass numpy arrays in compute(self, inputs, outputs) if the spec says a buffer should be an array, and pass SDR objects when the spec says it should be an SDR. The region impl would just need to know which to expect. Since all pre-SDR code would not contain SDR in the spec, we would not break any old code.

I am thinking option 2 is the better of the two choices. It avoids doing work that might never be used (converting the SDR to a dense numpy array), and it may be the easiest to implement.
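From the region author's point of view, option 2 might look like the following hedged sketch: the engine passes an SDR where the Spec declares one and a plain numpy array otherwise, so the implementation simply knows what to expect (names are illustrative):

```python
import numpy as np
from htm.bindings.sdr import SDR

class MixedRegion:
    def compute(self, inputs, outputs):
        bits = inputs["bits"]      # declared as SDR in the spec -> SDR object
        values = inputs["values"]  # declared as an array -> plain numpy array
        assert isinstance(bits, SDR) and isinstance(values, np.ndarray)
        # sparse access is free for SDR inputs, no dense conversion needed
        outputs["sum"][0] = values[bits.sparse].sum()
```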

I am surprised that nobody else ran into this problem before. Or...maybe nobody is using NetworkAPI in Python.

@breznak (Member Author) commented Jun 7, 2020

Alternatively, we could pass numpy arrays in compute(self, inputs, outputs) if the spec says a buffer should be an array, and pass SDR objects when the spec says it should be an SDR.

I also like this option better, as it makes it straightforward what the code intends to do (the input is an SDR/nparray), rather than dealing with the (circular) pointer situation.

I am surprised that nobody else ran into this problem before. Or...maybe nobody is using NetworkAPI in Python.

Might be good to do a survey of what people use (if anything?).

@Zbysekz Zbysekz closed this Jun 26, 2020
@Zbysekz Zbysekz deleted the enable_tests branch June 26, 2020 06:57
@breznak breznak restored the enable_tests branch June 26, 2020 07:07
@breznak breznak reopened this Jun 26, 2020
@dkeeney left a comment:

Don't we still use the eigen library?

@breznak (Member Author) commented Jul 9, 2020

Don't we still use the eigen library?

We do. And there's no actual change to that line (likely the GUI's mistake in displaying changes; I resolved this via the web GUI). Thanks for spotting it, though.

(commit) … as the new version 1.10.0 causes build problems: libgtest.a is being built in a different path, and I don't want to deal with that in this PR.
(commit) float/double lose precision when stored in a string format (JSON, ...); this does not happen with binary, which is why we hadn't reproduced it yet. Here I try setting the precision for both the file & string methods, but RapidJSON fails.
@breznak (Member Author) commented Sep 3, 2020

The new tests (serialization to string/JSON, not binary) uncovered a new bug in the precision of stored floating-point values.
I found related USCiLab/cereal#202
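The failures above are consistent with floats being written to text with too few significant digits. A self-contained illustration (Python for brevity; the C++ digits10/max_digits10 behavior is analogous):

```python
x = 0.1 + 0.2           # 0.30000000000000004; needs 17 digits to print exactly
s15 = f"{x:.15g}"       # 15 significant digits, like std::numeric_limits<double>::digits10
s17 = f"{x:.17g}"       # 17 significant digits, like max_digits10
print(float(s15) == x)  # False: the text round trip lost precision
print(float(s17) == x)  # True: 17 digits round-trip a double exactly
```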

I attempted a fix, but I'm hitting some RapidJSON bug.

unknown file: Failure
C++ exception with description "rapidjson internal assertion failure: type == kStringType" thrown in the test body.

CC @dkeeney @Zbysekz #874

break;
}
// avoid rounding errors in de/serialization; note digits10 is 15 significant
// decimal digits for double (max_digits10 = 17 is needed for an exact
// double -> text -> double round trip)
out.precision(std::numeric_limits<double>::digits10);
breznak (Member Author) commented:

I moved the out.precision() call from saveToFile() here to save(), where it should apply to both the file & string methods.
...but it causes a problem:

C++ exception with description "rapidjson internal assertion failure: type == kStringType" thrown in the test body.

@dkeeney commented Sep 3, 2020

I have hunted for this bug before without success. I was going to look again when I complete my current project.
The error message is caused by RapidJSON getting the key/data calls out-of-sequence. Cereal generates those sequences for the std:: objects it serializes. What I don't know is whether RapidJSON has a bug, Cereal has a bug, or we are giving Cereal something incorrect.

@breznak (Member Author) commented Sep 3, 2020

The error message is caused by RapidJSON getting the key/data calls out-of-sequence.

As a fallback, we might just disable JSON serialization and keep only binary (+ XML)?

What I don't know is whether RapidJSON has a bug, Cereal has a bug, or we are giving Cereal something incorrect.

I'm starting to worry that Cereal development is losing traction 😞

@dkeeney commented Sep 3, 2020

As a fallback, we might just disable JSON serialization and keep only binary (+ XML)?

The problem is we use this to create JSON for some functions (other than save/load), so we cannot turn it off.

@breznak (Member Author) commented Sep 3, 2020

The problem is we use this to create JSON for some functions

So try using another JSON library for those use cases? Ideally Cereal could use different backends, but I guess we don't have that luxury.

@ctrl-z-9000-times (Collaborator) left a comment:

I got most of this PR to work, and I think we should merge what currently works.

The outstanding issues that are not fixed:

  • Cleanup in src/htm/types/Serializable.hpp
  • Still disabled testcase: sparse_link_test.py

@dkeeney commented May 5, 2021

Cool.
I am working on some of the other modules so that they all have saveToFile(file, fmt) and loadFromFile(file, fmt) accessible from Python. We should be able to merge when I get these finished.
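For reference, a hedged sketch of the Python-facing calls dkeeney describes; the exact format-name string ("JSON" below) is an assumption:

```python
# save and restore an encoder via the serialization API (independent of pickle)
enc.saveToFile("scalar_enc.json", "JSON")

enc2 = ScalarEncoder(p)  # p: the same ScalarEncoderParameters used for enc
enc2.loadFromFile("scalar_enc.json", "JSON")
```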

@breznak (Member Author) commented May 5, 2021

Thanks @ctrl-z-9000-times for fixing the problem with precision in ScalarEnc, #821 (comment)

We should be able to merge when I get these finished.

Perfect, @dkeeney 👍 I'm good to merge this, so please merge whenever it suits the work you have in progress.

@dkeeney dkeeney merged commit 5a25f9b into master May 5, 2021
@dkeeney dkeeney deleted the enable_tests branch May 5, 2021 19:03

Merging this pull request may close: Finish pickle serialization for Pybind modules (#160).