Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

problem specifying -b 10g for throuput tests #717

Closed
ajragusa opened this issue Nov 26, 2018 · 7 comments
Closed

problem specifying -b 10g for throuput tests #717

ajragusa opened this issue Nov 26, 2018 · 7 comments

Comments

@ajragusa
Copy link

ajragusa commented Nov 26, 2018

We have a problem where we can specify -b 1g but not -b 10g on the pscheduler CLI

# pscheduler task throughput -d REMOTE_HOST -b 10g
Submitting task...
Unable to post task: Rewritten test specification is invalid: {u'dest': u'REMOTE_HOST', u'bandwidth': 10000000000.0, u'schema': 1} is not valid under any of the given schemas
The 'pscheduler troubleshoot' command may be of use in problem
diagnosis. 'pscheduler troubleshoot --help' for more information.

I talked to Mark and he said it was because something was making this a float instead of an int

@mfeit-internet2
Copy link
Member

mfeit-internet2 commented Nov 26, 2018

My suspicion is that the rewriter, which passes the JSON through jq via PyJQ is exercising a bug in PyJQ. Will investigate further and try to have a fix for 4.1.5.

@mfeit-internet2
Copy link
Member

TL;DR: The problem isn't in the pScheduler code. It appears to be in PyJQ.

Some notes on what I found trying to track this down:

jq

Jq can handle large numbers into the low petas before it punts and switches to scientific notation:

% echo '{}' | jq '.x = 9999999999999998 | .st = (.x|tostring)'
{
  "x": 9999999999999998,
  "st": "9999999999999998"
}

% echo '{}' | jq '.x = 9999999999999999 | .st = (.x|tostring)'
{
  "x": 1e+16,
  "st": "1e+16"
}

Note in the second example that it hits 10P because of rounding problems. If bits per second is our poster boy for large numbers, there are enough orders of magnitude to spare that we should be good for many more years.

Python

Python reports that its floats have a 15-digit base plus a 53-digit mantissa and an epsilon of 2.220446049250313e-16, which says IEEE 754 double-precision. No surprise there.

PyJQ

Pyjq 2.1.0 and 2.2.0 both top out at 2,147,483,647, or the maximum value for a signed, 32-bit integer:

import pyjq

for script in [
        ".x = 2147483647",
        ".x = 2147483648" 
]:
    filt = pyjq.compile(script)
    print script, "->", filt.all({})
% ~/tmp/pyt
.x = 2147483647 -> [OrderedDict([(u'x', 2147483647)])]
.x = 2147483648 -> [OrderedDict([(u'x', 2147483648.0)])]

My current suspicion is that the type conversion code in PyJQ is reading into a shorter integer than it should be. That's being investigated.

@mfeit-internet2
Copy link
Member

Looks like it's definitely PyJQ or the code Cython is generating. Submitted a bug to PyJQ's maintainer: doloopwhile/pyjq#24.

mfeit-internet2 added a commit that referenced this issue Dec 6, 2018
Temporary workaround until the root cause of #717 is resolved.
@mfeit-internet2
Copy link
Member

Installed a workaround if the JQFilter class that scans the returned data for items that are instances of float greater than or equal to 2147483648.0 and having no fractional part and replaces them with integers of the same value.

Will kick this forward to 4.2.0 for the fix to PyJQ; if it comes earlier than that, I'll pull it back to a bugfix release.

@mfeit-internet2
Copy link
Member

Release Notes for 4.1.5:

Corrected a jq scripting problem where certain large integers were being rewritten as floats.

mfeit-internet2 added a commit that referenced this issue Jan 18, 2019
* Added classifiers_has to rewriter for consistency.  (Semi-related to #479)

* Removed some unused code

* Make clear when the validation file is invalid.  (Found during #479)

* Priorities, part 1:  CLI/API to database.  #479

* Update service internal for Debian.  #505

* Temporary workaround until the root cause of #717 is resolved.

* Bumping version

* Fixing issue found on tesbed where pscheduler-server RPM won't install due to scriptlet error. Problem was call to 'pscheduler internal service stop' before it was installed. Bumping relnumn but leaving version the same since this is just a RPM  spec change

* Handle non-offers of tools correctly when complaining.  #748

* Properly generate and validate the task.  #727

* Merged stashed changed prior to merge with 4.2.0 code base.  #479

* Fixed mistakes from merge.  (Code still untested.)  #479

* Fixed monitor.  #479

* Fixed debug flag so all processes can see it.  #753

* Handle initial debus state better; added GET for state.  #753

* Added missing bump of table version.  #479

* Take runs through cleanup state and set status properly.  #479

* More development, still incomplete.  #479

* More development, probably close to done.  #479

* Remvoed duplicate task posting.  #479

* Fixed finished/cleanup behavior on non-lead participants.  #479

* Pass priority along to non-lead participants.  #479

* Better handling of unspecified priorities.  #479

* Cosmetic fixes.  #479

* More error checking and null handling.  #479

* Pass debug through to result; print result in right place; avoid unneeded fetches.  #479

* Always pull task.  #479

* Return time from preempted runs to the timeline.  #479
@mfeit-internet2
Copy link
Member

mfeit-internet2 commented Mar 14, 2019

This turns out to be a two-part problem.


jq

The first part is jq's jv_is_integer() function, which uses the following to determine whether or not the value is an integer:

if(x != x || x > INT_MAX || x < INT_MIN){
  return 0;
}
return x == (int)x;

This works if the context is that anything that won't fit into an int becomes a double.
It isn't clear that this code should or shouldn't care about the size in light of the fact that there are larger types such as long.

PyJQ doesn't care, so I wrote a new function called jv_is_integer_large() that pulls the fractional part and checks to see if it's within the system's epsilon for that type:

double ipart;
double fpart = modf(x, &ipart);
return fabs(fpart) < DBL_EPSILON;

Adding the function was chosen over modifying the behavior of the existing one, which may have dependencies in other code.


PyJQ

As part of the JV-to-Python converter, PyJQ turns the value into an int if the jq library says it is one, otherwise it returns the double that it got from jv_number_value():

cdef object jv_to_pyobj(jv jval):
    kind = jv_get_kind(jval)
    ...
    elif kind == JV_KIND_NUMBER:
        v = jv_number_value(jval)
        if jv_is_integer(jval):
            return int(v)
        return v
    ...

This doesn't take into account the fact that Python 2 has two integer types, the 32-bit int and the arbitrarily-large long. The latter was folded into Python 3 as its int type. This modification to the function handles it correctly when jv_is_integer() is patched as described above:

cdef object jv_to_pyobj(jv jval):
    kind = jv_get_kind(jval)
    ...
    elif kind == JV_KIND_NUMBER:
        v = jv_number_value(jval)
        if jv_is_integer_large(jval):
            if (sys.version_info > (3, 0)):
                # Python 3 - All ints are arbitrarily-large
                return int(v)
            else:
                # Python 2 - Has arbitrarily-large longs and 32-bit ints
                return long(v) if v > sys.maxsize else int(v)
        return v
    ...

Pull requests for both projects are being submitted.

Meanwhile, these changes will be folded into the jq 1.6 upgrade in #785.

@mfeit-internet2
Copy link
Member

I'm going to call this closed since pulling the changes from #785 will fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants