Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kernel: refactor interpreter state handling #3823

Merged
merged 2 commits into from
Jan 10, 2020

Conversation

fingolfin
Copy link
Member

@fingolfin fingolfin commented Jan 2, 2020

This continues refactoring work begun in PRs #3805 and #3819.

Moving the interpreter state into a dedicated struct which is allocated on the C stack has multiple advantages; the latter commits in this PR exploit several of them:

  • we can get rid of ExecBegin/ExecEnd
  • we can simplify IntrBegin/IntrEnd
  • we can reduce the number of places calling IntrBegin/IntrEnd from 5 down to 2, both in read.c and in places where they clearly belong
  • I suspect we did not actually save/restore the interpreter state quite completely/correctly in a few places; these are now taken care of completely automatically by virtue of the interpreter state being on the stack
  • ...

UPDATE: I've removed all but one commit from this PR, to keep it simple. I'll add the other back in a separate PR (or more) once this PR here has been merged.

@fingolfin fingolfin added do not merge PRs which are not yet ready to be merged (e.g. submitted for discussion, or test results) topic: kernel release notes: not needed PRs introducing changes that are wholly irrelevant to the release notes labels Jan 2, 2020
@fingolfin fingolfin force-pushed the mh/InterpreterState branch from 462e961 to 8ab6b11 Compare January 2, 2020 09:10
@coveralls
Copy link

coveralls commented Jan 2, 2020

Coverage Status

Coverage increased (+0.007%) to 84.734% when pulling 8211b18 on fingolfin:mh/InterpreterState into acb61ab on gap-system:master.

@fingolfin fingolfin force-pushed the mh/InterpreterState branch from 8ab6b11 to 7788e50 Compare January 2, 2020 15:20
@fingolfin fingolfin changed the title WIP kernel: refactor interpreter state handling kernel: refactor interpreter state handling Jan 2, 2020
@fingolfin fingolfin force-pushed the mh/InterpreterState branch from 08ef101 to 441fa08 Compare January 6, 2020 22:37
@fingolfin fingolfin force-pushed the mh/InterpreterState branch from 8b4c2fe to 22af33c Compare January 7, 2020 22:32
src/gap.h Outdated Show resolved Hide resolved
@fingolfin fingolfin force-pushed the mh/InterpreterState branch 3 times, most recently from fa28f2d to f0f3168 Compare January 9, 2020 06:15
@fingolfin fingolfin removed the do not merge PRs which are not yet ready to be merged (e.g. submitted for discussion, or test results) label Jan 9, 2020
@fingolfin fingolfin force-pushed the mh/InterpreterState branch 2 times, most recently from ece4bd3 to a1dd31e Compare January 9, 2020 06:26
Copy link
Member Author

@fingolfin fingolfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments, will add more later.

UInt IntrReturning;
UInt IntrCoding;
Obj IntrState;
Obj StackObj;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These variables are removed from the global GAPState and put into a stack allocated IntrState.

WEll, all except STATE(IntrState) which itself became superfluous.

@@ -432,14 +432,15 @@ static GVarDescriptor GVarTHREAD_EXIT;

static void ThreadedInterpreter(void * funcargs)
{
IntrState intr = { 0, 0, 0, 0 };
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allocate IntrState on the stack and use it below.

struct SavedReaderState s;
IntrState intr = { 0, 0, 0, 0 };
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allocate IntrState on the stack and use it below.

struct SavedReaderState s;
IntrState intr = { 0, 0, 0, 0 };
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allocate IntrState on the stack and use it below.

@@ -39,6 +39,8 @@ struct ReaderState {

ScannerState s;

IntrState intr;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an instance of IntrState into ReaderState; it'll thus be stack-allocated as part of that in ReadEvalCommand below

@@ -2729,35 +2730,23 @@ void ReadEvalError(void)
struct SavedReaderState {
UInt userHasQuit;
syJmp_buf readJmpError;
UInt intrCoding;
UInt intrIgnoring;
UInt intrReturning;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to save/restore these here anymore, as the interpreter state is on the stack -- yay!

** directly interpret, such as loops or function bodies.
*/
/* TL: UInt IntrCoding; */

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above comments are all already in the header file, so they can be removed here.

// are both false.
//
// IgnoreLevel gives the highest value of IntrIgnoring which means this
// statement is NOT ignored (this is usually, but not always, 0)
static void INTERPRETER_PROFILE_HOOK(int ignoreLevel)
static void INTERPRETER_PROFILE_HOOK(IntrState * intr, int ignoreLevel)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The majority of the changes below either add an IntrState * intr argument to functions; or pass intr on to other functions that need them; or replace STATE(IntrFoo) by intr->foo.

From here on, a long time nothing else happens... I think for the rest of the whole file intrprtr.c


/* must be in immediate (non-ignoring, non-coding) mode */
assert( STATE(IntrIgnoring) == 0 );
assert( STATE(IntrCoding) == 0 );
GAP_ASSERT(intr->ignoring == 0);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, I changed a few assert to GAP_ASSERT since I was touching those lines anyway

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a general question: Why is GAP_ASSERT preferable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's out own macro, under our own full control; it does what we want, not the compiler or OS vendor. For now it is an alias for assert, but only if kernel debug mode was activated by configure

/* remember old interpreter state */
if (!STATE(IntrState))
STATE(IntrState) = NEW_PLIST(T_PLIST, 16);
PushPlist(STATE(IntrState), STATE(StackObj));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Real change: we don't need to push StackObj onto a stack here and restore it later, as the interpreter state contains StackObj and is itself allocated on the C calling stack.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we did not save IntrIgnoring, IntrCoding, IntrReturning here, but relied on the caller to do so.

@fingolfin fingolfin force-pushed the mh/InterpreterState branch from a1dd31e to eec98fe Compare January 9, 2020 08:27
Copy link
Member Author

@fingolfin fingolfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope to have explained all changes in this PR via comments now in one way or another.

*/
/* TL: Obj IntrState; */
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As explained above, we don't need STATE(IntrState) anymore, so this comment on its predecessor (it used to be a simple global variable) can be removed, too.


/* dummy result value (probably ignored) */
if (result)
*result = 0;
}

// switch back to the old state
STATE(StackObj) = PopPlist(STATE(IntrState));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As explained above, we don't need STATE(IntrState) anymore.

/* remember old interpreter state */
if (!STATE(IntrState))
STATE(IntrState) = NEW_PLIST(T_PLIST, 16);
PushPlist(STATE(IntrState), STATE(StackObj));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we did not save IntrIgnoring, IntrCoding, IntrReturning here, but relied on the caller to do so.

if (intr->coding > 0) {
CodeFuncCallBegin();
return;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I accidentally clang-formatted this if. But that makes me wonder: since I am touching all these if (IntrCoding) etc. checks anyway, is there any harm in running git clang-format on them? Doesn't make the git blame experience worse here, does it? Just reviewing would be slightly more annoying, but that could be alleviated by putting those reformatting changes into a second commit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I misunderstand what you mean by

reviewing would be slightly more annoying

but I think its much easier to read that way, so I don't see why changing it in this PR would be wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add an extra commit which does the formatting, just so it will be easy to spot if looking through history/blame?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem in general reformatting is that it can make it slightly harder to see which bits of the code have changed (although in this case, it doesn't really make any difference, as you say)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think it gets slightly more annoying because the diff is bigger; and when a single line is just modified, you can see easily in the diff what was inserted/removed; while if I reformat it into multiple lines, you can't see that quite as easily; it might not even be easy to match old and new code. Anyway, it shouldn't be that bad; I'll put it into a second commit, so we can also drop it if we change our minds.



/* push 'true' (to execute body of else-branch) */
PushObj( True );
PushObj(intr, True);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another note: almost all changes which add IntrState * intr as a parameter, resp. pass intr as an argument, could be removed again if we turned IntrState into a simple C++ class, and turned IntrIfElse etc. into member functions of that.

@@ -324,49 +326,49 @@ static UInt EvalRef(const LHSRef ref, Int needExpr)
return 0;
}

static void AssignRef(const LHSRef ref)
static void AssignRef(ReaderState * rs, const LHSRef ref)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as EvalRef.

@@ -378,92 +380,92 @@ static void AssignRef(const LHSRef ref)
}
}

static void UnbindRef(ScannerState * s, const LHSRef ref)
static void UnbindRef(ReaderState * rs, const LHSRef ref)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as EvalRef; note that the ScannerState is contained in the ReaderState, so there is no need for us to pass both.

}
}
}

static void IsBoundRef(ScannerState * s, const LHSRef ref)
static void IsBoundRef(ReaderState * rs, const LHSRef ref)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as EvalRef; note that the ScannerState is contained in the ReaderState, so there is no need for us to pass both.

@@ -713,7 +715,7 @@ static void CheckUnboundGlobal(ReaderState * rs, LHSRef ref)
return;

// don't warn if we are skipping/ignoring code
if (STATE(IntrIgnoring))
if (rs->intr.ignoring)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace STATE(IntrIgnoring)

}
}
else {
Match(&rs->s, S_ASSIGN, ":=", follow);
UInt currLHSGVar = rs->CurrLHSGVar;
if ( LEN_PLIST(rs->StackNams) == 0 || !STATE(IntrCoding) ) {
if ( LEN_PLIST(rs->StackNams) == 0 || !rs->intr.coding ) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace STATE(IntrCoding)

@ChrisJefferson
Copy link
Contributor

Just as a sanity check, this doesn't effect the speed does it? I'm sure it doesn't (or if anything, would slightly speed things up, as it avoid accessing globals), but it might be worth just sanity checking.

Copy link
Contributor

@wucas wucas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fingolfin thanks for the comments, I think I mostly understand what this PR changes, I added some questions but nothing that blocks merging. I was also looking at #3819 but I wasn't fast enough...

ExecStatus returning;

// 'StackObj' is the stack of values.
Obj StackObj;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a silly question, but since based on the comments it only matters whether the first three variables are non-zero or not, why aren't they bool or BOOL? What are the remaining 63 bits needed for?


void IntrLe(void);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General C question: What is an argument void? Is that the same as IntrLe()?

Copy link
Contributor

@ChrisJefferson ChrisJefferson Jan 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, (void) means no arguments. In C++, both (void) and () mean the same thing.

In C, () doesn't mean no arguments! It means "unknown number and types of arguments". However, no-one should ever write this, and most people assume () means "no arguments". To see this, gcc will compile this without complaint:

void f() {}

int main(void)
{ f(2,3,4); }

(This is actually a slight simplification of what () means, where I don't think it's worth learning the full horribleness, as it's not actually useful :) )

if (intr->coding > 0) {
CodeFuncCallBegin();
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I misunderstand what you mean by

reviewing would be slightly more annoying

but I think its much easier to read that way, so I don't see why changing it in this PR would be wrong.


/* must be in immediate (non-ignoring, non-coding) mode */
assert( STATE(IntrIgnoring) == 0 );
assert( STATE(IntrCoding) == 0 );
GAP_ASSERT(intr->ignoring == 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a general question: Why is GAP_ASSERT preferable?

@fingolfin
Copy link
Member Author

@ChrisJefferson I measured RereadGrp("clasmax.grp"); which takes about 30 milliseconds. Let it run 50 times through my benchmark function (which runs the GC etc. before taking timings), and could not see any significant difference in terms of performance. If you have a better idea for a "benchmark", let me know, I can run that then, too.

@fingolfin
Copy link
Member Author

I now also went to a Linux server (less fluctuation than on my laptop), and used perf tool to measure GAP's startup time (since the whole library has to be read, it's quite sensitive to changes in the reader and interpreter. I measures the head commits of the three stable branches for 4.9, 4.10 and 4.11, as well as this PR (the GASMAN fix from PR #3840 does not seem to have an effect). The file quit.g just contains QUIT;; no packages are loaded, no gap.ini or .gaprc are in effect.

stable-4.9

 Performance counter stats for './gap -A -b ../quit.g' (10 runs):

            802.84 msec task-clock:u              #    0.999 CPUs utilized            ( +-  0.27% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
            34,966      page-faults:u             #    0.044 M/sec                    ( +-  0.00% )
     2,482,203,480      cycles:u                  #    3.092 GHz                      ( +-  0.28% )
     3,163,169,668      instructions:u            #    1.27  insn per cycle           ( +-  0.00% )
       693,667,281      branches:u                #  864.016 M/sec                    ( +-  0.01% )
        10,447,508      branch-misses:u           #    1.51% of all branches          ( +-  0.34% )

           0.80333 +- 0.00219 seconds time elapsed  ( +-  0.27% )

stable-4.10

 Performance counter stats for './gap -A -b ../quit.g' (10 runs):

            833.74 msec task-clock:u              #    0.999 CPUs utilized            ( +-  0.22% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
            35,064      page-faults:u             #    0.042 M/sec                    ( +-  0.00% )
     2,583,316,151      cycles:u                  #    3.098 GHz                      ( +-  0.15% )
     3,565,799,484      instructions:u            #    1.38  insn per cycle           ( +-  0.00% )
       795,139,389      branches:u                #  953.702 M/sec                    ( +-  0.00% )
        10,690,165      branch-misses:u           #    1.34% of all branches          ( +-  0.27% )

           0.83419 +- 0.00181 seconds time elapsed  ( +-  0.22% )

stable-4.11

 Performance counter stats for './gap -A -b ../quit.g' (10 runs):

            975.68 msec task-clock:u              #    1.000 CPUs utilized            ( +-  0.30% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
            37,200      page-faults:u             #    0.038 M/sec                    ( +-  0.00% )
     3,061,431,840      cycles:u                  #    3.138 GHz                      ( +-  0.22% )
     4,186,947,880      instructions:u            #    1.37  insn per cycle           ( +-  0.00% )
       914,773,888      branches:u                #  937.575 M/sec                    ( +-  0.00% )
        11,666,942      branch-misses:u           #    1.28% of all branches          ( +-  0.50% )

           0.97611 +- 0.00294 seconds time elapsed  ( +-  0.30% )

This PR

 Performance counter stats for './gap -A -b ../quit.g' (10 runs):

            973.42 msec task-clock:u              #    1.000 CPUs utilized            ( +-  0.23% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
            37,172      page-faults:u             #    0.038 M/sec                    ( +-  0.00% )
     3,047,728,129      cycles:u                  #    3.131 GHz                      ( +-  0.27% )
     4,213,028,737      instructions:u            #    1.38  insn per cycle           ( +-  0.03% )
       917,797,785      branches:u                #  942.859 M/sec                    ( +-  0.04% )
        11,572,378      branch-misses:u           #    1.26% of all branches          ( +-  0.73% )

           0.97388 +- 0.00219 seconds time elapsed  ( +-  0.23% )

What we can see is that this PR has no direct impact, but we do see a regression of the startup time, it got slower with each release :-(. One suspect for this I have in mind are th changes to decouple scanner and IO code; I'll investigate if those indeed are (part of) the cause. If they are, I already was planning to work on that for quite some time, it just wasn't high on my priority list.

@fingolfin
Copy link
Member Author

I used git bisect to track down the slowdown between GAP 4.10 and 4.11, and discovered what I should have known: this was caused by 1762043 from PR #2773, i.e., the method reordering.

@ChrisJefferson
Copy link
Contributor

Thanks, that's more than enough testing!

@ChrisJefferson ChrisJefferson merged commit 1ab4dda into gap-system:master Jan 10, 2020
@fingolfin fingolfin deleted the mh/InterpreterState branch January 10, 2020 22:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release notes: not needed PRs introducing changes that are wholly irrelevant to the release notes topic: kernel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants