Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Split src into multiple subdirectories #1915

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

fingolfin
Copy link
Member

This is something I've been wanting to do for a long time: When I started with the GAP kernel, I found it very difficult to navigate it and find things. These days, I know my way around it pretty well, but even now sometimes I have troubles.

I think grouping our kernel sources in a suitable fashion could help people who are learning the GAP kernel, but also old timers. Of course in addition, one may want to rename certain files (e.g. objcftl.c) or split some (see e.g. issue #1914), but I think just grouping them is a good first step.
(And something similar would also be very useful for the library; the file dev/packages would be a good starting points, and similar methods as in this PR could be used to get there).

Now, even assuming we agree with this general idea, there are some details to resolve; e.g. which files to move were. This PR is a starting point for that, not (yet) the end. You can view the result conveniently here: https://github.com/fingolfin/gap/tree/mh/split-srcdir/src

Since I expect a discussion, I set it up in a way so that it is trivial for me to adjust it (e.g. to group files differently, to change directory names, and also just to be able to rebase it on master). Specifically, the first three commits are fixed, the others are generated by a script called rename-stuff.sh; by discarding the generated commits, then editing rename-stuff.sh and running it again, one can play around with the desired structure.

Since this is a somewhat disruptive change, we should ideally not have any PRs against the kernel open when doing it; any such PR will have to be re-created by their authors, which can be painful (I wouldn't mind too much for mine, as I can automate this, but I am sure others will)

The directory names in here and the way I grouped things are of course not set in stone. Here are my reasons for each dir name and its content (something like these explanations should eventually go into README.md files in each dir):

  • interpreter: this contains the code parsing code (read.c, scanner.c), interpreting it immediately (intrp.c) and delayed via our internal syntax tree (code.c, exprs.c, funcs.c, stats.c, vars.c) and the debugging hooks into the interpreter. (It also contains OpenInput etc. as part of scanner.c, which I find unfortunate, and this part of the motivation for issue Split scanner.c, move io part out (say into io.c and io.h) #1914)

  • hpc: I left this unchanged, although I think some of the files in here should be split up, and perhaps also moved into other dirs.

  • tnums: contains most implementation of concrete data types. I called this "tnums" and not "types" as types already have also another meaning in GAP, and the word in general is quite overloaded. Missing are of course TNUM implementation which are in interpreter resp. in hpc.

  • math: contains code specific to mathematics, e.g. coset tables, pc group collectors, vectors and matrices over finite fields. One file in there is a bit of an odd-ball listoper.c: contains some very math specific code (e.g. dealing with monomials, matrices and vectors), but also some generic code (e.g. EqListList). I think this file should be split into two and more; I am then still not 100% were each should be (the list arithmetic is of course very fundamental; OTOH this is about "matrices as list-of-lists, which is rather math specific...)

  • core (TODO: better name) contains files which make up the "core" of the GAP language (other than the interpreter code): objects (!), generic object "types" and supporting functionality (ariths, lists, records, set), functions and operations (calls.c, opers.c; note that code.c defines function bodies and is part of the interpreter, as is funcs.c, which deals with evaluating/executing function/procedure calls; this is a rather fine line to draw, and in the past, I have often looked for stuff in funcs.c when it was in calls.c and vice versa; we may want to carefully evaluate these two)

  • applications, general: I am not quite happy with these two names and the distinction; perhaps they should be just one anyway?

    • The first dir, applications only contains listfunc.c/.h and sortbase.h. This file provides generic functions which are not part of the "core" GAP, but are built atop it and are to be used from GAP. Most notably: lots of sorting functions (which perhaps should be in their own sort.c). It also contains some other stuff, e.g. STRONGLY_CONNECTED_COMPONENTS_DIGRAPH which sounds like it rather belongs into a new math/graph.c or a package ;-)., and various permutation actions" OnPoints etc, which perhaps should be in math/actions.c. This leaves APPEND_LIST_INTR and COPY_LIST_ENTRIES.
    • the second dir, general, contains more low-level stuff:
      • iostream.c: mostly creating subprocesses and communicating with them, rather low level stuff (and I think the filename could be more helpful). It is used to implement InputOutputLocalProcess
      • intfuncs.c: contains hash functions; and the relatively new bitfield code
      • streams.c: This file is another candidate for splitting (and the resulting files might live in different dirs afterwards). Generally, it seems to be there for supporting GAP stream objects, hence the name.
        • Part of are functions for manipulating files and directories from GAP -> move to another file?
        • Part is our other API for creating subprocesses (not to mention that the io package adds a third one ;-), which is ultimately exposed to the user via InputOutputLocalProcess
        • Also contain functions which drive regular text reps. file stream
        • parts expose control of the input/output logging code (which itself is currently implemented in scanner.c, which I want to split as described in Split scanner.c, move io part out (say into io.c and io.h) #1914; then these GAP functions should perhaps be moved to the resulting file?)
        • it also contains the code driving the various Read() functions, testing functions, custom read-eval loops etc.
  • util contains, well, general utilities: debug.{c,h}, fibhash.h, gaputils.h

  • several files are still left in src itself, but we may want to change that:

    • compiler related files could perhaps be moved into compiler subdir. Perhaps also the compiled files?
    • however src/compiled.h is included by every (?) package with a kernel extension, so it must stay there, or at least some file with that name; ultimately, I'd envision a file with a similar role but a name that is more suggestive of its (accidentally) "intended" use (this file, of course, was meant for use by C output of gac, but package authors also used it for convenience). If this was a fresh project, a candidate name for this file would be gap.h -- but of course we already have a gap.h, so to do that, we'd first have to rename that. Perhaps all.h would also be a good idea (at least for the part which includes "all" headers)
    • gap.c/.h: I'd leave them in src for now, and then eventually start splitting them up into more logical and self contained chunks
    • system.h: Right now this is more the kind of "file which every file in GAP should include, to ensure it "sees" all global #defines and typedefs: also, lots of other junk is in there. This file, too, would be split, perhaps part of it fused with gap.h, or moved into a new gapconfig.h, or ...
    • system.c, sysfiles.c/.h: perhaps move to a new subdir sys resp. os resp. system? Possibly split up etc.?
    • boehm_gc.c, gasman.c/h: could be moved into a memory subdir, or into core?
    • gapw95.c ...
    • profile.c/.h: uhh... ? general? application? somewhere else?

@fingolfin fingolfin added kind: discussion discussions, questions, requests for comments, and so on do not merge PRs which are not yet ready to be merged (e.g. submitted for discussion, or test results) topic: kernel request for comments labels Nov 16, 2017
@fingolfin fingolfin added this to the GAP 4.10.0 milestone Nov 16, 2017
@codecov
Copy link

codecov bot commented Nov 16, 2017

Codecov Report

Merging #1915 into master will decrease coverage by 12.07%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           master    #1915       +/-   ##
===========================================
- Coverage   84.76%   72.69%   -12.08%     
===========================================
  Files         689      477      -212     
  Lines      336400   241494    -94906     
===========================================
- Hits       285159   175544   -109615     
- Misses      51241    65950    +14709
Impacted Files Coverage Δ
src/math/freegroup_elms.h 100% <ø> (ø)
src/interpreter/statements.c 87.42% <ø> (ø)
src/util/gaputils.h 100% <ø> (ø)
src/core/objects.c 81.41% <ø> (ø)
src/tnums/pperm.c 86.94% <ø> (ø)
src/system.c 65.57% <ø> (-7.98%) ⬇️
src/core/objects.h 81.81% <ø> (ø)
src/util/debug.c 33.33% <ø> (ø)
src/tnums/objset.c 83.01% <ø> (ø)
src/general/streams.c 63.45% <ø> (ø)
... and 795 more

@fingolfin
Copy link
Member Author

Oh, and of course when doing this, one may as well also rename some files. Like intrprtr.c to interpreter.c, or objfgelm.c to e.g. free_group_elements.c (err, do we prefer underscores or dashes? Currently both is in use)

@ChrisJefferson
Copy link
Contributor

I am generally unenthusiastic about this.

While the GAP source directory is quite big, I'm not sure rearranging the files will help that much -- usually I am searching the source for a particular function, which I just do by looking over all files.

However, moving and renaming of files is going to break attempts to go through the history of a file or function, break any in progress commits (which will still exist) and make it harder to do a quick diff between a file in 4.8.x and a new release.

In short, while I agree different naming and directory structure would have been better in the start, I'm not sure the churn is worth it now.

@fingolfin
Copy link
Member Author

moving and renaming of files is going to break attempts to go through the history of a file or function

That's not correct. git blame automatically follows renames, and so does git log --follow FILE.

break any in progress commits (which will still exist)

That's true, but would only be a temporary problem.

make it harder to do a quick diff between a file in 4.8.x and a new release

That's true to an extent, although I don't recall ever having done that, and the hardest bit now would be to know to what the file has been renamed. Most files just moved, though.

@fingolfin
Copy link
Member Author

Here is an example of a GitHub "blame" view for a moved and renamed file: https://github.com/fingolfin/gap/blame/mh/split-srcdir/src/math/collector_combinat.c

@fingolfin
Copy link
Member Author

Or something bigger and with more changes: https://github.com/fingolfin/gap/blame/mh/split-srcdir/src/interpreter/interpreter.c

@ChrisJefferson
Copy link
Contributor

I don't really want to have to remember to add --follow FILE to git log.

This is something I'm tempted to suggest some kind of vote at a gap days on, as it will obviously effect all current developers.

Of course, I understand a trade-off between current developers (who I imagine mostly don't want things to move they understand), and new developers who might benefit from rearranged files -- I imagine many existing developers are more likely to want things to stay the same. If I'm wrong, I'm very happy for changes to be made.

@fingolfin
Copy link
Member Author

fingolfin commented Nov 20, 2017

I am happy to vote on this (after 4.9 is out, obviously). Of course that would then also be a vote on whether we want to earnestly try to attract new developers, or not.

UPDATE: I regret that last sentence, it makes it sound like an ominous threat, which is not my intention. I do think the decision is related to whether we prefer to cater more to existing or to potential future devs, but it's not that dramatic either way, I believe.

@ChrisJefferson
Copy link
Contributor

One other comment on this, while I remember it.

I'd prefer we not do this until fairly far through the 4.10 dev cycle, if we do, just because it will make cherry-picking fixes back onto stable-4.9 basically impossible.

@frankluebeck
Copy link
Member

I'm not in favour of this change. I cannot see any advantage of having the files in so many subdirectories. This only makes working with the files and the repository more cumbersome.

Of course, it would be nice to make it easier to understand which files do what. But I doubt that this is achieved by sorting them into subdirectories.

How about just adding a README file with an extended version of the first post of this conversation? I would also support renaming some files; there is really no reason any more to use cryptic filenames because of 8.3 restrictions.

@fingolfin
Copy link
Member Author

Could you please explain in how far you think subdirectories would make "working with the files and the repository more cumbersome"?

Of course subdirectories alone does not solve the problem with understanding what each file does, but I think it helps with understanding how things fit together.

It's good we at least agree that renaming some files to something more meaningful (free from 8+3 file name length restrictions) would be helpful).

BTW, this goes even more for lib, where I really really would like to use subdirs, and rename files, to better get a hold of the over 500 files in there.

@fingolfin
Copy link
Member Author

As to adding a README with an "extended version of the first post of this conversation": I am not interested in doing that, as I am pretty sure nobody would read it, and it'll be outdated within a few months, and nobody would maintain it. In contrast, you cannot just ignore subdirectories, or overlook them.

@fingolfin fingolfin removed this from the GAP 4.10.0 milestone Sep 28, 2018
@wilfwilson
Copy link
Member

FWIW, I welcome changes like these, and I feel like they would make it easier for me to understand, and potentially get involved in, the kernel.

@wilfwilson
Copy link
Member

Note - I especially support the renaming of files to have more intelligible names.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do not merge PRs which are not yet ready to be merged (e.g. submitted for discussion, or test results) kind: discussion discussions, questions, requests for comments, and so on topic: kernel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants