-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor/reorganize root dir #517
Closed
Closed
Changes from all commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
9051408
Add function for migrating internal files.
vyasr e0169be
Move migration function to project, use for sp cache.
vyasr 3cd9f14
Migrate shell history.
vyasr 0698a26
Switch from print to logging.
vyasr 09c9755
Provide missing arg.
vyasr a5a4090
Ensure that .signac directory exists.
vyasr 57dc715
Update signac/contrib/project.py
mikemhenry f17d158
Merge branch 'master' into refactor/reorganize_root_dir
vyasr File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't remove this in 2.0 without breaking backwards compatibility with 1.x. Should this migration be handled as a "schema migration" like #253 instead of a signac 2.0 migration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how we should handle this. Maybe something else worth discussing in the meeting today.
I think that migration of
signac.rc
to.signac/config
is possibly something to handle via explicit schema migration, because that is user-facing right now, but all of the other files are purely internal so making the change transparently makes more sense to me. I agree that we can't remove this in signac 2.0, but I don't know how long we would continue to include this code. Do we retain it indefinitely? Remove in 3.0?Furthermore, if we treated this like a schema migration, how long would signac support older schema versions? We would probably have to require migrations before users could use newer signac versions because we might make assumptions about the workspace that may not be valid for older data spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bdice @csadorf to expand on this, since it was apparently decided at the last developer meeting we are choosing to leave
signac.rc
at the project root rather than moving to.signac/config
, all the files moving are purely internal and not user-facing. The disappearance of any of these files will not invalidate the data space or change the way a user interacts with it. I would not consider this a schema migration, but rather a reshuffling of our internals. In fact, I'm not even sure users need to be aware of this change ifsignac.rc
is not moving. Therefore, I don't think we should require any user input to perform this migration, which is what the migration API is based on.Regarding when we remove this function, I would be fine removing it in 2.0 if we are not moving
signac.rc
. The worst case scenario is that someone doesn't touch some signac project between now and version 2.0, and then they have to rerunsignac update-cache
. If someone goes this long without touching a project, I don't think the loss of signac shell histories is a problem either, and an errant.signac_shell_history
file persisting at the project root forever is probably not something anyone would ever notice. We could also choose to keep this function for longer if we'd like, but in the long term the main issue I could see with this is people accessing signac data spaces that are archived to read-only directories, in which case the file move would fail.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vyasr @csadorf
The current approach in this PR of assuming that no project will be negatively affected by moving/changing/deleting files without user confirmation just seems like it's not a good idea. 😕 For example, this PR might not be safe if someone ran parallel jobs and triggered automatic file movements with race conditions. This is a migration as far as I can tell, and it is what we're trying to safeguard with the work @csadorf did for interactive migrations. I believe we need to make this a manual (and optional) migration process for the user.
Because of the user burden that a manual migration incurs, we could wait until finalizing any other potential schema changes (e.g. changes necessary for TOML support or other signac 2.0 plans) to introduce the migration formally. (Side note: that includes the proposed workspace directory changes, if we need to prompt the user to create a symlink
workspace
pointing to the current project config's defined workspace path.)Alternatively, we can just make this PR the definition of "schema version 2" and apply future changes in schema versions 3, 4, etc. As long as the migrations are well-defined, it should be easy to make them apply sequentially. signac 2.0 could have schema version 10 (composed of separate changes to cache files, workspaces, configs, etc.), and that would be perfectly fine with me. I think I like this idea best.
If we want to treat signac like a mature database application, I think that safeguards like user-controlled migrations that involve metadata changes or file movements are expected and necessary. I understand that the stakes are low in this case but we shouldn't skip over using the infrastructure that is designed specifically for this purpose. That said, I don't mean for my vote to be an overriding one if other @glotzerlab/signac-committers agree to the approach in this PR (or are willing to accept it as a trade-off of safety / convenience / etc.).
Timeliness of access (pre-1.7, pre-2.0, post-2.0) shouldn't govern the outcome here. Schema versions should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are good points. I'll try and address them:
signac
is using overloaded terminology. I've never spent much time with the schema migration code, so I hadn't thought about it until you mentioned it, and I think the naming put me in the wrong frame of mind. To your point about treating signac as a mature db, I don't actually think that comparison applies here because this problem is quite different from what databases face. Changes in signac's data model are not schema migrations in the sense of a database. A schema migration describes changes in a user's chosen layout of data stored using signac as we describe in this recipe. What we're doing here is more like a "database migration", analogous to MongoDB changing how it stores indexes internally between versions. True databases have the benefit of storing their data internally and therefore controlling the layout. To my knowledge (I could be wrong about this), to migrate data stored in an older version of MongoDB server, the solution is just to upgrade the MongoDB server software itself. There may also be other tools built to support this (for example this tool for MySQL supports migrating databases between newer versions of MySQL), but in general I don't think databases face the same problem we do where the database software can be updated without also updating the underlying data storage format. I don't know if there's a suitable renaming ofschema_version
that would avoid this problem, though, nor have I thought at all about the backwards compatibility implications of such a change.workspace
That's not 100% pertinent to this discussion, but worth ironing out here since we'll have to do that soon enough. We could help out users there by checking whether theworkspace_dir
key is set insignac.rc
, and if so, raising an exception indicating that the user should run the migration unlessworkspace_dir = workspace
.signac-flow
.signac-flow
has no concept of schema version, nor should it because it's not a database, butsignac
also has no knowledge ofsignac-flow
. What happens if a user runs a manual schema migration withsignac
on a project with bundled jobs submitted, then tries to check status? Conversely, what happens if a user upgradessignac-flow
and.bundles
gets moved to.signac/bundles
, but they're still using an older version of flow. Doessignac-flow
need to have a way to patch into the migration? Does the migration logic already provide hooks for this (not to my knowledge)? Does that become a separate migration process?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One follow-up on my last bullet point above. The fact that
signac-flow
is susceptible to this kind of problem suggests to me thatflow
should probably be storing its internal files independently ofsignac
(e.g. in.flow
). This kind of interdependency is problematic; if for some reason in the future we decide to rename.signac
to.party_parrot
,signac
shouldn't have to worry about preserving the directory becausesignac-flow
(or anothersignac
-adjacent package) might be using it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This discussion is sufficient to convince me that we should not perform this change automatically. That being the case, we might as well use the migration tooling. In that case I'd probably want to do what @bdice suggested and make a new branch off of master in which to collect all the relevant change (the ones in this PR, workspace, maybe TOML), then make the schema version change there once we're happy with all the changes. Maintaining migrations perpetually seems fine to me, I don't see much overhead arising there, at least not for a while (Python 4 etc).
I disagree that the only guarantee we make is that our API works (or at least, I disagree that that is all we aspire to). IMO one of the goals of signac is to store data in a layout that is well-defined and can be parsed even without signac to maximize data sharing and lifetime. A major strength of our model is that
signac_statepoint.json
clearly identifies the data in a directory. To the extent that I would not consider changing that (e.g. by renaming the file, changing its format, etc, or by a more systematic generalization of our data model as we planned in the original signac 2.0 prototype) I would consider that part of what we promise, or at least something we should aspire to offer.I agree that data schema migrations are not always user-facing. One way to think about this in the context of AiiDA is that AiiDA is a database built on top of a database i.e. it stores data within another database according to a well-defined internal schema, and that schema can change. Therefore, client code must also account for this. I hadn't considered signac this way since we market signac as a "database on the filesystem", but now I agree that effectively applies with the files on the filesystem as our "underlying database".
From that perspective, flow also has an internal schema. Other workflow managers must have the same problem. AiiDA gets around this by having the workflow and data in the same package, so a single update/schema migration can solve this. What does Fireworks do? They are probably the closest point of comparison for signac-flow as an independent workflow tool. Do they have a procedure for what happens if their internal MongoDB storage layout changes? I agree that it seems like using the migration tooling seems like the right approach for flow, perhaps it just needs to be generalized enough that any tool built on signac can reuse it safely to avoid duplication (maybe that's already largely the case, I'm not sure).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! This is how I think about it as well. I am happy with the proposal to combine all schema changes into one PR and handle flow separately.