This repository has been archived by the owner on May 29, 2018. It is now read-only.
Most research software does not actually get cited directly. For example, a paper might cite sklearn but not numpy, or numpy but not BLAS, etc. Consequently, most research software is only cited implicitly.
To try to fill in the implied citation network, we can extract software dependencies from known repositories. This can take a few forms:
Python packages that use setuptools define their dependencies explicitly, and these are stored in a well-structured object that's easy to parse.
What about R?
What about MATLAB?
What about C/C++?
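For the Python case, a minimal sketch of pulling declared dependencies out of an installed package might look like the following. It assumes Python 3.8+ (for `importlib.metadata.requires`, which returns the requirement strings a distribution declares, as setuptools' `install_requires` ends up recording them), and it uses a deliberately simple name extractor rather than a full PEP 508 parser. The helper names `parse_requirements` and `installed_dependencies` are hypothetical.

```python
import re
from importlib import metadata

# Leading project name of a PEP 508 requirement string; any extras,
# version specifiers, or environment markers after it are ignored.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)")
# Requirements guarded by "extra == ..." belong to optional features.
_EXTRA_RE = re.compile(r"\bextra\s*==")


def normalize(name):
    """Canonical PyPI name per PEP 503: lowercase, runs of -_. become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_requirements(reqs, include_extras=False):
    """Return the sorted, normalized package names named in `reqs`."""
    names = set()
    for req in reqs:
        spec, _, marker = req.partition(";")
        if _EXTRA_RE.search(marker) and not include_extras:
            continue  # optional dependency, skipped by default
        match = _NAME_RE.match(spec)
        if match:
            names.add(normalize(match.group(1)))
    return sorted(names)


def installed_dependencies(dist_name):
    """Direct dependencies of an installed distribution ([] if not installed)."""
    try:
        reqs = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []
    return parse_requirements(reqs)
```

Skipping `extra == ...` requirements by default is a judgment call: they describe optional features (test suites, plotting backends) that arguably should not count toward the implied citation network.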
Alternatively, once we have a list of top-level packages, we can start crawling package management hierarchies:
Debian/Ubuntu/etc.
PyPI
MathWorks File Exchange?
What about Mac users: Anaconda? Homebrew? MacPorts?
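Whatever the package manager, the crawl itself is a graph traversal. A minimal breadth-first sketch, assuming a hypothetical `get_deps(name)` callable that wraps the relevant index and returns direct dependencies, could record each edge plus the shortest distance from the top-level packages. The `TOY_INDEX` below is made-up data loosely modeled on the sklearn → numpy → BLAS example above, not real package metadata.

```python
from collections import deque


def crawl(roots, get_deps, max_depth=None):
    """Breadth-first walk of a dependency graph starting from `roots`.

    Returns (edges, depth): edges is a set of (dependent, dependency) pairs,
    and depth maps each reached package to its shortest distance from a root.
    Packages at depth >= max_depth are reached but not expanded further.
    """
    roots = list(dict.fromkeys(roots))  # drop duplicate roots, keep order
    depth = {root: 0 for root in roots}
    queue = deque(roots)
    edges = set()
    while queue:
        pkg = queue.popleft()
        if max_depth is not None and depth[pkg] >= max_depth:
            continue
        for dep in get_deps(pkg):
            edges.add((pkg, dep))
            if dep not in depth:  # first visit is the shortest path in BFS
                depth[dep] = depth[pkg] + 1
                queue.append(dep)
    return edges, depth


# Hypothetical, simplified index standing in for PyPI/Debian/etc. metadata.
TOY_INDEX = {
    "sklearn": ["numpy", "scipy", "joblib"],
    "scipy": ["numpy"],
    "numpy": ["blas", "libc"],
    "blas": ["libc"],
    "joblib": [],
    "libc": [],
}

edges, depth = crawl(["sklearn"], lambda p: TOY_INDEX.get(p, []))
```

Recording depth during the crawl is useful later: it is one of the simplest signals for deciding how far down the tree credit should propagate.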
Once we have a full tree, we'll have to prune it back to some reasonable level. It might be useful to include something like boost, but libc would obviously be a step too far. Where do we draw the line? Can this be automated?
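One way the line could be drawn automatically, offered here only as an assumption to test rather than a settled answer, is a ubiquity cutoff: a dependency that nearly every package in the crawled graph depends on directly (libc-like infrastructure) is dropped, while packages on a hand-maintained `keep` list are always retained. The function name `prune_ubiquitous`, the `max_share` threshold, and the example edges are all hypothetical.

```python
from collections import Counter


def prune_ubiquitous(edges, max_share=0.8, keep=()):
    """Remove packages that too large a share of the graph depends on.

    edges: iterable of (dependent, dependency) pairs.
    A dependency is pruned when the fraction of dependent packages (those
    with at least one outgoing edge) that list it directly exceeds
    max_share. Every edge touching a pruned package is dropped.
    """
    edges = set(edges)
    sources = {src for src, _ in edges}
    if not sources:
        return set()
    dependents = Counter(dep for _, dep in edges)
    too_common = {
        dep for dep, count in dependents.items()
        if count / len(sources) > max_share and dep not in keep
    }
    return {(s, d) for s, d in edges if s not in too_common and d not in too_common}


# Made-up example: every package links libc, most (not all) use numpy.
EXAMPLE_EDGES = {
    ("sklearn", "numpy"), ("scipy", "numpy"), ("pandas", "numpy"),
    ("sklearn", "libc"), ("scipy", "libc"), ("pandas", "libc"),
    ("numpy", "libc"),
}
pruned = prune_ubiquitous(EXAMPLE_EDGES, max_share=0.8)  # keeps only the three edges into numpy
```

The threshold would need tuning per ecosystem, and on small graphs a ubiquity count cannot always separate "foundational but citable" (numpy) from "infrastructure" (libc), which is why the sketch leaves room for an explicit `keep` list.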
Can I request that this dependency tracking be implemented in such a way that it can be imported as a module into another project?
I ask because I've been intending to do something similar to this for a collaboration analysis tool my team has been working on: https://github.com/sbenthall/bigbang
One thing I'd like to suggest (though it might be scope creep) is to think about how this integrates with version control. Software dependencies are something that change over time.
Yes, that's an excellent point. For something like PyPI or Debian, dynamic dependency tracking would be pretty straightforward, since all packages are versioned. For the other, more esoteric sources (MathWorks?), this seems pretty treacherous, but it may be solvable via timestamps.
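The timestamp idea could be sketched as follows, assuming that per-release dates can be obtained from whatever source is crawled: to reconstruct what a project depended on at a given moment, pick the latest release of each dependency published on or before that date. The function `version_at` and the release history are hypothetical illustrations.

```python
import bisect
from datetime import date


def version_at(releases, when):
    """Return the latest version released on or before `when`, or None.

    releases: iterable of (release_date, version) pairs in any order.
    """
    history = sorted(releases, key=lambda r: r[0])  # order by date only
    dates = [d for d, _ in history]
    i = bisect.bisect_right(dates, when)  # releases dated <= when
    return history[i - 1][1] if i else None


# Made-up release history for a single package.
RELEASES = [
    (date(2017, 3, 15), "2.0"),
    (date(2016, 1, 10), "1.0"),
    (date(2016, 6, 1), "1.1"),
]
```

Sorting on the date alone keeps the lookup independent of how version strings happen to compare; for versioned indexes like PyPI or Debian the declared version constraints would be the better source when they exist.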
I definitely like the idea of implementing that as a standalone module. I worry a little about maintaining common identifiers across modules if it gets split up, but canonical naming can be part of the functionality of the dependency tracking module.
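A canonical identifier could be as simple as an ecosystem prefix plus a normalized name. The `ecosystem:name` scheme and the `canonical_id` helper below are hypothetical; an existing convention such as Package URL (purl) could serve the same role. Only PyPI names are normalized here (per PEP 503); names from other ecosystems are kept as-is apart from whitespace stripping, since their case and punctuation rules differ.

```python
import re


def canonical_id(ecosystem, name):
    """Build a cross-module identifier such as 'pypi:scikit-learn'."""
    eco = ecosystem.strip().lower()
    pkg = name.strip()
    if eco == "pypi":
        # PEP 503: lowercase and collapse runs of '-', '_', '.' into '-'.
        pkg = re.sub(r"[-_.]+", "-", pkg).lower()
    return f"{eco}:{pkg}"
```

Prefixing by ecosystem keeps same-named packages from different sources (a Debian package and a PyPI project, say) from colliding when graphs from several crawlers are merged.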