Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic naming of qsartuna-build output if not supplied by user #34

Closed
shortydutchie opened this issue Sep 25, 2024 · 3 comments
Closed

Comments

@shortydutchie
Copy link

Hi

I'd suggest addition of following to optbuild.py so that the best and merged models are named automatically as ..._best.pkl, ..._final.pkl etc. if not supplied by user. That way the results are always written and not lost.

def basename_from_config(nm: str) -> (pathlib.Path, str):
    """
    Basename for automatic naming purposes
    Returns:
       basepath, basename
    """
    p, n = os.path.split(nm)
    b, e = os.path.splitext(n)
    base = b[:]
    for repl in ["config_", "conf_"]:
        base = base.replace(repl, "")
        base = base.replace(repl.upper(), "")
    return pathlib.Path(p), base

def set_default_output_names(args) -> Union[str, bool]:
    """
    Set default output names based on the conf file name, where not supplied
    """

    if not args.config:
        return False

    if not os.path.exists(args.config):
        logger.critical(
            "\n!!! Please specify a valid configuration file, '%s' does not exist !!!\n"
            % args.config
        )
        return False

    basepath, basename = basename_from_config(args.config)

    if args.best_buildconfig_outpath is None:
        args.best_buildconfig_outpath = pathlib.Path(
            os.path.join(basepath, "model_%s_best.json" % basename)
        )
        logger.info(
            "best_buildconfig_outpath : set to %s based on config file name"
            % args.best_buildconfig_outpath
        )
    if args.best_model_outpath is None:
        args.best_model_outpath = pathlib.Path(
            os.path.join(basepath, "model_%s_best.pkl" % basename)
        )
        logger.info(
            "best_model_outpath       : set to %s based on config file name"
            % args.best_model_outpath
        )
    if args.merged_model_outpath is None:
        args.merged_model_outpath = pathlib.Path(
            os.path.join(basepath, "model_%s_final.pkl" % basename)
        )
        logger.info(
            "merged_model_outpath     : set to %s based on config file name"
            % args.merged_model_outpath
        )

    return basename

Call it right after args = parser.parse_args() as follows:

    args = parser.parse_args()
    basename = set_default_output_names(args)

Thanks!

@lewismervin1
Copy link
Collaborator

Thanks for the helpful suggestion @shortydutchie!

I incorporated your proposed suggestions into the branch https://github.com/MolecularAI/QSARtuna/tree/3.1.4_prep. It would be great if you could try it and see if it works as you expect.

@shortydutchie
Copy link
Author

Thank you Lewis!
Yes that works, specifying only --conf automatically sets the output names.

> qsartuna-build --config test_Lasso.json
2024-10-03 08:46:37,282.282 INFO optbuild - set_default_output_names: best_buildconfig_outpath : set to model_test_Lasso_best.json based on config file name
2024-10-03 08:46:37,282.282 INFO optbuild - set_default_output_names: best_model_outpath       : set to model_test_Lasso_best.pkl based on config file name
2024-10-03 08:46:37,282.282 INFO optbuild - set_default_output_names: merged_model_outpath     : set to model_test_Lasso_final.pkl based on config file name

But I forgot - there's a minor issue with this bit (line 211 optbuild.py):

    if args.best_buildconfig_outpath:
        os.makedirs(os.path.dirname(args.best_buildconfig_outpath), exist_ok=True)
        ...
        ...

If the path name is supplied by user without a directory it will fail as dirname() returns "", which makedirs() does not like.
Couple of options -

  • I've commented it out in my version, ie. user needs to create output directory themselves.
  • should add "./" to the default names set in set_default_output_names() - my oversight, apologies

Thanks!

@lewismervin1
Copy link
Collaborator

Thanks @shortydutchie, I made some changes and incorporated them into 3.1.4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants