Skip to content

Add functionality for multiple morphs in one command #82

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Feb 27, 2024

Conversation

Sparks29032
Copy link
Collaborator

@Sparks29032 Sparks29032 commented Jul 4, 2023

Fixes issue #84.

Added --sequence option for someone to morph a FILE multiple times against all files in a DIRECTORY as target. Plots the Rw values by default, and can save a table. Default is to sort files in DIRECTORY in alphabetical order.
image
image
A user can enable the --temperature option if all files in DIRECTORY end in "###K.gr" or "###K.cgr". This will sort files by ###, which should be the temperature (in Kelvin). Plots and tables look like the following if this option is enabled.
image
image

Fixes #77: default behavior is to set plot labels to file names. Making default just "morph" and "target" means multiple morphs in a morph sequence will have the same label.

Also fixes #47: checked the saving code enough to know it works.

New warnings appear during tests; see issue in diffpy.utils.

@codecov
Copy link

codecov bot commented Jul 4, 2023

Codecov Report

Merging #82 (63c1b8e) into main (6fdf12b) will increase coverage by 9.32%.
The diff coverage is 92.08%.

@@            Coverage Diff             @@
##             main      #82      +/-   ##
==========================================
+ Coverage   75.01%   84.33%   +9.32%     
==========================================
  Files          36       38       +2     
  Lines        1437     1890     +453     
==========================================
+ Hits         1078     1594     +516     
+ Misses        359      296      -63     
Files Coverage Δ
diffpy/pdfmorph/__init__.py 100.00% <100.00%> (ø)
diffpy/pdfmorph/morph_helpers/transformrdftopdf.py 100.00% <100.00%> (+10.00%) ⬆️
diffpy/pdfmorph/tests/test_morphchain.py 100.00% <100.00%> (ø)
diffpy/pdfmorph/tests/test_morphpdftordf.py 100.00% <100.00%> (ø)
diffpy/pdfmorph/tests/test_morphrdftopdf.py 100.00% <100.00%> (ø)
diffpy/pdfmorph/tests/test_morphresolution.py 100.00% <100.00%> (ø)
diffpy/pdfmorph/tests/test_morphrgrid.py 100.00% <100.00%> (ø)
diffpy/pdfmorph/tests/test_morphscale.py 100.00% <100.00%> (ø)
diffpy/pdfmorph/tests/test_morphshape.py 98.55% <100.00%> (-1.45%) ⬇️
diffpy/pdfmorph/tests/test_morphsmear.py 100.00% <100.00%> (ø)
... and 11 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Sparks29032
Copy link
Collaborator Author

Sparks29032 commented Jul 5, 2023

Warnings from test come from use of depreciated pkg_resources in diffpy.utils. Removed diffpy.pdfmorph's use of pkg_resources in its version.py module. Errors in checks from importlib_metadata, requires fix.

@Sparks29032 Sparks29032 marked this pull request as ready for review July 5, 2023 21:43
@Sparks29032 Sparks29032 marked this pull request as draft July 6, 2023 03:24
@Sparks29032
Copy link
Collaborator Author

Need to add plotting of Rw vs temperature (if temperature given). Otherwise, a histogram will be displayed of Rw for each file in directory as target.

@Sparks29032 Sparks29032 marked this pull request as ready for review July 6, 2023 19:28
@Sparks29032
Copy link
Collaborator Author

Sparks29032 commented Jul 6, 2023

Before merging, do you mind reviewing the plot/tables generated (from top comment)? Thinking maybe we can also use dotted lines instead to match PDFgui plotting (below)
image

@Sparks29032
Copy link
Collaborator Author

Covers the first four check boxes in issue #84. Still need some refactoring to cover the final three check boxes, but those can be done in a single commit.

See comments for major edits.

Comment on lines +223 to +224
# ensure all entries in target_labels are distinct for plotting
unique_labels = set()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could happen if the field given to --sort-by is duplicated across multiple files. We handle this by renaming the duplicated field with an additional (#).

image

Comment on lines 247 to 252
if numerical:
# Plot Rw vs Temperature
plt.plot(target_labels, rws, linestyle='-', marker='o')
plt.ylabel(r"$R_w$")
plt.xlabel(rf"{field}")
plt.minorticks_on()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numerical fields are plotted as a line graph.

Uploading image.png…

Comment on lines +261 to +263
angle = numpy.arccos(bar_size / max_len)
angle *= 180 / numpy.pi # Convert to degrees
plt.xticks(rotation=angle)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A slight tilt to make the graph less tall

self.testfiles.append(testsequence_dir.joinpath(filename))
return

def test_morphsequence(self, setup_default):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test morphsequence is same as performing multiple morphs individually

s_sequence_results = multiple_morphs(self.parser, opts, pargs, stdout_flag=False)
assert s_sequence_results == sequence_results

def test_morph_outputs(self, setup_default, tmp_path):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if files are saved properly against files in the new testsaving directory

nickel_PDF = testdata_dir.joinpath("nickel_ss0.01.cgr")
serial_JSON = testdata_dir.joinpath("testsequence_serialfile.json")

class TestExceptions(unittest.TestCase):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change to pytest; additionally, the three files (1) test_pdfmorphapp_exceptions.py, (2) test_morphsequence.py and (3) test_parser.py will be moved to a new file test_pdfmorphapp.py to follow convention. This will be done either in the change_libraries branch or another commit.

def setUp(self):
self.parser = create_option_parser()

def test_length_exceptions(self):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure all the following throw an exception

self.assertAlmostEqual(tools.nn_value(value, name=None), abs(value))
self.assertAlmostEqual(tools.nn_value(-value, name=None), abs(-value))

def test_field_sort(self):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check new field_sort function (edited from temperature_sort)

if val < 0:
negative_value_warning = f"\n# Negative value for {name} given. Using absolute value instead."
print(negative_value_warning)
return -val
return val


def deserialize(serial_file):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function exists to keep the diffpy.utils import contained in the tools.py module

return parsers.deserialize_data(serial_file)


def field_sort(filepaths: list, field, reverse=False, serfile=None, get_field_values=False):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort a list of files by a field. This field can either be in the header or in a metadata/serial file. Includes a reverse option.

metavar="FILE",
dest="savefile",
help="Save manipulated PDF from FILE1 to FILE. Use \'-\' for stdout.",
metavar="SLOC",
Copy link
Collaborator Author

@Sparks29032 Sparks29032 Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New options begin here. Note metavar FILE in -s option changed to SLOC since it can now also be a directory.

print(header_verbose, file=outfile)

# Print table with label
print("\n# Labels: [r] [gr]", file=outfile)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added label before table is printed

Comment on lines +481 to +492
# Parse paths
morph_file = Path(pargs[0])
if not morph_file.is_file():
parser.custom_error(f"{morph_file} is not a file. Go to --help for usage.")
target_directory = Path(pargs[1])
if not target_directory.is_dir():
parser.custom_error(f"{target_directory} is not a directory. Go to --help for usage.")

# Do not morph morph_file against itself if it is in the same directory
target_list = list(target_directory.iterdir())
if morph_file in target_list:
target_list.remove(morph_file)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get target files to morph against

Comment on lines +494 to +501
# Format field name for printing and plotting
field = None
if opts.field is not None:
field_words = opts.field.split()
field = ""
for word in field_words:
field += f"{word[0].upper()}{word[1:].lower()}"
field_list = None
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get field if we want to sort by it

Comment on lines 527 to 542
# Make directory to save files in if it does not already exist
Path(save_opt).mkdir(parents=True, exist_ok=True)

# Morphs will be saved in the subdirectory "Morphs"
save_morphs_here = Path(save_opt).joinpath("Morphs")
save_morphs_here.mkdir(exist_ok=True)

# Get names for the saved morphs
if opts.snfile is not None:
# Names should be stored properly in opts.snfile
save_names = tools.deserialize(opts.snfile)
# Default naming scheme
else:
for target_file in target_list:
save_names.update({target_file.name: {"save_morph_as":
f"Morph_with_Target_{target_file.stem}.cgr"}})
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -s option for multiple morphs saves all the morphs to a user-specified directory. Default names for the names of morphs are "Morph_with_Target_{target_PDF_name}.cgr"

Comment on lines 602 to 621
# Table labels
labels = "\n# Labels: [Target]"
if field is not None:
labels += f" [{field}]"
for param in [["Scale", scales], ["Stretch", stretches], ["Smear", smears]]:
if len(param[1]) > 0:
labels += f" [{param[0]}]"
labels += " [Pearson] [Rw]\n"

# Corresponding table
table = ""
for idx in range(results_length):
row = f"{file_names[idx]}"
if field is not None:
row += f" {field_list[idx]}"
for param in [scales, stretches, smears]:
if len(param) > idx:
row += f" {param[idx]:0.6f}"
row += f" {pearsons[idx]:0.6f} {rws[idx]:0.6f}"
table += f"{row}\n"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Terminal output for multiple morphs is a table summarizing details of the morph

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks really great!

@Sparks29032
Copy link
Collaborator Author

Covers the first four check boxes in issue #84. Still need some refactoring to cover the final three check boxes, but those can be done in a single commit.

See comments for major edits.

Most recent two commits cover the last three check boxes in issue #84 (list of edits to be made based on comments on PR #81). Only issue left to be resolved is making multiple_morphs and single_morph more maintainable, though I believe multiple_morphs makes use of single_morph capabilities whenever possible.

@Sparks29032 Sparks29032 mentioned this pull request Aug 21, 2023
Copy link
Contributor

@sbillinge sbillinge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please work on the help messages for the cli parser. I am a professional PDF person and am having a hard time figuring out what you want there.... if a chemist is the target audience, try and make it science oriented. They won't know what SLOC is but they will know they have a directory folder containing a set of PDFs from their sample.

The App looks really great. I am happy to merge before refactoring the io into separate modules (or we could just put that as an issue) but I think it would be good to do that if we think any of it is reusable.

Thanks for refactoring the tests. I think that pytest can run unittests so I wasn't really needing this (just new tests in pytest syntax) but it is good to have it done! :)

@@ -46,11 +46,22 @@ jobs:
conda install --file requirements/test.txt
pip install .

# redownload diffpy.utils pre 3.2.0 release, use https since public repo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure there is an issue to remove this when diffpy.utils is released on conda-forge. I would rather not do this here tbh, but if it is needed for now it is ok, but let's make sure we remove it later.

What is left on diffpy.utils to get it released? Can we just do that and not have this here? it is just making more work in the future.....

help="Save manipulated PDF from FILE1 to FILE. Use \'-\' for stdout.",
metavar="SLOC",
dest="saveloc",
help="""Save the manipulated PDF to a file SLOC. Use \'-\' for stdout.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will a chemist be able to understand this? I am not sure the target audience (presume a chemist) but please write the help with the target audience in mind.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, changed SLOC to NAME.

'--save-names-file',
metavar="SNFILE",
dest="snfile",
help="""Used only when -s and --multiple are enabled.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be clear if this is OR or AND, e.g., "Used only if both -s and -multiple are enabled."

Again, read it with a beginning grad in chemistry in mind....will it be clear to them what to input here and when and how? Examples are sometimes better than text filled with jargon.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the names and descriptions. Also added an example for ``--save-names-file` in tutorial.

if len(pargs) < 2:
parser.error("You must supply FILE1 and FILE2.")
elif len(pargs) > 2:
parser.error("Too many arguments.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with users in mind, when things fail it is always more helpful, as well as telling them why it failed, telling them what they should do next to make it work properly. Much less frustrating for the user.

with open(opts.savefile, 'w') as outfile:
try:
if opts.saveloc != "-":
save_file_name = Path(opts.saveloc).resolve().as_posix()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need the as_posix()? OK if yes, but just checking.

parser.custom_error("Too many arguments. Go to --help for usage.")

# Parse paths
morph_file = Path(pargs[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like a lot of io handling which would be better in some kind of io function somewhere rather than in the app I think.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have factored some io handling in an io module, but both functions take in a lot of inputs to make work.

opts.plot = False

# Manage saving
save_opt = opts.saveloc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above. Do we want io in some kind of io.py module?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have this part yet to factor out

})

# Parse rws
file_names = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put all on one line? e.g.,

file_names, rws, pearsons = [], [], []

if "smear" in results[key]:
smears.append(results[key]["smear"])

# Input parameters used for every morph
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a header-handling function may also be reusable and can go into another module

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put into tools with the name get_values_from_dictionary_collection(), though I am not particularly satisfied with that name.

Comment on lines 602 to 621
# Table labels
labels = "\n# Labels: [Target]"
if field is not None:
labels += f" [{field}]"
for param in [["Scale", scales], ["Stretch", stretches], ["Smear", smears]]:
if len(param[1]) > 0:
labels += f" [{param[0]}]"
labels += " [Pearson] [Rw]\n"

# Corresponding table
table = ""
for idx in range(results_length):
row = f"{file_names[idx]}"
if field is not None:
row += f" {field_list[idx]}"
for param in [scales, stretches, smears]:
if len(param) > idx:
row += f" {param[idx]:0.6f}"
row += f" {pearsons[idx]:0.6f} {rws[idx]:0.6f}"
table += f"{row}\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks really great!

Comment on lines 64 to 80
metavar="NAME",
dest="saveloc",
help="""Save the manipulated PDF to a file SLOC. Use \'-\' for stdout.
When --multiple is enabled, each manipulated PDF is saved to a directory SLOC.
Names for each manipulated PDF can be specified using --snf.""",
help="""Save the manipulated PDF to a file named NAME. Use \'-\' for stdout.
When --multiple is enabled, save each manipulated PDF as a file in a directory named NAME;
you can specify names for each saved PDF file using --save-names-file.""",
)
parser.add_option(
'--snf',
'--save-names-file',
metavar="SNFILE",
metavar="NAMESFILE",
dest="snfile",
help="""Used only when -s and --multiple are enabled.
help="""Used only when both -s and --multiple are enabled.
Specify names for each manipulated PDF when saving (see -s) using a serial file
SNFILE. Each target PDF should be an entry in SFILE with a key save_morph_as
whose value specifies the name to save the manipulated PDF as.""",
NAMESFILE. The format of NAMESFILE should be as follows: each target PDF
is an entry in NAMESFILE. For each entry, there should be a key 'save_morph_as'
whose value specifies the name to save the manipulated PDF as.
(See sample names files in the pdfmorph tutorial).""",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hope these changes are more understandable. I will add more files to tutorial and quickstart tutorial if needed.

@@ -46,18 +46,6 @@ jobs:
conda install --file requirements/test.txt
pip install .

# redownload diffpy.utils pre 3.2.0 release, use https since public repo
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No more downloading from github: use conda-forge release

Comment on lines 145 to 155
group.add_option(
'--plot-parameter',
metavar='PLOTPARAM',
dest='plotparam',
help="""Used when both plotting and --multiple are enabled.
Choose a PLOTPARAM to plot for each morph (i.e. adding --pp=Pearson means the program
will display a plot of the Pearson correlation coefficient for each morph-target pair).
PLOTPARAM is not case sensitive, so both Pearson and pearson indicate the same parameter.
When PLOTPARAM is not specified, Rw values for each morph-target pair will be plotted.
PLOTPARAM will be displayed as the vertical axis label for the plot.""",
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one new option added this commit. All others were just moved (put into separate options group)

pdfplot.plot_rws(target_file_names, plot_results["Rw"])
try:
if field_list is not None:
pdfplot.plot_param(field_list, param_list, param_name, field)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No tests written for plotting, but have some to check failure conditions

@sbillinge sbillinge merged commit b53381b into diffpy:main Feb 27, 2024
@Sparks29032 Sparks29032 deleted the morph_sequence branch May 30, 2024 21:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Determine default plot label behavior allow people to save the morphed PDF file as a new G(r) file
2 participants