Skip to content

Writing Tutorials #81

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 12 commits into from
Closed
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion diffpy/pdfmorph/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
"""

# obtain version information
__version__ = '0.0.1'
from diffpy.pdfmorph.version import __version__

# top-level import
from diffpy.pdfmorph.pdfmorph_api import pdfmorph, morph_default_config, plot_morph
Expand Down
199 changes: 170 additions & 29 deletions diffpy/pdfmorph/pdfmorphapp.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,12 +16,10 @@
from __future__ import print_function

import sys
import os
import os.path
from pathlib import Path

import numpy
from diffpy.pdfmorph import __version__
from diffpy.pdfmorph.version import __version__
import diffpy.pdfmorph.tools as tools
import diffpy.pdfmorph.pdfplot as pdfplot
import diffpy.pdfmorph.morphs as morphs
Expand All @@ -31,6 +29,7 @@

def create_option_parser():
import optparse
prog_short = Path(sys.argv[0]).name # Program name, compatible w/ all OS paths

class CustomParser(optparse.OptionParser):
def __init__(self, *args, **kwargs):
Expand Down Expand Up @@ -62,9 +61,27 @@ def custom_error(self, msg):
parser.add_option(
'-s',
'--save',
metavar="FILE",
metavar="SFILE",
dest="savefile",
help="Save manipulated PDF from FILE1 to FILE. Use \'-\' for stdout.",
help="Save manipulated PDF from FILE1 to SFILE. Use \'-\' for stdout.",
)
parser.add_option(
'--sequence',
dest="sequence",
action="store_true",
help=f"""Changes usage to \'{prog_short} [options] FILE DIRECTORY\'. FILE
will be morphed with each file in DIRECTORY as target.
Files in DIRECTORY are sorted by alphabetical order unless
--temperature is enabled. Plotting and saving options are for
a Rw plot and table respectively.""",
)
parser.add_option(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is still really sorting by filename here so I think this is not needed.

Another option would be to really store by temperature where temperature is piece of metadata. Then we would need to know where the metadata came from. The gr files allow metadata in the header region so we could parse the file for a "temperature" field and then use the value of that item in alpha-numeric order. If we offer that, we could allow sorting by any metadata field, so allow the option for a user to specify a string that matches any field in the header and then sort alphanumercically by the value of those items. This seems better to me. What do you think?

I guess default behavior is generally to make the higher-T PDF the target and the lower-T the morph (it is easier to broaden than sharpen data with morphs). This may frustrate a user who has some fancy sorting that is not temperature, like a battery discharging or something and it is becoming less ordered in time or sthg, or for some other reason, so we could offer a --reverse option so the sorting is done with the higher value morphing onto a lower value target.

Many users might have a table in their notebooks or in a csv file that does a mapping of "temperature" onto the filename (or any other item-key onto filename). Another approach would be to have tools that can parse such a file and collect the metadata that way. Moving forwards we would like to move in the direction of databases, so being able to parse the same info from json serialized data also would be useful. In this world, we would allow the user to specify a file (support csv or json or yaml formats?) that maps metadata fields to a filename. Then this could be the target we use for our sorting. If we had tools that could take this information and inject it into .gr files as an option, or vice-versa, parse out the gr file headers into metadata files, it would be good. This would then be very helpful in general.

With this in mind, I would suggest that we have a --sort-by that takes a string which is the field we want to sort by, a --reverse which reverses the sort sense when true. Your --temperature option could be omitted, or could be there but is a helper meaning "sort-by temperature". By default it could look for medatada..<ext> as the metadata filename where could be ["csv", "json", "yaml", "yml"] or sthg, but also have a --metadata-filename option that overrides this. What do you think?

These issues are way broader than pdfmorph, so we could add a bunch of capabilities in diffpy.utils that do the file handling, metadata serialization and deserialization in different formats so these things are available across all of diffpy (and beyond). That could be a separate little mini project.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a way more robust solution. I can start with reading metadata from headers. Could I have an example of a separate metadata mapping file before I work on that part? I can include both as standalone functions in tools so we can move them to diffpy.utils later.

'--temperature',
dest="temp",
action="store_true",
help="""Used with --sequence to sort files in DIRECTORY by temperature.
File names in DIRECTORY should end in _#K.gr or _#K.cgr (where # is a float)
to use this option.""",
)
parser.add_option(
'--rmin', type="float", help="Minimum r-value to use for PDF comparisons."
Expand Down Expand Up @@ -179,12 +196,6 @@ def custom_error(self, msg):
dest="plot",
help="Do not show the plot.",
)
group.add_option(
'--usefilenames',
action="store_true",
dest="usefilenames",
help="Use the file names as labels on plot."
)
group.add_option(
'--mlabel',
metavar="MLABEL",
Expand Down Expand Up @@ -214,6 +225,7 @@ def custom_error(self, msg):
)

# Defaults
parser.set_defaults(sequence=False)
parser.set_defaults(plot=True)
parser.set_defaults(refine=True)
parser.set_defaults(pearson=False)
Expand All @@ -227,9 +239,130 @@ def custom_error(self, msg):
def main():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's put main at the bottom right next to if __name__ ==....

parser = create_option_parser()
(opts, pargs) = parser.parse_args()
if opts.sequence:
multiple_morphs(parser, opts, pargs, stdout_flag=True)
else:
single_morph(parser, opts, pargs, stdout_flag=True)


def multiple_morphs(parser, opts, pargs, stdout_flag):
# Custom error messages since usage is distinct when --sequence tag is applied
if len(pargs) < 2:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit brittle and unmaintainable to me since we may add or remove parser options in the future (see my comments above). Can we find a more robust way to do this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since most parser options are shared by single_morph and multiple_morphs, changing most parser options would only require changes in the single_morph function. Options that are not shared between them (plotting and saving since different items are plotted/saved) are the only ones kept separate right now.

parser.custom_error("You must supply FILE and DIRECTORY. Go to --help for usage.")
elif len(pargs) > 2:
parser.custom_error("Too many arguments. Go to --help for usage.")

# Parse paths
morph_file = Path(pargs[0])
if not morph_file.is_file():
parser.custom_error(f"{morph_file} is not a file. Go to --help for usage.")
target_directory = Path(pargs[1])
if not target_directory.is_dir():
parser.custom_error(f"{target_directory} is not a directory. Go to --help for usage.")

# Sort files in directory
target_list = list(target_directory.iterdir())
if opts.temp:
# Sort by temperature
try:
target_list = tools.temperature_sort(target_list)
except ValueError:
parser.custom_error("All file names in directory must end in _#K.gr or _#K.cgr (where # is a float) to use "
"the --temperature option.")
else:
# Default is alphabetical sort
target_list.sort()

# Disable single morph plotting and saving
plot_opt = opts.plot
opts.plot = False
save_opt = opts.savefile
opts.savefile = None

# Do not morph morph_file against itself if it is in the same directory
if morph_file in target_list:
target_list.remove(morph_file)

# Morph morph_file against all other files in target_directory
results = []
for target_file in target_list:
if target_file.is_file:
results.append([
target_file.name,
single_morph(parser, opts, [morph_file, target_file], stdout_flag=False),
])

# Input parameters used for every morph
inputs = [None, None]
inputs[0] = f"# Morphed file: {morph_file.name}"
inputs[1] = "\n# Input morphing parameters:"
inputs[1] += f"\n# scale = {opts.scale}"
inputs[1] += f"\n# stretch = {opts.stretch}"
inputs[1] += f"\n# smear = {opts.smear}"

# If print enabled
if stdout_flag:
# Separated for saving
print(f"\n{inputs[0]}{inputs[1]}")

# Results from each morph
for entry in results:
outputs = f"\n# Target: {entry[0]}\n"
outputs += "# Optimized morphing parameters:\n"
outputs += "\n".join(f"# {i[0]} = {i[1]:.6f}" for i in entry[1])
print(outputs)

rws = []
target_labels = []
results_length = len(results)
for entry in results:
if opts.temp:
name = entry[0]
target_labels.append(tools.extract_temperatures([name])[0])
else:
target_labels.append(entry[0])
for item in entry[1]:
if item[0] == "Rw":
rws.append(item[1])

if len(pargs) != 2:
parser.error("You must supply FILE1 and FILE2")
if save_opt is not None:
# Save table of Rw values
try:
with open(save_opt, 'w') as outfile:
# Header
print(f"{inputs[0]}\n{inputs[1]}", file=outfile)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks a bit too chatty to me? Maybe enclose it in a if verbose: (collect verbose as -v, --verbose). In general I prefer my code to be less "chatty" because then the outputs just start looking like noise and important messages get lost and overlooked. using verbose allows debugging when things are going wrong and the user can't figure out why.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I also change this for single morphs as well? (Line 526 of this file and onward.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, everywhere......

if opts.temp:
print(f"\n# L T(K) Rw", file=outfile)
else:
print(f"\n# L Target Rw", file=outfile)

# Table
for idx in range(results_length):
print(f"{target_labels[idx]} {rws[idx]}", file=outfile)
outfile.close()

# Output to stdout
path_name = Path(outfile.name).absolute()
save_message = f"\n# Rw table saved to {path_name}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe don't prepend a blank line? It rarely looks good.

Copy link
Collaborator Author

@Sparks29032 Sparks29032 Jul 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a sample output. I wanted to keep it spaced out from the other outputs. However, after adding the --verbose option, this may change.

# Input morphing parameters:
# scale = None
# stretch = None
# smear = None

# Target: SrFe2As2_174K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.072189
# Pearson = 0.997625

# Target: SrFe2As2_180K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.085544
# Pearson = 0.996714

# Target: SrFe2As2_186K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.098680
# Pearson = 0.995607

# Target: SrFe2As2_192K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.117966
# Pearson = 0.993501

# Target: SrFe2As2_198K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.150522
# Pearson = 0.989150

# Target: SrFe2As2_204K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.163814
# Pearson = 0.987273

# Target: SrFe2As2_210K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.175156
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.224387
# Pearson = 0.978108

# Rw table saved to C:\Users\Spark\desktop\savefile.txt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I were a user I may prefer it in a csv format for easier porting to other programs? this will also be better on PDFitc. No need to print anything to screen, though since you have it you may as well leave it but nest it in if verbose. A summary that would be useful on screen would just be filename and Rw so I can quickly find the jump (i.e., the phase transition) without having to open a file. What do you think?

It makes sense to me to print to screen when it is a single pair of files (in that case no way I want to open up a file from disc"

print(save_message)

# Save failed
except FileNotFoundError as e:
save_fail_message = "\nUnable to save to designated location"
print(save_fail_message)
parser.custom_error(str(e))

if plot_opt:
pdfplot.plot_rws(target_labels, rws, opts.temp)

return results


def single_morph(parser, opts, pargs, stdout_flag):
if len(pargs) < 2:
parser.error("You must supply FILE1 and FILE2.")
elif len(pargs) > 2:
parser.error("Too many arguments.")

# Get the PDFs
x_morph, y_morph = getPDFFromFile(pargs[0])
Expand Down Expand Up @@ -355,15 +488,23 @@ def main():
morphs_in += f"\n# scale = {scale_in}"
morphs_in += f"\n# stretch = {stretch_in}"
morphs_in += f"\n# smear = {smear_in}"
print(morphs_in)

items = list(config.items())
items.sort()
# Output morph parameters
morph_results = list(config.items())
morph_results.sort()

# Ensure Rw, Pearson last two outputs
morph_results.append(("Rw", rw))
morph_results.append(("Pearson", pcc))

output = "\n# Optimized morphing parameters:\n"
output += "\n".join(f"# {i[0]} = {i[1]:.6f}" for i in items)
output += f"\n# Rw = {rw:.6f}"
output += f"\n# Pearson = {pcc:.6f}"
print(output)
output += "\n".join(f"# {i[0]} = {i[1]:.6f}" for i in morph_results)

# No stdout output when running morph sequence
if stdout_flag:
print(morphs_in)
print(output)

if opts.savefile is not None:
path_name = Path(pargs[0]).absolute()
header = "# PDF created by pdfmorph\n"
Expand Down Expand Up @@ -397,14 +538,14 @@ def main():

if opts.plot:
pairlist = [chain.xy_morph_out, chain.xy_target_out]
labels = ["morph", "target"] # Default label names
if opts.usefilenames:
labels = [pargs[0], pargs[1]]
else:
if opts.mlabel is not None:
labels[0] = opts.mlabel
if opts.tlabel is not None:
labels[1] = opts.tlabel
labels = [pargs[0], pargs[1]] # Default is to use file names

# If user chooses labels
if opts.mlabel is not None:
labels[0] = opts.mlabel
if opts.tlabel is not None:
labels[1] = opts.tlabel

# Plot extent defaults to calculation extent
pmin = opts.pmin if opts.pmin is not None else opts.rmin
pmax = opts.pmax if opts.pmax is not None else opts.rmax
Expand All @@ -415,7 +556,7 @@ def main():
pairlist, labels, rmin=pmin, rmax=pmax, maglim=maglim, mag=mag, rw=rw, l_width=l_width
)

return
return morph_results


def getPDFFromFile(fn):
Expand Down
47 changes: 47 additions & 0 deletions diffpy/pdfmorph/pdfplot.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
"""Collection of plotting functions for PDFs."""

import matplotlib.pyplot as plt
import diffpy.pdfmorph.tools as tools
from bg_mpl_stylesheet.bg_mpl_stylesheet import bg_mpl_style
import numpy

Expand Down Expand Up @@ -197,6 +198,52 @@ def comparePDFs(
return


def plot_rws(target_labels, rws, temp_flag=False):
"""
Plot Rw values for multiple morphs.
:param target_labels: Names (or temperatures if --temperature is enabled) of each file acting as target for the morph.
:type target_labels: list
:param rws: Contains the Rw values corresponding to each file.
:type rws: list
:param temp_flag: When True, temperature is extracted from file names in target_files. Then, a line chart of
Rw versus temperature is made. Otherwise (default), it plots a bar chart of Rw values per file.
:type temp_flag: bool
"""

# If we can extract temperature data
if temp_flag:
temps = target_labels

# Plot Rw vs Temperature
plt.plot(temps, rws, linestyle='-', marker='o')
plt.ylabel(r"$R_w$")
plt.xlabel(r"Temperature ($K$)")
plt.minorticks_on()

# Create bar chart for each file
else:
file_names = target_labels
# Ensure file names do not crowd
bar_size = 1
max_len = bar_size
for name in file_names:
max_len = max(max_len, len(name))
angle = numpy.arccos(bar_size / max_len)
angle *= 180 / numpy.pi # Convert to degrees
plt.xticks(rotation=angle)

# Plot Rw for each file
plt.bar(file_names, rws)
plt.ylabel(r"$R_w$")
plt.xlabel(r"Target File")

# Show plot
plt.tight_layout()
plt.show()

return


def truncatePDFs(r, gr, rmin=None, rmax=None):
"""Truncate a PDF to specified bounds.

Expand Down
42 changes: 42 additions & 0 deletions diffpy/pdfmorph/tests/test_morphsequence.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
#!/usr/bin/env python
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually we use the pattern that the test module that that tests modeul <filename> is called test_<filename>. This pattern may not be being used in pdfmorph which whose provenance is older, but when we edit code, we like the edits to follow the new standards if possible. I am not sure if it makes sense here, but if it does can we use this new pattern?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move these to a file called test_pdfmorphapp



import unittest
from pathlib import Path

from diffpy.pdfmorph.pdfmorphapp import create_option_parser, single_morph, multiple_morphs

thisfile = locals().get('__file__', 'file.py')
tests_dir = Path(thisfile).parent.resolve()
testsequence_dir = tests_dir.joinpath("testdata").joinpath("testsequence")


class TestMorphSequence(unittest.TestCase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per my comment above, we are moving to using pytest because the syntax is easier and personally I never learned unittest syntax so I find it harder to maintain. As per above, we won't go back and change old unittest code to pytest, but can we use pytest syntax in all new tests and older tests that we revisit for some reason?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll figure out how to move setUp() or reconstruct it. Should not be too difficult to change

def setUp(self):
self.parser = create_option_parser()
filenames = ["g_174K.gr", "f_180K.gr", "e_186K.gr", "d_192K.gr", "c_198K.gr", "b_204K.gr", "a_210K.gr"]
(self.opts, pargs) = self.parser.parse_args(["--scale", "1", "--stretch", "0", "-n", "--temperature"])
self.testfiles = []
for filename in filenames:
self.testfiles.append(testsequence_dir.joinpath(filename))
return

def test_morphsequence(self):
# Run multiple single morphs
single_results = []
morph_file = self.testfiles[-1]
for target_file in self.testfiles[:-1]:
pargs = [morph_file, target_file]
single_results.append(single_morph(self.parser, self.opts, pargs, stdout_flag=False))
pargs = [morph_file, testsequence_dir]

# Run a morph sequence
sequence_results = multiple_morphs(self.parser, self.opts, pargs, stdout_flag=False)

# Compare results
for idx in range(len(self.testfiles[:-1])):
assert sequence_results[idx][1] == single_results[idx]


if __name__ == '__main__':
unittest.main()
Loading