Writing Tutorials #81

Closed
wants to merge 12 commits into from
Add plotting, saving options
Andrew Yang committed Jul 6, 2023
commit 72f735e28c289faf48af43ffefd57ddc00c0f713
91 changes: 74 additions & 17 deletions diffpy/pdfmorph/pdfmorphapp.py
@@ -71,13 +71,13 @@ def custom_error(self, msg):
action="store_true",
help=f"""Changes usage to \'{prog_short} [options] FILE DIRECTORY\'. FILE
will be morphed with each file in DIRECTORY as target.
Files in directory are sorted by alphabetical order unless
--temperature is enabled. Plotting and saving are disabled
when this option is enabled.""",
Files in DIRECTORY are sorted by alphabetical order unless
--temperature is enabled. Plotting and saving options are for
a Rw plot and table respectively.""",
)
parser.add_option(
Contributor

This is still really sorting by filename here, so I think this is not needed.

Another option would be to really sort by temperature, where temperature is a piece of metadata. Then we would need to know where the metadata came from. The gr files allow metadata in the header region, so we could parse the file for a "temperature" field and then sort by the value of that item in alphanumeric order. If we offer that, we could allow sorting by any metadata field: let the user specify a string that matches any field in the header, then sort alphanumerically by the value of those items. This seems better to me. What do you think?

I guess the default behavior is generally to make the higher-T PDF the target and the lower-T the morph (it is easier to broaden than sharpen data with morphs). This may frustrate a user who has some fancy sorting that is not temperature, like a battery discharging that becomes less ordered over time, or who wants a different order for some other reason, so we could offer a --reverse option so the sorting is done with the higher value morphing onto a lower value target.

Many users might have a table in their notebooks or in a csv file that maps "temperature" onto the filename (or any other item-key onto filename). Another approach would be to have tools that can parse such a file and collect the metadata that way. Moving forward, we would like to move in the direction of databases, so being able to parse the same info from JSON-serialized data would also be useful. In this world, we would allow the user to specify a file (support csv or json or yaml formats?) that maps metadata fields to a filename. Then this could be the target we use for our sorting. If we had tools that could take this information and inject it into .gr files as an option, or vice versa, parse out the gr file headers into metadata files, it would be good. This would then be very helpful in general.

With this in mind, I would suggest that we have a --sort-by that takes a string, which is the field we want to sort by, and a --reverse, which reverses the sort sense when true. Your --temperature option could be omitted, or could stay as a helper meaning "sort-by temperature". By default it could look for metadata.<ext> as the metadata filename, where <ext> could be ["csv", "json", "yaml", "yml"] or something, but also have a --metadata-filename option that overrides this. What do you think?

These issues are way broader than pdfmorph, so we could add a bunch of capabilities in diffpy.utils that do the file handling and the metadata serialization and deserialization in different formats, so these things are available across all of diffpy (and beyond). That could be a separate little mini project.
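
The --sort-by and --reverse idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes header metadata appears as "key = value" lines (optionally prefixed with "#"), which may not match the real .gr header layout, and the names parse_header_metadata and sort_by_field are hypothetical, not existing diffpy functions.

```python
def parse_header_metadata(text):
    """Collect 'key = value' pairs from file text (assumed header layout)."""
    metadata = {}
    for line in text.splitlines():
        line = line.strip().lstrip("#").strip()
        if "=" not in line:
            continue  # Data rows and free-text lines carry no metadata
        key, _, value = line.partition("=")
        metadata[key.strip().lower()] = value.strip()
    return metadata


def _sort_key(value):
    """Order numeric values numerically and everything else as text."""
    try:
        return (0, float(value), "")
    except ValueError:
        return (1, 0.0, value)


def sort_by_field(headers, field, reverse=False):
    """Return file names ordered by one metadata field.

    headers maps each file name to the text of that file.
    """
    field = field.lower()
    keyed = []
    for name, text in headers.items():
        metadata = parse_header_metadata(text)
        if field not in metadata:
            raise KeyError(f"{name} has no '{field}' field in its header")
        keyed.append((_sort_key(metadata[field]), name))
    keyed.sort(reverse=reverse)
    return [name for _, name in keyed]


# Hypothetical file contents, used only to exercise the sketch
headers = {
    "b.gr": "# temperature = 180\n0.0 0.0\n",
    "a.gr": "# temperature = 204\n0.0 0.0\n",
    "c.gr": "# temperature = 174\n0.0 0.0\n",
}
ascending = sort_by_field(headers, "temperature")
descending = sort_by_field(headers, "temperature", reverse=True)
print(ascending)
print(descending)
```

Sorting numeric values as floats avoids the usual pitfall of plain string ordering, where "1000" would sort before "200". A --temperature flag could then be a thin alias for sorting by the "temperature" field.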

Collaborator Author

That is a way more robust solution. I can start with reading metadata from headers. Could I have an example of a separate metadata mapping file before I work on that part? I can include both as standalone functions in tools so we can move them to diffpy.utils later.
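
One possible shape for such a mapping file, offered only as a guess at what was meant: a csv with a "filename" column plus one column per metadata field, or a json object keyed by filename. The loader below is a hypothetical sketch using only the standard library; the column name "filename", the file layouts, and the function name load_metadata_map are all assumptions. yaml is left out because it would need a third-party parser.

```python
import csv
import json
import tempfile
from pathlib import Path


def load_metadata_map(path):
    """Return {filename: {field: value}} from a .csv or .json mapping file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # Assumed layout: header row with a 'filename' column,
        # then one column per metadata field.
        mapping = {}
        with open(path, newline="") as infile:
            for row in csv.DictReader(infile):
                name = row.pop("filename")
                mapping[name] = row
        return mapping
    if suffix == ".json":
        # Assumed layout: {"file.gr": {"temperature": 174}, ...}
        with open(path) as infile:
            return json.load(infile)
    raise ValueError(f"Unsupported metadata format: {suffix}")


# Write two small example files to a temporary directory and read them back
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "metadata.csv"
    csv_path.write_text(
        "filename,temperature\n"
        "SrFe2As2_174K.gr,174\n"
        "SrFe2As2_180K.gr,180\n"
    )
    json_path = Path(tmp) / "metadata.json"
    json_path.write_text(json.dumps({"SrFe2As2_174K.gr": {"temperature": 174}}))

    from_csv = load_metadata_map(csv_path)
    from_json = load_metadata_map(json_path)

print(from_csv["SrFe2As2_180K.gr"])
print(from_json["SrFe2As2_174K.gr"])
```

Note that csv values come back as strings while json preserves numbers, so any sorting code built on top would need to convert values before comparing them.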

'--temperature',
dest="temperature_sort",
dest="temp",
action="store_true",
help="""Used with --sequence to sort files in DIRECTORY by temperature.
File names in DIRECTORY should end in _#K.gr or _#K.cgr
@@ -240,8 +240,6 @@ def main():
parser = create_option_parser()
(opts, pargs) = parser.parse_args()
if opts.sequence:
opts.plot = False # Disable plotting
opts.savefile = None # Disable saving
multiple_morphs(parser, opts, pargs, stdout_flag=True)
else:
single_morph(parser, opts, pargs, stdout_flag=True)
@@ -264,32 +262,48 @@ def multiple_morphs(parser, opts, pargs, stdout_flag):

# Sort files in directory
target_list = list(target_directory.iterdir())
if opts.temperature_sort:
if opts.temp:
# Sort by temperature
target_list = tools.temperature_sort(target_list)
try:
target_list = tools.temperature_sort(target_list)
except ValueError:
parser.custom_error("All file names in directory must end in _###K.gr or _###K.cgr to use "
"the --temperature option.")
else:
# Default is alphabetical sort
target_list.sort()

# Disable single morph plotting and saving
plot_opt = opts.plot
opts.plot = False
save_opt = opts.savefile
opts.savefile = None

# Do not morph morph_file against itself if it is in the same directory
if morph_file in target_list:
target_list.remove(morph_file)

# Morph morph_file against all other files in target_directory
results = []
for target_file in target_list:
# Only morph morph_file against different files in target_directory
if target_file.is_file and morph_file != target_file:
if target_file.is_file:
results.append([
target_file.name,
single_morph(parser, opts, [morph_file, target_file], stdout_flag=False),
])

# Input parameters used for every morph
inputs = [None, None]
inputs[0] = f"# Morphed file: {morph_file.name}"
inputs[1] = "\n# Input morphing parameters:"
inputs[1] += f"\n# scale = {opts.scale}"
inputs[1] += f"\n# stretch = {opts.stretch}"
inputs[1] += f"\n# smear = {opts.smear}"

# If print enabled
if stdout_flag:
# Input parameters used for every morph
inputs = "\n# Input morphing parameters:"
inputs += f"\n# scale = {opts.scale}"
inputs += f"\n# stretch = {opts.stretch}"
inputs += f"\n# smear = {opts.smear}"

print(inputs)
# Separated for saving
print(f"\n{inputs[0]}{inputs[1]}")

# Results from each morph
for entry in results:
@@ -298,6 +312,49 @@ def multiple_morphs(parser, opts, pargs, stdout_flag):
outputs += "\n".join(f"# {i[0]} = {i[1]:.6f}" for i in entry[1])
print(outputs)

rws = []
target_labels = []
results_length = len(results)
for entry in results:
if opts.temp:
name = entry[0]
target_labels.append(tools.extract_temperatures([name])[0])
else:
target_labels.append(entry[0])
for item in entry[1]:
if item[0] == "Rw":
rws.append(item[1])

if save_opt is not None:
# Save table of Rw values
try:
with open(save_opt, 'w') as outfile:
# Header
print(f"{inputs[0]}\n{inputs[1]}", file=outfile)
Contributor

This looks a bit too chatty to me. Maybe enclose it in an if verbose: block (collect verbose as -v, --verbose). In general I prefer my code to be less "chatty", because otherwise the outputs just start looking like noise and important messages get lost and overlooked. Using verbose allows debugging when things are going wrong and the user can't figure out why.

Collaborator Author

Should I also change this for single morphs as well? (Line 526 of this file and onward.)

Contributor

yes, everywhere......
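
A minimal sketch of the -v, --verbose suggestion, using optparse since the diff already calls parser.add_option. The helper names create_demo_parser and report are hypothetical, and the real option would be added to the existing parser in pdfmorphapp.py rather than to a new one.

```python
from optparse import OptionParser


def create_demo_parser():
    """Build a stand-alone parser with only the proposed verbose flag."""
    parser = OptionParser()
    parser.add_option(
        "-v",
        "--verbose",
        dest="verbose",
        action="store_true",
        default=False,
        help="Print additional details to the terminal.",
    )
    return parser


def report(opts, summary, details):
    """Always include the summary; include details only when verbose."""
    lines = [summary]
    if opts.verbose:
        lines.extend(details)
    return "\n".join(lines)


parser = create_demo_parser()
quiet_opts, _ = parser.parse_args([])
verbose_opts, _ = parser.parse_args(["-v"])

details = ["# scale = None", "# stretch = None", "# smear = None"]
quiet_text = report(quiet_opts, "# Rw = 0.072189", details)
verbose_text = report(verbose_opts, "# Rw = 0.072189", details)
print(quiet_text)
print(verbose_text)
```

Routing all output through one helper like this keeps the verbosity check in a single place, which makes it easier to apply the same rule to both single and multiple morphs.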

if opts.temp:
print(f"\n# L T(K) Rw", file=outfile)
else:
print(f"\n# L Target Rw", file=outfile)

# Table
for idx in range(results_length):
print(f"{target_labels[idx]} {rws[idx]}", file=outfile)
outfile.close()

# Output to stdout
path_name = Path(outfile.name).absolute()
save_message = f"\n# Rw table saved to {path_name}"
Contributor

maybe don't prepend a blank line? It rarely looks good.

Collaborator Author @Sparks29032, Jul 7, 2023

Here is a sample output. I wanted to keep it spaced out from the other outputs. However, after adding the --verbose option, this may change.

# Input morphing parameters:
# scale = None
# stretch = None
# smear = None

# Target: SrFe2As2_174K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.072189
# Pearson = 0.997625

# Target: SrFe2As2_180K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.085544
# Pearson = 0.996714

# Target: SrFe2As2_186K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.098680
# Pearson = 0.995607

# Target: SrFe2As2_192K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.117966
# Pearson = 0.993501

# Target: SrFe2As2_198K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.150522
# Pearson = 0.989150

# Target: SrFe2As2_204K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.163814
# Pearson = 0.987273

# Target: SrFe2As2_210K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.175156
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.224387
# Pearson = 0.978108

# Rw table saved to C:\Users\Spark\desktop\savefile.txt

Contributor

If I were a user, I might prefer it in a csv format for easier porting to other programs. This will also be better on PDFitc. There is no need to print anything to screen, though since you have it you may as well leave it but nest it in if verbose. A summary that would be useful on screen would just be filename and Rw, so I can quickly find the jump (i.e., the phase transition) without having to open a file. What do you think?

It makes sense to me to print to screen when it is a single pair of files (in that case there is no way I want to open up a file from disk).
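
The csv table and the short on-screen summary could look something like the sketch below. The function names write_rw_csv and largest_jump are hypothetical, and using the largest change in Rw between neighbouring targets as "the jump" is an assumption about what a useful summary would be. The Rw values are the ones from the sample output earlier in this thread.

```python
import csv
import io


def write_rw_csv(outfile, labels, rws, label_name="target"):
    """Write one row per target with its Rw value in csv format."""
    writer = csv.writer(outfile, lineterminator="\n")
    writer.writerow([label_name, "Rw"])
    for label, rw in zip(labels, rws):
        writer.writerow([label, f"{rw:.6f}"])


def largest_jump(labels, rws):
    """Return the pair of adjacent labels with the largest change in Rw."""
    if len(rws) < 2:
        raise ValueError("Need at least two Rw values to find a jump")
    diffs = [abs(b - a) for a, b in zip(rws, rws[1:])]
    idx = diffs.index(max(diffs))
    return labels[idx], labels[idx + 1]


# Values taken from the sample output posted above
temps = [174, 180, 186, 192, 198, 204, 210]
rws = [0.072189, 0.085544, 0.098680, 0.117966, 0.150522, 0.163814, 0.175156]

buffer = io.StringIO()  # Stands in for the file opened from the save option
write_rw_csv(buffer, temps, rws, label_name="T(K)")
table = buffer.getvalue()
jump = largest_jump(temps, rws)

print(table)
print(f"Largest Rw change between {jump[0]} K and {jump[1]} K")
```

Writing to any file-like object keeps the function easy to test and lets the same code target a file on disk or an in-memory buffer.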

print(save_message)

# Save failed
except FileNotFoundError as e:
save_fail_message = "\nUnable to save to designated location"
print(save_fail_message)
parser.custom_error(str(e))

if plot_opt:
pdfplot.plot_rws(target_labels, rws, opts.temp)

return results


47 changes: 47 additions & 0 deletions diffpy/pdfmorph/pdfplot.py
@@ -15,6 +15,7 @@
"""Collection of plotting functions for PDFs."""

import matplotlib.pyplot as plt
import diffpy.pdfmorph.tools as tools
from bg_mpl_stylesheet.bg_mpl_stylesheet import bg_mpl_style
import numpy

@@ -207,6 +208,52 @@ def comparePDFs(
return


def plot_rws(target_labels, rws, temp_flag=False):
"""
Plot Rw values for multiple morphs.
:param target_labels: Names (or temperatures if --temperature is enabled) of each file acting as target for the morph.
:type target_labels: list
:param rws: Contains the Rw values corresponding to each file.
:type rws: list
:param temp_flag: When True, temperature is extracted from file names in target_files. Then, a line chart of
Rw versus temperature is made. Otherwise (default), it plots a bar chart of Rw values per file.
:type temp_flag: bool
"""

# If we can extract temperature data
if temp_flag:
temps = target_labels

# Plot Rw vs Temperature
plt.plot(temps, rws, linestyle='-', marker='o')
plt.ylabel(r"$R_w$")
plt.xlabel(r"Temperature ($K$)")
plt.minorticks_on()

# Create bar chart for each file
else:
file_names = target_labels
# Ensure file names do not crowd
bar_size = 5
max_len = bar_size
for name in file_names:
max_len = max(max_len, len(name))
angle = numpy.arccos(bar_size / max_len)
angle *= 180 / numpy.pi # Convert to degrees
plt.xticks(rotation=angle)

# Plot Rw for each file
plt.bar(file_names, rws)
plt.ylabel(r"$R_w$")
plt.xlabel(r"Target File")

# Show plot
plt.tight_layout()
plt.show()

return


def truncatePDFs(r, gr, rmin=None, rmax=None):
"""Truncate a PDF to specified bounds.

35 changes: 25 additions & 10 deletions diffpy/pdfmorph/tools.py
@@ -113,21 +113,36 @@ def readPDF(fname):


def nn_value(val, name):
# Convenience function for ensuring certain non-negative inputs
"""Convenience function for ensuring certain non-negative inputs."""
if val < 0:
negative_value_warning = f"\n# Negative value for {name} given. Using absolute value instead."
print(negative_value_warning)
return -val
return val


def temperature_sort(filenames):
def temperature_sort(filepaths):
"""Sort a list of files by temperatures encoded in their name. Files should all end in _###K.gr or _###K.cgr."""

# Get temperature from file names
filenames = []
for path in filepaths:
filenames.append(path.name)
temps = extract_temperatures(filenames)

# Sort files (whose paths are contained in filenames) ending in _###K.gr/_###K.cgr by ###
for idx in range(len(filenames)):
filename = filenames[idx].name
s_index = filename.rfind("_") # Start of temperature value
e_index = filename.rfind("K") # End of temperature value
temp = float(filename[s_index + 1: e_index])
filenames[idx] = [filenames[idx], temp]
filenames.sort(key=lambda entry: entry[1])
return [entry[0] for entry in filenames]
for idx in range(len(filepaths)):
filepaths[idx] = [filepaths[idx], temps[idx]]
filepaths.sort(key=lambda entry: entry[1])
return [entry[0] for entry in filepaths]


def extract_temperatures(filenames):
"""Convenience function to extract temperatures from file names."""

temps = []
for name in filenames:
s_index = name.rfind("_") # Start of temperature value
e_index = name.rfind("K") # End of temperature value
temps.append(float(name[s_index + 1: e_index]))
return temps
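
To illustrate the file-name convention that --temperature relies on, here is the extract_temperatures helper from the diff above, copied into a stand-alone snippet and run on file names from the sample output. The name "notes.txt" is a made-up example of a file that does not follow the _###K.gr pattern; it shows where the ValueError caught in multiple_morphs comes from.

```python
def extract_temperatures(filenames):
    """Extract temperatures from file names ending in _###K.gr or _###K.cgr."""
    temps = []
    for name in filenames:
        s_index = name.rfind("_")  # Start of temperature value
        e_index = name.rfind("K")  # End of temperature value
        temps.append(float(name[s_index + 1: e_index]))
    return temps


names = ["SrFe2As2_204K.gr", "SrFe2As2_174K.gr", "SrFe2As2_180K.gr"]
temps = extract_temperatures(names)

# Pair each temperature with its name and order by temperature
ordered = [name for _, name in sorted(zip(temps, names))]
print(temps)
print(ordered)

# A name without the _###K pattern cannot be converted to a float
try:
    extract_temperatures(["notes.txt"])
    failed = False
except ValueError:
    failed = True
print(failed)
```

Because the slice is taken between the last "_" and the last "K", any stray file in the directory is enough to trigger the error, which is why the error message asks that all file names follow the pattern.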