-
Notifications
You must be signed in to change notification settings - Fork 18
Writing Tutorials #81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## main #81 +/- ##
==========================================
- Coverage 75.01% 74.91% -0.11%
==========================================
Files 36 36
Lines 1437 1443 +6
==========================================
+ Hits 1078 1081 +3
- Misses 359 362 +3
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
@Sparks29032 this looks really great. |
everything looking really great! |
Adding "morph sequence" (tentative name for performing multiple morphs in one command) functionality in morph_sequence branch. Once morph_sequence is merged, will add tutorial text in this branch and compress/zip the additionalData folder. Then this branch should be ready to merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great work. please see my comments
--temperature is enabled. Plotting and saving options are for | ||
a Rw plot and table respectively.""", | ||
) | ||
parser.add_option( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is still really sorting by filename here so I think this is not needed.
Another option would be to really store by temperature where temperature is piece of metadata. Then we would need to know where the metadata came from. The gr files allow metadata in the header region so we could parse the file for a "temperature" field and then use the value of that item in alpha-numeric order. If we offer that, we could allow sorting by any metadata field, so allow the option for a user to specify a string that matches any field in the header and then sort alphanumercically by the value of those items. This seems better to me. What do you think?
I guess default behavior is generally to make the higher-T PDF the target and the lower-T the morph (it is easier to broaden than sharpen data with morphs). This may frustrate a user who has some fancy sorting that is not temperature, like a battery discharging or something and it is becoming less ordered in time or sthg, or for some other reason, so we could offer a --reverse
option so the sorting is done with the higher value morphing onto a lower value target.
Many users might have a table in their notebooks or in a csv file that does a mapping of "temperature" onto the filename (or any other item-key onto filename). Another approach would be to have tools that can parse such a file and collect the metadata that way. Moving forwards we would like to move in the direction of databases, so being able to parse the same info from json serialized data also would be useful. In this world, we would allow the user to specify a file (support csv or json or yaml formats?) that maps metadata fields to a filename. Then this could be the target we use for our sorting. If we had tools that could take this information and inject it into .gr files as an option, or vice-versa, parse out the gr file headers into metadata files, it would be good. This would then be very helpful in general.
With this in mind, I would suggest that we have a --sort-by
that takes a string which is the field we want to sort by, a --reverse
which reverses the sort sense when true. Your --temperature
option could be omitted, or could be there but is a helper meaning "sort-by temperature". By default it could look for medatada..<ext>
as the metadata filename where could be ["csv", "json", "yaml", "yml"]
or sthg, but also have a --metadata-filename
option that overrides this. What do you think?
These issues are way broader than pdfmorph, so we could add a bunch of capabilities in diffpy.utils
that do the file handling, metadata serialization and deserialization in different formats so these things are available across all of diffpy (and beyond). That could be a separate little mini project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a way more robust solution. I can start with reading metadata from headers. Could I have an example of a separate metadata mapping file before I work on that part? I can include both as standalone functions in tools so we can move them to diffpy.utils later.
@@ -227,9 +239,130 @@ def custom_error(self, msg): | |||
def main(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's put main at the bottom right next to if __name__ ==.
...
|
||
def multiple_morphs(parser, opts, pargs, stdout_flag): | ||
# Custom error messages since usage is distinct when --sequence tag is applied | ||
if len(pargs) < 2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit brittle and unmaintainable to me since we may add or remove parser options in the future (see my comments above). Can we find a more robust way to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since most parser options are shared by single_morph and multiple_morphs, changing most parser options would only require changes in the single_morph function. Options that are not shared between them (plotting and saving since different items are plotted/saved) are the only ones kept separate right now.
|
||
# Output to stdout | ||
path_name = Path(outfile.name).absolute() | ||
save_message = f"\n# Rw table saved to {path_name}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe don't prepend a blank line? It rarely looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a sample output. I wanted to keep it spaced out from the other outputs. However, after adding the --verbose
option, this may change.
# Input morphing parameters:
# scale = None
# stretch = None
# smear = None
# Target: SrFe2As2_174K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.072189
# Pearson = 0.997625
# Target: SrFe2As2_180K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.085544
# Pearson = 0.996714
# Target: SrFe2As2_186K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.098680
# Pearson = 0.995607
# Target: SrFe2As2_192K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.117966
# Pearson = 0.993501
# Target: SrFe2As2_198K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.150522
# Pearson = 0.989150
# Target: SrFe2As2_204K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.163814
# Pearson = 0.987273
# Target: SrFe2As2_210K.gr
# Optimized morphing parameters:
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.175156
# rmax = 100.010000
# rmin = 0.000000
# rstep = 0.010000
# Rw = 0.224387
# Pearson = 0.978108
# Rw table saved to C:\Users\Spark\desktop\savefile.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I were a user I may prefer it in a csv format for easier porting to other programs? this will also be better on PDFitc. No need to print anything to screen, though since you have it you may as well leave it but nest it in if verbose
. A summary that would be useful on screen would just be filename and Rw so I can quickly find the jump (i.e., the phase transition) without having to open a file. What do you think?
It makes sense to me to print to screen when it is a single pair of files (in that case no way I want to open up a file from disc"
try: | ||
with open(save_opt, 'w') as outfile: | ||
# Header | ||
print(f"{inputs[0]}\n{inputs[1]}", file=outfile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks a bit too chatty to me? Maybe enclose it in a if verbose:
(collect verbose as -v
, --verbose
). In general I prefer my code to be less "chatty" because then the outputs just start looking like noise and important messages get lost and overlooked. using verbose allows debugging when things are going wrong and the user can't figure out why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I also change this for single morphs as well? (Line 526 of this file and onward.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, everywhere......
@@ -0,0 +1,42 @@ | |||
#!/usr/bin/env python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usually we use the pattern that the test module that that tests modeul <filename>
is called test_<filename>
. This pattern may not be being used in pdfmorph which whose provenance is older, but when we edit code, we like the edits to follow the new standards if possible. I am not sure if it makes sense here, but if it does can we use this new pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll move these to a file called test_pdfmorphapp
testsequence_dir = tests_dir.joinpath("testdata").joinpath("testsequence") | ||
|
||
|
||
class TestMorphSequence(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per my comment above, we are moving to using pytest because the syntax is easier and personally I never learned unittest syntax so I find it harder to maintain. As per above, we won't go back and change old unittest code to pytest, but can we use pytest syntax in all new tests and older tests that we revisit for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll figure out how to move setUp() or reconstruct it. Should not be too difficult to change
@Sparks29032 please create a new clean branch off a merged main and move over just the things we want (rst files presumably) |
Doing some reformatting and rewriting of docscripts for automodule formatting. Initial draft of an auto-reformatter included as
DocReformat.java
.Sample documentation after initial reformatting below.
Sample 1
Sample 2