pyMBE-dev · pm-blanco · Nov 12, 2024 · Sep 13, 2024 · Sep 13, 2024 · Oct 20, 2024
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,16 +8,22 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Changed
+- The sample script `plot_HH.py` has been replaced by specific examples of how to plot data post-processed with pyMBE: `plot_branched_polyampholyte.py`, `plot_peptide.py`, and `plot_peptide_mixture_grxmc_ideal.py`. (#95)
+- Sample scripts now take the pH as an argparse input instead of looping over several pH values. This enables parallelization of the sample scripts and avoids conflicts with the current post-processing pipeline. (#95)
 - Switched from `os.makedirs` to `Path().mkdir()` to prevent occasional failure of the scripts when running them in parallel. (#91)
 - `pmb.set_reduced_units()` now redefines the reduced units instead of creating a new instance of `pint.UnitRegistry`. Therefore, the user can do operations between objects defined before and after changing the set of reduced units without getting a `ValueError`. (#89)
 - `pmb.set_reduced_units()` now checks that the arguments provided by the user have the right dimensionality. (#89)
 - The constants stored as attributes in `pyMBE.pymbe_library` now use the values established in the 2019 SI. The values are taken directly from `scipy.constants` instead of being hard-coded. (#86)
 - Switched to CTest for testing, which allows the tests to run in parallel. (#87)
 
 ### Added
-- Unit testing for reaction methods (#86)
+- New sample script `analyze_time_series.py` showing how to use the analysis tools in pyMBE for post-processing time series from the sample scripts. (#95)
+- A new optional argument `ignore_files` for `lib.analysis.analyze_time_series`, which accepts a list of files to be ignored during post-processing of time series. (#95)
+- Functional testing for all sample scripts. (#95)
+- Unit testing for reaction methods. (#86)
 
 ### Fixed
+- `lib.analysis.get_dt` now raises a `ValueError` if the first two rows of the dataframe have the same time value, which breaks the subsequent code. (#95)
 - Removed global state variables; they are now created by the constructor of `pyMBE.pymbe_library`. This prevents two instances of the pyMBE library from sharing the same memory address for their attributes. (#89)
 - Required Python dependency versions compatible with ESPResSo 4.2 (#84)
 - Fixed several deprecated paths and function names in `tutorials/pyMBE_tutorial.ipynb`. (#77, #78, #79, #80, #81)
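
The switch to `Path().mkdir()` listed under "Changed" matters when several workers start at once and each tries to create the same output folder. The following is a minimal sketch of that idiom, assuming the call is made with `parents=True, exist_ok=True` (the changelog does not state which arguments pyMBE passes). The helper `ensure_output_dir` and the folder names are hypothetical and used only for illustration.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def ensure_output_dir(path):
    """Create `path` (and missing parents) without failing if it already exists."""
    folder = Path(path)
    # exist_ok=True makes the call idempotent, so callers that lose the race
    # to create the folder do not raise FileExistsError.
    folder.mkdir(parents=True, exist_ok=True)
    return folder


# Hypothetical folder layout, used only for this demonstration.
base_dir = Path(tempfile.mkdtemp())
target = base_dir / "time_series" / "demo_system"

# Eight workers request the same folder at the same time.
with ThreadPoolExecutor(max_workers=8) as pool:
    created = list(pool.map(ensure_output_dir, [target] * 8))

print(target.is_dir(), len(created))
```

Every worker receives the same existing folder, regardless of which one created it first.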

diff --git a/README.md b/README.md
@@ -63,16 +63,17 @@ sudo apt install python3-venv
 To set up pyMBE, the users need to install its virtual environment, install its Python dependencies and configure the path to the ESPResSo build folder as follows:
 
 ```sh
-python3 -m venv pymbe
-source pymbe/bin/activate
-python3 maintainer/configure_venv.py --espresso_path=/home/user/espresso/build # adapt path
+python3 -m venv pymbe  # creates a local folder named pymbe, which contains the virtual environment
+source pymbe/bin/activate  # activates the pymbe venv
+python3 maintainer/configure_venv.py --espresso_path=/home/user/espresso/build # please, adapt the espresso path accordingly
 python3 -m pip install -r requirements.txt
-deactivate
+python3 simulation_script.py # run the espresso simulation script
+deactivate  # deactivate the virtual environment
 ```
 
-We highlight that the path `/home/user/espresso/build` is just an example of a possible
-path to the ESPResSo build folder. The user should change this path to match
-the local absolute path were ESPResSo was installed. 
+Note that the path `/home/user/espresso/build` is just an example of a possible path to the ESPResSo build folder.
+The user should change this path to match the local absolute path where ESPResSo was installed.
+For more details on how to install ESPResSo, please consult the [ESPResSo installation guide](https://espressomd.github.io/doc4.2.2/installation.html).
 
 The pyMBE virtual environment can be deactivated at any moment:
 ```sh

diff --git a/lib/analysis.py b/lib/analysis.py
@@ -42,14 +42,15 @@ def add_data_to_df(df, data_dict, index):
                          index=index)])
     return updated_df
 
-def analyze_time_series(path_to_datafolder, filename_extension= ".csv", minus_separator = False,):
+def analyze_time_series(path_to_datafolder, filename_extension= ".csv", minus_separator = False, ignore_files=None):
     """
     Analyzes all time series stored in `path_to_datafolder` using the block binning method.
 
     Args:
         path_to_datafolder(`str`): path to the folder with the files with the time series
         filename_extension(`str`): extension of the file. Defaults to ".csv"
         minus_separator(`bool`): switch to enable the minus as a separator. Defaults to False.
+        ignore_files(`list`): list of filenames to be ignored in the binning analysis. Defaults to None (no files are ignored).
 
     Returns:
         data(`Pandas.Dataframe`): Dataframe with the time averages of all the time series in the datafolder.
@@ -59,10 +60,18 @@ def analyze_time_series(path_to_datafolder, filename_extension= ".csv", minus_se
 
     """
     data=pd.DataFrame()
+    if ignore_files is None:
+        ignore_files=[]
     with os.scandir(path_to_datafolder) as subdirectory:
         # Gather all data
         for subitem in subdirectory:
             if subitem.is_file():
+                ignore_file=False
+                for file in ignore_files:
+                    if set(file.split()) == set(subitem.name.split()):
+                        ignore_file=True
+                if ignore_file:
+                    continue
                 if filename_extension in subitem.name:
                     # Get parameters from the file name
                     data_dict=get_params_from_file_name(file_name=subitem.name,
@@ -190,6 +199,8 @@ def get_dt(data, time_col = "time", relative_tolerance = 0.01, verbose = False):
         raise ValueError(f"Column \'{time_col}\' not found in columns: "+str( data.columns.to_list() ) )
     imax = data.shape[0]
     dt_init = time[1] - time[0]
+    if dt_init < 1e-8:
+        raise ValueError(f"The two first rows contain data samples at the same simulation time: time[0] = {time[0]} time[1] = {time[1]}. Post-processing of data with repeated time values is not supported because it breaks the estimation of the autocorrelation time.")
     warn_lines = []
     for i in range(1,imax):
         dt = time[i] - time[i-1]
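
The guard added to `get_dt` can be exercised in isolation. The sketch below mirrors the `dt_init < 1e-8` comparison from the hunk above on a plain list of times. `check_initial_time_step` is a hypothetical stand-in, not a function from `lib.analysis`. Because the comparison uses `<` rather than an absolute-value test, it also rejects a time column that decreases between the first two rows.

```python
def check_initial_time_step(time, tolerance=1e-8):
    """Return time[1] - time[0], raising ValueError if it is below `tolerance`."""
    dt_init = time[1] - time[0]
    if dt_init < tolerance:
        # Repeated (or decreasing) times would break any later estimate
        # that divides by, or scales with, the sampling interval.
        raise ValueError(
            "The first two rows do not advance in simulation time: "
            f"time[0] = {time[0]} time[1] = {time[1]}."
        )
    return dt_init


# A well-formed time column passes and yields the initial time step.
print(check_initial_time_step([0.0, 0.5, 1.0]))

# A repeated first time value is rejected.
try:
    check_initial_time_step([0.0, 0.0, 0.5])
except ValueError as error:
    print(f"rejected: {error}")
```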

diff --git a/samples/Beyer2024/README.md b/samples/Beyer2024/README.md
@@ -11,5 +11,4 @@ where the previous line will run the script to produce Fig. 7a in Ref.[^1] The u
 
 The optional argparse argument `--plot` controls if these scripts generate the corresponding plot or if the data is simply stored to file. We note that the format of the plots can differ from that of our publication [^1]. These scripts are part of the continuous integration (CI) scheme of the pyMBE library and they are used to ensure that any stable version of the library reproduces the benchmarks.
 
-
-[^1]: David Beyer, Paola B. Torres, Sebastian P. Pineda, Claudio F. Narambuena, Jean-Noël Grad, Peter Košovan, Pablo M. Blanco; pyMBE: The Python-based molecule builder for ESPResSo. J. Chem. Phys. 14 July 2024; 161 (2): 022502. [https://doi.org/10.1063/5.0216389](https://doi.org/10.1063/5.0216389)
+[^1]: D. Beyer, P. B. Torres, S. P. Pineda, C. F. Narambuena, J. N. Grad, P. Košovan, P. M. Blanco. J. Chem. Phys. (2024), 161 (2), 022502. doi: [10.1063/5.0216389](https://doi.org/10.1063/5.0216389).
diff --git a/samples/README.md b/samples/README.md
@@ -0,0 +1,28 @@
+# Pipeline of the sample scripts in pyMBE
+
+## Production scripts
+Production scripts show examples of how to set up various systems with pyMBE and ESPResSo.
+These scripts sample the systems for a specific set of conditions and take as argparse arguments various inputs for the simulations (for example, the pH of the solution).
+When run, the production scripts collect the time series of various quantities and store them in CSV files for later post-processing.
+These CSV files are systematically named using the input argparse arguments, so that each time series can be traced back to the specific system that produced it.
+Examples of production scripts are: `branched_polyampholyte.py`, `peptide_cpH.py`, `peptide_mixture_grxmc_ideal.py` and `salt_solution_gcmc.py`. 
+
+## Analysis scripts
+Analysis scripts show examples of how to analyze the time series produced with the production scripts of pyMBE.
+These scripts read the time series stored by the production scripts and post-process them, calculating the ensemble mean, the error of the mean and the autocorrelation time using the block analysis method.[^1]
+These quantities are stored together with their corresponding input conditions, extracted from the filename, in a wrapper CSV file for plotting and further analysis.
+An example of an analysis script is `analyze_time_series.py`.
+
+## Plotting scripts
+Plotting scripts show examples of how to plot data post-processed with the analysis scripts of pyMBE and of how to use the toolbox of pyMBE to calculate various analytical solutions.
+Examples of plotting scripts are: `plot_branched_polyampholyte.py`, `plot_peptide_cpH.py`, and `plot_peptide_mixture_grxmc_ideal.py`.
+
+[^1]: Janke, W. (2002). Statistical analysis of simulations: Data correlations and error estimation. Quantum simulations of complex many-body systems: from theory to algorithms, 10, 423-445. 
+
+## Example on how to use the pipeline
+The sample scripts are designed to be used in the following order: (i) production script, (ii) analysis script and (iii) plotting script. For example:
+```bash
+python3 branched_polyampholyte.py # By default, stores the time series in `time_series/branched_polyampholyte`
+python3 analyze_time_series.py --data_folder time_series/branched_polyampholyte # By default, stores the post-processed data in `time_series/branched_polyampholyte/analyzed_data.csv`
+python3 plot_branched_polyampholyte.py # By default, reads the averaged data in `time_series/branched_polyampholyte/analyzed_data.csv`
+```
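
For readers unfamiliar with the block analysis mentioned in the "Analysis scripts" section, here is a minimal standard-library sketch of the binning estimates described by Janke (2002). It illustrates the method only and is not the implementation in `lib.analysis`. The name `block_binning` and its arguments are hypothetical, and the autocorrelation time is returned in units of the sampling interval.

```python
import math
import statistics


def block_binning(samples, block_size):
    """Estimate mean, error of the mean and integrated autocorrelation time."""
    n_blocks = len(samples) // block_size
    if n_blocks < 2:
        raise ValueError("At least two complete blocks are required.")
    # Drop trailing samples that do not fill a complete block.
    used = samples[: n_blocks * block_size]
    block_means = [
        statistics.mean(used[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]
    mean = statistics.mean(block_means)
    var_blocks = statistics.variance(block_means)  # scatter of block averages
    var_samples = statistics.variance(used)        # scatter of raw samples
    # Error of the mean from the variance of the block averages.
    err_mean = math.sqrt(var_blocks / n_blocks)
    # tau_int = block_size * var_blocks / (2 * var_samples); undefined for constant data.
    if var_samples > 0:
        tau_int = block_size * var_blocks / (2.0 * var_samples)
    else:
        tau_int = float("nan")
    return mean, err_mean, tau_int


# Toy series with obvious correlation between neighbouring samples.
series = [1.0, 1.0, 3.0, 3.0, 1.0, 1.0, 3.0, 3.0]
mean, err_mean, tau_int = block_binning(series, block_size=2)
print(mean, err_mean, tau_int)
```

The estimates are only meaningful when the block size is much larger than the autocorrelation time, so in practice the block size is increased until the error estimate stops growing.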
diff --git a/samples/analyze_time_series.py b/samples/analyze_time_series.py
@@ -0,0 +1,33 @@
+#
+# Copyright (C) 2024 pyMBE-dev team
+#
+# This file is part of pyMBE.
+#
+# pyMBE is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# pyMBE is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import argparse
+from lib import analysis
+
+parser = argparse.ArgumentParser(description='Sample script to analyze time series from the other sample scripts using the binning method.')
+parser.add_argument('--data_folder',
+                    type=str,
+                    required=True,
+                    help='path to the data folder with the time series')
+args = parser.parse_args()
+
+# Read and analyze time series
+analyzed_data=analysis.analyze_time_series(path_to_datafolder=args.data_folder,
+                                            ignore_files=["analyzed_data.csv","df.csv"])
+analyzed_data.to_csv(f"{args.data_folder}/analyzed_data.csv", 
+                        index=False)
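
The script passes `ignore_files=["analyzed_data.csv","df.csv"]` so that a second run over the same folder does not treat its own output as a time series. The effect can be sketched with the standard library alone. `list_time_series_files` is a hypothetical helper that uses a plain name comparison, which agrees with the set-based comparison in `lib/analysis.py` for file names without whitespace. The file names created below are made up for the demonstration.

```python
import os
import tempfile


def list_time_series_files(folder, filename_extension=".csv", ignore_files=None):
    """Return sorted names of files in `folder` to analyze, skipping `ignore_files`."""
    if ignore_files is None:
        ignore_files = []
    selected = []
    with os.scandir(folder) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            if entry.name in ignore_files:
                continue  # e.g. the output of a previous analysis run
            if filename_extension in entry.name:
                selected.append(entry.name)
    return sorted(selected)


# Build a throwaway folder with two time series and one stale analysis output.
demo_folder = tempfile.mkdtemp()
for name in ["pH_3_time_series.csv", "pH_7_time_series.csv", "analyzed_data.csv"]:
    with open(os.path.join(demo_folder, name), "w") as handle:
        handle.write("time,charge\n")

print(list_time_series_files(demo_folder, ignore_files=["analyzed_data.csv", "df.csv"]))
```

Names in the ignore list that are absent from the folder, such as `df.csv` here, are simply skipped.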