This repository has been archived by the owner on Mar 22, 2024. It is now read-only.

add export options #18

Merged
merged 6 commits into from
Nov 25, 2020

Conversation

manonreau
Collaborator

Add different export options:

  • Export training, evaluation and test data in HDF5 (mol name, prediction, y)

  • Choose the data export frequency: all epochs or intermediate (every 5 epochs for now)

  • Export the best or the last model

Member

@NicoRenaud NicoRenaud left a comment

Great addition! One small remark about opening the h5 file with 'w': it automatically erases any existing file with that name, which can be dangerous. To avoid accidentally erasing data we want to keep, I would change that. Otherwise very good!!

"""

# Output file
fname = os.path.join(self.outdir, hdf5)
self.f5 = h5py.File(fname, 'w')
Member

By default this will erase the file fname if it exists, which can be dangerous if you rerun an experiment but want to keep the previous results. I would add a check to see whether the file exists and, if it does, change the name by appending a number, e.g. train_data_001.hdf5.

Collaborator Author

Done ! If train_data_001.hdf5 exists, then I change it to train_data_002.hdf5 and so on.

Member

great !
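
The renaming scheme agreed on above could look roughly like the sketch below. The helper name `unique_fname` is hypothetical and the merged code may differ. It is also an assumption that the first run keeps the plain name and that later runs get a zero-padded counter.

```python
import os
import tempfile


def unique_fname(outdir, hdf5):
    """Return a path inside outdir that does not exist yet.

    Hypothetical helper: if outdir/hdf5 is free it is returned
    unchanged, otherwise a zero-padded counter is inserted before
    the extension (train_data_001.hdf5, train_data_002.hdf5, ...).
    """
    fname = os.path.join(outdir, hdf5)
    base, ext = os.path.splitext(fname)
    count = 0
    while os.path.exists(fname):
        count += 1
        fname = '%s_%03d%s' % (base, count, ext)
    return fname


# Small demonstration in a throwaway directory.
outdir = tempfile.mkdtemp()
first = unique_fname(outdir, 'train_data.hdf5')
open(first, 'w').close()  # pretend a previous run wrote this file
second = unique_fname(outdir, 'train_data.hdf5')
print(os.path.basename(first), os.path.basename(second))
```

The existence check and the later file creation are two separate steps, so two processes starting at the same moment could still pick the same name. Opening the file with the h5py mode 'x' (or 'w-'), which fails if the file already exists, is a stricter alternative.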

if (save_epoch == 'all') or (epoch == nepoch):
    self._export_epoch_hdf5(epoch, self.data)

elif (save_epoch == 'intermediate') and (epoch % 5 == 0):
Member

You could add save_every=n as a named argument of the function so that we can decide how often we save.

Collaborator Author

Done, the default is set to 5.
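
As a minimal sketch of the suggestion, the export decision with a configurable `save_every` might look like this. The standalone function `should_export` is hypothetical (the real check lives inside the training loop). It also assumes that epochs are numbered from 1 and that the truncated `elif` branch exports the epoch.

```python
def should_export(epoch, nepoch, save_epoch='intermediate', save_every=5):
    """Decide whether the data of this epoch should be exported.

    Hypothetical helper mirroring the condition in the diff:
    - 'all' exports every epoch,
    - the last epoch is always exported,
    - 'intermediate' exports every save_every-th epoch.
    """
    if save_epoch == 'all' or epoch == nepoch:
        return True
    return save_epoch == 'intermediate' and epoch % save_every == 0


# Epochs exported for a 12-epoch run with the default of 5.
exported = [e for e in range(1, 13) if should_export(e, 12)]
print(exported)  # [5, 10, 12]
```

Keeping the final epoch unconditional means the last state is always on disk even when nepoch is not a multiple of save_every.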

Comment on lines +547 to +589
def _export_epoch_hdf5(self, epoch, data):
    """Export the epoch data to the hdf5 file.

    Export the data of a given epoch in train/valid/test group.
    In each group are stored the predicted values (outputs),
    ground truth (targets) and molecule name (mol).

    Args:
        epoch (int): index of the epoch
        data (dict): data of the epoch
    """

    # create a group
    grp_name = 'epoch_%04d' % epoch
    grp = self.f5.create_group(grp_name)

    grp.attrs['task'] = self.task
    grp.attrs['target'] = self.target
    grp.attrs['batch_size'] = self.batch_size

    # loop over the pass_type : train/valid/test
    for pass_type, pass_data in data.items():

        # we don't want to break the process in case of issue
        try:

            # create subgroup for the pass
            sg = grp.create_group(pass_type)

            # loop over the data : target/output/molname
            for data_name, data_value in pass_data.items():

                # mol name is a bit different
                # since these are strings
                if data_name == 'mol':
                    string_dt = h5py.special_dtype(vlen=str)
                    sg.create_dataset(
                        data_name, data=data_value, dtype=string_dt)

                # output/target values
                else:
                    sg.create_dataset(data_name, data=data_value)

        except TypeError:
            logger.exception("Error in export epoch to hdf5")
Member

Very nice :) !

Collaborator Author

I took that part from Deeprank;)

Member

it looked familiar :)
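
For readers who want to inspect the exported file, here is a hedged sketch of reading one epoch back. The layout epoch_XXXX/<pass_type>/<dataset> follows the function above. The dataset names 'outputs' and 'targets' are taken from its docstring and are an assumption about the actual keys, and `read_epoch` is a hypothetical helper that is not part of the PR.

```python
import os
import tempfile

import h5py


def read_epoch(fname, epoch, pass_type):
    """Return (mol, outputs, targets) for one epoch and one pass.

    Hypothetical reader; the dataset names are assumed from the
    docstring of _export_epoch_hdf5.
    """
    with h5py.File(fname, 'r') as f5:
        sg = f5['epoch_%04d' % epoch][pass_type]
        # variable-length strings come back as bytes in h5py 3.x
        # and as str in h5py 2.x, so handle both
        mol = [m.decode() if isinstance(m, bytes) else m
               for m in sg['mol'][()]]
        outputs = sg['outputs'][()].tolist()
        targets = sg['targets'][()].tolist()
    return mol, outputs, targets


# Build a tiny file with the same layout, then read it back.
demo_path = os.path.join(tempfile.mkdtemp(), 'train_data.hdf5')
with h5py.File(demo_path, 'w') as f5:
    sg = f5.create_group('epoch_0001').create_group('train')
    sg.create_dataset('mol', data=['1abc', '2xyz'],
                      dtype=h5py.special_dtype(vlen=str))
    sg.create_dataset('outputs', data=[0.1, 0.9])
    sg.create_dataset('targets', data=[0.0, 1.0])

mol, outputs, targets = read_epoch(demo_path, 1, 'train')
```

Opening the file in mode 'r' keeps the inspection read-only, so previously exported results cannot be overwritten by accident.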

@manonreau manonreau merged commit 64039b0 into master Nov 25, 2020
@NicoRenaud NicoRenaud deleted the export branch November 26, 2020 08:17
2 participants