This repository contains key parts of the code for the training procedure of Blush regularization. In particular, it contains the data augmentation function and the loss function that were used in the paper. To train a model, one needs to prepare the training dataset and write the training loop.
The model definition used in the paper is available here. The cryo-EM maps used as the training dataset can be downloaded from the EMDB. A list of entry IDs along with the manually curated masks can be downloaded from Zenodo: 10.5281/zenodo.10553452
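Because the training loop is left to the user, the following is a minimal sketch of a single noise2noise-style optimization step in PyTorch. The tiny convolutional network, the masked mean-squared-error loss, the pairing of `grid1` with `gt_grid2`, and the tensor shapes are all assumptions made for illustration. They stand in for the model definition and loss function of the paper and do not reproduce them.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in network; the real model definition is linked above.
model = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv3d(8, 1, kernel_size=3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)


def train_step(model, optimizer, grid1, gt_grid2, mask):
    """Run one optimization step predicting the other half-map from grid1."""
    model.train()
    optimizer.zero_grad()
    pred = model(grid1)
    # Masked MSE is an assumption here, not the loss shipped in this repository.
    loss = ((pred - gt_grid2) ** 2 * mask).sum() / mask.sum().clamp(min=1.0)
    loss.backward()
    optimizer.step()
    return loss.item()


# Random tensors stand in for augmented training data,
# shaped (batch, channel, depth, height, width).
torch.manual_seed(0)
grid1 = torch.randn(2, 1, 16, 16, 16)
gt_grid2 = torch.randn(2, 1, 16, 16, 16)
mask = torch.ones(2, 1, 16, 16, 16)

loss_value = train_step(model, optimizer, grid1, gt_grid2, mask)
```

In a real loop, `train_step` would be called once per batch, with the tensors taken from the augmentation function described below and the repository's own loss function substituted for the masked MSE.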
The augment_data function is designed to augment 3D cryo-electron microscopy (cryo-EM) maps for noise2noise training. It manipulates two half-maps (half1 and half2) and a mask (mask), applying various transformations and noise augmentations to enhance training data diversity and quality.
Parameters:
- half1: numpy array - The first half of the 3D cryo-EM map.
- half2: numpy array - The second half of the 3D cryo-EM map.
- mask: numpy array - A mask indicating regions of interest in the cryo-EM maps.
- box_size: int - The size of the output.
- voxel_size: float - The size of each voxel in the cryo-EM map.
- do_smooth_solvent: bool, default True - Determines whether to apply smoothing in the solvent regions.
- augment_orientational_bias: bool, default False - Enables augmentation for orientational bias in the data.
- augment_bfactor: bool, default False - Enables augmentation simulating the effect of B-factor sharpening.
- augment_noise: bool, default False - Enables noise augmentation in the frequency domain.
- device: torch.device, default torch.device("cpu") - The device on which to perform the computations (CPU or GPU).
- verbose: bool, default False - Enables verbose output for debugging or informational purposes.
Process:
- Pre-processing:
  - Stacks and converts the input half-maps into a 3D grid.
  - Computes the spectral amplitude of the mean grid.
  - Normalizes and Fourier-transforms the grids and masks.
- Noise Augmentation (if augment_noise is True):
  - Applies noise in the frequency domain based on an exponential distribution.
- Input Filtering (if augment_bfactor is True):
  - Applies B-factor sharpening using a low-pass filter.
- Real-space Processing:
  - Transforms the Fourier-transformed grids back to real-space.
  - Adjusts the mean and standard deviation.
- Real-space Masked Smoothing (if do_smooth_solvent is True):
  - Applies low-pass filtering and smoothing in solvent regions.
- Augment Orientational Bias (if augment_orientational_bias is True):
  - Applies directional blur to simulate orientational bias.
- Final Processing:
  - Applies a radial mask to the grids.
  - Resizes the grids to the specified box_size.
  - Normalizes the final output grids.
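Two of the steps above, B-factor filtering in Fourier space and radial masking in real space, can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the code in this repository: `apply_bfactor` and `spherical_mask` are hypothetical helpers, the filter uses the conventional weighting exp(-B s^2 / 4) with s in 1/Å, and the cosine-shaped soft edge of the mask is an arbitrary choice.

```python
import numpy as np


def apply_bfactor(volume, voxel_size, bfactor):
    """Weight Fourier components by exp(-B * s^2 / 4), with s in 1/Angstrom.

    A positive B dampens high frequencies (blurring); a negative B sharpens.
    """
    # Frequency axes in cycles per Angstrom, one per volume dimension.
    freqs = [np.fft.fftfreq(n, d=voxel_size) for n in volume.shape]
    fz, fy, fx = np.meshgrid(*freqs, indexing="ij")
    s2 = fx**2 + fy**2 + fz**2
    weights = np.exp(-bfactor * s2 / 4.0)
    # The weights are symmetric in frequency, so the result is real up to rounding.
    return np.fft.ifftn(np.fft.fftn(volume) * weights).real


def spherical_mask(box_size, radius, edge_width=3.0):
    """Soft-edged spherical mask: 1 inside `radius`, cosine fall-off to 0."""
    centre = (box_size - 1) / 2.0
    axis = np.arange(box_size) - centre
    z, y, x = np.meshgrid(axis, axis, axis, indexing="ij")
    r = np.sqrt(x**2 + y**2 + z**2)
    # t runs from 0 at the sphere surface to 1 at the outer end of the soft edge.
    t = np.clip((r - radius) / edge_width, 0.0, 1.0)
    return 0.5 * (1.0 + np.cos(np.pi * t))


# A random volume stands in for a half-map.
rng = np.random.default_rng(0)
volume = rng.standard_normal((16, 16, 16))
blurred = apply_bfactor(volume, voxel_size=1.0, bfactor=50.0)
masked = blurred * spherical_mask(16, radius=5.0)
```

The sketch works on a single NumPy volume for clarity; the function in this repository operates on torch tensors on the chosen device.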
Returns:
- grid1, grid2: torch.Tensor - The augmented half-maps after all transformations.
- gt_grid1, gt_grid2: torch.Tensor - The ground truth half-maps after all transformations.
- mask: torch.Tensor - The transformed mask.
Notes:
- Ensure that half1, half2, and mask are numpy arrays of the same size.
- The function supports both CPU and GPU processing, controlled by the device parameter.
- Enable verbose mode for detailed log output, useful for troubleshooting and understanding the internal processing steps.
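The requirement that the three inputs are numpy arrays of the same size can be checked before calling the augmentation function. The helper below, `check_inputs`, is hypothetical and not part of this repository; it also assumes that the maps are three-dimensional.

```python
import numpy as np


def check_inputs(half1, half2, mask):
    """Raise an error unless all inputs are 3D numpy arrays of equal shape."""
    arrays = {"half1": half1, "half2": half2, "mask": mask}
    for name, arr in arrays.items():
        if not isinstance(arr, np.ndarray):
            raise TypeError(f"{name} must be a numpy array, got {type(arr).__name__}")
        if arr.ndim != 3:
            raise ValueError(f"{name} must be 3D, got {arr.ndim} dimensions")
    if not (half1.shape == half2.shape == mask.shape):
        raise ValueError(
            f"shape mismatch: half1 {half1.shape}, half2 {half2.shape}, mask {mask.shape}"
        )


# Placeholder arrays of matching shape pass the check silently.
example = np.zeros((8, 8, 8), dtype=np.float32)
check_inputs(example, example.copy(), np.ones_like(example))
```

Failing early with a descriptive message is usually easier to debug than a shape error raised deep inside the Fourier-space processing.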
# Sample usage
import torch

# half1, half2 and mask are numpy arrays of the same size, loaded beforehand;
# augment_data is the function provided in this repository.
augmented_half1, augmented_half2, gt_half1, gt_half2, updated_mask = augment_data(
    half1, half2, mask,
    box_size=64, voxel_size=1.0,
    do_smooth_solvent=True,
    augment_orientational_bias=False,
    augment_bfactor=False,
    augment_noise=True,
    device=torch.device("cuda"),
    verbose=True,
)

The work in this repository is based on the paper "Data-driven regularisation lowers the size barrier of cryo-EM structure determination" by Kimanius et al.
For the complete details of the research and methods that inspired this work, please refer to the following publication:
Kimanius, D., et al. (2023) Data-driven regularisation lowers the size barrier of cryo-EM structure determination. bioRxiv. DOI: 10.1101/2023.10.23.563586.
This project is licensed under the MIT License - see the LICENSE.md file for details.