A Minimal Case
What does the input/output look like?
This tutorial walks step by step through running cryoSTAR on a synthetic dataset, so you can see what the inputs and outputs look like at each stage. Let's jump right in! 😊📖
Data Preparation
The image below illustrates the process of generating the synthetic dataset. More details can be found in our paper.

Download the data from Google Drive (a link for download via wget command will be provided soon):
Extract the zip file into cryostar/projects/star. The directory looks like:

The directory contains three sub-directories:
pdbs contains 50 pdb files, where 1akeA_{i}.pdb is an interpolation between 1akeA_1.pdb (pdbid: 4ake) and 1akeA_50.pdb (pdbid: 1ake). Note that the proteins 4ake and 1ake have identical sequences but different conformations. The interpolation between these two conformations is generated using PyMOL's morph tool.
mrcs contains 50 mrc files, where 1akeA_{i}.mrc is the density corresponding to 1akeA_{i}.pdb. The mrc files are generated with EMAN2's e2pdb2mrc tool.
uniform_snr0-0001_ctf contains the particles projected from the mrc files. We add CTF distortions and Gaussian noise to random projections.
We also provide some results reconstructed with RELION. The density rln.mrc is reconstructed from all particles. In contrast, rln_reconstruct/rln{i}.mrc is reconstructed only from the particles generated from the i-th mrc file.
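After extraction, it can be worth confirming that no frame is missing. The sketch below is not part of cryoSTAR: the helper name missing_frames is hypothetical, and the script assumes only the pdbs/mrcs layout and the 1akeA_{i} naming described above. It runs against a throwaway directory; point it at your own extracted folder instead.

```python
import tempfile
from pathlib import Path

EXPECTED_FRAMES = 50  # the tutorial states 50 pdb files and 50 mrc files


def missing_frames(root: Path) -> dict:
    """Return, per sub-directory, the expected frame files that are absent."""
    missing = {}
    for subdir, ext in (("pdbs", "pdb"), ("mrcs", "mrc")):
        expected = [f"1akeA_{i}.{ext}" for i in range(1, EXPECTED_FRAMES + 1)]
        missing[subdir] = [
            name for name in expected if not (root / subdir / name).is_file()
        ]
    return missing


# Demo on a temporary directory that mimics the described layout.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    for subdir, ext in (("pdbs", "pdb"), ("mrcs", "mrc")):
        (demo_root / subdir).mkdir()
        for i in range(1, EXPECTED_FRAMES + 1):
            (demo_root / subdir / f"1akeA_{i}.{ext}").touch()
    # Simulate an incomplete extraction by deleting one density file.
    (demo_root / "mrcs" / "1akeA_7.mrc").unlink()
    report = missing_frames(demo_root)

print(report)
```

An empty list for both sub-directories means every frame from 1 to 50 is present.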
Here are some visualizations of the dataset.

Heterogeneous Reconstruction
Step 1: Reconstruct Atomic Structures
Set dataset_dir in atom_configs/1ake.py to the path where you extracted the data. This step takes 65 minutes on four V100 GPUs.
$ python train_atom.py atom_configs/1ake.py
Overview of the Outputs
In this case, the result is saved to work_dirs/atom_1ake_0. We will particularly focus on the 0123_0024000 folder. This folder holds the results obtained from the 123rd epoch and the 24,000th step of the training procedure.
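If a work directory holds several result folders, the epoch/step naming makes it easy to pick the most recent one programmatically. The following is a minimal sketch, assuming every result folder is named as zero-padded EPOCH_STEP as in 0123_0024000; the helper names parse_result_folder and latest_result_folder are hypothetical and not part of cryoSTAR.

```python
import re
import tempfile
from pathlib import Path

# Assumed naming convention: digits, underscore, digits (e.g. "0123_0024000").
FOLDER_PATTERN = re.compile(r"^(\d+)_(\d+)$")


def parse_result_folder(name: str):
    """Return (epoch, step) as ints, or None if the name does not match."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def latest_result_folder(work_dir: Path):
    """Return the result folder with the highest (epoch, step), or None."""
    candidates = [
        p for p in work_dir.iterdir()
        if p.is_dir() and parse_result_folder(p.name) is not None
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda p: parse_result_folder(p.name))


# Demo on a temporary directory standing in for work_dirs/atom_1ake_0.
with tempfile.TemporaryDirectory() as tmp:
    work_dir = Path(tmp)
    for name in ("0061_0012000", "0123_0024000", "logs"):
        (work_dir / name).mkdir()
    latest_name = latest_result_folder(work_dir).name

epoch, step = parse_result_folder("0123_0024000")
print(latest_name, epoch, step)
```

Folders whose names do not match the pattern (such as the made-up logs folder in the demo) are ignored.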

Key Output 1: A Stack of Atomic Structures (pca-*.pdb)
Open the pca-1.pdb file, which contains 10 structures sampled along the first PCA dimension of the latent space. We use ChimeraX for animation: open the file and enter the command mseries slider all.
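To make "sampled along the first PCA dimension" concrete, here is a rough NumPy sketch of one way to pick 10 latent codes along the first principal component. It is an illustration only: the function name sample_along_first_pc, the 5th to 95th percentile range, and the random stand-in data are all assumptions, and cryoSTAR's own sampling may differ. In the real pipeline, each sampled code would then be decoded into a structure by the trained model.

```python
import numpy as np


def sample_along_first_pc(z, num_points=10, lo=5.0, hi=95.0):
    """Return num_points latent codes evenly spaced along the first PC of z."""
    z = np.asarray(z, dtype=float)
    mean = z.mean(axis=0)
    centered = z - mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = vt[0]
    # Project every latent code onto the first principal direction.
    scores = centered @ pc1
    # Assumed range: between the lo-th and hi-th percentile of the projections.
    start, stop = np.percentile(scores, [lo, hi])
    steps = np.linspace(start, stop, num_points)
    # Map the 1-D positions back into the latent space.
    return mean + steps[:, None] * pc1


# Stand-in latent codes: 200 particles, 3 latent dimensions.
rng = np.random.default_rng(0)
z_demo = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.3])

samples = sample_along_first_pc(z_demo)
print(samples.shape)  # (10, 3)
```

Consecutive samples differ by a constant vector, so the resulting series moves through the latent space in even steps along a single direction.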

Key Output 2: Latent Codes (z.npy)
Another key file is z.npy, which contains the latent code of each particle. This differs from traditional 3D classification, which assigns a discrete label (e.g., class-1, class-2, class-3) to every particle. cryoSTAR instead assigns each particle a continuous label in the form of a vector (e.g., [0.1, 0.3, 0.4]). The distance between two latent codes measures how similar the underlying conformations of the corresponding particles are.
z.npy is a 2-D matrix of shape (num_particles x latent_dimension). In the image below, z.npy has shape 25x3 because there are 25 particles and the latent dimension is set to 3.
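The snippet below sketches how such a file can be inspected with NumPy. It writes a random 25x3 stand-in for z.npy to a temporary location so that it runs on its own; replace z_path with your real output file. Using Euclidean distance to compare latent codes is an assumption made here for illustration.

```python
import os
import tempfile

import numpy as np

# Stand-in for a real z.npy: 25 particles, latent dimension 3.
rng = np.random.default_rng(0)
z_path = os.path.join(tempfile.mkdtemp(), "z.npy")
np.save(z_path, rng.normal(size=(25, 3)))

# Load the latent codes; each row belongs to one particle.
z = np.load(z_path)
num_particles, latent_dim = z.shape
print(num_particles, latent_dim)  # 25 3

# Pairwise Euclidean distances between all latent codes (25 x 25 matrix).
diff = z[:, None, :] - z[None, :, :]
dist = np.linalg.norm(diff, axis=-1)

# Particle whose latent code is closest to that of particle 0,
# excluding particle 0 itself.
d0 = dist[0].copy()
d0[0] = np.inf
nearest = int(np.argmin(d0))
print(nearest, dist[0, nearest])
```

A small distance suggests that two particles were imaged from similar conformations; a large one suggests different conformations.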

Step 2: Reconstruct Densities
Set dataset_dir in density_configs/1ake.py, and replace xxx/z.npy in the command below with the path to the latest z.npy produced in step 1. This step takes about 10 minutes on four V100 GPUs.
$ python train_density.py density_configs/1ake.py --cfg-options extra_input_data_attr.given_z=xxx/z.npy
Overview of the Outputs
In this case, the result is saved to work_dirs/density_1ake_0. We are particularly interested in the 0019_0015640 folder. As before, the name encodes the 19th epoch and the 15,640th step of the training process.

Key Output: Densities (*.mrc)
Let's take a look at the vol_pca_1_*.mrc files! These are 10 volumes produced by cryoSTAR's density generator, using z.npy as an additional input. Let's visualize them with ChimeraX again.

🤔 Without a PDB File?
But what if I do not have a reference pdb file?
No problem! cryoSTAR can handle this case: just run train_density.py without specifying z.npy.
$ python train_density.py density_configs/1ake.py
Note that this looks similar to cryoDRGN, with some differences. For instance, cryoDRGN applies certain pre-processing steps, including pre-shifting, CTF phase-flipping, and pre-computation of Fourier transforms, among others. cryoSTAR eliminates the need for many such steps while maintaining both quality and speed.