A Minimal Case
What does the input/output look like?
This tutorial walks through using cryoSTAR on a synthetic dataset, step by step, from data preparation to the final outputs. Let's jump right in! 😊📖
Data Preparation
The image below illustrates the process of generating the synthetic dataset. More details can be found in our paper.
Download the data from Google Drive (a link for downloading with the wget command will be provided soon):
Extract the zip file into cryostar/projects/star. The directory then looks like:
The directory contains three sub-directories:
pdbs contains 50 pdb files, where 1akeA_{i}.pdb is the interpolation between 1akeA_1.pdb (pdbid: 4ake) and 1akeA_50.pdb (pdbid: 1ake). Please note that the proteins 4ake and 1ake have identical sequences but different conformations. The interpolation between these two conformations is generated with PyMOL's morph tool.
mrcs contains 50 mrc files, where 1akeA_{i}.mrc is the density corresponding to 1akeA_{i}.pdb. Each mrc file is generated with EMAN2's e2pdb2mrc tool.
uniform_snr0-0001_ctf contains the particles projected from the mrc files. We add CTF distortions and Gaussian noise to random projections.
We also provide some results reconstructed with RELION. The density rln.mrc is reconstructed from all particles. In contrast, rln_reconstruct/rln{i}.mrc is reconstructed only from the particles generated from the i-th mrc file.
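As a quick sanity check after extraction, the layout described above can be verified with a few lines of Python. This is only a sketch: the function name summarize_dataset and the root argument are hypothetical, and the root should point at wherever the extracted dataset actually lives on your machine.

```python
from pathlib import Path


def summarize_dataset(root):
    """Count files per sub-directory under a (hypothetical) dataset root."""
    root = Path(root)
    return {
        # Sub-directory names are taken from the description above.
        "pdbs": len(list((root / "pdbs").glob("*.pdb"))),
        "mrcs": len(list((root / "mrcs").glob("*.mrc"))),
        "particles_dir_exists": (root / "uniform_snr0-0001_ctf").is_dir(),
    }
```

For an intact download, the pdbs and mrcs counts should both be 50, matching the description above.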
Here are some visualizations of the dataset.
Heterogeneous Reconstruction
Step 1: Reconstruct Atomic Structures
You need to set dataset_dir in your atom_configs/1ake.py to the path where you extracted the data. This step takes 65 minutes on a machine with four V100 GPUs.
Overview of the Outputs
In this case, the result is saved to work_dirs/atom_1ake_0. We will focus on the 0123_0024000 folder, which holds the results from the 123rd epoch and the 24,000th step of training.
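The folder naming convention can be decoded programmatically, which is handy when scripting over many output folders. The helper below is a hypothetical illustration, not part of cryoSTAR; it assumes the name is always two integers joined by a single underscore.

```python
def parse_checkpoint_name(name):
    """Split a folder name such as '0123_0024000' into (epoch, step)."""
    epoch_str, step_str = name.split("_")
    # int() drops the zero padding.
    return int(epoch_str), int(step_str)


print(parse_checkpoint_name("0123_0024000"))  # → (123, 24000)
```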
Key Output 1: A Stack of Atomic Structures (pca-*.pdb)
Open the pca-1.pdb file, which contains 10 structures sampled along the first PCA dimension of the latent space. We use ChimeraX for animation: open the file and enter the command mseries slider all.
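To give a feel for what "sampled along the first PCA dimension" can mean, here is a minimal NumPy sketch. It is an assumption about the general technique, not cryoSTAR's actual implementation: the function name, the 5th to 95th percentile range, and the even spacing are all illustrative choices. The input is an array of latent codes with one row per particle.

```python
import numpy as np


def sample_along_first_pc(z, num_points=10, lo=5.0, hi=95.0):
    """Return num_points latent vectors spaced along the first principal axis."""
    z = np.asarray(z, dtype=float)
    mean = z.mean(axis=0)
    centered = z - mean
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = vt[0]
    # Coordinate of every particle along the first principal axis.
    scores = centered @ pc1
    # Evenly spaced positions between two percentiles (an assumed range).
    start, stop = np.percentile(scores, [lo, hi])
    positions = np.linspace(start, stop, num_points)
    # Map the 1-D positions back into the latent space.
    return mean + positions[:, None] * pc1
```

Each returned row is a latent vector; decoding the rows in order would yield a sequence of structures like the 10 stored in pca-1.pdb.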
Key Output 2: Latent Codes (z.npy)
Another key file is z.npy, which contains the latent code of each particle. This differs from traditional 3D classification, which assigns a discrete label (e.g., class-1, class-2, class-3) to every particle. In contrast, cryoSTAR assigns each particle a continuous label in the form of a vector (e.g., [0.1, 0.3, 0.4]). The distance between latent codes measures the similarity of the particles' underlying conformations.
z.npy is a 2-D matrix of shape (num_particles x latent_dimension). In the image below, z.npy has shape 25x3, since there are 25 particles and the latent dimension is set to 3.
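The latent codes can be inspected directly with NumPy. The sketch below compares one particle against all others; the function name is hypothetical, and Euclidean distance is an assumption, since the text above does not specify which distance is meant.

```python
import numpy as np


def latent_distances(z, query_index):
    """Euclidean distance from one particle's latent code to all others."""
    z = np.asarray(z, dtype=float)
    if z.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {z.shape}")
    # Broadcasting subtracts the query row from every row.
    return np.linalg.norm(z - z[query_index], axis=1)
```

In practice you would load the array with np.load on your own z.npy path, check that its shape is (num_particles, latent_dimension), and then call latent_distances(z, 0) to rank particles by similarity to the first one.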
Step 2: Reconstruct Densities
You need to set dataset_dir in your density_configs/1ake.py and change the following xxx/z.npy to the path of the latest z.npy file produced in step 1. This step takes about 10 minutes on a machine with four V100 GPUs.
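Locating the latest z.npy by hand is error-prone when a run has written many output folders. The helper below is a hypothetical sketch, not part of cryoSTAR; it assumes each output folder is named with two zero-padded integers joined by an underscore and holds z.npy directly inside it.

```python
from pathlib import Path


def latest_z_path(work_dir):
    """Return the z.npy inside the highest-numbered output folder."""
    candidates = []
    for z_file in Path(work_dir).glob("*_*/z.npy"):
        epoch_str, _, step_str = z_file.parent.name.partition("_")
        # Skip folders whose names do not match the two-integer pattern.
        if epoch_str.isdigit() and step_str.isdigit():
            candidates.append(((int(epoch_str), int(step_str)), z_file))
    if not candidates:
        raise FileNotFoundError(f"no z.npy found under {work_dir}")
    # Compare numerically so the result does not depend on zero padding.
    return max(candidates, key=lambda item: item[0])[1]
```

Calling latest_z_path on the step 1 output directory then gives the path to substitute for xxx/z.npy.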
Overview of the Outputs
In this case, the result is saved to work_dirs/density_1ake_0. We are particularly interested in the 0019_0015640 folder. As before, the name denotes the 19th epoch and the 15,640th step of training.
Key Output: Densities (*.mrc)
Let's take a look at the vol_pca_1_*.mrc files! These are 10 volumes produced by cryoSTAR's density generator, using z.npy as an additional input. Let's visualize them with ChimeraX again.
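Before opening the volumes in a viewer, it can be useful to confirm that all of them have the same grid size. The sketch below reads only the first four header words of an MRC file (NX, NY, NZ, MODE) using the standard library. The function name is hypothetical, and little-endian byte order is assumed; to read the actual voxel data, a dedicated reader such as the mrcfile package is the safer choice.

```python
import struct


def read_mrc_shape(path):
    """Return (nx, ny, nz, mode) from the first 16 bytes of an MRC file."""
    with open(path, "rb") as f:
        header = f.read(16)
    if len(header) < 16:
        raise ValueError(f"{path} is too short to be an MRC file")
    # Assumes little-endian byte order, the common case for MRC files.
    nx, ny, nz, mode = struct.unpack("<4i", header)
    return nx, ny, nz, mode
```

Looping over the vol_pca_1_*.mrc files and printing the result for each should show identical dimensions; mode 2 denotes 32-bit floating point data.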
🤔 Without a PDB File?
Wait, what if I do not have a reference pdb file?
No problem! cryoSTAR can handle this case: just run the train_density code without specifying z.npy!
Note that this looks similar to cryoDRGN but with some differences. For instance, cryoDRGN applies certain pre-processing steps, including pre-shifting, CTF phase-flipping, and pre-computation of Fourier transforms, among others. cryoSTAR eliminates the need for many such pre-processing steps while maintaining both quality and speed.