# A Minimal Case

This tutorial walks through, step by step, how to use cryoSTAR on a synthetic dataset. Let's jump right in! 😊📖

## Data Preparation

The image below illustrates the process of generating the synthetic dataset. More details can be found in our paper.

<figure><img src="/files/WSpeax05vlCCfedGjv6z" alt="" width="375"><figcaption><p>The process of generating the synthetic dataset (1ake)</p></figcaption></figure>

Download the dataset from [Google Drive](https://drive.google.com/file/d/1OMvwBFUjR-97-nRMAcqXBfpNqEn5eFvw/view), [Zenodo](https://zenodo.org/records/17581921), or via command line:

`wget https://zenodo.org/records/17581921/files/tutorial_data_1ake.zip`

Extract the zip file into `cryostar/projects/star`. The directory then looks like this:

<figure><img src="/files/MzyA2kexwIo8gGxdHp5F" alt="" width="355"><figcaption><p>Overview of the inputs</p></figcaption></figure>

The directory contains three sub-directories:

* `pdbs` contains 50 `pdb` files, where `1akeA_{i}.pdb` is an interpolation between `1akeA_1.pdb` ([pdbid: 4ake](https://www.rcsb.org/structure/4ake)) and `1akeA_50.pdb` ([pdbid: 1ake](https://www.rcsb.org/structure/1ake)). Note that the proteins 4ake and 1ake have identical sequences but different conformations. The interpolation between these two conformations is generated using PyMOL's *morph* tool.
* `mrcs` contains 50 `mrc` files, where `1akeA_{i}.mrc` is the density corresponding to `1akeA_{i}.pdb`. The `mrc` files are generated with EMAN2's *e2pdb2mrc* tool.
* `uniform_snr0-0001_ctf` contains the particles projected from the `mrc` files. We add CTF distortions and Gaussian noise to random projections.
  * We also provide some results reconstructed with RELION. The density `rln.mrc` is reconstructed from all particles. In contrast, `rln_reconstruct/rln{i}.mrc` is reconstructed only from the particles generated from the i-th `mrc`.
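Before training, it can help to confirm that the extracted data matches the layout described above. The following is a minimal sketch using only the Python standard library. The helper names `summarize_dataset` and `missing_indices` are hypothetical and not part of cryoSTAR, and the root path you pass in depends on where you extracted the archive.

```python
from pathlib import Path


def summarize_dataset(root):
    """Count the files in each expected sub-directory of the tutorial data."""
    root = Path(root)
    return {
        "pdbs": len(list((root / "pdbs").glob("1akeA_*.pdb"))),
        "mrcs": len(list((root / "mrcs").glob("1akeA_*.mrc"))),
        "particles_dir_exists": (root / "uniform_snr0-0001_ctf").is_dir(),
    }


def missing_indices(root, n=50):
    """Return indices i in 1..n that lack a 1akeA_{i}.pdb or 1akeA_{i}.mrc file."""
    root = Path(root)
    return [
        i
        for i in range(1, n + 1)
        if not (root / "pdbs" / f"1akeA_{i}.pdb").is_file()
        or not (root / "mrcs" / f"1akeA_{i}.mrc").is_file()
    ]


# Example (the path is a placeholder for your own extraction directory):
# print(summarize_dataset("path/to/extracted/data"))
# print(missing_indices("path/to/extracted/data"))
```

For a complete download, both counts should be 50 and `missing_indices` should return an empty list.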

Here are some visualizations of the dataset.

<figure><img src="/files/yBJYtAWbZqVXNX4Dd2wa" alt="" width="375"><figcaption></figcaption></figure>

## Heterogeneous Reconstruction

### Step 1: Reconstruct Atomic Structures

Set `dataset_dir` in `atom_configs/1ake.py` to the path where you extracted the data. This step takes 65 minutes on four V100 GPUs.

```bash
$ python train_atom.py atom_configs/1ake.py
```

#### Overview of the Outputs

In this case, the result is saved to `work_dirs/atom_1ake_0`. We will particularly focus on the `0123_0024000` folder. This folder holds the results obtained from the 123rd epoch and the 24,000th step of the training procedure.

<figure><img src="/files/aZhiVBpdmnGpBpWUs0pL" alt="" width="309"><figcaption><p>Overview of the outputs</p></figcaption></figure>
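The folder naming convention described above (`<epoch>_<step>`) is easy to decode programmatically. This is a small illustrative sketch. The function name `parse_checkpoint_name` is hypothetical and not part of cryoSTAR.

```python
def parse_checkpoint_name(name):
    """Split a folder name such as '0123_0024000' into (epoch, step)."""
    epoch_str, step_str = name.split("_")
    # int() discards the zero padding used in the folder name.
    return int(epoch_str), int(step_str)


epoch, step = parse_checkpoint_name("0123_0024000")
print(epoch, step)  # prints: 123 24000
```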

#### Key Output 1: A Stack of Atomic Structures (pca-\*.pdb)

The `pca-1.pdb` file contains 10 structures sampled along the first PCA dimension of the latent space. We use ChimeraX for animation: open the file and enter the command `mseries slider all`.

<figure><img src="/files/R9ibLPr8NGrL9KIwyPhr" alt=""><figcaption><p>Visualization of pca-1.pdb</p></figcaption></figure>
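To build intuition for what "sampled along the first PCA dimension" means, here is a rough NumPy sketch that picks evenly spaced latent vectors along a principal axis. It is an illustration under stated assumptions, not necessarily how cryoSTAR produces `pca-1.pdb`: the function name `sample_along_pc` is hypothetical, and the choice of the 5th to 95th percentile range is an assumption made for this example.

```python
import numpy as np


def sample_along_pc(z, component=0, num_samples=10, low=5.0, high=95.0):
    """Return num_samples latent vectors spaced evenly along one principal axis."""
    z = np.asarray(z, dtype=float)
    mean = z.mean(axis=0)
    # Rows of vt are the principal axes of the centered latent codes.
    _, _, vt = np.linalg.svd(z - mean, full_matrices=False)
    axis = vt[component]
    # Project every latent code onto the chosen axis.
    scores = (z - mean) @ axis
    # Assumed range: between the low and high percentiles of the projections.
    start, stop = np.percentile(scores, [low, high])
    steps = np.linspace(start, stop, num_samples)
    # Move from the mean along the axis; all other components stay at zero.
    return mean + steps[:, None] * axis[None, :]


# Example with random stand-in data (25 particles, 3 latent dimensions):
rng = np.random.default_rng(0)
samples = sample_along_pc(rng.normal(size=(25, 3)))
print(samples.shape)  # prints: (10, 3)
```

Each returned row is a point in latent space. In a generative model of this kind, decoding those points in order would give a sequence of structures like the one animated above.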

#### Key Output 2: Latent Codes (z.npy)

Another key file is `z.npy`, which contains the latent code of each particle. This differs from traditional 3D classification, which assigns a discrete label (e.g., class-1, class-2, class-3) to every particle. In contrast, cryoSTAR assigns each particle a continuous label in the form of a vector (e.g., \[0.1, 0.3, 0.4]). The distance between two latent codes measures how similar the underlying conformations of the corresponding particles are.

`z.npy` stores a 2-D matrix of shape (num\_particles x latent\_dimension). In the image below, the matrix is 25x3 because there are 25 particles and the latent dimension is set to 3.

<figure><img src="/files/xbSPINICpmaS9BftXwWi" alt=""><figcaption><p>z.npy</p></figcaption></figure>
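Since distances between latent codes reflect conformational similarity, one simple way to explore `z.npy` is to look up the particles closest to a given one. The sketch below assumes Euclidean distance, which the text does not specify, and uses a hypothetical helper name `nearest_particles`. A small hand-made array stands in for the real file.

```python
import numpy as np


def nearest_particles(z, index, k=5):
    """Return indices of the k particles whose latent codes are closest to z[index]."""
    z = np.asarray(z, dtype=float)
    # Euclidean distance from the query code to every latent code (an assumption).
    dists = np.linalg.norm(z - z[index], axis=1)
    order = np.argsort(dists, kind="stable")
    # Drop the query particle itself, then keep the k closest.
    return [int(i) for i in order if i != index][:k]


# Stand-in data; with real output you would use: z = np.load("xxx/z.npy")
z = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0], [0.2, 0.0, 0.0]])
print(z.shape)                     # prints: (4, 3)
print(nearest_particles(z, 0, 2))  # prints: [1, 3]
```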

### Step 2: Reconstruct Densities

Set `dataset_dir` in `density_configs/1ake.py`, and replace `xxx/z.npy` in the command below with the path to the latest `z.npy` file produced in Step 1. This step takes about 10 minutes on four V100 GPUs.

```bash
$ python train_density.py density_configs/1ake.py --cfg-options extra_input_data_attr.given_z=xxx/z.npy
```
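If you would rather not look up the latest `z.npy` by hand, a short script can find it. This is a sketch under assumptions: it presumes that `z.npy` sits directly inside the `<epoch>_<step>` folders of the Step 1 output directory, and the helper name `latest_z_path` is hypothetical and not part of cryoSTAR.

```python
from pathlib import Path


def latest_z_path(work_dir):
    """Return the z.npy in the checkpoint folder with the highest step, or None."""
    candidates = []
    for path in Path(work_dir).glob("*/z.npy"):
        parts = path.parent.name.split("_")
        # Keep only folders named like '<epoch>_<step>' with numeric parts.
        if len(parts) == 2 and all(p.isdigit() for p in parts):
            candidates.append((int(parts[1]), int(parts[0]), path))
    if not candidates:
        return None
    # Rank by step first, then by epoch.
    return max(candidates, key=lambda c: c[:2])[2]


# Print the Step 2 command with the discovered path filled in.
z_path = latest_z_path("work_dirs/atom_1ake_0")
if z_path is not None:
    print(
        "python train_density.py density_configs/1ake.py "
        f"--cfg-options extra_input_data_attr.given_z={z_path}"
    )
```

If your output layout differs from the assumption above, adjust the glob pattern accordingly.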

#### Overview of the Outputs

In this case, the result is saved to `work_dirs/density_1ake_0`. We are particularly interested in the `0019_0015640` folder. Its name follows the same convention as in Step 1: it holds the results from the 19th epoch and the 15,640th step of training.

<figure><img src="/files/YJeMgQyRzEY2MwtTghxf" alt="" width="375"><figcaption><p>Overview of the outputs</p></figcaption></figure>

#### Key Output: Densities (\*.mrc)

Let's take a look at the `vol_pca_1_*.mrc` files! These are 10 volumes produced by cryoSTAR's density generator, using `z.npy` as an additional input. Let's visualize them with ChimeraX again.

<figure><img src="/files/cDuHpcpXG1O1Hv6e0zSr" alt=""><figcaption><p>Visualization of generated densities</p></figcaption></figure>

## 🤔 Without a PDB File?

But what if I do not have a reference PDB file?

No problem! cryoSTAR can handle this case. Just run `train_density.py` without specifying `z.npy`:

```bash
$ python train_density.py density_configs/1ake.py
```

Note that this mode resembles cryoDRGN, with some differences. For instance, cryoDRGN applies several pre-processing steps, including pre-shifting, CTF phase-flipping, and the pre-computation of Fourier transforms. cryoSTAR eliminates the need for many such pre-processing steps while maintaining both quality and speed.

