Does SCVI automatically use highly variable genes?

The SCVI tutorials recommend pre-selecting highly variable genes before training the SCVI model. Here is a snippet from the harmonization tutorial (docs.scvi-tools.org/en/stable/user_guide/notebooks/harmonization.html):

adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
adata.raw = adata  # keep full dimension safe
sc.pp.highly_variable_genes(
    adata,
    flavor="seurat_v3",
    n_top_genes=2000,
    layer="counts",
    batch_key="batch",
    subset=True,
)

What confuses me is that they set subset=True, which I understand to mean that the non-variable genes are not filtered out and the highly variable ones are only marked.
Then they train the SCVI model:

scvi.data.setup_anndata(adata, layer="counts", batch_key="batch")
vae = scvi.model.SCVI(adata)
vae.train()

How does SCVI know which genes are highly variable and which are not? Is it because of the "counts" layer? Does anybody know whether that layer contains only the highly variable genes, or whether it marks them in a way SCVI understands?


scRNA-seq


SCVI


Highly variable genes
