I am starting out with bulk ATAC data as bed files that include the read counts. I want to use this data for a package called MOFA, which requires these preprocessing steps:
Normalisation: For count-based data such as RNA-seq or ATAC-seq we recommend size factor normalisation + variance stabilisation (i.e. a log transformation).
It is strongly recommended that you select highly variable features (HVGs) per assay before fitting the model. This ensures a faster training and a more robust inference procedure. Also, for data modalities that have very different dimensionalities we suggest a stronger feature selection fort he bigger views, with the aim of reducing the feature imbalance between data modalities.