Generate scATACpipe config file

Input type

Reference genome source

ArchR genome source

Preprocessing strategy

Is genome index (bwa) available?

Is genome index (chromap) available?

Is genome index (cellranger) available?

Other pipeline-level parameters

CSV file (fragment)
? Full path to CSV file containing fragment file info.
CSV file (FASTQ)
? Full path to a CSV file in which each row has four columns: the absolute paths to R1, R2, and R3 from one run of one sample, plus a unique sample name. An example can be found here. If a sample has sequencing data from multiple lanes or runs, the FASTQ files for each lane or run must be supplied in separate rows that share the same sample name. Importantly, if a sample has sequencing data from multiple libraries, give each library a distinct sample name so that data from different libraries are not collapsed. The INPUT_CHECK_FASTQ sub-workflow checks whether the input CSV file is valid. To speed up preprocessing of large FASTQ files, set the command-line parameter split_fastq to split the files into chunks of 20 million reads each using the SPLIT_FASTQ module.
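The row layout described above can be checked before launching the pipeline. The following is a minimal sketch only: the column names (sample_name, r1, r2, r3), the function name check_samplesheet, and the example paths are assumptions made for illustration, and the pipeline's own INPUT_CHECK_FASTQ sub-workflow remains the authoritative check. It groups rows by sample name, so multiple lanes or runs of one sample end up together, and rejects relative paths.

```python
import csv
import io
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical column names; consult the pipeline's example CSV for the real header.
COLUMNS = ["sample_name", "r1", "r2", "r3"]


def check_samplesheet(handle):
    """Group FASTQ rows by sample name and reject non-absolute paths."""
    reader = csv.DictReader(handle)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    runs = defaultdict(list)
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        paths = [row[c].strip() for c in COLUMNS[1:]]
        for p in paths:
            if not PurePosixPath(p).is_absolute():
                raise ValueError(f"line {line_no}: not an absolute path: {p!r}")
        # Rows sharing a sample name are treated as lanes/runs of one sample.
        runs[row["sample_name"].strip()].append(tuple(paths))
    return dict(runs)


# Two lanes of the same (made-up) sample, supplied as separate rows.
example = io.StringIO(
    "sample_name,r1,r2,r3\n"
    "PBMC_1K,/data/L001_R1.fastq.gz,/data/L001_R2.fastq.gz,/data/L001_R3.fastq.gz\n"
    "PBMC_1K,/data/L002_R1.fastq.gz,/data/L002_R2.fastq.gz,/data/L002_R3.fastq.gz\n"
)
runs = check_samplesheet(example)
print({name: len(rows) for name, rows in runs.items()})
```

Libraries that must stay separate would simply carry different sample names and therefore land under different keys.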
output folder
? Full path or path relative to working directory.
species latin name
genome FASTA
genome GTF
ENSEMBL genome name
UCSC genome name
whether or not to split FASTQ
? Set to "true" to split FASTQ files into chunks of 20 million reads each to speed up preprocessing.
BWA index file
barcode correction algorithm
whitelist barcode folder
? Full path or path relative to working directory.
How to filter BAM files
? Choose from 'false' (no BAM filtering is performed), 'improper' (reads with low mapping quality, extreme fragment size (outside 38-2000 bp), etc. are filtered out), and 'both' ('improper' filtering plus removal of mitochondrial reads).
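The three filtering modes can be read as a per-read predicate. The sketch below is illustrative and is not the pipeline's implementation: the Alignment record, the MIN_MAPQ cutoff of 30, and the mitochondrial contig names are assumptions. Only the 38-2000 bp fragment-size window comes from the option description, and the "etc." criteria are not modeled.

```python
from dataclasses import dataclass

MIN_FRAGMENT, MAX_FRAGMENT = 38, 2000  # bp window, from the option description
MIN_MAPQ = 30                # hypothetical cutoff; the real threshold may differ
MITO_NAMES = {"chrM", "MT"}  # hypothetical mitochondrial contig names


@dataclass
class Alignment:
    """Simplified stand-in for one aligned read pair."""
    chrom: str
    mapq: int
    fragment_size: int


def keep(read: Alignment, mode: str) -> bool:
    """Return True if the read survives filtering under the given mode."""
    if mode == "false":
        return True  # no filtering at all
    if mode not in ("improper", "both"):
        raise ValueError(f"unknown filter mode: {mode!r}")
    proper = (
        read.mapq >= MIN_MAPQ
        and MIN_FRAGMENT <= read.fragment_size <= MAX_FRAGMENT
    )
    if mode == "improper":
        return proper
    # 'both': the 'improper' criteria plus removal of mitochondrial reads
    return proper and read.chrom not in MITO_NAMES


mito = Alignment(chrom="chrM", mapq=60, fragment_size=150)
print(keep(mito, "false"), keep(mito, "improper"), keep(mito, "both"))
```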
cellranger index folder
chromap index folder
doublet removal algorithm
Amulet rmsk bed
? Full path to your Amulet rmsk BED file. Set to 'false' to skip.
Amulet autosomes file
? Full path, or path relative to the working directory, to your Amulet autosome file. E.g. assets/homo_sapiens_autosomes.txt
ArchR thread
ArchR genome
ArchR genome FASTA
TxDb
? Bioconductor TxDb name for building ArchR genome.
OrgDb
? Bioconductor OrgDb name for building ArchR genome.
BSgenome
? Bioconductor BSgenome name for building ArchR genome.
ArchR blacklist
? Full path to the blacklist BED file used when building the ArchR genome. Set to 'false' to skip.
batch correction (Harmony)
filter sample
? Samples to exclude from downstream analysis. Refer to the header line of archr_clustering/Cluster_xxx_matrix.csv for valid sample names. Defaults to 'false', meaning that no samples are excluded. E.g. 'PBMC_1K_N, PBMC_5K_V'.
filter cluster ILSI
? Filters out undesired clusters (e.g. outliers). Filtered clusters will not appear in downstream analysis. The clusters are generated from the dimension-reduced matrix produced by iterative LSI (ILSI). Refer to archr_clustering/Cluster_xxx_matrix.csv for valid cluster names. Defaults to 'false', meaning that no clusters are excluded. E.g. 'C1, C2'.
filter cluster harmony
? Filters out undesired clusters (e.g. outliers). Filtered clusters will not appear in downstream analysis. Refer to archr_clustering/Cluster_xxx_matrix.csv for valid cluster names. Defaults to 'false', meaning that no clusters are excluded. E.g. 'C1, C2'.
custom peaks
? Name and path to custom peak file in .bed.gz format, used for motif enrichment and deviation analyses. E.g. 'Encode_K562_GATA1 = "https://www.encodeproject.org/files/ENCFF632NQI/@@download/ENCFF632NQI.bed.gz"'
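The example value above pairs a peak-set name with a quoted location. As a hedged illustration of that shape (parse_custom_peaks is a hypothetical helper, and how the pipeline itself parses this field is not stated here), the entry can be split on the first "=" and checked for the .bed.gz suffix:

```python
def parse_custom_peaks(value: str) -> tuple[str, str]:
    """Split 'NAME = "LOCATION"' into (name, location) and check the suffix."""
    name, sep, location = value.partition("=")
    if not sep:
        raise ValueError("expected NAME = \"PATH_OR_URL\"")
    name = name.strip()
    # Drop surrounding whitespace, then any enclosing single or double quotes.
    location = location.strip().strip("\"'")
    if not name:
        raise ValueError("peak set name is empty")
    if not location.endswith(".bed.gz"):
        raise ValueError(f"expected a .bed.gz file, got {location!r}")
    return name, location


entry = (
    'Encode_K562_GATA1 = '
    '"https://www.encodeproject.org/files/ENCFF632NQI/@@download/ENCFF632NQI.bed.gz"'
)
peak_name, peak_location = parse_custom_peaks(entry)
print(peak_name)
```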
scRNAseq Seurat object
? Full path to an scRNA-seq Seurat object. When supplied, integrated scRNA-seq analysis is performed.
scRNAseq grouplist
? scRNA-seq cluster grouping information for constrained integration. See conf/test.config for an example.
profile
? Profile refers to a set of pre-defined parameters that are bundled together. E.g. "lsf" bundles "executor = 'lsf'" and "queue = 'long'".
lsf
singularity local test