config_generator

Other pipeline-level parameters

CSV file (fragment)

Full path to CSV file containing fragment file info.

CSV file (FASTQ)

Provide the full path to a CSV file in which each row contains four columns: the absolute paths to R1, R2, and R3 from one run of a sample, and a sample name that uniquely identifies the sample. An example can be found here. If a sample was sequenced across multiple lanes or runs, supply the FASTQ files for each lane or run in a separate row with the same sample name. If a sample has sequencing data from multiple libraries, give each library a distinct sample name so that data from different libraries are not collapsed. The INPUT_CHECK_FASTQ sub-workflow checks whether the input CSV file is valid. To speed up preprocessing of large FASTQ files, set the command-line parameter split_fastq to split the files into chunks of 20 million reads each using the SPLIT_FASTQ module.
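The row layout described above can be checked before launching the pipeline. The following is a minimal Python sketch, not the INPUT_CHECK_FASTQ implementation. It assumes the column order R1, R2, R3, sample name and an optional header row; the header names and file paths in the example are hypothetical.

```python
import csv
import io
from collections import defaultdict
from pathlib import PurePosixPath


def read_sample_sheet(handle, has_header=True):
    """Group (R1, R2, R3) triplets by sample name from a four-column CSV.

    Assumed column order (not confirmed by the pipeline docs):
    R1, R2, R3, sample name.
    """
    runs_by_sample = defaultdict(list)
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row
    first_line = 2 if has_header else 1
    for line_no, row in enumerate(reader, start=first_line):
        if not row or all(not cell.strip() for cell in row):
            continue  # ignore blank lines
        if len(row) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(row)}")
        r1, r2, r3, sample = (cell.strip() for cell in row)
        for path in (r1, r2, r3):
            if not PurePosixPath(path).is_absolute():
                raise ValueError(f"line {line_no}: path is not absolute: {path}")
        if not sample:
            raise ValueError(f"line {line_no}: empty sample name")
        # Rows sharing a sample name are treated as lanes/runs of one sample.
        runs_by_sample[sample].append((r1, r2, r3))
    return dict(runs_by_sample)


# Hypothetical sheet: sample1 has two lanes, sample2 has one.
example = io.StringIO(
    "r1,r2,r3,sample\n"
    "/data/s1_L001_R1.fastq.gz,/data/s1_L001_R2.fastq.gz,/data/s1_L001_R3.fastq.gz,sample1\n"
    "/data/s1_L002_R1.fastq.gz,/data/s1_L002_R2.fastq.gz,/data/s1_L002_R3.fastq.gz,sample1\n"
    "/data/s2_L001_R1.fastq.gz,/data/s2_L001_R2.fastq.gz,/data/s2_L001_R3.fastq.gz,sample2\n"
)
sheet = read_sample_sheet(example)
```

Grouping by sample name mirrors the rule that lanes of one sample share a name, while separate libraries need distinct names to stay separate.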

output folder

Full path or path relative to working directory.

species latin name

genome FASTA

genome GTF

ENSEMBL genome name

UCSC genome name

whether or not to split FASTQ

Set to "true" to split the FASTQ files into chunks of 20 million reads each to speed up preprocessing.
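To illustrate what splitting into fixed-size read chunks means, here is a minimal Python sketch. It is not the SPLIT_FASTQ module; it assumes uncompressed, four-line FASTQ records, and the function names are hypothetical.

```python
import io
from itertools import islice


def iter_records(handle):
    """Yield FASTQ records as 4-line tuples (header, sequence, '+', qualities)."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def assign_chunks(handle, reads_per_chunk=20_000_000):
    """Yield (chunk_index, record) so a caller can stream each read to its chunk."""
    for read_no, record in enumerate(iter_records(handle)):
        yield read_no // reads_per_chunk, record


# Tiny demonstration: five reads split into chunks of two reads each.
fastq = io.StringIO("".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(5)))
chunk_ids = [chunk for chunk, _ in assign_chunks(fastq, reads_per_chunk=2)]
```

Because the records are streamed one at a time, memory use stays flat regardless of file size. For paired inputs, R1, R2, and R3 would need the same chunk size so that matching reads land in matching chunks.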

BWA index file

barcode correction algorithm

whitelist barcode folder

Full path or path relative to working directory.

How to filter BAM files

Choose from 'false' (no BAM filtering is performed), 'improper' (reads with low mapping quality, extreme fragment size (outside 38-2000 bp), etc. are filtered out), and 'both' ('improper' filtering plus removal of mitochondrial reads).
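The three modes can be read as a nested decision, sketched below in Python. This is an illustration only, not the pipeline's filter: the read is a plain dict with hypothetical keys, the mapping-quality cutoff of 30 and the mitochondrial contig names are assumptions, and the additional criteria covered by "etc." are not modelled. Only the 38-2000 bp fragment-size window comes from the description above.

```python
def keep_read(read, mode, min_mapq=30, mito_names=("chrM", "MT")):
    """Return True if a read survives the chosen filtering mode.

    read: dict with hypothetical keys 'mapq', 'fragment_size', 'chrom'.
    min_mapq and mito_names are assumed values, not pipeline defaults.
    """
    if mode == "false":
        return True  # no filtering at all
    if mode not in ("improper", "both"):
        raise ValueError(f"unknown filter mode: {mode!r}")
    if read["mapq"] < min_mapq:
        return False  # low mapping quality
    if not 38 <= abs(read["fragment_size"]) <= 2000:
        return False  # fragment size outside the 38-2000 bp window
    if mode == "both" and read["chrom"] in mito_names:
        return False  # 'both' additionally drops mitochondrial reads
    return True


mito_read = {"mapq": 60, "fragment_size": 150, "chrom": "chrM"}
short_read = {"mapq": 60, "fragment_size": 20, "chrom": "chr1"}
```

With these example reads, `mito_read` passes under 'improper' but is dropped under 'both', while `short_read` is dropped under either filtering mode and kept only under 'false'.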

cellranger index folder

chromap index folder

doublet removal algorithm

Amulet rmsk bed

Full path to your Amulet rmsk BED file. Set to 'false' to skip.

Amulet autosomes file

Full path (or path relative to the working directory) to your Amulet autosomes file. E.g. assets/homo_sapiens_autosomes.txt

ArchR thread

ArchR genome

ArchR genome FASTA

TxDb

Bioconductor TxDb name for building ArchR genome.

OrgDb

Bioconductor OrgDb name for building ArchR genome.

BSgenome

Bioconductor BSgenome name for building ArchR genome.

ArchR blacklist

Full path to a blacklist BED file to be used in building the ArchR genome. Set to 'false' to skip.

batch correction (Harmony)

filter sample

Samples to exclude from downstream analysis. Refer to the header line of archr_clustering/Cluster_xxx_matrix.csv for valid sample names. Defaults to 'false', meaning that no sample is excluded. E.g. 'PBMC_1K_N, PBMC_5K_V'.

filter cluster ILSI

Clusters to filter out (e.g. outliers). Filtered clusters will not appear in downstream analysis. The clusters are generated from the dimension-reduced matrix using ILSI. Refer to archr_clustering/Cluster_xxx_matrix.csv for valid cluster names. Defaults to 'false', meaning that no cluster is excluded. E.g. 'C1, C2'.

filter cluster harmony

Clusters to filter out (e.g. outliers). Filtered clusters will not appear in downstream analysis. Refer to archr_clustering/Cluster_xxx_matrix.csv for valid cluster names. Defaults to 'false', meaning that no cluster is excluded. E.g. 'C1, C2'.
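The sample and cluster filter fields above share one value format: either 'false' or a comma-separated list of names. A minimal Python sketch of parsing and checking such a value follows. The function name is hypothetical, and the list of valid names is passed in directly because the layout of Cluster_xxx_matrix.csv is not described here.

```python
def parse_filter(value, valid_names):
    """Turn a filter field such as 'C1, C2' into a list of names.

    Returns an empty list for 'false' (nothing excluded) and raises
    ValueError if a requested name is not among valid_names.
    """
    if value.strip().lower() == "false":
        return []
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in valid_names]
    if unknown:
        raise ValueError(f"unknown name(s): {', '.join(unknown)}")
    return names


# valid_names would come from the header line of the clustering matrix.
samples = parse_filter("PBMC_1K_N, PBMC_5K_V", ["PBMC_1K_N", "PBMC_5K_V", "PBMC_5K_N"])
clusters = parse_filter("false", ["C1", "C2", "C3"])
```

Rejecting unknown names early helps catch typos that would otherwise silently exclude nothing.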

custom peaks

Name and path to custom peak file in .bed.gz format, used for motif enrichment and deviation analyses. E.g. 'Encode_K562_GATA1 = "https://www.encodeproject.org/files/ENCFF632NQI/@@download/ENCFF632NQI.bed.gz"'
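The name-and-path format in the example above can be checked with a short Python sketch. It assumes a single entry of the form Name = "location" and is not the pipeline's own parser; the function name and the rule that names are identifier-like are assumptions.

```python
import re

# One entry: an identifier-like name, '=', and a double-quoted location.
PEAK_ENTRY = re.compile(r'^\s*([A-Za-z_]\w*)\s*=\s*"([^"]+)"\s*$')


def parse_custom_peak(entry):
    """Split 'Name = "path-or-URL"' into (name, location)."""
    match = PEAK_ENTRY.match(entry)
    if match is None:
        raise ValueError(f"not in Name = \"location\" form: {entry!r}")
    name, location = match.groups()
    if not location.endswith(".bed.gz"):
        raise ValueError(f"custom peak file must be .bed.gz: {location}")
    return name, location


name, location = parse_custom_peak(
    'Encode_K562_GATA1 = "https://www.encodeproject.org/files/'
    'ENCFF632NQI/@@download/ENCFF632NQI.bed.gz"'
)
```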

scRNAseq Seurat object

Full path to a scRNA-seq Seurat object. When supplied, integrated scRNA-seq analysis will be performed.

scRNAseq grouplist

scRNA-seq cluster grouping information for constrained integration. For an example, see conf/test.config.

profile

Profile refers to a set of pre-defined parameters that are bundled together. E.g. "lsf" bundles "executor = 'lsf'" and "queue = 'long'".

lsf
singularity local test
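Conceptually, a profile is a named bundle of settings that is merged into the configuration when selected. The Python sketch below shows that idea only. The "lsf" contents are taken from the description above; the contents of the other profiles are not given here and are left out, and the rule that later profiles override earlier ones is an assumption of this sketch.

```python
# Only the 'lsf' bundle is documented above; other profiles are omitted.
PROFILES = {
    "lsf": {"executor": "lsf", "queue": "long"},
}


def resolve_profiles(names, profiles=PROFILES):
    """Merge the settings of the selected profiles into one dict.

    Assumption of this sketch: profiles listed later override earlier ones.
    """
    settings = {}
    for name in names:
        if name not in profiles:
            raise KeyError(f"unknown profile: {name}")
        settings.update(profiles[name])
    return settings


settings = resolve_profiles(["lsf"])
```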

Generate scATACpipe config file

Input type

Reference genome source

ArchR genome source

Preprocessing strategy

Is genome index (bwa) available?

Is genome index (chromap) available?

Is genome index (cellranger) available?

Other pipeline-level parameters