Usage:
fimo [options] <motifs> <database>
Description:
The name fimo stands for "find individual motif occurrences."
The program searches a sequence database for occurrences
of known motifs, treating each motif independently. The program uses
a dynamic programming algorithm to convert log-odds scores into
p-values, assuming a zero-order background model. The p-values for
each motif are then converted to q-values following the method of
Benjamini and Hochberg (where "q-value" is defined as the minimal
false discovery rate at which a given motif occurrence is deemed significant).
The program reports all motif occurrences that receive q-values
smaller than a specified threshold. If a given motif has the strand
feature set to +/- (rather than +), then fimo will search both
strands for occurrences.
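The conversion from scores to p-values described above can be illustrated with a small sketch. It is not FIMO's own code: it assumes the log-odds scores have already been rounded to integers, and the names score_pvalues, pssm, and background are hypothetical. The dynamic programming step builds the distribution of total scores one motif column at a time under a zero-order background, then sums the upper tail to obtain a p-value for each attainable score.

```python
from collections import defaultdict


def score_pvalues(pssm, background):
    """Map each attainable total score to P(random site scores >= that score).

    pssm       -- list of dicts, one per motif column, base -> integer score
    background -- dict, base -> probability (zero-order model)
    """
    # Distribution of partial scores after zero columns: score 0 with prob 1.
    dist = {0: 1.0}
    for column in pssm:
        new_dist = defaultdict(float)
        for partial, prob in dist.items():
            for base, score in column.items():
                new_dist[partial + score] += prob * background[base]
        dist = dict(new_dist)

    # Accumulate the upper tail, from the highest score downwards.
    pvalues = {}
    running = 0.0
    for score in sorted(dist, reverse=True):
        running += dist[score]
        pvalues[score] = running
    return pvalues


# Toy two-column motif that prefers "AA", with a uniform background.
toy_column = {"A": 2, "C": -1, "G": -1, "T": -1}
toy_background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
toy_pvalues = score_pvalues([toy_column, toy_column], toy_background)
```

In this toy case only the site "AA" reaches the maximum score of 4, so that score's p-value is 1/16.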
The most accurate estimation of q-values requires FIMO to retain
the p-values for all matches to a motif in memory.
This is not feasible for very large sequence databases.
The parameter --max-stored-scores
sets the maximum number of matches
that will be retained for a motif. It defaults to 100,000.
If the number of matches reaches the maximum value allowed,
FIMO will discard 50% of the least significant matches,
and new matches falling below the significance level of the retained matches
will also be discarded.
If FIMO has to discard matches, it will not be able to use bootstrapping on the
complete set of p-values to estimate the parameter pi0.
In this case FIMO will calculate q-values using pi0 = 1.0.
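As a rough illustration of the q-value calculation with pi0 = 1.0, the sketch below applies the Benjamini and Hochberg step-up procedure to a list of p-values. The function name bh_qvalues is hypothetical, and the sketch assumes the list passed in is the complete set of p-values for one motif; it does not model how FIMO approximates q-values after matches have been discarded.

```python
def bh_qvalues(pvalues, pi0=1.0):
    """Return a q-value for each p-value, in the input order.

    The q-value of a match is the smallest estimated false discovery
    rate at which that match would be called significant.
    """
    n = len(pvalues)
    # Indices of the p-values, from most to least significant.
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, so each q-value is the minimum
    # FDR estimate over all thresholds at least as permissive as its own.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        fdr = pi0 * pvalues[i] * n / rank
        running_min = min(running_min, fdr)
        qvalues[i] = running_min
    return qvalues


example_q = bh_qvalues([0.01, 0.04, 0.03, 0.5])
```

Taking the running minimum keeps the q-values monotone: a match with a smaller p-value never receives a larger q-value than a match with a larger p-value.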
Input:
- <motifs> is a list of motifs, in MEME format.
- <database> is a collection of sequences in FASTA format.
Output:
FIMO will create a directory, named fimo_out by default.
Any existing output files in the directory will be overwritten.
The directory will contain:
- An XML file named fimo.xml using the CisML schema.
- An HTML file named fimo.html
- A plain text file named fimo.text
- A plain text file in GFF format named fimo.gff
The default output directory can be overridden using the --o or --oc
options, which are described below.
The --text option will limit output to plain text sent to the standard output.
The HTML and plain text output contain the following columns:
- The motif identifier
- The sequence identifier
- The start position of the motif occurrence
- The end position of the motif occurrence. If the start position is larger than the end position, the motif occurrence is on the reverse strand.
- The score for the motif occurrence. The score is computed by summing the appropriate entries from each column of the position-dependent scoring matrix that represents the motif.
- The p-value of the motif occurrence. The p-value is the probability of a random sequence of the same length as the motif matching that position of the sequence with a score at least as good.
- The q-value of the motif occurrence. The q-value is the estimated false discovery rate if the occurrence is accepted as significant. See Storey JD, Tibshirani R. Statistical significance for genome-wide studies. Proc. Natl Acad. Sci. USA (2003) 100:9440–9445.
- The sequence matched to the motif.
The HTML and plain text output is sorted by increasing p-value.
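The score and strand conventions in the column list can be sketched in a few lines. This is an illustration under stated assumptions, not FIMO's code: the scoring matrix is represented as a list of per-column dictionaries, and the names site_score and strand_of are hypothetical.

```python
def site_score(pssm, site):
    """Sum the matrix entry selected by each base of the site.

    pssm -- list of dicts, one per motif column, base -> score
    site -- sequence string of the same length as the motif
    """
    if len(site) != len(pssm):
        raise ValueError("site length must equal motif width")
    return sum(column[base] for column, base in zip(pssm, site))


def strand_of(start, end):
    """Infer the strand from reported coordinates.

    A start position larger than the end position marks the reverse strand.
    """
    return "-" if start > end else "+"


toy_pssm = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 3, "G": -1, "T": -1},
]
toy_score = site_score(toy_pssm, "AC")
```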
Options:
--bgfile <bfile>
- Read background frequencies from <bfile>. The file should be in MEME background file format. The default is to use frequencies embedded in the application from the non-redundant database. If the argument is the keyword motif-file, then the frequencies will be taken from the motif file.
--max-seq-length <max>
- Set the maximum length allowed for input sequences. By default the maximum allowed length is 250000000.
--max-stored-scores <max>
- Set the maximum number of scores that will be stored. Precise calculation of q-values depends on having a complete list of scores. However, keeping a complete list of scores may exceed available memory. Once the number of stored scores reaches the maximum allowed, the least significant 50% of scores will be dropped, and approximate q-values will be calculated. By default the maximum number of stored matches is 100,000.
--motif <id>
- Use only the motif identified by <id>. This option may be repeated.
--motif-pseudo <float>
- A pseudocount to be added to each count in the motif matrix, after first multiplying by the corresponding background frequency (default=0.1).
--norc
- Do not score the reverse complement DNA strand. Both strands are scored by default.
--o <dir name>
- Specifies the output directory. If the directory already exists, the contents will not be overwritten.
--oc <dir name>
- Specifies the output directory. If the directory already exists, the contents will be overwritten.
--output-pthresh <float>
- The p-value threshold for displaying search results. If the p-value of a match is greater than this value, then the match will not be printed. Using the --output-pthresh option will set the q-value threshold to 1.0. The default p-value threshold is 1e-4.
--output-qthresh <float>
- The q-value threshold for displaying search results. If the q-value of a match is greater than this value, then the match will not be printed. Using the --output-qthresh option will set the p-value threshold to 1.0. The default q-value threshold is 1.0.
--no-qvalue
- Do not compute a q-value for each p-value. The q-value calculation is that of Benjamini and Hochberg (1995). By default, q-values are computed.
--text
- Limits output to plain text sent to standard out. For FIMO, the text output is unsorted, and q-values are not reported. This mode allows the program to search an arbitrarily large database, because results are not stored in memory.
--verbosity 1|2|3|4
- Set the verbosity of status reports to standard error. The default level is 2.