Using IRF on the Command Line

Example usage: irf3.exe Human21.fa 2 3 5 80 10 40 100000 500000 -t7 500000 -d -l

Things to note:

  1. This will use alignment scoring parameters: +2,-3,-5 (match, mismatch, indel) and a minimum score of 40
  2. The probability of a match is 80% and the probability of an indel is 10% (these values cannot be changed)
  3. A maximum stem length of 100K is allowed
  4. A maximum loop length of 500K is allowed
  5. A tuple of length 7 (least sensitive) will look back at most 500K
  6. A data file will be produced
  7. Lowercase letters will be ignored during the detection phase
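
The example invocation above can also be assembled programmatically. What follows is a minimal Python sketch, not part of IRF itself. The function name build_irf_command and its keyword names are hypothetical, chosen only to mirror the positional parameters described in this document. It builds the argument list and does not run the program.

```python
def build_irf_command(exe, fasta, match, mismatch, delta, pm, pi,
                      minscore, maxlength, maxloop, options=()):
    """Return an argument list in the positional order of the usage message.

    Order: File Match Mismatch Delta PM PI Minscore MaxLength MaxLoop [options]
    (hypothetical helper, not shipped with IRF)
    """
    positional = [match, mismatch, delta, pm, pi, minscore, maxlength, maxloop]
    return [exe, fasta] + [str(v) for v in positional] + list(options)


# Rebuild the example command from this section.
cmd = build_irf_command(
    "irf3.exe", "Human21.fa",
    match=2, mismatch=3, delta=5,      # alignment weights and penalties
    pm=80, pi=10,                      # match and indel probabilities
    minscore=40,
    maxlength=100000, maxloop=500000,  # stem and loop limits
    options=["-t7", "500000", "-d", "-l"],
)
print(" ".join(cmd))
```

If IRF is installed under that name, the list could be passed to subprocess.run(cmd); that step is left out here because it depends on the local installation.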

Once the program is installed, you can run it with no parameters to see the proper usage syntax. For example, if the program was installed as irf.exe, typing irf.exe on the command line prints the following:

Please use: irf.exe File Match Mismatch Delta PM PI Minscore MaxLength MaxLoop [options]

Where: (all weights, penalties, and scores are positive)
  File = sequences input file
  Match  = matching weight
  Mismatch  = mismatching penalty
  Delta = indel penalty
  PM = match probability (whole number)
  PI = indel probability (whole number)
  Minscore = minimum alignment score to report
  MaxLength = maximum stem length to report (10,000 minimum and no upper limit, but system will run out of memory if this is too large)
  MaxLoop = filters results to have loop less than this value (will not give you more results unless you increase -t4, -t5, -t7 as well)
  [options] = one or more of the following :
               -m    masked sequence file
               -f    flanking sequence
               -d    data file
               -h    suppress HTML output (this automatically switches -d to ON)

               -l    lowercase letters do not participate in a k-tuple match, but can be part of an alignment
               -gt   allow the GT match (gt matching weight must follow immediately after the switch)
               -mr   target is mirror repeats
               -r    set the identity value of the redundancy algorithm (value 60 to 100 must follow immediately after the switch)

               -la   lookahead test enabled. Results are slightly different as a repeat might be found at a different interval. Faster.
               -a3   perform a third alignment going inward. Produces longer or better alignments. Slower.
               -a4   same as a3 but alignment is of maximum narrowband width. Slightly better results than a3. Much slower.
               -i1   Do not stop once a repeat is found at a certain interval and try larger intervals at nearby centers. Better(?) results. Slower.
               -i2   Do not stop once a repeat is found at a certain interval and try all intervals at same and nearby centers. Better(?) results. Much slower.
               -r0   do not eliminate redundancy from the output
               -r2   modified redundancy algorithm, does not remove stuff which is redundant to redundant. Slower and not good for TA repeat regions, would not leave the largest, but a whole bunch.

               -t4   set the maximum loop separation for tuple of length 4 (default 154, separation <=1,000 must follow)
               -t5   set the maximum loop separation for tuple of length 5 (default 813, separation <=10,000 must follow)
               -t7   set the maximum loop separation for tuple of length 7 (default 14800, limited by your system's memory, make sure you increase maxloop to the same value)

               -ngs  more compact .dat output on multisequence files, returns 0 on success.
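
The numeric limits stated in the usage message can be checked before launching a long run. The following Python sketch is an illustration only. The function check_irf_limits is hypothetical, and it encodes only the limits quoted above (MaxLength at least 10,000, -r between 60 and 100, -t4 at most 1,000, -t5 at most 10,000, and MaxLoop raised to at least the -t7 value). IRF may apply further checks of its own.

```python
def check_irf_limits(maxlength, maxloop=None, r=None, t4=None, t5=None, t7=None):
    """Return a list of problems; an empty list means the documented limits are met.

    Hypothetical helper based only on the limits quoted in the usage message.
    """
    problems = []
    if maxlength < 10000:
        problems.append("MaxLength must be at least 10000")
    if r is not None and not 60 <= r <= 100:
        problems.append("-r value must be between 60 and 100")
    if t4 is not None and t4 > 1000:
        problems.append("-t4 separation must be <= 1000")
    if t5 is not None and t5 > 10000:
        problems.append("-t5 separation must be <= 10000")
    if t7 is not None and maxloop is not None and maxloop < t7:
        problems.append("MaxLoop should be raised to the -t7 value")
    return problems


# Values from the example at the top of this page pass the checks.
print(check_irf_limits(100000, maxloop=500000, t7=500000))
# Out-of-range values are reported one per line.
for problem in check_irf_limits(5000, t4=2000):
    print(problem)
```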

Note that the sequence file should be in FASTA format:

>Name of sequence
   aggaaacctg ccatggcctc ctggtgagct gtcctcatcc actgctcgct gcctctccag
   atactctgac ccatggatcc cctgggtgca gccaagccac aatggccatg gcgccgctgt
   actcccaccc gccccaccct cctgatcctg ctatggacat ggcctttcca catccctgtg
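
To check an input file before running IRF, the FASTA layout shown above can be read with a few lines of Python. This is a minimal sketch under the assumption that each record starts with a ">" header line and that spaces inside sequence lines are only for readability. The function read_fasta is hypothetical and is not how IRF itself parses input.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if name is not None:          # close the previous record
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            # Drop the spaces that separate blocks of ten bases.
            chunks.append("".join(line.split()))
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


example = """>Name of sequence
   aggaaacctg ccatggcctc ctggtgagct gtcctcatcc actgctcgct gcctctccag
   atactctgac ccatggatcc cctgggtgca gccaagccac aatggccatg gcgccgctgt
"""
records = read_fasta(example.splitlines())
name, seq = records[0]
print(name, len(seq))
```

The number of records returned also tells you whether to expect single-sequence or multi-sequence output files.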

Program Parameters:

Using the recommended parameters, the command line will look something like:

irf yoursequence.txt 2 7 7 80 10 50 500 -f -d -m

Once the program starts running, it will print update messages to the screen. The word "Done" will be printed when the program finishes.

For single-sequence input files there will be at least two HTML output files: a repeat table file and an alignment file. If more than 120 repeats are found, multiple linked repeat tables are produced, with links to the other tables at the top and bottom of each table. To view the results, start by opening the first repeat table file in your web browser. This file has the extension ".1.html". Alignment files can be accessed from the repeat table files and end with the ".txt.html" extension.

For input files containing multiple sequences, a summary page is produced that links to the output of the individual sequences. This file has the extension "summary.html", and it is the file to open first when your input contains multiple sequences. Also note that the output files of individual sequences have an identifier of the form ".sn." (where n is an integer) embedded in the name, indicating the index of the sequence in the input file. The identifier is omitted for single-sequence input files.
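
The rule for choosing which output file to open first can be sketched in Python. This is an illustration only. The function pick_start_page and the file names in the example are hypothetical, and the sketch relies solely on the extensions described above ("summary.html" for multi-sequence runs, ".1.html" for the first repeat table).

```python
def pick_start_page(filenames):
    """Return the HTML file to open first, or None if no candidate is found.

    Hypothetical helper: prefers a summary page, then the first repeat table.
    """
    summaries = sorted(f for f in filenames if f.endswith("summary.html"))
    if summaries:
        return summaries[0]
    first_tables = sorted(f for f in filenames if f.endswith(".1.html"))
    return first_tables[0] if first_tables else None


# Made-up file names for a single-sequence run.
single = ["run.1.html", "run.2.html", "run.1.txt.html"]
print(pick_start_page(single))

# Made-up file names for a multi-sequence run.
multi = ["run.s1.1.html", "run.s2.1.html", "run.summary.html"]
print(pick_start_page(multi))
```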

For more information on the output please see Table Explanation and Alignment Explanation.


Last revised July 5, 2023
Send any questions or comments to:
Gary Benson