Example usage: irf3.exe Human21.fa 2 3 5 80 10 40 100000 500000 -t7 500000 -d -l
Things to note:
Once the program is installed, you can run it with no parameters to display the usage syntax. For example, if the program was installed as irf.exe, typing irf.exe on the command line prints the following:
Please use: irf.exe File Match Mismatch Delta PM PI Minscore Maxlength MaxLoop [options]
Where: (all weights, penalties, and scores are positive)
File = sequences input file
Match = matching weight
Mismatch = mismatching penalty
Delta = indel penalty
PM = match probability (whole number)
PI = indel probability (whole number)
Minscore = minimum alignment score to report
MaxLength = maximum stem length to report (10,000 minimum and no upper limit, but the system will run out of memory if this is too large)
MaxLoop = filters results to have loop less than this value (will not give you more results unless you increase -t4, -t5, -t7 as well)
[options] = one or more of the following :
-m masked sequence file
-f flanking sequence
-d data file
-h suppress HTML output (this automatically switches -d to ON)
-l lowercase letters do not participate in a k-tuple match, but can be part of an alignment
-gt allow the GT match (gt matching weight must follow immediately after the switch)
-mr target is mirror repeats
-r set the identity value of the redundancy algorithm (value 60 to 100 must follow immediately after the switch)
-la lookahead test enabled. Results are slightly different as a repeat might be found at a different interval. Faster.
-a3 perform a third alignment going inward. Produces longer or better alignments. Slower.
-a4 same as a3 but alignment is of maximum narrowband width. Slightly better results than a3. Much slower.
-i1 Do not stop once a repeat is found at a certain interval and try larger intervals at nearby centers. Better(?) results. Slower.
-i2 Do not stop once a repeat is found at a certain interval and try all intervals at same and nearby centers. Better(?) results. Much slower.
-r0 do not eliminate redundancy from the output
-r2 modified redundancy algorithm that does not remove repeats which are redundant only to other redundant repeats. Slower, and not good for TA repeat regions: rather than leaving only the largest repeat, it leaves a whole bunch.
-t4 set the maximum loop separation for tuples of length 4 (default 154; a separation <=1,000 must follow)
-t5 set the maximum loop separation for tuples of length 5 (default 813; a separation <=10,000 must follow)
-t7 set the maximum loop separation for tuples of length 7 (default 14800; limited by your system's memory; make sure you increase MaxLoop to the same value)
-ngs more compact .dat output on multisequence files, returns 0 on success.
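The usage text only names the scoring parameters. As a rough illustration of how Match, Mismatch and Delta could combine into an alignment score that is then compared against Minscore, here is a minimal Python sketch. It assumes a simple linear scheme (every gap position costs Delta) and that the left stem arm is aligned against the reverse complement of the right arm, as is usual for inverted repeats. IRF's actual scoring (including the -gt GT-match weight) may differ, and the function names are hypothetical.

```python
# Hypothetical illustration only: IRF's internal scoring may differ
# (for example, the -gt option adds a separate weight for GT pairs).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (other symbols pass through unchanged)."""
    return seq.translate(COMPLEMENT)[::-1]


def score_alignment(top: str, bottom: str, match: int, mismatch: int, delta: int) -> int:
    """Score two gapped, equal-length alignment rows ('-' marks an indel) with a linear scheme."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for a, b in zip(top.upper(), bottom.upper()):
        if a == "-" or b == "-":
            score -= delta      # every gap column costs the indel penalty (Delta)
        elif a == b:
            score += match      # identical bases earn the matching weight (Match)
        else:
            score -= mismatch   # substitutions cost the mismatching penalty (Mismatch)
    return score


# Left arm versus the reverse complement of the right arm, weights 2 3 5.
left_arm, right_arm = "GAATTCCG", "CGGAATTC"
print(score_alignment(left_arm, reverse_complement(right_arm), 2, 3, 5))
```

With the weights from the example usage line (2 3 5), this perfect 8 bp stem scores 16, and a single mismatch (e.g. aligning "GAATTCCG" with "GAATACCG") drops the score to 11.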
Note: the sequence file should be in FASTA format, for example:
>Name of sequence
aggaaacctg ccatggcctc ctggtgagct gtcctcatcc actgctcgct gcctctccag
atactctgac ccatggatcc cctgggtgca gccaagccac aatggccatg gcgccgctgt
actcccaccc gccccaccct cctgatcctg ctatggacat ggcctttcca catccctgtg
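A FASTA file may contain one or several sequences, each introduced by a ">" header line. The minimal Python sketch below (read_fasta is a hypothetical helper, not part of IRF) reads such a file into (name, sequence) pairs and removes the spaces that separate the 10-base groups in the example above. Letter case is preserved, since the -l option gives lowercase letters a special role.

```python
from typing import Iterable, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, in file order."""
    records: List[Tuple[str, str]] = []
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Drop the spaces between base groups; keep upper/lower case as-is.
            chunks.append("".join(line.split()))
    if name is not None:
        records.append((name, "".join(chunks)))
    return records
```

For a file on disk, `with open("yoursequence.txt") as fh: records = read_fasta(fh)` returns one pair per sequence; more than one pair means IRF will treat the input as a multi-sequence file.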
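Using recommended parameters, the command line will look something like:
irf yoursequence.txt 2 7 7 80 10 50 500 -f -d -m
(This line gives only eight positional values, whereas the usage syntax above requires nine, ending with Maxlength and MaxLoop, with Maxlength at least 10,000.)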
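When scripting IRF runs, it can help to assemble the command line programmatically and check it against the constraints stated in the usage text (positive whole-number parameters, Maxlength of at least 10,000, -t4 <= 1,000, -t5 <= 10,000, MaxLoop raised along with -t7). The Python sketch below does that. The function name build_irf_command and its keyword arguments are hypothetical, and it assumes each -t4/-t5/-t7 value is passed as a separate argument after its switch, as in the example usage line at the top of this page.

```python
import warnings
from typing import List, Optional, Sequence


def build_irf_command(fasta: str, match: int, mismatch: int, delta: int,
                      pm: int, pi: int, minscore: int, maxlength: int, maxloop: int,
                      t4: Optional[int] = None, t5: Optional[int] = None,
                      t7: Optional[int] = None, flags: Sequence[str] = (),
                      exe: str = "irf.exe") -> List[str]:
    """Return an IRF argument list after checking the documented parameter limits."""
    positional = [match, mismatch, delta, pm, pi, minscore, maxlength, maxloop]
    # Usage text: all weights, penalties, and scores are positive; PM and PI are whole numbers.
    if not all(isinstance(v, int) and v > 0 for v in positional):
        raise ValueError("all numeric parameters must be positive integers")
    if maxlength < 10_000:
        raise ValueError("Maxlength must be at least 10,000")
    if t4 is not None and not 0 < t4 <= 1_000:
        raise ValueError("-t4 separation must be <= 1,000")
    if t5 is not None and not 0 < t5 <= 10_000:
        raise ValueError("-t5 separation must be <= 10,000")
    if t7 is not None and maxloop < t7:
        # The documentation advises increasing MaxLoop to the same value as -t7.
        warnings.warn("MaxLoop is smaller than -t7; consider raising MaxLoop")
    argv = [exe, fasta] + [str(v) for v in positional]
    for switch, value in (("-t4", t4), ("-t5", t5), ("-t7", t7)):
        if value is not None:
            argv += [switch, str(value)]  # the value follows the switch as its own argument
    argv += list(flags)                   # simple on/off switches such as -d, -l, -h
    return argv
```

Called with the values from the example usage line, `build_irf_command("Human21.fa", 2, 3, 5, 80, 10, 40, 100000, 500000, t7=500000, flags=["-d", "-l"], exe="irf3.exe")` returns exactly that command split into arguments, ready to pass to `subprocess.run`.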
Once the program starts running it will print update messages to the screen. The word "Done" will be printed when the program finishes.
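When IRF is driven from a script, the "Done" message is a convenient completion check. Below is a minimal Python sketch, assuming the command line is already assembled as a list of arguments. The function name run_irf is hypothetical, and because this page does not say whether progress messages go to standard output or standard error, both streams are searched.

```python
import subprocess
from typing import Optional, Sequence


def run_irf(argv: Sequence[str], timeout: Optional[float] = None) -> bool:
    """Run a command line and report whether it printed the word "Done"."""
    result = subprocess.run(list(argv), capture_output=True, text=True, timeout=timeout)
    # Search both streams, since the destination of IRF's progress messages is not documented here.
    return "Done" in result.stdout or "Done" in result.stderr
```

Since the -ngs option is documented to return 0 on success, a script that uses -ngs could additionally check `result.returncode == 0`.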
For single-sequence input files there will be at least two HTML output files: a repeat table file and an alignment file. If more than 120 repeats are found, the results are split across multiple linked repeat tables, with links to the other tables at the top and bottom of each table. To view the results, start by opening the first repeat table file (extension ".1.html") in your web browser. Alignment files, which end with the ".txt.html" extension, can be accessed from the repeat table files.
For input files containing multiple sequences, a summary page is produced that links to the output for each individual sequence. This file has the extension "summary.html", and you should start by opening it if your input had multiple sequences in the same file. The output files of individual sequences also have an identifier of the form ".sn." (where n is an integer) embedded in the name, indicating the index of the sequence in the input file. This identifier is omitted for single-sequence input files.
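The naming rules above are enough to locate the right starting page automatically. The following minimal Python sketch (start_page and sequence_index are hypothetical helpers) prefers a file ending in "summary.html" and otherwise falls back to the first repeat table ending in ".1.html". It also extracts n from a ".sn." identifier. The exact prefix IRF puts before these suffixes is not described here, so the file names in the usage note are only assumptions.

```python
import re
from pathlib import Path
from typing import Optional

# Matches the ".sn." identifier embedded in multi-sequence output names.
SEQ_INDEX = re.compile(r"\.s(\d+)\.")


def start_page(output_dir: str) -> Optional[Path]:
    """Return summary.html if present (multi-sequence input), else the first repeat table."""
    files = sorted(p for p in Path(output_dir).iterdir() if p.is_file())
    for p in files:
        if p.name.endswith("summary.html"):
            return p
    for p in files:
        if p.name.endswith(".1.html"):  # ".1.txt.html" alignment files do not match
            return p
    return None


def sequence_index(filename: str) -> Optional[int]:
    """Return n from a ".sn." identifier, or None for single-sequence output."""
    m = SEQ_INDEX.search(filename)
    return int(m.group(1)) if m else None
```

For instance, with hypothetical files "out.1.html", "out.2.html" and "out.1.txt.html" in a directory, `start_page` picks "out.1.html"; once an "out.summary.html" is present it picks that instead.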
For more information on the output please see Table Explanation and Alignment Explanation.
Last revised February 19, 2025
Send any questions or comments to: Gary Benson