MERLIN - Multipoint Engine for Rapid Likelihood Inference
(c) 2000 - 2001 Goncalo Abecasis

This document contains two sections: RESTRICTIONS ON USE and BRIEF
INSTRUCTIONS. While you may find the RESTRICTIONS ON USE draconian, we
feel that in some circumstances this prerelease could still be useful.

===========================================================================
          RESTRICTIONS FOR USE OF THIS PRE-RELEASE BETA VERSION
===========================================================================

DISCLAIMER
==========

This is a pre-release version of MERLIN. It comes with no guarantees. If
you report bugs and provide helpful comments, the release version could be
better.

BENCHMARKING
============

This version of MERLIN includes debug code and has been compiled without
optimization. It will underperform significantly compared to final
versions. Benchmarking will produce biased results.

DISTRIBUTION
============

You must not redistribute this version in any circumstances.

===========================================================================
                      START OF BRIEF INSTRUCTIONS
===========================================================================

INPUT FILES
===========

Pedigree and data files can be in linkage or QTDT format. For a detailed
description of these formats see http://www.well.ox.ac.uk/asthma/QTDT and
http://linkage.rockefeller.edu.

Brief description of QTDT format files
======================================

1. Pedigree File - Each line in a pedigree file starts with five mandatory
   columns. The first two provide a family and individual identifier.
   These two identifiers are interpreted as text and must be unique. The
   next two columns indicate each individual's parents and the final
   column encodes sex (1 - M, 2 - F).

   For example, the first five columns in a pedigree file describing a
   simple sib-pair family named ANONYMOUS might read:

      ANONYMOUS  FATHER    .       .       1
      ANONYMOUS  MOTHER    .       .       2
      ANONYMOUS  SON       FATHER  MOTHER  1
      ANONYMOUS  DAUGHTER  FATHER  MOTHER  2

   Often data might be encoded with numbers, rather than long
   identifiers:

      1  1  .  .  1
      1  2  .  .  2
      1  3  1  2  1
      1  4  1  2  1

2. Data File - The QTDT data file includes one line per item in the
   pedigree. Allowable items are markers (M), quantitative traits (T) and
   discrete affection status (A). In the pedigree, affection status is
   usually encoded as 0 for Unknown, 1 for Normal, and 2 for Affected.

MAP FILE
========

Linkage format files include information on recombination fractions
between markers. If you use QTDT format input files you will need a
separate map file, which has the following 3-column format:

   CHROMOSOME  MARKER  POSITION
   1           D1S123  134.0

The first column is the chromosome number (1-22), the second the marker
name (as in the data file) and the third the marker position (in cM). Map
files do not need to be sorted and may include more (or fewer) markers
than are present in the data file.

ALLELE FREQUENCY FILE
=====================

By default MERLIN estimates allele frequencies based on allele counts
among all individuals (-fa). Alternatively, you could use allele counts
among founders (-ff) or assume equal allele frequencies for simulated
data (-fe).

If none of these is satisfactory, MERLIN supports user-specified allele
frequencies. These can be provided in a separate input file (or in a
linkage format data file). The MERLIN format input file expects a header
line for each marker (labelled with M and the marker name) followed by
one or more lines (labelled with F) listing allele frequencies starting
at allele 1.

   M MARKER
   F freq01 freq02 freq03 freq04
   F freq05 freq06 ...
   M ANOTHER_MARKER

To specify input files use the -d, -p, -m and -f command line options:

   >merlin -d datafile -p pedfile -m mapfile -f freqfile

BASIC ANALYSES
==============

Basic analyses can be requested on the command line, and include error
checking (--error), information content (--info) and calculation of IBD
and kinship matrices (--ibd and --kinship).

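As a rough illustration of how the input file options and basic analysis
options fit together, the Python sketch below assembles an argument list
for a single run. It uses only the options named in this document. The
helper name build_merlin_command is hypothetical, and the sketch assumes
that the executable is called merlin and that several analysis options can
be combined in one invocation, which this document does not state.

```python
import shlex

# Only options named in this document are listed; nothing else is assumed.
INPUT_FLAGS = {"data": "-d", "pedigree": "-p", "map": "-m", "freq": "-f"}
ANALYSIS_FLAGS = {
    "error": "--error",
    "info": "--info",
    "ibd": "--ibd",
    "kinship": "--kinship",
}


def build_merlin_command(data, pedigree, map_file=None, freq=None, analyses=()):
    """Return an argument list for one run (hypothetical helper).

    Assumption: analysis options may be combined in a single invocation.
    """
    files = {"data": data, "pedigree": pedigree, "map": map_file, "freq": freq}
    args = ["merlin"]  # assumed executable name
    for key, path in files.items():
        if path is not None:
            args += [INPUT_FLAGS[key], path]
    for name in analyses:
        if name not in ANALYSIS_FLAGS:
            raise ValueError(f"unknown analysis: {name!r}")
        args.append(ANALYSIS_FLAGS[name])
    return args


if __name__ == "__main__":
    cmd = build_merlin_command(
        "datafile", "pedfile", "mapfile", "freqfile", analyses=["error", "info"]
    )
    # Print the command for inspection rather than running it.
    print(shlex.join(cmd))
```

The function only builds the argument list and never executes anything,
so the command can be checked before it is passed to a shell or to
subprocess.run.
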
ERROR DETECTION
===============

The --error option attempts to flag unlikely recombinants. A lower score
indicates a genotype that seems more likely to be wrong. Unlikely
genotypes are output to the screen and listed in merlin.err.

INFORMATION
===========

The --info option provides information content mapping. Information is
calculated using the Kruglyak et al. definition, although slightly
different results may be given when ungenotyped grandparents are present,
since MERLIN uses the Gudbjartsson et al. founder couple symmetry.

PAIR-WISE SHARING STATISTICS
============================

The --ibd and --kinship options calculate pair-wise IBD and kinship
matrices. These are stored in output files merlin.ibd and merlin.kin
respectively.

LINKAGE ANALYSES
================

This BETA version includes only limited linkage analysis support.
Non-parametric analyses are available through the NPL pairs and NPL all
scoring functions for affecteds-only analyses and through two analogous
distribution-free functions for QTL. The Kong and Cox function is used to
convert non-parametric scores into LODs and calculate p-values.

NOTE: Only informative families (e.g. with multiple phenotyped
individuals and marker data) are considered, resulting in slightly
different Z-scores from those reported by Genehunter.

HAPLOTYPING
===========

Haplotyping options include finding the best set of inheritance vectors
(--best) or sampling one such set at random (--sample). Alternatively,
all non-recombinant haplotypes can be listed by combining the --all and
--zero options (see below).

In the output haplotype file the following labels are used for
recombination:

   |   There is no recombination in this interval
   \   Recombinant on the paternal chromosome
   /   Recombinant on the maternal chromosome
   +   Recombinant on both chromosomes
   :   This interval is not flanked by two informative markers,
       and is therefore uninformative.

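The interval labels listed above can be tallied with a few lines of
Python. This is a minimal sketch, not part of MERLIN: it assumes the
labels for one individual have already been extracted from the haplotype
file as a sequence of single characters (the file layout itself is not
described here), and the name summarize_labels is hypothetical. Counting
"+" as one paternal and one maternal event is an interpretation of
"recombinant on both chromosomes".

```python
from collections import Counter

# Meaning of each interval label, as listed in the haplotype output section.
LABEL_MEANING = {
    "|": "no recombination",
    "\\": "recombinant on the paternal chromosome",
    "/": "recombinant on the maternal chromosome",
    "+": "recombinant on both chromosomes",
    ":": "uninformative interval",
}


def summarize_labels(labels):
    """Tally recombination events from a sequence of interval labels.

    Assumption: '+' contributes one paternal and one maternal event.
    """
    counts = Counter()
    for label in labels:
        if label not in LABEL_MEANING:
            raise ValueError(f"unexpected label: {label!r}")
        counts[label] += 1
    return {
        "paternal": counts["\\"] + counts["+"],
        "maternal": counts["/"] + counts["+"],
        "uninformative": counts[":"],
    }


if __name__ == "__main__":
    print(summarize_labels(["|", "\\", "+", ":", "/"]))
    # prints {'paternal': 2, 'maternal': 2, 'uninformative': 1}
```
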
REDUCED RECOMBINATION
=====================

In dense maps, the options --zero, --one, --two and --three limit the
number of recombinations allowed between consecutive informative markers
within each family. For single point analysis use the --single option.

SIMULATION
==========

Marker genotypes can be simulated conditional on the observed missing
data pattern and genetic map through the --simulate option. If the --save
option is also used, the simulated pedigree will be output to a file. Use
the random seed option (-r seed) to select a different replicate.

goncalo@well.ox.ac.uk