MERLIN - Multipoint Engine for Rapid Likelihood Inference
(c) 2000 - 2001 Goncalo Abecasis

This document contains two sections: RESTRICTIONS ON USE and BRIEF
INSTRUCTIONS.

While you may find the RESTRICTIONS ON USE draconian, we feel that in some
circumstances this pre-release could still be useful.

===========================================================================
RESTRICTIONS FOR USE OF THIS PRE-RELEASE BETA VERSION
===========================================================================

DISCLAIMER
==========

This is a pre-release version of MERLIN.  It comes with no guarantees. If
you report bugs and provide helpful comments, the release version could be
better.

BENCHMARKING
============

This version of Merlin includes debug code and has been compiled without
optimization. It will underperform significantly compared to final
versions.

Benchmarking this version will therefore produce biased results.

DISTRIBUTION
============

You must not redistribute this version under any circumstances.

============================================================================
START OF BRIEF INSTRUCTIONS
============================================================================

INPUT FILES
===========

Pedigree and data files can be in linkage or QTDT format. For a detailed 
description of these formats see http://www.well.ox.ac.uk/asthma/QTDT and 
http://linkage.rockefeller.edu.

Brief description of QTDT format files
======================================

1. Pedigree File - Each line in a pedigree file starts with five mandatory
               columns. The first two provide a family and an individual
               identifier. These two identifiers are interpreted as text and
               must be unique. The next two columns indicate each individual's
               parents and the final column encodes sex (1 = male, 2 = female).
               
For example, the first five columns in a pedigree file describing a simple 
sib-pair family named ANONYMOUS might read:

ANONYMOUS   FATHER         .      .  1
ANONYMOUS   MOTHER         .      .  2
ANONYMOUS   SON       FATHER MOTHER  1
ANONYMOUS   DAUGHTER  FATHER MOTHER  2

Often data might be encoded with numbers, rather than long identifiers:

1 1 . . 1
1 2 . . 2
1 3 1 2 1
1 4 1 2 1

2. Data File - The QTDT data file includes one line per item in the pedigree.
               Allowable items are markers (M), quantitative traits (T) and
               discrete affection status (A). In the pedigree file, affection
               status is usually encoded as 0 for Unknown, 1 for Normal, and 2
               for Affected.
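
The two file descriptions above can be checked with a short script. The
following Python sketch is not part of MERLIN; it only illustrates the layout
described here. It assumes that "." marks a missing parent (as in the examples
above), that uniqueness applies to the family/individual pair, and that each
data file line holds an item code followed by an item name. All function and
class names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple

MISSING_PARENT = "."  # founders in the examples above use "." for both parents
ITEM_CODES = {"M": "marker", "T": "quantitative trait", "A": "affection status"}


class Person(NamedTuple):
    family: str
    individual: str
    father: Optional[str]
    mother: Optional[str]
    sex: int            # 1 = male, 2 = female
    extra: List[str]    # any columns after the five mandatory ones


def parse_pedigree(text: str) -> List[Person]:
    """Read the five mandatory columns of each non-blank pedigree line."""
    people = []
    seen = set()
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) < 5:
            raise ValueError(f"line {number}: expected at least 5 columns")
        family, individual, father, mother, sex = fields[:5]
        # Assumption: uniqueness applies to the (family, individual) pair.
        if (family, individual) in seen:
            raise ValueError(f"line {number}: duplicate individual")
        seen.add((family, individual))
        people.append(Person(
            family, individual,
            None if father == MISSING_PARENT else father,
            None if mother == MISSING_PARENT else mother,
            int(sex), fields[5:],
        ))
    return people


def parse_data_file(text: str) -> List[Tuple[str, str]]:
    """Assumed layout: an item code (M, T or A) followed by the item name."""
    items = []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        code = fields[0].upper()
        if code not in ITEM_CODES or len(fields) < 2:
            raise ValueError(f"line {number}: expected 'M|T|A name'")
        items.append((code, fields[1]))
    return items


pedigree = parse_pedigree("""
ANONYMOUS   FATHER         .      .  1
ANONYMOUS   MOTHER         .      .  2
ANONYMOUS   SON       FATHER MOTHER  1
ANONYMOUS   DAUGHTER  FATHER MOTHER  2
""")
# Founders are the individuals with no parents listed in the pedigree.
founders = [p.individual for p in pedigree if p.father is None and p.mother is None]
print(founders)
```

Running a check like this before an analysis can catch short lines and
duplicated identifiers early.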

MAP FILE
========

Linkage format files include information on recombination fractions between 
markers. If you use QTDT format input files you will need a separate map file,
which has the following 3-column format:

CHROMOSOME    MARKER     POSITION
1             D1S123     134.0

The first column is the chromosome number (1-22), the second the marker name
(as in the data file) and the third the marker position (in cM). Map files
do not need to be sorted and may include more (or fewer) markers than are
present in the data file.
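
As an illustration of the map file layout, the Python sketch below reads the
three columns and orders the markers of a data file by map position. It is not
part of MERLIN. It assumes that a header line such as the one shown above may
be present and skips any line whose chromosome or position is not numeric. The
function names and the marker names D1S456 and D2S789 are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_map(text: str) -> Dict[str, Tuple[int, float]]:
    """Return {marker name: (chromosome, position in cM)}."""
    positions = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        chromosome, marker, position = fields[:3]
        try:
            positions[marker] = (int(chromosome), float(position))
        except ValueError:
            # Header line (or a line with non-numeric values): skipped here.
            continue
    return positions


def order_markers(data_markers: List[str],
                  positions: Dict[str, Tuple[int, float]]) -> List[str]:
    """Sort the data file markers by chromosome and position.

    Markers missing from the map are left out; extra map entries are ignored.
    """
    mapped = [m for m in data_markers if m in positions]
    return sorted(mapped, key=lambda m: positions[m])


map_text = """
CHROMOSOME    MARKER     POSITION
1             D1S123     134.0
1             D1S456     12.5
2             D2S789     3.0
"""
positions = parse_map(map_text)
ordered = order_markers(["D1S123", "D1S456", "UNMAPPED"], positions)
print(ordered)
```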

ALLELE FREQUENCY FILE
=====================

By default MERLIN estimates allele frequencies from allele counts among all
individuals (-fa). Alternatively, you could use allele counts among founders (-ff)
or assume equal allele frequencies for simulated data (-fe).

If none of these is satisfactory, MERLIN supports user-specified allele frequencies.
These can be provided in a separate input file (or in a linkage format data file).
The MERLIN format input file expects a header line for each marker (labelled with M
and the marker name) followed by one or more lines (labelled with F) listing allele
frequencies, starting at allele 1.

M  MARKER
F  freq01 freq02 freq03 freq04
F  freq05 freq06 ...

M ANOTHER_MARKER
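
A small Python sketch of a reader for this layout is shown here as an
illustration only; it is not MERLIN's own parser. It assumes whitespace
separated fields and collects the F lines that follow each M line into one
list, so that list index 0 holds the frequency of allele 1. The function name
and the example values are hypothetical.

```python
from typing import Dict, List


def parse_frequencies(text: str) -> Dict[str, List[float]]:
    """Return {marker name: [freq of allele 1, freq of allele 2, ...]}."""
    frequencies = {}
    current = None
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        label = fields[0].upper()
        if label == "M":
            if len(fields) < 2:
                raise ValueError(f"line {number}: marker name missing")
            current = fields[1]
            frequencies[current] = []
        elif label == "F":
            if current is None:
                raise ValueError(f"line {number}: F line before any M line")
            # Several F lines may follow one marker; they are concatenated.
            frequencies[current].extend(float(value) for value in fields[1:])
        else:
            raise ValueError(f"line {number}: unexpected label {fields[0]!r}")
    return frequencies


freq_text = """
M  D1S123
F  0.1 0.2 0.3
F  0.4
M  ANOTHER_MARKER
F  0.5 0.5
"""
frequencies = parse_frequencies(freq_text)
for marker, values in frequencies.items():
    # A simple sanity check: frequencies for a marker should sum to about 1.
    print(marker, len(values), abs(sum(values) - 1.0) < 1e-6)
```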

To specify input files use the -d, -p, -m and -f command line options:

>merlin -d datafile -p pedfile -m mapfile -f freqfile
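
When MERLIN is driven from a script, the same command line can be assembled as
an argument list. The Python sketch below only builds the list and does not run
anything. It uses the options named in this document; the helper name and the
idea of treating the map and frequency files as optional arguments are
assumptions made for the example.

```python
from typing import List, Optional, Sequence


def merlin_command(datafile: str, pedfile: str,
                   mapfile: Optional[str] = None,
                   freqfile: Optional[str] = None,
                   analyses: Sequence[str] = ()) -> List[str]:
    """Build an argument list using the -d, -p, -m and -f options."""
    command = ["merlin", "-d", datafile, "-p", pedfile]
    if mapfile is not None:
        command += ["-m", mapfile]
    if freqfile is not None:
        command += ["-f", freqfile]
    # Analysis options such as --error or --info are appended unchanged.
    command += list(analyses)
    return command


command = merlin_command("datafile", "pedfile", "mapfile", "freqfile",
                         analyses=["--error"])
print(" ".join(command))
```

A list built this way could be passed to subprocess.run(command) on a system
where the merlin executable is on the search path.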


BASIC ANALYSES
==============

Basic analyses can be requested on the command line, and include error
checking (--error), information content (--info) and calculation of IBD and
kinship matrices (--ibd and --kinship).


ERROR DETECTION
===============

The --error option attempts to flag unlikely recombinants. A lower score
indicates a genotype that seems more likely to be wrong. Unlikely genotypes
are output to the screen and listed in merlin.err.

INFORMATION
===========

The --info option reports information content along the map. Information is
calculated using the Kruglyak et al. definition, although slightly different
results may be given when ungenotyped grandparents are present, since MERLIN
uses the Gudbjartsson et al. founder couple symmetry.

PAIR-WISE SHARING STATISTICS
============================

The --ibd and --kinship options calculate pair-wise IBD and kinship matrices.
These are stored in the output files merlin.ibd and merlin.kin respectively.


LINKAGE ANALYSES
================

This BETA version includes only limited linkage analysis support.
Non-parametric analyses are available through the NPL pairs and NPL all
scoring functions for affecteds-only analyses and through two analogous
distribution-free functions for QTL.

The Kong and Cox function is used to convert non-parametric scores into
LODs and calculate p-values.

NOTE: Only informative families (e.g. with multiple phenotyped individuals
and marker data) are considered, so Z-scores may differ slightly from those
reported by Genehunter.

HAPLOTYPING
===========

Haplotyping options include finding the best set of inheritance vectors
(--best) or sampling one such set at random (--sample). Alternatively, all
non-recombinant haplotypes can be listed by combining the --all and --zero
options (see below).

In the output haplotype file the following labels are used for recombination:

            |   There is no recombination in this interval
            \   Recombinant on the paternal chromosome
            /   Recombinant on the maternal chromosome
            +   Recombinant on both chromosomes
            :   This interval is not flanked by two informative
                markers, and is therefore uninformative.
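
The labels above can be tallied with a few lines of Python. This sketch is an
illustration only and is not part of MERLIN. It works on a plain sequence of
interval labels and makes no assumption about the rest of the haplotype file
layout; the function name and the example sequence are hypothetical.

```python
from typing import Iterable, Tuple

RECOMBINATION_LABELS = {
    "|": "no recombination in this interval",
    "\\": "recombinant on the paternal chromosome",
    "/": "recombinant on the maternal chromosome",
    "+": "recombinant on both chromosomes",
    ":": "uninformative interval",
}


def count_recombinants(labels: Iterable[str]) -> Tuple[int, int]:
    """Return (paternal, maternal) recombinant counts for a label sequence."""
    paternal = maternal = 0
    for label in labels:
        if label not in RECOMBINATION_LABELS:
            raise ValueError(f"unknown interval label {label!r}")
        if label in ("\\", "+"):
            paternal += 1
        if label in ("/", "+"):
            maternal += 1
    return paternal, maternal


# One label per interval: none, paternal, maternal, both, uninformative.
counts = count_recombinants(["|", "\\", "/", "+", ":"])
print(counts)
```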

REDUCED RECOMBINATION
=====================

In dense maps, the options --zero, --one, --two and --three limit the number
of recombinations allowed between consecutive informative markers within each
family.

For single point analysis use the --single option.

SIMULATION
==========

Marker genotypes can be simulated conditional on the observed missing data
pattern and genetic map through the --simulate option. If the --save option
is also used, the simulated pedigree will be output to a file.

Use the random seed option (-r seed) to select a different replicate.


goncalo@well.ox.ac.uk