Haploview currently accepts input data in three formats: standard linkage format, completely or partially phased haplotypes, and HapMap Project data dumps. It also takes a separate file with marker position information. All four formats are explained in depth below.
Linkage data should be in the Linkage Pedigree (pre MAKEPED) format, with columns of family, individual, father, mother, gender, affected status and genotypes. The file should not have a header line (i.e. the first line should be for the first individual, not the names of the columns). Please note that Haploview can only interpret biallelic markers — markers with greater than two alleles (e.g. microsatellites) will not work correctly. A sample line from such a file might look something like:
3 12 8 9 1 2 1 2    3 3    0 0    4 2
a b  c d e f -----------g------------
a. Family: A unique alphanumeric identifier for this individual's family. Unrelated individuals should not share a pedigree name.
b. Individual: An alphanumeric identifier for this individual. It should be unique within the family (see above).
c. Father: Identifier corresponding to the father's individual ID, or "0" if the father is unknown. Note that if a father ID is specified, the father must also appear in the file.
d. Mother: Identifier corresponding to the mother's individual ID, or "0" if the mother is unknown. Note that if a mother ID is specified, the mother must also appear in the file.
e. Gender: Individual's gender (1=MALE, 2=FEMALE).
f. Affected status: Affectation status to be used for association tests (0=UNKNOWN, 1=UNAFFECTED, 2=AFFECTED).
g. Genotypes: Each marker is represented by two columns (one for each allele, separated by a space) and coded 1-4, where 1=A, 2=C, 3=G, 4=T. A 0 in either allele position of a marker genotype (as in the genotypes for the third marker above) indicates missing data.
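The column layout described above can be sketched as a small reader. This is an illustrative Python helper, not part of Haploview; the function name `parse_ped_line` and the dictionary keys are hypothetical, and the sketch assumes whitespace-separated columns.

```python
# Allele coding taken from the description above; 0 means missing data.
ALLELE_CODES = {"0": None, "1": "A", "2": "C", "3": "G", "4": "T"}

def parse_ped_line(line):
    """Split one pre-MAKEPED line into its six leading fields and genotype pairs."""
    fields = line.split()
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError("expected 6 leading columns plus two columns per marker")
    family, individual, father, mother, gender, status = fields[:6]
    alleles = fields[6:]
    genotypes = []
    for i in range(0, len(alleles), 2):
        a, b = alleles[i], alleles[i + 1]
        if a not in ALLELE_CODES or b not in ALLELE_CODES:
            raise ValueError(f"allele codes must be 0-4, got {a!r} {b!r}")
        if a == "0" or b == "0":
            genotypes.append(None)  # a 0 in either position marks missing data
        else:
            genotypes.append((ALLELE_CODES[a], ALLELE_CODES[b]))
    return {
        "family": family,
        "individual": individual,
        "father": father,
        "mother": mother,
        "gender": gender,
        "status": status,
        "genotypes": genotypes,
    }

# The sample line from this section, without the a-g annotation row.
record = parse_ped_line("3 12 8 9 1 2 1 2 3 3 0 0 4 2")
print(record["genotypes"])
```

For the sample line, the third marker comes back as `None` because both of its allele columns are 0.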
It is also worth noting that this format can be used with non-family based data. Simply use a dummy value for the pedigree name (1, 2, 3...) and fill in zeroes for father and mother ID. It is important that the "dummy" value for the ped name be unique for each individual. Affectation status can be used to designate cases vs. controls (2 and 1, respectively).
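As a rough sketch of the non-family layout just described, the following Python helper writes one line per unrelated individual, using a running counter as the unique dummy pedigree name. The function name `unrelated_ped_lines` and the input tuple layout are assumptions made for illustration only.

```python
def unrelated_ped_lines(samples):
    """Build linkage-format lines for unrelated individuals.

    samples: iterable of (individual_id, gender, is_case, genotypes), where
    genotypes is a list of (allele1, allele2) pairs coded 0-4.
    """
    lines = []
    for n, (ind_id, gender, is_case, genotypes) in enumerate(samples, start=1):
        status = 2 if is_case else 1  # 2 = affected (case), 1 = unaffected (control)
        alleles = " ".join(f"{a} {b}" for a, b in genotypes)
        # n is the dummy pedigree name; father and mother IDs are both 0.
        lines.append(f"{n} {ind_id} 0 0 {gender} {status} {alleles}")
    return lines

lines = unrelated_ped_lines([
    ("s1", 1, True, [(1, 2), (3, 3)]),
    ("s2", 2, False, [(0, 0), (4, 2)]),
])
for ln in lines:
    print(ln)
```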
Files should also follow the following guidelines:
Haplotype data for Haploview's input must be formatted in columns of Family, Individual and Genotypes. There should be two lines (chromosomes) for each individual. This is the standard format of Genehunter's TDT output. See the sample below:
FAM1 FAM1M01 0 4 2 2
FAM1 FAM1M01 0 4 2 2
FAM1 FAM1F02 3 h 1 2
FAM1 FAM1F02 3 h 1 2
The data format uses the numerals 1-4 to represent genotypes, the number zero to represent missing data, and the letter "h" to represent a heterozygous allele. That is, if an individual is heterozygous at a locus, both alleles should be "h" if the phasing (which allele falls on which chromosome) is uncertain.
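To make the two-lines-per-individual convention concrete, here is a minimal Python sketch that pairs consecutive lines and reports the loci where phase is uncertain. The helper names `read_haps` and `unphased_positions` are hypothetical, and the sketch assumes whitespace-separated columns with the two chromosomes of an individual on adjacent lines.

```python
def read_haps(lines):
    """Pair consecutive lines into (family, individual, chrom1, chrom2) tuples."""
    rows = [ln.split() for ln in lines if ln.strip()]
    if len(rows) % 2 != 0:
        raise ValueError("expected two lines (chromosomes) per individual")
    individuals = []
    for first, second in zip(rows[0::2], rows[1::2]):
        if first[:2] != second[:2]:
            raise ValueError(f"lines for {first[:2]} and {second[:2]} are not a pair")
        if len(first) != len(second):
            raise ValueError("both chromosomes must list the same number of markers")
        individuals.append((first[0], first[1], first[2:], second[2:]))
    return individuals

def unphased_positions(chrom1, chrom2):
    """Return 0-based marker indices where both alleles are 'h' (phase unknown)."""
    return [i for i, (a, b) in enumerate(zip(chrom1, chrom2)) if a == "h" and b == "h"]

sample = [
    "FAM1 FAM1M01 0 4 2 2",
    "FAM1 FAM1M01 0 4 2 2",
    "FAM1 FAM1F02 3 h 1 2",
    "FAM1 FAM1F02 3 h 1 2",
]
individuals = read_haps(sample)
for family, individual, c1, c2 in individuals:
    print(family, individual, unphased_positions(c1, c2))
```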
Data from the HapMap Project can be dumped by region using the GBrowse interface. Downloading data requires user registration and agreement to the terms of use. The saved data file is in a marker-per-line format which can be loaded in Haploview.
GBrowse dumps only one file, which has one marker per line and which includes familial relationships among the HapMap samples as well as marker position information. The file format has several header lines (beginning with "#") which Haploview parses. Open the file by choosing the "Browse HapMap Data" option and selecting the downloaded file.
The marker info file is two columns, marker name and position. The positions can be either absolute chromosomal coordinates or relative positions. It might look something like this:
marker01 190299
marker02 190950
marker03 191287
An optional third column can be included in the info file to make additional notes for specific SNPs. SNPs with additional information are highlighted in green on the LD display. For instance, you could make note that the first SNP is a coding variant as follows:
marker01 190299 CODING_SNP
marker02 190950
marker03 191287
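A marker info file with the optional third column could be read along these lines. This is a Python sketch only: `read_marker_info` is a hypothetical name, and the sketch assumes whitespace-separated columns and integer positions.

```python
def read_marker_info(lines):
    """Return a list of (name, position, note) tuples; note is None when absent."""
    markers = []
    for ln in lines:
        # Split into at most three parts so a note may itself contain spaces.
        parts = ln.split(None, 2)
        if not parts:
            continue  # skip blank lines
        if len(parts) < 2:
            raise ValueError(f"expected a marker name and a position, got {ln!r}")
        name, position = parts[0], int(parts[1])
        note = parts[2].strip() if len(parts) == 3 else None
        markers.append((name, position, note))
    return markers

markers = read_marker_info([
    "marker01 190299 CODING_SNP",
    "marker02 190950",
    "marker03 191287",
])
for name, position, note in markers:
    print(name, position, note or "")
```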
The "-batch" flag on the command line allows you to run Haploview automatically (in nogui mode) on several files. Batch input files should have one genotype file per line, along with an info file (if desired) separated by a space. Filenames must conform to the following rules:
The following example shows 2 pedfiles (with info files) and a hapmap file:
sample1.ped sample1.info
sample2.ped sample2.info
sample3.hmp
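Before handing a batch file to Haploview, it can help to confirm that each line holds one genotype file and at most one info file. The Python sketch below does only that; `read_batch_file` is a hypothetical helper, and it does not check the filename rules themselves.

```python
def read_batch_file(lines):
    """Return a list of (genotype_file, info_file) pairs; info_file may be None."""
    jobs = []
    for ln in lines:
        parts = ln.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) > 2:
            raise ValueError(f"expected at most two filenames per line, got {ln!r}")
        info_file = parts[1] if len(parts) == 2 else None
        jobs.append((parts[0], info_file))
    return jobs

jobs = read_batch_file([
    "sample1.ped sample1.info",
    "sample2.ped sample2.info",
    "sample3.hmp",
])
for genotype_file, info_file in jobs:
    print(genotype_file, info_file)
```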
For any given tab the information in the display can be saved. For the data check and association test tabs, a simple tab-delimited text file is generated from the tables. For the LD and Haplotype tabs, data can either be dumped to text files or the image can be saved to a PNG.
LD text output is a tab delimited set of columns containing the various measures of LD used by the program. Details for each column are shown below:
Details about additional options for this output type can be found below in the Export Options section.
When saving the LD table to a PNG, Haploview saves an image using the current display settings. This includes color scheme, zoom and proportional spacing. Thus, in order to save a less detailed image to a PNG, first zoom out, then export the tab. Note that Haploview cannot save large datasets at the higher zoom levels. For more information see the Export Options section below.
Haplotype output shows a block, its markers, the haplotypes and their population frequencies, the crossover percentages to the next block and the multiallelic D prime. Tag SNPs are denoted with a "!". Crossover percentages are shown as a matrix with this block's haplotypes as the rows and the next block's haplotypes as the columns. An example might look like:
BLOCK 1. MARKERS: 1 2 3! 4!
3312 (0.825) |0.800 0.025 0.000|
1144 (0.163) |0.031 0.125 0.007|
3342 (0.013) |0.006 0.000 0.006|
Multiallelic Dprime: 0.802
BLOCK 2. MARKERS: 10! 11! 12
441 (0.837)
222 (0.150)
242 (0.013)
In this example, the first block has 4 markers with 3 haplotypes displayed and the second block has 3 markers and 3 haplotypes. The tag SNPs for each block are (3,4) and (10,11) respectively. The crossover percentage matrix can be read as follows: 80% of all samples have the pattern 3312-441, 3.1% have the pattern 1144-441 and so forth.
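For downstream scripting, the text layout in the example can be parsed roughly as follows. This Python sketch is inferred from the single example in this section rather than from a formal specification, so the regular expressions and the name `parse_haplotype_output` are assumptions.

```python
import re

def parse_haplotype_output(text):
    """Parse haplotype text output into a list of block dictionaries."""
    blocks = []
    for ln in text.splitlines():
        ln = ln.strip()
        if not ln:
            continue
        header = re.match(r"BLOCK (\d+)\.\s+MARKERS:\s+(.*)", ln)
        if header:
            tokens = header.group(2).split()
            blocks.append({
                "number": int(header.group(1)),
                "markers": [int(t.rstrip("!")) for t in tokens],
                "tags": [int(t[:-1]) for t in tokens if t.endswith("!")],  # "!" marks a tag SNP
                "haplotypes": [],
                "dprime": None,
            })
            continue
        if not blocks:
            raise ValueError(f"data before the first BLOCK header: {ln!r}")
        if ln.startswith("Multiallelic Dprime:"):
            blocks[-1]["dprime"] = float(ln.split(":", 1)[1])
            continue
        # Haplotype, frequency in parentheses, then an optional crossover row between bars.
        m = re.match(r"(\S+)\s+\(([\d.]+)\)\s*(?:\|(.*)\|)?$", ln)
        if not m:
            raise ValueError(f"unrecognized line: {ln!r}")
        crossover = [float(x) for x in m.group(3).split()] if m.group(3) else []
        blocks[-1]["haplotypes"].append((m.group(1), float(m.group(2)), crossover))
    return blocks

SAMPLE = """\
BLOCK 1. MARKERS: 1 2 3! 4!
3312 (0.825) |0.800 0.025 0.000|
1144 (0.163) |0.031 0.125 0.007|
3342 (0.013) |0.006 0.000 0.006|
Multiallelic Dprime: 0.802
BLOCK 2. MARKERS: 10! 11! 12
441 (0.837)
222 (0.150)
242 (0.013)
"""

blocks = parse_haplotype_output(SAMPLE)
for block in blocks:
    print(block["number"], block["tags"], block["dprime"])
```

Each row of crossover values lines up with the haplotypes of the following block, so the first value in the first row of block 1 corresponds to the 3312-441 pattern.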
Saving the haplotype tab to a PNG produces an image using the current display settings (such as haplotype frequency cutoff).
Single marker association results are saved in a tab-delimited text file with the following columns:
Trio (TDT) data only:
Case-Control data only:
Haplotype association text output is a tab-delimited file, broken into sections by block. The columns are:
Trio (TDT) data only:
Case-Control data only:
The marker check data is a tab-delimited file with the following columns:
The "Export Options" item in the File Menu allows adjustment of several parameters and allows the user to save any tab without having to switch to it. Specifically, the LD tab allows the markers to be filtered so that only some of them are output:
The default setting (and only one available for most tabs) is to use all the markers.
Generates the LD text or PNG file for only a specific range of markers.
Generates the LD text file for only adjacent markers. This can be useful to view the T-int stat, which measures LD information content in the gaps between markers.
There is also an option to generate a "compressed" LD PNG, which is useful for very large datasets. The image is shrunk to an arbitrary zoom level which allows Haploview to save the PNG with minimal memory usage.
You can specify a set of blocks by loading a blocks file. Each line is a space separated list of markers with one block per line. For example:
1 2 3 4
9 10 11 12 13 14 15
This would create one block from markers 1-4 and another from markers 9-15. The first marker in the file is number 1 (not 0).
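A blocks file in this layout can be checked with a few lines of Python before loading it. The sketch below assumes one block per line with space-separated marker numbers, as described above; `read_blocks_file` is a hypothetical helper name.

```python
def read_blocks_file(lines):
    """Return one list of 1-based marker numbers per block."""
    blocks = []
    for ln in lines:
        markers = [int(tok) for tok in ln.split()]
        if not markers:
            continue  # skip blank lines
        if min(markers) < 1:
            raise ValueError("marker numbering starts at 1, not 0")
        blocks.append(markers)
    return blocks

blocks_from_file = read_blocks_file(["1 2 3 4", "9 10 11 12 13 14 15"])
for block in blocks_from_file:
    print(f"block from marker {block[0]} to marker {block[-1]}")
```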
You can add an analysis track along the top of the LD display by loading a file with two columns, <position> <value>. Haploview will plot the values continuously with respect to the positions of the markers, so the positions should use the same coordinates as the marker info file. For example:
1000 0.3
2000 1.7
3000 11.0
4000 2.3
5000 4.6
This would plot a line from position 1000 to position 5000. The values can be of any units or magnitude, as Haploview scales the analysis track to the bounds of the values.
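One way to picture scaling to the bounds of the values is a linear min-max rescale, sketched below in Python. This is an assumption made for illustration and is not taken from Haploview's source; the helper names `read_track` and `rescale` are hypothetical.

```python
def read_track(lines):
    """Read '<position> <value>' lines into a list of (position, value) floats."""
    points = []
    for ln in lines:
        parts = ln.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 2:
            raise ValueError(f"expected '<position> <value>', got {ln!r}")
        points.append((float(parts[0]), float(parts[1])))
    return points

def rescale(points):
    """Map values linearly onto [0, 1] using the observed minimum and maximum."""
    values = [v for _, v in points]
    lo, hi = min(values), max(values)
    span = hi - lo
    # A constant track has no spread, so every value maps to 0.0.
    return [(p, (v - lo) / span if span else 0.0) for p, v in points]

track = read_track(["1000 0.3", "2000 1.7", "3000 11.0", "4000 2.3", "5000 4.6"])
scaled = rescale(track)
for position, value in scaled:
    print(position, round(value, 3))
```

Under this assumption the smallest value (0.3) maps to the bottom of the track and the largest (11.0) to the top, regardless of the units used.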