0.- HOW TO RUN GENOMEPOP
a) Windows: double click the .exe file.
Alternatively, you can run it from the command line, redirecting standard output to a file:
GenomePopv1.0.exe >> GPoutput.txt
b) Linux: Just type ./genomepopLnx1.0.6e & (the trailing & runs it in the background). The Linux executable was compiled on Debian 2.4.22-openmosix-2. Source code is provided for compilation on any other Linux/Unix system.
c) Mac: Just type ./genomepopv1.0.6e on the command line from the appropriate directory, or double-click it if it is saved in your home directory.
The Mac executable was compiled under Mac OS X 10.5.6.
In all cases, the output data files are written to the directory GenomePop_Results, which is created if it does not exist.
The input file GenomePopInput.txt must be in the same directory as the executable.
 
1.- HOW TO SET A VALID INPUT FILE IN GENOMEPOP
The input file must be called GenomePopInput.txt. In this file, include a line beginning with the word 'chromsize'. On the line below it, give the following 7 values:
- Chromosome Size
- Number of Chromosomes
- Initial population size
- Number of populations
- Number of generations
- Mutation rate per haploid genome
- Recombination rate per haploid genome
chromsize numchroms popsize Numpops maxgen genome mut Rate Rec rate per haploid genome
1000 1 1000 4 200 0.01 0.0
The example above runs 1 replicate (the default) of a two-allele model (JC2, the default evolution model) with 4 isolated populations (the default migration rate is 0), each of 1000 haploid individuals carrying 1 chromosome of 1000 linked sites. The system evolves for 200 generations with a mutation rate of 0.01 per genome (0.01 / 1000 = 0.00001 per site). To add more runs, add the following lines beginning with the word 'runs':
runs diploid constantMetapopSize
100 false true
This runs the same haploid model (diploid set to false), but with 100 replicates.
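Because the input file is just identifier lines followed by lines of values, it can be generated by a small script, which also makes the per-site rate arithmetic explicit. The following is a minimal Python sketch; the helper names `chromsize_block`, `runs_block` and `per_site_rate` are hypothetical, and the script assumes whitespace-separated values exactly as in the examples above.

```python
def chromsize_block(chromsize, numchroms, popsize, numpops, maxgen,
                    genome_mut_rate, genome_rec_rate):
    """Mandatory 'chromsize' identifier line followed by its 7 values."""
    header = ("chromsize numchroms popsize Numpops maxgen "
              "genome mut Rate Rec rate per haploid genome")
    values = [chromsize, numchroms, popsize, numpops, maxgen,
              genome_mut_rate, genome_rec_rate]
    return header + "\n" + " ".join(str(v) for v in values) + "\n"


def runs_block(runs, diploid, constant_metapop_size):
    """Optional 'runs' identifier line; booleans are written as true/false."""
    flags = f"{str(diploid).lower()} {str(constant_metapop_size).lower()}"
    return f"runs diploid constantMetapopSize\n{runs} {flags}\n"


def per_site_rate(genome_rate, chromsize, numchroms=1):
    """Per-site rate, assuming the genome rate is spread evenly over all sites."""
    return genome_rate / (chromsize * numchroms)


if __name__ == "__main__":
    text = chromsize_block(1000, 1, 1000, 4, 200, 0.01, 0.0) + runs_block(100, False, True)
    print(text, end="")
    print("mutation rate per site:", per_site_rate(0.01, 1000))
```

To create the actual input file, write `text` to GenomePopInput.txt next to the executable, for example with `pathlib.Path("GenomePopInput.txt").write_text(text)`.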
 
2.- HOW TO DEFINE DIFFERENT ALLELE AND NUCLEOTIDE EVOLUTION MODELS
The allowed models are Jukes and Cantor with 2 alleles (JC2) or with 4 alleles (JC4), and the General Time Reversible model (GTR, Tavare 1982; Zharkikh 1994). In the 4-allele models, the default equilibrium frequencies are 0.25 and all instantaneous rates are 1. The example below defines a GTR model with the indicated equilibrium frequencies and rates. Only the frequencies of A, C and G are given; the frequency of T is computed as 1 - (A + C + G) and does not need to be provided (0.3 in the example). The six rates are given in the order AC AG AT CG CT GT.
model
GTR
freqs
0.2 0.17 0.33
rates
3.0 5.0 0.9 1.3 5.3 1.0
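To see how the three frequencies and six rates define a GTR model, the sketch below builds the standard GTR instantaneous rate matrix Q, with q_ij = r_ij * pi_j off the diagonal and rows summing to zero. It is plain Python; `gtr_matrix` is a hypothetical helper, and the matrix is left unnormalized (many programs rescale Q so the mean substitution rate is 1; whether GenomePop does so is not stated here).

```python
BASES = "ACGT"
PAIR_ORDER = ["AC", "AG", "AT", "CG", "CT", "GT"]  # order of the 'rates' values


def gtr_matrix(freqs_acg, rates):
    """Return (pi, Q) for a GTR model given A, C, G frequencies and 6 rates."""
    pi = list(freqs_acg) + [1.0 - sum(freqs_acg)]  # T is implied: 1 - (A + C + G)
    if pi[3] <= 0:
        raise ValueError("A + C + G must be below 1")
    r = dict(zip(PAIR_ORDER, rates))
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                # exchangeability is symmetric: AC and CA share one rate
                q[i][j] = r["".join(sorted(a + b))] * pi[j]
        q[i][i] = -sum(q[i])  # diagonal makes each row sum to zero
    return pi, q


if __name__ == "__main__":
    pi, q = gtr_matrix([0.2, 0.17, 0.33], [3.0, 5.0, 0.9, 1.3, 5.3, 1.0])
    print("pi =", pi)
    for base, row in zip(BASES, q):
        print(base, [round(x, 4) for x in row])
```

Because r_ij is symmetric, pi_i * q_ij = pi_j * q_ji, so the given frequencies are the equilibrium (stationary) frequencies of the model.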
 
3.- HOW TO DEFINE VARIABLE POPULATION SIZE
Just change the third value on the line below 'runs' (constantMetapopSize) to 'false'. Population size will then vary through migration and/or selection, if present. However, the maximum allowed population size is the third value of the first line (the value under popsize, or popsizeKmax).
runs diploid constantMetapopSize
100 false false
 
4.- HOW TO DEFINE POPULATIONS WITH DIFFERENT SIZES
differentPopSizes
100 200 12000
If the number of items under the identifier 'differentPopSizes' is lower than the number of populations defined in the first line, as in the example (3 < 4), the remaining populations keep the initially defined size: here the fourth population has 1,000 individuals. If the number of items is higher than the number of populations, only the first 'numpops' values are read.
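The padding and truncation rules above can be summarized in a few lines. This is a minimal sketch of the described behaviour, not GenomePop's code; `resolve_pop_sizes` is a hypothetical name.

```python
def resolve_pop_sizes(numpops, popsize, different_pop_sizes):
    """Per-population sizes following the 'differentPopSizes' rules:
    extra items beyond numpops are ignored and missing ones fall back to popsize."""
    sizes = list(different_pop_sizes[:numpops])   # read at most numpops values
    sizes += [popsize] * (numpops - len(sizes))   # fill the rest with the default
    return sizes


if __name__ == "__main__":
    # the example above: 4 populations, popsize 1000, three explicit sizes
    print(resolve_pop_sizes(4, 1000, [100, 200, 12000]))
```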
 
5.- HOW TO DEFINE DIFFERENT SNPs ANCESTRAL ALLELES IN DIFFERENT POPULATIONS
This only changes the label ('1' or '2') of the ancestral allele in each population. To define specific SNP frequencies, see howto 17.
Biological Model
viral
chromsize numcroms popsizeKmax Numpops maxgen haploidGenomeMutRate Recombination
10 1 1000 2 200 0.001 5
runs diploid constantMetapopSize
1 true false
flowmodel
ISLAND
migration
0.01
recurrentmut retromutation
true true
different pop sizes
100 120
sample size
20
model
jc2
# next line fixes the whole pop 1 with allele 1 and the whole pop2 with allele 2
SNP freqs
1.0 0.0
This example defines 2 populations, of sizes 100 and 120, under a 2-allele model (JC2), each with different initial allele frequencies (the 'viral' model). The frequencies are defined under the identifier 'SNP freqs', with one item per population.
If this identifier is absent and the biological model is 'viral', the frequencies are assumed to be 0.5 in all populations. If the model is not 'viral',
the frequency is 1 (only allele 1 exists initially) in every population. In the example, 10 (chromsize = 10) independent 2-allele SNPs are defined (a recombination of 5 per genome over 10 sites is 10 x 0.5, i.e. 0.5 per site, which is free recombination),
with initial frequencies of 1 (allele 1 fixed in pop 1) and 0 (allele 2 fixed in pop 2). Therefore, the ancestral allele is
'1' in population 1 but '2' in population 2.
Note that this model differs from the "isf" model (see howto 17), which defines individual SNP mutant frequencies at a given population and position.
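The default rules for the initial frequencies can be written out as a small decision function. This is a sketch of the rules described above; the function names are hypothetical, and treating a 'SNP freqs' list whose length differs from the number of populations as an error is an assumption (the text only says the items "should correspond").

```python
def initial_allele1_freqs(numpops, biological_model=None, snp_freqs=None):
    """Initial frequency of allele '1' per population under the 2-allele (JC2) model."""
    if biological_model != "viral":
        return [1.0] * numpops          # only allele 1 exists initially
    if snp_freqs is None:
        return [0.5] * numpops          # viral model without 'SNP freqs'
    if len(snp_freqs) != numpops:       # assumption: a mismatch is an error
        raise ValueError("'SNP freqs' needs one value per population")
    return [float(f) for f in snp_freqs]


def ancestral_allele(freq1):
    """'1' or '2' when one allele is fixed at the start, otherwise None."""
    if freq1 == 1.0:
        return "1"
    if freq1 == 0.0:
        return "2"
    return None


if __name__ == "__main__":
    freqs = initial_allele1_freqs(2, "viral", [1.0, 0.0])  # the example above
    print(freqs, [ancestral_allele(f) for f in freqs])
```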
 
6.- HOW TO DEFINE RECOMBINATION HOT-SPOTS
chromsize numcroms popsizeKmax Numpops maxgen haploidGenomeMutRate Recombination
1000 1 1000 2 2000 0.001 0.5
runs diploid constantMetapopSize
1 true true
migration rate
0.01
migrationModel
SteppingStone1
# recombination hot spots are, for now, only allowed for two allele models
# Recombination at region 0 to 100 is 0.05 but at region 500-600 is 0.01. The other regions have NO recombination
# NOTE that if recombination hot spots are defined, the above general parameter 'Recombination' is ignored
hotspots
0 100 0.05
500 600 0.01
# default substitution model is Jukes Cantor with 2 alleles (JC2). The output format is genpop format.
This example defines 2 populations of equal, constant size 1000 under a 2-allele model (JC2). There are nominally 1000 SNPs (chromsize = 1000), but recombination hotspots are defined so that the region from 0 to 100 recombines at a rate of 0.05 and the region from 500 to 600 at a rate of 0.01. The rest of the genome is fully linked. Note that the recombination value (0.5) on the first line is overridden by the hotspot lines.
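The hotspot rule (rate inside a listed region, no recombination elsewhere, global value ignored) amounts to a simple lookup. This sketch assumes inclusive region bounds and does not try to say whether GenomePop applies each rate per region or per site, which the text leaves open; `hotspot_rate` is a hypothetical name.

```python
def hotspot_rate(position, hotspots):
    """Recombination rate for a site; sites outside every hotspot get 0 (fully linked).

    hotspots: (start, end, rate) tuples, one per line under 'hotspots'.
    Assumption: both region bounds are inclusive. The general 'Recombination'
    value is deliberately not consulted, as the text describes.
    """
    for start, end, rate in hotspots:
        if start <= position <= end:
            return rate
    return 0.0


HOTSPOTS = [(0, 100, 0.05), (500, 600, 0.01)]  # values from the example above

if __name__ == "__main__":
    for pos in (50, 300, 550):
        print(pos, hotspot_rate(pos, HOTSPOTS))
```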
 
7.- HOW TO DEFINE SNPs OR NUCLEOTIDES UNDERGOING SELECTIVE PRESSURE
chromsize numcroms popsizeKmax Numpops maxgen haploidGenomeMutRate Recombination
100 1 1000 1 2000 0.001 0.01
runs diploid constantMetapopSize
1 true true
# default substitution model is Jukes Cantor with 2 alleles (JC2). The output format is genpop format.
selpos s
0 0.9
41 0.0
74 -0.5
This example defines 1 population of constant size 1000 under a 2-allele model (JC2). The recombination rate across the 100 putative SNPs is 0.01. In addition, a mutation at each position listed under the identifier 'selpos' contributes a fitness factor of 1 - hs, where s is the value given next to the position and h = 1 for homozygotes and haploids. In the example, position 0 is strongly deleterious (s = 0.9), position 41 is neutral (s = 0.0) and position 74 is advantageous (s = -0.5, a factor of 1.5). The fitness model is multiplicative and is scaled to a maximum fitness of 1. Fitness-based selection is also allowed under a codon model; in that case, the specific codon positions under selection are defined with the identifier 'selaa' instead of 'selpos'.
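A worked version of the multiplicative fitness model may help. This sketch multiplies 1 - hs over the selected positions an individual carries, then divides by the largest fitness so the fittest individual has fitness 1; that division is an assumed reading of "scaled to a maximum fitness of 1". The heterozygote dominance h is not given in the text, so the caller must supply it. Function names are hypothetical.

```python
def genome_fitness(mutant_sites, selpos):
    """Multiplicative fitness of one individual.

    mutant_sites: {position: h} for each position carrying the mutant allele;
                  h = 1 for haploids and homozygotes (per the text). The
                  heterozygote h is not specified in the text.
    selpos:       {position: s} as listed under 'selpos'.
    """
    w = 1.0
    for pos, h in mutant_sites.items():
        w *= 1.0 - h * selpos.get(pos, 0.0)  # unlisted positions contribute 1
    return w


def scale_to_max(fitnesses):
    """Divide by the largest value so the fittest individual has fitness 1."""
    top = max(fitnesses)
    return [w / top for w in fitnesses]


SELPOS = {0: 0.9, 41: 0.0, 74: -0.5}  # values from the example above

if __name__ == "__main__":
    ind_a = genome_fitness({0: 1.0, 74: 1.0}, SELPOS)  # (1 - 0.9) * (1 + 0.5)
    ind_b = genome_fitness({74: 1.0}, SELPOS)          # 1 + 0.5
    print(ind_a, ind_b, scale_to_max([ind_a, ind_b]))
```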
 
8.- HOW TO DEFINE A CODON MODEL
chromsize numcroms popsizeKmax Numpops maxgen haploidGenomeMutRate Recombination
10000 1 1000 1 200 0.001 0.5
runs diploid constantMetapopSize
1 true true
recurrentmut retromutation
true true
sample size
20
model
GTR
freqs
0.2 0.17 0.33
rates
3.0 5.0 0.9 1.3 5.3 1.0
# this defines a MG94 codon evolution model with dN/dS = 0.5
codon omega
true 0.5
As the example shows, the identifier 'codon' must be set to 'true' to run an MG94 codon model. The default omega (dN/dS) is 1; a different value must be given on the same line as the codon value. The nucleotide equilibrium frequencies are used to compute the codon substitution rates. The defaults are 0.25; other values can be given under the identifier 'freqs'. The codon model can be combined with any nucleotide model, for example GTR. If the chromosome size is not a multiple of 3, it is shortened to a multiple of 3 (in the example, 10000 becomes 9999).
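For readers unfamiliar with MG94, the sketch below shows one common MG94 x GTR formulation: a single-nucleotide change a -> b between sense codons has rate r(a,b) * pi(b), multiplied by omega when the amino acid changes, and all other changes have rate 0. It uses the standard genetic code and the GTR values from the example; it is an illustration of the model class, not GenomePop's internal implementation, and the helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}
PAIRS = ["AC", "AG", "AT", "CG", "CT", "GT"]  # order of the 'rates' values


def mg94_rate(c1, c2, freqs_acg, rates, omega):
    """Instantaneous rate from codon c1 to codon c2 (MG94 x GTR sketch)."""
    if CODE[c1] == "*" or CODE[c2] == "*":
        return 0.0                       # stop codons are excluded
    diffs = [(a, b) for a, b in zip(c1, c2) if a != b]
    if len(diffs) != 1:
        return 0.0                       # only single-nucleotide changes
    a, b = diffs[0]
    pi = dict(zip("ACG", freqs_acg))
    pi["T"] = 1.0 - sum(freqs_acg)       # T frequency is implied
    r = dict(zip(PAIRS, rates))["".join(sorted(a + b))]
    rate = r * pi[b]                     # target-nucleotide frequency
    return rate if CODE[c1] == CODE[c2] else omega * rate


def codon_chromsize(chromsize):
    """Chromosome size shortened to a multiple of 3."""
    return chromsize - chromsize % 3


if __name__ == "__main__":
    F, R = [0.2, 0.17, 0.33], [3.0, 5.0, 0.9, 1.3, 5.3, 1.0]
    print("TTT->TTC (synonymous):", mg94_rate("TTT", "TTC", F, R, 0.5))
    print("TTT->TTA (nonsynonymous):", mg94_rate("TTT", "TTA", F, R, 0.5))
    print("sites used:", codon_chromsize(10000))
```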
 
9.- HOW TO SET MIGRATION
To include migration at rate 0.01 under an island model, just add:
migration
0.01
No other setting is needed, because the island model is the default migration model. To set a one-dimensional stepping-stone model instead, add a 'flowmodel' identifier with 'steppingstone' on the line below it. See Migration Models in GenomePop for information on user-defined migration models.
migration
0.01
flowmodel
steppingstone
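To picture the difference between the two built-in flow models, the sketch below builds a per-generation migration matrix M, where M[i][j] is the fraction of population i that comes from population j. The conventions are assumptions, not taken from GenomePop: under the island model m is split evenly among the other populations, and under the 1-D stepping stone m is split evenly among the existing neighbours of a linear (non-circular) chain. `migration_matrix` is a hypothetical name.

```python
def migration_matrix(numpops, m, model="island"):
    """Row i gives the fraction of population i drawn from each population j."""
    M = [[0.0] * numpops for _ in range(numpops)]
    for i in range(numpops):
        if model == "island":
            sources = [j for j in range(numpops) if j != i]
        elif model == "steppingstone":
            # assumption: linear chain, so end populations have one neighbour
            sources = [j for j in (i - 1, i + 1) if 0 <= j < numpops]
        else:
            raise ValueError(f"unknown model: {model}")
        for j in sources:
            M[i][j] = m / len(sources)
        M[i][i] = 1.0 - sum(M[i])  # residents make up the remainder
    return M


if __name__ == "__main__":
    for model in ("island", "steppingstone"):
        print(model)
        for row in migration_matrix(4, 0.01, model):
            print([round(x, 4) for x in row])
```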
 
10.- HOW TO INCLUDE YOUR OWN SEQUENCES
model
jc4
sequence
>root
ATGGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGA
>seq1
ATGGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGA
>seq2
ATGGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGA
To include your own sequences, the word "sequence" must appear at the beginning of a line in the input file, followed by the sequences in FASTA format as in the example. The nucleotide model must have 4 alleles (JC4 or GTR); otherwise the sequences are ignored.
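Before pasting sequences into the input file, it can be worth checking them. The sketch below parses FASTA text and reports characters other than A, C, G, T and any length differences. Whether GenomePop requires equal lengths (or lengths matching chromsize) is not stated in this document, so that check is an assumption; both function names are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}, keeping input order."""
    records, name, chunks = {}, None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:          # text before the first header is skipped
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def sequence_problems(records):
    """List non-ACGT characters and length mismatches (equal lengths assumed)."""
    problems = []
    lengths = {len(s) for s in records.values()}
    if len(lengths) > 1:
        problems.append(f"unequal lengths: {sorted(lengths)}")
    for name, seq in records.items():
        bad = set(seq) - set("ACGT")
        if bad:
            problems.append(f"{name}: unexpected characters {sorted(bad)}")
    return problems


if __name__ == "__main__":
    example = ">root\nATGGGTGCGA\n>seq1\nATGGGTGCGA\n>seq2\natgggtgcga\n"
    recs = read_fasta(example)
    print(list(recs), sequence_problems(recs) or "no problems found")
```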
 
11.- HOW TO SAMPLE HAPLOTYPES
haplotypes
true
Distinct haplotypes, rather than plain sequences (which could be identical), will be sampled. This option does not apply to the JC2 (2-allele) model.
 
12.- HOW TO DEFINE INDEPENDENTLY SEGREGATING SNPS
independentSNPs
true
This line only applies if the model is JC2. The SNPs then segregate independently, whatever the recombination value. For example, the following lines define the settings to evolve 100,000 SNPs (chromsize) with a per-SNP mutation rate of 10^-7 (10^-2 / 10^5) for 2N generations (2000 generations with N = 1000). Note the 'independentSNPs' line, which makes every SNP unlinked.
scaling
10
chromsize numcroms popsizeKmax Numpops maxgen mutRate per haploid genome Rec per haploid genome
100000 1 1000 1 2000 0.01 0.0
independentSNPs
true
sample size
15
recurrentmut retromut
true true
 
13.- HOW TO EFFICIENTLY SIMULATE A NUMBER OF SNPs COVERING 10 Mb
runs diploid constantMetapopSize
1 true true
scaling
10
chromsize numcroms popsizeKmax Numpops maxgen mutRate per haploid genome Rec
1000 100 1000 1 1000 0.05 0.1
recurrentmut retromut
false false
The above simulation took 20 seconds on a Pentium 4 (3.2 GHz). The recombination rate of 0.1 is per genome, i.e. 0.001 per chromosome (0.1 / 100 chromosomes), which corresponds to 0.001 Morgans or 0.1 cM. Assuming 1 cM per 1 Mb, each chromosome covers 0.001 x 100 = 0.1 Mb, so the 100 chromosomes together cover a total genome of 10 Mb. The simulation time drops to just 3 seconds with a scaling of 20. There were no multiple hits under these settings.
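The coverage arithmetic above can be reused for other settings. This sketch follows the same steps: the per-genome rate is divided among chromosomes, the expected crossovers per chromosome are read as Morgans (reasonable for small values such as 0.001), converted to cM, and then to Mb with an assumed cM/Mb ratio. `genome_coverage_mb` is a hypothetical name.

```python
def genome_coverage_mb(rec_per_genome, numchroms, cm_per_mb=1.0):
    """Return (Mb per chromosome, total Mb) implied by the recombination setting."""
    rec_per_chrom = rec_per_genome / numchroms  # e.g. 0.1 / 100 = 0.001 Morgans
    cm_per_chrom = rec_per_chrom * 100          # Morgans -> centiMorgans
    mb_per_chrom = cm_per_chrom / cm_per_mb     # cM -> Mb using the assumed ratio
    return mb_per_chrom, mb_per_chrom * numchroms


if __name__ == "__main__":
    per_chrom, total = genome_coverage_mb(0.1, 100)  # settings from the example
    print(f"{per_chrom:.3f} Mb per chromosome, {total:.1f} Mb in total")
```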
 
14.- HOW TO DEFINE SEQUENCES WITH A GIVEN PERCENTAGE OF IDENTITY BETWEEN POPULATIONS
biological model
viral 90
runs diploid constantMetapopSize
1 true true
chromsize numcroms popsizeKmax Numpops maxgen mutRate per haploid genome Rec
10000 1 1000 3 1000 0.1 0.1
These settings define a viral model in which the sequences of the 3 populations share 90% identity. Without this value, the identity under the viral model is undefined (random).
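To check the identity of sampled output sequences against the requested value, percent identity can be computed site by site. The sketch assumes equal-length, already aligned sequences and reports every population pair; whether GenomePop's 90% refers to pairwise identity or identity to a common root is not specified here. Both function names are hypothetical.

```python
from itertools import combinations


def percent_identity(seq1, seq2):
    """Percentage of aligned sites carrying the same nucleotide."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    same = sum(a == b for a, b in zip(seq1, seq2))
    return 100.0 * same / len(seq1)


def pairwise_identity(seqs):
    """Identity for every pair of sequences, keyed by 1-based (i, j)."""
    return {(i + 1, j + 1): percent_identity(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}


if __name__ == "__main__":
    print(pairwise_identity(["ACGTACGTAC", "ACGTACGTAA", "ACGTACGTAC"]))
```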
 
15.- HOW TO DEFINE BOTTLENECK AND/OR EXPANSION SCENARIOS
runs diploid constantMetapopSize
1 true true
chromsize numcroms popsizeKmax Numpops maxgen mutRate per haploid genome Rec
10000 1 1000 3 1000 0.1 0.1
CEDS
1 1 20 2
1 250 350 2000
Under the CEDS identifier, these settings define a bottleneck of size N = 2 in population 1 from generation 1 to 20, and an expansion to N = 2000 in the same population from generation 250 to 350. After that, the original population size of 1000 is restored. You can add as many lines as desired under the CEDS identifier. On each line, the first item is the population number, the next two give the generation period and the last one gives the desired population size.
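The CEDS lines describe a piecewise size schedule per population. This sketch returns the size a population should have at a given generation, treating each period as inclusive at both ends (an assumption) and falling back to the default size outside all periods. `pop_size_at` is a hypothetical name.

```python
def pop_size_at(generation, pop, default_size, ceds):
    """Population size at a generation, given CEDS lines (pop, start, end, size)."""
    for p, start, end, size in ceds:
        if p == pop and start <= generation <= end:  # assumed inclusive period
            return size
    return default_size  # outside every period the original size applies


CEDS = [(1, 1, 20, 2), (1, 250, 350, 2000)]  # the two lines from the example

if __name__ == "__main__":
    for gen in (10, 100, 300, 400):
        print(gen, pop_size_at(gen, 1, 1000, CEDS))
```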
 
16.- HOW TO STORE THE ROOT SEQUENCE
storeroot
true
runs diploid constantMetapopSize
1 true true
chromsize numcroms popsizeKmax Numpops maxgen mutRate per haploid genome Rec
10000 1 1000 3 1000 0.1 0.1
Setting the identifier storeroot to true saves the original (root) sequence in the file GP_root.dat.
 
17.- HOW TO DEFINE SPECIFIC SNPs AT A GIVEN FREQUENCY
Add a line with the isf (initial SNP frequency) identifier and, below it, as many lines as desired, each with 3 items: population, position and heterozygote frequency. Note that populations are numbered from 1, whereas positions are numbered from 0.
isf popid position freq
1 20 0.1
1 0 0.11
1 10 0.21
2 20 0.1
Note that the isf model differs from the 'SNP freqs' model (see howto 5), which sets a different ancestral allele ('1' or '2') for each whole population.
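Because populations are numbered from 1 but positions from 0, off-by-one mistakes are easy. The sketch below parses isf rows and checks that numbering; the upper bounds (numpops and chromsize - 1) and the requirement 0 <= freq <= 1 are assumptions of this sketch rather than documented GenomePop checks. `parse_isf` is a hypothetical name.

```python
def parse_isf(rows, numpops, chromsize):
    """Parse 'popid position freq' rows into (pop, position, freq) tuples."""
    parsed = []
    for row in rows:
        pop_s, pos_s, freq_s = row.split()
        pop, pos, freq = int(pop_s), int(pos_s), float(freq_s)
        if not 1 <= pop <= numpops:          # populations start at 1
            raise ValueError(f"population {pop} outside 1..{numpops}")
        if not 0 <= pos < chromsize:         # positions start at 0
            raise ValueError(f"position {pos} outside 0..{chromsize - 1}")
        if not 0.0 <= freq <= 1.0:
            raise ValueError(f"frequency {freq} outside [0, 1]")
        parsed.append((pop, pos, freq))
    return parsed


if __name__ == "__main__":
    example = ["1 20 0.1", "1 0 0.11", "1 10 0.21", "2 20 0.1"]
    print(parse_isf(example, numpops=2, chromsize=100))
```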