HOW TO... DO DIFFERENT THINGS USING GENOMEPOP


 

GenomePop is a flexible simulation tool. Here we explain how to use some of its features. The input file must be named GenomePopInput.txt. Inside this file, lines beginning with '#' are comments.

 

 

 

a) Windows: double-click the .exe file. Alternatively, run it from the command line, redirecting the standard output: GenomePopv1.0.exe >> GPoutput.txt.
b) Linux: just type ./genomepopLnx1.0.6e &. The Linux executable was compiled under Debian 2.4.22-openmosix-2. Source code is provided for compilation on any other Linux/Unix system.
c) Mac: just type ./genomepopv1.0.6e on the command line from the appropriate directory, or double-click the executable if it is saved in your home directory. The Mac executable was compiled under OS X 10.5.6.

In all cases, the output data file is written to the directory GenomePop_Results, which is created if it does not exist. GenomePopInput.txt must be in the same directory as the executable.

 

The input file must be named GenomePopInput.txt. In this file, include one line beginning with the word 'chromsize'. Below this line, add the following 7 values:

  1. Chromosome Size
  2. Number of Chromosomes
  3. Initial population size
  4. Number of populations
  5. Number of generations
  6. Mutation rate per haploid genome
  7. Recombination rate per haploid genome
chromsize    numchroms    popsize    Numpops    maxgen    genome mut Rate    Rec rate per haploid genome
1000           1               1000              4         200             0.01                 0.0

The example above will run 1 replicate (the default) of a two-allele model (the default evolution model, called JC2) with 4 isolated populations (the default migration rate is 0), each with 1000 haploid individuals carrying 1 chromosome of 1000 linked sites. The system evolves for 200 generations with a mutation rate per genome of 0.01 (i.e. 0.01 / 1000 = 0.00001 per site). To add more runs, add the following line beginning with the word 'runs':

runs     diploid     constantMetapopSize
100       false         true

This runs the same haploid model, but with 100 replicates.
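The per-site rate quoted above follows from dividing the per-genome rate by the number of sites. A minimal Python sketch, assuming (hypothetically) that the genome rate is spread evenly over all chromsize × numchroms sites; `per_site_rate` is an illustrative name, not part of GenomePop:

```python
def per_site_rate(genome_rate: float, chrom_size: int, num_chroms: int = 1) -> float:
    """Per-site mutation rate implied by a per-haploid-genome rate.

    Assumption: the genome rate is spread evenly over all
    chrom_size * num_chroms sites (hypothetical helper, not GenomePop code).
    """
    if chrom_size <= 0 or num_chroms <= 0:
        raise ValueError("chrom_size and num_chroms must be positive")
    return genome_rate / (chrom_size * num_chroms)


# Values from the first input line: chromsize = 1000, numchroms = 1, rate = 0.01
print(f"{per_site_rate(0.01, 1000):.0e}")  # prints "1e-05"
```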

 

The allowed models are Jukes-Cantor with 2 alleles (JC2) or 4 alleles (JC4), and the General Time Reversible model (GTR; Tavare 1982; Zharkikh 1994). In the 4-allele models the default frequencies are 0.25 and the default instantaneous transition rates are 1. The example below defines a GTR model with the given equilibrium frequencies and rates. The three frequencies are for A, C and G; the frequency of T is computed as 1 - (A + C + G) and need not be provided. The six rates are, in order, AC, AG, AT, CG, CT and GT.

model
GTR

freqs 
0.2 0.17 0.33

rates
3.0 5.0 0.9 1.3 5.3 1.0
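To show how the 'freqs' and 'rates' values fit together, here is a sketch that builds the standard GTR instantaneous rate matrix from them (off-diagonal Q[i][j] = r_ij × pi_j, rows summing to zero). How GenomePop normalises or uses this matrix internally is not described here, so the sketch leaves Q unnormalised; the function and constant names are hypothetical.

```python
ORDER = "ACGT"
RATE_PAIRS = ("AC", "AG", "AT", "CG", "CT", "GT")  # order of the 'rates' line


def gtr_rate_matrix(freqs3, rates6):
    """Build an unnormalised GTR rate matrix Q from the 'freqs' and 'rates' lines.

    freqs3: (A, C, G); T is derived as 1 - (A + C + G).
    rates6: exchangeabilities in the order AC AG AT CG CT GT.
    Off-diagonal Q[i][j] = r_ij * pi_j; each diagonal entry makes its row sum to 0.
    """
    a, c, g = freqs3
    pi = [a, c, g, 1.0 - (a + c + g)]
    if pi[3] < 0:
        raise ValueError("A + C + G must not exceed 1")
    r = {}
    for pair, rate in zip(RATE_PAIRS, rates6):
        i, j = ORDER.index(pair[0]), ORDER.index(pair[1])
        r[(i, j)] = r[(j, i)] = rate  # GTR exchangeabilities are symmetric
    q = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i != j:
                q[i][j] = r[(i, j)] * pi[j]
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return pi, q


pi, q = gtr_rate_matrix((0.2, 0.17, 0.33), (3.0, 5.0, 0.9, 1.3, 5.3, 1.0))
print(round(pi[3], 2))  # frequency of T: prints 0.3
```

Because the exchangeabilities are symmetric, the resulting chain is time-reversible: pi_i × Q[i][j] equals pi_j × Q[j][i] for every pair.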

 

To let population size vary, change the third value in the line beginning with 'runs' to 'false'. Population size will then vary due to migration and/or selection, if present. However, the maximum allowed population size is the one given in the first input line under the third identifier (popsize or kmax).

runs     diploid     constantMetapopSize
100       false         false

 

differentPopSizes
100 200 12000

If the number of items under the identifier 'differentPopSizes' is lower than the number of populations defined in the first line, as in the example (3 < 4), the remaining populations keep the initially defined size; in the example, the fourth population will have 1,000 individuals. If the number of items is higher than the number of populations, only the first 'numpops' values are read.
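The padding and truncation rule just described can be sketched as follows; `resolve_pop_sizes` is a hypothetical helper written to mirror the text, not GenomePop's own code:

```python
def resolve_pop_sizes(default_size, num_pops, listed):
    """Initial size of each population given a 'differentPopSizes' list.

    Populations without an entry keep default_size (the popsize value of
    the first line); entries beyond num_pops are ignored.
    """
    sizes = list(listed[:num_pops])
    sizes += [default_size] * (num_pops - len(sizes))
    return sizes


print(resolve_pop_sizes(1000, 4, [100, 200, 12000]))  # [100, 200, 12000, 1000]
```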

 

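Note that this only changes the identifier used for the ancestral allele. To define different SNP frequencies in each population, use the 'viral' biological model together with the 'SNP freqs' identifier: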

Biological Model
viral

chromsize  numcroms  popsizeKmax  Numpops   maxgen   haploidGenomeMutRate  Recombination
10              1           1000              2              200          0.001                         5

runs    diploid     constantMetapopSize
1       true        false

flowmodel
ISLAND

migration
0.01

recurrentmut    retromutation
true            true

different pop sizes
100  120

sample size 
20

model
jc2

# the next line fixes allele 1 in the whole of pop 1 and allele 2 in the whole of pop 2

SNP freqs
1.0  0.0

This example defines 2 populations, of sizes 100 and 120, under a 2-allele model (JC2). Each population has different initial allele frequencies (the 'viral' model), given under the identifier 'SNP freqs'. The number of items under this identifier should match the number of populations. If this identifier is absent and the biological model is 'viral', the frequencies are assumed to be 0.5 in all populations. If the model is not 'viral', the frequencies are 1 (only allele 1 exists initially) in every population. In the example, 10 (chromsize = 10) independent (recombination = 10 x 0.5 = 5) 2-allele SNPs are defined, with an initial frequency of 1 (allele 1 fixed) in pop 1 and 0 (allele 2 fixed) in pop 2. Therefore, the ancestral allele will be '1' in population 1 but '2' in population 2.

Note that this model differs from the "isf" option, which defines individual SNP mutant frequencies for a given population and position.
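The default rules for 'SNP freqs' and the "recombination = 10 x 0.5" choice can be summarised in a short sketch. Two points are assumptions, not documented behaviour: that 'SNP freqs' is only honoured under the 'viral' model, and that the genome-wide recombination value is spread evenly so that 0.5 per site makes adjacent SNPs independent. Function names are hypothetical.

```python
def initial_snp_freqs(num_pops, biological_model, snp_freqs=None):
    """Initial frequency of allele 1 in each population (sketch of the rules above).

    Assumption: 'SNP freqs' is only honoured under the 'viral' model.
    """
    if biological_model.lower() != "viral":
        return [1.0] * num_pops  # only allele 1 exists initially
    if snp_freqs is None:
        return [0.5] * num_pops  # viral default
    if len(snp_freqs) != num_pops:
        raise ValueError("'SNP freqs' needs one value per population")
    return [float(f) for f in snp_freqs]


def genome_rec_for_independent_snps(chrom_size):
    """Genome-wide recombination giving 0.5 per site (assumed even spread)."""
    return 0.5 * chrom_size


print(initial_snp_freqs(2, "viral", [1.0, 0.0]))  # [1.0, 0.0]
print(genome_rec_for_independent_snps(10))         # 5.0
```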

 

chromsize  numcroms  popsizeKmax  Numpops   maxgen   haploidGenomeMutRate  Recombination
1000              1           1000              2              2000          0.001                         0.5

runs    diploid     constantMetapopSize
1       true        true

migration rate
0.01

migrationModel
SteppingStone1

# recombination hotspots are currently only allowed for two-allele models
# recombination in region 0-100 is 0.05 and in region 500-600 is 0.01; the other regions have NO recombination
# NOTE that if recombination hotspots are defined, the general parameter 'Recombination' above is ignored

hotspots
0 100 0.05
500 600 0.01

# default substitution model is Jukes Cantor with 2 alleles (JC2). The output format is genpop format.

This example defines 2 populations, each with constant size 1000, under a 2-allele model (JC2). There are 1000 putative SNPs (chromsize = 1000), but recombination hotspots are defined so that region 0 to 100 recombines at a rate of 0.05 and region 500 to 600 at a rate of 0.01. The rest of the genome is fully linked. Note that the recombination value (0.5) in the first line is overridden by the hotspot lines.
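A sketch of how the 'hotspots' lines map positions to recombination rates. It assumes region ends are inclusive and simply reports the rate assigned to the region containing a position; how GenomePop distributes that rate inside a region is not described here. Names are hypothetical.

```python
def parse_hotspots(lines):
    """Parse lines under 'hotspots': 'start end rate' on each line."""
    spots = []
    for line in lines:
        start, end, rate = line.split()
        spots.append((int(start), int(end), float(rate)))
    return spots


def region_rate(position, hotspots):
    """Recombination rate of the region containing position.

    Assumptions: region ends are inclusive; positions outside every
    hotspot get 0 (fully linked); the general 'Recombination' value is ignored.
    """
    for start, end, rate in hotspots:
        if start <= position <= end:
            return rate
    return 0.0


spots = parse_hotspots(["0 100 0.05", "500 600 0.01"])
print([region_rate(p, spots) for p in (50, 300, 550)])  # [0.05, 0.0, 0.01]
```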

 

chromsize  numcroms  popsizeKmax  Numpops   maxgen   haploidGenomeMutRate  Recombination
100              1           1000             1              2000          0.001                         0.01

runs    diploid     constantMetapopSize
1       true        true

# default substitution model is Jukes Cantor with 2 alleles (JC2). The output format is genpop format.

selpos	s
0	0.9
41	0.0
74	-0.5

This example defines 1 population of constant size 1000 under a 2-allele model (JC2). The recombination rate across the 100 putative SNPs is 0.01. In addition, mutations at the positions listed below the identifier 'selpos' contribute a fitness factor of 1 - hs (h = 1 for homozygotes and haploids). The fitness model is multiplicative and is scaled to a maximum fitness of 1. Fitness-based selection is also allowed under a codon model; in that case, specific codon positions under selection are defined with the identifier 'selaa' instead of 'selpos'.
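The multiplicative fitness model can be sketched as a product of (1 - hs) factors over the selected positions an individual carries, followed by rescaling so the fittest individual has fitness 1. The heterozygote dominance h_het = 0.5 is an assumption (the text only fixes h = 1 for homozygotes and haploids), and the helper names are hypothetical. Note that a negative s, such as -0.5 at position 74, raises fitness.

```python
def raw_fitness(mutant_copies, sel, ploidy=2, h_het=0.5):
    """Multiplicative fitness: product of (1 - h*s) over selected positions.

    mutant_copies: dict position -> number of mutant copies carried.
    sel: dict position -> s, as listed under 'selpos'.
    h = 1 for homozygotes and haploids (as stated above); the heterozygote
    value h_het = 0.5 is an ASSUMPTION, not a documented GenomePop default.
    """
    w = 1.0
    for pos, s in sel.items():
        k = mutant_copies.get(pos, 0)
        if k == 0:
            continue
        h = 1.0 if (ploidy == 1 or k == ploidy) else h_het
        w *= 1.0 - h * s
    return w


def scaled_fitness(population, sel, **kwargs):
    """Rescale so that the fittest individual has fitness 1."""
    raw = [raw_fitness(ind, sel, **kwargs) for ind in population]
    best = max(raw)
    return [w / best for w in raw]


SEL = {0: 0.9, 41: 0.0, 74: -0.5}  # the 'selpos' example
print(raw_fitness({0: 2}, SEL))     # homozygous mutant at 0: about 0.1
print(scaled_fitness([{}, {74: 2}], SEL))
```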

 

chromsize  numcroms  popsizeKmax  Numpops   maxgen   haploidGenomeMutRate  Recombination
10000              1           1000              1              200          0.001                         0.5

runs    diploid     constantMetapopSize
1       true        true

recurrentmut    retromutation
true            true

sample size 
20

model
GTR
freqs 
0.2 0.17 0.33
rates
3.0 5.0 0.9 1.3 5.3 1.0

# this defines a MG94 codon evolution model with dN/dS = 0.5
codon omega
true	0.5

As the example shows, the identifier 'codon' must be set to 'true' to run an MG94 codon model. The default omega is 1; any other value must be given on the same line as the codon value. The nucleotide equilibrium frequencies are used to compute the rates of the different codon changes. Their default values are 0.25; other values can be given under the identifier 'freqs'. The codon model can be combined with any nucleotide model, for example GTR. If the chromosome size is not a multiple of 3, it is shortened to a multiple of 3.
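For the chromsize of 10000 used in this example, the shortening rule matters. A one-function sketch, assuming "shortened to a multiple of 3" means the largest multiple of 3 that does not exceed the given size (the helper name is hypothetical):

```python
def codon_chrom_size(chrom_size):
    """Chromosome size used under the codon model.

    Assumption: the size is cut down to the largest multiple of 3
    not exceeding chrom_size.
    """
    return chrom_size - chrom_size % 3


size = codon_chrom_size(10000)
print(size, size // 3)  # 9999 3333 (sites, codons)
```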

 

To include migration with rate 0.01 under an island model just add:

migration
0.01

This works because the island model is the default migration model. To set a one-dimensional stepping-stone model, add a 'flowmodel' identifier with 'steppingstone' below it. See Migration Models in GenomePop for information on user-defined migration models.

migration
0.01
flowmodel
steppingstone

 

model
jc4

sequence
>root    
ATGGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGA
>seq1    
ATGGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGA
>seq2    
ATGGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGA

To include your own sequences, the word "sequence" must appear at the beginning of a line in the input file. The sequences must be in FASTA format, as in the example. The nucleotide model must have 4 alleles (JC4 or GTR); otherwise the sequences are ignored.
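Before running, it can help to check a 'sequence' block yourself. This sketch parses FASTA text and mirrors the rule above that user sequences are only used under JC4 or GTR. Rejecting characters other than A, C, G and T is an extra assumption (GenomePop's handling of gaps or ambiguity codes is not described here); names are hypothetical.

```python
FOUR_ALLELE_MODELS = {"JC4", "GTR"}


def read_fasta(text):
    """Parse FASTA text into {name: sequence}."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {k: "".join(parts) for k, parts in seqs.items()}


def usable_sequences(model, seqs):
    """Return the sequences that would be used, or {} if the model is not 4-allele."""
    if model.upper() not in FOUR_ALLELE_MODELS:
        return {}  # e.g. JC2: user sequences are ignored
    for name, s in seqs.items():
        if set(s) - set("ACGT"):  # assumption: only plain nucleotides allowed
            raise ValueError(f"{name}: non-ACGT characters")
    return seqs


block = """>root
ATGGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGA
>seq1
ATGGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGA
"""
seqs = read_fasta(block)
print(len(usable_sequences("jc4", seqs)), len(usable_sequences("jc2", seqs)))  # 2 0
```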

 

haplotypes
true

With this option, distinct haplotypes are sampled instead of plain sequences (which could be identical). This option does not apply to the JC2 (2-allele) model.

 

independentSNPs
true

This option only applies if the model is JC2: the SNPs then segregate independently regardless of the recombination value. For example, the following lines define the settings to evolve 100,000 SNPs (chromsize) with a per-SNP mutation rate of 10^-7 (10^-2 / 10^5) for 2N generations. Note the 'independentSNPs' line, which defines each SNP as unlinked.

scaling
10

chromsize	numcroms	popsizeKmax	Numpops	maxgen 	mutRate per haploid genome      Rec per haploid genome
100000		1		1000		1	2000			0.01	     0.0

independentSNPs
true

sample size
15

recurrentmut retromut 
true	true
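The arithmetic behind these settings can be checked directly. This sketch only covers the per-SNP rate and the 2N generation count; it ignores the 'scaling' identifier, whose exact effect is not described in this section.

```python
chrom_size = 100_000      # chromsize: number of SNPs
genome_mut_rate = 0.01    # mutRate per haploid genome
pop_size = 1000           # popsizeKmax (N)
max_gen = 2000            # maxgen

per_snp_rate = genome_mut_rate / chrom_size  # 10^-2 / 10^5 = 10^-7
assert abs(per_snp_rate - 1e-7) < 1e-15
assert max_gen == 2 * pop_size  # 2N generations
print(f"per-SNP rate {per_snp_rate:.0e}, {max_gen} generations = 2N")
```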

 


runs		diploid	constantMetapopSize
1		true		true

scaling
10

chromsize	numcroms	popsizeKmax	Numpops     maxgen 	mutRate per haploid genome     Rec
1000		100		     1000		1	       1000		0.05		      0.1

recurrentmut retromut 
false	false

The above simulation took 20 seconds on a Pentium 4 (3.2 GHz). The recombination rate of 0.1 is per genome, i.e. 0.001 per chromosome (0.1 / 100 chromosomes). Assuming 1 cM per Mb, this corresponds to 0.001 x 100 = 0.1 cM, i.e. 0.1 Mb per chromosome, and thus a total genome of 10 Mb. With a scaling of 20, the simulation time drops to just 3 seconds. There were no multiple hits under these settings.
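The conversion from per-genome recombination to physical length can be written out step by step. The sketch follows the text's approximation that a recombination probability r corresponds to r × 100 cM, which is reasonable only for small r; the function name is hypothetical.

```python
def genome_length_mb(rec_per_genome, num_chroms, cm_per_mb=1.0):
    """Physical genome length implied by a per-genome recombination rate.

    Uses the approximation from the text: a recombination probability r
    per chromosome is r * 100 cM (reasonable only for small r).
    """
    rec_per_chrom = rec_per_genome / num_chroms   # 0.1 / 100 = 0.001
    cm_per_chrom = rec_per_chrom * 100            # 0.001 x 100 = 0.1 cM
    mb_per_chrom = cm_per_chrom / cm_per_mb       # 0.1 Mb at 1 cM per Mb
    return mb_per_chrom, mb_per_chrom * num_chroms


per_chrom, total = genome_length_mb(0.1, 100)
print(f"{per_chrom:.1f} Mb per chromosome, {total:.0f} Mb in total")  # 0.1 Mb per chromosome, 10 Mb in total
```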

 

biological model
viral	90
runs		diploid	constantMetapopSize
1		true		true

chromsize	numcroms	popsizeKmax	Numpops     maxgen 	mutRate per haploid genome     Rec
10000		1		     1000		3	       1000		0.1		      0.1

The above settings define a viral model in which the sequence identity between the 3 populations is 90%. Under the viral model, the default identity is undefined (random).

 


runs		diploid	constantMetapopSize
1		true		true

chromsize	numcroms	popsizeKmax	Numpops     maxgen 	mutRate per haploid genome     Rec
10000		1		     1000		3	       1000		0.1		      0.1

CEDS
1	1 20	2 
1	250 350 2000

The above settings define, under the CEDS identifier, a bottleneck of size N = 2 in population 1 from generation 1 to 20, and an expansion to N = 2000 in the same population from generation 250 to 350. After that, the original population size of 1000 is recovered. The user can define as many lines as desired under the CEDS identifier. On each line, the first item is the population number, the next two give the generation period (first and last generation), and the last is the desired population size.
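The CEDS lines can be read as a size schedule. This sketch returns the size a population would have at a given generation, assuming inclusive generation ranges and that, if events overlap, the first matching line wins; both assumptions and the helper names are hypothetical.

```python
def parse_ceds(lines):
    """Parse CEDS lines: population, first generation, last generation, size."""
    return [tuple(int(x) for x in line.split()) for line in lines]


def pop_size_at(pop, gen, base_size, events):
    """Population size of pop at generation gen.

    Assumptions: generation ranges are inclusive and, if events overlap,
    the first matching line wins.
    """
    for p, first, last, size in events:
        if p == pop and first <= gen <= last:
            return size
    return base_size


events = parse_ceds(["1 1 20 2", "1 250 350 2000"])
print([pop_size_at(1, g, 1000, events) for g in (10, 100, 300, 400)])  # [2, 1000, 2000, 1000]
```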

 


storeroot
true

runs		diploid	constantMetapopSize
1		true		true

chromsize	numcroms	popsizeKmax	Numpops     maxgen 	mutRate per haploid genome     Rec
10000		1		     1000		3	       1000		0.1		      0.1

Setting the identifier storeroot to true saves the original sequence in the file GP_root.dat.

 

Add a line with the isf identifier (initial SNP frequency) and, below it, as many lines as desired, each with 3 items: population, position and heterozygote frequency. Note that populations are numbered from 1, whereas positions are numbered from 0.


isf	popid position freq
	1	20	0.1
	1	0	0.11
	1	10	0.21
	2	20	0.1


Note that this isf option differs from the 'SNP freqs' option, which defines different ancestral alleles ('1' or '2') for whole populations.
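The different indexing conventions (1-based populations, 0-based positions) are easy to mix up, so a small validator can help. This is a sketch: the range checks and the num_pops/chrom_size values used below are hypothetical, not documented GenomePop behaviour.

```python
def parse_isf(lines, num_pops, chrom_size):
    """Map (population, position) -> frequency from the lines under 'isf'.

    Populations are 1-based and positions 0-based, as noted above. The range
    checks are illustrative assumptions, not documented GenomePop behaviour.
    """
    table = {}
    for line in lines:
        pop_s, pos_s, freq_s = line.split()
        pop, pos, freq = int(pop_s), int(pos_s), float(freq_s)
        if not 1 <= pop <= num_pops:
            raise ValueError(f"population {pop} outside 1..{num_pops}")
        if not 0 <= pos < chrom_size:
            raise ValueError(f"position {pos} outside 0..{chrom_size - 1}")
        if not 0.0 <= freq <= 1.0:
            raise ValueError(f"frequency {freq} outside [0, 1]")
        table[(pop, pos)] = freq
    return table


# hypothetical settings: 2 populations, chromsize 10000
isf = parse_isf(["1 20 0.1", "1 0 0.11", "1 10 0.21", "2 20 0.1"], num_pops=2, chrom_size=10000)
print(isf[(1, 0)])  # 0.11
```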

 

©2008 Antonio Carvajal Rodríguez