artadia: September 2019

Gibbsf90test is the newest Gibbs Sampler of the blupf90 distribution. It is usually used to estimate variance components and sometimes to infer breeding values. However, the MCMC framework and it can be used, for instance, to get estimates of the genetic trend with Bayesian incertitude, or estimates of the contrast between two levels of the same fixed effect. This can be done coupling the inner loops of the Gibbs Sampler, that produce samples of the EBV, variance components, etc, with an external program that will get samples of quantity of interest from them.

We have to use the new option

OPTION external_halfway_script ./pepe 100

this says that every 100 iterations gibbsf90test stops, write to file samples of variances, unknowns and thresholds, all in file last_solutions.txt, that looks like

$ head last_solutions.txt

10000 348226670 76922156

trait effect level solution

1 1 1 191.3774026

1 1 2 176.2276688

1 1 3 147.9162130

1 1 4 113.5460502

1 1 5 144.9539913

1 1 6 148.2475683

1 1 7 125.5390674

1 1 8 172.5408439

$ tail last_solutions.txt

1 6 34584 3.789916994

1 6 34585 59.69961280

1 6 34586 37.09941679

1 6 34587 -34.31674133

1 6 34588 48.62475154

1 6 34589 -20.61125146

1 6 34590 -7.197029014

round variances thresholds

10000 3

530.7338772 591.4818928 1372.611864

then, wait for external process '''name''' (in this case pepe) to run and then restart.

The program pepe is a shell script:

#!/bin/bash

awk '$2==5' last_solutions.txt > ebvs

# select sires and put generation number in front of the ebvs

./put_gen_ebvs.awk males_per_gen ebvs > ebvs_with_gen

./meansby ebvs_with_gen 4 5 >> trend

which in this particular case is condensing average ebv's per generation, like:

-10.1759 -23.0505 -20.9627 -17.4328 -5.93856 -2.26723 19.2128 25.6035 35.5568 38.4155 32.5992

-10.3355 -22.6277 -20.2801 -20.1763 -6.22339 -1.44617 14.995 23.444 33.3165 34.7167 49.2152

-15.2105 -24.0234 -17.4057 -17.352 -5.32282 -0.854277 19.2186 28.2536 36.6748 44.6192 42.8246

-10.912 -21.8507 -19.1843 -18.6449 -4.84831 -1.31681 16.9725 28.5895 37.5614 41.3514 37.848

-13.2666 -24.3615 -22.9572 -18.5257 -6.43787 -3.17327 15.8594 22.785 36.065 34.9368 34.9344

-18.4349 -23.5337 -22.4248 -17.3172 -8.06582 -4.8434 14.9129 22.3128 34.6 34.9195 37.5187

-13.485 -25.4537 -21.1887 -19.5909 -7.28584 -3.8915 17.3923 25.9329 35.2859 38.0994 64.6005

this are samples of the means per generation (each column is one generation, each row is one sample), and thus we can make means +- se per generation. More interestingly, we can get the posterior distribution of the mean of the generation 10 minus the mean of the generation 5:

$ awk '{print $10-$5}' trends | ~/julia ~/save/scripts/myHist.jl

┌ ┐

[32.0, 34.0) ┤▇ 1

[34.0, 36.0) ┤▇▇▇▇ 3

[36.0, 38.0) ┤▇▇▇▇▇▇▇▇▇ 7

[38.0, 40.0) ┤▇▇▇▇▇▇▇▇▇ 7

[40.0, 42.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 18

[42.0, 44.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 19

[44.0, 46.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 27

[46.0, 48.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 12

[48.0, 50.0) ┤▇▇▇▇▇▇▇▇ 6

└ ┘

I find boring (and slow) to open R to draw an histogram of my data. I have been looking for histograms in command line for some time, and finally I decided to use a small Julia script that uses UnicodePlots.

The script myHist.jl is as simple as this:

@time a=parse.(Float64,chomp.(readlines()))

using UnicodePlots

@time print(histogram(a),"\n")

and it runs beatifully:

awk '{print $1}' ~/save/mtr_2015/renf90.dat | julia myHist.jl

  1.442402 seconds (7.46 M allocations: 402.056 MiB, 19.39% gc time)

                    ┌                                        ┐ 

   [   0.0,  500.0) ┤▇▇ 29256                                  

   [ 500.0, 1000.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 193081                     

   [1000.0, 1500.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 445996    

   [1500.0, 2000.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 454985   

   [2000.0, 2500.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 311294             

   [2500.0, 3000.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇ 166513                       

   [3000.0, 3500.0) ┤▇▇▇▇▇ 69611                               

   [3500.0, 4000.0) ┤▇▇ 23819                                  

   [4000.0, 4500.0) ┤ 6778                                     

   [4500.0, 5000.0) ┤ 1721                                     

   [5000.0, 5500.0) ┤ 367                                      

   [5500.0, 6000.0) ┤ 76                                       

   [6000.0, 6500.0) ┤ 13                                       

   [6500.0, 7000.0) ┤ 4                                        

   [7000.0, 7500.0) ┤ 0                                        

   [7500.0, 8000.0) ┤ 1                                        

└

artadia

Tuesday, September 24, 2019

inferences within Gibbs Sampler in gibbsf90test

Thursday, September 19, 2019

plot histogram on command line