Tuesday, September 24, 2019

inferences within Gibbs Sampler in gibbsf90test

Gibbsf90test is the newest Gibbs Sampler of the blupf90 distribution. It is usually used to estimate variance components and sometimes to infer breeding values. However, the MCMC framework and it can be used, for instance, to get estimates of the genetic trend with Bayesian incertitude, or estimates of the contrast between two levels of the same fixed effect. This can be done coupling the inner loops of the Gibbs Sampler, that produce samples of the EBV, variance components, etc, with an external program that will get samples of quantity of interest from them.

We have to use the new option


OPTION external_halfway_script ./pepe 100

this says that every 100 iterations gibbsf90test stops, write to file samples of variances, unknowns  and thresholds, all in file last_solutions.txt, that looks like

$ head last_solutions.txt 
               10000           348226670           76922156
trait effect level  solution
  1  1       1   191.3774026      
  1  1       2   176.2276688      
  1  1       3   147.9162130      
  1  1       4   113.5460502      
  1  1       5   144.9539913      
  1  1       6   148.2475683      
  1  1       7   125.5390674      
  1  1       8   172.5408439      
$ tail last_solutions.txt 
  1  6   34584   3.789916994      
  1  6   34585   59.69961280      
  1  6   34586   37.09941679      
  1  6   34587  -34.31674133      
  1  6   34588   48.62475154      
  1  6   34589  -20.61125146      
  1  6   34590  -7.197029014      
round variances thresholds
     10000                   3

   530.7338772         591.4818928         1372.611864     

then, wait for external process '''name''' (in this case pepe) to run and then restart.

The program pepe is a shell script:


#!/bin/bash
awk '$2==5' last_solutions.txt > ebvs
# select sires and put generation number in front of the ebvs
./put_gen_ebvs.awk males_per_gen ebvs > ebvs_with_gen
./meansby ebvs_with_gen 4 5 >> trend


which in this particular case is condensing average ebv's per generation, like:


-10.1759 -23.0505 -20.9627 -17.4328 -5.93856 -2.26723 19.2128 25.6035 35.5568 38.4155 32.5992
-10.3355 -22.6277 -20.2801 -20.1763 -6.22339 -1.44617 14.995 23.444 33.3165 34.7167 49.2152
-15.2105 -24.0234 -17.4057 -17.352 -5.32282 -0.854277 19.2186 28.2536 36.6748 44.6192 42.8246
-10.912 -21.8507 -19.1843 -18.6449 -4.84831 -1.31681 16.9725 28.5895 37.5614 41.3514 37.848
-13.2666 -24.3615 -22.9572 -18.5257 -6.43787 -3.17327 15.8594 22.785 36.065 34.9368 34.9344
-18.4349 -23.5337 -22.4248 -17.3172 -8.06582 -4.8434 14.9129 22.3128 34.6 34.9195 37.5187
-13.485 -25.4537 -21.1887 -19.5909 -7.28584 -3.8915 17.3923 25.9329 35.2859 38.0994 64.6005


this are samples of the means per generation (each column is one generation, each row is one sample), and thus we can make means +- se per generation. More interestingly, we can get the posterior distribution of the mean of the generation 10 minus the mean of the generation 5:


$ awk '{print $10-$5}' trends | ~/julia ~/save/scripts/myHist.jl 

                                                         
   [32.0, 34.0) ┤▇ 1                                       
   [34.0, 36.0) ┤▇▇▇▇ 3                                    
   [36.0, 38.0) ┤▇▇▇▇▇▇▇▇▇ 7                               
   [38.0, 40.0) ┤▇▇▇▇▇▇▇▇▇ 7                               
   [40.0, 42.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 18               
   [42.0, 44.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 19              
   [44.0, 46.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 27   
   [46.0, 48.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 12                       
   [48.0, 50.0) ┤▇▇▇▇▇▇▇▇ 6                                
                                                         

Thursday, September 19, 2019

plot histogram on command line

I find boring (and slow) to open R to draw an histogram of my data. I have been looking for histograms in command line for some time, and finally I decided to use a small Julia script that uses UnicodePlots.

The script myHist.jl is as simple as this:

@time a=parse.(Float64,chomp.(readlines()))
using UnicodePlots
@time print(histogram(a),"\n")


and it runs beatifully:

awk '{print $1}' ~/save/mtr_2015/renf90.dat | julia myHist.jl



  1.442402 seconds (7.46 M allocations: 402.056 MiB, 19.39% gc time)
                    ┌                                        ┐ 
   [   0.0,  500.0) ┤▇▇ 29256                                  
   [ 500.0, 1000.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 193081                     
   [1000.0, 1500.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 445996    
   [1500.0, 2000.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 454985   
   [2000.0, 2500.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 311294             
   [2500.0, 3000.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇ 166513                       
   [3000.0, 3500.0) ┤▇▇▇▇▇ 69611                               
   [3500.0, 4000.0) ┤▇▇ 23819                                  
   [4000.0, 4500.0) ┤ 6778                                     
   [4500.0, 5000.0) ┤ 1721                                     
   [5000.0, 5500.0) ┤ 367                                      
   [5500.0, 6000.0) ┤ 76                                       
   [6000.0, 6500.0) ┤ 13                                       
   [6500.0, 7000.0) ┤ 4                                        
   [7000.0, 7500.0) ┤ 0                                        
   [7500.0, 8000.0) ┤ 1                                        
                    └