Friday, December 9, 2016

Sires in validation and Sires in training

I want to test metafounders for genomic selection. I have MTR sires with daughters in the validation set, in the training set, or in both, and I want a single file listing the number of daughters of each sire in each set. I have two files:

T is:
juan 10
pepe 5


V is:

luis 6
pepe 4


So luis has no daughters in training, and I want this to appear as a 0. For that I use this pipeline from Toni Reverter at CSIRO:

awk '{print $1}' $1 $2 | sort -u | join -a1 - $1 | awk '{print $1, (NF==2?$2:0)}' | join -a1 - $2 | awk '{print $1, $2, (NF==3?$3:0)}'


which gives

putterri:yarp andres$ ./joinToni.sh V T
juan 0 10
luis 6 0
pepe 4 5
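
One thing to keep in mind is that join expects both inputs to be sorted on the join field, so the pipeline above relies on T and V already being sorted by sire name. As a rough sketch of an alternative that does not need sorted input, the same table can be built in a single awk pass. The function name count_both is made up for illustration, and the sketch assumes two whitespace-separated columns (sire, count) and a non-empty first file:

```shell
#!/bin/sh
# Sketch only: count_both is a hypothetical helper, not part of the original pipeline.
# Usage: count_both FILE1 FILE2
# Prints "sire count_in_FILE1 count_in_FILE2", with 0 where a sire is missing.
# Assumes FILE1 is not empty (the FNR == NR test cannot tell the files apart otherwise).
count_both() {
    awk '
        NF < 2    { next }                                  # skip blank or incomplete lines
        FNR == NR { first[$1] = $2; seen[$1] = 1; next }    # lines of FILE1
                  { second[$1] = $2; seen[$1] = 1 }         # lines of FILE2
        END {
            for (s in seen)
                print s, (s in first ? first[s] : 0), (s in second ? second[s] : 0)
        }
    ' "$1" "$2" | sort
}
```

Calling `count_both V T` on the two example files should give the same three lines as the join pipeline.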





Install GNU coreutils on the Mac

The Mac OS X join is more limited than the GNU one, so I try to install the GNU version using MacPorts. First I have to update the ports database:

putterri:mf andres$ port -v selfupdate

But this does not work (it was run without sudo). Then I try


putterri:mf andres$ sudo port -d selfupdate

This does work. Join is actually part of coreutils:


putterri:mf andres$ port search --name coreutils
coreutils @8.25 (sysutils)
    GNU File, Shell, and Text utilities

xml-coreutils @0.8.1_1 (textproc, xml)
    Command line tools for XML processing

Found 2 ports.

So I install coreutils:


putterri:mf andres$ sudo port install coreutils

The most important information is at the end of the install output:

The tools provided by GNU coreutils are prefixed with the character 'g' by default to distinguish them from the BSD commands.
For example, cp becomes gcp and ls becomes gls.

If you want to use the GNU tools by default, add this directory to the front of your PATH environment variable:
    /opt/local/libexec/gnubin/
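
As a sketch of that last step, assuming a Bourne-style shell such as bash: the line below could go in a startup file like ~/.profile (which file to use is my assumption, the port notes do not say).

```shell
# Put the unprefixed GNU tools from MacPorts ahead of the BSD ones.
# The directory is the one printed in the port notes above.
export PATH="/opt/local/libexec/gnubin:$PATH"

# Alternatively, leave PATH alone and call the prefixed tool directly,
# e.g. gjoin instead of join in the pipeline.
```

After opening a new shell, `which join` should then point into /opt/local/libexec/gnubin.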