[Tip] awk notes
Setting the output field separator (OFS), in two equivalent ways:
awk -v OFS="\t" '{print $1,$2,$3,$4,$6}'
awk 'BEGIN { OFS = "\t" } {print $1,$2,$3,$4,$6}'
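I call this tidying up, but really I'm just pasting it here for now.
* N50 is calculated by first ordering all contigs by size and then adding the lengths (starting from the longest contig) until the summed length exceeds 50% of the total length of all contigs. The N50 is the length of the contig at which that threshold is crossed.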
http://www.opensubscriber.com/message/r-help@r-project.org/10844139.html
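A minimal Python sketch of the N50 procedure described above. The function name n50 and the sample lengths are made up for illustration, and it stops as soon as the running sum reaches half of the total, which is one reading of the 50% rule (the two readings only differ when the sum lands exactly on 50%).

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contig first
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The contig that pushes the running sum to half of the total
        # is the N50 contig; its length is the N50.
        if running >= half_total:
            return length
    raise ValueError("no contigs given")


example = [80, 70, 50, 40, 30, 20]  # hypothetical lengths, total 290
print(n50(example))  # 80 + 70 = 150 >= 145, so this prints 70
```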
How to make Notepad use UTF-8 as the default encoding for new text files?
It is easy: whenever Notepad opens a UTF-8 text file that has a BOM, it uses UTF-8 for that file. And when we create a new .txt file with New -> Text file, we just get the default NullFile.
So, what if NewFile.txt contains EF BB BF instead of being an empty file?
Well, problem solved.
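For illustration, here is a small Python sketch that produces such a template file: three bytes, EF BB BF, and nothing else. The helper name write_bom_template is hypothetical, and the sketch writes into a temporary directory; how the file gets hooked into the New -> Text file menu is not covered here.

```python
import codecs
import tempfile
from pathlib import Path


def write_bom_template(path):
    """Create a file that contains only the UTF-8 BOM (EF BB BF)."""
    path = Path(path)
    path.write_bytes(codecs.BOM_UTF8)
    return path


with tempfile.TemporaryDirectory() as tmp:
    template = write_bom_template(Path(tmp) / "NewFile.txt")
    print(template.read_bytes().hex().upper())  # prints EFBBBF
```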
http://stackoverflow.com/questions/1430548
I am writing an object-oriented Perl module, “Galaxy::SGE::MakeJobSH”.
I want to use MakeJobSH->new() instead of Galaxy::SGE::MakeJobSH->new(), or some other short name.
So, is there any way to do it?
You can suggest that your users use the aliased module to load yours:
use aliased 'Galaxy::SGE::MakeJobSH';
my $job = MakeJobSH->new();
Or you could export your class name in a variable named $MakeJobSH:
use Galaxy::SGE::MakeJobSH; # Assume this exports $MakeJobSH = 'Galaxy::SGE::MakeJobSH';
my $job = $MakeJobSH->new();
Or you could export a MakeJobSH function that returns your class name:
use Galaxy::SGE::MakeJobSH; # Assume this exports the MakeJobSH function
my $job = MakeJobSH->new();
I’m not sure this is all that great an idea, though. People don’t usually have to type the class name all that often.
Here’s what you’d do in your class for the last two options:
package Galaxy::SGE::MakeJobSH;

use Exporter 'import';
our @EXPORT = qw(MakeJobSH $MakeJobSH);

our $MakeJobSH = __PACKAGE__;
sub MakeJobSH () { __PACKAGE__ };
Of course, you’d probably want to pick just one of those methods. I’ve just combined them to avoid duplicating examples.
ll -rt ./_log/*.o* |awk '{if($5==84){print $9}}' | perl -ne '/(GP.*Chr.*\.sh)/;print "$1\n";' | while read a; do find . -name "$a" ;done | while read ss; do qsub -l vf=280M -cwd $ss;done
……
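As for the cause: on some compute nodes the default shell is not bash, so a script without a shebang line fails and prints this to STDOUT:
Warning: no access to tty (Bad file descriptor). Thus no job control in this shell.
It could also be something else; anyway, that is what I went with...
Postscript: this still doesn't work. I have no idea what exactly is wrong with the compute nodes' configuration.
Post-postscript: what fixed it in the end was #$ -S /bin/bash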
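I have used regular expressions quite a lot by now, so it is time to play with the advanced features...
1. Sort sample names by their last group of digits.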
sub sortlibname { # Not many libs, so no Schwartzian needed.
    my ($aa,$bb,$at,$bt);
    $a =~ /^.*((?<=\D)\d+)(\D*)/; # for AR2202-3m, we need (3,m).
    $aa=$1;$at=$2;
    $b =~ /^.*((?<=\D)\d+)(\D*)/;
    $bb=$1;$bt=$2;
    $aa <=> $bb || $at cmp $bt;
}
foreach my $sample (sort sortlibname keys %hSampleLib) {}
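To see what the pattern picks out, here is the same idea sketched in Python as a sort key rather than a comparison function. The names lib_sort_key and the sample library names are made up, and sorting names without a matching number first is my own assumption; the Perl above does not handle that case explicitly.

```python
import re

# Same pattern as the Perl: the last run of digits that follows a
# non-digit, plus whatever non-digit suffix comes right after it.
_LAST_NUMBER = re.compile(r"^.*((?<=\D)\d+)(\D*)")


def lib_sort_key(name):
    """Key for sorting by the last digit group, then by its suffix."""
    m = _LAST_NUMBER.match(name)
    if m is None:
        return (-1, name)  # assumption: unmatched names sort first
    return (int(m.group(1)), m.group(2))


names = ["AR2202-10m", "AR2202-3m", "AR2202-3k", "AR2202-1"]  # hypothetical
print(lib_sort_key("AR2202-3m"))  # (3, 'm')
print(sorted(names, key=lib_sort_key))
# ['AR2202-1', 'AR2202-3k', 'AR2202-3m', 'AR2202-10m']
```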
2. Strip every abbreviation of "chromosome", from the first three characters up to the full word. (This one is a bit EP...)
my ($chrid,$svtype,$start,$end)=(split /\t/)[0,1,4,5];
$chrid =~ s/^chr
    (?>
        ((?<=^chr)o)?
        ((?<=^chro)m)?
        ((?<=^chrom)o)?
        ((?<=^chromo)s)?
        ((?<=^chromos)o)?
        ((?<=^chromoso)m)?
        ((?<=^chromosom)e)?
    )//xi;
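The same stripping can be sketched in Python with nested optional groups instead of lookbehinds: each extra letter is only allowed if the previous one matched. This is a swapped-in technique, not a translation of the Perl, and the names _prefix_pattern and strip_chr_prefix are hypothetical.

```python
import re


def _prefix_pattern(word, min_len):
    """Match any prefix of `word` that is at least `min_len` characters long."""
    tail = ""
    # Build (?:o(?:m(?:o(...)?)?)?)? from the inside out.
    for ch in reversed(word[min_len:]):
        tail = f"(?:{re.escape(ch)}{tail})?"
    return re.compile("^" + re.escape(word[:min_len]) + tail, re.IGNORECASE)


_CHR_PREFIX = _prefix_pattern("chromosome", 3)


def strip_chr_prefix(chrid):
    """Remove a leading chr / chro / ... / chromosome, case-insensitively."""
    return _CHR_PREFIX.sub("", chrid, count=1)


for raw in ["chr1", "Chrom12", "chromosome_X", "scaffold7"]:
    print(raw, "->", strip_chr_prefix(raw))
# chr1 -> 1
# Chrom12 -> 12
# chromosome_X -> _X
# scaffold7 -> scaffold7
```

Note that "chrm1" becomes "m1" under both versions, because "chrm" is not a prefix of "chromosome".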
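Optimization? This is as good as I can do for now; I'll chew through it slowly later.
I need to look things up in three tables. SQLite is fairly fast, but the statement "SELECT score FROM dbCNSblk$opt_s WHERE chrid=? AND ? BETWEEN begin AND end" is still too slow, so I had to stuff everything into memory.
Since this is all about efficiency anyway, why not play with some C, so...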
h2xs -A -n ChromByte
ChromByte/ChromByte.xs
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include "ppport.h"
#include <stdlib.h> /* malloc, free */
#include <string.h> /* memset */

MODULE = ChromByte    PACKAGE = ChromByte

long
initchr(len)
    int len
  CODE:
    /* one zero-filled byte per position, 0..len inclusive */
    void *address = malloc( len + 1 );
    if ( address == NULL ) croak( "initchr: out of memory" );
    memset( address , 0 , len + 1 );
    /* the pointer is handed to Perl as a long; this assumes a long can hold a pointer */
    RETVAL = (long)address;
  OUTPUT:
    RETVAL

void
setbases( address, begin, end, val )
    long address
    int begin
    int end
    int val
  CODE:
    /* fill the closed interval [begin, end] with val; no bounds checking */
    char * buf = ( char * ) address ;
    memset( buf + begin , val , end - begin + 1 );

int
getbase( address, pos )
    long address
    int pos
  CODE:
    char * buf = ( char * ) address ;
    RETVAL = *( buf + pos );
  OUTPUT:
    RETVAL

void
freechr( address )
    long address
  CODE:
    void * buf = ( void * ) address ;
    free( buf );
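To be clear, this C code was written with help from 李旭, and then I modified it. I am still at the stage where I can read C but have not memorized the syntax (apparently it has been like this for 8 years...).
When I have time I really must go and finish learning C and C++!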
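For readers who want to try the idea without building an XS module, here is a rough Python sketch of the same one-byte-per-position table, so that a "? BETWEEN begin AND end" lookup becomes a single index. The class name ChromBytes and the sample interval are made up. Unlike the C version it checks bounds, and values come back as 0..255.

```python
class ChromBytes:
    """One byte of score per chromosome position, kept in memory."""

    def __init__(self, length):
        # Zero-filled, positions 0..length inclusive (like malloc(len + 1)).
        self._buf = bytearray(length + 1)

    def set_bases(self, begin, end, val):
        """Fill the closed interval [begin, end] with val (0..255)."""
        if not 0 <= begin <= end < len(self._buf):
            raise IndexError("interval outside the chromosome")
        self._buf[begin:end + 1] = bytes([val]) * (end - begin + 1)

    def get_base(self, pos):
        """Return the value stored at pos; 0 means nothing was set there."""
        return self._buf[pos]


chrom = ChromBytes(100)     # hypothetical chromosome of length 100
chrom.set_bases(10, 20, 7)  # hypothetical block with score 7
print(chrom.get_base(9), chrom.get_base(10), chrom.get_base(20), chrom.get_base(21))
# prints: 0 7 7 0
```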