What is bedtools coverage?


bedtools coverage is a tool that computes both the depth and the breadth of coverage of the features in file B over the features in file A.

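More precisely, for each feature in file A it reports the following four values:

1. The number of features in B that overlap the A feature (depth)

2. The number of bases in the A feature that are covered by at least one B feature

3. The length of the A feature

4. The fraction of the A feature that is covered (a value between 0 and 1, i.e. the percentage divided by 100)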
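
To make the four values concrete, here is a minimal Python sketch that computes them for a single A interval. It is only an illustration of the arithmetic, not how bedtools is implemented: the function name `coverage_columns` is made up for this post, all intervals are assumed to be on the same chromosome, and coordinates are assumed to be 0-based, half-open as in BED.

```python
def coverage_columns(a, b_features):
    """Return (overlap_count, covered_bases, a_length, covered_fraction)
    for one A interval.

    Intervals are (start, end) pairs on the same chromosome,
    0-based and half-open as in BED. Hypothetical helper for illustration.
    """
    a_start, a_end = a
    # Clip each B feature to A and keep only the ones that really overlap.
    clipped = sorted(
        (max(a_start, s), min(a_end, e))
        for s, e in b_features
        if s < a_end and e > a_start
    )
    covered = 0
    cursor = a_start  # right edge of the region already counted
    for s, e in clipped:
        if e > cursor:
            # Count only the part not already covered by earlier features.
            covered += e - max(s, cursor)
            cursor = e
    length = a_end - a_start
    fraction = covered / length if length else 0.0
    return len(clipped), covered, length, fraction


# A = [100, 200); four B features, three of which overlap A.
print(coverage_columns((100, 200), [(90, 120), (110, 150), (180, 260), (300, 400)]))
# prints (3, 70, 100, 0.7)
```

Note that overlapping B features are counted separately in the first value, but the bases they share are counted only once in the second.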


The following figure should make this a little easier to understand.







Usage


bedtools coverage [OPTIONS] -a <File A> -b <File B ...> > output


★ Input files can be in BAM, BED, GFF, or VCF format.
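
If you want to call the command above from a script, a sketch like the following may help. It only assembles the same command line shown in the usage and, optionally, runs it with the output redirected to a file. The helper names `build_coverage_command` and `run_coverage` and the file names are hypothetical, and actually running it assumes the `bedtools` executable is on your PATH.

```python
import subprocess


def build_coverage_command(a_path, b_paths, options=()):
    """Build the argv list for: bedtools coverage [OPTIONS] -a A -b B [B ...]"""
    return ["bedtools", "coverage", *options, "-a", a_path, "-b", *b_paths]


def run_coverage(a_path, b_paths, output_path, options=()):
    """Run bedtools coverage and write its stdout to output_path.

    Assumes the bedtools executable is installed and on PATH.
    """
    with open(output_path, "w") as out:
        subprocess.run(
            build_coverage_command(a_path, b_paths, options),
            stdout=out,
            check=True,  # raise if bedtools exits with an error
        )


# Hypothetical file names, for illustration only.
print(build_coverage_command("genes.bed", ["reads.bam"], options=["-counts"]))
# prints ['bedtools', 'coverage', '-counts', '-a', 'genes.bed', '-b', 'reads.bam']
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.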


Option     Description
-a         BAM/BED/GFF/VCF file “A”. Each feature in A is compared to B in search of overlaps. Use “stdin” if passing A with a UNIX pipe.
-b         One or more BAM/BED/GFF/VCF file(s) “B”. Use “stdin” if passing B with a UNIX pipe. NEW!!!: -b may be followed with multiple databases and/or wildcard (*) character(s).
-abam      BAM file A. Each BAM alignment in A is compared to B in search of overlaps. Use “stdin” if passing A with a UNIX pipe. For example: samtools view -b <BAM> | bedtools intersect -abam stdin -b genes.bed. Note: no longer necessary after version 2.19.0.
-hist      Report a histogram of coverage for each feature in A as well as a summary histogram for _all_ features in A.
           Output (tab delimited) after each feature in A:
           1) depth
           2) # bases at depth
           3) size of A
           4) % of A at depth
-d         Report the depth at each position in each A feature. Positions reported are one based. Each position and depth follow the complete A feature.
-counts    Only report the count of overlaps, don’t compute fraction, etc. Restricted by -f and -r.
-f         Minimum overlap required as a fraction of A. Default is 1E-9 (i.e., 1bp).
-F         Minimum overlap required as a fraction of B. Default is 1E-9 (i.e., 1bp).
-r         Require that the fraction of overlap be reciprocal for A and B. In other words, if -f is 0.90 and -r is used, this requires that B overlap at least 90% of A and that A also overlaps at least 90% of B.
-e         Require that the minimum fraction be satisfied for A _OR_ B. In other words, if -e is used with -f 0.90 and -F 0.10 this requires that either 90% of A is covered OR 10% of B is covered. Without -e, both fractions would have to be satisfied.
-s         Force “strandedness”. That is, only report hits in B that overlap A on the same strand. By default, overlaps are reported without respect to strand.
-S         Require different strandedness. That is, only report hits in B that overlap A on the _opposite_ strand. By default, overlaps are reported without respect to strand.
-split     Treat “split” BAM (i.e., having an “N” CIGAR operation) or BED12 entries as distinct BED intervals.
-sorted    For very large B files, invoke a “sweeping” algorithm that requires position-sorted (e.g., sort -k1,1 -k2,2n for BED files) input. When using -sorted, memory usage remains low even for very large files.
-g         Specify a genome file that defines the expected chromosome order in the input files for use with the -sorted option.
-header    Print the header from the A file prior to results.
-sortout   When using multiple databases (-b), sort the output DB hits for each record.
-nobuf     Disable buffered output. Using this option will cause each line of output to be printed as it is generated, rather than saved in a buffer. This will make printing large output files noticeably slower, but can be useful in conjunction with other software tools and scripts that need to process one line of bedtools output at a time.
-iobuf     Follow with desired integer size of read buffer. Optional suffixes K/M/G supported. Note: currently has no effect with compressed files.
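
The -d and -hist options are easier to picture with a small example. The sketch below is a brute-force Python illustration of what those two outputs contain for a single A interval; it is not bedtools' actual algorithm. The function names are made up for this post, the intervals are assumed to be 0-based, half-open and on the same chromosome, positions are numbered from 1 within the A feature, and the last histogram column is computed as a fraction between 0 and 1.

```python
from collections import Counter


def per_base_depth(a, b_features):
    """Depth at every base of one A interval, in the spirit of -d.

    Returns (position, depth) pairs; positions are 1-based within A.
    """
    a_start, a_end = a
    depth = [0] * (a_end - a_start)
    for s, e in b_features:
        # Only the part of the B feature that falls inside A adds depth.
        for pos in range(max(s, a_start), min(e, a_end)):
            depth[pos - a_start] += 1
    return [(i + 1, d) for i, d in enumerate(depth)]


def depth_histogram(a, b_features):
    """Rows of (depth, bases_at_depth, size_of_a, fraction_of_a_at_depth),
    in the spirit of -hist."""
    size = a[1] - a[0]
    counts = Counter(d for _, d in per_base_depth(a, b_features))
    return [(d, n, size, n / size) for d, n in sorted(counts.items())]


a = (0, 10)
b = [(0, 4), (2, 6), (8, 12)]
print(per_base_depth(a, b))
# prints [(1, 1), (2, 1), (3, 2), (4, 2), (5, 1), (6, 1), (7, 0), (8, 0), (9, 1), (10, 1)]
print(depth_histogram(a, b))
# prints [(0, 2, 10, 0.2), (1, 6, 10, 0.6), (2, 2, 10, 0.2)]
```

Looping over every base is fine for a toy example like this, but it would be far too slow for genome-sized inputs.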




