Genomic arithmetic with BedTools - https://bedtools.readthedocs.io/en/latest/
Often want to ask questions about genomic ranges. For example.
Some starter example code:
https://github.com/biodataprog/GEN220_2019_examples/tree/master/Bioinformatics_1/Ranges
#!/usr/bin/bash
module load bedtools
bedtools intersect -a rice_chr6.fixed_Chr.gff -b rice_chr6_3kSNPs_filt.bed -wo > snp_gene_intersect.tab
# how many features have SNPS?
cut -f3 snp_gene_intersect.tab | sort | uniq -c
# how many SNPs does each gene have?
grep -P "\tgene\t" snp_gene_intersect.tab > snp_gene_intersect.genes_only.tab
# this outputs gene SNP counts ordered by genename which is actually chromosome
# position nicely
cut -f9 snp_gene_intersect.genes_only.tab | sed 's/^ID=//; s/;Name=.*//' | sort | uniq -c > gene_snp_count.txt
# which genes have the most snps?
sort -nr gene_snp_count.txt > gene_snps_count.by_number.txt