Note: I have provided a working function, read_fasta,
which can be used as-is to read in a FASTA file and return a list of sequences as strings.
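For reference, here is a minimal sketch of how a FASTA reader like this can work. It is not the provided read_fasta, which may behave differently. The names parse_fasta_lines and read_fasta_sketch are hypothetical. It assumes standard FASTA: each record starts with a ">" header line, sequence lines may wrap, headers are discarded, and wrapped lines are joined.

```python
def parse_fasta_lines(lines):
    """Return the sequences in an iterable of FASTA lines as a list of strings."""
    sequences = []
    current = None  # chunks of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header starts a new record, so save the previous one first.
            if current is not None:
                sequences.append("".join(current))
            current = []
        elif current is not None:
            current.append(line)  # sequence lines may wrap; join them later
    if current is not None:
        sequences.append("".join(current))
    return sequences


def read_fasta_sketch(path):
    """Hypothetical stand-in for the provided read_fasta: read a FASTA file by path."""
    with open(path) as handle:
        return parse_fasta_lines(handle)
```

Keeping the parsing separate from the file handling makes it easy to check on a few in-memory lines before running it on a whole genome.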
protein_freq.py
which will read in the protein FASTA-format file Saccharomyces_cerevisiae.peps.fa
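The rest of this task is not spelled out here. Assuming the script is meant to report how often each amino acid occurs, a minimal sketch could take the list returned by read_fasta and count residues across all proteins. The function name residue_percentages is hypothetical. Stripping a trailing "*" stop symbol is also an assumption about the file.

```python
from collections import Counter


def residue_percentages(sequences):
    """Percentage of each residue letter across all sequences (assumed task)."""
    counts = Counter()
    for seq in sequences:
        # Assumption: a trailing '*' marks a stop and is not counted as a residue.
        counts.update(seq.upper().rstrip("*"))
    total = sum(counts.values())
    if total == 0:
        return {}
    # Sort by residue letter so the printed report is in a stable order.
    return {aa: 100.0 * n / total for aa, n in sorted(counts.items())}
```

Usage, assuming read_fasta takes a file path: `residue_percentages(read_fasta("Saccharomyces_cerevisiae.peps.fa"))`.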
Using the FASTA files for the genomes ‘Ecoli_K-12.fasta’ and ‘B_subtilis_str_168.fasta’,
compute the frequency of every dinucleotide combination (e.g. AA, AC, AG, AT, CA, CC, …) and print the results as a table.
The report should be tab-delimited and look like:
Motif Ecoli_K-12 B_subtilis_str_168
AA 7.28 9.85
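A minimal sketch of the counting and the table. It makes several assumptions that the handout does not state. Dinucleotides are counted as overlapping pairs within each sequence, never across record boundaries. Pairs containing anything other than A, C, G or T are skipped. Each frequency is a percentage of all counted pairs, which is consistent with the 7.28 example, and is printed to two decimals. The names dinucleotide_percentages and format_table are hypothetical.

```python
from itertools import product

BASES = "ACGT"
# All 16 combinations in table order: AA, AC, AG, AT, CA, ..., TT
DINUCLEOTIDES = ["".join(pair) for pair in product(BASES, repeat=2)]


def dinucleotide_percentages(sequences):
    """Percentage of each overlapping dinucleotide across all sequences."""
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - 1):
            pair = seq[i:i + 2]
            if pair in counts:  # skip pairs containing N or other ambiguity codes
                counts[pair] += 1
                total += 1
    return {d: (100.0 * c / total if total else 0.0) for d, c in counts.items()}


def format_table(columns):
    """columns: list of (genome_name, percentages) pairs -> tab-delimited report."""
    rows = ["\t".join(["Motif"] + [name for name, _ in columns])]
    for d in DINUCLEOTIDES:
        rows.append("\t".join([d] + [f"{pct[d]:.2f}" for _, pct in columns]))
    return "\n".join(rows)
```

Usage, assuming read_fasta takes a file path: build `columns = [(name, dinucleotide_percentages(read_fasta(name + ".fasta"))) for name in ("Ecoli_K-12", "B_subtilis_str_168")]` and then `print(format_table(columns))`.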
Process the tabular BLAST report file ‘Ecoli-vs-Senterica.BLASTP.tab’, which has the following columns for each pairwise sequence alignment:
QUERYNAME SUBJECTNAME PERCENTID LENGTHALN NUMMISMATCHES GAPOPEN QSTART QEND SSTART SEND EVALUE BITSCORE
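One way to work with these columns is to turn each tab-separated line into a dictionary keyed by the column names above, converting numeric fields on the way in. This is a minimal sketch that assumes exactly these twelve columns in this order. The function name parse_blast_line is hypothetical.

```python
COLUMNS = ["QUERYNAME", "SUBJECTNAME", "PERCENTID", "LENGTHALN", "NUMMISMATCHES",
           "GAPOPEN", "QSTART", "QEND", "SSTART", "SEND", "EVALUE", "BITSCORE"]
INT_COLUMNS = {"LENGTHALN", "NUMMISMATCHES", "GAPOPEN", "QSTART", "QEND", "SSTART", "SEND"}
FLOAT_COLUMNS = {"PERCENTID", "EVALUE", "BITSCORE"}  # float() also parses E-values like 1e-50


def parse_blast_line(line):
    """Split one report line on tabs and map each field to its column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    record = {}
    for name, value in zip(COLUMNS, fields):
        if name in INT_COLUMNS:
            record[name] = int(value)
        elif name in FLOAT_COLUMNS:
            record[name] = float(value)
        else:
            record[name] = value  # sequence identifiers stay as strings
    return record
```

Named fields make the filtering step easier to read than bare positions such as `fields[10]`.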
Print only the lines that match the following criteria:
Update the report by adding one more column after the existing ones.
Print the new report to STDOUT.
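The filtering criteria and the content of the new column are not given here, so the sketch below uses clearly HYPOTHETICAL placeholders for both. The filter keeps hits with EVALUE ≤ 1e-10 and PERCENTID ≥ 50. The extra column is the length of the aligned region on the query, |QEND − QSTART| + 1. It shows only the overall shape of the task: skip comment and blank lines, test each line, keep the original columns in order, append one more, and print the result. Replace passes_criteria and extra_column with the real rules.

```python
# 0-based positions of the columns used below, following the header listed above.
PERCENTID, QSTART, QEND, EVALUE = 2, 6, 7, 10


def passes_criteria(fields):
    """HYPOTHETICAL filter: the real criteria are not specified in the text."""
    return float(fields[EVALUE]) <= 1e-10 and float(fields[PERCENTID]) >= 50.0


def extra_column(fields):
    """HYPOTHETICAL new column: length of the aligned region on the query."""
    return str(abs(int(fields[QEND]) - int(fields[QSTART])) + 1)


def filter_and_extend(lines):
    """Yield each kept line with the extra column appended after the existing ones."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and any comment lines
        fields = line.rstrip("\n").split("\t")
        if passes_criteria(fields):
            yield "\t".join(fields + [extra_column(fields)])
```

Usage: open ‘Ecoli-vs-Senterica.BLASTP.tab’ and `print()` each line yielded by `filter_and_extend(report)`. This writes the new report to STDOUT, so it can be redirected to a file from the shell.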