2020 Edition of the Class
Write a program sevenless.py to print all the numbers from 0 to 99, one per line, but skip any number that is evenly divisible by 7.
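One possible shape for this program is sketched below. The function name sevenless and its limit parameter are assumptions made for illustration, not names the assignment requires. Note that 0 is evenly divisible by 7, so it is skipped as well.

```python
def sevenless(limit=100):
    """Return the numbers from 0 to limit - 1 that are not divisible by 7."""
    # n % 7 is 0 exactly when 7 divides n, which includes n == 0.
    return [n for n in range(limit) if n % 7 != 0]


if __name__ == "__main__":
    # Print one number per line, as the exercise asks.
    for n in sevenless():
        print(n)
```

Returning a list instead of printing inside the loop keeps the filtering logic easy to check separately from the output.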
Let’s calculate some statistics from this GFF file, which lists the locations of genes and exons. Remember that GFF is a structured, tab-delimited format that describes the locations of features in a genome.
Here is a GFF file for the E. coli K-12 genome. ftp://ftp.ensemblgenomes.org/pub/bacteria/release-45/gff3/bacteria_0_collection/escherichia_coli_str_k_12_substr_mg1655/Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.37.gff3.gz
Here is a FASTA file for the genome of E. coli K-12. ftp://ftp.ensemblgenomes.org/pub/bacteria/release-45/fasta/bacteria_0_collection/escherichia_coli_str_k_12_substr_mg1655/dna/Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.dna.chromosome.Chromosome.fa.gz
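As a reminder of how tab-delimited GFF records can be read in Python, here is a minimal sketch. The sample records are made up for illustration and are not taken from the E. coli file. The function names and the two statistics shown (feature counts and total length per feature type) are assumptions, not the required output of the assignment.

```python
from collections import Counter

# Made-up records for illustration only; not taken from the E. coli file.
SAMPLE_GFF = (
    "##gff-version 3\n"
    "Chromosome\texample\tgene\t100\t400\t.\t+\t.\tID=gene:demo1\n"
    "Chromosome\texample\texon\t100\t400\t.\t+\t.\tParent=transcript:demo1\n"
    "Chromosome\texample\tgene\t900\t1500\t.\t-\t.\tID=gene:demo2\n"
)


def parse_gff(lines):
    """Yield (seqid, feature_type, start, end, strand) for each feature line."""
    for line in lines:
        # Skip comments, directives, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete nine-column GFF3 record
        yield cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6]


def feature_summary(lines):
    """Count features and sum their lengths, grouped by feature type."""
    counts = Counter()
    total_length = Counter()
    for _seqid, ftype, start, end, _strand in parse_gff(lines):
        counts[ftype] += 1
        # GFF coordinates are 1-based and inclusive at both ends.
        total_length[ftype] += end - start + 1
    return counts, total_length


if __name__ == "__main__":
    counts, lengths = feature_summary(SAMPLE_GFF.splitlines())
    for ftype in sorted(counts):
        print(ftype, counts[ftype], lengths[ftype])
```

For the compressed file above, gzip.open(path, "rt") from the standard library should yield text lines that can be passed to the same functions.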
Write a script called count_up.py to:
curl command from within your script. But if this doesn’t make sense to you, you can remove that.
Use the following files to examine codon usage across these two bacteria. Remember that codons are triplets (e.g. ACA, GAT, …). There are 64 possible triplets in total. To count them, treat each sequence as non-overlapping sets of three adjacent bases, using the very first base as the start of the reading frame.
These files are coding sequences of the predicted genes in each of two species.
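The counting rule described above (non-overlapping triplets, reading frame starting at the first base) can be sketched as follows. The function name count_codons and the choice to drop a trailing partial codon are assumptions made for this illustration.

```python
from collections import Counter


def count_codons(sequence):
    """Count non-overlapping triplets, starting the frame at the first base."""
    seq = sequence.upper()
    # Ignore 1 or 2 leftover bases at the end that do not form a full codon.
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


if __name__ == "__main__":
    # Short made-up sequence: ATG ACA GAT ACA, plus one leftover base.
    for codon, n in sorted(count_codons("ATGACAGATACAT").items()):
        print(codon, n)
```

Stepping the index by 3 is what keeps the triplets non-overlapping. Stepping by 1 would count every overlapping 3-mer instead.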
Write a script called codon_compute.py. You can download the data outside of the Python script, or you can include these steps in your script. I already wrote part of this for you: the template code you can start with executes a curl command from within your script.
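For readers unsure what running curl from inside a Python script looks like, here is a minimal sketch. It is not the course template, and the function names are hypothetical. It assumes curl is installed and on the PATH, and it skips the download when the destination file already exists.

```python
import os
import subprocess


def curl_command(url, dest):
    """Build the argument list for curl: follow redirects (-L), write to dest (-o)."""
    return ["curl", "-L", "-o", dest, url]


def download_if_missing(url, dest):
    """Run curl only when dest is not already on disk, then return dest."""
    if not os.path.exists(dest):
        # check=True raises CalledProcessError if curl exits with an error.
        subprocess.run(curl_command(url, dest), check=True)
    return dest
```

Passing the command as a list rather than a single string avoids shell quoting problems with long URLs.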
The code you write will need to process these files in order to print out the following information: