Write a program sevenless.py to print out all the numbers from 0 to 99, one on each line, except do not print any number perfectly divisible by 7.
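One possible sketch of this task is shown here. The function name sevenless and its limit parameter are illustrative assumptions, not part of the assignment. Note that 0 is itself divisible by 7, so under this reading it is skipped as well.

```python
def sevenless(limit=100):
    """Return the integers in [0, limit) that are not divisible by 7.

    The name and the `limit` parameter are hypothetical helpers for
    illustration; the assignment only asks for the printed output.
    """
    return [n for n in range(limit) if n % 7 != 0]


if __name__ == "__main__":
    # Print one number per line, as the task requests.
    for n in sevenless():
        print(n)
```

Returning a list rather than printing inside the loop keeps the filtering logic easy to check separately from the output.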
Write a script open_shut.py to write a new file called ‘closed.txt’
A database file of all movies is located at https://datasets.imdbws.com/title.basics.tsv.gz or you can use the already downloaded copy at /bigdata/gen220/shared/simple/title.basics.tsv.gz
In this file please have it print out:
Let’s calculate some statistics from this GFF file, which lists the locations of genes and exons. Remember that GFF is a structured, tab-delimited format that describes the locations of features in a genome.
Here is a GFF file for the E. coli K-12 genome. ftp://ftp.ensemblgenomes.org/pub/bacteria/release-45/gff3/bacteria_0_collection/escherichia_coli_str_k_12_substr_mg1655/Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.37.gff3.gz
Here is a FASTA file for the genome of E. coli K-12. ftp://ftp.ensemblgenomes.org/pub/bacteria/release-45/fasta/bacteria_0_collection/escherichia_coli_str_k_12_substr_mg1655/dna/Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.dna.chromosome.Chromosome.fa.gz
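As a rough illustration of working with the tab-delimited GFF layout, the sketch below tallies how many features of each type appear and their combined length. It assumes the standard nine GFF3 columns (type in column 3, start and end in columns 4 and 5, with 1-based inclusive coordinates). The function name feature_stats and the choice of statistics are assumptions for illustration, not the specific outputs the assignment asks for.

```python
import gzip
import sys
from collections import Counter


def feature_stats(lines):
    """Count features and sum their lengths per feature type.

    `lines` is any iterable of GFF text lines. This helper is a
    hypothetical example, not the assignment's required output.
    """
    counts = Counter()
    total_len = Counter()
    for line in lines:
        # Skip comment/directive lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete feature row
        ftype = cols[2]
        start, end = int(cols[3]), int(cols[4])
        counts[ftype] += 1
        # GFF coordinates are 1-based and inclusive.
        total_len[ftype] += end - start + 1
    return counts, total_len


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumes the first argument is a locally downloaded .gff3.gz file.
    with gzip.open(sys.argv[1], "rt") as handle:
        counts, total_len = feature_stats(handle)
    for ftype in sorted(counts):
        print(ftype, counts[ftype], total_len[ftype], sep="\t")
```

Opening the file with gzip.open in text mode ("rt") avoids having to decompress it on disk first.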
Write a script called count_up.py to:
Use the following files to examine codon usage across these two bacteria. Remember that codons are triplets (e.g., ACA, GAT, …). There are 64 possible triplets in total. To count them, treat each sequence as non-overlapping sets of three adjacent bases, using the very first base as the start of the reading frame.
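The counting rule described above (non-overlapping triplets, reading frame starting at the first base) can be sketched as follows. The function name count_codons is an assumption for illustration, and the choice to ignore any trailing one or two bases that do not form a full triplet is likewise an assumption.

```python
from collections import Counter


def count_codons(seq):
    """Count non-overlapping triplets starting at the first base.

    Hypothetical helper: trailing bases that do not fill a complete
    codon are ignored, and input is upper-cased before counting.
    """
    seq = seq.upper()
    counts = Counter()
    # Step by 3 so the triplets never overlap; stop before a partial codon.
    for i in range(0, len(seq) - 2, 3):
        counts[seq[i:i + 3]] += 1
    return counts


if __name__ == "__main__":
    for codon, n in sorted(count_codons("ACAGATACA").items()):
        print(codon, n)
```

To compare usage between species, the per-sequence counters can be added together (Counter supports `+` and `update`) and divided by the total number of codons to get frequencies.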
These files are coding sequences of the predicted genes in each of two species.
Write a script called codon_compute.py which will download and process these files in order to print out