
2020 Edition of the Class

Homework 1

Homework can be submitted via the GitHub link, which will create a repository for you with a basic template of files that you can edit to solve the homework. See the table for homework submission links, which will help you create a GitHub repository in the class.

  1. Write a script called which does the following.
  2. Write a script called which summarizes the total length of exons in the file data/rice_random_exons.bed. These data are in the BED file format. The columns are “Chromosome”, “Start position”, and “Stop position”. The length of a feature (an exon in this case) is computed as STOP - START.
    • read in the file
    • use a loop structure to read each line
    • add up the length of each exon by summing this into a variable
    • print out the total length of exon features at the end
    • You do not need to report this per chromosome; printing the overall total length is enough for this example. If that is too easy for you, go ahead and make a more sophisticated report that presents, per chromosome, the total length of exons, the total number of exons, and the average exon length.
  3. Write a script called to calculate the number of genes that are on the positive (+) and negative (-) strands in the file. This file format is called GFF; the strand of the gene is encoded in the 7th column.
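
The two counting tasks above (summing STOP - START over BED lines, and tallying strands from the 7th GFF column) can be sketched in Python as follows. This is only one possible approach, not the expected solution, and it rests on several assumptions: the function names `total_exon_length` and `count_gene_strands` are hypothetical, the files are whitespace- or tab-delimited with `#` marking comment lines, and genes in the GFF file are assumed to be the rows whose 3rd column (feature type) is `gene`. The sample lines at the bottom are made up for illustration and are not taken from the homework data.

```python
from collections import Counter


def total_exon_length(lines):
    """Sum STOP - START over BED lines (chromosome, start, stop)."""
    total = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines (assumed to start with '#')
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        start, stop = int(fields[1]), int(fields[2])
        total += stop - start
    return total


def count_gene_strands(lines):
    """Count features of type 'gene' per strand (7th GFF column)."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and comment/directive lines
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Assumption: genes are rows whose 3rd column is exactly 'gene'
        if len(fields) >= 7 and fields[2] == "gene":
            counts[fields[6]] += 1
    return counts


# Made-up sample data, for illustration only
bed_sample = [
    "Chr1\t100\t250\n",
    "Chr1\t300\t420\n",
    "Chr2\t10\t60\n",
]
gff_sample = [
    "##gff-version 3\n",
    "Chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=g1\n",
    "Chr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=g1.t1\n",
    "Chr1\tsrc\tgene\t1500\t2500\t.\t-\t.\tID=g2\n",
    "Chr2\tsrc\tgene\t50\t700\t.\t+\t.\tID=g3\n",
]

print(total_exon_length(bed_sample))   # 150 + 120 + 50 = 320
strands = count_gene_strands(gff_sample)
print(strands["+"], strands["-"])      # prints: 2 1
```

Both functions accept any iterable of lines, so an open file handle works directly, for example `with open("data/rice_random_exons.bed") as fh: print(total_exon_length(fh))`. The per-chromosome report suggested as an extension could reuse the same loop with a dictionary keyed by the first column.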