class: center, middle # Dictionaries, Functions --- #Lists and Array Reminders To create a list of items, use the `[ ]` ```python genes = ['SOD1','CDC11','YFG1'] print(genes) print(genes[1]) print(genes[1:]) # everything after slot 1 (incl 1) print(genes[:1]) # everything before slot 1 print(len(genes)) ``` ```shell ['SOD1', 'CDC11', 'YFG1'] CDC11 ['CDC11', 'YFG1'] ['SOD1'] 3 ``` --- # Some built-in list functions * [range()](https://docs.python.org/3/library/functions.html#func-range) - _range(start, stop[, step])_ ```python >>> range(5,10,1) [5, 6, 7, 8, 9] >>> range(5,-1,-1) [5, 4, 3, 2, 1, 0] ``` * [map()](https://docs.python.org/3/library/functions.html#map) - lets you update a list with a function ```python l = [ 'a', 100, 12/3.3 ] # ",".join(l) # this throws an error ";".join(map(str,l))) # have to cast numbers as string print( ";".join(map(str,l))) l = [1,2,3,4] squares = map(lambda x: x**2,l) print(squares) ``` ```shell ['a', 100, 3.6363636363636367] [1, 2, 3, 4] [1, 4, 9, 16] ``` --- #Reverse a list * [reversed()](https://docs.python.org/3/library/functions.html#reversed) - iterate in reverse order of an array/string ```python l = ['zzz','yyy','a'] print(list(reversed(l))) for n in reversed(l): print(n) ``` ```shell ['a', 'yyy', 'zzz'] a yyy zzz ``` --- #More array functions See more details here https://docs.python.org/3/tutorial/datastructures.html * `list.append(x)` - Add an item to the end of the list; * `list.pop([i])` - Remove the item at the given position in the * `list.extend(L)` - Extend the list by appending all the items in the given list; * `list.insert(i, x)` - Insert an item at a given position. The first argument is the index of the element before which to insert, so a.insert(0, x) inserts at the front of the list, and a.insert(len(a), x) is equivalent to a.append(x). * `list.remove(x)` - Remove the first item from the list whose value is x. It is an error if there is no such item. list, and return it. If no index is specified, a.pop() removes and returns the last item in the list. * `list.index(x)` - Return the index in the list of the first item whose value is x. It is an error if there is no such item. * `list.count(x)` - Return the number of times x appears in the list. * `list.sort(cmp=None, key=None, reverse=False)` - Sort the items of the list in place * `list.reverse()` - Reverse the order of the items in the list --- #Sorting Lists The `LIST.sort()` function on a list or the `sorted(LIST)` https://docs.python.org/3/howto/sorting.html ```python #!/usr/bin/env python3 genes = ['SOD1','CDC11','YFG1'] print(genes) sort_genes = sorted(genes) print(sort_genes) numbers = [141, 7, 90, 3, 13] print("unsorted",numbers) numbers.sort() print("sorted",numbers) print("reversed",sorted(numbers,reverse=True)) alphanumbers = ['141', '7', '90', '3', '13'] print("Alphanumeric strings",alphanumbers) print("Alpha sorted numbers",sorted(alphanumbers)) print("Numberic sorted",sorted(alphanumbers,key=int)) ``` --- #Dates and times See https://docs.python.org/3/library/datetime.html ```python from datetime import datetime dates = ['3-Jan-2016', '4-Mar-2015', '2-Aug-1999', '1-May-2000'] print(dates) dates.sort() print(dates) #newdates = [ datetime.strptime(d,"%d-%b-%Y") for d in dates ] newdates = [] for str in dates: newdates.append(datetime.strptime(str,'%d-%b-%Y')) print(newdates) newdates.sort() print(newdates) for n in newdates: print(datetime.strftime(n,"%Y-%b-%d")," OR ", datetime.strftime(n,"%Y-%m-%d"), " OR ", datetime.strftime(n,"%A, %b %d, %Y"), " OR ", datetime.strftime(n,"%c") ) ``` --- # Iterate on Strings/Arrays in the same way ```python lst = [ 'BRCA1','SOD1','PTEN'] for gene in sorted(lst): print("gene is",gene) DNA='AAAACCGTAG' for let in DNA: print(let) for let in reversed(DNA): print(let) ``` ```text BRCA1 PTEN SOD1 A A A ... G A T ... ``` --- #Dictionaries Initialize a dictionary, Dictionaries are key and value pairs ```python things = {} # an empty dictionary listofstuff = [] # an empty array print(things) things = {'diane': 10, 'jack': 13} print(things) things['diane'] things['billy'] = 15 # assign a new key/value pair # if you have a list of pairs of things strangerthings = dict([('Will', 12), ('Jim', 44), ('Joyce', 45), ('Eleven',11),('Lucas',10)]) strangerthings['Eleven'] ``` ```shell {} {'diane': 10, 'jack': 13} 10 11 ``` --- #Iterate through a dictionary Using the for loop and the items() function ```python for key,value in strangerthings.items(): print("key is", key,"value is",value) ``` ```text key is Will value is 12 key is Jim value is 44 key is Joyce value is 45 key is Eleven value is 11 key is Lucas value is 10 ``` --- #Functions These are blocks of code that can be called repeatedly. Simplify tool development. Might have subroutine to read a sequence file. Or compute a statistic. Uses indentation just like loops. ```python def ROUTINENAME(ARGUMENTS): CODE HERE ``` --- #Read Fasta code part 1 https://drj11.wordpress.com/2010/02/22/python-getting-fasta-with-itertools-groupby/ See https://github.com/biodataprog/code_templates/blob/master/Lists_Dictionaries/fasta_parser.py ```python import itertools import sys import re # based on post here # https://drj11.wordpress.com/2010/02/22/python-getting-fasta-with-itertools-groupby/ # define what a header looks like in FASTA format def isheader(line): return line[0] == '>' ``` --- # Read Fasta code part 2 ```python # this function reads in fasta file and returns pairs of data # where the first item is the ID and the second is the sequence # it isn't that efficient as it reads it all into memory # but this is good enough for our project def aspairs(f): seq_id = '' sequence = '' for header,group in itertools.groupby(f, isheader): if header: line = next(group) seq_id = line[1:].split()[0] else: sequence = ''.join(line.strip() for line in group) yield seq_id, sequence ``` --- # Read Fasta example code Part 3 ```python # here is my program # get the filename from the cmdline filename = sys.argv[1] with open(filename,"r") as f: seqs = dict(aspairs(f)) # iterate through the sequences n=0 for k,v in seqs.items(): print( "id is ",k,"seq is",v) n += 1 print(n,"sequences") ``` ```text id is Q0142 seq is MTGSGTPPSREVNTYYMTMTMTMTMIMIMTMTMNIHFNNNNNNNINMNSRRMYLFIL*M id is Q0143 seq is MGLWISFGTPPSYTYLLIMNHKLLLINNNNLTEVHTYFNININIDKMYIH* ``` --- #Dictionaries For Unique Lists Dictionaries are useful ways to generate a unqiue list ```python dna = 'AAGAGAGGATACA' bases = {'A':0, 'C':0, 'G':0, 'T':0 } for l in dna: bases[l] += 1 print(bases) ``` ```shell {'A': 7, 'C': 1, 'G': 4, 'T': 1} ``` --- #Class problem Write a script to translate DNA into Protein * Use dictionary to lookup codons in a FASTA file and convert CDS to amino acid