class: center, middle

# Programming for data analyses

Jason Stajich

https://biodataprog.github.io/programming-intro/

---

# What to expect

* This course will cover the basics of using UNIX and Python for data analyses.
* The topics will focus on skills for processing and filtering data, with an emphasis on biological, genomic, or transcriptomic sequence data.
* No prior programming experience is required.
* You must have access to a computer that allows you to either run the terminal (e.g. OSX or Linux) or run programs like SSH to log into the UCR Biocluster.
* Some biology background will be assumed. When in doubt, ask questions. Others are likely to have the same questions!

---

# Course Logistics

1. Meeting in the University Laboratory Building (ULB) Bioinformatics suite.
2. Wednesdays are scheduled for 2 hrs, Fridays for 1 hr - likely this will end up being 1.5 hrs.
3. Expect a mix of in-class workshops and lectures.
4. Homework problem sets will be assigned to give you more practice and improve your skills. If you don't practice, you won't retain or master these skills.
5. Laptops are available during class, but you will need access to a computer outside of class to complete assignments.
6. Accounts on the IIGB Biocluster will be provided during the course.

---

# Resources

1. _Bioinformatics Data Skills: Reproducible and Robust Research with Open Source Tools_. Vince Buffalo. 2015 O'Reilly & Associates. Available from [O'Reilly and Associates](http://shop.oreilly.com/product/0636920030157.do), [Amazon](http://amazon.com/Bioinformatics-Data-Skills-Reproducible-Research/dp/1449367372)
2. _Unix and Perl to the Rescue: A Primer_. Keith Bradnam and Ian Korf. [Free PDF](http://korflab.ucdavis.edu/unix_and_Perl/)
3. _Unix and Perl to the rescue!_ Bradnam and Korf. [Amazon](https://www.amazon.com/gp/product/0521169828?tag=keithbradnamc-20)
4. [Rosalind](http://rosalind.info/problems/locations/) - An online platform to learn bioinformatics and programming in Python.
5. Software Carpentry - [https://software-carpentry.org/](https://software-carpentry.org/) and Data Carpentry - [http://www.datacarpentry.org/](http://www.datacarpentry.org/).
6. Berk Ekmekci, Charles E. McAnany, Cameron Mura. An Introduction to Programming for Bioscientists: A Python-Based Primer. PLoS Comp Bio. DOI: [10.1371/journal.pcbi.1004867](https://doi.org/10.1371/journal.pcbi.1004867)

---

# Grading

1. Homework (5 assignments in total) is worth 50% of the grade.
2. Team project - presentation, written report, and code repository - is worth 50% of the grade.
3. You are expected to attend class. If you need to be absent, let me know. Material will be presented in the slides but also through in-class explanations.
4. Your work is expected to be your own. Google and StackExchange are useful tools for finding solutions, and reuse is part of coding. However, you need to understand each step in your code, so what you turn in should reflect your own effort.

---

# Homework

1. Code should be runnable as turned in. You will deposit your code in your GitHub repository. You can make one private personal repository for your work, organized with a folder for each homework assignment (e.g. hw1, hw2, hw3).
2. Homework is due BEFORE class on the Wednesday it is due. The next homework will be posted by Friday at the latest.

---

# Projects

1. Topics will be selected from a set of choices or be of the team's choosing (with approval from the instructor). Selection of topics will occur by the end of October.
2. Project teams will be 2-3 individuals working together.
3. Each team will give a presentation on the last day(s) of class; we may have to extend class time to accommodate this, or present during finals week.
4. A final report with the details will be turned in by the group.
5. The report needs to detail each person's contribution to the project.