Syllabus for GEN220: High Throughput Biological Data Processing

Course Description

This course focuses on computational skills for processing data using the Python programming language and the UNIX environment. No prior programming experience is required, but basic computer skills will be useful.

With the advancement of high throughput data generation methods, a major challenge that graduate students in the life sciences face today is analyzing large amounts of biological data. The objective of this course is to give graduate students without a computer science background an opportunity to learn the basic skills for handling high throughput biological data. It covers the Linux/Unix environment and the importance of the command line interface; the Python programming language; program design, implementation, and testing; BioPython; and strategies for analyzing genome resequencing, RNASeq, and microbiome sequencing data. Students build hands-on skills by analyzing real high throughput biological data through homework assignments and team projects.

Units: 3

Instructor: Jason Stajich (jason.stajich@ucr.edu)

Time and location: Tu/Th 1-3 PM, Boyce Hall 1467

Office Hours: Wednesdays 9-10, or (preferably) by appointment.

Course website: https://biodataprog.github.io/GEN220_2025/

Prerequisites

Resources

None of these texts is required to complete the course, but they provide helpful background and examples that will improve your ability to master UNIX and programming in Python.

  1. Vince Buffalo. Bioinformatics Data Skills: Reproducible and Robust Research with Open Source Tools. O’Reilly & Associates, 2015. Available from O’Reilly & Associates and Amazon; free to read on the UCR network (or via VPN) through Safari.

  2. Keith Bradnam and Ian Korf. Unix and Perl to the Rescue: A Primer (Unix and Perl Primer for Biologists).

  3. Keith Bradnam and Ian Korf. Unix and Perl to the Rescue! Available from Amazon.

  4. Rosalind: an online platform for learning bioinformatics and programming in Python.

  5. Software Carpentry - https://software-carpentry.org/ and Data Carpentry - http://www.datacarpentry.org/.

  6. Berk Ekmekci, Charles E. McAnany, Cameron Mura. An Introduction to Programming for Bioscientists: A Python-Based Primer. PLoS Comp Bio. DOI: 10.1371/journal.pcbi.1004867

  7. Ken Youens-Clark. Tiny Python Projects. https://www.manning.com/books/tiny-python-projects

  8. Pat Schloss’s Riffomonas Code Club offers videos and links covering programming and microbiome analyses.

Grading

Homework

Operating systems:

Projects

Schedule

Date    | Day | Lecture Topic | Notes
Sept-25 | Th  | Course Intro / UNIX I: Command line, GitHub |
Sept-30 | Tu  | UNIX II: Biocluster HPCC, running programs | Homework 0 Due
Oct-2   | Th  | UNIX III: Tools for data processing |
Oct-7   | Tu  | Python I: Variables, running, command line, strings, math | Homework 1 Due
Oct-9   | Th  | Python II: Logic, loops, lists, iterators; I/O (reading/writing files) |
Oct-14  | Tu  | Python III: Pandas and dictionaries |
Oct-16  | Th  | Python IV: Functions, modules, BioPython | Homework 2 Due
Oct-21  | Tu  | Python V: Regular expressions |
Oct-23  | Th  | Python Data / Bioinformatics I | Class Project Info
Oct-28  | Tu  | Alignment and bioinformatics algorithms; BLAST command line & automation | Homework 3 Due
Oct-30  | Th  | Cluster HPC, Nextflow | Class Project Outline/Abstract Due
Nov-4   | Tu  | TBD (Stajich away) |
Nov-6   | Th  | Bioinformatics II: RNASeq analyses |
Nov-11  | Tu  | No class (holiday) |
Nov-13  | Th  | Bioinformatics III: SNPs and variants | Homework 4 Due
Nov-18  | Tu  | Bioinformatics IV: Protein sequence analyses (HMMER, InterPro, SignalP) |
Nov-20  | Th  | Bioinformatics V: Orthology, phylogenetics, pipeline |
Nov-25  | Tu  | Bioinformatics VI: AlphaFold | Homework 5 Due
Nov-27  | Th  | No class |
Dec-2   | Tu  | Genome and statistical data visualizations; extra topics |
Dec-4   | Th  | Class presentations |
Dec-10  | Wed |  | Final papers due

*Note: these dates and topics may change if illness or conflicts arise. The class will also vote on special topics to emphasize toward the end of the quarter.