Welcome to Bojan Zunar’s GitHub Pages
These pages show tasks completed throughout Coursera's Data Science and Genomic Data Science Specializations.
Coursera provides online courses from top universities and offers a Verified Certificate to students who complete a course. Through ID verification, this document guarantees the identity of the person who completed the required assignments, thus ensuring academic integrity. Yet, while finishing a course shows commitment, it does not illustrate what the students learned, i.e. what they can do.
Most Coursera courses include Peer Reviewed Assignments: tasks that cannot be completed by merely "clicking-to-victory" through interactive questions, but only by applying the learned concepts to solve and document non-trivial problems reproducibly. This page lists a few such reports to demonstrate the skills I acquired throughout the Specializations.
‘Genomic Data Science’ Specialization (Johns Hopkins)
- Bioconductor for Genomic Data Science (certificate)
-
Assignment: H3K4me3 vs. Transcription
The airway dataset contains more than 64k features. How many of these features overlap with transcripts on the autosomes? A feature has to overlap the actual transcript, not the intron of a transcript.
The expression measures of the airway dataset are the number of reads mapping to each feature. How many reads map to features which overlap these transcripts?
We should be able to very roughly divide these transcripts into expressed and non-expressed transcripts. Expressed transcripts should be marked by H3K4me3 at their promoter. What is the median number of counts per feature containing a H3K4me3 peak in its promoter? Compare this to the median number of counts for features without a H3K4me3 peak.
-
- Python for Genomic Data Science (certificate)
-
Assignment: Finding ORFs and Repeats
Write a Python program that takes as input a file containing DNA sequences in multi-FASTA format, and computes the answers to the following questions.
How many records are in the file? What are the lengths of the sequences in the file? What is the longest sequence and what is the shortest sequence? Is there more than one longest or shortest sequence? What are their identifiers?
What is the length of the longest ORF in the file? What is the identifier of the sequence containing the longest ORF? For a given sequence identifier, what is the longest ORF contained in the sequence represented by that identifier? What is the starting position of the longest ORF in the sequence that contains it?
Identify all repeats of length n in all sequences in the FASTA file. Determine how many times each repeat occurs in the file, and which is the most frequent repeat of a given length.
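The questions above can be approached with a few small functions. The following is only a minimal sketch, not the code from the linked report: the function names (`parse_fasta`, `longest_orf`, `count_repeats`) and the example records are hypothetical, and an ORF is assumed here to run from an `ATG` to the first in-frame stop codon (stop included) on the forward strand only.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def parse_fasta(text):
    """Parse multi-FASTA text into a dict of {identifier: sequence}."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the identifier is the first word of the header line.
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def longest_orf(seq, frame):
    """Return (length, 1-based start) of the longest ORF in forward frame 1, 2 or 3.

    Returns (0, 0) if the frame contains no complete ORF.
    """
    best = (0, 0)
    start = None
    for i in range(frame - 1, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            length = i + 3 - start
            if length > best[0]:
                best = (length, start + 1)
            start = None
    return best


def count_repeats(records, n):
    """Count every (overlapping) substring of length n across all sequences."""
    counts = Counter()
    for seq in records.values():
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    return counts


# Hypothetical two-record input, used for illustration only.
example = ">seq1 demo\nATGAAATAGCC\n>seq2\nCCATGCCCGGGTAAATGTGA\n"
records = parse_fasta(example)
lengths = {name: len(seq) for name, seq in records.items()}
orf_seq2_frame3 = longest_orf(records["seq2"], 3)
most_common_3mer = count_repeats(records, 3).most_common(1)[0]
```

Running `longest_orf` over all three frames of every record, and taking the maximum, answers the per-file ORF questions; `lengths` answers the record-count and sequence-length questions.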
-
‘Data Science’ Specialization (Johns Hopkins) (certificate)
- Data Science Capstone (certificate)
-
Assignment: Building a Predictive Text Model
Around the world, people are spending an increasing amount of time on their mobile devices for email, social networking, banking and a whole range of other activities. But typing on mobile devices can be a serious pain. SwiftKey, our corporate partner in this capstone, builds a smart keyboard that makes it easier for people to type on their mobile devices. One cornerstone of their smart keyboard is predictive text models. When someone types:
I went to the
the keyboard presents three options for what the next word might be. For example, the three words might be gym, store, restaurant. In this capstone you will work on understanding and building predictive text models like those used by SwiftKey.
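To make the idea of next-word prediction concrete, here is a minimal sketch of a trigram frequency model. It is neither SwiftKey's approach nor the model built in the capstone report: the function names and the four-sentence corpus are hypothetical, and the sketch omits smoothing and backoff to shorter contexts, which a practical model would need.

```python
from collections import Counter, defaultdict


def build_trigram_model(corpus):
    """Map each two-word context to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            model[(w1, w2)][w3] += 1
    return model


def predict_next(model, text, k=3):
    """Return up to k of the most frequent words seen after the last two words of text."""
    words = text.lower().split()
    if len(words) < 2:
        return []
    context = tuple(words[-2:])
    followers = model.get(context, Counter())
    return [word for word, _ in followers.most_common(k)]


# Hypothetical toy corpus, used for illustration only.
corpus = [
    "I went to the gym",
    "I went to the store",
    "she went to the gym",
    "we drove to the restaurant",
]
model = build_trigram_model(corpus)
suggestions = predict_next(model, "I went to the")
```

Here `suggestions` holds three candidates, with "gym" ranked first because it follows "to the" most often in the toy corpus.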
-
- Developing Data Products (certificate)
-
Assignment: Shiny Nobel App
Create a Shiny app with supporting documentation. Think of documentation as whatever a user will need to start using your application. Deploy the application on RStudio's Shiny server. Share your server.R and ui.R code on GitHub.
Then, use Slidify or RStudio Presenter to prepare a reproducible pitch about the application with an html5 slide deck. You get 5 slides (inclusive of the title slide) to pitch your app. The presentation must contain some embedded R code that gets run when slidifying the document.
-
- Practical Machine Learning (certificate)
-
Assignment: Weight Lifting
Devices like Jawbone Up, Nike FuelBand, and Fitbit collect a large amount of data about personal activity. They are a part of the quantified self movement – a group of enthusiasts who take measurements to improve their health, to find patterns in their behavior, or because they are tech geeks. People regularly quantify how much of a particular activity they do, but rarely how well they do it. In this project, you use data from accelerometers on the belt, forearm, arm, and dumbbell of 6 participants (Weight Lifting Exercises Dataset). They were asked to perform barbell lifts correctly and incorrectly in 5 different ways.
The goal is to predict the manner in which they did the exercise. This is the "classe" variable in the training set. You may predict with any of the other variables. Describe how you built your model, how you used cross validation, what you think the expected out of sample error is, and why you made the choices you did. Use your machine learning algorithm to predict 20 different test cases.
-
‘Python’ Specialization (University of Michigan) (certificate)
Thank you for visiting! :)