BIO/CSC 295 2009F Bioinformatics : Handouts

Final Questions

A collection of potential questions for the final, submitted by students in the class. At least two of these will be on the final exam. As one student noted, "Thinking of potential exam questions is helping me study."

What are the primary characteristics that distinguish a strong paper in Bioinformatics from a weaker paper? Give examples to support your argument.

Explain a bioinformatics problem that can be solved using microarrays. Describe a situation in which microarray data can be inaccurate.

Give us a hypothesis and some data, but make the data not fit the hypothesis or, worse yet, have no clear interpretation whatsoever. Or ask us to align two sequences that have no good alignment. This would be really mean, but it would also be a good way to prepare us for the lesson we have learned over and over again: this is a messy, messy subject, and some of our problems and beliefs just won't lead anywhere. Of course, some of us toyed with this in our projects, but I think it would be interesting. You could also ask us to agree or disagree with the given hypothesis (citing reasons) and to develop a new one if necessary.

You were able to isolate a bacterial species, R. rudolphi, from the North Pole.

a. During microscopic studies you notice that, in the presence of fog, R. rudolphi cells begin to turn red and aggregate. How would you investigate the cellular changes that happen in the presence of fog? What prior information is necessary? What limitations will your analysis face?

b. You identify the individual gene responsible for the reddening of R. rudolphi cells. What biological mechanisms could be controlling the expression rate of this gene?

c. Your colleague believes the gene was acquired through a horizontal gene transfer event. Given the complete genome, how would you test this hypothesis? You can describe programs you would use, programs you would write, or any other method you might apply.
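One line of evidence for part (c) is base composition: horizontally transferred genes often differ in GC content from the rest of the host genome. The sketch below is a minimal illustration, assuming plain DNA strings and non-overlapping genome windows the same length as the gene; the function names are hypothetical. Codon-usage bias and disagreement between the gene's tree and the species tree would be other lines of evidence.

```python
from statistics import mean, stdev


def gc_content(seq):
    """Fraction of G and C bases in a DNA string."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_zscore(gene, genome, window=None):
    """How many standard deviations the gene's GC content lies from the
    GC content of non-overlapping genome windows (default: gene length)."""
    window = window or len(gene)
    values = [gc_content(genome[i:i + window])
              for i in range(0, len(genome) - window + 1, window)]
    if len(values) < 2 or stdev(values) == 0:
        raise ValueError("need at least two genome windows with varying GC content")
    return (gc_content(gene) - mean(values)) / stdev(values)
```

A large positive or negative score only flags the gene as compositionally unusual; it does not prove transfer, since genes under strong selection can also have atypical composition.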

Given a PAM matrix, some short sequences, and a simple BLAST-like algorithm, explain and demonstrate the process by which the algorithm would unite the sequences into a single longer sequence. Why is this process important?
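Joining two fragments comes down to finding where the end of one overlaps the start of the other. The sketch below is not BLAST itself: it is a simpler, ungapped scan that scores every suffix of one fragment against the same-length prefix of the other and keeps the best. The `score` argument stands in for a PAM lookup; `toy_score` is a hypothetical placeholder (+5 for identity, -4 otherwise), and with a real PAM matrix stored as a dict keyed by residue pairs you could pass `lambda a, b: pam[(a, b)]` instead.

```python
def toy_score(a, b):
    """Hypothetical stand-in for a PAM matrix entry."""
    return 5 if a == b else -4


def best_overlap(left, right, score, min_len=3):
    """Score each suffix of `left` against the equal-length prefix of
    `right` (no gaps) and return (overlap_length, overlap_score)."""
    best = (0, float("-inf"))
    for k in range(min_len, min(len(left), len(right)) + 1):
        s = sum(score(a, b) for a, b in zip(left[-k:], right[:k]))
        if s > best[1]:
            best = (k, s)
    return best


def merge(left, right, score, min_len=3):
    """Join two fragments on their best-scoring overlap, or return None
    if no overlap of at least `min_len` residues scores above zero."""
    k, s = best_overlap(left, right, score, min_len)
    if k == 0 or s <= 0:
        return None
    return left + right[k:]
```

For example, `merge("MKVLAT", "LATGHE", toy_score)` finds the three-residue overlap `LAT` and returns `"MKVLATGHE"`.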

Given a sequence of amino acids and a portion of the Chou-Fasman table, identify any alpha-helices, beta-strands, or beta-turns which the Chou-Fasman algorithm would find.
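The helix part of this exercise can be sketched as nucleation followed by extension. This is a simplified version under stated assumptions: a residue counts as a helix former when its propensity exceeds 1.0, a window of six residues containing at least four formers nucleates a helix, and the helix grows in each direction until a tetrapeptide with average propensity below 1.0 is reached. The check that the segment's average P(alpha) also beats P(beta) is omitted, and the numbers in `P_ALPHA` are an illustrative, hypothetical partial table; substitute the table you are given.

```python
# Illustrative, hypothetical partial table of helix propensities P(alpha).
# Replace these numbers with the Chou-Fasman table you are given.
P_ALPHA = {"E": 1.51, "A": 1.42, "L": 1.21, "K": 1.16,
           "S": 0.77, "N": 0.67, "G": 0.57, "P": 0.57}


def _average(segment, table):
    """Mean propensity of a run of residues."""
    return sum(table[aa] for aa in segment) / len(segment)


def find_helices(seq, table, window=6, min_formers=4, threshold=1.0):
    """Return half-open (start, end) index pairs of predicted helices."""
    n = len(seq)
    helical = [False] * n
    for i in range(n - window + 1):
        # Nucleation: enough helix formers in this window?
        formers = sum(1 for aa in seq[i:i + window] if table[aa] > threshold)
        if formers < min_formers:
            continue
        start, end = i, i + window
        # Extend left while the tetrapeptide starting at start-1 stays helical.
        while start > 0 and _average(seq[start - 1:start + 3], table) >= threshold:
            start -= 1
        # Extend right while the tetrapeptide ending at index end stays helical.
        while end < n and _average(seq[end - 3:end + 1], table) >= threshold:
            end += 1
        for j in range(start, end):
            helical[j] = True
    # Collapse per-residue flags into contiguous segments.
    helices, j = [], 0
    while j < n:
        if helical[j]:
            k = j
            while k < n and helical[k]:
                k += 1
            helices.append((j, k))
            j = k
        else:
            j += 1
    return helices
```

Note that the tetrapeptide rule is lenient: with this table, `find_helices("GGGEALKAEGGG", P_ALPHA)` extends past the strong core `EALKAE` into some of the flanking glycines before stopping.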

How are ORFs used to study evolutionary relatedness? What computational tools are required for this process?
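Before ORFs can be compared across species (for example, by translating them and searching with BLAST), they have to be found. A minimal sketch, assuming only the forward strand, ATG as the sole start codon, and the standard stop codons; real gene finders also scan the reverse complement and handle alternative starts. The function name and the `min_codons` cutoff are hypothetical choices.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=30):
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    `end` is exclusive and includes the stop codon; ORFs shorter than
    `min_codons` codons (stop included) are discarded.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs
```

For instance, `find_orfs("CCATGCCCTAACC", min_codons=1)` reports one ORF in frame 2, spanning indices 2 to 11.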

As part of your project, you developed some Python code to aid in the analysis of biological data. Present and explain that code.

Examine one bioinformatics technique we have studied this semester and give specific suggestions for improvements that could refine the technique.

Explain why the different programs/algorithms we used to build phylogenetic trees produced different trees for the same set of species.

Describe the Chou-Fasman algorithm, specifically how it identifies alpha helices and beta sheets.

Describe three methods for gene prediction in eukaryotes.

Compare and contrast the initial T. rex paper and the Neanderthal paper with respect to findings and methods.

Protein structure prediction appears to be a difficult problem, since many multidisciplinary teams are attempting to tackle it. Name several biological (experimental) techniques that have arisen. Then name some computational approaches that have been used to address this problem. Analyze their strengths and weaknesses.

Kellis et al. discuss the comparative analysis of closely related genomes to find genes and their regulatory mechanisms. List two other gene-finding techniques that were in use before the publication. Briefly explain the authors' rationale and methodology for finding genes, with specific attention to ORFs, motifs, and gene conservation. Why did they choose different methods for finding genes than the ones they used for finding regulatory networks? List three results from the paper and explain their significance.

You have just finished sequencing the first complete genome of a newly discovered bacterial species isolated from agricultural soil.

a. What methods would you use to determine the phylogeny of the bacterium using its complete genome sequence? What are the advantages and disadvantages of these methods?

b. You now want to understand the content of the genome. How would you identify and classify genes? What measures would you take to ensure your interpretations are accurate?

c. Your genomic analysis reveals a novel gene coding for an unknown protein. How would you analyze the gene's sequence to determine the protein's properties? What are the strengths and weaknesses of your methods?

You are on a board that allocates grant money to bioinformatics research. After an exhausting series of allocation decisions, the board has enough money left to fund one of two proposed projects. The first research team promises to increase the accuracy of protein fold prediction algorithms. The other research team promises to complete a technique that would dramatically reduce the time and cost of genome sequencing, making it available to the general public. Which project do you choose to fund? Justify your choice, making sure to acknowledge the arguments of those who would make the opposite choice.

Explain your final project as you understand it. What was something that didn't go according to your plans or expectations during the course of your project, and how did you adapt to that problem?


This document was generated by Siteweaver on Thu Dec 10 12:55:44 2009.
The source to the document was last modified on Thu Dec 10 12:55:41 2009.


Samuel A. Rebelsky

Copyright © 2009 Vida Praitis and Samuel A. Rebelsky. This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License. To view a copy of this license, visit or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.