A real life example.

I recently gave a lab session to some genetics students on using dot matrix plots, drawn with Dotlet, to detect interesting sequence patterns. One group showed an interesting palindrome from Tetrahymena pigmentosa rDNA, about 250 bp long. I did a quick check and found out some interesting things about this organism (such as the fact that it is a protozoan, which I didn’t know). It appears to be a well-studied model, attracting the attention of people working at the J. Craig Venter Institute. Interestingly, one species actually comes from Malacca (T. malaccensis). In 1991, Yasuda and Yao elucidated a mechanism for the production of long palindromes in Tetrahymena that depends on the presence of short inverted repeats. Here’s a diagram showing the result of a dot plot for the Tetrahymena thermophila macronuclear extrachromosomal rRNA gene (GenBank ID: M11155). The sequence is 1935 bases long, and the palindromic sequence is located at position 1200. The complete palindrome is 42 bases long. Note the presence of a repeat upstream.
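The idea behind such a plot can be sketched in a few lines of Python. This is only a toy illustration and not how Dotlet works internally: it uses exact word matches, a deliberate simplification, and the function names, the window size and the 18-base sequence are all made up for the example. A palindrome in the DNA sense reads the same as its reverse complement, so it shows up as a diagonal run of dots when a sequence is plotted against its own reverse complement.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def dot_matrix(seq_x, seq_y, window=4):
    """Return the set of (i, j) where the words of length `window`
    starting at seq_x[i] and seq_y[j] are identical."""
    # Index every word of seq_y by its start positions.
    index = {}
    for j in range(len(seq_y) - window + 1):
        index.setdefault(seq_y[j:j + window], []).append(j)
    # Look up every word of seq_x in that index.
    return {
        (i, j)
        for i in range(len(seq_x) - window + 1)
        for j in index.get(seq_x[i:i + window], [])
    }


# Hypothetical toy sequence: the palindrome CCGAATTCGG with 4-base flanks.
toy = "AAAT" + "CCGAATTCGG" + "GTTC"
dots = dot_matrix(toy, reverse_complement(toy), window=4)

# Print the matrix: rows are positions in toy, columns in its reverse complement.
size = len(toy) - 4 + 1
for i in range(size):
    print("".join("*" if (i, j) in dots else "." for j in range(size)))
```

In the printed matrix the palindrome appears as an unbroken diagonal of stars covering rows 4 to 10, while the non-palindromic flanks leave no dots.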
It is rewarding to be stimulated by students in this way.
Reference: Yasuda, L.F. & Yao, M.-C. (1991). Short inverted repeats at a free end signal large palindromic DNA formation in Tetrahymena. Cell, 67:505-516.
Regardless of published statistics (which don’t adjust for unreported crimes) that reportedly show the streets are getting safer, events on the ground suggest otherwise. Neighbourhoods rush to put up their own community barricades, a fresh undergrad is killed in a snatch theft while going to work, a jogger is murdered in broad daylight, a high-ranking government official is shot and killed on the way to his office. Such things were rarely heard of when I was growing up, but they now pepper the front pages of newspapers almost daily.
Contrary to common belief, putting more men in blue on the streets will not solve the problem. We can picture crime as a pool of water accumulating in a basin. At any time, the amount of water (criminals) in the basin is determined by the rate at which water flows in from the tap (the source) and the rate at which it is removed via the drain hole (the sink). You will agree that enlarging the drain hole without a concomitant reduction in the rate of inflow cannot reduce the water level in the basin. The trick is to prevent people from entering a life of crime in the first place, since the odds of reforming a criminal are poor.
The metaphorical tap can be fixed if there are policies of wealth creation that are more equitable, rather than wealth being concentrated in the hands of old elites, as frequently happens in this part of the world. More importantly, don’t forget the teachers, who play a crucial role in keeping students from entering a life of delinquency. They, not the men in blue, are the real frontline fighters against crime, a fact that is sadly overlooked by policy makers. What goes on in classrooms today can hardly be said to be inspiring. Studying for the sake of examinations has become the norm, and some classrooms are completely devoid of quality instruction. When young minds are left idle for long periods of time, they become breeding grounds for all kinds of nasty social experiments, hurtful to themselves and to the people around them.
The teaching profession no longer attracts the cream of society as its apprentices, as it used to. This is a consequence of the capitalistic economy that rules the world today. As a nation develops, the expanding (and very lucrative) financial sector sucks up the bright young people and impoverishes the talent pool in the sphere of public services. The same is true of the quality of research scientists in universities. Of course, there will always be a small group of enthusiastic people who dedicate themselves to these fields, but their numbers cannot hold the ground, and a wholesale decline in the average quality of these sectors is inevitable. Ironically, it may be in countries that are just about to develop that one finds the best teachers.
The hottest news on campus now appears to surround the latest findings from UM’s Centre for Democracy and Elections (UMCEDEL), which have put some quarters ill at ease. You can read a summary of the findings here. Many news portals quote the UMCEDEL study as reporting that 46% of young voters favour DSAI, compared with 39% for DSNR. Hilariously, the study was immediately attacked for being “flawed”, “insufficiently sampled”, etc., never mind that the centre has previously published very favourable results for the aggrieved side.
These responses are reactionary and unwise. Fortunately, one can more or less decide whether to believe the findings by doing some research, with the help of some statistics. The study employed 20 enumerators, who randomly sampled 1407 respondents across the major ethnic groups (presumably reflecting the ethnic composition, although the UMCEDEL slides do not explicitly say so). The method used was face-to-face interviews with a structured questionnaire. This seems reasonable, so for now let’s suppose that the experimental design is OK.
Let us assume that the question gave three options (one for each of the two politicians, and one for “others”). The standard error (SE), computed under the most conservative assumption (equal support for both politicians), is the square root of 1/(4n), where n = 1407. The margin of error at 95% confidence is about two times the SE. A simple calculation shows this to be about 2.6% (let’s take 3%).
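For anyone who wants to check the arithmetic, here is a minimal Python sketch of the conservative margin of error. The function name is made up for illustration, and the multiplier 1.96 is assumed as the precise value behind "about two times" at 95% confidence.

```python
import math


def conservative_margin_of_error(n, z=1.96):
    """Margin of error for a proportion at its most variable (p = 0.5).

    SE = sqrt(p * (1 - p) / n) = sqrt(1 / (4 * n)) when p = 0.5.
    z = 1.96 is the usual 95% normal multiplier (an assumption here).
    """
    standard_error = math.sqrt(1.0 / (4 * n))
    return z * standard_error


moe = conservative_margin_of_error(1407)
print(f"margin of error: {moe:.1%}")
```

With n = 1407 this prints a margin of error of 2.6%, in line with the figure quoted above.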
However, a look at the UMCEDEL slides suggests that they phrased the question in a more complex way.
This slide shows tabulated results from the UMCEDEL study, from which we can infer that a respondent was presented with the questions:
1) “Is DSAI your favourite leader?” (Yes, No, Unsure)
2) “Is DSNR your favourite leader?” (Yes, No, Unsure)
whereupon 46% of them answered “Yes” to Question 1, and 39% of them answered “Yes” to Question 2.
The interpretation of this result is not straightforward. There are actually 9 possible combinations of answers here, and it is not clear how to make sense of some of them (e.g. “Yes”, “Yes”). If I were to set the question, I would just ask respondents who their favourite leader was, and provide three options, the third being “someone else”. One way to analyse the result would then be to perform a chi-squared test of the null hypothesis of equal support, i.e. that the ratio of support is 1:1 for DSAI:DSNR.
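As a rough sketch of what that chi-squared test would look like under the single-question design, here is a goodness-of-fit calculation in Python. The counts 647 and 549 are purely illustrative: they are simply 46% and 39% of 1407, rounded, and treating them as answers to one either/or question is an assumption that the actual survey does not support. The function name is also made up.

```python
import math


def chi_squared_equal_support(count_a, count_b):
    """Goodness-of-fit test of a 1:1 ratio between two categories.

    Returns (statistic, p_value). With two categories there is one
    degree of freedom, so the p-value is P(|Z| > sqrt(statistic)),
    which equals erfc(sqrt(statistic / 2)).
    """
    expected = (count_a + count_b) / 2.0
    statistic = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Illustrative counts only (see the assumptions above).
chi2, p = chi_squared_equal_support(647, 549)
print(f"chi-squared = {chi2:.2f}, p = {p:.4f}")
```

With these illustrative counts the statistic comes out at about 8.0 with a p-value below 0.01, so the null hypothesis of equal support would be rejected at the usual 5% level.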
Anyway, if we collapse the “No” and “Unsure” responses in both questions, then the 95% confidence interval for the proportion of young voters favouring DSAI as their leader is 46%, give or take 3% (corrected from 5%), which gives a low of 43% and a high of 49% (originally 41% and 51%). For the DSNR case, we have a low of 36% and a high of 42% (originally 34% and 44%). With the original figures the confidence intervals overlapped, which might have indicated that the difference in support for the two candidates is not statistically significant. However, it is not clear to me how this can be tested: a two-sample Z-test for binomial proportions seems tenuous, as both answers are obtained from the same person and we don’t have independence of samples.
P/S: After correcting for the calculation mistake, it seems that the 95% confidence intervals don’t overlap!
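The corrected intervals can be reproduced with a short Python sketch. It uses the simple normal-approximation (Wald) interval, with each proportion's own standard error rather than the conservative one, and 1.96 as the 95% multiplier; these choices and the function name are assumptions for illustration. It treats the two proportions separately, so it does nothing to resolve the lack of independence mentioned above.

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error


n = 1407
dsai = wald_interval(0.46, n)   # "Yes" to Question 1
dsnr = wald_interval(0.39, n)   # "Yes" to Question 2

# Two intervals overlap when each one starts before the other ends.
overlap = dsai[0] <= dsnr[1] and dsnr[0] <= dsai[1]

print(f"DSAI: {dsai[0]:.1%} to {dsai[1]:.1%}")
print(f"DSNR: {dsnr[0]:.1%} to {dsnr[1]:.1%}")
print("overlap" if overlap else "no overlap")
```

The intervals come out at roughly 43% to 49% and 36% to 42%, and they do not overlap.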
After going through this thinking, does the study really tell us much? It is unclear how the “random” sampling was done (hopefully, it was not haphazard sampling, which is frequently mistaken for random sampling!). Were the respondents sampled primarily in towns? In hamlets? At what time of day were the interviews conducted? Did the respondents trust the professionalism of the interviewers, or did they try to guess the interviewer’s political inclination and go along with it? What was the non-response rate?
Is there a need to get upset with the findings of the study?
while(TRUE){
    Learn new material;
    Teach;
    if(admin work burden > tolerable limit OR distractions == too many) break;
    else Update material;
}
Most of us remember Albert Einstein as a giant of physics, but a collection of his writings shows that he was also very much interested in education and cultural matters. Below is his talk to a group of children in the 1930s: no long-winded moralising sermon (no one remembers long speeches), just a short, powerful reminder that if we ever discover anything new, it is because generations of people before us laid the foundation for it to happen. The one-line closing argument is compelling!
My dear children: I rejoice to see you before me today, happy youth of a sunny and fortunate land. Bear in mind that the wonderful things you learn in your schools are the work of many generations, produced by enthusiastic effort and infinite labour in every country of the world. All this is put into your hands as your inheritance in order that you may receive it, honor it, add to it, and one day faithfully hand it on to your children. Thus do we mortals achieve immortality in the permanent things which we create in common. If you always keep that in mind you will find a meaning in life and work and acquire the right attitude toward other nations and ages. 
Comments from JL stirred memories of my own experience with self-learning. I remember trying to learn how to model biological phenomena by picking up John Maynard Smith’s Mathematical Ideas in Biology from the library many years ago. One problem that I had a really tough time understanding was the following one in probabilistic thinking:
Of three prisoners, Matthew, Mark and Luke, two are to be executed, but Matthew does not know which. He therefore asks the jailer ‘Since either Mark or Luke are certainly going to be executed, you will give me no information about my own chances if you give me the name of one man, either Mark or Luke, who is going to be executed’. Accepting this argument, the jailer truthfully replied ‘Mark will be executed’. Thereupon, Matthew felt happier, because before the jailer replied his own chances of execution were 2/3, but afterwards there were only two people, himself and Luke, who could be the one not to be executed, and so his chance of execution is only 1/2. Is Matthew right to feel happier?
This problem is also known as the “Serbelloni problem” and according to John Maynard Smith, it “nearly wrecked a conference in theoretical biology in 1966”. It seems that there is nothing wrong with Matthew’s intuition – one could reason that given the information that Mark would be executed, only two possibilities remain: (Mark, Matthew) or (Mark, Luke) would be executed. Since Matthew is in one of the two possible outcomes, his chance of dying is surely 1/2.
After checking the hint at the back of the book, I was puzzled to find that this was not the case. Maynard Smith merely said that an application of Bayes’ Theorem or “common sense” should solve the problem. I had some idea of Bayes’ Theorem at that time, so following the hint, one arrives at the solution of 2/3, which is the same as the probability of dying prior to receiving any information from the guard. To be precise, let I be the information given by the guard (that Mark will be executed) and H be the hypothesis that Matthew will die. What is required is the conditional probability P(H|I). According to Bayes’ Theorem, this can be related to P(I|H) as follows:
P(H|I) = P(I|H)P(H) / P(I)
Let’s consider P(I|H). If Matthew is known to die, then the outcome must be either (Matthew, Mark) or (Matthew, Luke), each equally likely. Since the guard cannot say whether Matthew will die, he names Mark in the first case and Luke in the second, so P(I|H) = 1/2. P(H) is just 2/3, since it is equal to P(Matthew, Mark) + P(Matthew, Luke), each of which occurs with probability 1/3.
How about P(I)? We could express P(I) as P(I,H) + P(I,H’), where H’ is the complement of H, and further rewrite it as P(I|H)P(H) + P(I|H’)P(H’). The first term is the same as the numerator. For the second term, P(H’) = 1/3 and P(I|H’) = 1/2, since if we know that Matthew does not die, then the only outcome is (Mark, Luke), and the guard reveals that Mark dies with probability 1/2. Putting everything together, we get
P(H|I) = (1/2)(2/3) / {(1/2)(2/3) + (1/2)(1/3)} = 2/3.
Bayes’ Theorem works but may lead to mechanical application without stimulating more intuition about conditional thinking. With more experience, P(H|I) could be computed as follows. Given information from the guard, only two outcomes are possible: (Mark, Matthew), (Mark, Luke). The total probability shrinks from 1 to 2/3. Now between these two outcomes, the guard can only reveal Mark or Luke. Consider the case of the guard revealing Mark. Since there are two Marks to one Luke, the guard reveals Mark with probability 2/3, after which only Matthew or Luke could die, each with probability 1/2. Now consider the other case, where the guard reveals Luke. This occurs with probability 1/3, after which only Mark or Matthew could die. But there are two Marks to one Matthew, so the probability of Matthew dying is 1/3. Putting these together we have {(2/3)(1/2) + (1/3)(1/3)} / (2/3) = 2/3.
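The Bayes calculation can also be checked by brute-force enumeration. The Python sketch below lists every combination of who is spared and whom the jailer names, and computes the conditional probability exactly with fractions. The function name is made up, and it assumes, as in the calculation above, that each prisoner is equally likely to be spared and that the jailer picks between Mark and Luke with probability 1/2 when both are to be executed.

```python
from fractions import Fraction

PRISONERS = ("Matthew", "Mark", "Luke")


def posterior_matthew_dies(named="Mark"):
    """Exact P(Matthew is executed | the jailer names `named`)."""
    prob_h_and_i = Fraction(0)  # P(H, I)
    prob_i = Fraction(0)        # P(I)
    for spared in PRISONERS:
        prior = Fraction(1, 3)  # each prisoner equally likely to be spared
        executed = [p for p in PRISONERS if p != spared]
        # The jailer names an executed prisoner, but never Matthew.
        candidates = [p for p in executed if p != "Matthew"]
        for said in candidates:
            # Assumed: a uniform choice when both Mark and Luke will die.
            prob = prior * Fraction(1, len(candidates))
            if said == named:
                prob_i += prob
                if "Matthew" in executed:
                    prob_h_and_i += prob
    return prob_h_and_i / prob_i


print(posterior_matthew_dies("Mark"))
```

This prints 2/3, the same as Matthew’s chance before he asked, so the jailer’s answer gives him no reason to feel happier.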
Things may be clearer with the help of the figure below. Maybe this is the “common sense” that Maynard Smith talked about.
The following is from an interview that Gaston Gonnet (founder of the MAPLE computer algebra system) gave to Thomas Haigh in 2005 (SIAM History of Numerical Analysis and Scientific Computing Project; see here for the complete interview). I have fond memories of MAPLE as a tremendously helpful tool for checking operations involving special functions, something that I dabbled in quite some time ago.
Now that I have worked several years in bioinformatics, the work in bioinformatics can be summarized as: you have to be good at algorithms, and you have to be very good at probability and statistics. You are not working with completely deterministic objects. You are not working with mathematical formulas that go only one, you are not working with problems which have a unique and precise answer. You are working with nature that has gone into a process of evolution in a relatively random way. This randomness percolates everything that you do because this randomness is not only in nature, but in all the data that you acquire. You acquire data, and the data is not exact. It’s subject to error because of the nature of the data or the nature of the acquisition of the data.
What I tell all my students and my grad students when they come is to make sure that their background in algorithms and their background in probability and statistics are really strong. If they have a good background in algorithms and statistics, quite a bit of scientific computation helps. It helps if someone knows how to integrate a system of differential equations or finding a minimum in an efficient way. Those kinds of basic scientific computation abilities are also very helpful. But if you are good at those two and possibly that third one, you are going to be good a bioinformatician. There is no two ways about it. But you have to understand algorithms and statistics, and that’s maybe the crucial point. 
Brian Martin has this to say about the sociology of academia in Australia after funding to universities was drastically reduced back in the 70s.
…Some positions become vacant through retirements and resignations. Many of these are not filled. But some positions are filled, and even some new ones created. The competition for these positions is now incredibly intense. Indeed, a sizable fraction of tenured academics would be very lucky to obtain their own positions should they be openly advertised. There are some tutors for example, struggling one year at a time to keep their positions, whose teaching load and research productivity shames tenured academics on twice the salary. Universities have never been meritocracies, but the squeeze has made the resemblance even more remote…
Social science research didn’t mean anything to me until I discovered Martin’s writings. 🙂 It’s also good to know that he came from a science background (physics)!