Now that the hurricane issues are slowly dissipating, I made it back to Brooklyn today, back to the place I spend most of my time… my school… I suppose you could call it home.
I’m doing some work here before turning to my apartment, and I ran into a science teacher who asked me a question:
Let’s say you have a sequence of 3 billion nucleotides. What is the probability that some run of 20 consecutive nucleotides appears more than once in the sequence? You may assume that there are 4 nucleotides (A, C, T, G) and that, when the 3-billion-nucleotide sequence is generated, each of the four is equally likely to appear at every position.
I liked the question, but I haaaave to work on my own work and not this problem at this moment. So I thought I’d throw it to you.
A. What’s the answer to this question?
B. How would you explain it to this biology teacher (who knows basic math stuffs)?
and for the bonus…
C. How would you design a lesson that would make a student understand the process and your answer? You can assume that the student understands combinations and permutations.
If I get some work done today, I may think through this problem as a treat. If none of you beat me to the punch. But I’d rather you beat me to the punch.
PS. I might as well throw in an additional question: “How long does the sequence have to be before you are guaranteed that some run of 20 nucleotides repeats?”
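For anyone who wants a quick sanity check before working it out properly, here is a rough sketch of one way to estimate both answers. It is only a back-of-the-envelope approach, not a rigorous solution, and it leans on a few assumptions of mine: a "repeat" means the same 20-letter window showing up at two different starting positions (overlaps allowed), pairs of windows are treated as if they matched independently with probability 1/4^20 (a birthday-problem style heuristic), and the function names are made up for illustration. The "guaranteed" part uses the pigeonhole principle: once there are more windows than possible 20-letter strings, two windows must match.

```python
import math

ALPHABET = 4          # A, C, T, G
K = 20                # length of the run we care about
N = 3_000_000_000     # length of the whole sequence


def expected_repeat_pairs(n, k=K, alphabet=ALPHABET):
    """Heuristic expected number of matching pairs of k-letter windows.

    Assumption: each pair of windows matches with probability
    1 / alphabet**k, and pairs are treated as independent.
    """
    windows = n - k + 1                      # starting positions for a window
    pairs = windows * (windows - 1) // 2     # ways to pick two windows
    return pairs / alphabet**k


def prob_some_repeat(n, k=K, alphabet=ALPHABET):
    """Poisson-style estimate of P(at least one repeated window)."""
    # 1 - exp(-expected), written with expm1 for accuracy near zero
    return -math.expm1(-expected_repeat_pairs(n, k, alphabet))


def guaranteed_length(k=K, alphabet=ALPHABET):
    """Shortest length that forces a repeated window (pigeonhole).

    A sequence of length L has L - k + 1 windows, and a repeat is forced
    once that count exceeds alphabet**k, so L = alphabet**k + k.
    """
    return alphabet**k + k


if __name__ == "__main__":
    print("possible 20-letter strings:", ALPHABET**K)
    print("expected matching pairs:   ", expected_repeat_pairs(N))
    print("estimated P(some repeat):  ", prob_some_repeat(N))
    print("length forcing a repeat:   ", guaranteed_length())
```

Under these assumptions the expected number of matching pairs comes out in the millions, so the estimated probability is indistinguishable from 1, while a guaranteed repeat needs a sequence of a little over a trillion nucleotides, far longer than 3 billion.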
UPDATE: My friend Jason Lang sent me his solution, which is amazingly written and cogent.