A gentle introduction to string-count distributions in random texts

Date: Thursday 25 February 2021
Time: 10am
Speaker: Dr Ben O'Neill (ANU)

Abstract: An interesting problem that arises in genetic analysis and other contexts is determining the exact or approximate probability distribution of the "string-count", the number of occurrences of a fixed character string in a random "text" (i.e., a random vector of symbols). The simplest case of interest is when the symbols in the text are IID categorical random variables. The prima facie simplicity of this problem sometimes attracts novice analysts, buoyed by the fact that it is easy to compute the probability of a match when the text has the same length as the string. (And of course, if the text is shorter than the string, the problem is trivial!) However, when the text is longer than the string, the problem becomes complicated, owing to ...
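The abstract does not say how the distribution is computed, so what follows is only a minimal sketch of the problem it poses, under stated assumptions: symbols are IID with known probabilities, and overlapping occurrences are counted. The easy equal-length case is just a product of symbol probabilities. For longer texts, the sketch uses a standard technique that is not necessarily the speaker's method: dynamic programming over the states of a KMP-style pattern-matching automaton, which handles the overlaps that make the problem hard. The function names `match_probability`, `build_automaton` and `string_count_distribution` are hypothetical.

```python
from fractions import Fraction


def match_probability(string, probs):
    """P(an IID text of length len(string) equals `string`): a product of symbol probabilities."""
    p = 1
    for ch in string:
        p *= probs[ch]
    return p


def build_automaton(string, alphabet):
    """KMP-style automaton: delta[state][symbol] is the length of the longest
    prefix of `string` that is a suffix of the text read so far."""
    m = len(string)
    fail = [0] * (m + 1)  # fail[j]: length of the longest proper border of string[:j]
    k = 0
    for i in range(1, m):
        while k > 0 and string[i] != string[k]:
            k = fail[k]
        if string[i] == string[k]:
            k += 1
        fail[i + 1] = k
    delta = []
    for state in range(m + 1):
        row = {}
        for a in alphabet:
            if state < m and string[state] == a:
                row[a] = state + 1              # extend the current partial match
            elif state == 0:
                row[a] = 0                      # no partial match to fall back on
            else:
                row[a] = delta[fail[state]][a]  # fall back to the longest border
        delta.append(row)
    return delta


def string_count_distribution(string, n, probs):
    """Exact pmf of the number of (possibly overlapping) occurrences of
    `string` in an IID text of length n; pmf[k] = P(count == k).
    Assumes a non-empty string and probabilities summing to 1."""
    m = len(string)
    delta = build_automaton(string, list(probs))
    dist = {(0, 0): 1}  # (automaton state, count so far) -> probability
    for _ in range(n):
        new = {}
        for (state, count), p in dist.items():
            for a, pa in probs.items():
                nxt = delta[state][a]
                key = (nxt, count + (nxt == m))  # reaching state m completes a match
                new[key] = new.get(key, 0) + p * pa
        dist = new
    pmf = [0] * (max(n - m + 1, 0) + 1)  # at most n - m + 1 occurrences fit in the text
    for (_, count), p in dist.items():
        pmf[count] += p
    return pmf


# Usage: a fair binary alphabet and the self-overlapping string "aa".
probs = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
print(match_probability("aa", probs))  # → 1/4
print(string_count_distribution("aa", 3, probs))
# → [Fraction(5, 8), Fraction(1, 4), Fraction(1, 8)]
```

In the usage example the text "aaa" contains two overlapping copies of "aa", which is why the count is not a simple binomial variable. Exact `Fraction` arithmetic is used only to make the small example checkable by hand; floats work as well. When the text is shorter than the string, the function returns `[1]`, matching the trivial case noted in the abstract.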