At the very core, randomness boils down to things even more basic than a probability distribution. You need to begin with set theory, because only certain kinds of sets can be interpreted as events with probabilities. In the simple case you can consider a finite number of discrete events, but from your use of the term "distribution" I'm sure you're aware that you can also consider probabilities for things which are countably infinite in number, or even uncountable. You'd be amazed at some of the ways you can mathematically describe observations, and some of those ways cannot be assigned probability distributions (or at least not ones which make any sense). A lot of these things are rather neatly resolved when you represent your observations with the real number system, but I don't think I need to explain to you why eliminating some of the messier aspects of real numbers (like pi) by using other structures might be appealing. In fact, in order to have a probability space, your observable events need to form what is known in mathematical circles as a "sigma field" (for a full technical discussion I refer you to the book "Probability and Measure" by Patrick Billingsley). However, the most basic heuristic of probability theory is the presupposition that you can represent the uncertainty of every observation on a scale between 0 and 1. This fails from the get-go: for some kinds of observations, you can't.
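To make the "sigma field" requirement a little more concrete, here is a rough sketch for the finite case only, where closure under pairwise unions is enough. The function name `is_sigma_field` and the toy sets are my own illustration, not anything taken from Billingsley.

```python
from itertools import combinations

def is_sigma_field(omega, collection):
    """Check the sigma-field axioms for a collection of subsets of a FINITE set omega."""
    omega = frozenset(omega)
    sets = {frozenset(s) for s in collection}
    # Every member must be a subset of omega, and omega itself must be an event.
    if any(not s <= omega for s in sets) or omega not in sets:
        return False
    # Closed under complement.
    if any(omega - s not in sets for s in sets):
        return False
    # Closed under union (pairwise is sufficient when the collection is finite).
    return all(a | b in sets for a, b in combinations(sets, 2))

omega = {1, 2, 3, 4}
good = [set(), {1, 2}, {3, 4}, omega]   # complements and unions all present
bad = [set(), {1}, {2}, omega]          # complement of {1} is missing

print(is_sigma_field(omega, good))  # True
print(is_sigma_field(omega, bad))   # False
```

Only collections like `good` can carry a consistent assignment of probabilities to every event; the infinite case needs closure under countable unions, which no finite check like this can capture.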

Returning to the usual conventions of scientists and engineers by representing observations with real numbers, the situation can be resolved by making sure that any distribution you work with follows a number of simple rules. No doubt you are familiar with the fact that not just any function can be a probability distribution; it needs to satisfy certain properties. However, even at this level we need to start wrestling with more heuristics. For example, one rule says that the total probability over all possible outcomes must equal one. This satisfies the intuitive notion that we know something must happen (e.g., when I throw a ball it will hit the ground at some point). The heuristic is that since the probability scale is supposed to correspond to sureness, these kinds of events should correspond to a probability of 1. However, there is a problem. Within this framework an event can have probability 1 yet still not be guaranteed (see the statistical term "almost surely"). For instance, a number drawn uniformly from the interval [0, 1] differs from 0.5 with probability 1, even though 0.5 is a perfectly possible outcome. This obviously defies reason. Another reasonable heuristic would be that those and only those events which are guaranteed can have probability 1, but this is not the heuristic used in statistical theory, and we are again forced to adopt an obviously false premise for the sake of a more extensive mathematical theory.

Now is when things really start to get hairy. Thus far we have used a number of heuristics to define observable events, each of which has some measure of sureness on a scale of 0 to 1 (however imperfect) known as probability. But what is the true concrete empirical meaning of probability so defined? How do we interpret probability? Statisticians are generally divided into two camps: Bayesians and Frequentists. Frequentists say that if you perform a random observation over and over and divide the number of affirmative results by the total number of observations performed, this quotient will get closer and closer to a number known as the probability, and that every probability corresponds to such a series of observations (at least in theory). Bayesians say, in a nutshell, that probability is any notion of certainty that you'd like it to be and that the relative frequency notion of Frequentists is not necessary. Ironically enough, no matter which road you choose to take, many results are the same or nearly the same, but from a fundamental standpoint of basic reason they both have their flaws. The Bayesian approach is essentially untestable and unfalsifiable because there is no way to measure it. The frequentist approach, on the other hand, while measurable, is not guaranteed to exist for a given random process (i.e., who says that the quotient will get closer to anything? Perhaps it will wander around several numbers). So we have a few more arbitrary heuristics. Do we arbitrarily assume that every sequence of random observations as described above must converge to a single number, or do we adopt an interpretation which is loose, unmeasurable, unscientific, and unprovable? Just like the biases described by Yudkowsky, neither is 100% compatible with reason as we know it... but it gets worse.
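The frequentist quotient is easy to sketch in code. The snippet below is only an illustration under a big assumption: it bakes in a fixed success probability `p` (a value I made up), which is exactly the premise being questioned above, so it shows what convergence looks like when it does happen rather than proving that it must.

```python
import random

def running_relative_frequency(p, n_trials, seed=0):
    """Return the fraction of affirmative results after each of n_trials draws."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    successes = 0
    freqs = []
    for i in range(1, n_trials + 1):
        if rng.random() < p:   # one simulated observation with success chance p
            successes += 1
        freqs.append(successes / i)
    return freqs

freqs = running_relative_frequency(p=0.3, n_trials=100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(freqs[n - 1], 4))
```

With a process like this the printed quotients should drift toward 0.3 as `n` grows. A real-world process comes with no such guarantee, since nothing forces an underlying fixed `p` to exist.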

Now you actually want to start using this statistical theory for practical applications. You have your arsenal of moments, scatterplots, histograms, etc. But did you know that some probability distributions don't even have certain moments? For example, the heavy-tailed Cauchy distribution has no moments. They're all nonconvergent/undefined (mean, variance, etc.). In fact, if I have my history correct, Cauchy specifically thought up this distribution just to be a pain in the ass to the rest of the statistical community. Although many of the problems arising from these kinds of distributions can be resolved, I assure you that reality is a much bigger and more effective pain in the ass. Despite what you may have heard from your statistics teacher, the central limit theorem does not guarantee that the only distribution you'll ever need is the Normal distribution. Like statistical theory's version of the "Black Swan bias", real-life distributions often don't work with the kinds of simple statistics we'd like to compute. They behave in surprising ways.
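You can watch the Cauchy distribution misbehave with a few lines of simulation. This is a minimal sketch using only the standard library; the helper names `cauchy_sample` and `running_means`, the seed, and the sample size are my own choices for illustration. It compares the running sample mean of standard Normal draws against standard Cauchy draws.

```python
import math
import random

def cauchy_sample(rng):
    # Inverse-CDF sampling for the standard Cauchy: tan(pi * (U - 1/2)), U ~ Uniform[0, 1).
    return math.tan(math.pi * (rng.random() - 0.5))

def running_means(samples):
    """Return the sample mean after each successive observation."""
    total = 0.0
    means = []
    for i, x in enumerate(samples, start=1):
        total += x
        means.append(total / i)
    return means

rng = random.Random(42)
n = 100_000
normal_draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
cauchy_draws = [cauchy_sample(rng) for _ in range(n)]
normal_means = running_means(normal_draws)
cauchy_means = running_means(cauchy_draws)

# Columns: sample size, running mean of Normal draws, running mean of Cauchy draws.
for k in (100, 1_000, 10_000, 100_000):
    print(k, round(normal_means[k - 1], 4), round(cauchy_means[k - 1], 4))
```

The Normal column should settle near 0, while the Cauchy column tends to get knocked around by the occasional enormous draw no matter how many samples you take, which is what "the mean is undefined" looks like in practice.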

In a similar vein, did you ever notice that outside of the gambling parlor very few random phenomena can be isolated into a few controlled variables? Even traditional parlor games can be affected by the problems above, but the truth is that in most real-world applications there are simply too many important variables to write down the equations. It's easy to see when talking about cyclists: age, experience, self-selection, etc. all make it extremely difficult to attribute a cause to an effect, and details are often sacrificed for the ability to apply traditional, well-established, and oversimplified equations. The general heuristic used by statisticians is to guess a probability model, take some measurements, and draw conclusions under the assumption that the model is correct. However, the complexity of the real world guarantees that the model is often incorrect, and there exist theories built on different heuristics which, even if their heuristics are not 100% correct, can be shown to give better results at these levels of complexity (see computational learning theory or, in particular, the book "Statistical Learning Theory" by Vladimir Vapnik, which explains many of these concepts in a statistical context). Remember that, given the right circumstances, heuristics like "availability", "hindsight", "conjunction", "confirmation", "anchoring", etc. can each be good predictors of the correct answer, just as we all know the heuristics of statistical theory can. However, as it is in much of science, the key is not necessarily in choosing the heuristic which is the most justifiable, but the one which just so happens to work (or at least that's how I earn my lunch money).