### Introduction

Over in the Udacity course I’m working my way through (*AWS Machine Learning Scholarship Program*), I came across my first batch of probability distribution problems. I thought that working through their solutions would be a good jumping off point for the blog. I found the first problem below very difficult to answer. Any of my readers who are seasoned Data Analysts or statisticians will have a smug chuckle when they read it, and I don’t blame them, because the answer is pretty obvious. I will, however, ask you to bear in mind that it is about 20 years since I worked consistently with any type of statistics. I’ve dipped in now and again since then, but it really is a use it or lose it skill. I haven’t been using it, so…

The first batch of problems I came across were in a quiz and were in a multiple choice answer format. I’m going to list out the questions and possible answers first and then walk through the terminology involved and how to solve the problems. You smug chucklers see if you can spot why the answer to question 1 should have been instantaneous.

### TL;DR

This post was written to walk through the solutions to the Udacity course, *AWS Machine Learning Scholarship Program*, lesson 4-9 “A Gaussian Class”. If you came here for the walk-through, skip to the next section. If you came here just for the answers, they are:

Question 1. In a normal distribution, the probability that a random selection from a data set is exactly equal to a single number is 0. Therefore, the probability that a man weighs exactly 185 pounds is 0.

Question 2. The probability that a random selection will be between 120 pounds and 155 pounds is 0.19.

Question 3. The probability that exactly 7 of the 60 people selected are allergic to cats is 0.12.

### The Questions

**Q1**: Assume the average weight of an American adult male is 180 pounds with a standard deviation of 34 pounds. The distribution of weights follows a normal distribution. What is the probability that a man weighs exactly 185 pounds?
**Answers**:

- 0.56
- 0
- 0.44
- 0.059

**Q2**: Like in the previous question, assume the average weight of an American adult male is 180 pounds with a standard deviation of 34 pounds. The distribution of weights follows a normal distribution. What is the probability that a man weighs somewhere between 120 and 155 pounds?
**Answers**:

- 0
- 0.23
- 0.27
- 0.19

**Q3**: Now consider a Binomial distribution. Assume that 15% of the population is allergic to cats. If you randomly select 60 people for a medical trial, what is the probability that 7 of those people are allergic to cats?
**Answers**:

- 0.01
- 0.14
- 0
- 0.05
- 0.12

### Terms (the boring bit, feel free to skip)

First of all, the important terms. In question one we’ll pick out the terms *average*, *standard deviation*, *normal distribution* and *probability*. For question two there are no additional terms to choose, and for question three we’ll pick out *binomial distribution*.

You will find all of these terms and more defined over in the glossary. But I’ll cover the ones above in this post to keep things in the one place.

The **average** is one of the first statistical concepts anyone comes across. The average value for a group of numbers is the sum of the numbers in the group divided by how many numbers are in the group. It is a statistical way to identify the centre value of a data set. It is also regularly called the mean value. The mean value is usually represented by the Greek letter mu: \mu. Where \mu is seen below it stands for the mean/average.

**Standard Deviation** measures the spread of the numbers in a data set from the mean value of the data set. The standard deviation is usually represented by the Greek letter: \sigma. The same is true for this blog.
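To make those two definitions concrete, here is a minimal sketch using Python's standard `statistics` module. The list of weights is made up purely for illustration; it is not data from the course.

```python
import statistics

# Hypothetical sample of weights in pounds (illustrative only, not course data).
sample_weights = [150, 165, 180, 195, 210]

# Mean: the sum of the values divided by how many values there are.
mu = statistics.mean(sample_weights)

# Population standard deviation: the typical spread of values around the mean.
sigma = statistics.pstdev(sample_weights)

print(mu, round(sigma, 2))
```

I used `pstdev` (population standard deviation) here on the assumption that the list is the whole data set; `statistics.stdev` would be the choice for a sample drawn from a larger population.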

A **normal distribution** is a probability distribution which has a set of characteristics. The most obvious is that when graphed with the probability density function it will take the shape of a symmetrical bell shaped curve. The mean, median and mode will all be equivalent in value and will be represented by the highest point on the curve. Some other characteristics are that approximately 68% of the values lie within 1 standard deviation from the mean, approximately 95% of the values lie within 2 standard deviations from the mean, and approximately 99.7% of the values lie within 3 standard deviations from the mean. See diagram below.
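The 68 / 95 / 99.7 percentages can be checked numerically. This is a small sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `within` is my own.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

def within(k):
    """Probability that a value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))
```

Because the rule is stated in units of standard deviations, the same percentages hold for any mean and standard deviation, including the 180 and 34 used in the questions.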

**Probability** is the likelihood of an event happening. It assumes a value between 0 and 1, both inclusive, where 0 represents impossibility and 1 represents certainty.

A **binomial distribution** is the probability of the success or failure outcome in an experiment repeated multiple times (source). In order for a distribution to be binomial it must satisfy 4 criteria: 1. each experiment can have only a success or failure outcome, 2. each experiment must be independent, 3. there are a fixed number of experiments, and 4. the probability of success in each experiment must be the same. When a binomial distribution is graphed using the probability mass function it can look similar to the bell shaped curve of the normal distribution, particularly when the number of experiments is large, but it is only exactly symmetrical when the probability of success is 0.5.
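One way to get a feel for the shape of a binomial distribution is to compute its probability mass function directly. The sketch below uses `math.comb` from the standard library; the function name `binomial_pmf` and the choice of 10 experiments are my own, picked only to keep the lists short.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent experiments."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n = 10  # small number of experiments, chosen only for illustration

# Probability of success 0.5: the values mirror each other around the centre.
fair = [binomial_pmf(r, n, 0.5) for r in range(n + 1)]

# Probability of success 0.15: the values lean towards the low end.
skewed = [binomial_pmf(r, n, 0.15) for r in range(n + 1)]

print([round(v, 3) for v in fair])
print([round(v, 3) for v in skewed])
```

Both lists sum to 1, as any probability distribution should.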

### Solutions

I’ve stated the problems, and defined (maybe not perfectly) some of the terms involved. Now let’s get to solutions so I can move on with the Udacity course that inspired this post.

The first two relate to solving problems relating to normal distributions. These problems can be graphed by shading in an area on the probability density function graph relating to the question that you are trying to answer. For example, take the normal distribution graph above and suppose we wanted the probability that the value we are looking for is less than or equal to -1\sigma (so, 1 standard deviation less than the mean). In order to graph that you would shade in the left hand side of the curve from where it starts up to the value of -1\sigma. I state that because another way to think about the solution to these problems is to see it as the area under the curve that satisfies the question being asked. I state this as a kind of foreshadowing.
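The "area under the curve" idea can be tried out numerically. This is a rough sketch, not how a Z-table is actually produced: it adds up thin trapezoids under the standard normal density from far out on the left up to -1\sigma, and compares the result with the cumulative probability that `statistics.NormalDist` reports. The helper name `area_under_pdf` and the -8 starting point are my own assumptions.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def area_under_pdf(lo, hi, steps=10_000):
    """Trapezoid-rule estimate of the area under the density between lo and hi."""
    width = (hi - lo) / steps
    total = 0.5 * (std_normal.pdf(lo) + std_normal.pdf(hi))
    for i in range(1, steps):
        total += std_normal.pdf(lo + i * width)
    return total * width

# -8 standard deviations stands in for "where the curve starts";
# the density out there is so small it adds almost nothing.
left_tail = area_under_pdf(-8, -1)

print(round(left_tail, 4), round(std_normal.cdf(-1), 4))
```

The two printed numbers should agree to four decimal places, which is the shaded area described above.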

In question one we’re told that we’re dealing with a normal distribution. We’re given the average weight (mean, median, mode don’t forget). We’re told the standard deviation. We’re told that question two is based on the same data set and criteria.

We now need to figure out how to solve normal distribution problems, let’s do that.

We are going to approach these problems using Z-scores and Z-tables. A Z-score is very simply calculated by plugging in the given details supplied in the problem. This Z-score is then used in conjunction with a Z-table to arrive at a probability score. The following illustrative graphs and tables were mostly found on ztable.net.

First the z formula. Very simply it is: z=\frac{x-\mu}{\sigma}, where x is the value being assessed, \mu is the mean and \sigma is the standard deviation in the data. This will get you the Z-score for any value in a normal distribution. But what do we do with that?
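The formula translates directly into a one-line function. This is only a sketch; the name `z_score` is my own, and the values plugged in are the mean of 180 and standard deviation of 34 from the questions.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x sits above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Weights from the questions, with mean 180 and standard deviation 34.
for weight in (185, 155, 120):
    print(weight, round(z_score(weight, 180, 34), 2))
```

Rounding to two decimal places matches the precision of a typical Z-table, which is indexed by the first decimal (row) and second decimal (column) of z.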

First of all we need to think about what types of questions we would want to answer. There are a good few, so we’ll specify only a few. You might want to figure out the probability that something is less than a certain value. You might want to figure out the probability that something is greater than a certain value. And then you might want to figure out the probability that something is exactly a certain value, or that it is between two values, like in questions one and two above.

To graph the probability that a value is less than a value that falls to the left of the median you will end up with a graph like this. The calculated z value will be a negative value in this case.

To graph the probability that a value is less than a value that falls to the right of the median you will end up with a graph like this. The calculated z value will be a positive value in this case.

As a final graph to tie in with question 2 we will have a graph like this which shades in a range of values where both the lower and upper bound values are both negative z values.

But wait, where’s our graph displaying the probability of an exact value in line with question one? Let me walk through the rest of the Z-stuff and we’ll get to that. Suffice to say this is more foreshadowing.

The next aids that we need to continue our solution are Z-tables. The Z-tables below are used directly to solve the probability of something being less than a certain value. They can be used to answer the type of problems graphed at Figures 2 and 3 above. You get a problem, you calculate the Z-score, you look that up on one of the below Z-tables, and the value at the intersection of the appropriate row and column is your probability.
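A Z-table is just the cumulative probability of the standard normal distribution, tabulated for z values in steps of 0.01. As a sketch of where the numbers come from, the snippet below rebuilds a single row with `statistics.NormalDist`; the helper name `z_table_row` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def z_table_row(row_start):
    """Ten entries of one Z-table row, e.g. 0.1 covers z = 0.10 up to z = 0.19."""
    return [std_normal.cdf(round(row_start + col / 100, 2)) for col in range(10)]

row = z_table_row(0.1)

# Column headings 0.00 to 0.09, then the probabilities underneath.
print(" ".join(f"{col / 100:.2f}   " for col in range(10)))
print(" ".join(f"{p:.5f}" for p in row))
```

Reading across to the 0.05 column of this row gives the table entry for z = 0.15.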

Let’s do an example problem before we tackle the problems above. We will take question one, but instead of exactly 185, we will go for less than 185. This can be expressed as P(Z<a) where a = 185. As a reminder, the mean is 180 and the standard deviation is 34. That will give us a problem similar to Figure 3 above, although the z value would be much closer to the mean in this sample problem. To calculate z we had the formula: z=\frac{x-\mu}{\sigma}.

z = \frac{185 - 180}{34} = \frac{5}{34} \approx 0.15

As this is a positive Z value we look it up in the table at Figure 6. The second row deals with values 0.1x. Then we go to the column 0.05 and read the intersected value, which is .55962. That means the probability that a randomly selected weight from the set is less than 185 pounds is approximately 55.962%.
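The table lookup can be cross-checked in code. This sketch assumes Python 3.8 or later for `statistics.NormalDist`; the variable names are my own. It computes the probability twice: once from the Z-score rounded to 0.15, as the table forces you to do, and once straight from the mean and standard deviation with no rounding.

```python
from statistics import NormalDist

# Table-style lookup: standard normal, with z rounded to two decimals.
p_table = NormalDist().cdf(0.15)

# Direct calculation: weights with mean 180 and standard deviation 34.
weights = NormalDist(mu=180, sigma=34)
p_direct = weights.cdf(185)

print(round(p_table, 4), round(p_direct, 4))
```

The two results differ slightly in the third decimal place because 5/34 is a little below 0.15, but both round to 0.56.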

This brings me to question number one. The probability a man weighs exactly 185 pounds. I’ve done less than 185. But that’s a range from the lowest weight in the set up to 185 pounds. If you try to graph exactly 185 you get a single line at the point where 185 appears on the x axis. This means that the range you’re looking for has a width of 0. When calculating these probabilities you’re measuring the area under the curve that fits the specified values. An area where one of the dimensions is 0 is going to multiply to 0.

To look at it another way, treat it as a range from 185 to 185. The formula for calculating the probability of a random selection falling between two values is P(a < Z < b) = P(Z < b) – P(Z < a). In this case both a and b are 185, so we have P(185 < Z < 185) = P(Z < 185) – P(Z < 185). These terms cancel out and we’re left with P(185 < Z < 185) = 0. If you want to plug in the value for P(Z < b) and P(Z < a), don’t forget we solved that earlier in our sample, where the table lookup gave a probability of roughly 0.56. To flog the dead horse, P(185 < Z < 185) = P(Z < 185) – P(Z < 185) becomes P(185 < Z < 185) = 0.56 – 0.56. It all leads to 0.
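The same point can be seen by shrinking a range around 185 until it has no width at all. A small sketch, with the helper name `p_between` being my own:

```python
from statistics import NormalDist

weights = NormalDist(mu=180, sigma=34)

def p_between(a, b):
    """Probability that a weight falls between a and b."""
    return weights.cdf(b) - weights.cdf(a)

# Narrow the range around 185 pounds step by step, ending at zero width.
for half_width in (1, 0.1, 0.001, 0):
    print(half_width, p_between(185 - half_width, 185 + half_width))
```

The probability gets smaller as the range narrows, and once the range is 185 to 185 the subtraction is a number minus itself.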

For Question 2 we need to use the same formula P(a < Z < b) = P(Z < b) – P(Z < a). Except this time we have an actual range. The range is from 120 to 155. We can now rewrite the formula as P(120 < Z < 155) = P(Z < 155) – P(Z < 120).

First P(Z < 155) is calculated as follows: z=\frac{x-\mu}{\sigma} where x = 155, \mu = 180 and \sigma = 34.

z = \frac{155 - 180}{34} = \frac{-25}{34} \approx -0.74

We look this up in the Z-table and get: 0.22965.

Next P(Z < 120) is calculated as follows: z=\frac{x-\mu}{\sigma} where x = 120, \mu = 180 and \sigma = 34.

z = \frac{120 - 180}{34} = \frac{-60}{34} \approx -1.76

Look it up in the Z-table to get: 0.03920.

Plug them in and we get P(120 < Z < 155) = 0.22965 – 0.03920 = 0.19045 or 0.19.
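As a cross-check on the table lookups, here is the same subtraction done in code. This is a sketch using `statistics.NormalDist`; the variable names are mine. It works the answer out both ways: with the rounded Z-scores used for the table, and directly from the weights with no rounding.

```python
from statistics import NormalDist

std_normal = NormalDist()               # for the rounded Z-scores
weights = NormalDist(mu=180, sigma=34)  # for the direct calculation

# Table-style: P(Z < -0.74) minus P(Z < -1.76).
p_table = std_normal.cdf(-0.74) - std_normal.cdf(-1.76)

# Direct: P(weight < 155) minus P(weight < 120).
p_direct = weights.cdf(155) - weights.cdf(120)

print(round(p_table, 5), round(p_direct, 5))
```

The two figures differ a little beyond the second decimal place because of the Z-score rounding, but both round to the 0.19 offered in the answer options.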

To solve question 3, the binomial distribution question, we need to specify the probabilities that someone is allergic to cats and that someone isn’t. We’re informed that the probability that someone is allergic is 15%, or 0.15. The probability that someone isn’t allergic is 1 – 0.15, or 0.85. We’ll call someone being allergic success, and someone not being allergic failure. For these binomial calculations (and probability problems in general) the probability of success is usually denoted by p and the probability of failure by q.

\therefore p = 0.15 and q = 0.85

We’re also told in question 3 that we want 7 successes out of 60 trials. The number of trials is usually denoted by n, and the number of successes by r. To solve the problem we can use the binomial distribution formula:

P(X=r) = \dbinom{n}{r} \cdot {p^r} \cdot {q^{n-r}}

P(X=7) = \dbinom{60}{7} \cdot {0.15^{7}} \cdot {0.85^{60-7}}

Without going into the details of \dbinom{60}{7} (60 choose 7), the answer to this is 386,206,920.

0.15^{7} is 1.70859375e-06.

And finally 0.85^{53} is 0.000181636474103.

When you multiply the three terms together you get 0.11985659271. As you can see from the answer options, this is not one of them, but if we round it to 0.12 then we have our answer.
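The three terms and their product can be reproduced with a few lines of Python. This is a sketch using `math.comb` (Python 3.8 or later) for the "60 choose 7" part; the variable names are my own.

```python
from math import comb

n, r = 60, 7   # 60 trials, 7 successes
p = 0.15       # probability of success (allergic to cats)
q = 1 - p      # probability of failure (not allergic)

ways = comb(n, r)               # 60 choose 7
p_successes = p ** r            # 0.15 to the power of 7
q_failures = q ** (n - r)       # 0.85 to the power of 53

probability = ways * p_successes * q_failures

print(ways, p_successes, q_failures)
print(round(probability, 2))
```

Keeping the three factors in separate variables mirrors the steps above, so each one can be compared against the hand calculation before they are multiplied together.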

But that’s very long-winded, and I did it the long-winded way, before I found out that there are online calculators to do these calculations for you! I mean I miss the time I wasted, but at least I won’t have to do it again. The first calculator I found was over at stattrek.com. This is not an endorsement, I’m sure there are others, this is just the first one I found, and it gave the correct answer with just the entry of three values; 0.15, 60 and 7.

### Conclusion

This was meant to be a quick post, hah!

But I couldn’t bypass the opportunity to make a post out of the problems I encountered, so I’m more than happy to have gotten it done. It may be rough around the edges, but I think it gets the salient points across.

More importantly for me, I got to get re-submersed into the world of statistics and probability, which will be vital knowledge as I move forward with this blog and further into the realm of analytics.

Let me know if you find any particular part confusing and I’ll try to clean it up.

More importantly, let me know if any part is just plain wrong and I’ll definitely clean that up.
