Normal distributions are a family of distributions that have the same general
shape. They are symmetric, with scores more concentrated in the middle than in
the tails. Normal distributions are sometimes described as bell shaped. Examples
of normal distributions are shown to the right. Notice that they differ in how
spread out they are. The area under each curve is the same. The height of a
normal distribution can be specified mathematically in terms of two
parameters: the mean (μ) and the standard deviation (σ).
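
As an illustration of how the height depends on μ and σ, here is a minimal Python sketch of the standard normal density formula. The function name `normal_height` is made up for this example, and the check against the standard library's `statistics.NormalDist` is only a sanity test.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_height(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


# The hand-written formula should agree with the library's density.
by_formula = normal_height(70, mu=50, sigma=10)
by_library = NormalDist(mu=50, sigma=10).pdf(70)
print(by_formula, by_library)

# A more spread-out curve (larger sigma) is lower at its center,
# because the total area under every normal curve is the same.
print(normal_height(0, 0, 1) > normal_height(0, 0, 2))
```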


The standard normal distribution is a normal distribution with a mean of 0 and
a standard deviation of 1. Normal distributions can be transformed to standard
normal distributions by the formula:

    z = (X - μ)/σ

where X is a score from the original normal distribution, μ is the mean of the
original normal distribution, and σ is the standard deviation of the original
normal distribution. The standard normal distribution is sometimes called the z
distribution. A z score always reflects the number of standard deviations above
or below the mean a particular score is. For instance, if a person scored a 70
on a test with a mean of 50 and a standard deviation of 10, then they scored 2
standard deviations above the mean. Converting the test scores to z scores, an X
of 70 would be:

    z = (70 - 50)/10 = 2

So, a z score of 2 means the original score was 2 standard deviations above the
mean. Note that the z distribution will only be a normal distribution if the
original distribution (X) is normal.
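
The conversion can be sketched in a few lines of Python. The function name `z_score` is an illustrative choice, not something defined in the text.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# The example from the text: a score of 70 on a test with mean 50 and SD 10.
print(z_score(70, mean=50, sd=10))  # 2.0
```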

Applying the formula will always produce a transformed distribution with a mean
of zero and a standard deviation of one. However, the shape of the distribution
will not be affected by the transformation. If X is not normal then the
transformed distribution will not be normal either. One important use of the
standard normal distribution is for converting between scores from a normal
distribution and percentile ranks.
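
A small experiment can make this point concrete. The sketch below standardizes a made-up, right-skewed set of scores (the data and the helper names `standardize` and `skewness` are invented for illustration). The mean and standard deviation become 0 and 1, but the skewness, used here as a rough measure of shape, is unchanged.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Convert each value to a z score using the population mean and SD."""
    mu = fmean(values)
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]


def skewness(values):
    """Mean cubed z score: 0 for a symmetric set, positive for a long right tail."""
    return fmean(z ** 3 for z in standardize(values))


scores = [1, 1, 2, 2, 3, 4, 7, 12]  # made-up, right-skewed scores
z_scores = standardize(scores)

# Mean and SD of the z scores (0 and 1, up to floating-point error).
print(round(fmean(z_scores), 6), round(pstdev(z_scores), 6))
# Skewness before and after the transformation is the same.
print(round(skewness(scores), 3), round(skewness(z_scores), 3))
```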

Areas under portions of the standard normal distribution are shown to the right.
About .68 (.34 + .34) of the distribution is between -1 and 1, while about .95
of the distribution is between -2 and 2.
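
These areas can be checked numerically. The following sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later), which is simply one convenient way to evaluate the area below a given z.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area between -1 and 1, and between -2 and 2, as differences of areas below z.
within_1 = std_normal.cdf(1) - std_normal.cdf(-1)
within_2 = std_normal.cdf(2) - std_normal.cdf(-2)

print(f"{within_1:.4f} {within_2:.4f}")  # 0.6827 0.9545
```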

One reason the normal distribution is important is that many psychological and educational variables are distributed approximately normally. Measures of reading ability, introversion, job satisfaction, and memory are among the many psychological variables approximately normally distributed. Although the distributions are only approximately normal, they are usually quite close. A second reason the normal distribution is so important is that it is easy for mathematical statisticians to work with. This means that many kinds of statistical tests can be derived for normal distributions. Almost all statistical tests discussed in this text assume normal distributions. Fortunately, these tests work very well even if the distribution is only approximately normally distributed. Some tests work well even with very wide deviations from normality. Finally, if the mean and standard deviation of a normal distribution are known, it is easy to convert back and forth from raw scores to percentiles.

If the mean and standard deviation of a normal distribution are known, it is
relatively easy to figure out the percentile rank of a person obtaining a
specific score. To be more concrete, assume a test in Introductory Psychology is
normally distributed with a mean of 80 and a standard deviation of 5. What is
the percentile rank of a person who received a score of 70 on the test?
Mathematical statisticians have developed ways of determining the proportion of
a distribution that is below a given number of standard deviations from the
mean. They have shown that only 2.3% of the population will be less than or
equal to a score two standard deviations below the mean. (A score of 70 is two
standard deviations below the mean because (70 - 80)/5 = -2.) In terms of the
Introductory Psychology test example, this means that a person scoring 70 would
be in the 2.3rd percentile.

This graph shows the distribution of scores on the test. The shaded area is 2.3%
of the total area. The proportion of the area below 70 is equal to the
proportion of the scores below 70.

What about a person scoring 75 on
the test? The proportion of the area below 75 is the same as the proportion of
scores below 75.

A score of 75 is one standard deviation below the mean because the mean is 80
and the standard deviation is 5. Mathematical statisticians have determined that
15.9% of the scores in a normal distribution are lower than a score one standard
deviation below the mean. Therefore, the proportion of the scores below 75 is
0.159 and a person scoring 75 would have a percentile rank score of 15.9.

The table on this page gives the proportion of the scores below various values
of z. z is computed with the formula:

    z = (X - μ)/σ

where z is the number of standard deviations (σ) that X is above the mean (μ).

When z is negative it means that X is below the mean. Thus, a z of -2 means
that X is -2 standard deviations above the mean, which is the same thing as
being +2 standard deviations below the mean. To take another example, what is
the percentile rank of a person receiving a score of 90 on the test?

The graph shows that most people scored below 90. Since 90 is 2 standard
deviations above the mean [z = (90 - 80)/5 = 2] it can be determined from the
table that a z score of 2 is equivalent to the 97.7th percentile: The proportion
of people scoring below 90 is thus .977.
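
The three percentile ranks worked out above (for scores of 70, 75, and 90) can be reproduced without a z table. This is a minimal sketch using `statistics.NormalDist`. The variable name `exam` is arbitrary, and the mean of 80 and standard deviation of 5 are the values assumed in the example.

```python
from statistics import NormalDist

exam = NormalDist(mu=80, sigma=5)  # Introductory Psychology test from the text

for score in (70, 75, 90):
    z = (score - exam.mean) / exam.stdev
    percentile = 100 * exam.cdf(score)  # percent of scores below this score
    print(f"score {score}: z = {z:+.1f}, percentile rank = {percentile:.1f}")
```

The printed percentile ranks should match the 2.3, 15.9, and 97.7 obtained from the table.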

What score on the Introductory Psychology test would it have taken to be in
the 75th percentile? (Remember the test has a mean of 80 and a standard
deviation of 5.)

The answer is computed by reversing the steps in the previous problems. First,
determine how many standard deviations above the mean one would have to be to
be in the 75th percentile. This can be found by using a z table and finding the
z associated with .75. The value of z is .674. Thus, one must be .674 standard
deviations above the mean to be in the 75th percentile. Since the standard
deviation is 5, one must be (5)(.674) = 3.37 points above the mean. Since the
mean is 80, a score of 80 + 3.37 = 83.37 is necessary. Rounding off, a score of
83 is needed to be in the 75th percentile. Since z = (X - μ)/σ, a little algebra
demonstrates that X = μ + zσ. For the present example, X = 80 + (.674)(5) =
83.37 as just shown.
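
The reverse lookup can also be done in code. The sketch below uses the `inv_cdf` method of `statistics.NormalDist` in place of the z table. The variable names are illustrative only.

```python
from statistics import NormalDist

mean, sd = 80, 5

# Step 1: the z score that has .75 of the standard normal distribution below it.
z = NormalDist().inv_cdf(0.75)

# Step 2: convert z back to a raw score with X = mean + z * sd.
score = mean + z * sd

print(round(z, 3), round(score, 2))  # 0.674 83.37
```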

If a test is normally distributed with a mean of 60 and a standard deviation of
10, what proportion of the scores are above 85? This problem is very similar to
figuring out the percentile rank of a person scoring 85. The first step is to
figure out the proportion of scores less than or equal to 85. This is done by
figuring out how many standard deviations above the mean 85 is. Since 85 is
85 - 60 = 25 points above the mean and since the standard deviation is 10, a
score of 85 is 25/10 = 2.5 standard deviations above the mean. Or, in terms of
the formula,

    z = (85 - 60)/10 = 2.5

A z table can be used to calculate that .9938 of the scores are less than or
equal to a score 2.5 standard deviations above the mean. It follows that only
1 - .9938 = .0062 of the scores are above a score 2.5 standard deviations above
the mean. Therefore, only .0062 of the scores are above 85.
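
The same upper-tail calculation, sketched with `statistics.NormalDist` rather than a z table:

```python
from statistics import NormalDist

test_scores = NormalDist(mu=60, sigma=10)

below_85 = test_scores.cdf(85)  # proportion at or below 85 (z = 2.5)
above_85 = 1 - below_85         # the remaining proportion is above 85

print(round(below_85, 4), round(above_85, 4))  # 0.9938 0.0062
```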

Suppose you wanted to know the proportion of students on the same test
receiving scores between 70 and 80. The approach is to figure out the
proportion of students scoring below 80 and the proportion below 70. The
difference between the two proportions is the proportion scoring between 70 and
80. First, calculate the proportion below 80. Since 80 is 20 points above the
mean and the standard deviation is 10, 80 is 2 standard deviations above the
mean:

    z = (80 - 60)/10 = 2

A z table can be used to determine that .9772 of the scores are below a score 2
standard deviations above the mean.

To calculate the proportion below 70,

    z = (70 - 60)/10 = 1

A z table can be used to determine that the proportion of scores less than 1
standard deviation above the mean is .8413. The difference is .9772 - .8413 =
.1359. Equivalently, if 1 - .8413 = .1587 of the scores are above 70 and
1 - .9772 = .0228 are above 80, then .1587 - .0228 = .1359 are between 70 and
80.

Assume a test is normally distributed with a mean of 100 and a standard
deviation of 15. What proportion of the scores would be between 85 and 105? The
solution to this problem is similar to the solution to the last one. The first
step is to calculate the proportion of scores below 85. Next, calculate the
proportion of scores below 105. Finally, subtract the first result from the
second to find the proportion scoring between 85 and 105.

Begin by calculating the proportion below 85. 85 is one standard deviation below
the mean:

    z = (85 - 100)/15 = -1

Using a z table with the value of -1 for z, the area below -1 (or 85 in terms of
the raw scores) is .1587.

Doing the same thing for 105,

    z = (105 - 100)/15 = .333

A z table shows that the proportion scoring below .333 (105 in raw scores) is
.6304. The difference is .6304 - .1587 = .4717. So .4717 of the scores are
between 85 and 105.
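
Both "between two scores" problems follow the same pattern: the area below the higher score minus the area below the lower score. A minimal sketch, with the hypothetical helper name `proportion_between`:

```python
from statistics import NormalDist


def proportion_between(low: float, high: float, mean: float, sd: float) -> float:
    """Proportion of a normal distribution lying between low and high."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)


# Scores between 70 and 80 on the test with mean 60 and SD 10.
print(round(proportion_between(70, 80, mean=60, sd=10), 4))    # 0.1359
# Scores between 85 and 105 on the test with mean 100 and SD 15.
print(round(proportion_between(85, 105, mean=100, sd=15), 4))  # 0.4719
```

The second result can differ from a table-based answer in the last digit, because a table lookup uses z rounded to .333 while the code uses the unrounded value.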

The shape of the binomial distribution depends on the values of n and p.

For large *n* (say n > 20) and *p* not too near 0 or 1 (say 0.05 <
p < 0.95) the distribution approximately follows the Normal distribution.

This can be used to find binomial probabilities.

If X ~ binomial (n,p) where n > 20 and 0.05 < p < 0.95 then approximately X has
the Normal distribution with mean E(X) = np and variance var(X) = np(1-p),

so Z = (X - np)/√(np(1-p)) is approximately N(0,1).

Use the MINITAB command MPLOT (or GMPLOT for high-resolution graphics) to
compare a Binomial Distribution with the Normal Distribution with the same
expected value and variance. First calculate probabilities for the binomial
distribution with n=16 and p=0.5. Then calculate probabilities for the Normal
distribution with µ=np=16x0.5=8 and σ²=npq=16x0.5x0.5=4, so that σ=2. The graph
shows that the two curves are very close together (the symbol 2 indicates that
the values for the binomial (A) and the normal (B) distributions were nearly
equal).

    MTB> set c1
    DATA> 0:16
    DATA> end
    MTB> pdf c1 c2;
    SUBC> binomial 16 0.5.
    MTB> name c2 'binomial'
    MTB> pdf c1 c3;
    SUBC> normal 8 2.
    *NOTE np = 8, npq = 4 *
    MTB> name c3 'normal'
    MTB> Gmplot c2 c1 c3 c1
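
For readers without MINITAB, the same comparison can be sketched in Python. This is not a translation of the MINITAB commands, only an equivalent calculation using the standard library. It prints a table of values rather than drawing a plot.

```python
from math import comb
from statistics import NormalDist

n, p = 16, 0.5
normal = NormalDist(mu=n * p, sigma=(n * p * (1 - p)) ** 0.5)  # mean 8, SD 2

# Binomial probability and normal curve height at each whole number 0..16.
binomial_probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
normal_heights = [normal.pdf(k) for k in range(n + 1)]

for k, (b, h) in enumerate(zip(binomial_probs, normal_heights)):
    print(f"{k:2d}  binomial = {b:.4f}  normal = {h:.4f}")
```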

Hence, if X has the binomial distribution, i.e. X ~ binomial (n,p), and *n*
is large, then X has approximately the Normal distribution with mean µ=np and
standard deviation σ=√(np(1-p)). This approximation
is reasonably good when np>10 and n(1-p)>10.

For accurate values for binomial probabilities, either use computer software
to do exact calculations or, if *n* is not very large, improve the normal
approximation by using the **continuity correction**. This method treats each
whole number as occupying the interval from 0.5 below it to 0.5 above it. When
an outcome X needs to be included in the probability calculation, the normal
approximation uses the interval from (X-0.5) to (X+0.5).
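
The continuity correction can be written as a small helper. The function name `binomial_normal_approx` and its arguments are hypothetical and chosen only for this sketch. It approximates the probability that a binomial count falls in a range of whole numbers, both ends included.

```python
from math import comb
from statistics import NormalDist


def binomial_normal_approx(n: int, p: float, low: int, high: int) -> float:
    """Approximate P(low <= X <= high) for X ~ binomial(n, p).

    Each included whole number k is treated as the interval k - 0.5 to k + 0.5,
    so the normal area is taken from low - 0.5 to high + 0.5.
    """
    normal = NormalDist(mu=n * p, sigma=(n * p * (1 - p)) ** 0.5)
    return normal.cdf(high + 0.5) - normal.cdf(low - 0.5)


# Compare with the exact binomial probability of exactly 8 successes in 16 trials.
approx = binomial_normal_approx(16, 0.5, 8, 8)
exact = comb(16, 8) * 0.5**16
print(round(approx, 4), round(exact, 4))
```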

In a particular faculty 60% of students are men and 40% are women. In a random sample of 50 students what is the probability that more than half are women?

Let RV X = number of women in the sample.

Assume X has the binomial distribution with n = 50 and p = 0.4.

Then E(X) = np = 50 x 0.4 = 20

var(X) = npq = 50 x 0.4 x 0.6 = 12

so approximately X ~ N(20,12).

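We need to find P(X > 25). Note - **not** P(X >= 25).

Without a continuity correction, the normal approximation gives

    P(X > 25) ≈ P(Z > (25 - 20)/√12) = P(Z > 1.44) ≈ 0.075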

The exact answer, calculated from binomial probabilities, is
P(X > 25) = P(X = 26) + P(X = 27) + ... + P(X = 50) = 0.0573.

The approximate probability, using the **continuity correction**, is

    P(X > 25.5) ≈ P(Z > (25.5 - 20)/√12) = P(Z > 1.59) ≈ 0.056

(The value 25.5 was chosen because the outcome 25 was **not** to be included
but the outcomes 26, 27, ..., 50 **were** to be included in the calculation.)
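
The whole example can be checked numerically. The sketch below computes the exact binomial probability and the normal approximation with and without the continuity correction, using only the Python standard library. The variable names are illustrative.

```python
from math import comb
from statistics import NormalDist

n, p = 50, 0.4  # sample of 50 students, 40% of the faculty are women

# Exact: add the binomial probabilities of 26, 27, ..., 50 women.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(26, n + 1))

normal = NormalDist(mu=n * p, sigma=(n * p * (1 - p)) ** 0.5)  # N(20, 12)
uncorrected = 1 - normal.cdf(25)   # area above 25
corrected = 1 - normal.cdf(25.5)   # area above 25.5 (continuity correction)

print(f"exact       = {exact:.4f}")
print(f"uncorrected = {uncorrected:.4f}")
print(f"corrected   = {corrected:.4f}")
```

The corrected approximation should land noticeably closer to the exact value than the uncorrected one.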

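Similarly, if the example required the probability that less than 18 students were women, the continuity correction would require the calculation of P(X < 17.5), because the outcome 18 is **not** to be included but the outcomes 0, 1, ..., 17 are.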