
8.6 Hypothesis Tests for a Population Mean with Known Population Standard Deviation

Learning Objectives

  • Conduct and interpret hypothesis tests for a population mean with known population standard deviation.

Some notes about conducting a hypothesis test:

  • The null hypothesis [latex]H_0[/latex] is always an “equal to.”  The null hypothesis is the original claim about the population parameter.
  • The alternative hypothesis [latex]H_a[/latex] is a “less than,” “greater than,” or “not equal to.”  The form of the alternative hypothesis depends on the context of the question.
  • If the alternative hypothesis is a “less than”, then the test is left-tail.  The p -value is the area in the left-tail of the distribution.
  • If the alternative hypothesis is a “greater than”, then the test is right-tail.  The p -value is the area in the right-tail of the distribution.
  • If the alternative hypothesis is a “not equal to”, then the test is two-tail.  The p -value is the sum of the area in the two-tails of the distribution.  Each tail represents exactly half of the p -value.
  • Think about the meaning of the p-value.  A data analyst (and anyone else) should have more confidence in the decision to reject the null hypothesis when the p-value is very small (for example, 0.001 as opposed to 0.04), even though both are below a significance level of 0.05.  Similarly, a data analyst should have more confidence in the decision not to reject the null hypothesis when the p-value is large (for example, 0.4 as opposed to 0.056), even though both are above a significance level of 0.05.  In other words, interpreting the p-value requires judgment rather than the mindless application of rules.
  • The significance level must be identified before collecting the sample data and conducting the test.  Generally, the significance level will be included in the question.  If no significance level is given, a common standard is to use a significance level of 5%.
  • An alternative approach for hypothesis testing is to use what is called the critical value approach .  In this book, we will only use the p -value approach.  Some of the videos below may mention the critical value approach, but this approach will not be used in this book.

Suppose the hypotheses for a hypothesis test are:

[latex]\begin{eqnarray*} H_0: & & \mu=5 \\ H_a: & & \mu \lt 5 \end{eqnarray*}[/latex]

Because the alternative hypothesis is a [latex]\lt[/latex], this is a left-tailed test.  The p -value is the area in the left-tail of the distribution.

Normal distribution curve of a single population mean with a value of 5 on the x-axis and the p-value points to the area on the left tail of the curve.

[latex]\begin{eqnarray*} H_0: & & \mu=0.5 \\ H_a: & & \mu \neq 0.5  \end{eqnarray*}[/latex]

Because the alternative hypothesis is a [latex]\neq[/latex], this is a two-tailed test.  The p -value is the sum of the areas in the two tails of the distribution.  Each tail contains exactly half of the p -value.

Normal distribution curve of a single population mean with a value of 0.5 on the x-axis. The p-value formulas, 1/2(p-value), for a two-tailed test is shown for the areas on the left and right tails of the curve.

[latex]\begin{eqnarray*} H_0: & & \mu=10 \\ H_a: & & \mu \lt 10  \end{eqnarray*}[/latex]

Because the alternative hypothesis is a [latex]\lt[/latex], this is a left-tailed test.  The p-value is the area in the left tail of the distribution.

Normal distribution curve of a single population mean with a value of 10 on the x-axis and the p-value points to the area on the left tail of the curve.

Steps to Conduct a Hypothesis Test for a Population Mean with Known Population Standard Deviation

  • Write down the null and alternative hypotheses in terms of the population mean [latex]\mu[/latex].  Include appropriate units with the values of the mean.
  • Use the form of the alternative hypothesis to determine if the test is left-tailed, right-tailed, or two-tailed.
  • Collect the sample information for the test and identify the significance level [latex]\alpha[/latex].
  • When the population standard deviation is known , we use a normal distribution with [latex]\displaystyle{z=\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}}[/latex] to find the p -value.  The p -value is the area in the corresponding tail of the normal distribution.
  • Compare the p-value to the significance level and state the outcome of the test:
  • If the p-value is less than or equal to the significance level, reject the null hypothesis in favour of the alternative hypothesis.  The results of the sample data are significant.  There is sufficient evidence to conclude that the null hypothesis [latex]H_0[/latex] is an incorrect belief and that the alternative hypothesis [latex]H_a[/latex] is most likely correct.
  • If the p-value is greater than the significance level, do not reject the null hypothesis.  The results of the sample data are not significant.  There is not sufficient evidence to conclude that the alternative hypothesis [latex]H_a[/latex] may be correct.
  • Write down a concluding sentence specific to the context of the question.

USING EXCEL TO CALCULATE THE P-VALUE FOR A HYPOTHESIS TEST ON A POPULATION MEAN WITH KNOWN POPULATION STANDARD DEVIATION

The p -value for a hypothesis test on a population mean is the area in the tail(s) of the distribution of the sample mean.  When the population standard deviation is known, use the normal distribution to find the p -value.

The p -value is the area in the tail(s) of a normal distribution, so the norm.dist(x,[latex]\mu[/latex],[latex]\sigma[/latex],logic operator) function can be used to calculate the p -value.

  • For x , enter the value for [latex]\overline{x}[/latex].
  • For [latex]\mu[/latex] , enter the mean of the sample means [latex]\mu[/latex].  Note:  Because the test is run assuming the null hypothesis is true, the value for [latex]\mu[/latex] is the claim from the null hypothesis.
  • For [latex]\sigma[/latex] , enter the standard error of the mean [latex]\displaystyle{\frac{\sigma}{\sqrt{n}}}[/latex].
  • For the logic operator , enter true .  Note:  Because we are calculating the area under the curve, we always enter true for the logic operator.

Use the norm.dist function to find the area in the appropriate tail: for a left-tailed test, the norm.dist output is the area in the left tail; for a right-tailed test, the area in the right tail is 1 minus the norm.dist output (i.e., 1-norm.dist).
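For readers who prefer a scripting tool to a spreadsheet, the same calculation can be sketched in Python.  The helper below is an illustration added here (it is not part of the original Excel-based presentation and assumes the SciPy library is available); norm.cdf plays the role of norm.dist with the logic operator set to true, and the example numbers at the bottom are made up.

```python
from math import sqrt
from scipy import stats

def z_test_p_value(x_bar, mu_0, sigma, n, tail):
    """p-value for a test on a population mean with known sigma.

    tail is "left", "right", or "two".
    """
    se = sigma / sqrt(n)                          # standard error of the mean
    left_area = stats.norm.cdf(x_bar, loc=mu_0, scale=se)
    if tail == "left":
        return left_area                          # area to the left of x_bar
    if tail == "right":
        return 1 - left_area                      # area to the right of x_bar
    return 2 * min(left_area, 1 - left_area)      # two-tailed: double the smaller tail

# Made-up example: sample mean 4.8, null value 5, sigma 1, n = 40, left-tailed test
print(z_test_p_value(4.8, 5, 1, 40, "left"))
```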

Jeffrey, as an eight-year old, established a mean time of 16.43 seconds with a standard deviation of 0.8 seconds for swimming the 25-meter freestyle.  His dad, Frank, thought that Jeffrey could swim the 25-meter freestyle faster using goggles.  Frank bought Jeffrey a new pair of goggles and timed Jeffrey swimming the 25-meter freestyle 15 different times.  In the sample of 15 swims, Jeffrey’s mean time was 16 seconds.  Frank thought that the goggles helped Jeffrey swim faster than 16.43 seconds.  At the 5% significance level, did Jeffrey swim faster wearing the goggles?  Assume that the swim times for the 25-meter freestyle are normally distributed.

Hypotheses:

[latex]\begin{eqnarray*} H_0: & & \mu=16.43 \mbox{ seconds} \\ H_a: & & \mu \lt 16.43 \mbox{ seconds} \end{eqnarray*}[/latex]

From the question, we have [latex]n=15[/latex], [latex]\overline{x}=16[/latex], [latex]\sigma=0.8[/latex] and [latex]\alpha=0.05[/latex].

This is a test on a population mean where the population standard deviation is known ([latex]\sigma=0.8[/latex]).  So we use a normal distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\lt[/latex], the p -value is the area in the left-tail of the distribution.

This is a normal distribution curve. On the left side of the center a vertical line extends to the curve with the area to the left of this vertical line shaded. The p-value equals the area of this shaded region.

Function:  norm.dist
Field 1:  16
Field 2:  16.43
Field 3:  0.8/sqrt(15)
Field 4:  true
Answer:  0.0187

So the p -value[latex]=0.0187[/latex].

Conclusion:

Because p -value[latex]=0.0187 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis.  At the 5% significance level there is enough evidence to suggest that Jeffrey’s mean swim time with the goggles is less than 16.43 seconds.

  • The null hypothesis [latex]\mu=16.43[/latex] is the claim that Jeffrey’s mean swim time with the goggles is 16.43 seconds (the same as it is without the goggles).
  • The alternative hypothesis [latex]\mu \lt 16.43[/latex] is the claim that Jeffrey’s swim time with the goggles is less than 16.43 seconds.
  • The function is norm.dist because we are finding the area in the left tail of a normal distribution.
  • Field 1 is the value of [latex]\overline{x}[/latex]
  • Field 2 is the value of [latex]\mu[/latex] from the null hypothesis.  Remember, we run the test assuming the null hypothesis is true, so that means we assume [latex]\mu=16.43[/latex].
  • Field 3 is the standard deviation for the sample means [latex]\displaystyle{\frac{\sigma}{\sqrt{n}}}[/latex].  Note that we are not using the standard deviation from the population ([latex]\sigma=0.8[/latex]).  This is because the p -value is the area under the curve of the distribution of the sample means, not the distribution of the population.
  • The p -value of 0.0187 tells us that under the assumption that Jeffrey’s mean swim time with goggles is 16.43 seconds (the null hypothesis), there is only a 1.87% chance that the mean time for the 15 sample swims is 16 seconds or less.  This is a small probability, and so is unlikely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis.
  • The Type I error for this problem is to conclude that Jeffrey swims the 25-meter freestyle, on average, in less than 16.43 seconds (the alternative hypothesis) when, in fact, he actually swims the 25-meter freestyle, on average, in 16.43 seconds (the null hypothesis).  That is, reject the null hypothesis when the null hypothesis is actually true.
  • The Type II error for this problem is to conclude that Jeffrey swims the 25-meter freestyle, on average, in 16.43 seconds (the null hypothesis) when, in fact, he actually swims the 25-meter freestyle, on average, in less than 16.43 seconds (the alternative hypothesis).  That is, do not reject the null hypothesis when the null hypothesis is actually false.
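As a cross-check on the Excel output above, the following short Python sketch (added here for illustration; it assumes SciPy and is not part of the original solution) reproduces the p-value for Jeffrey’s left-tailed test and applies the decision rule.

```python
from math import sqrt
from scipy import stats

x_bar, mu_0, sigma, n, alpha = 16.0, 16.43, 0.8, 15, 0.05

se = sigma / sqrt(n)                                  # standard error: 0.8/sqrt(15)
p_value = stats.norm.cdf(x_bar, loc=mu_0, scale=se)   # area in the left tail

print(round(p_value, 4))                              # approximately 0.0187
print("reject H0" if p_value <= alpha else "do not reject H0")
```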

The mean throwing distance of a football for Marco, a high school freshman quarterback, is 40 yards with a standard deviation of 2 yards.  The team coach tells Marco to adjust his grip to get more distance.  The coach records the distances for 20 throws with the new grip.  For the 20 throws, Marco’s mean distance was 41.5 yards.  The coach thought the different grip helped Marco throw farther than 40 yards.  At the 5% significance level, is Marco’s mean throwing distance higher with the new grip?  Assume the throw distances for footballs are normally distributed.

[latex]\begin{eqnarray*} H_0: & & \mu=40 \mbox{ yards} \\ H_a: & & \mu \gt 40 \mbox{ yards} \end{eqnarray*}[/latex]

From the question, we have [latex]n=20[/latex], [latex]\overline{x}=41.5[/latex], [latex]\sigma=2[/latex] and [latex]\alpha=0.05[/latex].

This is a test on a population mean where the population standard deviation is known ([latex]\sigma=2[/latex]).  So we use a normal distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\gt[/latex], the p -value is the area in the right-tail of the distribution.

This is a normal distribution curve. On the right side of the center a vertical line extends to the curve with the area to the right of this vertical line shaded. The p-value equals the area of this shaded region.

Function:  1-norm.dist
Field 1:  41.5
Field 2:  40
Field 3:  2/sqrt(20)
Field 4:  true
Answer:  0.0004

So the p -value[latex]=0.0004[/latex].

Because p -value[latex]=0.0004 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis.  At the 5% significance level there is enough evidence to suggest that Marco’s mean throwing distance is greater than 40 yards with the new grip.

  • The null hypothesis [latex]\mu=40[/latex] is the claim that Marco’s mean throwing distance with the new grip is 40 yards (the same as it is without the new grip).
  • The alternative hypothesis [latex]\mu \gt 40[/latex] is the claim that Marco’s mean throwing distance with the new grip is greater than 40 yards.
  • Field 2 is the value of [latex]\mu[/latex] from the null hypothesis.
  • Field 3 is the standard deviation for the sample means [latex]\displaystyle{\frac{\sigma}{\sqrt{n}}}[/latex].
  • The p-value of 0.0004 tells us that under the assumption that Marco’s mean throwing distance with the new grip is 40 yards, there is only a 0.04% chance that the mean throwing distance for the 20 sample throws is 41.5 yards or more.  This is a small probability, and so is unlikely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis.

A local college states in its marketing materials that the average age of its first-year students is 18.3 years with a standard deviation of 3.4 years.  But this information is based on old data and does not take into account that more older adults are returning to college.  A researcher at the college believes that the average age of its first-year students has changed.  The researcher takes a sample of 50 first-year students and finds the average age is 19.5 years.  At the 1% significance level, has the average age of the college’s first-year students changed?

[latex]\begin{eqnarray*} H_0: & & \mu=18.3 \mbox{ years} \\ H_a: & & \mu \neq 18.3 \mbox{ years} \end{eqnarray*}[/latex]

From the question, we have [latex]n=50[/latex], [latex]\overline{x}=19.5[/latex], [latex]\sigma=3.4[/latex] and [latex]\alpha=0.01[/latex].

This is a test on a population mean where the population standard deviation is known ([latex]\sigma=3.4[/latex]).  In this case, the sample size is greater than 30.  So we use a normal distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\neq[/latex], the p -value is the sum of area in the tails of the distribution.

This is a normal distribution curve. On the left side of the center a vertical line extends to the curve with the area to the left of this vertical line shaded and labeled as one half of the p-value. On the right side of the center a vertical line extends to the curve with the area to the right of this vertical line shaded and labeled as one half of the p-value. The p-value equals the sum of area of these two shaded regions.

Because there is only one sample, we only have information relating to one of the two tails, either the left tail or the right tail.  We need to know which tail the sample relates to because that determines how we calculate the area of that tail using the normal distribution.  In this case, the sample mean [latex]\overline{x}=19.5[/latex] is greater than the value of the population mean in the null hypothesis [latex]\mu=18.3[/latex] ([latex]\overline{x}=19.5>18.3=\mu[/latex]), so the sample information relates to the right tail of the normal distribution.  This means that we will calculate the area in the right tail using 1-norm.dist .  However, this is a two-tailed test where the p-value is the sum of the area in the two tails, and the area in the right tail is only one half of the p-value.  The area in the left tail equals the area in the right tail, and the p-value is the sum of these two areas.

Function:  1-norm.dist
Field 1:  19.5
Field 2:  18.3
Field 3:  3.4/sqrt(50)
Field 4:  true
Answer:  0.0063

So the area in the right tail is 0.0063 and [latex]\frac{1}{2}[/latex]( p -value)[latex]=0.0063[/latex].  This is also the area in the left tail, so

p -value[latex]=0.0063+0.0063=0.0126[/latex]

Because p -value[latex]=0.0126 \gt 0.01=\alpha[/latex], we do not reject the null hypothesis.  At the 1% significance level there is not enough evidence to suggest that the average age of the college’s first-year students has changed.

  • The null hypothesis [latex]\mu=18.3[/latex] is the claim that the average age of the first-year students is still 18.3 years.
  • The alternative hypothesis [latex]\mu \neq 18.3[/latex] is the claim that the average age of the first-year students has changed from 18.3 years.
  • If the sample mean is less than the value of [latex]\mu[/latex] in the null hypothesis, the sample relates to the left tail: norm.dist([latex]\overline{x}[/latex],[latex]\mu[/latex],[latex]\sigma/\mbox{sqrt}(n)[/latex],true) gives the area in the left tail, and the p-value is double this area because the area in the right tail equals the area in the left tail.
  • In this example the sample mean is greater than the value of [latex]\mu[/latex] in the null hypothesis, so the sample relates to the right tail: 1-norm.dist([latex]\overline{x}[/latex],[latex]\mu[/latex],[latex]\sigma/\mbox{sqrt}(n)[/latex],true) gives the area in the right tail, and the p-value is double this area because the area in the left tail equals the area in the right tail.  (A short sketch of this calculation in code appears after these notes.)
  • The p-value of 0.0126 is larger than the 1% significance level, so a sample result like this one is not unusual enough under the assumption that the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely correct, and so the conclusion of the test is to not reject the null hypothesis.  In other words, the claim that the average age of first-year students is 18.3 years is most likely correct.
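The two-tailed calculation can also be checked with a short Python sketch (added here for illustration; it assumes SciPy and is not part of the original solution): find the area in the tail that the sample mean falls in, then double it.

```python
from math import sqrt
from scipy import stats

x_bar, mu_0, sigma, n, alpha = 19.5, 18.3, 3.4, 50, 0.01

se = sigma / sqrt(n)
right_tail = 1 - stats.norm.cdf(x_bar, loc=mu_0, scale=se)   # sample mean exceeds 18.3, so use the right tail
p_value = 2 * right_tail                                     # two-tailed test

print(round(right_tail, 4), round(p_value, 4))               # approximately 0.0063 and 0.0126
print("reject H0" if p_value <= alpha else "do not reject H0")
```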

Watch this video: Hypothesis Testing: z -test, right tail by ExcelIsFun [33:47]

Watch this video: Hypothesis Testing: z -test, left tail by ExcelIsFun [10:57]

Watch this video: Hypothesis Testing: z -test, two tail by ExcelIsFun [9:56]

Concept Review

The hypothesis test for a population mean is a well established process:

  • Collect the sample information for the test and identify the significance level.
  • When the population standard deviation is known, find the p -value (the area in the corresponding tail) for the test using the normal distribution.
  • Compare the p -value to the significance level and state the outcome of the test.

Attribution

“9.6 Hypothesis Testing of a Single Mean and Single Proportion” in Introductory Statistics by OpenStax is licensed under a Creative Commons Attribution 4.0 International License.

Introduction to Statistics Copyright © 2022 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Hypothesis Test for a Mean

This lesson explains how to conduct a hypothesis test of a mean, when the following conditions are met:

  • The sampling method is simple random sampling .
  • The sampling distribution is normal or nearly normal.

Generally, the sampling distribution will be approximately normally distributed if any of the following conditions apply.

  • The population distribution is normal.
  • The population distribution is symmetric , unimodal , without outliers , and the sample size is 15 or less.
  • The population distribution is moderately skewed , unimodal, without outliers, and the sample size is between 16 and 40.
  • The sample size is greater than 40, without outliers.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis . The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false; and vice versa.

The table below shows three sets of hypotheses. Each makes a statement about how the population mean μ is related to a specified value M . (In the table, the symbol ≠ means " not equal to ".)

Set   Null hypothesis   Alternative hypothesis   Number of tails
 1    μ = M             μ ≠ M                    2
 2    μ ≥ M             μ < M                    1
 3    μ ≤ M             μ > M                    1

The first set of hypotheses (Set 1) is an example of a two-tailed test , since an extreme value on either side of the sampling distribution would cause a researcher to reject the null hypothesis. The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests , since an extreme value on only one side of the sampling distribution would cause a researcher to reject the null hypothesis.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. It should specify the following elements.

  • Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
  • Test method. Use the one-sample t-test to determine whether the hypothesized mean differs significantly from the observed sample mean.

Analyze Sample Data

Using sample data, conduct a one-sample t-test. This involves finding the standard error, degrees of freedom, test statistic, and the P-value associated with the test statistic.

  • Standard error. Compute the standard error (SE) of the sampling distribution: SE = s * sqrt{ ( 1/n ) * [ ( N - n ) / ( N - 1 ) ] }, where s is the standard deviation of the sample, N is the population size, and n is the sample size. When the population size is much larger than the sample size, the standard error can be approximated by SE = s / sqrt( n ).

  • Degrees of freedom. The degrees of freedom (DF) is equal to the sample size (n) minus one. Thus, DF = n - 1.

  • Test statistic. The test statistic is a t statistic (t) defined by t = ( x̄ - μ ) / SE, where x̄ is the sample mean, μ is the hypothesized population mean, and SE is the standard error.

  • P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a t statistic, use the t Distribution Calculator to assess the probability associated with the t statistic, given the degrees of freedom computed above. (See sample problems at the end of this lesson for examples of how this is done.)
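The quantities in this list can also be computed with a few lines of code rather than a calculator. The sketch below is an illustration added here (it assumes SciPy and uses the simple SE = s / sqrt(n) form); the function name and the example numbers at the bottom are made up.

```python
from math import sqrt
from scipy import stats

def one_sample_t(x_bar, mu, s, n, tails="two"):
    """Return (SE, DF, t, P-value) for a one-sample t-test from summary statistics."""
    se = s / sqrt(n)                    # standard error
    df = n - 1                          # degrees of freedom
    t = (x_bar - mu) / se               # test statistic
    if tails == "left":
        p = stats.t.cdf(t, df)
    elif tails == "right":
        p = stats.t.sf(t, df)           # sf is 1 - cdf
    else:
        p = 2 * stats.t.sf(abs(t), df)  # two-tailed
    return se, df, t, p

# Made-up example: sample mean 52.3, hypothesized mean 50, s = 6, n = 25, right-tailed
print(one_sample_t(52.3, 50, 6, 25, tails="right"))
```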


Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level , and rejecting the null hypothesis when the P-value is less than the significance level.

Test Your Understanding

In this section, two sample problems illustrate how to conduct a hypothesis test of a mean score. The first problem involves a two-tailed test; the second problem, a one-tailed test.

Problem 1: Two-Tailed Test

An inventor has developed a new, energy-efficient lawn mower engine. He claims that the engine will run continuously for 5 hours (300 minutes) on a single gallon of regular gasoline. From his stock of 2000 engines, the inventor selects a simple random sample of 50 engines for testing. The engines run for an average of 295 minutes, with a standard deviation of 20 minutes. Test the null hypothesis that the mean run time is 300 minutes against the alternative hypothesis that the mean run time is not 300 minutes. Use a 0.05 level of significance. (Assume that run times for the population of engines are normally distributed.)

Solution: The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:

  • State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.

Null hypothesis: μ = 300

Alternative hypothesis: μ ≠ 300

  • Formulate an analysis plan . For this analysis, the significance level is 0.05. The test method is a one-sample t-test .

  • Analyze sample data. Using sample data, we compute the standard error (SE), degrees of freedom (DF), and the t statistic (t).

SE = s / sqrt(n) = 20 / sqrt(50) = 20/7.07 = 2.83

DF = n - 1 = 50 - 1 = 49

t = ( x̄ - μ ) / SE = (295 - 300)/2.83 = -1.77

where s is the standard deviation of the sample, x̄ is the sample mean, μ is the hypothesized population mean, and n is the sample size.

Since we have a two-tailed test, the P-value is the probability that a t statistic having 49 degrees of freedom is less than -1.77 or greater than 1.77. We use the t Distribution Calculator to find that P(t < -1.77) is about 0.04.

  • If you enter 1.77 as the sample mean in the t Distribution Calculator, you will find that P(t < 1.77) is about 0.96. Therefore, P(t > 1.77) is 1 minus 0.96, or 0.04. Thus, the P-value = 0.04 + 0.04 = 0.08.
  • Interpret results . Since the P-value (0.08) is greater than the significance level (0.05), we cannot reject the null hypothesis.

Note: If you use this approach on an exam, you may also want to mention why this approach is appropriate. Specifically, the approach is appropriate because the sampling method was simple random sampling, the population was normally distributed, and the sample size was small relative to the population size (less than 5%).
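For reference, here is a minimal Python sketch (added for illustration, assuming SciPy) that reproduces the numbers in Problem 1; the t Distribution Calculator is replaced by scipy.stats.t.

```python
from math import sqrt
from scipy import stats

x_bar, mu, s, n, alpha = 295, 300, 20, 50, 0.05

se = s / sqrt(n)                       # about 2.83
t = (x_bar - mu) / se                  # about -1.77
df = n - 1                             # 49
p_value = 2 * stats.t.sf(abs(t), df)   # two-tailed P-value, about 0.08

print(round(se, 2), round(t, 2), round(p_value, 2))
print("reject H0" if p_value <= alpha else "fail to reject H0")
```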

Problem 2: One-Tailed Test

Bon Air Elementary School has 1000 students. The principal of the school thinks that the average IQ of students at Bon Air is at least 110. To prove her point, she administers an IQ test to 20 randomly selected students. Among the sampled students, the average IQ is 108 with a standard deviation of 10. Based on these results, should the principal accept or reject her original hypothesis? Assume a significance level of 0.01. (Assume that test scores in the population of students are normally distributed.)

  • State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.

Null hypothesis: μ ≥ 110

Alternative hypothesis: μ < 110

  • Formulate an analysis plan . For this analysis, the significance level is 0.01. The test method is a one-sample t-test .

  • Analyze sample data. Using sample data, we compute the standard error (SE), degrees of freedom (DF), and the t statistic (t).

SE = s / sqrt(n) = 10 / sqrt(20) = 10/4.472 = 2.236

DF = n - 1 = 20 - 1 = 19

t = ( x̄ - μ ) / SE = (108 - 110)/2.236 = -0.894

Here is the logic of the analysis: Given the alternative hypothesis (μ < 110), we want to know whether the observed sample mean is small enough to cause us to reject the null hypothesis.

The observed sample mean produced a t test statistic of -0.894. We use the t Distribution Calculator to find that P(t < -0.894) is about 0.19.

  • This means we would expect to find a sample mean of 108 or smaller in 19 percent of our samples, if the true population IQ were 110. Thus the P-value in this analysis is 0.19.
  • Interpret results . Since the P-value (0.19) is greater than the significance level (0.01), we cannot reject the null hypothesis.
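A matching Python sketch for Problem 2 (again an added illustration assuming SciPy) confirms the one-tailed P-value.

```python
from math import sqrt
from scipy import stats

x_bar, mu, s, n, alpha = 108, 110, 10, 20, 0.01

t = (x_bar - mu) / (s / sqrt(n))   # about -0.894
p_value = stats.t.cdf(t, n - 1)    # lower-tail P-value, about 0.19

print(round(t, 3), round(p_value, 2))
print("reject H0" if p_value <= alpha else "fail to reject H0")
```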

Module: Inference for Means

Hypothesis Test for a Population Mean (5 of 5)

Learning Objectives

  • Interpret the P-value as a conditional probability.

We finish our discussion of the hypothesis test for a population mean with a review of the meaning of the P-value, along with a review of type I and type II errors.

Review of the Meaning of the P-value

At this point, we assume you know how to use a P-value to make a decision in a hypothesis test. The logic is always the same. If we pick a level of significance (α), then we compare the P-value to α.

  • If the P-value ≤ α, reject the null hypothesis. The data supports the alternative hypothesis.
  • If the P-value > α, do not reject the null hypothesis. The data is not strong enough to support the alternative hypothesis.

In fact, we find that we treat these as “rules” and apply them without thinking about what the P-value means. So let’s pause here and review the meaning of the P-value, since it is the connection between probability and decision-making in inference.

Birth Weights in a Town

Let’s return to the familiar context of birth weights for babies in a town. Suppose that babies in the town had a mean birth weight of 3,500 grams in 2010. This year, a random sample of 50 babies has a mean weight of about 3,400 grams with a standard deviation of about 500 grams. Here is the distribution of birth weights in the sample.

Dot plot of birth weights, ranging from around 2,000 grams to 4,000 grams.

Obviously, this sample weighs less on average than the population of babies in the town in 2010. A decrease in the town’s mean birth weight could indicate a decline in overall health of the town. But does this sample give strong evidence that the town’s mean birth weight is less than 3,500 grams this year?

We now know how to answer this question with a hypothesis test. Let’s use a significance level of 5%.

Let μ = mean birth weight in the town this year. The null hypothesis says there is “no change from 2010.”

  • H 0 : μ = 3,500
  • H a : μ < 3,500

Since the sample is large, we can conduct the T-test (without worrying about the shape of the distribution of birth weights for individual babies.)

[latex]T=\frac{3,400-3,500}{\frac{500}{\sqrt{50}}}\approx -1.41[/latex]

Statistical software tells us the P-value is 0.082 = 8.2%. Since the P-value is greater than 0.05, we fail to reject the null hypothesis.

Our conclusion: This sample does not suggest that the mean birth weight this year is less than 3,500 grams ( P -value = 0.082). The sample from this year has a mean of 3,400 grams, which is 100 grams lower than the mean in 2010. But this difference is not statistically significant. It can be explained by the chance fluctuation we expect to see in random sampling.
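A brief Python sketch (added for illustration; it assumes SciPy and is not part of the original module) reproduces the T statistic and the P-value reported by the statistical software.

```python
from math import sqrt
from scipy import stats

x_bar, mu_0, s, n, alpha = 3400, 3500, 500, 50, 0.05

t = (x_bar - mu_0) / (s / sqrt(n))   # about -1.41
p_value = stats.t.cdf(t, n - 1)      # left-tail area, about 0.082

print(round(t, 2), round(p_value, 3))
print("reject H0" if p_value <= alpha else "fail to reject H0")
```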

What Does the P-Value of 0.082 Tell Us?

A simulation can help us understand the P-value. In a simulation, we assume that the population mean is 3,500 grams. This is the null hypothesis. We assume the null hypothesis is true and select 1,000 random samples from a population with a mean of 3,500 grams. The mean of the sampling distribution is at 3,500 (as predicted by the null hypothesis.) We see this in the simulated sampling distribution.

If the mean = 3,500 then 86 out of the 1,000 random samples have a sample mean less than 3,400. This is 0.086 = 8.6%

In the simulation, we can see that about 8.6% of the samples have a mean less than 3,400. Since probability is the relative frequency of an event in the long run, we say there is an 8.6% chance that a random sample of 50 babies has a mean less than 3,400 if the population mean is 3,500. We can see that the corresponding area to the left of T = −1.41 in the T-model (with df = 49) also gives us a good estimate of the probability. This area is the P-value, about 8.2%.

If we generalize this statement, we say the P-value is the probability that random samples have results more extreme than the data if the null hypothesis is true. (By more extreme, we mean further from the value of the parameter, in the direction of the alternative hypothesis.) We can also describe the P-value in terms of T-scores. The P-value is the probability that the test statistic from a random sample has a value more extreme than that associated with the data if the null hypothesis is true.
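The simulation described above is easy to reproduce. The sketch below is an added illustration (it assumes NumPy and that the simulated population is normal with a standard deviation of 500 grams, the value observed in the sample; the random seed is arbitrary): it draws 1,000 samples of 50 babies from a population with mean 3,500 grams and counts how often the sample mean falls below 3,400 grams.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # arbitrary seed for reproducibility

n_samples, n_babies = 1000, 50
weights = rng.normal(loc=3500, scale=500, size=(n_samples, n_babies))
sample_means = weights.mean(axis=1)

proportion_below = (sample_means < 3400).mean()
print(proportion_below)   # roughly 0.08, in line with the simulation's 8.6%
```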

Learn By Doing

What Does a P-Value Mean?

Do women who smoke run the risk of shorter pregnancy and premature birth? The mean pregnancy length is 266 days. We test the following hypotheses.

  • H 0 : μ = 266
  • H a : μ < 266

Suppose a random sample of 40 women who smoke during their pregnancy have a mean pregnancy length of 260 days with a standard deviation of 21 days. The P-value is 0.04.

What probability does the P-value of 0.04 describe?

Review of Type I and Type II Errors

We know that statistical inference is based on probability, so there is always some chance of making a wrong decision. Recall that there are two types of wrong decisions that can be made in hypothesis testing. When we reject a null hypothesis that is true, we commit a type I error. When we fail to reject a null hypothesis that is false, we commit a type II error.

The following table summarizes the logic behind type I and type II errors.

A table that summarizes the logic behind type I and type II errors: if H 0 is true and we reject H 0 (accept H a ), this is a type I error; if H 0 is true and we fail to reject H 0 (not enough evidence to accept H a ), this is a correct decision; if H 0 is false (H a is true) and we reject H 0 (accept H a ), this is a correct decision; if H 0 is false (H a is true) and we fail to reject H 0 (not enough evidence to accept H a ), this is a type II error.

It is possible to have some influence over the likelihoods of committing these errors. But decreasing the chance of a type I error increases the chance of a type II error. We have to decide which error is more serious for a given situation. Sometimes a type I error is more serious. Other times a type II error is more serious. Sometimes neither is serious.

Recall that if the null hypothesis is true, the probability of committing a type I error is α. Why is this? Well, when we choose a level of significance (α), we are choosing a benchmark for rejecting the null hypothesis. If the null hypothesis is true, then the probability that we will reject a true null hypothesis is α. So the smaller α is, the smaller the probability of a type I error.

It is more complicated to calculate the probability of a type II error. The best way to reduce the probability of a type II error is to increase the sample size. But once the sample size is set, larger values of α will decrease the probability of a type II error (while increasing the probability of a type I error).

General Guidelines for Choosing a Level of Significance

  • If the consequences of a type I error are more serious, choose a small level of significance (α).
  • If the consequences of a type II error are more serious, choose a larger level of significance (α). But remember that the level of significance is the probability of committing a type I error.
  • In general, we pick the largest level of significance that we can tolerate as the chance of a type I error.

Let’s return to the investigation of the impact of smoking on pregnancy length.

Recap of the hypothesis test: The mean human pregnancy length is 266 days. We test the following hypotheses.

  • H 0 : μ = 266
  • H a : μ < 266

Let’s Summarize

In this “Hypothesis Test for a Population Mean,” we looked at the four steps of a hypothesis test as they relate to a claim about a population mean.

Step 1: Determine the hypotheses.

  • The hypotheses are claims about the population mean, µ.
  • The null hypothesis is a hypothesis that the mean equals a specific value, µ 0 .
  • When [latex]H_a[/latex] is [latex]\mu \lt \mu_0[/latex] or [latex]\mu \gt \mu_0[/latex], the test is a one-tailed test.
  • When [latex]H_a[/latex] is [latex]\mu \neq \mu_0[/latex], the test is a two-tailed test.

Step 2: Collect the data.

Since the hypothesis test is based on probability, random selection or assignment is essential in data production. Additionally, we need to check whether the t-model is a good fit for the sampling distribution of sample means. To use the t-model, the variable must be normally distributed in the population or the sample size must be more than 30. In practice, it is often impossible to verify that the variable is normally distributed in the population. If this is the case and the sample size is not more than 30, researchers often use the t-model if the sample is not strongly skewed and does not have outliers.

Step 3: Assess the evidence.

  • If a t-model is appropriate, determine the t-test statistic for the data’s sample mean.

[latex]\frac{\text{sample mean}-\text{population mean}}{\text{estimated standard error}}=\frac{\overline{x}-\mu}{s/\sqrt{n}}[/latex]

  • Use the test statistic, together with the alternative hypothesis, to determine the P-value.
  • The P-value is the probability of finding a random sample with a mean at least as extreme as our sample mean, assuming that the null hypothesis is true.
  • As in all hypothesis tests, if the alternative hypothesis is greater than, the P-value is the area to the right of the test statistic. If the alternative hypothesis is less than, the P-value is the area to the left of the test statistic. If the alternative hypothesis is not equal to, the P-value is equal to double the tail area beyond the test statistic.

Step 4: Give the conclusion.

The logic of the hypothesis test is always the same. To state a conclusion about H 0 , we compare the P-value to the significance level, α.

  • If P ≤ α, we reject H 0 . We conclude there is significant evidence in favor of H a .
  • If P > α, we fail to reject H 0 . We conclude the sample does not provide significant evidence in favor of H a .
  • We write the conclusion in the context of the research question. Our conclusion is usually a statement about the alternative hypothesis (we accept H a or fail to accept H a ) and should include the P-value.

Other Hypothesis Testing Notes

  • Remember that the P-value is the probability of seeing a sample mean at least as extreme as the one from the data if the null hypothesis is true. The probability is about the random sample; it is not a “chance” statement about the null or alternative hypothesis.
  • If our test results in rejecting a null hypothesis that is actually true, then it is called a type I error.
  • If our test results in failing to reject a null hypothesis that is actually false, then it is called a type II error.
  • If rejecting a null hypothesis would be very expensive, controversial, or dangerous, then we really want to avoid a type I error. In this case, we would set a strict significance level (a small value of α, such as 0.01).
  • Finally, remember the phrase “garbage in, garbage out.” If the data collection methods are poor, then the results of a hypothesis test are meaningless.
  • Concepts in Statistics. Provided by : Open Learning Initiative. Located at : http://oli.cmu.edu . License : CC BY: Attribution


Hypothesis Testing for Means & Proportions

Lisa Sullivan, PhD

Professor of Biostatistics

Boston University School of Public Health


Introduction

This is the first of three modules that address the second area of statistical inference, hypothesis testing, in which a specific statement or hypothesis is generated about a population parameter, and sample statistics are used to assess the likelihood that the hypothesis is true. The hypothesis is based on available information and the investigator's belief about the population parameters. The process of hypothesis testing involves setting up two competing hypotheses, the null hypothesis and the alternative hypothesis. One selects a random sample (or multiple samples when there are more comparison groups), computes summary statistics and then assesses the likelihood that the sample data support the research or alternative hypothesis. Similar to estimation, the process of hypothesis testing is based on probability theory and the Central Limit Theorem.

This module will focus on hypothesis testing for means and proportions. The next two modules in this series will address analysis of variance and chi-squared tests. 

Learning Objectives

After completing this module, the student will be able to:

  • Define null and research hypothesis, test statistic, level of significance and decision rule
  • Distinguish between Type I and Type II errors and discuss the implications of each
  • Explain the difference between one and two sided tests of hypothesis
  • Estimate and interpret p-values
  • Explain the relationship between confidence interval estimates and p-values in drawing inferences
  • Differentiate hypothesis testing procedures based on type of outcome variable and number of samples

Introduction to Hypothesis Testing

Techniques for Hypothesis Testing

The techniques for hypothesis testing depend on

  • the type of outcome variable being analyzed (continuous, dichotomous, discrete)
  • the number of comparison groups in the investigation
  • whether the comparison groups are independent (i.e., physically separate such as men versus women) or dependent (i.e., matched or paired such as pre- and post-assessments on the same participants).

In estimation we focused explicitly on techniques for one and two samples and discussed estimation for a specific parameter (e.g., the mean or proportion of a population), for differences (e.g., difference in means, the risk difference) and ratios (e.g., the relative risk and odds ratio). Here we will focus on procedures for one and two samples when the outcome is either continuous (and we focus on means) or dichotomous (and we focus on proportions).

General Approach: A Simple Example

The Centers for Disease Control (CDC) reported on trends in weight, height and body mass index from the 1960's through 2002. 1 The general trend was that Americans were much heavier and slightly taller in 2002 as compared to 1960; both men and women gained approximately 24 pounds, on average, between 1960 and 2002.   In 2002, the mean weight for men was reported at 191 pounds. Suppose that an investigator hypothesizes that weights are even higher in 2006 (i.e., that the trend continued over the subsequent 4 years). The research hypothesis is that the mean weight in men in 2006 is more than 191 pounds. The null hypothesis is that there is no change in weight, and therefore the mean weight is still 191 pounds in 2006.  

Null Hypothesis

H 0 : μ = 191         (no change)

Research Hypothesis

H 1 : μ > 191         (investigator's belief)

In order to test the hypotheses, we select a random sample of American males in 2006 and measure their weights. Suppose we have resources available to recruit n=100 men into our sample. We weigh each participant and compute summary statistics on the sample data. Suppose in the sample we determine that the mean weight of the 100 men is 197.1 pounds.

Do the sample data support the null or research hypothesis? The sample mean of 197.1 is numerically higher than 191. However, is this difference more than would be expected by chance? In hypothesis testing, we assume that the null hypothesis holds until proven otherwise. We therefore need to determine the likelihood of observing a sample mean of 197.1 or higher when the true population mean is 191 (i.e., if the null hypothesis is true or under the null hypothesis). We can compute this probability using the Central Limit Theorem.

(Notice that we use the sample standard deviation in computing the Z score. This is generally an appropriate substitution as long as the sample size is large, n > 30.) Thus, there is less than a 1% probability of observing a sample mean as large as 197.1 when the true population mean is 191. Do you think that the null hypothesis is likely true? Based on how unlikely it is to observe a sample mean of 197.1 under the null hypothesis (i.e., <1% probability), we might infer, from our data, that the null hypothesis is probably not true.

Suppose that the sample data had turned out differently. Suppose that we instead observed a sample mean of 192.1 pounds in 2006.

How likely is it to observe a sample mean of 192.1 or higher when the true population mean is 191 (i.e., if the null hypothesis is true)? We can again compute this probability using the Central Limit Theorem.

There is a 33.4% probability of observing a sample mean as large as 192.1 when the true population mean is 191. Do you think that the null hypothesis is likely true?  

Neither of the sample means that we obtained allows us to know with certainty whether the null hypothesis is true or not. However, our computations suggest that, if the null hypothesis were true, the probability of observing a sample mean >197.1 is less than 1%. In contrast, if the null hypothesis were true, the probability of observing a sample mean >192.1 is about 33%. We can't know whether the null hypothesis is true, but the sample that provided a mean value of 197.1 provides much stronger evidence in favor of rejecting the null hypothesis, than the sample that provided a mean value of 192.1. Note that this does not mean that a sample mean of 192.1 indicates that the null hypothesis is true; it just doesn't provide compelling evidence to reject it.

In essence, hypothesis testing is a procedure to compute a probability that reflects the strength of the evidence (based on a given sample) for rejecting the null hypothesis. In hypothesis testing, we determine a threshold or cut-off point (called the critical value) to decide when to believe the null hypothesis and when to believe the research hypothesis. It is important to note that it is possible to observe any sample mean when the null hypothesis is true (in this example, when the true population mean is 191), but some sample means are very unlikely. Based on the two samples above it would seem reasonable to believe the research hypothesis when x̄ = 197.1, but to believe the null hypothesis when x̄ = 192.1. What we need is a threshold value such that if x̄ is above that threshold then we believe that H 1 is true and if x̄ is below that threshold then we believe that H 0 is true. The difficulty in determining a threshold for x̄ is that it depends on the scale of measurement. In this example, the threshold, sometimes called the critical value, might be 195 (i.e., if the sample mean is 195 or more then we believe that H 1 is true and if the sample mean is less than 195 then we believe that H 0 is true). If we were instead assessing an increase in blood pressure over time, the critical value would be different because blood pressures are measured in millimeters of mercury (mmHg) as opposed to in pounds. In the following we will explain how the critical value is determined and how we handle the issue of scale.

First, to address the issue of scale in determining the critical value, we convert our sample data (in particular the sample mean) into a Z score. We know from the module on probability that the center of the Z distribution is zero and extreme values are those that exceed 2 or fall below -2. Z scores above 2 and below -2 represent approximately 5% of all Z values. If the observed sample mean is close to the mean specified in H 0 (here μ = 191), then Z will be close to zero. If the observed sample mean is much larger than the mean specified in H 0 , then Z will be large.

In hypothesis testing, we select a critical value from the Z distribution. This is done by first determining what is called the level of significance, denoted α ("alpha"). What we are doing here is drawing a line at extreme values. The level of significance is the probability that we reject the null hypothesis (in favor of the alternative) when it is actually true and is also called the Type I error rate.

α = Level of significance = P(Type I error) = P(Reject H 0 | H 0 is true).

Because α is a probability, it ranges between 0 and 1. The most commonly used value in the medical literature for α is 0.05, or 5%. Thus, if an investigator selects α=0.05, then they are allowing a 5% probability of incorrectly rejecting the null hypothesis in favor of the alternative when the null is in fact true. Depending on the circumstances, one might choose to use a level of significance of 1% or 10%. For example, if an investigator wanted to reject the null only if there were even stronger evidence than that ensured with α=0.05, they could choose α=0.01 as their level of significance. The typical values for α are 0.01, 0.05 and 0.10, with α=0.05 the most commonly used value.

Suppose in our weight study we select α=0.05. We need to determine the value of Z that holds 5% of the values above it (see below).

Standard normal distribution curve showing an upper tail at z=1.645 where alpha=0.05

The critical value of Z for α =0.05 is Z = 1.645 (i.e., 5% of the distribution is above Z=1.645). With this value we can set up what is called our decision rule for the test. The rule is to reject H 0 if the Z score is 1.645 or more.  
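These critical values come straight from the inverse of the standard normal distribution, so they are easy to verify in code. The sketch below is an added illustration (it assumes SciPy); the same idea produces the tables of critical values shown later in this section.

```python
from scipy import stats

alpha = 0.05

print(stats.norm.ppf(1 - alpha))      # upper-tailed critical value: 1.645
print(stats.norm.ppf(alpha))          # lower-tailed critical value: -1.645
print(stats.norm.ppf(1 - alpha / 2))  # two-tailed critical value: 1.960
```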

With the first sample, the test statistic works out to Z = 2.38.

Because 2.38 > 1.645, we reject the null hypothesis. (The same conclusion can be drawn by comparing the 0.0087 probability of observing a sample mean as extreme as 197.1 to the level of significance of 0.05. If the observed probability is smaller than the level of significance we reject H 0 ). Because the Z score exceeds the critical value, we conclude that the mean weight for men in 2006 is more than 191 pounds, the value reported in 2002. If we observed the second sample (i.e., sample mean =192.1), we would not be able to reject the null hypothesis because the Z score is 0.43 which is not in the rejection region (i.e., the region in the tail end of the curve above 1.645). With the second sample we do not have sufficient evidence (because we set our level of significance at 5%) to conclude that weights have increased. Again, the same conclusion can be reached by comparing probabilities. The probability of observing a sample mean as extreme as 192.1 is 33.4% which is not below our 5% level of significance.

Hypothesis Testing: Upper-, Lower-, and Two-Tailed Tests

The procedure for hypothesis testing is based on the ideas described above. Specifically, we set up competing hypotheses, select a random sample from the population of interest and compute summary statistics. We then determine whether the sample data supports the null or alternative hypotheses. The procedure can be broken down into the following five steps.  

  • Step 1. Set up hypotheses and select the level of significance α.

H 0 : Null hypothesis (no change, no difference);  

H 1 : Research hypothesis (investigator's belief); α =0.05

 

Upper-tailed, Lower-tailed, Two-tailed Tests

The research or alternative hypothesis can take one of three forms. An investigator might believe that the parameter has increased, decreased or changed. For example, an investigator might hypothesize:  

H 1 : μ > μ 0 , where μ 0 is the comparator or null value (e.g., μ 0 = 191 in our example about weight in men in 2006) and an increase is hypothesized; this type of test is called an upper-tailed test. H 1 : μ < μ 0 , where a decrease is hypothesized; this is called a lower-tailed test. H 1 : μ ≠ μ 0 , where a difference is hypothesized; this is called a two-tailed test.

The exact form of the research hypothesis depends on the investigator's belief about the parameter of interest and whether it has possibly increased, decreased or is different from the null value. The research hypothesis is set up by the investigator before any data are collected.

 

  • Step 2. Select the appropriate test statistic.  

The test statistic is a single number that summarizes the sample information.  An example of a test statistic is the Z statistic computed as follows:

Z = ( x̄ - μ 0 ) / ( s / √n )

When the sample size is small, we will use t statistics (just as we did when constructing confidence intervals for small samples). As we present each scenario, alternative test statistics are provided along with conditions for their appropriate use.

  • Step 3.  Set up decision rule.  

The decision rule is a statement that tells under what circumstances to reject the null hypothesis. The decision rule is based on specific values of the test statistic (e.g., reject H 0 if Z > 1.645). The decision rule for a specific test depends on 3 factors: the research or alternative hypothesis, the test statistic and the level of significance. Each is discussed below.

  • The decision rule depends on whether an upper-tailed, lower-tailed, or two-tailed test is proposed. In an upper-tailed test the decision rule has investigators reject H 0 if the test statistic is larger than the critical value. In a lower-tailed test the decision rule has investigators reject H 0 if the test statistic is smaller than the critical value.  In a two-tailed test the decision rule has investigators reject H 0 if the test statistic is extreme, either larger than an upper critical value or smaller than a lower critical value.
  • The exact form of the test statistic is also important in determining the decision rule. If the test statistic follows the standard normal distribution (Z), then the decision rule will be based on the standard normal distribution. If the test statistic follows the t distribution, then the decision rule will be based on the t distribution. The appropriate critical value will be selected from the t distribution again depending on the specific alternative hypothesis and the level of significance.  
  • The third factor is the level of significance. The level of significance which is selected in Step 1 (e.g., α =0.05) dictates the critical value.   For example, in an upper tailed Z test, if α =0.05 then the critical value is Z=1.645.  

The following figures illustrate the rejection regions defined by the decision rule for upper-, lower- and two-tailed Z tests with α=0.05. Notice that the rejection regions are in the upper, lower and both tails of the curves, respectively. The decision rules are written below each figure.

Rejection Region for Upper-Tailed Z Test (H 1 : μ > μ 0 ) with α=0.05

The decision rule is: Reject H 0 if Z > 1.645.

 

 

Critical values of Z for upper-tailed tests:

α        Z
0.10     1.282
0.05     1.645
0.025    1.960
0.010    2.326
0.005    2.576
0.001    3.090
0.0001   3.719

Standard normal distribution with lower tail at -1.645 and alpha=0.05

Rejection Region for Lower-Tailed Z Test (H 1 : μ < μ 0 ) with α =0.05

The decision rule is: Reject H 0 if Z < -1.645.

Critical values of Z for lower-tailed tests:

α        Z
0.10     -1.282
0.05     -1.645
0.025    -1.960
0.010    -2.326
0.005    -2.576
0.001    -3.090
0.0001   -3.719

Standard normal distribution with two tails

Rejection Region for Two-Tailed Z Test (H 1 : μ ≠ μ 0 ) with α =0.05

The decision rule is: Reject H 0 if Z < -1.960 or if Z > 1.960.

Critical values of Z for two-tailed tests:

α        Z
0.20     1.282
0.10     1.645
0.05     1.960
0.010    2.576
0.001    3.291
0.0001   3.819

The complete table of critical values of Z for upper, lower and two-tailed tests can be found in the table of Z values to the right in "Other Resources."

Critical values of t for upper, lower and two-tailed tests can be found in the table of t values in "Other Resources."

  • Step 4. Compute the test statistic.  

Here we compute the test statistic by substituting the observed sample data into the test statistic identified in Step 2.

  • Step 5. Conclusion.  

The final conclusion is made by comparing the test statistic (which is a summary of the information observed in the sample) to the decision rule. The final conclusion will be either to reject the null hypothesis (because the sample data are very unlikely if the null hypothesis is true) or not to reject the null hypothesis (because the sample data are not very unlikely).  

If the null hypothesis is rejected, then an exact significance level is computed to describe the likelihood of observing the sample data assuming that the null hypothesis is true. The exact level of significance is called the p-value and it will be less than the chosen level of significance if we reject H 0 .

Statistical computing packages provide exact p-values as part of their standard output for hypothesis tests. In fact, when using a statistical computing package, the steps outlined above can be abbreviated. The hypotheses (step 1) should always be set up in advance of any analysis and the significance criterion should also be determined (e.g., α =0.05). Statistical computing packages will produce the test statistic (usually reporting the test statistic as t) and a p-value. The investigator can then determine statistical significance using the following: If p < α then reject H 0 .

 

 

We now work through the weight example introduced above using the five-step approach.

  • Step 1. Set up hypotheses and determine level of significance

H 0 : μ = 191   H 1 : μ > 191   α = 0.05

The research hypothesis is that weights have increased, and therefore an upper tailed test is used.

  • Step 2. Select the appropriate test statistic.

Because the sample size is large (n > 30), the appropriate test statistic is

Z = ( x̄ - μ 0 ) / ( s / √n )

  • Step 3. Set up decision rule.  

In this example, we are performing an upper tailed test (H 1 : μ> 191), with a Z test statistic and selected α =0.05.   Reject H 0 if Z > 1.645.

  • Step 4. Compute the test statistic.

We now substitute the sample data into the formula for the test statistic identified in Step 2; the test statistic works out to Z = 2.38.

  • Step 5. Conclusion.

We reject H 0 because 2.38 > 1.645. We have statistically significant evidence at α=0.05 to show that the mean weight in men in 2006 is more than 191 pounds. Because we rejected the null hypothesis, we now approximate the p-value which is the likelihood of observing the sample data if the null hypothesis is true. An alternative definition of the p-value is the smallest level of significance where we can still reject H 0 . In this example, we observed Z=2.38 and for α=0.05, the critical value was 1.645. Because 2.38 exceeded 1.645 we rejected H 0 . In our conclusion we reported a statistically significant increase in mean weight at a 5% level of significance. Using the table of critical values for upper tailed tests, we can approximate the p-value. If we select α=0.025, the critical value is 1.960, and we still reject H 0 because 2.38 > 1.960. If we select α=0.010 the critical value is 2.326, and we still reject H 0 because 2.38 > 2.326. However, if we select α=0.005, the critical value is 2.576, and we cannot reject H 0 because 2.38 < 2.576. Therefore, the smallest α where we still reject H 0 is 0.010. This is the p-value. A statistical computing package would produce a more precise p-value which would be in between 0.005 and 0.010. Here we are approximating the p-value and would report p < 0.010.
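The "more precise p-value" that a statistical computing package would report can be read directly off the standard normal distribution. A short sketch (added for illustration, assuming SciPy; the Z value of 2.38 is the one computed above):

```python
from scipy import stats

z = 2.38                     # test statistic from the weight example
p_value = stats.norm.sf(z)   # upper-tail area beyond Z = 2.38

print(round(p_value, 4))     # about 0.0087, between 0.005 and 0.010 as approximated above
```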

Type I and Type II Errors

In all tests of hypothesis, there are two types of errors that can be committed. The first is called a Type I error and refers to the situation where we incorrectly reject H 0 when in fact it is true. This is also called a false positive result (as we incorrectly conclude that the research hypothesis is true when in fact it is not). When we run a test of hypothesis and decide to reject H 0 (e.g., because the test statistic exceeds the critical value in an upper tailed test) then either we make a correct decision because the research hypothesis is true or we commit a Type I error. The different conclusions are summarized in the table below. Note that we will never know whether the null hypothesis is really true or false (i.e., we will never know which row of the following table reflects reality).

Table - Conclusions in Test of Hypothesis

                    Do Not Reject H 0       Reject H 0
H 0 is True         Correct Decision        Type I Error
H 0 is False        Type II Error           Correct Decision

In the first step of the hypothesis test, we select a level of significance, α, and α = P(Type I error). Because we purposely select a small value for α, we control the probability of committing a Type I error. For example, if we select α=0.05, then when H 0 is true there is only a 5% probability that the test will (incorrectly) reject it. Most investigators are very comfortable with this and are confident when rejecting H 0 that the research hypothesis is true (as it is the more likely scenario when we reject H 0 ).

When we run a test of hypothesis and decide not to reject H 0 (e.g., because the test statistic is below the critical value in an upper tailed test) then either we make a correct decision because the null hypothesis is true or we commit a Type II error. Beta (β) represents the probability of a Type II error and is defined as follows: β=P(Type II error) = P(Do not Reject H 0 | H 0 is false). Unfortunately, we cannot choose β to be small (e.g., 0.05) to control the probability of committing a Type II error because β depends on several factors including the sample size, α, and the research hypothesis. When we do not reject H 0 , it may be very likely that we are committing a Type II error (i.e., failing to reject H 0 when in fact it is false). Therefore, when tests are run and the null hypothesis is not rejected we often make a weak concluding statement allowing for the possibility that we might be committing a Type II error. If we do not reject H 0 , we conclude that we do not have significant evidence to show that H 1 is true. We do not conclude that H 0 is true.


 The most common reason for a Type II error is a small sample size.

Tests with One Sample, Continuous Outcome

Hypothesis testing applications with a continuous outcome variable in a single population are performed according to the five-step procedure outlined above. A key component is setting up the null and research hypotheses. The objective is to compare the mean in a single population to a known mean (μ 0 ). The known value is generally derived from another study or report, for example a study in a similar, but not identical, population or a study performed some years ago. The latter is called a historical control. It is important in setting up the hypotheses in a one sample test that the mean specified in the null hypothesis is a fair and reasonable comparator. This will be discussed in the examples that follow.

Test Statistics for Testing H 0 : μ= μ 0

  • if n > 30: Z = (x̄ - μ 0 )/(s/√n)
  • if n < 30: t = (x̄ - μ 0 )/(s/√n), with df = n - 1

Note that statistical computing packages will use the t statistic exclusively and make the necessary adjustments for comparing the test statistic to appropriate values from probability tables to produce a p-value. 

The National Center for Health Statistics (NCHS) published a report in 2005 entitled Health, United States, containing extensive information on major trends in the health of Americans. Data are provided for the US population as a whole and for specific ages, sexes and races. The NCHS report indicated that in 2002 Americans paid an average of $3,302 per year on health care and prescription drugs. An investigator hypothesizes that expenditures have decreased in 2005, primarily due to the availability of generic drugs. To test the hypothesis, a sample of 100 Americans is selected and their expenditures on health care and prescription drugs in 2005 are measured. The sample data are summarized as follows: n=100, x̄ = $3,190 and s = $890. Is there statistical evidence of a reduction in expenditures on health care and prescription drugs in 2005? Is the sample mean of $3,190 evidence of a true reduction in the mean or is it within chance fluctuation? We will run the test using the five-step approach.

  • Step 1.  Set up hypotheses and determine level of significance

H 0 : μ = 3,302 H 1 : μ < 3,302           α =0.05

The research hypothesis is that expenditures have decreased, and therefore a lower-tailed test is used.

  • Step 2. Select the appropriate test statistic.

Because the sample size is large (n > 30), the appropriate test statistic is Z = (x̄ - μ 0 )/(s/√n).

  • Step 3. Set up decision rule.

This is a lower tailed test, using a Z statistic and a 5% level of significance. Reject H 0 if Z < -1.645.

  •   Step 4. Compute the test statistic.  

Z = (3,190 - 3,302)/(890/√100) = -1.26

  • Step 5. Conclusion.

We do not reject H 0 because -1.26 > -1.645. We do not have statistically significant evidence at α=0.05 to show that the mean expenditures on health care and prescription drugs are lower in 2005 than the mean of $3,302 reported in 2002.

Recall that when we fail to reject H 0 in a test of hypothesis that either the null hypothesis is true (here the mean expenditures in 2005 are the same as those in 2002 and equal to $3,302) or we committed a Type II error (i.e., we failed to reject H 0 when in fact it is false). In summarizing this test, we conclude that we do not have sufficient evidence to reject H 0 . We do not conclude that H 0 is true, because there may be a moderate to high probability that we committed a Type II error. It is possible that the sample size is not large enough to detect a difference in mean expenditures.      
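If a computing package were used for the expenditures example, the calculation would reduce to a few lines. The following Python sketch is illustrative only (the norm_cdf helper is our own); it reproduces the test statistic and the exact lower-tail p-value.

```python
from math import erf, sqrt

def norm_cdf(z):
    # Standard normal CDF (standard library only)
    return 0.5 * (1 + erf(z / sqrt(2)))

n, xbar, s, mu0 = 100, 3190, 890, 3302   # sample data and null value from the example
z = (xbar - mu0) / (s / sqrt(n))         # one-sample Z statistic
p_value = norm_cdf(z)                    # lower-tail area for a lower-tailed test
print(round(z, 2), round(p_value, 3))    # about -1.26 and 0.104: do not reject H0 at alpha = 0.05
```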

The NCHS reported that the mean total cholesterol level in 2002 for all adults was 203. Total cholesterol levels in participants who attended the seventh examination of the Offspring in the Framingham Heart Study are summarized as follows: n=3,310, x̄ =200.3, and s=36.8. Is there statistical evidence of a difference in mean cholesterol levels in the Framingham Offspring?

Here we want to assess whether the sample mean of 200.3 in the Framingham sample is statistically significantly different from 203 (i.e., beyond what we would expect by chance). We will run the test using the five-step approach.

H 0 : μ= 203 H 1 : μ≠ 203                       α=0.05

The research hypothesis is that cholesterol levels are different in the Framingham Offspring, and therefore a two-tailed test is used.

  •   Step 3. Set up decision rule.  

This is a two-tailed test, using a Z statistic and a 5% level of significance. Reject H 0 if Z < -1.960 or if Z > 1.960.

Substituting the sample data, Z = (200.3 - 203)/(36.8/√3,310) = -4.22. We reject H 0 because -4.22 < -1.960. We have statistically significant evidence at α=0.05 to show that the mean total cholesterol level in the Framingham Offspring is different from the national average of 203 reported in 2002. Because we reject H 0 , we also approximate a p-value. Using the two-sided significance levels, p < 0.0001.
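For the two-tailed cholesterol test, a package would report the exact p-value by doubling the tail area beyond |Z|. A minimal Python sketch of that calculation is shown below (the norm_cdf helper is our own).

```python
from math import erf, sqrt

def norm_cdf(z):
    # Standard normal CDF (standard library only)
    return 0.5 * (1 + erf(z / sqrt(2)))

n, xbar, s, mu0 = 3310, 200.3, 36.8, 203
z = (xbar - mu0) / (s / sqrt(n))     # about -4.22
p_value = 2 * norm_cdf(-abs(z))      # two-tailed p-value: sum of both tail areas
print(round(z, 2), p_value)          # roughly 2.4e-05, i.e., p < 0.0001
```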

Statistical Significance versus Clinical (Practical) Significance

This example raises an important concept of statistical versus clinical or practical significance. From a statistical standpoint, the total cholesterol levels in the Framingham sample are highly statistically significantly different from the national average with p < 0.0001 (i.e., if the null hypothesis were true, there would be less than a 0.01% chance of observing a sample mean this far from 203). However, the sample mean in the Framingham Offspring study is 200.3, less than 3 units different from the national mean of 203. The reason that the data are so highly statistically significant is the very large sample size. It is always important to assess both statistical and clinical significance of data. This is particularly relevant when the sample size is large. Is a 3 unit difference in total cholesterol a meaningful difference?

Consider again the NCHS-reported mean total cholesterol level in 2002 for all adults of 203. Suppose a new drug is proposed to lower total cholesterol. A study is designed to evaluate the efficacy of the drug in lowering cholesterol.   Fifteen patients are enrolled in the study and asked to take the new drug for 6 weeks. At the end of 6 weeks, each patient's total cholesterol level is measured and the sample statistics are as follows:   n=15, x̄ =195.9 and s=28.7. Is there statistical evidence of a reduction in mean total cholesterol in patients after using the new drug for 6 weeks? We will run the test using the five-step approach. 

H 0 : μ= 203 H 1 : μ< 203                   α=0.05

  •  Step 2. Select the appropriate test statistic.  

Because the sample size is small (n < 30), the appropriate test statistic is t = (x̄ - μ 0 )/(s/√n).

This is a lower tailed test, using a t statistic and a 5% level of significance. In order to determine the critical value of t, we need the degrees of freedom, df, defined as df = n - 1. In this example df = 15 - 1 = 14. The critical value for a lower tailed test with df=14 and α=0.05 is -1.761 and the decision rule is as follows: Reject H 0 if t < -1.761.

Substituting the sample data, t = (195.9 - 203)/(28.7/√15) = -0.96. We do not reject H 0 because -0.96 > -1.761. We do not have statistically significant evidence at α=0.05 to show that the mean total cholesterol level is lower than the national mean in patients taking the new drug for 6 weeks. Again, because we failed to reject the null hypothesis we make a weaker concluding statement allowing for the possibility that we may have committed a Type II error (i.e., failed to reject H 0 when in fact the drug is efficacious).
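Because this test uses the t distribution, the exact p-value and critical value require a t table or software. The following sketch assumes SciPy is available; the variable names are ours.

```python
from math import sqrt
from scipy import stats   # SciPy is assumed to be available

n, xbar, s, mu0 = 15, 195.9, 28.7, 203
t_stat = (xbar - mu0) / (s / sqrt(n))       # about -0.96
t_crit = stats.t.ppf(0.05, df=n - 1)        # lower-tail critical value, about -1.76
p_value = stats.t.cdf(t_stat, df=n - 1)     # lower-tail p-value, about 0.18
print(round(t_stat, 2), round(t_crit, 3), round(p_value, 2))
```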


This example raises an important issue in terms of study design. In this example we assume in the null hypothesis that the mean cholesterol level is 203. This is taken to be the mean cholesterol level in patients without treatment. Is this an appropriate comparator? Alternative and potentially more efficient study designs to evaluate the effect of the new drug could involve two treatment groups, where one group receives the new drug and the other does not, or we could measure each patient's baseline or pre-treatment cholesterol level and then assess changes from baseline to 6 weeks post-treatment. These designs are also discussed here.

Video - Comparing a Sample Mean to Known Population Mean (8:20)


Tests with One Sample, Dichotomous Outcome

Hypothesis testing applications with a dichotomous outcome variable in a single population are also performed according to the five-step procedure. Similar to tests for means, a key component is setting up the null and research hypotheses. The objective is to compare the proportion of successes in a single population to a known proportion (p 0 ). That known proportion is generally derived from another study or report and is sometimes called a historical control. It is important in setting up the hypotheses in a one sample test that the proportion specified in the null hypothesis is a fair and reasonable comparator.    

In one sample tests for a dichotomous outcome, we set up our hypotheses against an appropriate comparator. We select a sample and compute descriptive statistics on the sample data. Specifically, we compute the sample size (n) and the sample proportion, which is computed by taking the ratio of the number of successes to the sample size: p̂ = x/n.

We then determine the appropriate test statistic (Step 2) for the hypothesis test. The formula for the test statistic is given below.

Test Statistic for Testing H 0 : p = p 0

Z = ( p̂ - p 0 )/√(p 0 (1-p 0 )/n), if min(np 0 , n(1-p 0 )) > 5

The formula above is appropriate for large samples, defined when the smaller of np 0 and n(1-p 0 ) is at least 5. This is similar, but not identical, to the condition required for appropriate use of the confidence interval formula for a population proportion, i.e., min(n p̂ , n(1- p̂ )) > 5.

Here we use the proportion specified in the null hypothesis as the true proportion of successes rather than the sample proportion. If we fail to satisfy the condition, then alternative procedures, called exact methods, must be used to test the hypothesis about the population proportion.

Example:  

The NCHS report indicated that in 2002 the prevalence of cigarette smoking among American adults was 21.1%.  Data on prevalent smoking in n=3,536 participants who attended the seventh examination of the Offspring in the Framingham Heart Study indicated that 482/3,536 = 13.6% of the respondents were currently smoking at the time of the exam. Suppose we want to assess whether the prevalence of smoking is lower in the Framingham Offspring sample given the focus on cardiovascular health in that community. Is there evidence of a statistically lower prevalence of smoking in the Framingham Offspring study as compared to the prevalence among all Americans?

H 0 : p = 0.211 H 1 : p < 0.211                     α=0.05

We must first check that the sample size is adequate. Specifically, we need to check min(np 0 , n(1-p 0 )) = min(3,536(0.211), 3,536(1-0.211)) = min(746, 2790) = 746. The sample size is more than adequate so the following formula can be used: Z = ( p̂ - p 0 )/√(p 0 (1-p 0 )/n).

This is a lower tailed test, using a Z statistic and a 5% level of significance. Reject H 0 if Z < -1.645.

Substituting, the sample proportion is p̂ = 482/3,536 = 0.136, and Z = (0.136 - 0.211)/√(0.211(0.789)/3,536) = -10.93. We reject H 0 because -10.93 < -1.645. We have statistically significant evidence at α=0.05 to show that the prevalence of smoking in the Framingham Offspring is lower than the prevalence nationally (21.1%). Here, p < 0.0001.
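The one-sample test for a proportion can be reproduced in a few lines. The sketch below is illustrative Python (the norm_cdf helper is our own); it recomputes the sample proportion, the Z statistic, and the lower-tail p-value for the smoking example.

```python
from math import erf, sqrt

def norm_cdf(z):
    # Standard normal CDF (standard library only)
    return 0.5 * (1 + erf(z / sqrt(2)))

n, x, p0 = 3536, 482, 0.211
p_hat = x / n                                  # about 0.136
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)     # about -10.9
p_value = norm_cdf(z)                          # lower-tail p-value, far below 0.0001
print(round(p_hat, 3), round(z, 2), p_value)
```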

The NCHS report indicated that in 2002, 75% of children aged 2 to 17 saw a dentist in the past year. An investigator wants to assess whether use of dental services is similar in children living in the city of Boston. A sample of 125 children aged 2 to 17 living in Boston are surveyed and 64 reported seeing a dentist over the past 12 months. Is there a significant difference in use of dental services between children living in Boston and the national data?

Calculate this on your own before checking the answer.

Video - Hypothesis Test for One Sample and a Dichotomous Outcome (3:55)

Tests with Two Independent Samples, Continuous Outcome

There are many applications where it is of interest to compare two independent groups with respect to their mean scores on a continuous outcome. Here we compare means between groups, but rather than generating an estimate of the difference, we will test whether the observed difference (increase, decrease or difference) is statistically significant or not. Remember, that hypothesis testing gives an assessment of statistical significance, whereas estimation gives an estimate of effect and both are important.

Here we discuss the comparison of means when the two comparison groups are independent or physically separate. The two groups might be determined by a particular attribute (e.g., sex, diagnosis of cardiovascular disease) or might be set up by the investigator (e.g., participants assigned to receive an experimental treatment or placebo). The first step in the analysis involves computing descriptive statistics on each of the two samples. Specifically, we compute the sample size, mean and standard deviation in each sample and we denote these summary statistics as follows:

for sample 1: n 1 , x̄ 1 and s 1

for sample 2: n 2 , x̄ 2 and s 2

The designation of sample 1 and sample 2 is arbitrary. In a clinical trial setting the convention is to call the treatment group 1 and the control group 2. However, when comparing men and women, for example, either group can be 1 or 2.  

In the two independent samples application with a continuous outcome, the parameter of interest in the test of hypothesis is the difference in population means, μ 1 - μ 2 . The null hypothesis is always that there is no difference between groups with respect to means, i.e., H 0 : μ 1 - μ 2 = 0.

The null hypothesis can also be written as follows: H 0 : μ 1 = μ 2 . In the research hypothesis, an investigator can hypothesize that the first mean is larger than the second (H 1 : μ 1 > μ 2 ), that the first mean is smaller than the second (H 1 : μ 1 < μ 2 ), or that the means are different (H 1 : μ 1 ≠ μ 2 ). The three different alternatives represent upper-, lower-, and two-tailed tests, respectively. The following test statistics are used to test these hypotheses.

Test Statistics for Testing H 0 : μ 1 = μ 2

  • if n 1 > 30 and n 2 > 30: Z = (x̄ 1 - x̄ 2 )/(Sp √(1/n 1 + 1/n 2 ))
  • if n 1 < 30 or n 2 < 30: t = (x̄ 1 - x̄ 2 )/(Sp √(1/n 1 + 1/n 2 )), with df = n 1 + n 2 - 2

NOTE: The formulas above assume equal variability in the two populations (i.e., the population variances are equal, or σ 1 2 = σ 2 2 ). This means that the outcome is equally variable in each of the comparison populations. For analysis, we have samples from each of the comparison populations. If the sample variances are similar, then the assumption about variability in the populations is probably reasonable. As a guideline, if the ratio of the sample variances, s 1 2 /s 2 2 , is between 0.5 and 2 (i.e., if one variance is no more than double the other), then the formulas above are appropriate. If the ratio of the sample variances is greater than 2 or less than 0.5, then alternative formulas must be used to account for the heterogeneity in variances.

The test statistics include Sp, which is the pooled estimate of the common standard deviation (again assuming that the variances in the populations are similar), computed as the weighted average of the standard deviations in the samples as follows: Sp = √(((n 1 -1)s 1 2 + (n 2 -1)s 2 2 )/(n 1 + n 2 - 2)).

Because we are assuming equal variances between groups, we pool the information on variability (sample variances) to generate an estimate of the variability in the population. (Note: Because Sp is a weighted average of the standard deviations in the samples, Sp will always be in between s 1 and s 2 .)

Data measured on n=3,539 participants who attended the seventh examination of the Offspring in the Framingham Heart Study are shown below.  

 

Characteristic              Men (n)    Men (x̄)    Men (s)    Women (n)    Women (x̄)    Women (s)
Systolic Blood Pressure     1,623      128.2       17.5       1,911        126.5         20.1
Diastolic Blood Pressure    1,622      75.6        9.8        1,910        72.6          9.7
Total Serum Cholesterol     1,544      192.4       35.2       1,766        207.1         36.7
Weight                      1,612      194.0       33.8       1,894        157.7         34.6
Height                      1,545      68.9        2.7        1,781        63.4          2.5
Body Mass Index             1,545      28.8        4.6        1,781        27.6          5.9

Suppose we now wish to assess whether there is a statistically significant difference in mean systolic blood pressures between men and women using a 5% level of significance.  

H 0 : μ 1 = μ 2

H 1 : μ 1 ≠ μ 2                       α=0.05

Because both samples are large (n > 30), we can use the Z test statistic as opposed to t. Note that statistical computing packages use t throughout. Before implementing the formula, we first check whether the assumption of equality of population variances is reasonable. The guideline suggests investigating the ratio of the sample variances, s 1 2 /s 2 2 . Suppose we call the men group 1 and the women group 2. Again, this is arbitrary; it only needs to be noted when interpreting the results. The ratio of the sample variances is 17.5 2 /20.1 2 = 0.76, which falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable. The appropriate test statistic is Z = (x̄ 1 - x̄ 2 )/(Sp √(1/n 1 + 1/n 2 )).

We now substitute the sample data into the formula for the test statistic identified in Step 2. Before substituting, we will first compute Sp, the pooled estimate of the common standard deviation: Sp = √(((1,623-1)17.5 2 + (1,911-1)20.1 2 )/(1,623 + 1,911 - 2)) = √359.12 = 19.0.

Notice that the pooled estimate of the common standard deviation, Sp, falls in between the standard deviations in the comparison groups (i.e., 17.5 and 20.1). Sp is slightly closer in value to the standard deviation in the women (20.1) as there were slightly more women in the sample. Recall, Sp is a weighted average of the standard deviations in the comparison groups, weighted by the respective sample sizes.

Now the test statistic: Z = (128.2 - 126.5)/(19.0 √(1/1,623 + 1/1,911)) = 1.7/0.64 = 2.66.

We reject H 0 because 2.66 > 1.960. We have statistically significant evidence at α=0.05 to show that there is a difference in mean systolic blood pressures between men and women. The p-value is p < 0.010.  

Here again we find that there is a statistically significant difference in mean systolic blood pressures between men and women at p < 0.010. Notice that there is a very small difference in the sample means (128.2 - 126.5 = 1.7 units), but this difference is beyond what would be expected by chance. Is this a clinically meaningful difference? The large sample size in this example is driving the statistical significance. A 95% confidence interval for the difference in mean systolic blood pressures is 1.7 ± 1.26, or (0.44, 2.96). The confidence interval provides an assessment of the magnitude of the difference between means, whereas the test of hypothesis and p-value provide an assessment of the statistical significance of the difference.
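A short calculation confirms the pooled standard deviation and the two-sample Z statistic for the systolic blood pressure comparison. The Python sketch below is illustrative only; the norm_cdf helper is our own.

```python
from math import erf, sqrt

def norm_cdf(z):
    # Standard normal CDF (standard library only)
    return 0.5 * (1 + erf(z / sqrt(2)))

n1, x1, s1 = 1623, 128.2, 17.5   # men (group 1)
n2, x2, s2 = 1911, 126.5, 20.1   # women (group 2)
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))   # pooled SD, about 19.0
z = (x1 - x2) / (sp * sqrt(1 / n1 + 1 / n2))                       # about 2.66
p_value = 2 * norm_cdf(-abs(z))                                    # two-tailed p-value, about 0.008
print(round(sp, 1), round(z, 2), round(p_value, 3))
```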

Above we performed a study to evaluate a new drug designed to lower total cholesterol. The study involved one sample of patients, each patient took the new drug for 6 weeks and had their cholesterol measured. As a means of evaluating the efficacy of the new drug, the mean total cholesterol following 6 weeks of treatment was compared to the NCHS-reported mean total cholesterol level in 2002 for all adults of 203. At the end of the example, we discussed the appropriateness of the fixed comparator as well as an alternative study design to evaluate the effect of the new drug involving two treatment groups, where one group receives the new drug and the other does not. Here, we revisit the example with a concurrent or parallel control group, which is very typical in randomized controlled trials or clinical trials (refer to the EP713 module on Clinical Trials).  

A new drug is proposed to lower total cholesterol. A randomized controlled trial is designed to evaluate the efficacy of the medication in lowering cholesterol. Thirty participants are enrolled in the trial and are randomly assigned to receive either the new drug or a placebo. The participants do not know which treatment they are assigned. Each participant is asked to take the assigned treatment for 6 weeks. At the end of 6 weeks, each patient's total cholesterol level is measured and the sample statistics are as follows.

Treatment      n     x̄        s
New Drug       15    195.9     28.7
Placebo        15    227.4     30.3

Is there statistical evidence of a reduction in mean total cholesterol in patients taking the new drug for 6 weeks as compared to participants taking placebo? We will run the test using the five-step approach.

H 0 : μ 1 = μ 2 H 1 : μ 1 < μ 2                         α=0.05

Because both samples are small (n < 30), we use the t test statistic. Before implementing the formula, we first check whether the assumption of equality of population variances is reasonable. The ratio of the sample variances is s 1 2 /s 2 2 = 28.7 2 /30.3 2 = 0.90, which falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable. The appropriate test statistic is t = (x̄ 1 - x̄ 2 )/(Sp √(1/n 1 + 1/n 2 )).

This is a lower-tailed test, using a t statistic and a 5% level of significance. The appropriate critical value can be found in a t table. In order to determine the critical value of t we need the degrees of freedom, df, defined as df = n 1 + n 2 - 2 = 15 + 15 - 2 = 28. The critical value for a lower tailed test with df=28 and α=0.05 is -1.701 and the decision rule is: Reject H 0 if t < -1.701.

Now the test statistic: Sp = √(((15-1)28.7 2 + (15-1)30.3 2 )/(15 + 15 - 2)) = √870.89 = 29.5, and t = (195.9 - 227.4)/(29.5 √(1/15 + 1/15)) = -31.5/10.77 = -2.92.

We reject H 0 because -2.92 < -1.701. We have statistically significant evidence at α=0.05 to show that the mean total cholesterol level is lower in patients taking the new drug for 6 weeks as compared to patients taking placebo, p < 0.005.

The clinical trial in this example finds a statistically significant reduction in total cholesterol, whereas in the previous example where we had a historical control (as opposed to a parallel control group) we did not demonstrate efficacy of the new drug. Notice that the mean total cholesterol level in patients taking placebo is 227.4, which is very different from the mean cholesterol reported among all Americans in 2002 of 203 and used as the comparator in the prior example. The historical control value may not have been the most appropriate comparator as cholesterol levels have been increasing over time. In the next section, we present another design that can be used to assess the efficacy of the new drug.
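The two-sample t test for the trial can be verified with a few lines of code. The sketch below assumes SciPy is available for the t distribution; a t table gives the same conclusion.

```python
from math import sqrt
from scipy import stats   # SciPy assumed available

n1, x1, s1 = 15, 195.9, 28.7   # new drug (group 1)
n2, x2, s2 = 15, 227.4, 30.3   # placebo (group 2)
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))   # pooled SD, about 29.5
t_stat = (x1 - x2) / (sp * sqrt(1 / n1 + 1 / n2))                  # about -2.92
df = n1 + n2 - 2
p_value = stats.t.cdf(t_stat, df=df)                               # lower-tail p-value, about 0.003
print(round(sp, 1), round(t_stat, 2), round(p_value, 4))
```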

Video - Comparison of Two Independent Samples With a Continuous Outcome (8:02)

Tests with Matched Samples, Continuous Outcome

In the previous section we compared two groups with respect to their mean scores on a continuous outcome. An alternative study design is to compare matched or paired samples. The two comparison groups are said to be dependent, and the data can arise from a single sample of participants where each participant is measured twice (possibly before and after an intervention) or from two samples that are matched on specific characteristics (e.g., siblings). When the samples are dependent, we focus on difference scores in each participant or between members of a pair and the test of hypothesis is based on the mean difference, μ d . The null hypothesis again reflects "no difference" and is stated as H 0 : μ d =0 . Note that there are some instances where it is of interest to test whether there is a difference of a particular magnitude (e.g., μ d =5) but in most instances the null hypothesis reflects no difference (i.e., μ d =0).  

The appropriate formula for the test of hypothesis depends on the sample size. The formulas are shown below and are identical to those presented for estimating the mean of a single sample (e.g., when comparing against an external or historical control), except here we focus on difference scores.

Test Statistics for Testing H 0 : μ d = 0

  • if n > 30: Z = x̄ d /(s d /√n)
  • if n < 30: t = x̄ d /(s d /√n), with df = n - 1

A new drug is proposed to lower total cholesterol and a study is designed to evaluate the efficacy of the drug in lowering cholesterol. Fifteen patients agree to participate in the study and each is asked to take the new drug for 6 weeks. However, before starting the treatment, each patient's total cholesterol level is measured. The initial measurement is a pre-treatment or baseline value. After taking the drug for 6 weeks, each patient's total cholesterol level is measured again and the data are shown below. The rightmost column contains difference scores for each patient, computed by subtracting the 6 week cholesterol level from the baseline level. The differences represent the reduction in total cholesterol over 6 weeks. (The differences could have been computed by subtracting the baseline total cholesterol level from the level measured at 6 weeks. The way in which the differences are computed does not affect the outcome of the analysis, only the interpretation.)

Patient    Baseline    6 Weeks    Difference
1          215         205        10
2          190         156        34
3          230         190        40
4          220         180        40
5          214         201        13
6          240         227        13
7          210         197        13
8          193         173        20
9          210         204        6
10         230         217        13
11         180         142        38
12         260         262        -2
13         210         207        3
14         190         184        6
15         200         193        7

Because the differences are computed by subtracting the cholesterols measured at 6 weeks from the baseline values, positive differences indicate reductions and negative differences indicate increases (e.g., participant 12 increases by 2 units over 6 weeks). The goal here is to test whether there is a statistically significant reduction in cholesterol. Because of the way in which we computed the differences, we want to look for an increase in the mean difference (i.e., a positive reduction). In order to conduct the test, we need to summarize the differences. In this sample, we have n = 15, x̄ d = 16.9 and s d = 14.2.

The calculations are shown below.  

Patient    Difference    Difference²
1          10            100
2          34            1156
3          40            1600
4          40            1600
5          13            169
6          13            169
7          13            169
8          20            400
9          6             36
10         13            169
11         38            1444
12         -2            4
13         3             9
14         6             36
15         7             49
Total      254           7110

Is there statistical evidence of a reduction in mean total cholesterol in patients after using the new medication for 6 weeks? We will run the test using the five-step approach.

H 0 : μ d = 0 H 1 : μ d > 0                 α=0.05

NOTE: If we had computed differences by subtracting the baseline level from the level measured at 6 weeks then negative differences would have reflected reductions and the research hypothesis would have been H 1 : μ d < 0. 

  • Step 2. Select the appropriate test statistic.

Because the sample size is small (n < 30), the appropriate test statistic is t = x̄ d /(s d /√n).

This is an upper-tailed test, using a t statistic and a 5% level of significance. The appropriate critical value can be found in a t table, with df = 15 - 1 = 14. The critical value for an upper-tailed test with df=14 and α=0.05 is 1.761 and the decision rule is: Reject H 0 if t > 1.761.

We now substitute the sample data into the formula for the test statistic identified in Step 2: t = 16.9/(14.2/√15) = 4.61.

We reject H 0 because 4.61 > 1.761. We have statistically significant evidence at α=0.05 to show that there is a reduction in cholesterol levels over 6 weeks.
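The paired analysis can be reproduced directly from the difference scores. The following sketch assumes SciPy is available for the t distribution; small differences from the hand calculation above are due to rounding.

```python
from math import sqrt
from statistics import mean, stdev
from scipy import stats   # SciPy assumed available

# Difference scores (baseline minus 6 weeks) from the table above
d = [10, 34, 40, 40, 13, 13, 13, 20, 6, 13, 38, -2, 3, 6, 7]
n = len(d)
d_bar, s_d = mean(d), stdev(d)                 # about 16.9 and 14.2
t_stat = d_bar / (s_d / sqrt(n))               # about 4.6
t_crit = stats.t.ppf(0.95, df=n - 1)           # upper-tail critical value, about 1.76
p_value = 1 - stats.t.cdf(t_stat, df=n - 1)    # upper-tail p-value, well below 0.001
print(round(d_bar, 1), round(s_d, 1), round(t_stat, 2), round(t_crit, 2), round(p_value, 4))
```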

Here we illustrate the use of a matched design to test the efficacy of a new drug to lower total cholesterol. We also considered a parallel design (randomized clinical trial) and a study using a historical comparator. It is extremely important to design studies that are best suited to detect a meaningful difference when one exists. There are often several alternatives and investigators work with biostatisticians to determine the best design for each application. It is worth noting that the matched design used here can be problematic in that observed differences may only reflect a "placebo" effect. All participants took the assigned medication, but is the observed reduction attributable to the medication or a result of their participation in the study?

Video - Hypothesis Testing With a Matched Sample and a Continuous Outcome (3:11)

Tests with Two Independent Samples, Dichotomous Outcome

There are several approaches that can be used to test hypotheses concerning two independent proportions. Here we present one approach - the chi-square test of independence is an alternative, equivalent, and perhaps more popular approach to the same analysis. Hypothesis testing with the chi-square test is addressed in the third module in this series: BS704_HypothesisTesting-ChiSquare.

In tests of hypothesis comparing proportions between two independent groups, one test is performed and results can be interpreted to apply to a risk difference, relative risk or odds ratio. As a reminder, the risk difference is computed by taking the difference in proportions between comparison groups, the risk ratio is computed by taking the ratio of proportions, and the odds ratio is computed by taking the ratio of the odds of success in the comparison groups. Because the null values for the risk difference, the risk ratio and the odds ratio are different, the hypotheses in tests of hypothesis look slightly different depending on which measure is used. When performing tests of hypothesis for the risk difference, relative risk or odds ratio, the convention is to label the exposed or treated group 1 and the unexposed or control group 2.      

For example, suppose a study is designed to assess whether there is a significant difference in proportions in two independent comparison groups. The test of interest is as follows:

H 0 : p 1 = p 2 versus H 1 : p 1 ≠ p 2 .  

The following are the hypothesis for testing for a difference in proportions using the risk difference, the risk ratio and the odds ratio. First, the hypotheses above are equivalent to the following:

  • For the risk difference, H 0 : p 1 - p 2 = 0 versus H 1 : p 1 - p 2 ≠ 0 which are, by definition, equal to H 0 : RD = 0 versus H 1 : RD ≠ 0.
  • If an investigator wants to focus on the risk ratio, the equivalent hypotheses are H 0 : RR = 1 versus H 1 : RR ≠ 1.
  • If the investigator wants to focus on the odds ratio, the equivalent hypotheses are H 0 : OR = 1 versus H 1 : OR ≠ 1.  

Suppose a test is performed to test H 0 : RD = 0 versus H 1 : RD ≠ 0 and the test rejects H 0 at α=0.05. Based on this test we can conclude that there is significant evidence, at α=0.05, of a difference in proportions: significant evidence that the risk difference is not zero, and significant evidence that the risk ratio and odds ratio are not one. The risk difference is analogous to the difference in means when the outcome is continuous. Here the parameter of interest is the difference in proportions in the population, RD = p 1 - p 2 , and the null value for the risk difference is zero. In a test of hypothesis for the risk difference, the null hypothesis is always H 0 : RD = 0. This is equivalent to H 0 : RR = 1 and H 0 : OR = 1. In the research hypothesis, an investigator can hypothesize that the first proportion is larger than the second (H 1 : p 1 > p 2 , which is equivalent to H 1 : RD > 0, H 1 : RR > 1 and H 1 : OR > 1), that the first proportion is smaller than the second (H 1 : p 1 < p 2 , which is equivalent to H 1 : RD < 0, H 1 : RR < 1 and H 1 : OR < 1), or that the proportions are different (H 1 : p 1 ≠ p 2 , which is equivalent to H 1 : RD ≠ 0, H 1 : RR ≠ 1 and H 1 : OR ≠ 1). The three different alternatives represent upper-, lower- and two-tailed tests, respectively.

The formula for the test of hypothesis for the difference in proportions is given below.

Test Statistics for Testing H 0 : p 1 = p 2

Z = ( p̂ 1 - p̂ 2 )/√( p̂ (1- p̂ )(1/n 1 + 1/n 2 )), where p̂ is the overall (pooled) proportion of successes.

The formula above is appropriate for large samples, defined as at least 5 successes (n p̂ > 5) and at least 5 failures (n(1- p̂ ) > 5) in each of the two samples. If there are fewer than 5 successes or failures in either comparison group, then alternative procedures, called exact methods, must be used to estimate the difference in population proportions.

The following table summarizes data from n=3,799 participants who attended the fifth examination of the Offspring in the Framingham Heart Study. The outcome of interest is prevalent CVD and we want to test whether the prevalence of CVD is significantly higher in smokers as compared to non-smokers.

 

                  Free of CVD    History of CVD    Total
Non-Smoker        2,757          298               3,055
Current Smoker    663            81                744
Total             3,420          379               3,799

The prevalence of CVD (or proportion of participants with prevalent CVD) among non-smokers is 298/3,055 = 0.0975 and the prevalence of CVD among current smokers is 81/744 = 0.1089. Here smoking status defines the comparison groups and we will call the current smokers group 1 (exposed) and the non-smokers (unexposed) group 2. The test of hypothesis is conducted below using the five step approach.

H 0 : p 1 = p 2     H 1 : p 1 ≠ p 2                 α=0.05

  • Step 2.  Select the appropriate test statistic.  

We must first check that the sample size is adequate. Specifically, we need to ensure that we have at least 5 successes and 5 failures in each comparison group. In this example, we have more than enough successes (cases of prevalent CVD) and failures (persons free of CVD) in each comparison group. The sample size is more than adequate so the following formula can be used: Z = ( p̂ 1 - p̂ 2 )/√( p̂ (1- p̂ )(1/n 1 + 1/n 2 )).

Reject H 0 if Z < -1.960 or if Z > 1.960.

We now substitute the sample data into the formula for the test statistic identified in Step 2. We first compute the overall proportion of successes: p̂ = 379/3,799 = 0.0998.

We now substitute to compute the test statistic.

  • Step 5. Conclusion.

We do not reject H 0 because -1.960 < 0.927 < 1.960. We do not have statistically significant evidence at α=0.05 to show that there is a difference in prevalent CVD between smokers and non-smokers.  

A 95% confidence interval for the difference in prevalent CVD (or risk difference) between smokers and non-smokers is 0.0114 ± 0.0247, or between -0.0133 and 0.0361. Because the 95% confidence interval for the risk difference includes zero, we again conclude that there is no statistically significant difference in prevalent CVD between smokers and non-smokers.
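The test for two independent proportions can be verified as follows. The Python sketch below is illustrative; the norm_cdf helper is our own.

```python
from math import erf, sqrt

def norm_cdf(z):
    # Standard normal CDF (standard library only)
    return 0.5 * (1 + erf(z / sqrt(2)))

x1, n1 = 81, 744      # current smokers with prevalent CVD (group 1)
x2, n2 = 298, 3055    # non-smokers with prevalent CVD (group 2)
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                                      # about 0.0998
z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))     # about 0.92 (0.927 with the rounded proportions used above)
p_value = 2 * norm_cdf(-abs(z))                                     # two-tailed p-value, about 0.36
print(round(p_pool, 4), round(z, 2), round(p_value, 2))
```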

Smoking has been shown over and over to be a risk factor for cardiovascular disease. What might explain the fact that we did not observe a statistically significant difference using data from the Framingham Heart Study? HINT: Here we consider prevalent CVD, would the results have been different if we considered incident CVD?

A randomized trial is designed to evaluate the effectiveness of a newly developed pain reliever designed to reduce pain in patients following joint replacement surgery. The trial compares the new pain reliever to the pain reliever currently in use (called the standard of care). A total of 100 patients undergoing joint replacement surgery agreed to participate in the trial. Patients were randomly assigned to receive either the new pain reliever or the standard pain reliever following surgery and were blind to the treatment assignment. Before receiving the assigned treatment, patients were asked to rate their pain on a scale of 0-10 with higher scores indicative of more pain. Each patient was then given the assigned treatment and after 30 minutes was again asked to rate their pain on the same scale. The primary outcome was a reduction in pain of 3 or more scale points (defined by clinicians as a clinically meaningful reduction). The following data were observed in the trial.

Treatment                 n     Number with Reduction of 3+ Points    Proportion with Reduction of 3+ Points
New Pain Reliever         50    23                                    0.46
Standard Pain Reliever    50    11                                    0.22

We now test whether there is a statistically significant difference in the proportions of patients reporting a meaningful reduction (i.e., a reduction of 3 or more scale points) using the five step approach.  

H 0 : p 1 = p 2     H 1 : p 1 ≠ p 2              α=0.05

Here the new or experimental pain reliever is group 1 and the standard pain reliever is group 2.

We must first check that the sample size is adequate. Specifically, we need to ensure that we have at least 5 successes and 5 failures in each comparison group, i.e., min(n 1 p̂ 1 , n 1 (1- p̂ 1 ), n 2 p̂ 2 , n 2 (1- p̂ 2 )) ≥ 5.

In this example, we have min(50(0.46), 50(1-0.46), 50(0.22), 50(1-0.22)) = min(23, 27, 11, 39) = 11. The sample size is adequate so the following formula can be used: Z = ( p̂ 1 - p̂ 2 )/√( p̂ (1- p̂ )(1/n 1 + 1/n 2 )).

We reject H 0 because 2.526 > 1.960. We have statistically significant evidence at α=0.05 to show that there is a difference in the proportions of patients on the new pain reliever reporting a meaningful reduction (i.e., a reduction of 3 or more scale points) as compared to patients on the standard pain reliever.

A 95% confidence interval for the difference in proportions of patients on the new pain reliever reporting a meaningful reduction (i.e., a reduction of 3 or more scale points) as compared to patients on the standard pain reliever is 0.24 ± 0.18, or between 0.06 and 0.42. Because the 95% confidence interval does not include zero, we conclude that there is a statistically significant difference in proportions, which is consistent with the test of hypothesis result.
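The same calculation for the pain reliever trial, together with the 95% confidence interval for the risk difference, is sketched below (illustrative Python; the norm_cdf helper is our own). Note that the test uses the pooled proportion while the confidence interval uses the group-specific proportions.

```python
from math import erf, sqrt

def norm_cdf(z):
    # Standard normal CDF (standard library only)
    return 0.5 * (1 + erf(z / sqrt(2)))

x1, n1 = 23, 50    # new pain reliever (group 1)
x2, n2 = 11, 50    # standard pain reliever (group 2)
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))   # about 2.53
p_value = 2 * norm_cdf(-abs(z))                                   # about 0.011

# 95% CI for the risk difference uses the unpooled standard error
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (p1 - p2 - 1.96 * se, p1 - p2 + 1.96 * se)                   # about (0.06, 0.42)
print(round(z, 2), round(p_value, 3), [round(v, 2) for v in ci])
```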

Again, the procedures discussed here apply to applications where there are two independent comparison groups and a dichotomous outcome. There are other applications in which it is of interest to compare a dichotomous outcome in matched or paired samples. For example, in a clinical trial we might wish to test the effectiveness of a new antibiotic eye drop for the treatment of bacterial conjunctivitis. Participants use the new antibiotic eye drop in one eye and a comparator (placebo or active control treatment) in the other. The success of the treatment (yes/no) is recorded for each participant for each eye. Because the two assessments (success or failure) are paired, we cannot use the procedures discussed here. The appropriate test is called McNemar's test (sometimes called McNemar's test for dependent proportions).  

Video - Hypothesis Testing With Two Independent Samples and a Dichotomous Outcome (2:55)

Here we presented hypothesis testing techniques for means and proportions in one and two sample situations. Tests of hypothesis involve several steps, including specifying the null and alternative or research hypothesis, selecting and computing an appropriate test statistic, setting up a decision rule and drawing a conclusion. There are many details to consider in hypothesis testing. The first is to determine the appropriate test. We discussed Z and t tests here for different applications. The appropriate test depends on the distribution of the outcome variable (continuous or dichotomous), the number of comparison groups (one, two) and whether the comparison groups are independent or dependent. The following table summarizes the different tests of hypothesis discussed here.

  • Continuous Outcome, One Sample: H0: μ = μ0
  • Continuous Outcome, Two Independent Samples: H0: μ1 = μ2
  • Continuous Outcome, Two Matched Samples: H0: μd = 0
  • Dichotomous Outcome, One Sample: H0: p = p 0
  • Dichotomous Outcome, Two Independent Samples: H0: p1 = p2, RD=0, RR=1, OR=1

Once the type of test is determined, the details of the test must be specified. Specifically, the null and alternative hypotheses must be clearly stated. The null hypothesis always reflects the "no change" or "no difference" situation. The alternative or research hypothesis reflects the investigator's belief. The investigator might hypothesize that a parameter (e.g., a mean, proportion, difference in means or proportions) will increase, will decrease or will be different under specific conditions (sometimes the conditions are different experimental conditions and other times the conditions are simply different groups of participants). Once the hypotheses are specified, data are collected and summarized. The appropriate test is then conducted according to the five step approach. If the test leads to rejection of the null hypothesis, an approximate p-value is computed to summarize the significance of the findings. When tests of hypothesis are conducted using statistical computing packages, exact p-values are computed. Because the statistical tables in this textbook are limited, we can only approximate p-values. If the test fails to reject the null hypothesis, then a weaker concluding statement is made for the following reason.

In hypothesis testing, there are two types of errors that can be committed. A Type I error occurs when a test incorrectly rejects the null hypothesis. This is referred to as a false positive result, and the probability that this occurs is equal to the level of significance, α. The investigator chooses the level of significance in Step 1, and purposely chooses a small value such as α=0.05 to control the probability of committing a Type I error. A Type II error occurs when a test fails to reject the null hypothesis when in fact it is false. The probability that this occurs is equal to β. Unfortunately, the investigator cannot specify β at the outset because it depends on several factors including the sample size (smaller samples have higher β), the level of significance (β decreases as α increases), and the difference in the parameter under the null and alternative hypothesis.

We noted in several examples in this chapter the relationship between confidence intervals and tests of hypothesis. The approaches are different, yet related. It is possible to draw a conclusion about statistical significance by examining a confidence interval. For example, if a 95% confidence interval does not contain the null value (e.g., zero when analyzing a mean difference or risk difference, one when analyzing relative risks or odds ratios), then one can conclude that a two-sided test of hypothesis would reject the null at α=0.05. It is important to note that the correspondence between a confidence interval and test of hypothesis relates to a two-sided test and that the confidence level corresponds to a specific level of significance (e.g., 95% to α=0.05, 90% to α=0.10 and so on). The exact significance of the test, the p-value, can only be determined using the hypothesis testing approach, and the p-value provides an assessment of the strength of the evidence and not an estimate of the effect.

Answers to Selected Problems

Dental services problem (from the section on tests with one sample and a dichotomous outcome, above).

  • Step 1: Set up hypotheses and determine the level of significance.

H 0 : p = 0.75 H 1 : p ≠ 0.75           α=0.05

  • Step 2: Select the appropriate test statistic.

First, determine whether the sample size is adequate: min(np 0 , n(1-p 0 )) = min(125(0.75), 125(0.25)) = min(93.75, 31.25) = 31.25, which is greater than 5.

Therefore the sample size is adequate, and we can use the following formula: Z = ( p̂ - p 0 )/√(p 0 (1-p 0 )/n).

  • Step 3: Set up the decision rule.

Reject H0 if Z is less than or equal to -1.96 or if Z is greater than or equal to 1.96.

  • Step 4: Compute the test statistic. The sample proportion is p̂ = 64/125 = 0.512, so Z = (0.512 - 0.75)/√(0.75(0.25)/125) = -6.15.
  • Step 5: Conclusion.

We reject the null hypothesis because -6.15 < -1.96. Therefore there is a statistically significant difference in the proportion of children in Boston using dental services compared to the national proportion.


8.4.3 Hypothesis Testing for the Mean

Here, based on a random sample $X_1$, $X_2$, $\cdots$, $X_n$, we would like to test hypotheses about the mean $\mu$ of the distribution. Three standard formulations are:

$\quad$ $H_0$: $\mu=\mu_0$, $\quad$ $H_1$: $\mu \neq \mu_0$.

$\quad$ $H_0$: $\mu \leq \mu_0$, $\quad$ $H_1$: $\mu > \mu_0$.

$\quad$ $H_0$: $\mu \geq \mu_0$, $\quad$ $H_1$: $\mu \lt \mu_0$.

Two-sided Tests for the Mean:

Therefore, we can suggest the following test. Choose a threshold, and call it $c$. If $|W| \leq c$, accept $H_0$, and if $|W|>c$, accept $H_1$. How do we choose $c$? If $\alpha$ is the required significance level, we must have \begin{align} P(\textrm{type I error}) = P(|W| > c \; | \; H_0) =\alpha. \end{align}

  • As discussed above, we let \begin{align}%\label{} W(X_1,X_2, \cdots,X_n)=\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}. \end{align} Note that, assuming $H_0$, $W \sim N(0,1)$. We will choose a threshold, $c$. If $|W| \leq c$, we accept $H_0$, and if $|W|>c$, accept $H_1$. To choose $c$, we let \begin{align} P(|W| > c \; | \; H_0) =\alpha. \end{align} Since the standard normal PDF is symmetric around $0$, we have \begin{align} P(|W| > c \; | \; H_0) = 2 P(W>c | \; H_0). \end{align} Thus, we conclude $P(W>c | \; H_0)=\frac{\alpha}{2}$. Therefore, \begin{align} c=z_{\frac{\alpha}{2}}. \end{align} Therefore, we accept $H_0$ if \begin{align} \left|\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}} \right| \leq z_{\frac{\alpha}{2}}, \end{align} and reject it otherwise.
  • We have \begin{align} \beta (\mu) &=P(\textrm{type II error}) = P(\textrm{accept }H_0 \; | \; \mu) \\ &= P\left(\left|\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}} \right| \lt z_{\frac{\alpha}{2}}\; | \; \mu \right). \end{align} If $X_i \sim N(\mu,\sigma^2)$, then $\overline{X} \sim N(\mu, \frac{\sigma^2}{n})$. Thus, \begin{align} \beta (\mu)&=P\left(\left|\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}} \right| \lt z_{\frac{\alpha}{2}}\; | \; \mu \right)\\ &=P\left(\mu_0- z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \leq \overline{X} \leq \mu_0+ z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\right)\\ &=\Phi\left(z_{\frac{\alpha}{2}}+\frac{\mu_0-\mu}{\sigma / \sqrt{n}}\right)-\Phi\left(-z_{\frac{\alpha}{2}}+\frac{\mu_0-\mu}{\sigma / \sqrt{n}}\right). \end{align}
  • Let $S^2$ be the sample variance for this random sample. Then, the random variable $W$ defined as \begin{equation} W(X_1,X_2, \cdots, X_n)=\frac{\overline{X}-\mu_0}{S / \sqrt{n}} \end{equation} has a $t$-distribution with $n-1$ degrees of freedom, i.e., $W \sim T(n-1)$. Thus, we can repeat the analysis of Example 8.24 here. The only difference is that we need to replace $\sigma$ by $S$ and $z_{\frac{\alpha}{2}}$ by $t_{\frac{\alpha}{2},n-1}$. Therefore, we accept $H_0$ if \begin{align} |W| \leq t_{\frac{\alpha}{2},n-1}, \end{align} and reject it otherwise. Let us look at a numerical example of this case.

$\quad$ $H_0$: $\mu=170$, $\quad$ $H_1$: $\mu \neq 170$.

  • Let's first compute the sample mean and the sample standard deviation. The sample mean is \begin{align}%\label{} \overline{X}&=\frac{X_1+X_2+X_3+X_4+X_5+X_6+X_7+X_8+X_9}{9}\\ &=165.8 \end{align} The sample variance is given by \begin{align}%\label{} {S}^2=\frac{1}{9-1} \sum_{k=1}^9 (X_k-\overline{X})^2&=68.01 \end{align} The sample standard deviation is given by \begin{align}%\label{} S&= \sqrt{S^2}=8.25 \end{align} The following MATLAB code can be used to obtain these values: x=[176.2,157.9,160.1,180.9,165.1,167.2,162.9,155.7,166.2]; m=mean(x); v=var(x); s=std(x); Now, our test statistic is \begin{align} W(X_1,X_2, \cdots, X_9)&=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}\\ &=\frac{165.8-170}{8.25 / 3}=-1.52 \end{align} Thus, $|W|=1.52$. Also, we have \begin{align} t_{\frac{\alpha}{2},n-1} = t_{0.025,8} \approx 2.31 \end{align} The above value can be obtained in MATLAB using the command $\mathtt{tinv(0.975,8)}$. Thus, we conclude \begin{align} |W| \leq t_{\frac{\alpha}{2},n-1}. \end{align} Therefore, we accept $H_0$. In other words, we do not have enough evidence to conclude that the average height in the city is different from the average height in the country.

Let us summarize what we have obtained for the two-sided test for the mean.

Case Test Statistic Acceptance Region
$X_i \sim N(\mu, \sigma^2)$, $\sigma$ known $W=\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}$ $|W| \leq z_{\frac{\alpha}{2}}$
$n$ large, $X_i$ non-normal $W=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}$ $|W| \leq z_{\frac{\alpha}{2}}$
$X_i \sim N(\mu, \sigma^2)$, $\sigma$ unknown $W=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}$ $|W| \leq t_{\frac{\alpha}{2},n-1}$
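The acceptance regions in the table above translate directly into code. The sketch below implements the two-sided test with known $\sigma$ and the corresponding $\beta(\mu)$ from the expression derived earlier. It is illustrative only: the function names are ours, the normal quantile is found by simple bisection, and the numbers in the example call are hypothetical (they echo the height example above, but pretend the population standard deviation is known to be 8).

```python
from math import erf, sqrt

def phi(x):
    # Standard normal CDF, using only the standard library
    return 0.5 * (1 + erf(x / sqrt(2)))

def z_quantile(q):
    # Upper-tail quantile z_q (P(Z > z_q) = q), found by bisection
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if 1 - phi(mid) > q else (lo, mid)
    return (lo + hi) / 2

def two_sided_z_test(xbar, mu0, sigma, n, alpha=0.05):
    # Accept H0 if |W| <= z_{alpha/2}, as in the summary table above
    w = (xbar - mu0) / (sigma / sqrt(n))
    return abs(w) <= z_quantile(alpha / 2), w

def beta(mu, mu0, sigma, n, alpha=0.05):
    # Type II error probability beta(mu) for the two-sided test
    c = z_quantile(alpha / 2)
    shift = (mu0 - mu) / (sigma / sqrt(n))
    return phi(c + shift) - phi(-c + shift)

# Hypothetical numbers: the height example above, but pretending sigma = 8 is known
print(two_sided_z_test(xbar=165.8, mu0=170, sigma=8, n=9))
print(round(beta(mu=165, mu0=170, sigma=8, n=9), 3))
```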

One-sided Tests for the Mean:

  • As before, we define the test statistic as \begin{align}%\label{} W(X_1,X_2, \cdots,X_n)=\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}. \end{align} If $H_0$ is true (i.e., $\mu \leq \mu_0$), we expect $\overline{X}$ (and thus $W$) to be relatively small, while if $H_1$ is true, we expect $\overline{X}$ (and thus $W$) to be larger. This suggests the following test: Choose a threshold, and call it $c$. If $W \leq c$, accept $H_0$, and if $W>c$, accept $H_1$. How do we choose $c$? If $\alpha$ is the required significance level, we must have \begin{align} P(\textrm{type I error}) &= P(\textrm{Reject }H_0 \; | \; H_0) \\ &= P(W > c \; | \; \mu \leq \mu_0) \leq \alpha. \end{align} Here, the probability of type I error depends on $\mu$. More specifically, for any $\mu \leq \mu_0$, we can write \begin{align} P(\textrm{type I error} \; | \; \mu) &= P(\textrm{Reject }H_0 \; | \; \mu) \\ &= P(W > c \; | \; \mu)\\ &=P \left(\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}> c \; | \; \mu\right)\\ &=P \left(\frac{\overline{X}-\mu}{\sigma / \sqrt{n}}+\frac{\mu-\mu_0}{\sigma / \sqrt{n}}> c \; | \; \mu\right)\\ &=P \left(\frac{\overline{X}-\mu}{\sigma / \sqrt{n}}> c+\frac{\mu_0-\mu}{\sigma / \sqrt{n}} \; | \; \mu\right)\\ &\leq P \left(\frac{\overline{X}-\mu}{\sigma / \sqrt{n}}> c \; | \; \mu\right) \quad (\textrm{ since }\mu \leq \mu_0)\\ &=1-\Phi(c) \quad \big(\textrm{ since given }\mu, \frac{\overline{X}-\mu}{\sigma / \sqrt{n}} \sim N(0,1) \big). \end{align} Thus, we can choose $\alpha=1-\Phi(c)$, which results in \begin{align} c=z_{\alpha}. \end{align} Therefore, we accept $H_0$ if \begin{align} \frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}} \leq z_{\alpha}, \end{align} and reject it otherwise.
Case Test Statistic Acceptance Region
$X_i \sim N(\mu, \sigma^2)$, $\sigma$ known $W=\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}$ $W \leq z_{\alpha}$
$n$ large, $X_i$ non-normal $W=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}$ $W \leq z_{\alpha}$
$X_i \sim N(\mu, \sigma^2)$, $\sigma$ unknown $W=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}$ $W \leq t_{\alpha,n-1}$

$\quad$ $H_0$: $\mu \geq \mu_0$, $\quad$ $H_1$: $\mu \lt \mu_0$,

Case Test Statistic Acceptance Region
$X_i \sim N(\mu, \sigma^2)$, $\sigma$ known $W=\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}$ $W \geq -z_{\alpha}$
$n$ large, $X_i$ non-normal $W=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}$ $W \geq -z_{\alpha}$
$X_i \sim N(\mu, \sigma^2)$, $\sigma$ unknown $W=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}$ $W \geq -t_{\alpha,n-1}$



Hypothesis Testing

Hypothesis testing is a tool for making statistical inferences about the population data. It is an analysis tool that tests assumptions and determines how likely something is within a given standard of accuracy. Hypothesis testing provides a way to verify whether the results of an experiment are valid.

A null hypothesis and an alternative hypothesis are set up before performing the hypothesis testing. This helps to arrive at a conclusion regarding the sample obtained from the population. In this article, we will learn more about hypothesis testing, its types, steps to perform the testing, and associated examples.


What is Hypothesis Testing in Statistics?

Hypothesis testing uses sample data from the population to draw useful conclusions regarding the population probability distribution . It tests an assumption made about the data using different types of hypothesis testing methodologies. The hypothesis testing results in either rejecting or not rejecting the null hypothesis.

Hypothesis Testing Definition

Hypothesis testing can be defined as a statistical tool that is used to identify if the results of an experiment are meaningful or not. It involves setting up a null hypothesis and an alternative hypothesis. These two hypotheses will always be mutually exclusive. This means that if the null hypothesis is true then the alternative hypothesis is false and vice versa. An example of hypothesis testing is setting up a test to check if a new medicine works on a disease in a more efficient manner.

Null Hypothesis

The null hypothesis is a concise mathematical statement that is used to indicate that there is no difference between two possibilities. In other words, there is no difference between certain characteristics of data. This hypothesis assumes that the outcomes of an experiment are based on chance alone. It is denoted as \(H_{0}\). Hypothesis testing is used to conclude if the null hypothesis can be rejected or not. Suppose an experiment is conducted to check if girls are shorter than boys at the age of 5. The null hypothesis will say that they are the same height.

Alternative Hypothesis

The alternative hypothesis is an alternative to the null hypothesis. It is used to show that the observations of an experiment are due to some real effect. It indicates that there is a statistical significance between two possible outcomes and can be denoted as \(H_{1}\) or \(H_{a}\). For the above-mentioned example, the alternative hypothesis would be that girls are shorter than boys at the age of 5.

Hypothesis Testing P Value

In hypothesis testing, the p value is used to indicate whether the results obtained after conducting a test are statistically significant or not. It is the probability of obtaining a test statistic at least as extreme as the one observed, assuming that the null hypothesis is true. This value is always a number between 0 and 1. The p value is compared to an alpha level, \(\alpha\), or significance level. The alpha level can be defined as the acceptable risk of incorrectly rejecting the null hypothesis. The alpha level is usually chosen between 1% and 5%.

Hypothesis Testing Critical region

All sets of values that lead to rejecting the null hypothesis lie in the critical region. Furthermore, the value that separates the critical region from the non-critical region is known as the critical value.

Hypothesis Testing Formula

Depending upon the type of data available and the size, different types of hypothesis testing are used to determine whether the null hypothesis can be rejected or not. The hypothesis testing formula for some important test statistics are given below:

  • z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\). \(\overline{x}\) is the sample mean, \(\mu\) is the population mean, \(\sigma\) is the population standard deviation and n is the size of the sample.
  • t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\). s is the sample standard deviation.
  • \(\chi ^{2} = \sum \frac{(O_{i}-E_{i})^{2}}{E_{i}}\). \(O_{i}\) is the observed value and \(E_{i}\) is the expected value.

We will learn more about these test statistics in the upcoming section.

Types of Hypothesis Testing

Selecting the correct test for performing hypothesis testing can be confusing. These tests are used to determine a test statistic on the basis of which the null hypothesis can either be rejected or not rejected. Some of the important tests used for hypothesis testing are given below.

Hypothesis Testing Z Test

A z test is a way of hypothesis testing that is used for a large sample size (n ≥ 30). It is used to determine whether there is a difference between the population mean and the sample mean when the population standard deviation is known. It can also be used to compare the mean of two samples. It is used to compute the z test statistic. The formulas are given as follows:

  • One sample: z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\).
  • Two samples: z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\).

Hypothesis Testing t Test

The t test is another method of hypothesis testing that is used for a small sample size (n < 30). It is also used to compare the sample mean and population mean. However, the population standard deviation is not known. Instead, the sample standard deviation is known. The mean of two samples can also be compared using the t test.

  • One sample: t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\).
  • Two samples: t = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}}\).

Hypothesis Testing Chi Square

The Chi square test is a hypothesis testing method that is used to check whether the variables in a population are independent or not. It is used when the test statistic is chi-squared distributed.

One Tailed Hypothesis Testing

One tailed hypothesis testing is done when the rejection region is only in one direction. It can also be known as directional hypothesis testing because the effects can be tested in one direction only. This type of testing is further classified into the right tailed test and left tailed test.

Right Tailed Hypothesis Testing

The right tail test is also known as the upper tail test. This test is used to check whether the population parameter is greater than some value. The null and alternative hypotheses for this test are given as follows:

\(H_{0}\): The population parameter is ≤ some value

\(H_{1}\): The population parameter is > some value.

If the test statistic is greater than the critical value, then the null hypothesis is rejected.


Left Tailed Hypothesis Testing

The left tail test is also known as the lower tail test. It is used to check whether the population parameter is less than some value. The hypotheses for this hypothesis testing can be written as follows:

\(H_{0}\): The population parameter is ≥ some value

\(H_{1}\): The population parameter is < some value.

The null hypothesis is rejected if the test statistic is less than the critical value.


Two Tailed Hypothesis Testing

In this hypothesis testing method, the critical region lies on both sides of the sampling distribution. It is also known as a non-directional test. The two-tailed test is used to determine whether the population parameter is different from some value. The hypotheses can be set up as follows:

\(H_{0}\): the population parameter = some value

\(H_{1}\): the population parameter ≠ some value

The null hypothesis is rejected if the test statistic falls in either tail, that is, if its absolute value is greater than the critical value.


Hypothesis Testing Steps

Hypothesis testing can be easily performed in five simple steps. The most important step is to correctly set up the hypotheses and identify the right method for hypothesis testing. The basic steps to perform hypothesis testing are as follows:

  • Step 1: Set up the null hypothesis, and identify whether the test is left-tailed, right-tailed, or two-tailed.
  • Step 2: Set up the alternative hypothesis.
  • Step 3: Choose the significance level, \(\alpha\), and find the critical value.
  • Step 4: Calculate the test statistic (z, t, or \(\chi^2\)) and the p-value.
  • Step 5: Compare the test statistic with the critical value, or compare the p-value with \(\alpha\), to arrive at a conclusion; in other words, decide whether the null hypothesis is to be rejected or not.

Hypothesis Testing Example

The best way to solve a hypothesis testing problem is to apply the five steps from the previous section. Suppose a researcher claims that the mean weight of men is greater than 100 kg. The population standard deviation is 15 kg, and a sample of 30 men has an average weight of 112.5 kg. Using hypothesis testing, check whether there is enough evidence to support the researcher's claim at a 95% confidence level.

Step 1: This is an example of a right-tailed test. Set up the null hypothesis as \(H_{0}\): \(\mu\) = 100.

Step 2: The alternative hypothesis is given by \(H_{1}\): \(\mu\) > 100.

Step 3: As this is a one-tailed test, \(\alpha\) = 100% - 95% = 5%. This can be used to determine the critical value.

1 - \(\alpha\) = 1 - 0.05 = 0.95

0.95 gives the required area under the curve. Now using a normal distribution table, the area 0.95 is at z = 1.645. A similar process can be followed for a t-test. The only additional requirement is to calculate the degrees of freedom given by n - 1.

Step 4: Calculate the z test statistic. The z-test is appropriate because the population standard deviation is known and the sample size is 30.

z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\).

\(\mu\) = 100, \(\overline{x}\) = 112.5, n = 30, \(\sigma\) = 15

z = \(\frac{112.5-100}{\frac{15}{\sqrt{30}}}\) = 4.56

Step 5: Conclusion. Since 4.56 > 1.645, the null hypothesis is rejected. A quick Python check of this calculation is shown below.
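The sketch below verifies the numbers above using scipy for the critical value and the right-tail p-value.

```python
import math
from scipy.stats import norm

mu0, x_bar, sigma, n = 100, 112.5, 15, 30
z = (x_bar - mu0) / (sigma / math.sqrt(n))   # approximately 4.56

critical = norm.ppf(0.95)                    # approximately 1.645 for a 5% right-tailed test
p_value = norm.sf(z)                         # right-tail area, essentially 0

print(z, critical, p_value)
# z > critical (and p_value < 0.05), so H0 is rejected
```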

Hypothesis Testing and Confidence Intervals

Confidence levels are closely linked to hypothesis testing, because the alpha level can be determined from a given confidence level. Suppose a confidence level is given as 95%. Subtracting it from 100% gives 100% - 95% = 5%, or 0.05, which is the alpha level of the test. For a two-tailed test this alpha level is split between the two tails, so each tail has an area of 0.05 / 2 = 0.025.


Important Notes on Hypothesis Testing

  • Hypothesis testing is a technique that is used to verify whether the results of an experiment are statistically significant.
  • It involves the setting up of a null hypothesis and an alternate hypothesis.
  • Three common tests used in hypothesis testing are the z-test, the t-test, and the chi-square test.
  • Hypothesis tests can be classified as right-tailed, left-tailed, or two-tailed.

Examples on Hypothesis Testing

  • Example 1: The average weight of a dumbbell in a gym is 90 lbs. However, a physical trainer believes that the average weight might be higher. A random sample of 5 dumbbells has an average weight of 110 lbs and a standard deviation of 18 lbs. Using hypothesis testing, check whether the trainer's claim can be supported at a 95% confidence level. Solution: As the sample size is less than 30 and the population standard deviation is unknown, the t-test is used. \(H_{0}\): \(\mu\) = 90, \(H_{1}\): \(\mu\) > 90. \(\overline{x}\) = 110, \(\mu\) = 90, n = 5, s = 18, \(\alpha\) = 0.05. Using the t-distribution table with 4 degrees of freedom, the critical value is 2.132. t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\) = 2.484. As 2.484 > 2.132, the null hypothesis is rejected. Answer: The average weight of the dumbbells may be greater than 90 lbs. (A Python check of this example is sketched after this list.)
  • Example 2: The average score on a test is 80 with a standard deviation of 10. After a new teaching curriculum was introduced, it is believed that this score will change. On randomly testing the scores of 36 students, the mean was found to be 88. At a 0.05 significance level, is there any evidence to support this claim? Solution: This is a two-tailed test, and the z-test is used. \(H_{0}\): \(\mu\) = 80, \(H_{1}\): \(\mu\) ≠ 80. \(\overline{x}\) = 88, \(\mu\) = 80, n = 36, \(\sigma\) = 10. The area in each tail is \(\alpha\) / 2 = 0.05 / 2 = 0.025, so the critical value from the normal distribution table is 1.96. z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\) = \(\frac{88-80}{\frac{10}{\sqrt{36}}}\) = 4.8. As 4.8 > 1.96, the null hypothesis is rejected. Answer: There is a difference in the scores after the new curriculum was introduced.
  • Example 3: The average score of a class is 90. However, a teacher believes that the average score might be lower. The scores of 6 students were randomly measured; the mean was 82 with a standard deviation of 18. At a 0.05 significance level, use hypothesis testing to check whether this claim is true. Solution: The t-test is used. \(H_{0}\): \(\mu\) = 90, \(H_{1}\): \(\mu\) < 90. \(\overline{x}\) = 82, \(\mu\) = 90, n = 6, s = 18. The critical value from the t-table with 5 degrees of freedom is -2.015. t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\) = \(\frac{82-90}{\frac{18}{\sqrt{6}}}\) = -1.088. As -1.088 > -2.015, we fail to reject the null hypothesis. Answer: There is not enough evidence to support the claim.
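Here is a short Python check of Example 1, a sketch working from the summary statistics; scipy.stats.t supplies the critical value and right-tail p-value.

```python
import math
from scipy.stats import t

# Example 1: one-sample right-tailed t test from summary statistics
x_bar, mu0, s, n = 110, 90, 18, 5
t_stat = (x_bar - mu0) / (s / math.sqrt(n))    # approximately 2.484
critical = t.ppf(0.95, df=n - 1)               # approximately 2.132
p_value = t.sf(t_stat, df=n - 1)               # right-tail p-value

print(t_stat, critical, p_value)               # reject H0 since t_stat > critical
```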


FAQs on Hypothesis Testing

What is Hypothesis Testing?

Hypothesis testing in statistics is a tool that is used to make inferences about the population data. It is also used to check if the results of an experiment are valid.

What is the z Test in Hypothesis Testing?

The z-test in hypothesis testing is used to find the z test statistic for normally distributed data. The z-test is used when the population standard deviation is known and the sample size is greater than or equal to 30.

What is the t Test in Hypothesis Testing?

The t-test in hypothesis testing is used when the data follow a Student's t-distribution. It is used when the sample size is less than 30 and the population standard deviation is not known.

What is the formula for z test in Hypothesis Testing?

The formula for a one sample z test in hypothesis testing is z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\) and for two samples is z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\).

What is the p Value in Hypothesis Testing?

The p value helps to determine if the test results are statistically significant or not. In hypothesis testing, the null hypothesis can either be rejected or not rejected based on the comparison between the p value and the alpha level.

What is One Tail Hypothesis Testing?

When the rejection region is only on one side of the distribution curve then it is known as one tail hypothesis testing. The right tail test and the left tail test are two types of directional hypothesis testing.

What is the Alpha Level in Two Tail Hypothesis Testing?

In a two-tailed test the rejection region is split between the two tails of the distribution, so each tail has an area of \(\alpha\) / 2.

Statistics - Hypothesis Testing a Mean

A population mean is the average of a value across a population.

Hypothesis tests are used to check a claim about the size of that population mean.

Hypothesis Testing a Mean

The following steps are used for a hypothesis test:

  • Check the conditions
  • Define the claims
  • Decide the significance level
  • Calculate the test statistic
  • Conclude

For example:

  • Population : Nobel Prize winners
  • Category : Age when they received the prize.

And we want to check the claim:

"The average age of Nobel Prize winners when they received the prize is more than 55"

By taking a sample of 30 randomly selected Nobel Prize winners we could find that:

The mean age in the sample (\(\bar{x}\)) is 62.1

The standard deviation of age in the sample (\(s\)) is 13.46

From this sample data we check the claim with the steps below.

1. Checking the Conditions

The conditions for doing a hypothesis test for a population mean are:

  • The sample is randomly selected
  • The population data is normally distributed
  • Sample size is large enough

A moderately large sample size, like 30, is typically large enough.

In the example, the sample size was 30 and it was randomly selected, so the conditions are fulfilled.

Note: Checking if the data is normally distributed can be done with specialized statistical tests.

2. Defining the Claims

We need to define a null hypothesis (\(H_{0}\)) and an alternative hypothesis (\(H_{1}\)) based on the claim we are checking.

The claim was: "The average age of Nobel Prize winners when they received the prize is more than 55."

In this case, the parameter is the mean age of Nobel Prize winners when they received the prize (\(\mu\)).

The null and alternative hypothesis are then:

Null hypothesis : The average age was 55.

Alternative hypothesis : The average age was more than 55.

Which can be expressed with symbols as:

\(H_{0}\): \(\mu = 55 \)

\(H_{1}\): \(\mu > 55 \)

This is a 'right-tailed' test, because the alternative hypothesis claims that the mean is more than the value in the null hypothesis.

If the data supports the alternative hypothesis, we reject the null hypothesis and accept the alternative hypothesis.


3. Deciding the Significance Level

The significance level (\(\alpha\)) is the uncertainty we accept when rejecting the null hypothesis in a hypothesis test.

The significance level is the probability of rejecting a true null hypothesis, that is, of accidentally making the wrong conclusion.

Typical significance levels are:

  • \(\alpha = 0.1\) (10%)
  • \(\alpha = 0.05\) (5%)
  • \(\alpha = 0.01\) (1%)

A lower significance level means that the evidence in the data needs to be stronger to reject the null hypothesis.

There is no "correct" significance level - it only states the uncertainty of the conclusion.

Note: A 5% significance level means that when we reject a null hypothesis:

We expect to reject a true null hypothesis 5 out of 100 times.

4. Calculating the Test Statistic

The test statistic is used to decide the outcome of the hypothesis test.

The test statistic is a standardized value calculated from the sample.

The formula for the test statistic (TS) of a population mean is:

\(\displaystyle \frac{\bar{x} - \mu}{s} \cdot \sqrt{n} \)

\(\bar{x}-\mu\) is the difference between the sample mean (\(\bar{x}\)) and the claimed population mean (\(\mu\)).

\(s\) is the sample standard deviation .

\(n\) is the sample size.

In our example:

The claimed (\(H_{0}\)) population mean (\(\mu\)) was \( 55 \)

The sample mean (\(\bar{x}\)) was \(62.1\)

The sample standard deviation (\(s\)) was \(13.46\)

The sample size (\(n\)) was \(30\)

So the test statistic (TS) is then:

\(\displaystyle \frac{62.1-55}{13.46} \cdot \sqrt{30} = \frac{7.1}{13.46} \cdot \sqrt{30} \approx 0.528 \cdot 5.477 = \underline{2.889}\)

You can also calculate the test statistic using programming language functions:

With Python use the scipy and math libraries to calculate the test statistic.

With R use built-in math and statistics functions to calculate the test statistic.
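For example, a minimal Python version of this calculation might look like the following sketch; only the math module is needed for this step.

```python
import math

x_bar, mu0, s, n = 62.1, 55, 13.46, 30
test_statistic = (x_bar - mu0) / s * math.sqrt(n)
print(round(test_statistic, 3))   # approximately 2.889
```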

5. Concluding

There are two main approaches for making the conclusion of a hypothesis test:

  • The critical value approach compares the test statistic with the critical value of the significance level.
  • The P-value approach compares the P-value of the test statistic with the significance level.

Note: The two approaches are only different in how they present the conclusion.

The Critical Value Approach

For the critical value approach we need to find the critical value (CV) of the significance level (\(\alpha\)).

For a population mean test, the critical value (CV) is a T-value from a student's t-distribution .

This critical T-value (CV) defines the rejection region for the test.

The rejection region is an area of probability in the tail of the distribution of the test statistic.

Because the claim is that the population mean is more than 55, the rejection region is in the right tail:

The student's t-distribution is adjusted for the uncertainty from smaller samples.

This adjustment is called degrees of freedom (df), which is the sample size \((n) - 1\)

In this case the degrees of freedom (df) is: \(30 - 1 = \underline{29} \)

Choosing a significance level (\(\alpha\)) of 0.01, or 1%, we can find the critical T-value from a T-table , or with a programming language function:

With Python, use the SciPy Stats library t.ppf() function to find the T-value for an \(\alpha\) = 0.01 at 29 degrees of freedom (df).

With R use the built-in qt() function to find the t-value for an \(\alpha\) = 0.01 at 29 degrees of freedom (df).
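A Python sketch of this lookup (assuming SciPy is installed) might look like:

```python
from scipy.stats import t

alpha, df = 0.01, 29
critical_value = t.ppf(1 - alpha, df)   # right-tailed test, so use the 0.99 quantile
print(round(critical_value, 3))         # approximately 2.462
```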

Using either method we can find that the critical T-Value is \(\approx \underline{2.462}\)

For a right tailed test we need to check if the test statistic (TS) is bigger than the critical value (CV).

If the test statistic is bigger than the critical value, the test statistic is in the rejection region .

When the test statistic is in the rejection region, we reject the null hypothesis (\(H_{0}\)).

Here, the test statistic (TS) was \(\approx \underline{2.889}\) and the critical value was \(\approx \underline{2.462}\)

Here is an illustration of this test in a graph:

Since the test statistic was bigger than the critical value we reject the null hypothesis.

This means that the sample data supports the alternative hypothesis.

And we can summarize the conclusion stating:

The sample data supports the claim that "The average age of Nobel Prize winners when they received the prize is more than 55" at a 1% significance level .

The P-Value Approach

For the P-value approach we need to find the P-value of the test statistic (TS).

If the P-value is smaller than the significance level (\(\alpha\)), we reject the null hypothesis (\(H_{0}\)).

The test statistic was found to be \( \approx \underline{2.889} \)

For a population mean test, the test statistic is a T-value from a Student's t-distribution.

Because this is a right tailed test, we need to find the P-value of a t-value bigger than 2.889.

The student's t-distribution is adjusted according to degrees of freedom (df), which is the sample size \((30) - 1 = \underline{29}\)

We can find the P-value using a T-table , or with a programming language function:

With Python, use the SciPy Stats library t.cdf() function to find the P-value of a T-value bigger than 2.889 at 29 degrees of freedom (df):

With R, use the built-in pt() function to find the P-value of a T-value bigger than 2.889 at 29 degrees of freedom (df):
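A Python sketch of this p-value calculation (using scipy.stats.t) might look like:

```python
from scipy.stats import t

test_statistic, df = 2.889, 29
p_value = 1 - t.cdf(test_statistic, df)   # same as t.sf(test_statistic, df)
print(round(p_value, 4))                  # approximately 0.0036
```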

Using either method we can find that the P-value is \(\approx \underline{0.0036}\)

This tells us that the significance level (\(\alpha\)) would need to be bigger than 0.0036, or 0.36%, to reject the null hypothesis.

This P-value is smaller than any of the common significance levels (10%, 5%, 1%).

So the null hypothesis is rejected at all of these significance levels.

The sample data supports the claim that "The average age of Nobel Prize winners when they received the prize is more than 55" at a 10%, 5%, or 1% significance level .

Note: An outcome of a hypothesis test that rejects the null hypothesis with a p-value of 0.36% means:

For this p-value, we only expect to reject a true null hypothesis 36 out of 10000 times.

Calculating a P-Value for a Hypothesis Test with Programming

Many programming languages can calculate the P-value to decide the outcome of a hypothesis test.

Using software and programming to calculate statistics is more common for bigger sets of data, as calculating manually becomes difficult.

The P-value calculated here will tell us the lowest possible significance level where the null-hypothesis can be rejected.

With Python use the scipy and math libraries to calculate the P-value for a right tailed hypothesis test for a mean.

Here, the sample size is 30, the sample mean is 62.1, the sample standard deviation is 13.46, and the test is for a mean bigger than 55.

With R use built-in math and statistics functions to find the P-value for a right tailed hypothesis test for a mean.
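A combined Python sketch for this right-tailed test (computing the test statistic with the math module and the p-value with scipy.stats.t) might look like the following; the summary numbers are the ones given above.

```python
import math
from scipy.stats import t

# Right-tailed test for a mean: H0: mu = 55 vs. H1: mu > 55
n, x_bar, s, mu0 = 30, 62.1, 13.46, 55

test_statistic = (x_bar - mu0) / (s / math.sqrt(n))
p_value = t.sf(test_statistic, df=n - 1)   # area in the right tail

print(round(test_statistic, 3), round(p_value, 4))   # about 2.889 and 0.0036
```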

Left-Tailed and Two-Tailed Tests

This was an example of a right tailed test, where the alternative hypothesis claimed that parameter is bigger than the null hypothesis claim.

You can check out an equivalent step-by-step guide for other types here:

  • Left-Tailed Test
  • Two-Tailed Test


What Is Hypothesis Testing in Statistics? Types and Examples


In today's data-driven world, decisions are based on data all the time. Hypotheses play a crucial role in that process, whether in business decisions, the health sector, academia, or quality improvement. Without hypotheses and hypothesis tests, you risk drawing the wrong conclusions and making bad decisions. In this tutorial, you will look at hypothesis testing in statistics.


What Is Hypothesis Testing in Statistics?

Hypothesis Testing is a type of statistical analysis in which you put your assumptions about a population parameter to the test. It is used to estimate the relationship between 2 statistical variables.

Let's discuss a few examples of statistical hypotheses from real life:

  • A teacher assumes that 60% of his college's students come from lower-middle-class families.
  • A doctor believes that 3D (Diet, Dose, and Discipline) is 90% effective for diabetic patients.

Now that you know what hypothesis testing is, look at the formula and the different types of hypothesis tests in statistics.

Hypothesis Testing Formula

Z = ( x̅ – μ0 ) / (σ /√n)

  • Here, x̅ is the sample mean,
  • μ0 is the population mean,
  • σ is the population standard deviation,
  • n is the sample size.

How Does Hypothesis Testing Work?

An analyst performs hypothesis testing on a statistical sample to present evidence of the plausibility of the null hypothesis. Measurements and analyses are conducted on a random sample of the population to test a theory. Analysts use a random population sample to test two hypotheses: the null and alternative hypotheses.

The null hypothesis is typically an equality hypothesis between population parameters; for example, a null hypothesis may claim that the population mean return equals zero. The alternative hypothesis is essentially the inverse of the null hypothesis (e.g., the population mean return is not equal to zero). As a result, they are mutually exclusive, and only one can be correct; one of the two possibilities, however, will always be true.


Null Hypothesis and Alternative Hypothesis

The Null Hypothesis is the assumption that the event will not occur. A null hypothesis has no bearing on the study's outcome unless it is rejected.

H0 is the symbol for it, and it is pronounced H-naught.

The Alternate Hypothesis is the logical opposite of the null hypothesis. The acceptance of the alternative hypothesis follows the rejection of the null hypothesis. H1 is the symbol for it.

Let's understand this with an example.

A sanitizer manufacturer claims that its product kills 95 percent of germs on average. 

To put this company's claim to the test, create a null and alternate hypothesis.

H0 (Null Hypothesis): Average = 95%.

Alternative Hypothesis (H1): The average is less than 95%.

Another straightforward example to understand this concept is determining whether or not a coin is fair and balanced. The null hypothesis states that the probability of a show of heads is equal to the likelihood of a show of tails. In contrast, the alternate theory states that the probability of a show of heads and tails would be very different.


Hypothesis Testing Calculation With Examples

Let's consider a hypothesis test for the average height of women in the United States. Suppose our null hypothesis is that the average height is 5'4" (64 inches). We gather a sample of 100 women and find that their average height is 5'5" (65 inches). The population standard deviation is 2 inches.

To calculate the z-score, we would use the following formula:

z = ( x̅ – μ0 ) / (σ /√n)

z = (5'5" - 5'4") / (2" / √100)

z = 0.5 / (0.045)

We will reject the null hypothesis as the z-score of 11.11 is very large and conclude that there is evidence to suggest that the average height of women in the US is greater than 5'4".
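The following sketch verifies the z-score and the corresponding one-sided p-value; heights are expressed in inches.

```python
import math
from scipy.stats import norm

# Heights in inches: H0: mu = 64 (5'4"); the sample mean is 65 (5'5")
x_bar, mu0, sigma, n = 65, 64, 2, 100

z = (x_bar - mu0) / (sigma / math.sqrt(n))   # 1 / 0.2 = 5.0
p_value = norm.sf(z)                          # one-sided p-value
print(z, p_value)                             # 5.0 and roughly 2.9e-07
```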

Steps in Hypothesis Testing

Hypothesis testing is a statistical method to determine if there is enough evidence in a sample of data to infer that a certain condition is true for the entire population. Here’s a breakdown of the typical steps involved in hypothesis testing:

Formulate Hypotheses

  • Null Hypothesis (H0): This hypothesis states that there is no effect or difference, and it is the hypothesis you attempt to reject with your test.
  • Alternative Hypothesis (H1 or Ha): This hypothesis is what you might believe to be true or hope to prove true. It is usually considered the opposite of the null hypothesis.

Choose the Significance Level (α)

The significance level, often denoted by alpha (α), is the probability of rejecting the null hypothesis when it is true. Common choices for α are 0.05 (5%), 0.01 (1%), and 0.10 (10%).

Select the Appropriate Test

Choose a statistical test based on the type of data and the hypothesis. Common tests include t-tests, chi-square tests, ANOVA, and regression analysis. The selection depends on data type, distribution, sample size, and whether the hypothesis is one-tailed or two-tailed.

Collect Data

Gather the data that will be analyzed in the test. This data should be representative of the population to infer conclusions accurately.

Calculate the Test Statistic

Based on the collected data and the chosen test, calculate a test statistic that reflects how much the observed data deviates from the null hypothesis.

Determine the p-value

The p-value is the probability of observing test results at least as extreme as the results observed, assuming the null hypothesis is correct. It helps determine the strength of the evidence against the null hypothesis.

Make a Decision

Compare the p-value to the chosen significance level:

  • If the p-value ≤ α: Reject the null hypothesis, suggesting sufficient evidence in the data supports the alternative hypothesis.
  • If the p-value > α: Do not reject the null hypothesis, suggesting insufficient evidence to support the alternative hypothesis.

Report the Results

Present the findings from the hypothesis test, including the test statistic, p-value, and the conclusion about the hypotheses.

Perform Post-hoc Analysis (if necessary)

Depending on the results and the study design, further analysis may be needed to explore the data more deeply or to address multiple comparisons if several hypotheses were tested simultaneously.
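To make these steps concrete, here is a minimal Python sketch of the whole workflow for a one-sample t test; the hypothesized mean of 50, the significance level, and the simulated data are all invented for illustration.

```python
import numpy as np
from scipy import stats

# Formulate hypotheses: H0: mu = 50 vs. Ha: mu != 50 (hypothetical claim)
mu0 = 50

# Choose the significance level
alpha = 0.05

# Collect data (simulated here for illustration) and select a one-sample t test
rng = np.random.default_rng(42)
data = rng.normal(loc=52, scale=6, size=25)

# Calculate the test statistic and determine the p-value
result = stats.ttest_1samp(data, popmean=mu0)

# Make a decision and report the results
decision = "reject H0" if result.pvalue <= alpha else "fail to reject H0"
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}: {decision}")
```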

Types of Hypothesis Testing

Z Test

To determine whether a discovery or relationship is statistically significant, hypothesis testing can use a z-test. It usually checks whether two means are the same (the null hypothesis). A z-test can be applied only when the population standard deviation is known and the sample size is 30 data points or more.

t Test

A statistical test called a t-test is employed to compare the means of two groups. It is frequently used in hypothesis testing to determine whether two groups differ or whether a procedure or treatment affects the population of interest.

Chi-Square 

You utilize a Chi-square test for hypothesis testing concerning whether your data is as predicted. To determine if the expected and observed results are well-fitted, the Chi-square test analyzes the differences between categorical variables from a random sample. The test's fundamental premise is that the observed values in your data should be compared to the predicted values that would be present if the null hypothesis were true.

Hypothesis Testing and Confidence Intervals

Both confidence intervals and hypothesis tests are inferential techniques that depend on approximating the sampling distribution. Confidence intervals use data from a sample to estimate a population parameter. Hypothesis testing uses data from a sample to examine a given hypothesis; for this we must have a hypothesized value of the parameter.

Bootstrap distributions and randomization distributions are created using comparable simulation techniques. The observed sample statistic is the focal point of a bootstrap distribution, whereas the null hypothesis value is the focal point of a randomization distribution.

A confidence interval contains a range of plausible estimates of the population parameter. In this lesson, we created only two-tailed confidence intervals. There is a direct connection between two-tailed confidence intervals and two-tailed hypothesis tests: they typically lead to the same conclusion. In other words, a hypothesis test at the 0.05 level will virtually always fail to reject the null hypothesis if the 95% confidence interval contains the hypothesized value, and it will nearly certainly reject the null hypothesis if the 95% confidence interval does not include the hypothesized value.


Simple and Composite Hypothesis Testing

Depending on the population distribution, you can classify the statistical hypothesis into two types.

Simple Hypothesis: A simple hypothesis specifies an exact value for the parameter.

Composite Hypothesis: A composite hypothesis specifies a range of values.

A company is claiming that their average sales for this quarter are 1000 units. This is an example of a simple hypothesis.

Suppose the company claims that the sales are in the range of 900 to 1000 units. Then this is a case of a composite hypothesis.

One-Tailed and Two-Tailed Hypothesis Testing

The One-Tailed test, also called a directional test, considers a critical region of data that would result in the null hypothesis being rejected if the test sample falls into it, inevitably meaning the acceptance of the alternate hypothesis.

In a one-tailed test, the critical distribution area is one-sided, meaning the test sample is either greater or lesser than a specific value.

In a two-tailed test, the sample statistic is checked against both tails of the distribution, meaning the critical region is two-sided.

If the sample statistic falls in either critical region, the null hypothesis is rejected and the alternative hypothesis is accepted.


Right Tailed Hypothesis Testing

If the larger than (>) sign appears in your hypothesis statement, you are using a right-tailed test, also known as an upper test. Or, to put it another way, the disparity is to the right. For instance, you can contrast the battery life before and after a change in production. Your hypothesis statements can be the following if you want to know if the battery life is longer than the original (let's say 90 hours):

  • The null hypothesis is H0: μ ≤ 90 (the battery life has not increased).
  • The alternative hypothesis is H1: μ > 90 (the battery life has increased).

The crucial point in this situation is that the alternate hypothesis (H1), not the null hypothesis, decides whether you get a right-tailed test.

Left Tailed Hypothesis Testing

Alternative hypotheses that assert the true value of a parameter is lower than the value in the null hypothesis are tested with a left-tailed test; they are indicated by the less-than sign, "<".

Suppose H0: mean = 50 and H1: mean not equal to 50

According to the H1, the mean can be greater than or less than 50. This is an example of a Two-tailed test.

In a similar manner, if H0: mean ≥ 50, then H1: mean < 50.

Here the alternative claims the mean is less than 50, so this is a one-tailed (left-tailed) test.

Type 1 and Type 2 Error

A hypothesis test can result in two types of errors.

Type 1 Error: A Type-I error occurs when sample results reject the null hypothesis despite being true.

Type 2 Error: A Type-II error occurs when the null hypothesis is not rejected when it is false, unlike a Type-I error.

Suppose a teacher evaluates the examination paper to decide whether a student passes or fails.

H0: Student has passed

H1: Student has failed

Type I error will be the teacher failing the student [rejects H0] although the student scored the passing marks [H0 was true]. 

Type II error will be the case where the teacher passes the student [do not reject H0] although the student did not score the passing marks [H1 is true].


Limitations of Hypothesis Testing

Hypothesis testing has some limitations that researchers should be aware of:

  • It cannot prove or establish the truth: Hypothesis testing provides evidence to support or reject a hypothesis, but it cannot confirm the absolute truth of the research question.
  • Results are sample-specific: Hypothesis testing is based on analyzing a sample from a population, and the conclusions drawn are specific to that particular sample.
  • Possible errors: During hypothesis testing, there is a chance of committing type I error (rejecting a true null hypothesis) or type II error (failing to reject a false null hypothesis).
  • Assumptions and requirements: Different tests have specific assumptions and requirements that must be met to accurately interpret results.

Learn All The Tricks Of The BI Trade

Learn All The Tricks Of The BI Trade

After reading this tutorial, you should have a much better understanding of hypothesis testing, one of the most important concepts in the field of data science. The majority of hypotheses are based on speculation about observed behavior, natural phenomena, or established theories.


1. What is hypothesis testing in statistics with example?

Hypothesis testing is a statistical method used to determine if there is enough evidence in a sample data to draw conclusions about a population. It involves formulating two competing hypotheses, the null hypothesis (H0) and the alternative hypothesis (Ha), and then collecting data to assess the evidence. An example: testing if a new drug improves patient recovery (Ha) compared to the standard treatment (H0) based on collected patient data.

2. What is H0 and H1 in statistics?

In statistics, H0​ and H1​ represent the null and alternative hypotheses. The null hypothesis, H0​, is the default assumption that no effect or difference exists between groups or conditions. The alternative hypothesis, H1​, is the competing claim suggesting an effect or a difference. Statistical tests determine whether to reject the null hypothesis in favor of the alternative hypothesis based on the data.

3. What is a simple hypothesis with an example?

A simple hypothesis is a specific statement predicting a single relationship between two variables. It posits a direct and uncomplicated outcome. For example, a simple hypothesis might state, "Increased sunlight exposure increases the growth rate of sunflowers." Here, the hypothesis suggests a direct relationship between the amount of sunlight (independent variable) and the growth rate of sunflowers (dependent variable), with no additional variables considered.

4. What are the 3 major types of hypothesis?

The three major types of hypotheses are:

  • Null Hypothesis (H0): Represents the default assumption, stating that there is no significant effect or relationship in the data.
  • Alternative Hypothesis (Ha): Contradicts the null hypothesis and proposes a specific effect or relationship that researchers want to investigate.
  • Nondirectional Hypothesis: An alternative hypothesis that doesn't specify the direction of the effect, leaving it open for both positive and negative possibilities.


9.3 Probability Distribution Needed for Hypothesis Testing

Earlier in the course, we discussed sampling distributions. Particular distributions are associated with various types of hypothesis testing.

The following table summarizes various hypothesis tests and corresponding probability distributions that will be used to conduct the test (based on the assumptions shown below):

  • Hypothesis test for the mean, when the population standard deviation is known: the population parameter is the population mean, the point estimate is the sample mean, and the test uses the normal distribution.
  • Hypothesis test for the mean, when the population standard deviation is unknown and the distribution of the sample mean is approximately normal: the population parameter is the population mean, the point estimate is the sample mean, and the test uses the Student's t-distribution.
  • Hypothesis test for proportions: the population parameter is the population proportion, the point estimate is the sample proportion, and the test uses the normal distribution.

Assumptions

When you perform a hypothesis test of a single population mean μ using a normal distribution (often called a z-test), you take a simple random sample from the population. The population you are testing is normally distributed, or your sample size is sufficiently large. You know the value of the population standard deviation, which, in reality, is rarely known.

When you perform a hypothesis test of a single population mean μ using a Student's t-distribution (often called a t -test), there are fundamental assumptions that need to be met in order for the test to work properly. Your data should be a simple random sample that comes from a population that is approximately normally distributed. You use the sample standard deviation to approximate the population standard deviation. (Note that if the sample size is sufficiently large, a t -test will work even if the population is not approximately normally distributed).

When you perform a hypothesis test of a single population proportion p, you take a simple random sample from the population. You must meet the conditions for a binomial distribution: there are a certain number n of independent trials, the outcomes of any trial are success or failure, and each trial has the same probability of a success p. The shape of the binomial distribution needs to be similar to the shape of the normal distribution. To ensure this, the quantities np and nq must both be greater than five (\(np > 5\) and \(nq > 5\)). Then the binomial distribution of a sample (estimated) proportion can be approximated by the normal distribution with \(\mu = p\) and \(\sigma = \sqrt{\frac{pq}{n}}\). Remember that \(q = 1 - p\).

Hypothesis Test for the Mean

Going back to the standardizing formula we can derive the test statistic for testing hypotheses concerning means.

The standardizing formula cannot be solved as it is because we do not have μ, the population mean. However, if we substitute in the hypothesized value of the mean, μ0, we can compute a Z value:

\(Z_c = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}\)

This is the test statistic for a test of hypothesis for a mean and is presented in Figure 9.3. We interpret this Z value as the associated probability that a sample with a sample mean of \(\bar{X}\) could have come from a distribution with the population mean hypothesized in H 0, and we call this Z value Z c for "calculated". Figure 9.3 and Figure 9.4 show this process.

In Figure 9.3 two of the three possible outcomes are presented. \(\bar{X}_1\) and \(\bar{X}_3\) are in the tails of the hypothesized distribution of H 0. Notice that the horizontal axis in the top panel is labeled \(\bar{X}\)'s. This is the same theoretical distribution of \(\bar{X}\)'s, the sampling distribution, that the Central Limit Theorem tells us is normally distributed. This is why we can draw it with this shape. The horizontal axis of the bottom panel is labeled Z and is the standard normal distribution. \(Z_{\alpha/2}\) and \(-Z_{\alpha/2}\), called the critical values, are marked on the bottom panel as the Z values associated with the probability the analyst has set as the level of significance in the test, (α). The probabilities in the tails of both panels are, therefore, the same.

Notice that for each \(\bar{X}\) there is an associated Z c, called the calculated Z, that comes from solving the equation above. This calculated Z is nothing more than the number of standard deviations that the hypothesized mean is from the sample mean. If the sample mean falls "too many" standard deviations from the hypothesized mean we conclude that the sample mean could not have come from the distribution with the hypothesized mean, given our pre-set required level of significance. It could have come from H 0, but it is deemed just too unlikely. In Figure 9.3 both \(\bar{X}_1\) and \(\bar{X}_3\) are in the tails of the distribution. They are deemed "too far" from the hypothesized value of the mean given the chosen level of alpha. If in fact this sample mean did come from H 0, but from the tail, we have made a Type I error: we have rejected a good null. Our only real comfort is that we know the probability of making such an error, α, and we can control the size of α.

Figure 9.4 shows the third possibility for the location of the sample mean, \(\bar{x}\). Here the sample mean is within the two critical values. That is, it is within the probability of (1 - α), and we cannot reject the null hypothesis.

This gives us the decision rule for testing a hypothesis for a two-tailed test:

Decision rule: two-tail test
If \(|Z_c| < Z_{\alpha/2}\): then do not REJECT \(H_0\)
If \(|Z_c| > Z_{\alpha/2}\): then REJECT \(H_0\)

This rule will always be the same no matter what hypothesis we are testing or what formulas we are using to make the test. The only change will be to change the Z c to the appropriate symbol for the test statistic for the parameter being tested. Stating the decision rule another way: if the sample mean is unlikely to have come from the distribution with the hypothesized mean we cannot accept the null hypothesis. Here we define "unlikely" as having a probability less than alpha of occurring.

P-Value Approach

An alternative decision rule can be developed by calculating the probability that a sample mean could be found that would give a test statistic larger than the test statistic found from the current sample data assuming that the null hypothesis is true. Here the notion of "likely" and "unlikely" is defined by the probability of drawing a sample with a mean from a population with the hypothesized mean that is either larger or smaller than that found in the sample data. Simply stated, the p-value approach compares the desired significance level, α, to the p-value which is the probability of drawing a sample mean further from the hypothesized value than the actual sample mean. A large p -value calculated from the data indicates that we should not reject the null hypothesis . The smaller the p -value, the more unlikely the outcome, and the stronger the evidence is against the null hypothesis. We would reject the null hypothesis if the evidence is strongly against it. The relationship between the decision rule of comparing the calculated test statistics, Z c , and the Critical Value, Z α , and using the p -value can be seen in Figure 9.5 .

The calculated value of the test statistic is Z c in this example and is marked on the bottom graph of the standard normal distribution because it is a Z value. In this case the calculated value is in the tail and thus we cannot accept the null hypothesis; the associated \(\bar{X}\) is just too unusually large to believe that it came from the distribution with a mean of µ 0 with a significance level of α.

If we use the p -value decision rule we need one more step. We need to find in the standard normal table the probability associated with the calculated test statistic, Z c. We then compare that to the α associated with our selected level of confidence. In Figure 9.5 we see that the p -value is less than α: the tail area beyond Z c is smaller than α/2, and therefore we cannot accept the null. It is important to note that two researchers drawing randomly from the same population may find two different p -values from their samples. This occurs because the p -value is calculated as the probability in the tail beyond the sample mean assuming that the null hypothesis is correct. Because the sample means will in all likelihood be different, this will create two different p -values. Nevertheless, the two researchers should reach the same conclusion about the null hypothesis unless their p -values fall on opposite sides of α.

Here is a systematic way to make a decision of whether you cannot accept or cannot reject a null hypothesis if using the p -value and a preset or preconceived α (the " significance level "). A preset α is the probability of a Type I error (rejecting the null hypothesis when the null hypothesis is true). It may or may not be given to you at the beginning of the problem. In any case, the value of α is the decision of the analyst. When you make a decision to reject or not reject H 0 , do as follows:

  • If α > p -value, cannot accept H 0 . The results of the sample data are significant. There is sufficient evidence to conclude that H 0 is an incorrect belief and that the alternative hypothesis , H a , may be correct.
  • If α ≤ p -value, cannot reject H 0 . The results of the sample data are not significant. There is not sufficient evidence to conclude that the alternative hypothesis, H a , may be correct. In this case the status quo stands.
  • When you "cannot reject H 0 ", it does not mean that you should believe that H 0 is true. It simply means that the sample data have failed to provide sufficient evidence to cast serious doubt about the truthfulness of H 0 . Remember that the null is the status quo and it takes high probability to overthrow the status quo. This bias in favor of the null hypothesis is what gives rise to the statement "tyranny of the status quo" when discussing hypothesis testing and the scientific method.

Both decision rules will result in the same decision and it is a matter of preference which one is used.

One and Two-tailed Tests

The discussion of Figure 9.3 - Figure 9.5 was based on the null and alternative hypothesis presented in Figure 9.3 . This was called a two-tailed test because the alternative hypothesis allowed that the mean could have come from a population which was either larger or smaller than the hypothesized mean in the null hypothesis. This could be seen by the statement of the alternative hypothesis as μ ≠ 100, in this example.

It may be that the analyst has no concern about the value being "too" high or "too" low from the hypothesized value. If this is the case, it becomes a one-tailed test and all of the alpha probability is placed in just one tail and not split into α/2 as in the above case of a two-tailed test. Any test of a claim will be a one-tailed test. For example, a car manufacturer claims that their Model 17B provides gas mileage of greater than 25 miles per gallon. The null and alternative hypothesis would be:

  • H 0 : µ ≤ 25
  • H a : µ > 25

The claim would be in the alternative hypothesis. The burden of proof in hypothesis testing is carried in the alternative. This is because rejecting the null, the status quo, must be accomplished with 90 or 95 percent confidence that it cannot be maintained. Said another way, we want to have only a 5 or 10 percent probability of making a Type I error, rejecting a good null and overthrowing the status quo.

This example is a one-tailed test, with all of the alpha probability placed in the right tail.

Figure 9.6 shows the two possible cases and the form of the null and alternative hypothesis that give rise to them.

where μ 0 is the hypothesized value of the population mean.

Sample size and test statistic (Table 9.5):

  • n < 30, σ unknown: \(t_c = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}\) with df = n - 1
  • n < 30, σ known: \(Z_c = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}\)
  • n ≥ 30, σ unknown: \(Z_c = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}\)
  • n ≥ 30, σ known: \(Z_c = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}\)

Effects of Sample Size on Test Statistic

In developing the confidence intervals for the mean from a sample, we found that most often we would not have the population standard deviation, σ. If the sample size were less than 30, we could simply substitute the point estimate for σ, the sample standard deviation, s, and use the student's t -distribution to correct for this lack of information.

When testing hypotheses we are faced with this same problem and the solution is exactly the same. Namely: If the population standard deviation is unknown, and the sample size is less than 30, substitute s, the point estimate for the population standard deviation, σ, in the formula for the test statistic and use the student's t -distribution. All the formulas and figures above are unchanged except for this substitution and changing the Z distribution to the student's t -distribution on the graph. Remember that the student's t -distribution can only be computed knowing the proper degrees of freedom for the problem. In this case, the degrees of freedom is computed as before with confidence intervals: df = (n-1). The calculated t-value is compared to the t-value associated with the pre-set level of confidence required in the test, t α , df found in the student's t tables. If we do not know σ, but the sample size is 30 or more, we simply substitute s for σ and use the normal distribution.
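A small Python sketch of this substitution rule (an illustration only; the helper function name is made up) might look like:

```python
import math

def mean_test_statistic(x_bar, mu0, sd, n, sigma_known):
    """Sketch of the substitution rule: use the t-distribution only when the
    population standard deviation is unknown and n < 30; otherwise use z."""
    statistic = (x_bar - mu0) / (sd / math.sqrt(n))
    if sigma_known or n >= 30:
        return statistic, "normal (z)"
    return statistic, f"Student's t, df = {n - 1}"

print(mean_test_statistic(62.1, 55, 13.46, 30, sigma_known=False))  # z by the rule
print(mean_test_statistic(110, 90, 18, 5, sigma_known=False))       # t with df = 4
```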

Table 9.5 summarizes these rules.

A Systematic Approach for Testing a Hypothesis

A systematic approach to hypothesis testing follows the following steps and in this order. This template will work for all hypotheses that you will ever test.

  • Set up the null and alternative hypothesis. This is typically the hardest part of the process. Here the question being asked is reviewed. What parameter is being tested: a mean, a proportion, differences in means, etc.? Is this a one-tailed test or a two-tailed test?
  • Decide the level of significance required for this particular case and determine the critical value. These can be found in the appropriate statistical table. The levels of confidence typical for businesses are 80, 90, 95, 98, and 99. However, the level of significance is a policy decision and should be based upon the risk of making a Type I error, rejecting a good null. Consider the consequences of making a Type I error.
  • Next, on the basis of the hypotheses and sample size, select the appropriate test statistic and find the relevant critical value: Z α , t α , etc. Drawing the relevant probability distribution and marking the critical value is always a big help. Be sure to match the graph with the hypothesis, especially if it is a one-tailed test.
  • Take a sample and calculate the relevant statistics: sample mean, standard deviation, or proportion. Using the formula for the test statistic from step 2 above, now calculate the test statistic for this particular case using the statistics you have just calculated.
  • Compare the calculated test statistic with the critical value:
  • The test statistic is in the tail: Cannot Accept the null, the probability that this sample mean (proportion) came from the hypothesized distribution is too small to believe that it is the real home of these sample data.
  • The test statistic is not in the tail: Cannot Reject the null, the sample data are compatible with the hypothesized population parameter.
  • Reach a conclusion. It is best to articulate the conclusion two different ways. First a formal statistical conclusion such as “With a 5% level of significance we cannot accept the null hypothesis that the population mean is equal to XX (units of measurement)”. The second statement of the conclusion is less formal and states the action, or lack of action, required. If the formal conclusion was that above, then the informal one might be, “The machine is broken and we need to shut it down and call for repairs”.

All hypotheses tested will go through this same process. The only changes are the relevant formulas and those are determined by the hypothesis required to answer the original question.


Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.

Access for free at https://openstax.org/books/introductory-business-statistics-2e/pages/1-introduction
  • Authors: Alexander Holmes, Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Introductory Business Statistics 2e
  • Publication date: Dec 13, 2023
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/introductory-business-statistics-2e/pages/1-introduction
  • Section URL: https://openstax.org/books/introductory-business-statistics-2e/pages/9-3-probability-distribution-needed-for-hypothesis-testing

© Jul 18, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License . The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.


Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans . Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics . It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis (H 0 ) and an alternate hypothesis (H a  or H 1 ).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test .
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

  • Step 1: State your null and alternate hypothesis
  • Step 2: Collect data
  • Step 3: Perform a statistical test
  • Step 4: Decide whether to reject or fail to reject your null hypothesis
  • Step 5: Present your findings
  • Frequently asked questions about hypothesis testing

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H o ) and alternate (H a ) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H 0 : Men are, on average, not taller than women. H a : Men are, on average, taller than women.


For a statistical test to be valid , it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p -value . This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p -value. This means it is likely that any difference you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data .

For the height example above, the statistical test will give you:

  • an estimate of the difference in average height between the two groups, and
  • a p-value showing how likely you are to see this difference if the null hypothesis of no difference is true (a minimal code sketch follows this list).
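To make Step 3 concrete, here is a minimal Python sketch of such a test for the height example, using simulated data and scipy's independent two-sample t-test. The group sizes, means, and standard deviations below are made up purely for illustration.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
men = rng.normal(loc=175, scale=7, size=50)    # hypothetical heights in cm
women = rng.normal(loc=168, scale=6, size=50)  # hypothetical heights in cm

estimate = men.mean() - women.mean()           # estimated difference in average height
t_stat, p_value = stats.ttest_ind(men, women)  # two-sample t-test of H0: equal means

print(f"estimated difference: {estimate:.2f} cm")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

With a significance level of 0.05, you would reject the null hypothesis if the printed p-value came out below 0.05.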

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p -value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis ( Type I error ).

The results of hypothesis testing will be presented in the results and discussion sections of your research paper , dissertation or thesis .

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p -value). In the discussion , you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis . This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis . But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis .

Frequently asked questions about hypothesis testing

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved September 3, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/



Test Statistic: Definition, Types & Formulas

By Jim Frost

What is a Test Statistic?

A test statistic assesses how consistent your sample data are with the null hypothesis in a hypothesis test. Test statistic calculations take your sample data and boil them down to a single number that quantifies how much your sample diverges from the null hypothesis. As a test statistic value becomes more extreme, it indicates larger differences between your sample data and the null hypothesis.

When your test statistic indicates a sufficiently large incompatibility with the null hypothesis, you can reject the null and state that your results are statistically significant—your data support the notion that the sample effect exists in the population . To use a test statistic to evaluate statistical significance, you either compare it to a critical value or use it to calculate the p-value .

Statisticians named the hypothesis tests after the test statistics because they’re the quantity that the tests actually evaluate. For example, t-tests assess t-values, F-tests evaluate F-values, and chi-square tests use, you guessed it, chi-square values.

In this post, learn about test statistics, how to calculate them, interpret them, and evaluate statistical significance using the critical value and p-value methods.

How to Find Test Statistics

Each test statistic has its own formula. I present several common test statistics examples below. To see worked examples for each one, click the links to my more detailed articles.

Formulas for Test Statistics

  • T-value for a 1-sample t-test: take the sample mean, subtract the hypothesized mean, and divide by the standard error of the mean (a code sketch of this case follows the list).
  • T-value for a 2-sample t-test: take one sample mean, subtract the other, and divide by the pooled standard deviation.
  • F-value for F-tests and ANOVA: calculate the ratio of two variances.
  • Chi-squared value (χ²) for a Chi-squared test: sum the squared differences between observed and expected values divided by the expected values.
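As a minimal illustration of the first formula, here is a short Python sketch that computes a 1-sample t-value by hand; the sample values and hypothesized mean are made-up numbers.

import numpy as np

sample = np.array([4.6, 5.1, 4.8, 5.3, 4.9, 4.7, 5.0, 4.5])  # hypothetical measurements
hypothesized_mean = 5.0

n = sample.size
standard_error = sample.std(ddof=1) / np.sqrt(n)              # sample std dev / sqrt(n)
t_value = (sample.mean() - hypothesized_mean) / standard_error

print(f"t = {t_value:.3f}")

A t-value near zero means the sample mean sits close to the hypothesized mean; larger absolute values indicate a bigger divergence from the null hypothesis.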

Understanding the Null Values and the Test Statistic Formulas

In the formulas above, it’s helpful to understand the null condition and the test statistic value that occurs when your sample data match that condition exactly. Also, it’s worthwhile knowing what causes the test statistics to move further away from the null value, potentially becoming significant. Test statistics are statistically significant when they exceed a critical value.

All these test statistics are ratios, which helps you understand their null values.

T-Tests, Null = 0

When a t-value equals 0, it indicates that your sample data match the null hypothesis exactly.

For a 1-sample t-test, when the sample mean equals the hypothesized mean, the numerator is zero, which causes the entire t-value ratio to equal zero. As the sample mean moves away from the hypothesized mean in either the positive or negative direction, the test statistic moves away from zero in the same direction.

A similar case exists for 2-sample t-tests. When the two sample means are equal, the numerator is zero, and the entire test statistic ratio is zero. As the two sample means become increasingly different, the absolute value of the numerator increases, and the t-value becomes more positive or negative.

Related post : How T-tests Work

F-tests including ANOVA, Null = 1

When an F-value equals 1, it indicates that the two variances in the numerator and denominator are equal, matching the null hypothesis.

As the numerator and denominator become less and less similar, the F-value moves away from one in either direction.

Related post : The F-test in ANOVA

Chi-squared Tests, Null = 0

When a chi-squared value equals 0, it indicates that the observed values always match the expected values. This condition causes the numerator to equal zero, making the chi-squared value equal zero.

As the observed values progressively fail to match the expected values, the numerator increases, causing the test statistic to rise from zero.

Related post : How a Chi-Squared Test Works

You’ll never see a test statistic that equals the null value precisely in practice. However, trivial differences between sample values and the null value are not uncommon.

Interpreting Test Statistics

Test statistics are unitless. This fact can make them difficult to interpret on their own. You know they evaluate how well your data agree with the null hypothesis. If your test statistic is extreme enough, your data are so incompatible with the null hypothesis that you can reject it and conclude that your results are statistically significant. But how does that translate to specific values of your test statistic? Where do you draw the line?

For instance, t-values of zero match the null value. But how far from zero should your t-value be to be statistically significant? Is 1 enough? 2? 3? If your t-value is 2, what does it mean anyway? In this case, we know that the sample mean doesn’t equal the null value, but how exceptional is it? To complicate matters, the dividing line changes depending on your sample size and other study design issues.

Similar types of questions apply to the other test statistics too.

To interpret individual values of a test statistic, we need to place them in a larger context. Towards this end, let me introduce you to sampling distributions for test statistics!

Sampling Distributions for Test Statistics

Performing a hypothesis test on a sample produces a single test statistic. Now, imagine you carry out the following process:

  • Assume the null hypothesis is true in the population.
  • Repeat your study many times by drawing many random samples of the same size from this population.
  • Perform the same hypothesis test on all these samples and save the test statistics.
  • Plot the distribution of the test statistics.

This process produces the distribution of test statistic values that occurs when the effect does not exist in the population (i.e., the null hypothesis is true). Statisticians refer to this type of distribution as a sampling distribution, a kind of probability distribution.
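As an illustration, here is a small Python simulation sketch of that process; the population mean, standard deviation, and sample size below are hypothetical, and scipy is assumed to be available.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
null_mean, sigma, n = 100, 15, 21          # hypothetical population where H0 is true

t_values = []
for _ in range(10_000):                    # many random samples of the same size
    sample = rng.normal(null_mean, sigma, size=n)
    result = stats.ttest_1samp(sample, popmean=null_mean)
    t_values.append(result.statistic)      # save the test statistic

# The saved t-values approximate the sampling distribution of t under H0.
share_extreme = np.mean(np.abs(np.array(t_values)) >= 2)
print(f"share of |t| >= 2 under H0: {share_extreme:.3f}")   # roughly 0.06 for df = 20

Plotting a histogram of t_values would reproduce the bell-shaped sampling distribution described in the next section.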

Why would we need this type of distribution?

It provides the larger context required for interpreting a test statistic. More specifically, it allows us to compare our study’s single test statistic to values likely to occur when the null is true. We can quantify our sample statistic’s rareness while assuming the effect does not exist in the population. Now that’s helpful!

Fortunately, we don’t need to collect many random samples to create this distribution! Statisticians have developed formulas allowing us to estimate sampling distributions for test statistics using the sample data.

To evaluate your data’s compatibility with the null hypothesis, place your study’s test statistic in the distribution.

Related post : Understanding Probability Distributions

Example of a Test Statistic in a Sampling Distribution

Suppose our t-test produces a t-value of two. That’s our test statistic. Let’s see where it fits in.

The sampling distribution below shows a t-distribution with 20 degrees of freedom, equating to a 1-sample t-test with a sample size of 21. The distribution centers on zero because it assumes the null hypothesis is correct. When the null is true, your analysis is most likely to obtain a t-value near zero and less likely to produce t-values further from zero in either direction.

Sampling distribution for the t-value test statistic.

The sampling distribution indicates that our test statistic is somewhat rare when we assume the null hypothesis is correct. However, observing a t-value at least as extreme as ±2 is not totally inconceivable. We need a way to quantify the likelihood.

From this point, we need to use the sampling distributions’ ability to calculate probabilities for test statistics.

Related post : Sampling Distributions Explained

Test Statistics and Critical Values

The significance level uses critical values to define how far the test statistic must be from the null value to reject the null hypothesis. When the test statistic exceeds a critical value, the results are statistically significant.

The percentage of the area beneath the sampling distribution curve that is shaded represents the probability that the test statistic will fall in those regions when the null is true. Consequently, to depict a significance level of 0.05, I’ll shade 5% of the sampling distribution furthest away from the null value.

The two shaded areas are equidistant from the null value in the center. Each region has a probability of 0.025; together they sum to our significance level of 0.05. These shaded areas are the critical regions for a two-tailed hypothesis test. Let’s return to our example t-value of 2.

Related post : What are Critical Values?

Sampling distribution that displays the critical values for our t-value.

In this example, the critical values are -2.086 and +2.086. Our test statistic of 2 is not statistically significant because it does not exceed the critical value.
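To reproduce those critical values yourself, you can query the t-distribution directly; the sketch below assumes scipy, a two-tailed test, df = 20, and a 0.05 significance level.

from scipy import stats

alpha = 0.05
df = 20                                        # 1-sample t-test with n = 21
critical = stats.t.ppf(1 - alpha / 2, df)      # upper critical value, two-tailed test

print(f"critical values: +/-{critical:.3f}")       # approximately +/-2.086
print(f"is |t| = 2 significant? {2 > critical}")   # False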

Other hypothesis tests have their own test statistics and sampling distributions, but their processes for critical values are generally similar.

Learn how to find critical values for test statistics using tables:

  • T-distribution table
  • Chi-square table

Related post : Understanding Significance Levels

Using Test Statistics to Find P-values

P-values are the probability of observing an effect at least as extreme as your sample’s effect if you assume no effect exists in the population.

Test statistics represent effect sizes in hypothesis tests because they denote the difference between your sample effect and no effect —the null hypothesis. Consequently, you use the test statistic to calculate the p-value for your hypothesis test.

The above p-value definition is a bit tortuous. Fortunately, it’s much easier to understand how test statistics and p-values work together using a sampling distribution graph.

Let’s use our hypothetical test statistic t-value of 2 for this example. However, because I’m displaying the results of a two-tailed test, I need to use t-values of +2 and -2 to cover both tails.

Related post : One-tailed vs. Two-Tailed Hypothesis Tests

The graph below displays the probability of t-values less than -2 and greater than +2 using the area under the curve. This graph is specific to our t-test design (1-sample t-test with N = 21).

Graph of t-distribution that displays the probability for a t-value of 2.

The sampling distribution indicates that each of the two shaded regions has a probability of 0.02963—for a total of 0.05926. That’s the p-value! The graph shows that the test statistic falls within these areas almost 6% of the time when the null hypothesis is true in the population.

While this likelihood seems small, it’s not low enough to justify rejecting the null under the standard significance level of 0.05. P-value results are always consistent with the critical value method. Learn more about using test statistics to find p values .
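As a quick check on that number, the same two-tailed p-value can be computed from the t-distribution directly; this sketch assumes scipy and the same design (t = 2, df = 20).

from scipy import stats

t_value, df = 2, 20
p_value = 2 * stats.t.sf(abs(t_value), df)   # double the upper-tail area for a two-tailed test

print(f"p = {p_value:.5f}")                  # approximately 0.05926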

While test statistics are a crucial part of hypothesis testing, you’ll probably let your statistical software calculate the p-value for the test. However, understanding test statistics will boost your comprehension of what a hypothesis test actually assesses.

Related post : Interpreting P-values


Reader Interactions

July 5, 2024 at 8:21 am

“As the observed values progressively fail to match the observed values, the numerator increases, causing the test statistic to rise from zero”.

Sir, this sentence is written in the Chi-squared Test heading. There the observed value is written twice. I think the second one to be replaced with ‘expected values’.

July 5, 2024 at 4:10 pm

Thanks so much, Dr. Raj. You’re correct about the typo and I’ve made the correction.

May 9, 2024 at 1:40 am

Thank you very much (great page on one and two-tailed tests)!

May 6, 2024 at 12:17 pm

I would like to ask a question. If only positive numbers are the possible values in a sample (e.g. absolute values without 0), is it meaningful to test if the sample is significantly different from zero (using, for example, a one-sample t-test or a Wilcoxon signed-rank test), or can I assume that, given a large enough sample, the result will by definition be significant (even if a small or very variable sample results in a non-significant hypothesis test)?

Thank you very much,

May 6, 2024 at 4:35 pm

If you’re talking about the raw values you’re assessing using a one-sample t-test, it doesn’t make sense to compare them to zero given your description of the data. You know that the mean can’t possibly equal zero. The mean must be some positive value. Yes, in this scenario, if you have a large enough sample size, you should get statistically significant results. So, that t-test isn’t telling you anything that you don’t already know!

However, you should be aware of several things. The 1-sample test can compare your sample mean to values other than zero. Typically, you’ll need to specify the value of the null hypothesis for your software. This value is the comparison value. The test determines whether your sample data provide enough evidence to conclude that the population mean does not equal the null hypothesis value you specify. You’ll need to specify the value because there is no obvious default value to use. Every 1-sample t-test has its own subject-area context, with a null hypothesis value that makes sense for that context, and that value is frequently not zero.

I suspect that you’re getting tripped up with the fact that t-tests use a t-value of zero for their null hypothesis value. That doesn’t mean your 1-sample t-test is comparing your sample mean to zero. The test converts your data to a single t-value and compares the t-value to zero. But your actual null hypothesis value can be something else. It’s just converting your sample to a standardized value to use for testing. So, while the t-test compares your sample’s t-value to zero, you can actually compare your sample mean to any value you specify. You need to use a value that makes sense for your subject area.

I hope that makes sense!

May 8, 2024 at 8:37 am

Thank you very much Jim, this helps a lot! Actually, the value I would like to compare my sample to is zero, but I just couldn’t find the right way to test it apparently (it’s about EEG data). The original data was a sample of numbers between -1 and +1, with the question of whether they are significantly different from zero in either direction (in which case a one sample t-test makes sense I guess, since the sample mean can in fact be zero). However, since a sample mean of 0 can also occur if half of the sample differs in the negative, and the other half in the positive direction, I also wanted to test if there is a divergence from 0 in ‘absolute’ terms – that’s how the absolute valued numbers came about (I know that absolute values can also be zero, but in this specific case, they were all positive numbers). And a special thanks for the last paragraph – I will definitely keep it in mind; it is a potential point of confusion.

May 8, 2024 at 8:33 pm

You can use a 1-sample t test for both cases but you’ll need to set them up slightly differently. To detect a positive or negative difference from zero, use a 2-tailed test. For the case with absolute values, use a one-tailed test with a critical region in the positive end. To learn more, read about One- and Two-Tailed Tests Explained. Use zero for the comparison value in both cases.

February 12, 2024 at 1:00 am

Very helpful and well articulated! Thanks Jim 🙂

September 18, 2023 at 10:01 am

Thank you for brief explanation.

July 25, 2022 at 8:32 am

the content was helpful to me. thank you

Master the Numbers: CFA Level I Quantitative Methods Explained for 2025 Candidates


If you’re preparing for the CFA Level I exam in 2025, you already know that Quantitative Methods is one of those topics that can make or break your preparation. It’s the kind of subject that, while initially intimidating, forms the backbone of everything you need to understand in the financial world. Whether numbers are your forte or you tend to lean towards more qualitative aspects, there’s no sidestepping the significance of Quantitative Methods. But here’s the good news: with the right approach, you can not only understand it but also master it, setting yourself up for success not just in the exam but in your entire CFA journey.

Why Quantitative Methods is the Cornerstone of CFA Level I

Let’s cut to the chase: Why does Quantitative Methods matter so much? You might find yourself asking, “Why should I dive deep into statistical and analytical tools when my day-to-day job doesn’t involve crunching numbers?” The answer is simple yet profound: the concepts and tools you’ll encounter in Quantitative Methods are the bedrock of sound investment decisions. Even if you don’t directly use these techniques daily, having a solid grasp of them ensures you can interpret data, understand trends, and make informed choices—skills that are invaluable in the finance industry.

Now, consider this: Quantitative Methods makes up about 6%-9% of the CFA Level I exam. That might sound like a modest slice, but don’t let the numbers fool you. The concepts you learn here are woven into other sections of the exam. Take the Time Value of Money, for example—a concept you’ll see cropping up in areas like Equity Investments and Fixed Income. So, mastering Quantitative Methods isn’t just about ticking off one section; it’s about laying a foundation that supports your entire exam performance.

How to Conquer Quantitative Methods in CFA Level I

So, how do you go about conquering this critical section? It’s not just about hard work; it’s about working smart. Here’s a strategic roadmap that will guide you through mastering Quantitative Methods for CFA Level I.

1. Start with the Learning Outcome Statements (LOS)

Every topic in the CFA curriculum is accompanied by Learning Outcome Statements (LOS). Think of these as your personal blueprint—they tell you exactly what you need to know and what you’ll be tested on. When it comes to Quantitative Methods, each LOS highlights specific skills, whether it’s calculating the future value of cash flows or interpreting the results of hypothesis testing. Your first step in mastering this topic should be a thorough review of each LOS before you dive into the readings.

For example, you might come across an LOS that asks you to describe the use of bootstrap resampling in conducting a simulation based on observed data in investment applications. What does this mean for you? It means you need to understand not just what bootstrap resampling is, but how it applies in real-world investment scenarios. Don’t just gloss over the LOS—take the time to understand fully what’s expected of you. This approach ensures that you’re not just passively reading but actively engaging with the material.

2. Break Down the Material

Quantitative Methods covers a wide spectrum of topics, from the basics of the Time Value of Money to more complex subjects like Hypothesis Testing and Linear Regression. Here’s a common pitfall: trying to rush through these readings in an attempt to cover more ground. Resist that urge. Instead, break down the material, and tackle each reading with patience. Make sure you truly understand one concept before moving on to the next.

For instance, when you reach Hypothesis Testing, don’t just focus on memorizing the formulas. Understand when and how to apply them. What does the LOS require you to know? Is it about recognizing the different types of errors, or is it about understanding the significance levels in hypothesis testing? Spend quality time with each concept—watch AnalystPrep’s detailed video lessons , go through the study notes meticulously, and ensure you’re comfortable with the material before attempting practice questions. This deliberate approach will pay off in the long run.

3. Practice Until You Can’t Get It Wrong

When it comes to mastering Quantitative Methods, practice is non-negotiable. Theoretical knowledge is essential, but it’s the application that will truly prepare you for the exam. Dive into AnalystPrep’s Qbank and practice as many questions as you can get your hands on. But here’s the kicker—don’t just practice for the sake of it. Analyze every mistake you make. For each incorrect answer, ask yourself: “What went wrong here, and how can I avoid this mistake in the future?” This reflective practice is crucial. It’s not about getting it right once; it’s about practicing until you can’t get it wrong.

Mock exams are another invaluable tool in your preparation. They simulate the real exam environment, helping you gauge your readiness. Pay particular attention to your performance in the Quantitative Methods section. If you find yourself consistently struggling with a particular type of question, don’t brush it off. Go back, revisit the topic, and strengthen your understanding. Your goal is to turn every weak spot into a strength before exam day.

4. Stay Organized and Keep Track of Your Progress

Preparing for the CFA Level I exam is more of a marathon than a sprint. With so much material to cover, it’s easy to lose track of your progress. That’s why organization is key. Create a study schedule that maps out your progress through the Quantitative Methods readings. Monitor how much time you’re spending on each topic, and regularly review your notes to reinforce what you’ve learned.

Consider keeping a dedicated study journal. Use it to jot down important points, questions that arise as you study, and areas where you need further clarification. This journal will be a valuable resource as the exam approaches, allowing you to quickly review key concepts and address any lingering doubts.

Deep Dive into Key Quantitative Methods Readings

Now that we’ve established a solid strategy, it’s time to roll up our sleeves and dig into the heart of Quantitative Methods. Each reading in this section of the CFA Level I exam brings its own set of challenges and demands a tailored approach. Let’s explore the critical areas that will not only test your understanding but also sharpen your financial acumen.

The Time Value of Money

The Time Value of Money (TVoM) is more than just a cornerstone of finance; it’s the bedrock on which financial decision-making stands. At its core, the concept is simple: a dollar today holds more value than a dollar in the future because of its potential earning capacity. But don’t be fooled by its apparent simplicity. The calculations involved can quickly become intricate, especially when different compounding periods enter the equation.

To truly master TVoM, begin by internalizing the fundamental formulas for future value (FV) and present value (PV). These are the tools that will help you quantify the concept. Once you have these down, challenge yourself with scenarios where compounding varies—whether it’s annual, semi-annual, or even continuous. This is where your financial calculator becomes not just a tool but an extension of your analytical mind. Master the TVoM functions on your BAII Plus calculator; doing so will not only save you time during the exam but also reduce the risk of errors.
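If you like to double-check calculator work in code, here is a minimal Python sketch of the basic FV and PV relationships with discrete compounding; the cash flow, rate, and horizon are made-up numbers, and the function names are illustrative only.

def future_value(pv, rate, periods_per_year, years):
    # FV = PV * (1 + r/m) ** (m * t) with discrete compounding
    return pv * (1 + rate / periods_per_year) ** (periods_per_year * years)

def present_value(fv, rate, periods_per_year, years):
    # PV discounts a future cash flow back to today
    return fv / (1 + rate / periods_per_year) ** (periods_per_year * years)

print(round(future_value(1000, 0.06, 1, 3), 2))      # annual compounding: 1191.02
print(round(future_value(1000, 0.06, 2, 3), 2))      # semi-annual compounding: 1194.05
print(round(present_value(1191.02, 0.06, 1, 3), 2))  # discounts back to roughly 1000.0

Note how more frequent compounding raises the future value slightly, which is exactly the kind of detail the exam likes to test.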

But understanding TVoM goes beyond memorizing formulas. It’s about seeing the bigger picture. How does TVoM influence the valuation of bonds? What role does it play in calculating annuities? Grasping these applications will deepen your comprehension and prepare you for the nuanced questions the CFA exam is known for.

Organizing, Visualizing, and Describing Data

In finance, data isn’t just numbers on a page; it’s the lifeblood of informed decision-making. This reading is your gateway to understanding how to manage, visualize, and interpret data—skills you’ll need daily as a finance professional.

You’ll start by exploring different types of data—nominal, ordinal, interval, and ratio—and the best ways to visualize them, from bar charts to scatter plots. But here’s the key: Don’t just memorize the definitions. Dive into the why and when . Why would you choose a scatter plot over a histogram? When is it more effective to use a bar chart? Your ability to make these decisions is what will set you apart.

This section also introduces you to skewness and kurtosis—terms that describe the shape of a data distribution. These concepts may seem abstract, but they’re critical for interpreting financial data accurately. Make sure you grasp them thoroughly, as they’ll resurface in more advanced readings and across the CFA curriculum.

Probability Concepts

Probability might seem like familiar territory at first glance—most of us have encountered it in school. But the CFA exam takes it to a higher level, requiring you to apply complex formulas and concepts to real-world financial scenarios.

Bayes’ formula is a standout in this reading. It’s not just a formula; it’s a powerful tool for updating probabilities as new information emerges—something that’s invaluable in investment decision-making. Practice applying Bayes’ formula in varied scenarios until it becomes second nature.
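As a small worked example of the mechanics, here is a Python sketch of Bayes' formula with a manager-skill flavor; every probability below is hypothetical and chosen only for illustration.

p_skilled = 0.10               # prior: P(manager is skilled)
p_beat_given_skilled = 0.70    # P(beats the benchmark | skilled)
p_beat_given_unskilled = 0.40  # P(beats the benchmark | not skilled)

# Law of total probability for the denominator, then Bayes' formula
p_beat = p_beat_given_skilled * p_skilled + p_beat_given_unskilled * (1 - p_skilled)
p_skilled_given_beat = p_beat_given_skilled * p_skilled / p_beat

print(round(p_skilled_given_beat, 3))   # approximately 0.163

The posterior probability (about 16%) is higher than the 10% prior but still modest, which is the kind of updating intuition Bayes' formula questions reward.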

You’ll also need to navigate conditional probability and the law of total probability. These topics can trip up even the most prepared candidates, especially when wrapped in complex word problems. The key here is to break down each problem methodically, applying the relevant formulas step by step. With practice, you’ll find that what once seemed daunting becomes manageable, even intuitive.

Common Probability Distributions

Understanding probability distributions is crucial because they underpin much of the analysis you’ll perform in finance. The binomial and normal distributions, in particular, are fundamental.

The normal distribution, with its bell curve, is a cornerstone of statistical analysis in finance. It’s essential for constructing confidence intervals, conducting hypothesis tests, and much more. You need to be comfortable with its properties—like the empirical rule (68-95-99.7 rule)—and proficient at calculating probabilities and z-scores.
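Here is a short Python sketch of a z-score and a normal-distribution probability; the return figures are hypothetical, and scipy is assumed to be available.

from scipy import stats

mean_return, sd_return = 0.08, 0.12      # hypothetical annual return distribution
x = -0.04

z = (x - mean_return) / sd_return        # z-score: (value - mean) / standard deviation
p_below = stats.norm.cdf(z)              # P(return < -4%)
within_1sd = stats.norm.cdf(1) - stats.norm.cdf(-1)   # empirical rule: about 68%

print(f"z = {z:.2f}, P(return < -4%) = {p_below:.3f}")       # z = -1.00, p = 0.159
print(f"P(within 1 standard deviation) = {within_1sd:.3f}")  # 0.683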

This reading also introduces Monte Carlo simulation, a technique that may seem peripheral now but will become increasingly important as you progress to Level II. Monte Carlo simulation models the probability of different outcomes in processes influenced by random variables, providing powerful insights into risk and uncertainty.

Sampling and Estimation

Sampling and estimation form the backbone of many decisions in finance. These concepts allow you to make informed conclusions about a population based on a smaller, more manageable sample.

The Importance of Sampling

Sampling isn’t just about selecting data points; it’s about choosing them wisely to minimize bias and maximize insight. Whether you’re using simple random sampling to ensure every population member has an equal chance of selection or stratified sampling to gain more accurate insights, the method you choose can significantly impact your results.

Estimation in Finance

Once you have your sample, estimation techniques allow you to make educated guesses about the broader population. Whether estimating an investment’s average return or assessing market volatility, it’s crucial to understand that all estimates carry some degree of uncertainty. But by using appropriate sampling methods and acknowledging the limitations of your data, you can make decisions that are both informed and strategically sound.

Hypothesis Testing

Hypothesis testing is more than just a statistical tool; it’s a method for making data-driven decisions. In finance, this could mean anything from evaluating a new investment strategy to testing the impact of economic policies.

Understanding Hypotheses

At the heart of hypothesis testing are two competing hypotheses: the null hypothesis (no effect) and the alternative hypothesis (the effect you’re testing for). The challenge lies in analyzing your sample data to determine whether there’s enough evidence to reject the null hypothesis in favor of the alternative. This isn’t just about crunching numbers—it’s about weighing the strength of evidence and understanding the risks of potential errors.

Real-World Applications

In practice, hypothesis testing helps you navigate the complex world of finance with confidence. Whether you’re assessing a new stock-picking strategy or analyzing the impact of macroeconomic changes, mastering this tool will give you a significant edge.

Linear Regression

Linear regression is your gateway to understanding relationships between variables in finance. Whether you’re exploring how a stock’s return correlates with market movements or predicting future trends, linear regression provides the quantitative backbone.

Exploring Relationships

With linear regression, you can quantify the relationship between two variables—like how a stock’s return relates to the market return. This relationship is encapsulated in the stock’s beta, a critical measure of risk and volatility.

Practical Uses in Finance

Linear regression isn’t just theoretical; it’s applied in everything from portfolio management to economic forecasting. By understanding how to use linear regression qualitatively, you can extract meaningful insights from financial data, even if you’re not a math whiz. This skill is invaluable, helping you make more informed decisions in a highly quantitative field.

To truly excel in the CFA Level I exam, it’s essential to transform your understanding of Quantitative Methods from a mere academic exercise into a practical toolkit that can drive real-world financial decisions. This section isn’t just about mastering formulas or memorizing concepts—it’s about cultivating the ability to think analytically and make informed judgments that will serve you throughout your finance career.

As you prepare, remember that the journey to mastery is just as critical as the destination. Each topic you delve into within Quantitative Methods builds a foundational skill that will be integral not only to passing the exam but also to your broader financial acumen. Whether it’s assessing the time value of money, interpreting complex data, or applying statistical tools to real-world scenarios, these are the skills that will differentiate you as a professional in the financial industry.

The key is to approach your studies with curiosity and an eagerness to apply what you learn to actual financial challenges. Embrace the process of learning, practice consistently, and seek out resources that deepen your understanding and sharpen your skills. With dedication and the right strategies, you’ll not only pass the CFA Level I exam but also lay the groundwork for a successful and impactful career in finance.


Offered by AnalystPrep
