{
"cells": [
{
"cell_type": "markdown",
"id": "b7f57073-44ef-4bbf-a4ff-17a803e78d53",
"metadata": {},
"source": [
"# Hypothesis testing for single population mean when the variance $\\sigma^{2}$ is known "
]
},
{
"cell_type": "markdown",
"id": "7450df2c-9da8-4e75-b87d-85a21881bc07",
"metadata": {},
"source": [
"- Assume that we are interested in **making an inference about the mean of a normally distributed single population with known variance** and \n",
"we have **a random sample** of size n drawn from this population:\n",
"\n",
"$$ \n",
"X_{1}, X_{2}, \\dots, X_{n} \\sim N(\\mu, \\sigma^{2}). \n",
"$$\n",
"\n",
"- When the sample is coming from a normally distributed population with known variance $\\sigma^{2}$, under $H_{0}$, the test concerning the population mean $\\mu$, is based on the following **test statistic**:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"Z^{*} &=\\frac{\\bar{X}-\\mu_{0}}{\\sigma/\\sqrt{n}}\\sim N(0,1) \n",
"\\end{aligned}\n",
"$$\n",
"\n",
"- where $\\bar{X}$ is the sample mean, $\\mu_{0}$ is the hypothesized population mean value, and $\\sigma$ is the population standard deviation, and $n$ is the sample size.\n",
"\n",
"- For a **two-sided test**:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"H_{0} &: \\mu = \\mu_{0} \\nonumber \\\\ \n",
"H_{1} &: \\mu \\neq \\mu_{0} \\nonumber\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"- Reject $H_{0}$ if $z^{*} \\leq -z(1-\\alpha/2)$ OR if $z^{*} \\geq z(1-\\alpha/2)$, otherwise fail to reject $H_{0}$.\n",
"\n",
"- Here, $z(1-\\alpha/2)$ refers to the upper $1-\\alpha/2$ percentage point of the standard normal distribution.\n",
"\n",
"- For a **one-sided upper-tail test**:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"H_{0} &: \\mu \\leq \\mu_{0} \\nonumber \\\\\n",
"H_{1} &: \\mu > \\mu_{0} \\nonumber\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"- Reject $H_{0}$ if $z^{*} \\geq z(1-\\alpha)$, otherwise fail to reject $H_{0}$.\n",
"- Here, $z(1-\\alpha)$ refers to the upper $\\alpha$ percentage point of the standard normal distribution.\n",
"\n",
"- For a **one-sided lower-tail test**:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"H_{0} &: \\mu \\geq \\mu_{0} \\nonumber \\\\ \n",
"H_{1} &: \\mu < \\mu_{0} \\nonumber\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"- Reject $H_{0}$ if $z^{*} \\leq z(\\alpha)=-z(1-\\alpha)$, otherwise fail to reject $H_{0}$.\n",
"\n",
"- The P-value is the probability that we would observe a more extreme statistic than we did if the null hypothesis were true. \n",
"- If $z^{*}$ is the **computed value of the test statistic**, the P-value is:\n",
"\n",
"$$\n",
"P-value = \\begin{cases} \n",
" 2[1-\\Phi(|z^{*}|)] & \\quad {for} \\quad H_{0} : \\mu = \\mu_{0} \\quad \\text{vs} \\quad H_{1} : \\mu \\neq \\mu_{0} \\nonumber \\\\\n",
" 1-\\Phi(z^{*}) & \\quad {for} \\quad H_{0} : \\mu \\leq \\mu_{0} \\quad \\text{vs} \\quad H_{1} : \\mu > \\mu_{0} \\nonumber \\\\\n",
" \\Phi(z^{*}) & \\quad {for} \\quad H_{0} : \\mu \\geq \\mu_{0} \\quad \\text{vs} \\quad H_{1} : \\mu < \\mu_{0} \\nonumber \n",
" \\end{cases}\n",
"$$\n",
"\n",
"- where $\\Phi (z^{*}) = Pr(Z \\leq z^{*})$ is the standard normal cumulative distribution function.\n",
"- Decision rule: If $P-value$ \\leq $\\alpha$, then reject $H_{0}$, otherwise fail to reject $H_{0}$."
]
},
{
"cell_type": "markdown",
"id": "3a10f2d2-39ff-48b7-a9fc-7650548339a7",
"metadata": {},
"source": [
"## Example\n",
"- At a certain production facility that assembles computer keyboards, from past experience the assembly time is known to follow a normal distribution with mean ($\\mu$) of 130 seconds and standard deviation ($\\sigma$) of 15 seconds. \n",
"- The new production supervisor suspects that the average time to assemble the keyboards does not quite follow the specified value. \n",
"- To examine this problem, he measures the times for 100 assemblies and found that the sample assembly time average ($\\bar{X}$) is $126.8$ seconds. \n",
"- Can the supervisor conclude at the $5\\%$ level of significance that the mean assembly time of 130 seconds is incorrect?"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "29d4bce6-ed77-4731-807e-64cefa18ea1c",
"metadata": {},
"outputs": [],
"source": [
"# define the hypothesis testing function for testing single normal population\n",
"# mean when pop. variance is known\n",
"\n",
"import numpy as np\n",
"import scipy.stats as stats\n",
" \n",
"def OneSampZ(Xbar, sigma1, n1, mu0 = 0, alpha = 0.05, direction = \"two_sided\"):\n",
" \n",
" z_star = (Xbar-mu0)/np.sqrt(sigma1**2/n1)\n",
"\n",
" if (direction == \"two_sided\"): \n",
" p_val = 2*(1 - stats.norm.cdf(abs(z_star)))\n",
" elif (direction == \"one_sided_upper_tail\"): \n",
" p_val = (1 - stats.norm.cdf(z_star))\n",
" else:\n",
" p_val = stats.norm.cdf(z_star)\n",
" \n",
" if (p_val < alpha):\n",
" Hypothesis_Status = 'Reject Null Hypothesis : Significant' \n",
" else:\n",
" Hypothesis_Status = 'Do not reject Null Hypothesis : Not Significant'\n",
" \n",
" print (Hypothesis_Status)\n",
"\n",
" z_star = np.round(z_star,4)\n",
" p_val = np.round(p_val,4)\n",
" \n",
" return z_star, p_val"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "2b2412e6-fbd3-4609-ac07-627c968e8d20",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Reject Null Hypothesis : Significant\n"
]
},
{
"data": {
"text/plain": [
"(-2.1333, 0.0329)"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"OneSampZ(Xbar = 126.8, sigma1 = 15, n1 = 100, mu0 = 130, alpha = 0.05, direction = \"two_sided\")"
]
},
{
"cell_type": "markdown",
"id": "ba5b554e-fccd-4e4b-940b-2d0f95fefa90",
"metadata": {},
"source": [
"# Hypothesis testing for single population mean when the variance $\\sigma^{2}$ is unknown "
]
},
{
"cell_type": "markdown",
"id": "91f71216-af38-4544-95c9-8bf2314a895c",
"metadata": {},
"source": [
"- When the sample is coming from a normally distributed population with **unknown variance $\\sigma^{2}$**, the test concerning the population mean $\\mu$, is based on the following **test statistic**, under $H_{0}$:\n",
"$$\n",
"\\begin{aligned}\n",
"t^{*} =\\frac{\\bar{X}-\\mu_{0}}{s/\\sqrt{n}}\\sim t_{n-1} \n",
"\\end{aligned}\n",
"$$\n",
"\n",
"- where $\\bar{X}$ is the sample mean, $\\mu_{0}$ is the hypothesized population mean value, and $s$ is the sample standard deviation, and $n$ is the sample size.\n",
"\n",
"- For a **two-sided test**:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"H_{0} &: \\mu = \\mu_{0} \\nonumber \\\\ \n",
"H_{1} &: \\mu \\neq \\mu_{0} \\nonumber\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"- Reject $H_{0}$ if $t^{*} \\leq -t_{(1-\\alpha/2,n-1)}$ OR if $t^{*} \\geq t_{(1-\\alpha/2,n-1)}$, otherwise fail to reject $H_{0}$.\n",
"\n",
"- Here, $t(1-\\alpha/2, n-1)$ refers to the upper $1-\\alpha/2$ percentage point of the t distribution with $n-1$ degrees of freedom.\n",
"\n",
"- For a **one-sided upper-tail test**:\n",
"$$\n",
"\\begin{aligned}\n",
"H_{0} &: \\mu \\leq \\mu_{0} \\nonumber \\\\\n",
"H_{1} &: \\mu > \\mu_{0} \\nonumber\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"- Reject $H_{0}$ if $t^{*} \\geq t_{(1-\\alpha,n-1)}$, otherwise fail to reject $H_{0}$.\n",
"\n",
"- Here, $t(1-\\alpha, n-1)$ refers to the upper $1-\\alpha$ percentage point of the t distribution with $n-1$ degrees of freedom. \n",
"\n",
"- For a **one-sided lower-tail test**:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"H_{0} &: \\mu \\geq \\mu_{0} \\nonumber \\\\ \n",
"H_{1} &: \\mu < \\mu_{0} \\nonumber\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"- Reject $H_{0}$ if $t^{*} \\leq t_{(\\alpha,n-1)}=-t_{(1-\\alpha,n-1)}$, otherwise fail to reject $H_{0}$.\n",
"- Here, $t(1-\\alpha, n-1)$ refers to the upper $1-\\alpha$ percentage point of the t distribution with $n-1$ degrees of freedom. \n"
]
},
{
"cell_type": "markdown",
"id": "e623d54f-b5cb-46d6-a480-d82509e7e851",
"metadata": {},
"source": [
"## Example\n",
"\n",
"- In past summers in a large library system, the mean number of books borrowed per cardholder was 8.50. \n",
"- The library administration would like to test whether the mean number of books borrowed per cardholder this summer under modified loan arrangements differs from the level of past summers ($H_{1}$) or not ($H_{0}$). \n",
"- A random sample of 25 cardholders showed the following results for borrowing this summer: $\\bar{X}=9.34$ books and $s=3.31$ books.\n",
"- Conduct hypothesis test to test the claim, controlling the significance level at $0.05$."
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "20046079-e791-409f-a693-1789ac1816b9",
"metadata": {},
"outputs": [],
"source": [
"# define the hypothesis testing function for testing single normal population\n",
"# mean when pop. variance is known\n",
"\n",
"import numpy as np\n",
"import scipy.stats as stats\n",
" \n",
"def OneSampt(Xbar, sd1, n1, mu0 = 0, alpha = 0.05, direction = \"two_sided\"):\n",
" \n",
" t_star = (Xbar-mu0)/np.sqrt(sd1**2/n1)\n",
"\n",
" if (direction == \"two_sided\"): \n",
" p_val = 2*(1 - stats.t.cdf(abs(t_star), df = n1-1))\n",
" elif (direction == \"one_sided_upper_tail\"): \n",
" p_val = (1 - stats.t.cdf(t_star, df = n1-1))\n",
" else:\n",
" p_val = stats.t.cdf(t_star, df = n1-1)\n",
" \n",
" if (p_val < alpha):\n",
" Hypothesis_Status = 'Reject Null Hypothesis : Significant' \n",
" else:\n",
" Hypothesis_Status = 'Do not reject Null Hypothesis : Not Significant'\n",
" \n",
" print (Hypothesis_Status)\n",
"\n",
" t_star = np.round(t_star,4)\n",
" p_val = np.round(p_val,4)\n",
" \n",
" return t_star, p_val"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "40c949ca-736d-459f-ac21-9b640d2cfca6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Do not reject Null Hypothesis : Not Significant\n"
]
},
{
"data": {
"text/plain": [
"(1.2689, 0.2167)"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"OneSampt(Xbar = 9.34, sd1 = 3.31, n1 = 25, mu0 = 8.5, alpha = 0.05, direction = \"two_sided\")"
]
},
{
"cell_type": "markdown",
"id": "7bd53f11-95ef-4260-a0ba-c3dd3d96ca92",
"metadata": {},
"source": [
"## Example\n",
"\n",
"- The developer of a decision-support software package wishes to test whether users consider a color graphics enhancement to be beneficial, on balance, given its list price of \\$800. \n",
"- A random sample of $10$ users of the package is invited to try out the enhancement and rate it on a scale ranging from $-5$ (completely useless) and $5$ (very beneficial). \n",
"- The results are as follows: the sample mean $\\bar{X}=0.535$ and the sample standard deviation $s=2.3$. \n",
"- Test the hypothesis $H_{0}:\\mu \\leq 0$ and $H_{1}:\\mu > 0$, where $\\mu$ denotes the mean rating of users at the significance level of $0.01$. \n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "4ce1520b-21c1-4f5b-8bad-5cffbb59acb0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Do not reject Null Hypothesis : Not Significant\n"
]
},
{
"data": {
"text/plain": [
"(0.7356, 0.2404)"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"OneSampt(Xbar = 0.535, sd1 = 2.3, n1 = 10, mu0 = 0, alpha = 0.01, direction = \"one_sided_upper_tail\")"
]
},
{
"cell_type": "markdown",
"id": "5ed9082c",
"metadata": {},
"source": [
"#### Session Info"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "09fde6b9",
"metadata": {},
"outputs": [],
"source": [
"import session_info\n",
"session_info.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}