Regression Analysis

Let’s fit a simple linear regression to this data, with Income as the response and Education as the predictor.
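Written out, the model is a simple linear regression with intercept $\beta_0$, slope $\beta_1$ and an error term $\varepsilon_i$ for each of the 10 observations. Ordinary least squares (OLS) chooses the two coefficients that minimize the sum of squared residuals:

```latex
\text{Income}_i = \beta_0 + \beta_1 \, \text{Education}_i + \varepsilon_i, \qquad i = 1, \dots, 10

(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{10} \left( \text{Income}_i - \beta_0 - \beta_1 \, \text{Education}_i \right)^2
```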

import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats

# the data: 10 paired observations of Education and Income
data = {'Education': [11,12,13,15,8,10,11,12,17,11],
        'Income': [25,27,30,41,18,23,26,24,48,26]} 
data

{'Education': [11, 12, 13, 15, 8, 10, 11, 12, 17, 11],
 'Income': [25, 27, 30, 41, 18, 23, 26, 24, 48, 26]}
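With a single predictor, the OLS estimates have a closed form. You can check them by hand before running statsmodels. Here $x_i$ is Education, $y_i$ is Income, and $\hat{\beta}_0$ and $\hat{\beta}_1$ are the estimated intercept and slope. The arithmetic below uses the 10 values listed above. The resulting numbers are the ones the OLS fit below reports in its const and Education rows.

```latex
\begin{aligned}
\bar{x} &= \tfrac{120}{10} = 12, \qquad \bar{y} = \tfrac{288}{10} = 28.8 \\
S_{xx} &= \sum_{i=1}^{10} (x_i - \bar{x})^2 = 58, \qquad
S_{xy} = \sum_{i=1}^{10} (x_i - \bar{x})(y_i - \bar{y}) = 198 \\
\hat{\beta}_1 &= \frac{S_{xy}}{S_{xx}} = \frac{198}{58} \approx 3.4138 \\
\hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} \approx 28.8 - 3.4138 \times 12 \approx -12.1655
\end{aligned}
```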

import statsmodels.api as sm

# convert into a data frame
df = pd.DataFrame(data,columns=['Education','Income']) 

# get predictor and response
X = df['Education'] 
Y = df['Income']

# Add the intercept to the design matrix
X = sm.add_constant(X) 

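# fit the ordinary least squares model
model = sm.OLS(Y, X).fit()
# fitted values of Income for the observed Education levels
predictions = model.predict(X)

# print the full regression summary table
print(model.summary())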

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 Income   R-squared:                       0.932
Model:                            OLS   Adj. R-squared:                  0.923
Method:                 Least Squares   F-statistic:                     108.9
Date:                Sat, 28 May 2022   Prob (F-statistic):           6.18e-06
Time:                        16:28:35   Log-Likelihood:                -22.203
No. Observations:                  10   AIC:                             48.41
Df Residuals:                       8   BIC:                             49.01
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        -12.1655      4.004     -3.038      0.016     -21.400      -2.931
Education      3.4138      0.327     10.434      0.000       2.659       4.168
==============================================================================
Omnibus:                        2.110   Durbin-Watson:                   2.085
Prob(Omnibus):                  0.348   Jarque-Bera (JB):                1.025
Skew:                          -0.771   Prob(JB):                        0.599
Kurtosis:                       2.710   Cond. No.                         62.6
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
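The sketch below shows where the main numbers in this summary come from. It refits the same model with plain NumPy (`np.linalg.lstsq`) instead of statsmodels. It then recomputes the coefficients, their standard errors, t values, R-squared, adjusted R-squared and the F-statistic from the standard OLS formulas. It assumes the nonrobust covariance shown under "Covariance Type" above, where the error variance is constant across observations. Variable names such as `education`, `income` and `sigma2` are illustrative and do not come from the original notebook.

```python
import numpy as np

# Same data as above: Education is the predictor, Income the response
education = np.array([11, 12, 13, 15, 8, 10, 11, 12, 17, 11], dtype=float)
income = np.array([25, 27, 30, 41, 18, 23, 26, 24, 48, 26], dtype=float)

n = len(income)
# Design matrix: a column of ones (intercept) next to Education,
# the same layout that sm.add_constant produces
X = np.column_stack([np.ones(n), education])
p = X.shape[1]  # number of estimated coefficients (2)

# Least-squares coefficients, ordered [intercept, slope]
beta, *_ = np.linalg.lstsq(X, income, rcond=None)

fitted = X @ beta
residuals = income - fitted

ssr = np.sum(residuals ** 2)                 # residual sum of squares
sst = np.sum((income - income.mean()) ** 2)  # total sum of squares
r_squared = 1 - ssr / sst
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p)

# Residual variance estimated with n - p degrees of freedom (Df Residuals = 8)
sigma2 = ssr / (n - p)
# Nonrobust coefficient covariance: sigma^2 * (X'X)^{-1}
cov_beta = sigma2 * np.linalg.inv(X.T @ X)
std_err = np.sqrt(np.diag(cov_beta))
t_values = beta / std_err

# Overall F-statistic; with one predictor it equals the slope's t value squared
f_stat = (r_squared / (p - 1)) / ((1 - r_squared) / (n - p))

print("coef:   ", np.round(beta, 4))
print("std err:", np.round(std_err, 3))
print("t:      ", np.round(t_values, 3))
print(f"R-squared: {r_squared:.3f}, Adj. R-squared: {adj_r_squared:.3f}, F: {f_stat:.1f}")
```

Up to rounding, the printed values agree with the const and Education rows and with the R-squared, Adj. R-squared and F-statistic entries of the summary. The p-values and the [0.025, 0.975] intervals also require quantiles of the t distribution with 8 degrees of freedom, for example from `scipy.stats.t`, so this sketch leaves them out.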

Spring 2022
