Regression for Exponential Growth - Applied to the Coronavirus

In [1]:
import statsmodels.api as sm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Import the data

In [2]:
data = pd.read_csv('MyPath//full_data_2.csv', sep = ';')
data.head(10)
Out[2]:
   Time  Infections
0     0           1
1     1           1
2     2           2
3     3           2
4     4           5
5     5           5
6     6           5
7     7           5
8     8           6
9     9           7

Apply log transformation to the number of infections

In [3]:
data['logInfections'] = np.log(data.Infections)
data.head(10)
Out[3]:
   Time  Infections  logInfections
0     0           1       0.000000
1     1           1       0.000000
2     2           2       0.693147
3     3           2       0.693147
4     4           5       1.609438
5     5           5       1.609438
6     6           5       1.609438
7     7           5       1.609438
8     8           6       1.791759
9     9           7       1.945910
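
Why the log transformation helps, as a short derivation: assuming the infections follow an exponential model with initial value a and growth rate r per time step (a is a symbol introduced here for illustration), taking logs turns the curve into a straight line in Time, which is what a linear regression can fit.

$$
\begin{aligned}
\text{Infections}(t) &= a\,(1+r)^{t} \\
\log \text{Infections}(t) &= \log a + t\,\log(1+r)
\end{aligned}
$$

Under this assumption, the fitted intercept estimates log a and the fitted coefficient on Time estimates log(1 + r).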

Statsmodels Linear Regression

In [4]:
X = data.Time
X = sm.add_constant(X)
In [5]:
y = data.logInfections
In [6]:
mod = sm.OLS(y, X)
res = mod.fit()
print(res.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:          logInfections   R-squared:                       0.908
Model:                            OLS   Adj. R-squared:                  0.906
Method:                 Least Squares   F-statistic:                     510.4
Date:                Tue, 17 Mar 2020   Prob (F-statistic):           1.51e-28
Time:                        18:26:17   Log-Likelihood:                -45.433
No. Observations:                  54   AIC:                             94.87
Df Residuals:                      52   BIC:                             98.84
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.4480      0.154      2.918      0.005       0.140       0.756
Time           0.1128      0.005     22.591      0.000       0.103       0.123
==============================================================================
Omnibus:                        4.104   Durbin-Watson:                   0.172
Prob(Omnibus):                  0.128   Jarque-Bera (JB):                2.726
Skew:                           0.370   Prob(JB):                        0.256
Kurtosis:                       2.186   Cond. No.                         60.7
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
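
To make the const and Time coefficients less of a black box, here is a minimal sketch of the closed-form least-squares fit for a single predictor, written with the standard library only. It is not how statsmodels computes the result internally, and it uses just the ten rows printed above, so its numbers will differ from the 54-observation fit in the table. The helper name fit_line is hypothetical.

```python
import math

# First ten rows of the data as printed in Out[2] (Time, Infections).
# Assumption: this is only a subset, so the coefficients will not match
# the full regression summary.
times = list(range(10))
infections = [1, 1, 2, 2, 5, 5, 5, 5, 6, 7]
log_infections = [math.log(v) for v in infections]


def fit_line(x, y):
    """Closed-form least squares for y = intercept + slope * x (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(times, log_infections)
print(f"const = {intercept:.4f}, Time = {slope:.4f}")
```

The two returned values play the same roles as the const and Time rows of the summary: an intercept on the log scale and a slope per unit of Time.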

Reading the table to make predictions

In [7]:
# The intercept (const) is the log of the initial value: log(initial) = 0.4480
# so initial = exp(0.4480)
np.exp(0.4480)
Out[7]:
1.5651786956535216
In [8]:
# The Time coefficient is the log of the growth factor: log(1 + r) = 0.1128
# so 1 + r = exp(0.1128)
np.exp(0.1128)
Out[8]:
1.119408028953142
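
One way to read the growth factor, sketched with the standard library: subtracting 1 gives the growth rate r per time step, and dividing log(2) by the Time coefficient gives the number of time steps the fitted curve needs to double. The doubling time is not computed in the notebook itself; it is added here only as an illustration, and it assumes the exponential model keeps holding.

```python
import math

slope = 0.1128  # Time coefficient read from the regression summary

growth_factor = math.exp(slope)      # 1 + r
growth_rate = growth_factor - 1      # r, growth per time step
doubling_time = math.log(2) / slope  # time steps for the fitted curve to double

print(f"growth rate per time step: {growth_rate:.2%}")
print(f"doubling time: {doubling_time:.2f} time steps")
```

With these coefficients the fitted curve grows by roughly 12% per time step and doubles about every 6 time steps.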
In [9]:
# Fitted function on the original (non-log) scale:
# y = np.exp(0.4480) * np.exp(0.1128) ** t
In [10]:
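def linear_predictions(t):
    """Predicted infections at time t, back-transformed from the log-linear fit."""
    # initial value * growth factor ** t, using the coefficients from the summary
    return np.exp(0.4480) * np.exp(0.1128) ** t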
In [11]:
data['Predictions'] = data.Time.apply(linear_predictions)
data.head(10)
Out[11]:
   Time  Infections  logInfections  Predictions
0     0           1       0.000000     1.565179
1     1           1       0.000000     1.752074
2     2           2       0.693147     1.961285
3     3           2       0.693147     2.195478
4     4           5       1.609438     2.457636
5     5           5       1.609438     2.751098
6     6           5       1.609438     3.079601
7     7           5       1.609438     3.447330
8     8           6       1.791759     3.858969
9     9           7       1.945910     4.319761
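
As a cross-check on the Predictions column, the same values can be obtained by evaluating the fitted line on the log scale and exponentiating once, since exp(a) * exp(b) ** t equals exp(a + b * t). The sketch below uses only the standard library; the function name predict is hypothetical, and the observed values are the ten rows printed above.

```python
import math

INTERCEPT = 0.4480  # const from the regression summary
SLOPE = 0.1128      # Time coefficient from the regression summary


def predict(t):
    """Evaluate the fitted line on the log scale, then back-transform (hypothetical helper)."""
    return math.exp(INTERCEPT + SLOPE * t)


# Observed infections for Time 0..9, as printed in the table above.
observed = [1, 1, 2, 2, 5, 5, 5, 5, 6, 7]

for t, actual in enumerate(observed):
    print(f"t={t}  observed={actual}  predicted={predict(t):.6f}")
```

The predicted values printed here should agree with the Predictions column up to floating-point rounding.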
In [12]:
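# Label each line so that plt.legend() has entries to display
plt.plot(data.Time, data.Infections, 'red', label='Real cases')
plt.plot(data.Time, data.Predictions, 'blue', label='Predicted cases')
plt.title('Predicted number of cases vs real number of cases')
plt.xlabel('Time')
plt.ylabel('Infections')
plt.legend()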
Out[12]:
<matplotlib.legend.Legend at 0xf87c1b0>