{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": true }, "colab": { "name": "models_fitting.ipynb", "provenance": [], "collapsed_sections": [], "toc_visible": true } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "y2CRKHBqUe4D", "colab_type": "text" }, "source": [ "### Import modules" ] }, { "cell_type": "code", "metadata": { "id": "6asAdwA_Ue4J", "colab_type": "code", "colab": {} }, "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "TdbGZojnUe3-", "colab_type": "text" }, "source": [ "# Introduction\n", "This practical session is intended to explore tumor growth data and interpret it using mathematical models. It is divided into three parts (only the first two parts will be covered in this session):\n", "1. Analysis of the data by basic plots\n", "2. Fitting and comparing tumor growth models to the data in order to understand methods of single-individual parameter estimation (nonlinear regression). Use these tools to understand **tumor growth laws**\n", "3. Using the model(s) to **predict** future tumor growth with only a limited number of initial data points\n", "\n", "The data provided consists of measurements of tumor volumes from tumors implanted subcutaneously in the back of mice. 
The cells are from a murine lung cancer cell line (Lewis Lung Carcinoma). The volumes were computed from two one dimensional measures recorded using a caliper (the length $L$ and width $w$) and using the formula $V = \\frac{1}{2}L\\times w^2$. Volumes are given in mm$^3$ as a function of days following injection of the cells ($10^6$ cells $\\simeq$ 1 mm$^3$ injected on day 0).\n", "\n", "Are you ready to start your exploration? \n", "\n", "Good luck on your adventure! :)" ] }, { "cell_type": "markdown", "metadata": { "id": "lskM2vDiUe4C", "colab_type": "text" }, "source": [ "# 1. Data analysis" ] }, { "cell_type": "markdown", "metadata": { "id": "mnzRMfMRUe4M", "colab_type": "text" }, "source": [ "Load the data in a pandas dataframe and display it" ] }, { "cell_type": "code", "metadata": { "id": "YFgKEFocbtkQ", "colab_type": "code", "outputId": "e97a0f78-bf2a-4e22-af79-465aba3d2a82", "colab": { "base_uri": "https://localhost:8080/", "height": 514 } }, "source": [ "url_data = \"http://benzekry.perso.math.cnrs.fr/DONNEES/data_table.xlsx\"\n", "df = pd.read_excel(url_data, index_col=0)\n", "df" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
12345678910
5NaN54.886377NaNNaN68.809022NaNNaNNaNNaNNaN
6NaN59.40803689.23129555.34859059.85328946.50238794.727972NaN18.56732926.880016
750.02179585.662278157.35150256.17581458.98850255.155193126.094176NaN73.293694NaN
11309.519493261.572775221.698580103.330909179.664759251.222209333.136619201.165288126.820889144.151300
12324.895878412.265650327.492254155.320341309.787647341.175444538.680724316.382967222.661158193.401925
13450.842120488.450738461.437963167.958671470.571322438.335387669.929621338.240373244.726914281.171233
14572.450814618.854795641.444986219.378690480.930314859.952765762.527617411.958788333.629836294.886207
15664.336606798.997212746.868414412.378378488.777983854.727952923.717646586.667016367.475268391.884141
181151.6937541218.7210581359.458389584.1922111021.7210461143.279505NaN991.881984805.778850744.954870
191338.3832991415.4718561626.874371762.0740181278.5307751645.406820NaN1219.8999001030.034281990.331786
201522.8078491410.1492082063.912472880.3257211377.3818451950.482691NaN1833.0965511272.8188841085.314905
211897.0737371524.126579NaN1040.0744301538.922047NaNNaN2131.6056931555.3590771331.189667
22NaN1935.415344NaN1335.1364641958.560253NaNNaNNaN1671.1485231641.333918
24NaNNaNNaN1850.447333NaNNaNNaNNaNNaN1992.067465
25NaNNaNNaN2079.445167NaNNaNNaNNaNNaNNaN
\n", "
" ], "text/plain": [ " 1 2 ... 9 10\n", "5 NaN 54.886377 ... NaN NaN\n", "6 NaN 59.408036 ... 18.567329 26.880016\n", "7 50.021795 85.662278 ... 73.293694 NaN\n", "11 309.519493 261.572775 ... 126.820889 144.151300\n", "12 324.895878 412.265650 ... 222.661158 193.401925\n", "13 450.842120 488.450738 ... 244.726914 281.171233\n", "14 572.450814 618.854795 ... 333.629836 294.886207\n", "15 664.336606 798.997212 ... 367.475268 391.884141\n", "18 1151.693754 1218.721058 ... 805.778850 744.954870\n", "19 1338.383299 1415.471856 ... 1030.034281 990.331786\n", "20 1522.807849 1410.149208 ... 1272.818884 1085.314905\n", "21 1897.073737 1524.126579 ... 1555.359077 1331.189667\n", "22 NaN 1935.415344 ... 1671.148523 1641.333918\n", "24 NaN NaN ... NaN 1992.067465\n", "25 NaN NaN ... NaN NaN\n", "\n", "[15 rows x 10 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 4 } ] }, { "cell_type": "markdown", "metadata": { "id": "sbiF6ZyJbtkV", "colab_type": "text" }, "source": [ "Get the time vector. It is in days" ] }, { "cell_type": "markdown", "metadata": { "id": "eTdTOaOTbtka", "colab_type": "text" }, "source": [ "Plot the growth of the first three mice." ] }, { "cell_type": "markdown", "metadata": { "id": "7TK3QDh2btkf", "colab_type": "text" }, "source": [ "Plot all tumor growth on the same panel" ] }, { "cell_type": "markdown", "metadata": { "id": "RAAlJtmxbtkj", "colab_type": "text" }, "source": [ "Plot the average of the data with error bars as standard deviations" ] }, { "cell_type": "markdown", "metadata": { "id": "zc8rMfN5btkn", "colab_type": "text" }, "source": [ "From the individual plots, what tumor growth pattern/equation would you suggest? How could you simply graphically test it?" ] }, { "cell_type": "markdown", "metadata": { "id": "MClZjhiKZyKl", "colab_type": "text" }, "source": [ "# 2. Model fitting. 
Parameter estimation" ] }, { "cell_type": "markdown", "metadata": { "id": "3O6uTUumZyKo", "colab_type": "text" }, "source": [ "**Problem:**\n", "\n", "Considering $n$ volume observations $(V_1,\\cdots,V_n)$ at time points $(t_1, \\cdots, t_n)$, we would like to see whether these data could have been generated by a given function \n", "$$V: \\begin{array}{ccc} \\mathbb{R}\\times \\mathbb{R}^p & \\longrightarrow & \\mathbb{R} \\\\\n", "\t\t\t\t(t,\\theta) & \\longmapsto & V(t,\\theta)\\end{array}$$\n", "depending on time $t$ and a vector of parameters $\\theta\\in \\mathbb{R}^p$.\n", "\n", "Another – closely related – problem is to find a set of parameters $\\hat{\\theta}$ that would \"best\" describe the data.\n", "\n", "In our context a model will be the combination of two parts:\n", "1. The structural model $V$ (deterministic)\n", "2. An observation model linking the model to the observations (error model, stochastic)\n", "\n", "The latter is defined by assuming that the observations were generated by the model plus some measurements error and we write\n", "$$\n", "V_j = V(t_j; \\theta) + \\sigma_j \\varepsilon_j.\n", "$$\n", "where $\\sigma_j\\varepsilon_j$ describes the error term. It is often natural to assume that this error term is gaussian, with expectation zero. $\\sigma_j$ is the standard deviation of the error and the $\\varepsilon_j$ are independent and identically distributed random variables with\n", "$$\n", "\\varepsilon_j \\sim \\mathcal{N}(0, 1).\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "id": "RtS_iXv_e5GS", "colab_type": "text" }, "source": [ "## Linear least-squares\n", "We will first assume a constant error model (i.e. $\\sigma_j=\\sigma,\\, \\forall j$) and an exponential structural model:\n", "$$\n", "V\\left(t; \\left(V_0, \\alpha \\right)\\right) = V_0 e^{\\alpha t}.\n", "$$\n", "Transform the problem so that it reduces to a linear regression." 
] }, { "cell_type": "markdown", "metadata": { "id": "7RLz5Wr9hXYM", "colab_type": "text" }, "source": [ "$$\n", "\\ln(V_j) = \\ln\\left(V_0\\right) + \\alpha t_j + \\sigma \\varepsilon_j\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "id": "ESRz0WXriWxi", "colab_type": "text" }, "source": [ "Using the formula seen in class, build the least-squares matrix for fitting the first tumor volume data" ] }, { "cell_type": "markdown", "metadata": { "id": "AZ-nGBjfovYw", "colab_type": "text" }, "source": [ "Solve the system" ] }, { "cell_type": "markdown", "metadata": { "id": "aw8g5xJaqa5m", "colab_type": "text" }, "source": [ "Plot the regression line together with the data" ] }, { "cell_type": "markdown", "metadata": { "id": "03prAgfOvioW", "colab_type": "text" }, "source": [ "Compute the estimate of $\\sigma^2$, given by\n", "$$\n", "s^2 = \\frac{1}{n-2}\\sum_{j=1}^n\\left(y_j - \\left(M\\hat{\\theta}\\right)_j\\right)^2\n", "$$\n", "with $\\hat{\\theta}$ the vector of optimal parameters just found." ] }, { "cell_type": "markdown", "metadata": { "id": "wEwSX7XbwO1J", "colab_type": "text" }, "source": [ "Deduce the estimate of the covariance matrix of the parameter estimates, given by\n", "$$\n", "s^2 \\left(M^T M\\right)^{-1}\n", "$$\n", "Print the standard errors on the parameter estimates."
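The linear least-squares steps above can be sketched end to end. This is a hedged illustration on made-up $(t_j, V_j)$ values, not the actual dataset; with the real data, `time` and `y` would come from the first tumor's column of `df`:

```python
import numpy as np
from scipy import stats

# Sketch of the linear least-squares pipeline on made-up (t_j, V_j) values;
# with the real data, y = np.log of the first tumor's volumes.
time = np.array([5.0, 7.0, 11.0, 14.0, 18.0, 21.0])
y = np.log(np.array([50.0, 85.0, 310.0, 572.0, 1151.0, 1897.0]))
n = len(y)

M = np.column_stack([np.ones(n), time])            # least-squares matrix
theta_hat, *_ = np.linalg.lstsq(M, y, rcond=None)  # [ln(V0), alpha]

residuals = y - M @ theta_hat
s2 = residuals @ residuals / (n - 2)               # estimate of sigma^2
C = s2 * np.linalg.inv(M.T @ M)                    # covariance of theta_hat
se = np.sqrt(np.diag(C))                           # standard errors

t_crit = stats.t.ppf(0.975, df=n - 2)              # 95% interval uses a t quantile
ci = np.column_stack([theta_hat - t_crit * se, theta_hat + t_crit * se])
print(theta_hat, se, ci, sep="\n")
```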
] }, { "cell_type": "markdown", "metadata": { "id": "rmEQLjBzFDw3", "colab_type": "text" }, "source": [ "Compute a 95\\% interval for the parameters" ] }, { "cell_type": "markdown", "metadata": { "id": "brwDijZ3qjEA", "colab_type": "text" }, "source": [ "Let's use the built-in ordinary least-squares from the statsmodels module to verify our results" ] }, { "cell_type": "code", "metadata": { "id": "8bUA0EY-qkpC", "colab_type": "code", "outputId": "ad8c0572-a4ea-45f0-9d16-d350d95911c7", "colab": { "base_uri": "https://localhost:8080/", "height": 478 } }, "source": [ "import statsmodels.api as sm\n", "X = sm.add_constant(time)\n", "model = sm.OLS(y, X)\n", "results = model.fit()\n", "results.summary()" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "/usr/local/lib/python3.6/dist-packages/scipy/stats/stats.py:1535: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=10\n", " \"anyway, n=%i\" % int(n))\n" ], "name": "stderr" }, { "output_type": "execute_result", "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified." ], "text/plain": [ "\n", "\"\"\"\n", " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: y R-squared: 0.936\n", "Model: OLS Adj. R-squared: 0.928\n", "Method: Least Squares F-statistic: 117.8\n", "Date: Sat, 08 Feb 2020 Prob (F-statistic): 4.59e-06\n", "Time: 15:20:56 Log-Likelihood: -0.57115\n", "No. Observations: 10 AIC: 5.142\n", "Df Residuals: 8 BIC: 5.747\n", "Df Model: 1 \n", "Covariance Type: nonrobust \n", "==============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "const 2.8756 0.333 8.640 0.000 2.108 3.643\n", "x1 0.2317 0.021 10.854 0.000 0.182 0.281\n", "==============================================================================\n", "Omnibus: 3.611 Durbin-Watson: 1.356\n", "Prob(Omnibus): 0.164 Jarque-Bera (JB): 1.439\n", "Skew: -0.927 Prob(JB): 0.487\n", "Kurtosis: 3.137 Cond. No. 57.5\n", "==============================================================================\n", "\n", "Warnings:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", "\"\"\"" ] }, "metadata": { "tags": [] }, "execution_count": 18 } ] }, { "cell_type": "markdown", "metadata": { "id": "oWpcJ-eaMAIQ", "colab_type": "text" }, "source": [ "Considering that the number of injected cells is $10^6$ cells, which corresponds to $V_0 = 1$ mm$^3$, and looking at the fit, what do you conclude about the validity of the exponential model?" 
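One quick numerical check, using the intercept reported in the OLS summary above (const $\approx 2.8756$, the estimate of $\ln V_0$): the exponential fit extrapolates backwards to an implied initial volume far from the $\approx 1$ mm$^3$ actually injected:

```python
import numpy as np

# The OLS summary above reports const (= estimate of ln V0) ~ 2.8756.
# Backward extrapolation of the exponential fit therefore implies an
# initial volume much larger than the ~1 mm^3 actually injected.
ln_V0_hat = 2.8756
V0_hat = np.exp(ln_V0_hat)
print(f"Implied V0 = {V0_hat:.1f} mm^3 (vs ~1 mm^3 injected)")
```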
] }, { "cell_type": "markdown", "metadata": { "id": "GrmJ7LOxKd85", "colab_type": "text" }, "source": [ "## Nonlinear least-squares" ] }, { "cell_type": "markdown", "metadata": { "id": "Ki2PuI76ZyLZ", "colab_type": "text" }, "source": [ "### Competition and the logistic model\n", "The above observations demonstrated that, if starting from $V(t=0) = 1$ mm$^3$, the tumor growth before the observation phase had to be faster than the growth rate actually observed during it. This suggests looking for models that exhibit such growth deceleration. In ecology, when modeling the growth of a population, a famous model for explaining growth deceleration and saturation is the logistic model. A tumor being a population of cells, it appears natural to test this model against our data. The logistic model states that the individuals (here the tumor cells) compete for nutrients or space. Introducing the concept of *carrying capacity* $K$ as the maximal reachable size for the population, the fraction of cells able to divide is then $1-\\frac{V}{K}$ and the model reads\n", "$$\n", "\\left\\lbrace\\begin{array}{l}\n", "\\frac{dV}{dt} = \\alpha V\\left(1 - \\frac{V}{K}\\right)\\\\\n", "V(t=0) = 1\n", "\\end{array}\n", "\\right.\n", "\\quad \n", "\\Longrightarrow\n", "\\quad\n", "V(t) = \\frac{K}{1+(K-1)e^{-\\alpha t}}.\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "id": "7i5yNFMCZyLZ", "colab_type": "text" }, "source": [ "Define a Python function ``logistic(time, alpha, K)`` for simulation of this model" ] }, { "cell_type": "markdown", "metadata": { "id": "Qrwi8_v3Rq3w", "colab_type": "text" }, "source": [ "Import the curve_fit function from the scipy module" ] }, { "cell_type": "code", "metadata": { "id": "571F8AeMPTb6", "colab_type": "code", "colab": {} }, "source": [ "from scipy.optimize import curve_fit" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "8P6rq65MSCCL", "colab_type": "text" }, "source": [ "Use it to fit the logistic model to 
the data of the first tumor growth" ] }, { "cell_type": "markdown", "metadata": { "id": "aAJrlL0uSSTm", "colab_type": "text" }, "source": [ "Plot the result" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "KxS_dvAeuFwS" }, "source": [ "## Identifiability\n", "The improvement of this model over the previous one is notable. However, the cost for this has been to add a parameter to the model. How do we know that we are not overfitting now? In other words, isn't it too easy to fit the growth curves with three parameters? This is linked to the question of **identifiability** of the parameters. The general theory of statistical estimation offers great tools for such a purpose. Specifically, estimators are **random variables**. As such, they have a distribution coming from the fact that the data itself is uncertain. For a single parameter, the standard deviation of this distribution is called the **standard error**. An important property of the least-squares estimator $\\hat{\\theta}$ is that it is asymptotically normally distributed and its asymptotic covariance matrix $C$ can be estimated from the combination of: 1) the (estimated) variance of the measurement error $\\hat{\\sigma}^2$ and 2) the jacobian matrix of the model evaluated at $\\hat{\\theta}$. Specifically, denoting $J$ the (weighted) jacobian matrix of the model, one can show that asymptotically\n", "$$\n", "\\hat{\\theta} \\sim \\mathcal{N}\\left(\\theta^*, \\hat{\\sigma}^2\\left(J^T J\\right)^{-1}\\right)\n", "$$\n", "where $\\theta^*$ is the true value assumed to have generated the data (which we are estimating with $\\hat{\\theta}$). I invite you to think two minutes about why the presence of $\\hat{\\sigma}$ as a proportional term and $J$ as an inversely proportional term make sense. 
From $C$ the standard error ($se$) and relative standard error (rse) on parameter $p$ are defined by\n", "$$\n", "se\\left(\\hat{\\theta}^p\\right) = \\sqrt{C_{p,p}} \\quad rse\\left(\\hat{\\theta}^p\\right) = \\frac{se\\left(\\hat{\\theta}^p\\right)}{\\hat{\\theta}^p}\\times 100\n", "$$\n", "\n", "We will now compute such standard errors." ] }, { "cell_type": "markdown", "metadata": { "id": "UBZyVhSLu5Jz", "colab_type": "text" }, "source": [ "Install the numdifftools package for jacobian computations" ] }, { "cell_type": "code", "metadata": { "id": "lZ6vvA2ocrzo", "colab_type": "code", "outputId": "86f7552d-d93c-459a-f1b7-4cf2ed706ace", "colab": { "base_uri": "https://localhost:8080/", "height": 122 } }, "source": [ "!pip install numdifftools" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "Collecting numdifftools\n", "\u001b[?25l Downloading https://files.pythonhosted.org/packages/ab/c0/b0d967160ecc8db52ae34e063937d85e8d386f140ad4826aae2086245a5e/numdifftools-0.9.39-py2.py3-none-any.whl (953kB)\n", "\u001b[K |████████████████████████████████| 962kB 2.7MB/s \n", "\u001b[?25hInstalling collected packages: numdifftools\n", "Successfully installed numdifftools-0.9.39\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "YUhq9HwwvKLh", "colab_type": "text" }, "source": [ "Load the Jacobian function" ] }, { "cell_type": "code", "metadata": { "id": "4nJOYdTqSXg_", "colab_type": "code", "colab": {} }, "source": [ "from numdifftools import Jacobian" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "6t_PuJepvP4z", "colab_type": "text" }, "source": [ "Define a ``logistic_t`` function of the parameters only" ] }, { "cell_type": "markdown", "metadata": { "id": "1ay-Czc5vg18", "colab_type": "text" }, "source": [ "Compute the jacobian at the parameters estimates" ] }, { "cell_type": "markdown", "metadata": { "id": "OkrVLd9-v3nv", "colab_type": "text" }, "source": [ "Compute the 
covariance matrix of the estimates. Compare with the value obtained from the ``curve_fit`` function" ] }, { "cell_type": "markdown", "metadata": { "id": "mbm8nhypwunM", "colab_type": "text" }, "source": [ "Compute relative standard errors on the parameters (i.e. standard errors divided by their value). What do you conclude about practical identifiability of the model? How about the goodness-of-fit?" ] }, { "cell_type": "markdown", "metadata": { "id": "_q5Eies5ZyLm", "colab_type": "text" }, "source": [ "### The generalized logistic model\n", "\n", "The previous results suggest that the logistic model is still not great at describing the entire growth curves. This motivates us to consider an even more flexible sigmoidal model: the generalized logistic model. It modulates the strength of the competition through a power $\\gamma$ and reads:\n", "$$\n", "\\left\\lbrace\\begin{array}{l}\n", "\\frac{dV}{dt} = \\alpha V\\left(1 - \\left(\\frac{V}{K}\\right)^\\gamma\\right)\\\\\n", "V(t=0) = 1\n", "\\end{array}\n", "\\right.\n", "\\quad \n", "\\Longrightarrow\n", "\\quad\n", "V(t) = \\frac{K}{\\left(1+(K^\\gamma-1)e^{-\\alpha\\gamma t}\\right)^{\\frac{1}{\\gamma}}}.\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "id": "NHnMq4BKZyLn", "colab_type": "text" }, "source": [ "Write a model function ``generalized_logistic`` for simulation of this model" ] }, { "cell_type": "markdown", "metadata": { "id": "n_C1Nviys0t2", "colab_type": "text" }, "source": [ "Fit the model and plot the result" ] }, { "cell_type": "markdown", "metadata": { "id": "5apjSQRes9sW", "colab_type": "text" }, "source": [ "Look at the standard errors of the parameter estimates. What do you conclude about practical identifiability of the model?" 
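The fitting-plus-identifiability workflow of this section can be sketched end to end. The sketch below uses the logistic model and synthetic data generated from it (so convergence is guaranteed); the seed, true parameters, and initial guess are illustrative assumptions, and the same steps apply verbatim to `generalized_logistic` and to the real columns of `df`:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(time, alpha, K):
    # V(t) = K / (1 + (K - 1) exp(-alpha t)), with V(0) = 1
    return K / (1.0 + (K - 1.0) * np.exp(-alpha * time))

# Synthetic data generated from the model itself (illustrative parameters);
# with the real data use one mouse's column of df.
rng = np.random.default_rng(0)
time = np.arange(5.0, 26.0, 2.0)
V = logistic(time, 0.5, 2000.0) * (1.0 + 0.05 * rng.standard_normal(time.size))

popt, pcov = curve_fit(logistic, time, V, p0=[0.3, 1500.0], maxfev=10000)
se = np.sqrt(np.diag(pcov))       # standard errors
rse = 100.0 * se / np.abs(popt)   # relative standard errors (%)

# Cross-check pcov against sigma2_hat * (J^T J)^{-1}, with J the model
# Jacobian at popt (central finite differences stand in for numdifftools).
eps = 1e-6
J = np.empty((time.size, 2))
for k in range(2):
    dp = np.zeros(2)
    dp[k] = eps * max(1.0, abs(popt[k]))
    J[:, k] = (logistic(time, *(popt + dp)) - logistic(time, *(popt - dp))) / (2.0 * dp[k])
sigma2_hat = np.sum((V - logistic(time, *popt)) ** 2) / (time.size - 2)
C = sigma2_hat * np.linalg.inv(J.T @ J)
print(popt, rse, sep="\n")
```

The finite-difference Jacobian stands in for `numdifftools.Jacobian`; both should give `C` close to `pcov`, since `curve_fit` uses the same asymptotic formula.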
] }, { "cell_type": "markdown", "metadata": { "id": "GHYBif1cK0qn", "colab_type": "text" }, "source": [ "## Likelihood maximization\n", "\n", "Let's recall the general statistical observation model\n", "\n", "$$\n", "V_j = V(t_j; \\theta) + \\sigma_j \\varepsilon_j.\n", "$$\n", "\n", "*Error model*\n", "\n", "An important assumption of the model is how $\\sigma_j$ depends on the volume. Our first step above was to consider $\\sigma_j$ as *constant* (i.e. independent of $j$), which meant that for all the observations, the magnitude of the error was the same. However, it also appears reasonable to consider that, with our measurement technique (calipers), larger tumors result in larger errors. A natural assumption could then be a *proportional* error model described by $\\sigma_j = \\sigma V(t_j, \\theta)$. In our case, a detailed study of 133 measurements that were performed twice on the same tumor established a slightly more refined model, but we will assume a proportional error model for simplicity.\n", "\n", "*Likelihood*\n", "\n", "A central concept in model fitting and parameter estimation is the likelihood and its maximization. The likelihood of the data is defined as the probability density function of the data, under the assumption that it has been generated by the model $V$ with parameters $\\theta$ and $\\sigma$. Note that $\\sigma$ is often unknown and is a parameter to be estimated. However, here we can fix it from the measurement study mentioned above and we will take $\\sigma = 0.1$ (10% error). 
From the formula above, under the independence and normality assumptions we can compute it to get\n", "$$\n", "L(\\theta) = p(V_1,\\cdots,V_n;\\theta)=\\prod_{j=1}^n p(V_j ;\\theta) = \\prod_{j=1}^n \\frac{1}{\\sigma_j\\sqrt{2 \\pi}}e^{-\\frac{\\left(V_j - V(t_j,\\theta)\\right)^2}{2\\sigma_j^2}}\n", "$$\n", "At this point, it is natural to consider the log-likelihood $l(\\theta)=\\ln\\left(L(\\theta)\\right)$ to simplify the calculations. We get\n", "\\begin{align}\n", "l(\\theta) & = - \\sum_{j= 1}^{n} \\frac{\\left(V_j - V(t_j,\\theta)\\right)^2}{2\\sigma_j^2} - \\sum_{j=1}^n \\ln(\\sigma_j) - n \\ln(\\sqrt{2\\pi}) \\\\\n", "l(\\theta) & = - \\sum_{j= 1}^{n} \\frac{\\left(V_j - V(t_j,\\theta)\\right)^2}{2\\sigma^2 V(t_j, \\theta)^2} - n \\ln(\\sigma) - \\sum_{j=1}^n \\ln(V(t_j,\\theta)) - n \\ln(\\sqrt{2\\pi})\n", "\\end{align}\n", "To simplify further, we will replace $V(t_j, \\theta)$ by the observation $V_j$ in the terms above coming from the error model (third term and denominator in the first sum). The maximization of the log-likelihood then becomes equivalent to the following weighted least squares minimization problem (the $\\sigma$s can be removed because they don't change the minimization problem):\n", "$$\n", "\\hat{\\theta} = \\underset{\\theta}{\\rm argmin}\\left|\\left|\\frac{V-V(t;\\theta)}{V}\\right|\\right|^2,\n", "$$\n", "where $V = (V_1, \\cdots, V_n)$, $V(t;\\theta) = (V(t_1; \\theta), \\cdots, V(t_n; \\theta))$ and $||\\cdot||$ is the discrete $L^2$ norm (sum of squares)." ] }, { "cell_type": "markdown", "metadata": { "id": "fjWFcXt8ZyLz", "colab_type": "text" }, "source": [ "#### The \"magical\" Gompertz model" ] }, { "cell_type": "markdown", "metadata": { "id": "mFL4wMABZyL0", "colab_type": "text" }, "source": [ "Looking at the values of the parameter $\\gamma$ in the generalized logistic model, we see that its value was identified to be very small. 
When $\\gamma$ tends to zero, the expression obtained tends to the very popular Gompertz model, which can be expressed by the following differential equation and analytical expression:\n", "$$\n", "\\left\\lbrace\\begin{array}{l}\n", "\\frac{dV}{dt} = \\left(\\alpha_0 - \\beta\\ln\\left(\\frac{V}{V_c}\\right)\\right)V\\\\\n", "V(t=0) = 1\n", "\\end{array}\\right.\n", "\\quad \n", "\\Rightarrow\n", "\\quad\n", "V(t) = V_c\\left(\\frac{V_I}{V_c}\\right)^{e^{-\\beta t}}e^{\\frac{\\alpha_0}{\\beta}\\left(1-e^{-\\beta t}\\right)}\n", "$$\n", "where $V_c$ is the volume of one cell (a constant equal to $10^{-6}$ mm$^3$), $V_I = V(t=0) = 1$ is the initial volume, $\\alpha_0$ is the proliferation rate at one cell and $\\beta$ is the rate of exponential decrease of the relative growth rate. Indeed, one can show that the equation above is equivalent to \n", "$$\n", "\\left\\lbrace\\begin{array}{l}\n", "\\frac{dV}{dt} = \\alpha_1 e^{-\\beta t}V\\\\\n", "V(t=0) = 1\n", "\\end{array}\\right.\n", "$$\n", "with $\\alpha_1=\\alpha_0 +\\beta \\ln(V_c)$.\n", "\n", "This model is implemented in the following function" ] }, { "cell_type": "code", "metadata": { "id": "WBTY3flVZyL0", "colab_type": "code", "colab": {} }, "source": [ "def gompertz(time, alpha0, beta):\n", " Vc = 1e-6\n", " VI = 1\n", " V = Vc*(VI/Vc)**(np.exp(-beta*time))*np.exp(alpha0/beta*(1-np.exp(-beta*time)))\n", " return V" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "OBfF6ZDeZyL3", "colab_type": "text" }, "source": [ "Fit this model to the data and assess the goodness of fit and standard errors. Take $\\alpha_0 = 0.1$ and $\\beta = 0.01$ as initial guesses for the optimizer." ] }, { "cell_type": "markdown", "metadata": { "id": "zaYJIXcnZyL9", "colab_type": "text" }, "source": [ "#### The power law model as a simple and biologically interpretable model\n", "Although widely employed due to its excellent descriptive power, one of the main criticisms addressed to the Gompertz model is its lack of biological ground. 
Indeed, while the parameter $\\alpha_0$ can be interpreted from the duration of the cell cycle, the parameter $\\beta$ remains heuristic. It is not a physiological parameter that could be experimentally measured and can only be assessed through fitting the growth curve. \n", "\n", "The **power law** model is yet another model which consists in assuming that the proliferative tissue – instead of being a constant fraction of the tumor volume as in the exponential model – is rather proportional to the volume of the tumor raised to a power $\\gamma$. This power (or rather the triple of it) can be interpreted as the fractal (Hausdorff) dimension of the proliferative tissue. For example, when $\\gamma=\\frac{2}{3}$, the proliferative tissue would only be two-dimensional within a three-dimensional tumor. This could correspond to a proliferative rim limited to the surface of the tumor, and would make sense because the vascularization of a tumor often occurs through its surface. However, an active process of tumor-driven vasculature development (the tumor **neo-angiogenesis**) induces the growth of new infiltrative blood vessels. From the natural tree structure of the blood network, a fractal dimension naturally occurs and the proliferative tissue, being in the vicinity of the blood vessels, inherits this dimension. Summing up, this gives the following simple differential equation which can, once again, be solved analytically:\n", "$$\n", "\\left\\lbrace\\begin{array}{l}\n", "\\frac{dV}{dt} = \\alpha V^\\gamma\\\\\n", "V(t=0) = 1\n", "\\end{array}\n", "\\right.\n", "\\quad \n", "\\Longrightarrow\n", "\\quad\n", "V(t) = (1 + \\alpha (1-\\gamma) t)^{\\frac{1}{1-\\gamma}}.\n", "$$\n", "\n", "In the case of $\\gamma=\\frac{2}{3}$, show that the growth of the tumor radius is linear in time. 
This pattern is experimentally and clinically observed in many situations, including the growth of gliomas.\n", "\n", "Use the following function to fit the model (initial guess $\\alpha = 0.2, \\; \\gamma = 0.7$) to the data and assess the identifiability of its parameters. What value do you estimate for the fractal dimension of the vasculature in this data set?" ] }, { "cell_type": "code", "metadata": { "id": "sx-QKb0kZyL-", "colab_type": "code", "colab": {} }, "source": [ "def power_law(time, alpha, gamma):\n", " VI = 1\n", " V = (VI**(1-gamma)+alpha*(1-gamma)*time)**(1./(1-gamma))\n", " return V" ], "execution_count": 0, "outputs": [] } ] }
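Putting the last sections together, a weighted least-squares fit under the proportional error model can be sketched with `curve_fit`'s `sigma` argument. The data below are synthetic (generated from the Gompertz model with illustrative parameters), the initial guesses and bounds are assumptions chosen for stable convergence, and with the real data one would use one mouse's column of `df`:

```python
import numpy as np
from scipy.optimize import curve_fit

def gompertz(time, alpha0, beta):
    Vc = 1e-6  # volume of one cell (mm^3)
    VI = 1.0   # initial volume (mm^3)
    return Vc * (VI / Vc) ** np.exp(-beta * time) * np.exp(alpha0 / beta * (1.0 - np.exp(-beta * time)))

def power_law(time, alpha, gamma):
    VI = 1.0
    return (VI ** (1.0 - gamma) + alpha * (1.0 - gamma) * time) ** (1.0 / (1.0 - gamma))

# Synthetic growth curve generated from the Gompertz model (illustrative
# parameters); with the real data use one mouse's column of df.
rng = np.random.default_rng(1)
time = np.arange(5.0, 26.0, 2.0)
V = gompertz(time, 2.0, 0.1) * (1.0 + 0.1 * rng.standard_normal(time.size))

# Proportional error model: sigma_j = 0.1 * V_j. Passing it as `sigma`
# makes curve_fit minimize the weighted criterion derived above.
sigma = 0.1 * V
popt_g, pcov_g = curve_fit(gompertz, time, V, p0=[1.5, 0.08], sigma=sigma,
                           bounds=([0.5, 0.03], [5.0, 0.5]))
rse_g = 100.0 * np.sqrt(np.diag(pcov_g)) / np.abs(popt_g)

popt_p, pcov_p = curve_fit(power_law, time, V, p0=[0.2, 0.7], sigma=sigma,
                           bounds=([0.0, 0.0], [10.0, 0.99]))
rse_p = 100.0 * np.sqrt(np.diag(pcov_p)) / np.abs(popt_p)
print("gompertz:", popt_g, rse_g)
print("power_law:", popt_p, rse_p)
```

The bounds keep the optimizer away from parameter regions where the exponentials overflow; they are a convenience of this sketch, not part of the method itself.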