Add glinternet docs
JamesYang007 committed Jun 4, 2024
1 parent 673d20b commit 5e0c453
Showing 1 changed file with 309 additions and 1 deletion.
310 changes: 309 additions & 1 deletion docs/sphinx/notebooks/examples.ipynb
@@ -7,6 +7,314 @@
"# __Examples__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## __Group Lasso with Interaction Terms__"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import adelie as ad\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In regression settings, we typically want to include pairwise interaction terms amongst a subset of features to capture some non-linearity.\n",
"Moreover, we would like to perform feature selection on the interaction terms as well.\n",
"However, to achieve an interpretable model, we would also like to impose a hierarchy such that an interaction term enters the model only if its main effects are included.\n",
"Michael Lim and Trevor Hastie provide a formalization of this problem using group lasso where the group structure imposes the hierarchy and the group lasso penalty allows for feature selection.\n",
"For further details, we provide the following reference:\n",
"\n",
"- [Learning interactions via hierarchical group-lasso regularization](https://hastie.su.domains/Papers/glinternet_jcgs.pdf) \n",
"\n",
"We will work under a simulation setting.\n",
"We draw $n$ independent samples $Z_i \\in \\mathbb{R}^d$ where the continuous features are sampled from a standard normal and the discrete features are sampled uniformly."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"n = 1000 # number of samples\n",
"d_cont = 10 # number of continuous features\n",
"d_disc = 10 # number of discrete features\n",
"seed = 1 # random seed\n",
"\n",
"np.random.seed(seed)\n",
"Z_cont = np.random.normal(0, 1, (n, d_cont))\n",
"levels = np.random.choice(10, d_disc, replace=True) + 1\n",
"Z_disc = np.array([np.random.choice(lvl, n, replace=True) for lvl in levels]).T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is customary to first center and scale the continuous features so that they have mean $0$ and standard deviation $1$."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"Z_cont_means = np.mean(Z_cont, axis=0)\n",
"Z_cont_stds = np.std(Z_cont, axis=0, ddof=0)\n",
"Z_cont = (Z_cont - Z_cont_means) / Z_cont_stds"
]
},
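As a quick sanity check (a standalone numpy-only sketch, independent of the notebook's variables), columns standardized this way have mean $0$ and standard deviation $1$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(loc=2.0, scale=3.0, size=(1000, 4))  # toy continuous features

# Center and scale each column, as done for Z_cont above.
A_std = (A - A.mean(axis=0)) / A.std(axis=0, ddof=0)

print(np.allclose(A_std.mean(axis=0), 0.0))         # True
print(np.allclose(A_std.std(axis=0, ddof=0), 1.0))  # True
```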
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This gives us a combined data matrix $Z$ with the appropriate levels information.\n",
"By convention, a level of $0$ marks a continuous feature, while a positive level $k$ marks a discrete feature with $k$ levels."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"Z = np.asfortranarray(np.concatenate([Z_cont, Z_disc], axis=1))\n",
"levels = np.concatenate([np.zeros(d_cont), levels])"
]
},
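A standalone toy illustration of this convention (made-up sizes, numpy only): the `levels` vector holds $0$ for each continuous column and the level count for each discrete column.

```python
import numpy as np

d_cont = 2                         # toy number of continuous features
disc_levels = np.array([3, 5, 2])  # toy discrete features with 3, 5, 2 levels

# 0 marks a continuous column; a positive entry k marks a discrete
# column taking values in {0, ..., k-1}.
levels = np.concatenate([np.zeros(d_cont), disc_levels])
print(levels)         # [0. 0. 3. 5. 2.]
is_cont = levels == 0
print(is_cont.sum())  # 2 continuous columns
```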
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We generate the response vector $y$ from a linear model that includes one continuous main effect, one discrete main effect, and their interaction term."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"Z_one_hot_0 = np.zeros((n, int(levels[d_cont])))\n",
"Z_one_hot_0[np.arange(n), Z_disc[:, 0].astype(int)] = 1\n",
"Z_cont_0 = Z_cont[:, 0][:, None]\n",
"Z_sub = np.concatenate([\n",
" Z_cont_0,\n",
" Z_one_hot_0,\n",
" Z_cont_0 * Z_one_hot_0,\n",
"], axis=1)\n",
"beta = np.random.normal(0, 1, Z_sub.shape[1])\n",
"y = Z_sub @ beta + np.random.normal(0, 1, n)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are now ready to construct the full feature matrix to fit a group lasso model.\n",
"As a demonstration, suppose we (correctly) believe that there is a true interaction term containing the first continuous feature, but we do not know the other feature in the pair.\n",
"We therefore wish to construct interactions between the first continuous feature and all other features.\n",
"It is easy to specify this pairing, as shown below via `intr_map`. \n",
"The following code constructs the interaction matrix `X_intr`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"intr_map = {\n",
" 0: None,\n",
"}\n",
"X_intr = ad.matrix.interaction(Z, intr_map, levels)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To put all groups of features on the same relative \"scale\", we must further center and scale all interaction terms between two continuous features.\n",
"Then, it can be shown that interactions between two discrete features induce a (Frobenius) norm of $1$,\n",
"a discrete and a continuous feature induce a norm of $\\sqrt{2}$,\n",
"and two continuous features induce a norm of $\\sqrt{3}$.\n",
"These values will be used as penalty factors later when we call the group lasso solver.\n",
"We first compute the necessary centers and scales."
]
},
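These norms can be checked numerically. The standalone sketch below assumes (following the glinternet construction; the exact column composition of adelie's interaction groups is an assumption here) that a continuous-continuous group stacks both main effects with the standardized product, a continuous-discrete group stacks the one-hot block with the continuous feature times that block, and a discrete-discrete group is the one-hot interaction block. We measure the Frobenius norm scaled by $1/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, L1, L2 = 10_000, 3, 4

# Standardized continuous features and one-hot encoded discrete features.
x1 = rng.normal(size=n); x1 = (x1 - x1.mean()) / x1.std()
x2 = rng.normal(size=n); x2 = (x2 - x2.mean()) / x2.std()
z1 = np.eye(L1)[rng.integers(L1, size=n)]
z2 = np.eye(L2)[rng.integers(L2, size=n)]

def scaled_fro(M):
    """Frobenius norm of M scaled by 1/sqrt(n)."""
    return np.linalg.norm(M) / np.sqrt(M.shape[0])

# Discrete-discrete: each row of the one-hot interaction has a single 1.
disc_disc = (z1[:, :, None] * z2[:, None, :]).reshape(n, -1)

# Continuous-discrete: one-hot block and the continuous feature times it.
cont_disc = np.concatenate([z1, x1[:, None] * z1], axis=1)

# Continuous-continuous: both main effects and the standardized product.
x1x2 = x1 * x2
x1x2 = (x1x2 - x1x2.mean()) / x1x2.std()
cont_cont = np.column_stack([x1, x2, x1x2])

print(scaled_fro(disc_disc))  # 1.0
print(scaled_fro(cont_disc))  # sqrt(2) ~ 1.414
print(scaled_fro(cont_cont))  # sqrt(3) ~ 1.732
```

Each standardized column contributes exactly $n$ to the squared Frobenius norm, and each one-hot block contributes one 1 per row, which is why the three values come out exact rather than approximate.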
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"pairs = X_intr._pairs\n",
"pair_levels = levels[pairs]\n",
"is_cont_cont = np.prod(pair_levels == 0, axis=1).astype(bool)\n",
"cont_cont_pairs = pairs[is_cont_cont]\n",
"cont_cont = Z[:, cont_cont_pairs[:, 0]] * Z[:, cont_cont_pairs[:, 1]]\n",
"centers = np.zeros(X_intr.shape[1])\n",
"scales = np.ones(X_intr.shape[1])\n",
"cont_cont_indices = X_intr.groups[is_cont_cont] + 2\n",
"centers[cont_cont_indices] = np.mean(cont_cont, axis=0)\n",
"scales[cont_cont_indices] = np.std(cont_cont, axis=0, ddof=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we construct the full feature matrix $X$ including the one-hot encoded main effects as well as the standardized version of the interaction terms using the centers and scales defined above."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"X_one_hot = ad.matrix.one_hot(Z, levels)\n",
"X = ad.matrix.concatenate([\n",
" X_one_hot,\n",
" ad.matrix.standardize(\n",
" X_intr,\n",
" centers=centers, \n",
" scales=scales,\n",
" ),\n",
"], axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before calling the group lasso solver, we must prepare the grouping and penalty factor information."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"groups = np.concatenate([\n",
" X_one_hot.groups,\n",
" X_one_hot.shape[1] + X_intr.groups,\n",
"])\n",
"\n",
"is_cont_disc = np.logical_xor(pair_levels[:, 0], pair_levels[:, 1])\n",
"penalty = np.ones(X_intr.groups.shape[0])\n",
"penalty[is_cont_cont] = np.sqrt(3)\n",
"penalty[is_cont_disc] = np.sqrt(2)\n",
"penalty = np.concatenate([\n",
" np.ones(X_one_hot.groups.shape[0]),\n",
" penalty,\n",
"])"
]
},
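A standalone toy example (all levels made up) of how the product/xor logic classifies pairs and assigns penalty factors:

```python
import numpy as np

# Each row holds the levels of the two features in an interaction pair
# (0 = continuous, positive = discrete); all values here are made up.
pair_levels = np.array([
    [0, 0],  # continuous x continuous
    [0, 4],  # continuous x discrete
    [3, 0],  # discrete x continuous
    [3, 5],  # discrete x discrete
])
is_cont_cont = np.prod(pair_levels == 0, axis=1).astype(bool)
is_cont_disc = np.logical_xor(pair_levels[:, 0], pair_levels[:, 1])

penalty = np.ones(len(pair_levels))
penalty[is_cont_cont] = np.sqrt(3)
penalty[is_cont_disc] = np.sqrt(2)
print(penalty)  # [sqrt(3), sqrt(2), sqrt(2), 1]
```

`np.logical_xor` acts on the truthiness of the levels, so it is `True` exactly when one feature is continuous (level 0) and the other is discrete (positive level).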
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we call the group lasso solver with our prepared inputs."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 100/100 [00:00:00<00:00:00, 4657.95it/s] [dev:71.1%]\n"
]
}
],
"source": [
"state = ad.grpnet(\n",
" X=X, \n",
" glm=ad.glm.gaussian(y), \n",
" groups=groups, \n",
" penalty=penalty,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, observe that the first two groups of features that enter the model are precisely the first continuous and discrete main effects."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0, 10, 11, 12, 13, 14], dtype=int32)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"state.betas[13, :X_one_hot.shape[1]].indices"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we see that the first interaction term to be included corresponds to the interaction between the first continuous and first discrete features in `Z`."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0, 10], dtype=int32)"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"first_intr_index = X_one_hot.shape[1] + state.betas[16, X_one_hot.shape[1]:].indices[0]\n",
"relative_index = np.argmax(groups == first_intr_index) - X_one_hot.groups.shape[0]\n",
"pairs[relative_index]"
]
},
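The lookup above can be illustrated standalone with made-up group boundaries: given the start column of a selected group, `np.argmax` over the `groups` array recovers its position, and subtracting the number of main-effect groups gives the row into `pairs`.

```python
import numpy as np

# Made-up group start columns: 3 main-effect groups, then 2 interaction groups.
groups = np.array([0, 3, 5, 9, 12])
pairs = np.array([[0, 10], [0, 11]])  # feature pairs of the 2 interaction groups
n_main_groups = 3

first_intr_index = 9  # start column of the selected interaction group
relative_index = np.argmax(groups == first_intr_index) - n_main_groups
print(pairs[relative_index])  # [ 0 10]
```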
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We conclude that the group lasso correctly recovers the true effects on $y$ early in the path."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1745,7 +2053,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.10.6"
+ "version": "3.10.0"
}
},
"nbformat": 4,
