Add glinternet docs
JamesYang007 committed Jun 4, 2024
1 parent 673d20b commit 5e0c453
Showing 1 changed file with 309 additions and 1 deletion.
310 changes: 309 additions & 1 deletion docs/sphinx/notebooks/examples.ipynb
@@ -7,6 +7,314 @@
"# __Examples__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## __Group Lasso with Interaction Terms__"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import adelie as ad\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In regression settings, we typically want to include pairwise interaction terms amongst a subset of features to capture some non-linearity.\n",
"Moreover, we would like to perform feature selection on the interaction terms as well.\n",
"However, to achieve an interpretable model, we would also like to impose a hierarchy such that an interaction term enters the model only if its main effects are included.\n",
"Michael Lim and Trevor Hastie provide a formalization of this problem using group lasso where the group structure imposes the hierarchy and the group lasso penalty allows for feature selection.\n",
"For further details, we provide the following reference:\n",
"\n",
"- [Learning interactions via hierarchical group-lasso regularization](https://hastie.su.domains/Papers/glinternet_jcgs.pdf) \n",
"\n",
"We will work under a simulation setting.\n",
"We draw $n$ independent samples $Z_i \\in \\mathbb{R}^d$ where the continuous features are sampled from a standard normal and the discrete features are sampled uniformly."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"n = 1000 # number of samples\n",
"d_cont = 10 # number of continuous features\n",
"d_disc = 10 # number of discrete features\n",
"seed = 1 # random seed\n",
"\n",
"np.random.seed(seed)\n",
"Z_cont = np.random.normal(0, 1, (n, d_cont))\n",
"levels = np.random.choice(10, d_disc, replace=True) + 1\n",
"Z_disc = np.array([np.random.choice(lvl, n, replace=True) for lvl in levels]).T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is customary to first center and scale the continuous features so that they have mean $0$ and standard deviation $1$."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"Z_cont_means = np.mean(Z_cont, axis=0)\n",
"Z_cont_stds = np.std(Z_cont, axis=0, ddof=0)\n",
"Z_cont = (Z_cont - Z_cont_means) / Z_cont_stds"
]
},
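As a quick sanity check (a standalone numpy-only sketch, independent of the notebook's variables), columns standardized this way have mean $0$ and standard deviation $1$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(loc=2.0, scale=3.0, size=(1000, 4))  # toy continuous features

# Center and scale each column, as done for Z_cont above.
A_std = (A - A.mean(axis=0)) / A.std(axis=0, ddof=0)

print(np.allclose(A_std.mean(axis=0), 0.0))         # True
print(np.allclose(A_std.std(axis=0, ddof=0), 1.0))  # True
```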
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This gives us a combined data matrix $Z$ with the appropriate levels information.\n",
"By convention, a level of $0$ marks a continuous feature, while a positive level $k$ marks a discrete feature with $k$ levels."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"Z = np.asfortranarray(np.concatenate([Z_cont, Z_disc], axis=1))\n",
"levels = np.concatenate([np.zeros(d_cont), levels])"
]
},
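A standalone toy illustration of this convention (made-up sizes, numpy only): the `levels` vector holds $0$ for each continuous column and the level count for each discrete column.

```python
import numpy as np

d_cont = 2                         # toy number of continuous features
disc_levels = np.array([3, 5, 2])  # toy discrete features with 3, 5, 2 levels

# 0 marks a continuous column; a positive entry k marks a discrete
# column taking values in {0, ..., k-1}.
levels = np.concatenate([np.zeros(d_cont), disc_levels])
print(levels)         # [0. 0. 3. 5. 2.]
is_cont = levels == 0
print(is_cont.sum())  # 2 continuous columns
```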
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We generate the response vector $y$ from a linear model that includes one continuous main effect, one discrete main effect, and their interaction term."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"Z_one_hot_0 = np.zeros((n, int(levels[d_cont])))\n",
"Z_one_hot_0[np.arange(n), Z_disc[:, 0].astype(int)] = 1\n",
"Z_cont_0 = Z_cont[:, 0][:, None]\n",
"Z_sub = np.concatenate([\n",
" Z_cont_0,\n",
" Z_one_hot_0,\n",
" Z_cont_0 * Z_one_hot_0,\n",
"], axis=1)\n",
"beta = np.random.normal(0, 1, Z_sub.shape[1])\n",
"y = Z_sub @ beta + np.random.normal(0, 1, n)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are now ready to construct the full feature matrix to fit a group lasso model.\n",
"As a demonstration, suppose we (correctly) believe that there is a true interaction term containing the first continuous feature, but we do not know the other feature in the pair.\n",
"We therefore wish to construct interactions between the first continuous feature and all other features.\n",
"It is easy to specify this pairing, as shown below via `intr_map`. \n",
"The following code constructs the interaction matrix `X_intr`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"intr_map = {\n",
" 0: None,\n",
"}\n",
"X_intr = ad.matrix.interaction(Z, intr_map, levels)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To put all groups of features on the same relative \"scale\", we must further center and scale all interaction terms between two continuous features.\n",
"Then, it can be shown that interactions between two discrete features induce a (Frobenius) norm of $1$,\n",
"a discrete and a continuous feature induce a norm of $\\sqrt{2}$,\n",
"and two continuous features induce a norm of $\\sqrt{3}$.\n",
"These values will be used as penalty factors later when we call the group lasso solver.\n",
"We first compute the necessary centers and scales."
]
},
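These norms can be checked numerically. The standalone sketch below assumes (following the glinternet construction; the exact column composition of adelie's interaction groups is an assumption here) that a continuous-continuous group stacks both main effects with the standardized product, a continuous-discrete group stacks the one-hot block with the continuous feature times that block, and a discrete-discrete group is the one-hot interaction block. We measure the Frobenius norm scaled by $1/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, L1, L2 = 10_000, 3, 4

# Standardized continuous features and one-hot encoded discrete features.
x1 = rng.normal(size=n); x1 = (x1 - x1.mean()) / x1.std()
x2 = rng.normal(size=n); x2 = (x2 - x2.mean()) / x2.std()
z1 = np.eye(L1)[rng.integers(L1, size=n)]
z2 = np.eye(L2)[rng.integers(L2, size=n)]

def scaled_fro(M):
    """Frobenius norm of M scaled by 1/sqrt(n)."""
    return np.linalg.norm(M) / np.sqrt(M.shape[0])

# Discrete-discrete: each row of the one-hot interaction has a single 1.
disc_disc = (z1[:, :, None] * z2[:, None, :]).reshape(n, -1)

# Continuous-discrete: one-hot block and the continuous feature times it.
cont_disc = np.concatenate([z1, x1[:, None] * z1], axis=1)

# Continuous-continuous: both main effects and the standardized product.
x1x2 = x1 * x2
x1x2 = (x1x2 - x1x2.mean()) / x1x2.std()
cont_cont = np.column_stack([x1, x2, x1x2])

print(scaled_fro(disc_disc))  # 1.0
print(scaled_fro(cont_disc))  # sqrt(2) ~ 1.414
print(scaled_fro(cont_cont))  # sqrt(3) ~ 1.732
```

Each standardized column contributes exactly $n$ to the squared Frobenius norm, and each one-hot block contributes one 1 per row, which is why the three values come out exact rather than approximate.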
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"pairs = X_intr._pairs\n",
"pair_levels = levels[pairs]\n",
"is_cont_cont = np.prod(pair_levels == 0, axis=1).astype(bool)\n",
"cont_cont_pairs = pairs[is_cont_cont]\n",
"cont_cont = Z[:, cont_cont_pairs[:, 0]] * Z[:, cont_cont_pairs[:, 1]]\n",
"centers = np.zeros(X_intr.shape[1])\n",
"scales = np.ones(X_intr.shape[1])\n",
"cont_cont_indices = X_intr.groups[is_cont_cont] + 2\n",
"centers[cont_cont_indices] = np.mean(cont_cont, axis=0)\n",
"scales[cont_cont_indices] = np.std(cont_cont, axis=0, ddof=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we construct the full feature matrix $X$ including the one-hot encoded main effects as well as the standardized version of the interaction terms using the centers and scales defined above."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"X_one_hot = ad.matrix.one_hot(Z, levels)\n",
"X = ad.matrix.concatenate([\n",
" X_one_hot,\n",
" ad.matrix.standardize(\n",
" X_intr,\n",
" centers=centers, \n",
" scales=scales,\n",
" ),\n",
"], axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before calling the group lasso solver, we must prepare the grouping and penalty factor information."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"groups = np.concatenate([\n",
" X_one_hot.groups,\n",
" X_one_hot.shape[1] + X_intr.groups,\n",
"])\n",
"\n",
"is_cont_disc = np.logical_xor(pair_levels[:, 0], pair_levels[:, 1])\n",
"penalty = np.ones(X_intr.groups.shape[0])\n",
"penalty[is_cont_cont] = np.sqrt(3)\n",
"penalty[is_cont_disc] = np.sqrt(2)\n",
"penalty = np.concatenate([\n",
" np.ones(X_one_hot.groups.shape[0]),\n",
" penalty,\n",
"])"
]
},
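A standalone toy example (all levels made up) of how the product/xor logic classifies pairs and assigns penalty factors:

```python
import numpy as np

# Each row holds the levels of the two features in an interaction pair
# (0 = continuous, positive = discrete); all values here are made up.
pair_levels = np.array([
    [0, 0],  # continuous x continuous
    [0, 4],  # continuous x discrete
    [3, 0],  # discrete x continuous
    [3, 5],  # discrete x discrete
])
is_cont_cont = np.prod(pair_levels == 0, axis=1).astype(bool)
is_cont_disc = np.logical_xor(pair_levels[:, 0], pair_levels[:, 1])

penalty = np.ones(len(pair_levels))
penalty[is_cont_cont] = np.sqrt(3)
penalty[is_cont_disc] = np.sqrt(2)
print(penalty)  # [sqrt(3), sqrt(2), sqrt(2), 1]
```

`np.logical_xor` acts on the truthiness of the levels, so it is `True` exactly when one feature is continuous (level 0) and the other is discrete (positive level).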
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we call the group lasso solver with our prepared inputs."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 100/100 [00:00:00<00:00:00, 4657.95it/s] [dev:71.1%]\n"
]
}
],
"source": [
"state = ad.grpnet(\n",
" X=X, \n",
" glm=ad.glm.gaussian(y), \n",
" groups=groups, \n",
" penalty=penalty,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, observe that the first two groups of features that enter the model are precisely the first continuous and discrete main effects."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0, 10, 11, 12, 13, 14], dtype=int32)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"state.betas[13, :X_one_hot.shape[1]].indices"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we see that the first interaction term to be included corresponds to the interaction between the first continuous and first discrete features in `Z`."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0, 10], dtype=int32)"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"first_intr_index = X_one_hot.shape[1] + state.betas[16, X_one_hot.shape[1]:].indices[0]\n",
"relative_index = np.argmax(groups == first_intr_index) - X_one_hot.groups.shape[0]\n",
"pairs[relative_index]"
]
},
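The lookup above can be illustrated standalone with made-up group boundaries: given the start column of a selected group, `np.argmax` over the `groups` array recovers its position, and subtracting the number of main-effect groups gives the row into `pairs`.

```python
import numpy as np

# Made-up group start columns: 3 main-effect groups, then 2 interaction groups.
groups = np.array([0, 3, 5, 9, 12])
pairs = np.array([[0, 10], [0, 11]])  # feature pairs of the 2 interaction groups
n_main_groups = 3

first_intr_index = 9  # start column of the selected interaction group
relative_index = np.argmax(groups == first_intr_index) - n_main_groups
print(pairs[relative_index])  # [ 0 10]
```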
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We conclude that the group lasso correctly recovers the true effects on $y$ early in the path."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1745,7 +2053,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.10.6"
+ "version": "3.10.0"
}
},
"nbformat": 4,
