From a710c2e2a26110862a29a8d72669b881a93a4ea9 Mon Sep 17 00:00:00 2001
From: Hannah Tillman
Date: Wed, 23 Oct 2024 13:11:16 -0500
Subject: [PATCH] ht/requested changes rd.2

---
 h2o-docs/src/product/data-science/hglm.rst | 33 +++++++++-------------
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/h2o-docs/src/product/data-science/hglm.rst b/h2o-docs/src/product/data-science/hglm.rst
index bde7fb8fb38a..ec9ea62597e8 100644
--- a/h2o-docs/src/product/data-science/hglm.rst
+++ b/h2o-docs/src/product/data-science/hglm.rst
@@ -33,7 +33,7 @@ where:

- :math:`\varepsilon_{ij} \sim N(0, \delta_e^2)`;
- :math:`u_{mj} \sim N(0, \delta_u^2)`;
- :math:`\varepsilon_{ij}, u_{mj}` are independent;
-- :math:`u_{mj}, u_{m,j}` are independent if :math:`m \neq m`.
+- :math:`u_{mj}, u_{m'j}` are independent if :math:`m \neq m'`.

We need to solve for the following parameters: :math:`\beta_{00}, \beta_{0j}, \beta_{m0}, u_{mj}, \delta_e^2, \delta_u^2`.

@@ -62,7 +62,7 @@ Algorithm-specific parameters

- **method**: Specify the method used to obtain the fixed and random coefficients as well as the various variances (defaults to ``"em"``).

-- `random_columns `__: An array of random column indices to be used for ``HGLM``.
+- `random_columns `__: An array of column names from which the random effects coefficients are generated during the model building process.

- `rand_family `__: Specify the distribution of the random effects. Currently only ``rand_family=["gaussian"]`` is supported.

@@ -144,37 +144,31 @@ where:

In general, you can place the intercept at the beginning or the end of each row of data, but we chose to put it at the end for our implementation.

- :math:`\theta_f \text{ is a } p` by 1 vector of fixed coefficients;
-- :math:`A_{rj}` is usually denoted by :math:`Z_j \text{ where } Z_j = \begin{bmatrix} z^T_{j1} \\ z^T_{j2} \\ z^T_{j3} \\ \vdots \\ z^T_{jn_j} \\\end{bmatrix}`;
+- :math:`A_{rj}` is usually denoted by :math:`Z_{rj} \text{ where } Z_{rj} = \begin{bmatrix} z^T_{j1} \\ z^T_{j2} \\ z^T_{j3} \\ \vdots \\ z^T_{jn_j} \\\end{bmatrix}`;

.. note::

   We included a term for the random intercept here. However, there are cases where we do not have a random intercept, and the final element of 1 will not be present in :math:`z_{ji}`.

- :math:`\theta_{rj}` represents the random coefficients and is a :math:`q` by 1 vector;
-- :math:`r_j \text{ is an } n_j` by 1 vector of level-1 random effects assumed multivariate normal in distribution with 0 mean vector, covariance matrix :math:`\sigma^2 I_{n_{j\times nj}} \text{ where } I_{n_{j \times nj}}` is the identity matrix, :math:`n_j \text{ by } n_j`;
+- :math:`r_j \text{ is an } n_j` by 1 vector of level-1 residual noise assumed multivariate normal in distribution with 0 mean vector, covariance matrix :math:`\sigma^2 I_{n_j \times n_j} \text{ where } I_{n_j \times n_j}` is the identity matrix, :math:`n_j \text{ by } n_j`;
- :math:`j` denotes the level-2 units where :math:`j = 1,2, \cdots , J`;
- :math:`T_j` is a symmetric positive definite matrix of size :math:`q \text{ by } q`. We assume that :math:`T_j` is the same for all :math:`j = 1,2, \cdots , J`, and it is kept symmetric positive definite throughout the whole model building process.

M-step
~~~~~~

-EM conceives of :math:`Y_j` as the observed data with :math:`\theta_{rj}` as the missing data. Therefore, the complete data are :math:`(Y_j, \theta_{rj}), j=1, \cdots, J \text{ while } \theta_f, \sigma^2, \text{ and } T_j` are the parameters that need to be estimated. If the complete data were observed, finding the ML estimates will be simple. To estimate :math:`\theta_f`, subtract :math:`A_{rj} \theta_{rj}` from both sides of *equation 6*:
+EM conceives of :math:`Y_j` as the observed data with :math:`\theta_{rj}` as the missing data. Therefore, the complete data are :math:`(Y_j, \theta_{rj}), j=1, \cdots, J \text{ while } \theta_f, \sigma^2, \text{ and } T_j` are the parameters that need to be estimated. If the complete data were observed, finding the ML estimates would be simple. To estimate :math:`\theta_f`, subtract :math:`A_{rj} \theta_{rj}` from both sides of *equation 6*, yielding:

.. math::

   Y_j - A_{rj} \theta_{rj} = A_{fj} \theta_f + r_j \quad \text{ equation 7}

-and justifying the ordinary least squares (OLS) estimate:
+Next, multiply *equation 7* by :math:`A^T_{fj}` and sum across the level-2 units :math:`j`. Note that :math:`\sum^J_{j=1} A^T_{fj} r_j \approx 0`. Rearranging the terms gives *equation 8*, which is also the ordinary least squares (OLS) estimate:

.. math::
-
- \hat{\theta_f} = \Big( \sum^J_{j=1} A^T_{fj} A_{fj} \Big)^{-1} \sum^J_{j=1} A^T_{fj} (Y_j - A_{rj} \theta_{rj}) \quad \text{ equation 8}
-
-*Equation 8* can also be solved by multipying *equation 7* with :math:`A^T_{fj}` and sum across the level-2 unit :math:`j`.
-
-.. note::
-
- :math:`\sum^J_{j=1} A^T_{fj} r_j \sim 0` and rearrange the terms and you get *equation 8*.
+ \hat{\theta_f} = \Big( \sum^J_{j=1} A^T_{fj} A_{fj} \Big)^{-1} \sum^J_{j=1} A^T_{fj} (Y_j - A_{rj} \theta_{rj}) \quad \text{ equation 8}
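+
+To make the update concrete, here is a minimal NumPy sketch of the *equation 8* computation. It is an illustration only, not H2O's implementation; the names ``A_f``, ``A_r``, ``Y``, and ``theta_r`` are hypothetical per-group arrays. Using ``np.linalg.solve`` avoids forming the matrix inverse explicitly.
+
+.. code-block:: python
+
+   import numpy as np
+
+   def update_theta_f(A_f, A_r, Y, theta_r):
+       """M-step update for the fixed coefficients (equation 8).
+
+       A_f: list of (n_j x p) fixed-effects design matrices, one per level-2 unit j
+       A_r: list of (n_j x q) random-effects design matrices
+       Y: list of (n_j,) response vectors
+       theta_r: list of (q,) current random-coefficient estimates
+       """
+       p = A_f[0].shape[1]
+       lhs = np.zeros((p, p))  # accumulates sum_j A_fj^T A_fj
+       rhs = np.zeros(p)       # accumulates sum_j A_fj^T (Y_j - A_rj theta_rj)
+       for A_fj, A_rj, Y_j, theta_rj in zip(A_f, A_r, Y, theta_r):
+           lhs += A_fj.T @ A_fj
+           rhs += A_fj.T @ (Y_j - A_rj @ theta_rj)
+       return np.linalg.solve(lhs, rhs)  # solves equation 8 without an explicit inverse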

Next, ML estimators for :math:`T_j` and :math:`\sigma^2` are straightforward:

@@ -199,7 +193,7 @@ where :math:`N = \sum^J_{j=1} n_j`.

E-step
~~~~~~

-While the CDSS are not observed, they can be estimated by their conditional expectations given the data :math:`Y` and parameter estimates from the previous iterations. `Dempster et al. <#references>`__ showed that substituting the expected CDSS for the M-step formulas would produce new parameter estimates having a higher likelihood than the current estimates.
+While the CDSS are not observed, they can be estimated by their conditional expectations given the data :math:`Y` and the parameter estimates from the previous iteration. `Dempster et al. [4] <#references>`__ showed that substituting the expected CDSS into the M-step formulas produces new parameter estimates with a higher likelihood than the current estimates.

To find :math:`E(CDSS | Y, \theta_f, T, \sigma^2)` requires deriving the conditional distribution of the missing data :math:`\theta_r`, given :math:`Y, \theta_f, T, \sigma^2`. From *equation 6*, the joint distribution of the complete data is:

@@ -207,7 +201,7 @@ To find :math:`E(CDSS | Y, \theta_f, T, \sigma^2)` requires deriving the conditi

   \begin{pmatrix} Y_j \\ \theta_{rj} \\\end{pmatrix} \sim N \Bigg[ \begin{pmatrix} A_{fj} \theta_{f} \\ 0 \\\end{pmatrix} , \begin{pmatrix} A_{rj}T_jA^T_{rj} + \sigma^2 I & A_{rj}T_j \\ T_j A^T_{rj} & T_j \\\end{pmatrix} \Bigg] \quad \text{ equation 12}

-From *equation 12*, we can dervie the conditional distribution of the missing data given the complete data as follows:
+From *equation 12*, we can obtain the conditional distribution of the missing data given the complete data as follows:

.. math::

@@ -221,7 +215,7 @@ with

   C_j = A^T_{rj} A_{rj} + \sigma^2 T^{-1}_j \quad \text{ equation 15}

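+
+The conditional moments implied by *equation 12* can be sketched the same way. The function below applies standard multivariate normal conditioning, written in terms of :math:`C_j` from *equation 15*; it is illustrative only, with hypothetical names, and the elided *equations 13* and *14* may present these quantities differently.
+
+.. code-block:: python
+
+   import numpy as np
+
+   def e_step_unit(A_fj, A_rj, Y_j, theta_f, T, sigma2):
+       """Conditional mean and covariance of theta_rj given Y_j.
+
+       Standard multivariate normal conditioning applied to equation 12,
+       expressed via C_j = A_rj^T A_rj + sigma^2 T^{-1} (equation 15).
+       """
+       C_j = A_rj.T @ A_rj + sigma2 * np.linalg.inv(T)
+       resid = Y_j - A_fj @ theta_f                   # residual after fixed effects
+       mean = np.linalg.solve(C_j, A_rj.T @ resid)    # E(theta_rj | Y_j)
+       cov = sigma2 * np.linalg.inv(C_j)              # Var(theta_rj | Y_j)
+       return mean, cov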
-Complete the EM algorithm
+The complete EM algorithm
~~~~~~~~~~~~~~~~~~~~~~~~~

The complete EM algorithm is as follows:

@@ -237,10 +231,9 @@ The complete EM algorithm is as follows:

3. Substitution: substitute the estimated CDSS from *equation 17* into the M-step formulas (*equations 8, 9,* and *10*);
4. Processing: feed the new estimates of :math:`\theta_f, \sigma^2, T_j` into step 2;
-5. Cycling: continue steps 2, 3, and 4 until the following stopping conditions are satisfied:
-
-   a. Changes in the log-likelihood (*equation 16*) become sufficiently small, or
-   b. The largest change in the value of any of the parameters is sufficiently small.
+5. Cycling: continue steps 2, 3, and 4 until the following stopping condition is satisfied:
+
+   - The largest change in the value of any of the parameters is sufficiently small.

Log-likelihood for HGLM
~~~~~~~~~~~~~~~~~~~~~~~
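+
+As a companion to this section, here is a minimal NumPy sketch of the marginal log-likelihood implied by *equation 12*, under which :math:`Y_j \sim N(A_{fj} \theta_f, \, A_{rj} T_j A^T_{rj} + \sigma^2 I)`. It is illustrative only and may differ in presentation from *equation 16*.
+
+.. code-block:: python
+
+   import numpy as np
+
+   def marginal_loglik(A_f, A_r, Y, theta_f, T, sigma2):
+       """Sum of per-unit Gaussian log-densities under the marginal of equation 12."""
+       ll = 0.0
+       for A_fj, A_rj, Y_j in zip(A_f, A_r, Y):
+           n_j = Y_j.shape[0]
+           V_j = A_rj @ T @ A_rj.T + sigma2 * np.eye(n_j)  # marginal covariance of Y_j
+           resid = Y_j - A_fj @ theta_f
+           _, logdet = np.linalg.slogdet(V_j)
+           ll -= 0.5 * (n_j * np.log(2.0 * np.pi) + logdet
+                        + resid @ np.linalg.solve(V_j, resid))
+       return ll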