Consider the following definition of the MLE problem for multinomials. The input to the problem is a finite set $\mathcal Y$ and a weight $c_y\ge0$ for each $y\in\mathcal Y$. The output is the distribution $p^{*}$ that solves the following maximization problem.
$$
p^{*}=\arg \max_{p \in \mathcal{P}_{\mathcal{Y}}} \sum_{y \in \mathcal{Y}} c_{y} \log p_{y}
$$
(i) Prove that the vector $p^{*}$ has components
$$
p_{y}^{*}=\frac{c_{y}}{N}
$$
for all $y \in \mathcal{Y}$, where $N=\sum_{y \in \mathcal{Y}} c_{y}$. (Hint: use the theory of Lagrange multipliers.)
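A sketch of the argument suggested by the hint, assuming $N>0$ and, for simplicity, $c_y>0$ for every $y$ (if some $c_y=0$, that term contributes nothing to the objective and the optimum sets $p_y=0$). The multiplier $\lambda$ for the constraint $\sum_{y} p_y=1$ and the symbol $\mathcal{L}$ are notation introduced only for this sketch.
$$
\begin{aligned}
\mathcal{L}(p, \lambda) &= \sum_{y \in \mathcal{Y}} c_{y} \log p_{y}-\lambda\left(\sum_{y \in \mathcal{Y}} p_{y}-1\right) \\
\frac{\partial \mathcal{L}}{\partial p_{y}} &= \frac{c_{y}}{p_{y}}-\lambda=0 \quad \Longrightarrow \quad p_{y}=\frac{c_{y}}{\lambda} \\
\sum_{y \in \mathcal{Y}} p_{y} &= \frac{1}{\lambda} \sum_{y \in \mathcal{Y}} c_{y}=\frac{N}{\lambda}=1 \quad \Longrightarrow \quad \lambda=N
\end{aligned}
$$
Substituting $\lambda=N$ gives $p_{y}=c_{y}/N$. The objective is concave in $p$ and the constraint is linear, so this stationary point is the global maximum; the constraints $p_y\ge0$ are inactive because every $p_y$ obtained this way is positive.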
The first step is to re-write the log-likelihood function in a way that makes direct use of "counts" taken from the training data:
$$
\begin{aligned}
l(\Omega)=& \sum_{i=1}^{m} \log p\left(y^{(i)}\right)+\sum_{i=1}^{m} \sum_{j=1}^{n} \log p_{j}\left(x_{j}^{(i)} \mid y^{(i)}\right) \\
=& \sum_{y \in \mathcal{Y}} \operatorname{count}(y) \log p(y) \\
&+\sum_{j=1}^{n} \sum_{y \in \mathcal{Y}} \sum_{x \in\{-1,+1\}} \operatorname{count}_{j}(x \mid y) \log p_{j}(x \mid y)
\end{aligned}
$$
where as before
$$
\begin{gathered}
\operatorname{count}(y)=\sum_{i=1}^{m} 1\left(y^{(i)}=y\right) \\
\operatorname{count}_{j}(x \mid y)=\sum_{i=1}^{m} 1\left(y^{(i)}=y \wedge x_{j}^{(i)}=x\right)
\end{gathered}
$$
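The equivalence of the per-example form and the count form can be checked numerically. The following is a minimal sketch on a made-up toy data set; the data, the parameter values, and all variable names are illustrative assumptions, and features are indexed from 0 rather than 1.

```python
import math
from collections import Counter

# Hypothetical toy training set: each example is (x, y) with x in {-1,+1}^n.
data = [
    ((+1, -1), "a"),
    ((+1, +1), "a"),
    ((-1, -1), "b"),
    ((+1, -1), "b"),
    ((-1, +1), "b"),
]
n = 2  # number of features (indexed 0..n-1 here)

# Arbitrary, not fitted, parameters used only to compare the two forms.
p_y = {"a": 0.3, "b": 0.7}
p_x = {  # p_x[(j, x, y)] stands for p_j(x | y)
    (0, +1, "a"): 0.6, (0, -1, "a"): 0.4,
    (1, +1, "a"): 0.2, (1, -1, "a"): 0.8,
    (0, +1, "b"): 0.5, (0, -1, "b"): 0.5,
    (1, +1, "b"): 0.9, (1, -1, "b"): 0.1,
}

# count(y) and count_j(x | y) as defined above.
count_y = Counter(y for _, y in data)
count_jxy = Counter((j, x[j], y) for x, y in data for j in range(n))

# Per-example form of the log-likelihood.
ll_examples = sum(
    math.log(p_y[y]) + sum(math.log(p_x[(j, x[j], y)]) for j in range(n))
    for x, y in data
)

# Count form of the log-likelihood.
ll_counts = sum(c * math.log(p_y[y]) for y, c in count_y.items()) + sum(
    c * math.log(p_x[key]) for key, c in count_jxy.items()
)

# The two forms agree up to floating-point rounding.
assert math.isclose(ll_examples, ll_counts)
```

The rewrite only groups identical terms, so the agreement holds for any parameter values, not just the ones chosen here.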
Consider first the maximization of this function with respect to the $p(y)$ parameters. It is easy to see that the term
$$
\sum_{j=1}^{n} \sum_{y \in \mathcal{Y}} \sum_{x \in\{-1,+1\}} \operatorname{count}_{j}(x \mid y) \log p_{j}(x \mid y)
$$
does not depend on the $p(y)$ parameters at all. Hence, to pick the optimal $p(y)$ parameters, we simply need to maximize
$$
\sum_{y \in \mathcal{Y}} \operatorname{count}(y) \log p(y)
$$
subject to the constraints $p(y) \geq 0$ and $\sum_{y \in \mathcal{Y}} p(y)=1$. By the result of (i), the values of $p(y)$ that maximize this expression under these constraints are simply
$$
p(y)=\frac{\operatorname{count}(y)}{\sum_{y' \in \mathcal{Y}} \operatorname{count}(y')}=\frac{\operatorname{count}(y)}{m}
$$
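As a quick sanity check of this closed form, the sketch below computes the estimate from a hypothetical list of labels and confirms that randomly drawn distributions never score higher on the objective. The labels, the helper `objective`, and the random search are illustrative assumptions, not part of the derivation.

```python
import math
import random
from collections import Counter

labels = ["a", "a", "b", "b", "b"]  # hypothetical training labels
m = len(labels)
count_y = Counter(labels)

# Closed-form estimate: p(y) = count(y) / m.
p_hat = {y: c / m for y, c in count_y.items()}


def objective(p):
    """Return the sum over y of count(y) * log p(y)."""
    return sum(c * math.log(p[y]) for y, c in count_y.items())


# No randomly drawn distribution over the labels should beat p_hat.
rng = random.Random(0)
best = objective(p_hat)
for _ in range(1000):
    w = {y: rng.random() + 1e-9 for y in count_y}
    total = sum(w.values())
    q = {y: v / total for y, v in w.items()}
    assert objective(q) <= best + 1e-12
```

A random search is of course not a proof; it only illustrates the claim that (i) establishes.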
By a similar argument, we can maximize each term of the form
$$
\sum_{x \in\{-1,+1\}} \operatorname{count}_{j}(x \mid y) \log p_{j}(x \mid y)
$$
Applying (i), we get
$$
p_j(x\mid y)=\frac{\operatorname{count}_j(x\mid y)}{\sum_{x'\in\{-1,+1\}}\operatorname{count}_j(x'\mid y)}=\frac{\operatorname{count}_j(x\mid y)}{\operatorname{count}(y)}
$$
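The conditional estimates can be computed the same way from counts. This is a minimal sketch on a made-up toy data set; the data and variable names are illustrative assumptions, and features are indexed from 0 rather than 1.

```python
from collections import Counter

# Hypothetical toy training set: each example is (x, y) with x in {-1,+1}^n.
data = [
    ((+1, -1), "a"),
    ((+1, +1), "a"),
    ((-1, -1), "b"),
    ((+1, -1), "b"),
    ((-1, +1), "b"),
]
n = 2  # number of features (indexed 0..n-1 here)

count_y = Counter(y for _, y in data)
count_jxy = Counter((j, x[j], y) for x, y in data for j in range(n))

# p_j(x | y) = count_j(x | y) / count(y); unseen combinations have count 0.
p_x = {
    (j, x, y): count_jxy[(j, x, y)] / count_y[y]
    for j in range(n)
    for x in (-1, +1)
    for y in count_y
}

# For each feature j and label y, the two estimates sum to 1.
for j in range(n):
    for y in count_y:
        assert abs(p_x[(j, -1, y)] + p_x[(j, +1, y)] - 1.0) < 1e-12
```

Note that a combination never observed in the training data, such as feature 0 taking the value $-1$ with label `"a"` here, receives an estimated probability of exactly zero.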