$\begingroup$

If I've got a function

$$ \log p(\tau | \theta) = \log ( \frac{\exp(\theta^T \tau)}{\sum_\tau \exp(\theta^T\tau)} ) $$

how do I calculate its derivative with respect to $\theta$ in order to maximize the log-likelihood?

My attempt: first I rewrite the function as

$$ \log p(\tau | \theta) = \theta^T \tau - \log\left( \sum_\tau \exp (\theta^T \tau) \right) $$

Using the chain rule on the second term:

$$ u = \log(v), \qquad \frac{du}{dv} = \frac{1}{v} $$

$$ v = \sum_\tau \exp(w), \qquad \frac{dv}{dw} = \sum_\tau \exp(w) $$

$$ w = \theta^T \tau, \qquad \frac{dw}{d\theta} = \tau $$

which leaves me with

$$ \frac{du}{d\theta} = \frac{du}{dv} \cdot \frac{dv}{dw} \cdot \frac{dw}{d\theta} = \frac{1}{\sum_\tau \exp(\theta^T \tau)} \cdot \sum_\tau \exp(\theta^T \tau) \cdot \tau = \tau $$

According to the answer sheet this is wrong, but I'm not sure why. Can someone point out the mistake?

Thanks

$\endgroup$

2 Answers

$\begingroup$

You write $\sum_\tau \exp(\theta^\top \tau)$, which implies that $\tau $ is just a dummy variable ranging over some set. Your final answer cannot involve $\tau$.

You are correct right up until $$ \frac{\sum_\tau \exp(\theta^\top \tau)\tau}{\sum_\tau \exp(\theta^\top \tau)}. $$ However, you cannot "pull the $\tau$ out of the top summation" because $\tau$ is not a constant with respect to the summation index, $\tau$. Therefore, the above expression is as simple as it gets (without additional information).

Edit: I see now the correct definition of the function you are trying to differentiate is $$ \theta^\top\tau-\log \sum_{\tau} \exp(\theta^\top \tau). $$ Notice that $\tau$ is playing two roles here; one as a summation index, and the other as a fixed vector. This can cause confusion. To make things clearer, I will use $\sigma$ for the summation index: $$ \theta^\top\tau-\log \sum_{\sigma} \exp(\theta^\top \sigma) $$ Using my previous explanation, the simplest form of the derivative of this is $$ \tau-\frac{\sum_\sigma \exp(\theta^\top \sigma)\sigma}{\sum_\sigma \exp(\theta^\top \sigma)}. $$
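To spell out the step where the chain rule in the question goes astray, here is the differentiation of the log-sum term written out term by term. This is only a sketch in the same $\sigma$ notation; the shorthand $Z(\theta)$ is introduced here for convenience and is not part of the original question.

$$ Z(\theta) = \sum_\sigma \exp(\theta^\top \sigma), \qquad \nabla_\theta Z(\theta) = \sum_\sigma \exp(\theta^\top \sigma)\,\sigma $$

$$ \nabla_\theta \log Z(\theta) = \frac{\nabla_\theta Z(\theta)}{Z(\theta)} = \frac{\sum_\sigma \exp(\theta^\top \sigma)\,\sigma}{\sum_\sigma \exp(\theta^\top \sigma)} $$

Each term of the sum has its own inner function $w_\sigma = \theta^\top \sigma$ with its own derivative $\sigma$, so the factor $\sigma$ has to stay inside the sum. Read this way, the fraction is a weighted average of the $\sigma$ values with weights $\exp(\theta^\top \sigma)/Z(\theta)$, i.e. the expectation of $\sigma$ under $p(\sigma | \theta)$, and the full gradient is $\tau$ minus that expectation.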

$\endgroup$

$\begingroup$

$$\nabla_\theta\left(\log\sum_i\exp(\theta^T\tau_i)\right)=\frac{\sum_i\exp(\theta^T\tau_i)\tau_i}{\sum_i\exp(\theta^T\tau_i)}.$$
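If you want to convince yourself of this formula numerically, here is a minimal sketch in plain Python that compares it against central finite differences. The vectors in `taus` and `theta` are made up purely for illustration, and the helper names (`log_sum_exp`, `grad_log_sum_exp`, `numerical_grad`) are hypothetical, not taken from any library.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def log_sum_exp(theta, taus):
    """log sum_i exp(theta . tau_i), shifted by the max score for stability."""
    scores = [dot(theta, tau) for tau in taus]
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def grad_log_sum_exp(theta, taus):
    """Closed form: the average of the tau_i weighted by exp(theta . tau_i)."""
    scores = [dot(theta, tau) for tau in taus]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # the common shift cancels in the ratio
    total = sum(weights)
    return [
        sum(w * tau[k] for w, tau in zip(weights, taus)) / total
        for k in range(len(theta))
    ]


def numerical_grad(f, theta, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = []
    for k in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[k] += eps
        down[k] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


# Made-up example data (an assumption, not from the question).
taus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
theta = [0.3, -0.2]

analytic = grad_log_sum_exp(theta, taus)
numeric = numerical_grad(lambda th: log_sum_exp(th, taus), theta)
max_error = max(abs(a - n) for a, n in zip(analytic, numeric))
print("analytic:", analytic)
print("numeric: ", numeric)
print("max abs difference:", max_error)
```

The two gradients should agree up to finite-difference error. Note that the result depends on all of the $\tau_i$ through the weights, which is why it cannot collapse to a single $\tau$.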

$\endgroup$
