Glam Prestige Journal



I have a histogram, and I want to estimate the parameters of the underlying distribution. Here is the data I've taken from the graph:

$$ \begin{array}{c|c} \text{Interval} & \text{Count}\\ \hline 0\text{--}0.70 & 0\\ 0.70\text{--}0.75 & 1\\ 0.75\text{--}0.80 & 21\\ 0.80\text{--}0.85 & 36\\ 0.85\text{--}0.90 & 59\\ 0.90\text{--}0.95 & 69\\ 0.95\text{--}1 & 4\\ \end{array} $$

So far, I've estimated $\alpha = 33.15$ and $\beta = 4.78$. I rounded each point to the center of its bin (so $36$ points at $0.825$, $59$ at $0.875$, and so on), computed the sample mean and variance of these values, and solved the method-of-moments equations for $\alpha$ and $\beta$.
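For reference, here is a minimal Python sketch of that midpoint approach. It assumes the standard beta method-of-moments relations, with sample mean $m$ and sample variance $v$:

$$\alpha = m\left(\frac{m(1-m)}{v}-1\right),\qquad \beta=(1-m)\left(\frac{m(1-m)}{v}-1\right).$$

The $N-1$ denominator in $v$ is an assumption; it is the choice that reproduces the estimates above, while dividing by $N$ gives about $33.33$ and $4.81$. The function name `beta_method_of_moments` is hypothetical.

```python
from math import fsum

# Histogram from the question: (lower edge, upper edge, count).
BINS = [
    (0.00, 0.70, 0),
    (0.70, 0.75, 1),
    (0.75, 0.80, 21),
    (0.80, 0.85, 36),
    (0.85, 0.90, 59),
    (0.90, 0.95, 69),
    (0.95, 1.00, 4),
]


def beta_method_of_moments(bins):
    """Midpoint-rounded method-of-moments estimate of (alpha, beta)."""
    n = sum(c for _, _, c in bins)
    # Replace every observation by the center of its bin.
    mids = [((lo + hi) / 2, c) for lo, hi, c in bins]
    mean = fsum(x * c for x, c in mids) / n
    # Sample variance with an n - 1 denominator (assumed, see above).
    var = fsum(c * (x - mean) ** 2 for x, c in mids) / (n - 1)
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


alpha, beta = beta_method_of_moments(BINS)
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")  # alpha = 33.15, beta = 4.78
```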

I am assuming the data follow a beta distribution. Is this the best way to estimate its parameters?


1 Answer


This is naturally a maximum-likelihood problem. Since your data are already binned, you can calculate the probability of each bin for given $\alpha$ and $\beta$:

$$p_i=I_{b_i}(\alpha,\beta)-I_{a_i}(\alpha,\beta),$$
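As a small illustration, the bin probabilities can be computed in Python, assuming NumPy and SciPy are available. `scipy.special.betainc(a, b, x)` is SciPy's regularized incomplete beta $I_x(a,b)$; the helper `bin_probabilities` is hypothetical, and the parameter values $33.15$ and $4.78$ are the question's moment estimates, used purely as an example input.

```python
import numpy as np
from scipy.special import betainc

# Bin edges from the question's histogram; together the bins cover [0, 1].
edges = np.array([0.0, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.0])


def bin_probabilities(alpha, beta, edges):
    """p_i = I_{b_i}(alpha, beta) - I_{a_i}(alpha, beta) for consecutive edges."""
    # betainc is vectorized over x, so this evaluates the beta CDF at every edge.
    return np.diff(betainc(alpha, beta, edges))


p = bin_probabilities(33.15, 4.78, edges)
print(np.round(p, 4))
```

Because the edges run from $0$ to $1$, the probabilities telescope and sum to $1$, as the multinomial model requires.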

where $I_x(\alpha,\beta)$ is the regularized incomplete beta function (the beta CDF) and the $i$-th bin is $[a_i,b_i]$. The bin counts $n_1,\dots,n_k$, with $N=\sum_i n_i$, then follow a multinomial distribution with these cell probabilities, so the likelihood of your data is $L=\binom{N}{n_1,\dots,n_k}p_1^{n_1}\cdots p_k^{n_k}$. Maximize this with respect to $\alpha$ and $\beta$. The multinomial coefficient does not depend on $\alpha$ or $\beta$, so you can drop it, and the log-likelihood is easier to work with numerically:

$$\ell=\sum_in_i\log p_i=\sum_in_i\log(I_{b_i}(\alpha,\beta)-I_{a_i}(\alpha,\beta))$$

According to Mathematica, the maximum for your data is $\ell=-272.57$, attained at $\alpha=36.39$ and $\beta=5.247$.
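A minimal sketch of the same fit in Python, assuming NumPy and SciPy are available; it is a substitute for the Mathematica computation, not a transcription of it. `scipy.special.betainc(a, b, x)` gives $I_x(a,b)$, and the helpers `binned_loglik` and `fit_beta_binned` are hypothetical names. The sketch optimizes over $\log\alpha$ and $\log\beta$ so both stay positive, and starts Nelder-Mead from the moment estimates in the question.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betainc

# Bin edges and counts from the question's histogram (the bins cover [0, 1]).
EDGES = np.array([0.0, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.0])
COUNTS = np.array([0, 1, 21, 36, 59, 69, 4])


def binned_loglik(alpha, beta, edges=EDGES, counts=COUNTS):
    """ell = sum_i n_i log p_i, dropping the multinomial coefficient."""
    # Differences of the regularized incomplete beta at the edges are the p_i.
    p = np.diff(betainc(alpha, beta, edges))
    used = counts > 0  # empty bins contribute n_i log p_i = 0
    # Guard against log(0) if a probability underflows.
    return float(np.sum(counts[used] * np.log(np.maximum(p[used], 1e-300))))


def fit_beta_binned(edges=EDGES, counts=COUNTS, start=(33.15, 4.78)):
    """Maximize the binned log-likelihood over (alpha, beta)."""

    def neg_ll(theta):
        # theta = (log alpha, log beta) keeps both parameters positive.
        alpha, beta = np.exp(theta)
        return -binned_loglik(alpha, beta, edges, counts)

    res = minimize(neg_ll, x0=np.log(start), method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-10, "maxiter": 5000})
    alpha_hat, beta_hat = np.exp(res.x)
    return float(alpha_hat), float(beta_hat), -float(res.fun)


alpha_hat, beta_hat, ell = fit_beta_binned()
print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}, ell = {ell:.2f}")
```

The printed values can be compared with the Mathematica figures quoted above. Nelder-Mead is a local method, so if they disagree, trying another starting point is a reasonable first check.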
