
Background information

Suppose we have a dataset for a competition in 2016, excerpted below.

+-------+------+------+------+
| Round | X1   | X2   | Rank |
+-------+------+------+------+
| 1     | .586 | .329 | 1    |
+-------+------+------+------+
| 1     | .111 | .171 | 2    |
+-------+------+------+------+
| ...   | ...  | ...  | ...  |
+-------+------+------+------+
| 1     | .625 | .663 | 8    |
+-------+------+------+------+
| 2     | .412 | .312 | 1    |
+-------+------+------+------+
| 2     | .250 | .341 | 2    |
+-------+------+------+------+
| ...   | ...  | ...  | ...  |
+-------+------+------+------+
| 2     | .063 | .008 | 10   |
+-------+------+------+------+
| 3     | .817 | .520 | 1    |
+-------+------+------+------+
| ...   | ...  | ...  | ...  |
+-------+------+------+------+

Each season has many rounds, and each round may have a different number of contestants (e.g. round 1 has 8 and round 2 has 10, as shown above). The contestants compete with each other to win the game. For example, contestant $i$, who gets rank '1', is the champion, while contestant $j$, who gets rank '2', is the first runner-up, and so on.

My ultimate goal is to find the probability that two selected contestants in a given round finish first and second, in that order.

So, I am going to model it with the following equation, which combines two multinomial logit models as a product.

$$ \begin{align} \mathrm{P}(i,j,I) & = \mathrm{Pr}\left(V_i>V_j>V_k,\ k\neq i,j\right)\\ & =\frac{e^{\theta_i}}{\sum_{p=1}^I{e^{\theta_p}}}\cdot\frac{e^{c\theta_j}}{\sum_{q=1,q\neq i}^I{e^{c\theta_q}}} \end{align} $$

where

$V_i=\theta_i + \epsilon_i$ is the utility of contestant $i$,

$ \theta_i = \beta_0+\beta_1X_{1,i}+\beta_2X_{2,i} $,

$ X_{s,i} $ is the $s$-th predictor for contestant $i$,

$I$ is the number of contestants in a round, and

$c$ is an already determined constant multiplier between 0 and 1.
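To make the formula concrete, here is a minimal sketch in Python (used only for illustration; the same arithmetic carries over to R). The function name `top_two_probability` and the utility values are hypothetical and are not taken from the dataset.

```python
import math

def top_two_probability(theta, i, j, c):
    """P(i finishes first, j finishes second) for one round.

    theta : list of utilities theta_p for the contestants in the round
    i, j  : 0-based positions of the candidate winner and runner-up
    c     : the fixed multiplier in (0, 1)
    """
    if i == j:
        raise ValueError("i and j must be different contestants")
    # First factor: multinomial logit over all contestants in the round.
    p_first = math.exp(theta[i]) / sum(math.exp(t) for t in theta)
    # Second factor: multinomial logit on c * theta over everyone except i.
    denom = sum(math.exp(c * t) for q, t in enumerate(theta) if q != i)
    p_second = math.exp(c * theta[j]) / denom
    return p_first * p_second

# Hypothetical utilities for a three-contestant round (illustrative values only).
theta = [0.5, 0.2, -0.1]
total = sum(
    top_two_probability(theta, i, j, 0.8)
    for i in range(3)
    for j in range(3)
    if i != j
)
```

Summing over all ordered pairs $(i,j)$ should give 1, which is a quick sanity check that the two factors form a proper probability over top-two outcomes.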

The likelihood function and the log-likelihood can thus be obtained as below:

$$ \begin{align} L(\beta)&=\prod_{r=1}^R{\mathrm{P}(i,j,I_r)}=\prod_{r=1}^R{\left(\frac{e^{\theta_{i,r}}}{\sum_{p=1}^{I_r}{e^{\theta_{p,r}}}}\cdot\frac{e^{c\theta_{j,r}}}{\sum_{q=1,q\neq i}^{I_r}{e^{c\theta_{q,r}}}}\right)} \\ \log L(\beta)&= \sum_{r=1}^R{\left(\theta_{i,r}+c\theta_{j,r}-\log{\left(\sum_{p=1}^{I_r}{e^{\theta_{p,r}}}\cdot\sum_{q=1,q\neq i}^{I_r}{e^{c\theta_{q,r}}}\right)}\right)}\\ &= \sum_{r=1}^R{\left(\theta_{i,r}+c\theta_{j,r}-\log{\left(\sum_{p=1}^{I_r}{\sum_{q=1,q\neq i}^{I_r}{e^{\theta_{p,r}+ c\theta_{q,r}}}}\right)}\right)} \end{align} $$

where

$\theta_{i,r} = \beta_0+\beta_1X_{1,i,r}+\beta_2X_{2,i,r}$,

$X_{s,i,r} $ is the $s$-th predictor for contestant $i$ in the $r$-th round,

$I_r$ is the number of contestants in the $r$-th round, and

$R$ is the total number of rounds in a season.
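A step that may help with the computational concern raised further down: because $\log(AB)=\log A+\log B$, the second line of the log-likelihood can be split into two single sums per round, so the double sum in the third line never has to be evaluated:

$$ \log L(\beta)=\sum_{r=1}^R{\left(\theta_{i,r}+c\theta_{j,r}-\log{\sum_{p=1}^{I_r}{e^{\theta_{p,r}}}}-\log{\sum_{q=1,q\neq i}^{I_r}{e^{c\theta_{q,r}}}}\right)} $$

Note also that $e^{\beta_0}$ factors out of both the numerator and the denominator of the first ratio, and $e^{c\beta_0}$ out of the second, so $\beta_0$ cancels and the likelihood does not depend on it.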

Obstacles

I am not sure how to model the probability by round, rather than by contestant, in R.

Using ordered logit seems to be more convenient, but I am not sure how to transform this model to the desired form.

Maximum likelihood estimation gives me two double sums inside a big sum, which is computationally expensive even if I don't use any loops in R, so I gave up on using my own program to find the MLEs.

I have also used the built-in ordinal regression in SPSS to model the probability, but the results are unsatisfactory and the constant multiplier cannot be included.
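On the computational obstacle, one possible approach is to take the logarithm of each of the two factors separately, so that every round needs only two single sums. Below is a minimal sketch in Python (for illustration only; the same steps carry over to R). The names `log_sum_exp` and `log_likelihood`, the data layout (each round's rows sorted by rank) and the predictor values are all assumptions made for the sketch. The intercept $\beta_0$ is left out because it cancels from both ratios.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(beta, rounds, c):
    """Log-likelihood of the top-two model, summed over rounds.

    beta   : (beta1, beta2); beta0 is omitted because it cancels in each ratio
    rounds : list of rounds; each round is a list of (x1, x2) tuples ordered
             by rank, so index 0 is the winner and index 1 the runner-up
             (this layout is an assumption made for the sketch)
    c      : the fixed multiplier in (0, 1)
    """
    b1, b2 = beta
    total = 0.0
    for contestants in rounds:
        theta = [b1 * x1 + b2 * x2 for x1, x2 in contestants]
        # Log of the first factor: theta_i - log sum_p exp(theta_p).
        total += theta[0] - log_sum_exp(theta)
        # Log of the second factor: c*theta_j - log sum_{q != i} exp(c*theta_q).
        total += c * theta[1] - log_sum_exp([c * t for t in theta[1:]])
    return total

# Made-up predictor values for two small rounds (not the data from the question).
rounds = [
    [(0.6, 0.3), (0.1, 0.2), (0.4, 0.5)],
    [(0.4, 0.3), (0.2, 0.3), (0.1, 0.0), (0.5, 0.6)],
]
value = log_likelihood((1.0, -0.5), rounds, c=0.8)
```

The cost is linear in the total number of contestants, and a function of this shape could be handed to a general-purpose optimiser (for example `optim` in R) to obtain the MLEs of $\beta_1$ and $\beta_2$.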

Main questions

So, my main questions are:

Is it possible to model the required probability with the multinom function in the nnet package in R?

Is it more convenient to use ordered logit to model the probability?

Thank you very much.