We are given a set of $n$ targets, each characterized by a utility value. We know the distribution of each target's utility but not its current value, so we must probe a target to learn its exact current value. We may probe $m$ of the $n$ targets, and after probing we choose one target. We seek the probing strategy that maximizes the expected utility of the target chosen at the end.
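To make the setup concrete, here is a minimal brute-force sketch in Python. It rests on several assumptions that the statement above does not fix: utilities are independent with finite discrete distributions, given as lists of `(value, probability)` pairs; the probe set is chosen up front (non-adaptively); and after probing we may still pick an unprobed target, valued at its mean. The function names are hypothetical, and the enumeration is exponential, so this only illustrates the objective on tiny instances.

```python
from itertools import combinations, product


def expected_value(dist):
    """Mean of a discrete distribution given as (value, probability) pairs."""
    return sum(v * p for v, p in dist)


def expected_utility(dists, probed):
    """Exact expected utility when the targets in `probed` are probed."""
    # Assumption: an unprobed target may still be chosen, at its mean value.
    unprobed = [i for i in range(len(dists)) if i not in probed]
    fallback = max((expected_value(dists[i]) for i in unprobed),
                   default=float("-inf"))
    total = 0.0
    # Enumerate every joint outcome of the probed targets (independence assumed).
    for outcome in product(*(dists[i] for i in probed)):
        prob = 1.0
        best = fallback
        for value, p in outcome:
            prob *= p
            best = max(best, value)
        total += prob * best
    return total


def best_probe_set(dists, m):
    """Try every size-m probe set and return the one with the highest expected utility."""
    return max(combinations(range(len(dists)), m),
               key=lambda s: expected_utility(dists, s))


# Toy instance: three targets, one probe allowed.
dists = [
    [(0, 0.5), (10, 0.5)],  # high variance, mean 5
    [(4, 1.0)],             # deterministic
    [(3, 0.5), (6, 0.5)],   # mean 4.5
]
chosen = best_probe_set(dists, 1)
print(chosen, expected_utility(dists, chosen))  # (0,) 7.25
```

In this toy instance, probing the high-variance target is best: its upside is kept when the probe comes out high, and the best unprobed mean serves as a fallback otherwise. An adaptive strategy, where each probe may depend on earlier results, is not covered by this sketch.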
Is this problem related to, or a variant of, a known problem?