2754 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 68, 2020
A Solution for Large-Scale Multi-Object Tracking
Michael Beard , Ba Tuong Vo , and Ba-Ngu Vo
Abstract—A large-scale multi-object tracker based on the gen- The total number of objects in a scenario is often quoted
eralised labeled multi-Bernoulli (GLMB) filter is proposed. The as a major factor influencing the computational complexity of
algorithm is capable of tracking a very large, unknown and time- a multi-object tracking problem. This is partially true, since
varying number of objects simultaneously, in the presence of a an increase in the number of objects leads to an increase in
high number of false alarms, as well as missed detections and the number of possible events that a tracking algorithm must
measurement origin uncertainty due to closely spaced objects.
The algorithm is demonstrated on a simulated tracking scenario,
consider. However, this is not the only concern, and a more
where the peak number objects appearing simultaneously exceeds meaningful indication of complexity is the density of the objects
one million. Additionally, we introduce a new method of applying and measurements in both space and time. For example, it is
the optimal sub-pattern assignment (OSPA) metric to determine a straightforward to track a large cumulative number of objects
meaningful distance between two sets of tracks. We also develop an over time, when only a small number are present simultaneously
efficient strategy for its exact computation in large-scale scenarios at any given instant. Likewise, a large number of spatially
to evaluate the performance of the proposed tracker. well-separated objects is relatively easy to track, since they
Index Terms—Random finite sets, generalised labeled multi-
can be considered as statistically independent entities. In these
Bernoulli, multi-object tracking, large-scale tracking, OSPA. cases, it is likely that running single-object filters in parallel
would suffice. Difficulties begin to arise when objects come into
I. INTRODUCTION close spatial proximity from the point of view of the sensor, and
ULTI-OBJECT tracking is a problem with a wide variety the ensuing increase in data association ambiguity leads to a
M of applications across diverse disciplines, and numerous
effective solutions have been developed in recent decades [1]–
combinatorial explosion in the number of statistically likely ob-
servation events. In dynamic multi-object scenarios, the problem
[3]. The common goal of multi-object tracking is to estimate is further compounded by the motion of the objects.
the trajectories of an unknown and time-varying number of Various methods have been proposed to address the com-
objects, using sensor measurements corrupted by phenomena putational challenges of tracking a large number of objects
including observation noise, false alarms, missed detections, simultaneously. One of the earliest works on large scale multi-
and data association uncertainty. The combination of these object tracking proposed the use of efficient spatial searching
effects gives rise to a highly demanding computational task, algorithms to associate measurements to tracks [12]. Although
with complexity that grows exponentially as the number of capable of processing a very large number of objects, the
objects/measurements increases. Tracking a very large number proposed method does not account for dependencies between
of objects simultaneously (in excess of hundreds of thousands) closely spaced objects, and thus its performance is likely to
is thus a challenging problem, with important practical ap- degrade in such cases. In [13] it was suggested that the same
plications. A few notable examples are: (i) space situational techniques could be applied to help improve the efficiency of
awareness, which requires tracking thousands of satellites and the joint probabilistic data association (JPDA) algorithm, but
millions of debris objects [4]–[6]; (ii) wide area surveillance no numerical results were provided. An alternative approach
(e.g. monitoring large urban environments), requires tracking was proposed in [14], known as linear multi-target integrated
hundreds of thousands of objects over time, including vehicles probabilistic data association (LMIPDA). This method reduces
and people in crowded environments [7]–[9]; (iii) cell biology, computation using an approximation that treats nearby objects
where tracking the motion of large numbers of cells is critical as additional sources of clutter, and the resulting algorithm was
to understanding their behaviour in living tissues [10], [11]; demonstrated on a simulated 50-object scenario. An approach
(iv) wildlife biology, where tracking large animal populations based on 2D assignment of measurements to tracks was proposed
is needed to study the behaviour of wildlife in their natural in [15], with an application to large-scale air traffic surveillance.
habitats [16]. The scenario had a total of about 800 tracks, however, the
target/measurement density and the number of simultaneous
Manuscript received July 13, 2019; revised February 13, 2020; accepted objects was not provided.
March 29, 2020. Date of publication April 10, 2020; date of current version Algorithms based on multiple hypothesis tracking (MHT)
June 5, 2020. The associate editor coordinating the review of this manuscript and have also been proposed for tracking large numbers of objects.
approving it for publication was Prof. Youngchul Sung. This work was supported For example, applications to cell tracking [10] and wildlife
by the Australian Research Council under Discovery Projects DP170104584 and
DP160104662. (Corresponding author: Ba Tuong Vo.)
tracking [16], have been demonstrated on thousands of objects
The authors are with the School of Electrical Engineering, Computing in total, with several hundred objects simultaneously [16]. The
and Mathematical Sciences, Curtin University, Bentley, WA 6102, Aus- labelled multi-Bernoulli (LMB) filter, a one-term approxima-
tralia (e-mail:
[email protected];
[email protected]; ba- tion of the generalised labeled multi-Bernoulli (GLMB) fil-
[email protected]). ter, has also been demonstrated on a simulated scenario with
This article has supplementary downloadable material available at https://
ieeexplore.ieee.org, provided by the authors. This includes two illustrative videos over a thousand objects simultaneously [17], and in [18] it
of the tracking results. was used with spatial searching to track hundreds of sea-ice
Digital Object Identifier 10.1109/TSP.2020.2986136 objects.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
BEARD et al.: SOLUTION FOR LARGE-SCALE MULTI-OBJECT TRACKING 2755
An alternative approach to improving the scalability of multi- metric [41], in its most commonly used form, measures the dis-
target tracking systems is the concept of distributed estima- tance between two sets of states, and does not take into account
tion, which spreads the computational load across multiple phenomena such as track switching and track fragmentation.
sensor nodes, each of which only processes observations from Nonetheless, by developing a meaningful base-distance between
its local region. Random finite set (RFS) based multi-sensor two tracks, we are able to use OSPA to construct a physically
fusion algorithms have been proposed for the PHD/CPHD meaningful distance between two sets of tracks, which we called
filters [19]–[22], multi-Bernoulli filter [23]–[26], and hybrid OSPA(2) to distinguish it from the standard use. To evaluate the
Poisson multi-Bernoulli filter [27]. However these approaches performance of the proposed tracker on a scenario involving an
are not true multi-object trackers, since only the current states are unknown and time-varying number of objects with a peak in
estimated. Multi-sensor fusion for multi-object tracking filters excess of one million, we developed a scalable procedure for
was first examined in [28], where fusion rules for the LMB exact computation of the OSPA(2) metric. Preliminary results
filter [29] and marginalised GLMB filter [30], were proposed and on OSPA(2) were published in [42], [43]. This paper provides
demonstrated for multi-sensor multi-object tracking. A variation complete mathematical details.
for the LMB filter was subsequently proposed in [31], where The rest of the paper is structured as follows. Section II pro-
sensor fusion is performed based on a Cauchy–Schwarz diver- vides the necessary background on the GLMB filter. Section III
gence. Further works on robustness of distributed multi-object presents some theoretical results regarding the decomposition of
estimation have been reported in [32], [33], and the latest works GLMB densities. In section IV we apply these results to imple-
on computationally efficient implementations were proposed ment an efficient GLMB filter, capable of handling large-scale
in [34]. multi-object tracking problems. Sections V and VI present the
To satisfy the competing demands of computational effi- OSPA(2) metric, and its use to evaluate the proposed large-scale
ciency versus accuracy, a trade-off is necessary. In this paper, tracker. Some concluding remarks are given in section VII.
we present a multi-object tracking filter that accomplishes this Mathematical proofs are given in the Appendix.
trade-off by exploiting the properties of GLMB densities, and
the standard multi-object likelihood through a principled ap- II. BACKGROUND: GENERALISED LABELED
proximation. The result is an algorithm that can be applied MULTI-BERNOULLI TRACKER
to large-scale multi-object tracking scenarios exhibiting com-
monly encountered structural properties, without suffering from In multi-object systems, tracking is distinct from filtering, in
an intractable increase in computational complexity. The al- the sense that tracking involves the estimation of the trajectories
gorithm is highly effective when the scenario consists of iso- of objects over time, as opposed to the multi-object state at each
lated groups of high object/measurement density, and is capable time instant. The generalised labelled multi-Bernoulli (GLMB)
of adapting to changes in the structure of these groups over filter is an algorithm that is specifically designed to provide
time. estimates of object trajectories by modeling the multi-object
Our proposed method is based on functional approximation state as a labeled random finite set (RFS). In this section we
of the multi-object density (equivalent to a probability density briefly revisit the GLMB filter, and the interested reader is
for finite-set-valued random variable [35]) – a key element of referred to [36]–[38] for more detailed treatments.
the random finite set (RFS) approach – that encapsulates all We begin by defining the notion of a labeled RFS. Let X be a
information on the current set of tracks in a single non-negative single-object state space, L a discrete label space, L : X × L →
function. Processing the large number of combinations of events L the projection defined by L((x, )) = for all points (x, ) ∈
translates to recursive computation of the multi-object filtering X × L. Denote by F(S) the collection of all finite subsets of
density [36]–[38]. Tractability hinges on efficient functional some underlying space S. Now consider X∈ F(X × L) and
approximation/computation of the so-called GLMB filtering its corresponding label set L(X) = {L(x) : x ∈ X}. Then the
recursion, under limited processing/memory resources. Concep- labels of the points in X are distinct if and only if X and
tually, the key enablers in our proposed large-scale tracker are: its label set L(X) have equal cardinality. This is expressed
(i) adaptive approximation of the GLMB filtering density, at each mathematically by defining a distinct label indicator function
time, by a product of tractable and approximately independent
Δ (X) δ|X| [|L (X)|] ,
GLMB densities; and (ii) efficient parallel computation of these
GLMB densities by exploiting the conjugacy of the GLMB fam- which has value 1 if the labels in X are distinct and 0 other-
ily. This strategy is distinct from the approach in [17], where the wise. A labeled RFS is defined as a marked RFS with distinct
GLMB is approximated by a single term. In essence, our strategy marks [36]. More precisely, a labeled RFS with state space X
efficiently identifies and processes significant combinations by and label space L is an RFS of X × L, constructed by marking
exploiting structural properties and parallelisation, to make the the elements of an RFS of X with distinct labels from L, i.e. any
most of the limited computing resources. Consequently, while realisation X must satisfy Δ(X) = 1.
the focus of this paper is on large-scale problems, our solution
also provides significant efficiency gains when applied to smaller A. Multi-Object Dynamic Model
scale problems.
Our study would be incomplete without evaluating the track- Given the multi-object state X (at time k) with label space L,
ing performance of the proposed multi-object tracker, which is each (x, ) ∈ X either survives with probability PS (x, ) and
a challenging task in itself [39], [40]. We require a measure of evolves to a new state (x+ , + ) (at time k + 1) with probability
dissimilarity between two sets of tracks, which: (i) is physically density f+ (x+ |x, )δ [+ ] or dies with probability 1 − PS (x, ).
meaningful; (ii) satisfies the properties of a metric for mathemat- The surviving label space L Lk is given by a disjoint union
ical consistency; and (iii) is computable for scenarios involving of birth label spaces Bt for all times t = 0, . . . , k, i.e. Lk =
k
t=0 Bt . To ensure that the birth label spaces are disjoint,
millions of tracks. The optimal sub-pattern assignment (OSPA)
2756 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 68, 2020
each birth label is constructed as an ordered pair, consisting I∈F(L) w(I,c) = 1 and each p(c) (·, ) is a probability
c∈C
|Bt |
of the birth time and a unique index, i.e. Bt = {(t, i)}i=1 . density on X. For convenience, we denote the space of all GLMB
The set B + of new objects (born at time k + 1) with birth densities on F(X × L) by GL , and adopt the following abbrevi-
label space B+ Bk+1 is distributed according to the labeled ated notation for a GLMB density in terms of its parameters
multi-Bernoulli (LMB) density
L(B + ) π w(I,c) , p(c) (I,c)∈F(L)×C . (3)
(B+ ) B+
f B,+ (B +) = Δ(B +) 1B+ rB,+ [1−rB,+]B+ −L(B + ) pB,+ ,
If the multi-object filtering density at the current time step is
where [h]X x∈X h(x) (with [h]∅ = 1) is a multi-object a GLMB given by (3), then the multi-object filtering density at
exponential, rB,+ () is the probability that a new object with the next time step, given by the multi-object Bayes recursion
label is born, and pB,+ (·, ) is the distribution of its kinematic
state [36]. The multi-object state X + (at time k + 1) with label π + (X + |Z+ ) ∝ g (Z+ |X + ) π(X + ) f (X + |X)δX, (4)
space L+ L ∪ B+ is formed by the union of surviving objects
and new born objects. Using the standard assumption that, is also a GLMB with parameters [38]
conditional on X, objects move, appear and die independently of
(I ,c,θ ) (c,θ )
each other, the expression for the multi-object transition density π + = wZ++ + , pZ+ + (5)
(I+ ,c,θ+ )∈F(L+ )×C×Θ+
f + is given by [36]–[38]
where
(B+ )
f + (X + |X) = f S,+ (X + ∩(X×L) |X)f B,+ (X + −(X×L)) (I ,c,θ+ )
wZ++ ∝
(I,c,I+ ,θ+)
w(I,c) ωZ+ , (6)
where I
I−I+ I∩I+
f S,+ (W |X) = Δ(W )Δ(X)1L(X) (L (W ))[Φ (W ; ·)]X (I,c,I+,θ+)
ω Z+
(c)
= 1Θ+ (I+ ) (θ+ ) 1 − P̄S
(c)
P̄S ,
Φ (W ; x, ) = 1 − 1L(W ) () (1 − PS (x, )) (7)
B ∩I+ (c,θ ) I+
+ δ [+ ] PS (x, )f+ (x+ |x, ) . × [1 − rB,+ ]B+ −I+ rB,+
+
ψ̄Z+ + ,
(x+ ,+ )∈W (8)
(c)
B. Multi-Object Observation Model P̄S () = p(c) (·, ) , PS (·, ) , (9)
For a given multi-object state X, each (x, ) ∈ X is either de-
(c,θ ) (c) (θ ( ))
tected with probability PD (x, ) and generates a detection z with ψ̄Z+ + (+ ) = p̄+ (·, + ) , ψZ++ + (·, + ) , (10)
likelihood g(z|x, ) or missed with probability 1 − PD (x, ).
The multi-object observation Z is the union of the observations (c) PS (·, +)f+ (x+ |·, +) , p(c) (·, +)
from detected objects and Poisson clutter with intensity κ. The p̄+ (x+ , +) = 1L (+) (c)
P̄S (+ )
multi-object likelihood function is given by [36]–[38]
(θ()) + 1B+ (+ ) pB,+ (x+ , + ) , (11)
g (Z|X) = ψZ (x, ) (1)
(c) (θ (+ ))
θ∈Θ(L(X)) (x,)∈X
(c,θ ) p̄+ (x+ , + ) ψZ++ (x+ , + )
pZ+ + (x+ , + ) = (c,θ )
. (12)
where Θ is the set of positive 1-1 maps θ : L → {0 : |Z|}, i.e. ψ̄Z+ + (+ )
maps such that no two distinct labels are assigned the same
positive value, Θ(I) is the subset of Θ with domain I, and The recursive propagation of the current filtering density
(3) to the next time is more compactly expressed by a GLMB
PD (x,)g(zj |x,) joint prediction and update operator Ω : GL → GL+ defined by
(j) κ(zj ) , j ∈ {1, . . . , |Z|}
ψ z (x, ) = . (B ) (B )
{ 1:|Z| } 1 − PD (x, ) , j = 0 Ω(π; f B + , Z+ ) = π + according to (5)–(12), where f B + is
the next birth density and Z+ is the next measurement set.
The map θ specifies that object generates detection zθ() ∈ Z,
with θ() = 0 if is undetected. The positive 1-1 property means III. LARGE-SCALE GLMB FILTERING
that θ is 1-1 on { : θ() > 0}, and ensures that any detection in
Z is generated from at most one object. Due to practical limitations on computational resources, the
original implementation of the GLMB filter proposed in [36],
[37] cannot accommodate a very large number of objects simul-
C. Generalised Labelled Multi-Bernoulli Random Finite Sets taneously. The main computational bottleneck occurs in the mea-
A generalised labeled multi-Bernoulli RFS is defined as a surement update, which involves processing each component of
class of labeled RFS that is distributed according to a multi- the predicted GLMB density using Murty’s algorithm to find
object density with the form the K most significant components according to their weights. If
X there are N labels in a component and M measurements in total,
π (X) = Δ (X) w(I,c) δI (L (X))) p(c) , (2) then the complexity of the update is O(K(N + M )3 ). Murty’s
(I,c)∈F(L)×C algorithm can be replaced with Gibbs sampling to reduce the
computational complexity of processing each component down
where F(L) is the space of all finite subsets of L, C is some to O(KN 2 M ) [38]. However, the quadratic complexity in the
finite space, each w(I,ξ) is a non-negative weight such that number of objects will still render this algorithm infeasible for
BEARD et al.: SOLUTION FOR LARGE-SCALE MULTI-OBJECT TRACKING 2757
tracking a large number of objects simultaneously. Arguably, filtering density is an L-partitioned GLMB on F(X × L),
any feasible solution for large-scale multi-object tracking should
(L)
have a maximum computational complexity of approximately πL = πL . (13)
O(KN M · log(N M )). L∈L
In many practical multi-object scenarios, the objects are Consider a new partition L+ ∈ P(L+ ), where each set of la-
not uniformly distributed across the state space, but often in bels L ∈ L+ is associated with a set of gated measurements
separate groups. This structure can be exploited to improve (L) (I) (J)
Z+ ⊂ Z+ , such that Z+ ∩ Z+ = ∅ for I
= J. We seek
computational efficiency of the multi-object tracker. Rather
to approximate the new filtering density as an L+ -partitioned
than representing the entire multi-object density as one “large”
GLMB on F(X × L+ ).
GLMB, we can approximate it as a product of much “smaller”
For a particular choice of L+ ∈ P(L+ ) and Z+ ∈ P(Z+ ),
GLMBs, herein referred to as label-partitioned GLMBs. This is
let A+ (L+ , Z+ ) denote the space of all positive 1-1 mappings
based on the premise that a large GLMB, with well-separated
A+ : L+ → {∅} Z+ , where positive 1-1 means 1-1 on Z+ .
groups of estimated objects, is decomposable into a product
In other words, a given A+ ∈ A+ (L+ , Z+ ) maps each group
of almost independent smaller GLMBs. Consequently, such an
of labels in the partition L+ , to either {∅} or a unique group
approximation results in a negligible loss of information, whilst
of measurements in the partition Z+ , i.e. A+ (L) is the set of
providing significant gains in computational efficiency.
measurements corresponding to the set of labels L. Ideally, we
seek the approximation
A. Label-Partitioned GLMB
(L)
A labeled RFS density on F(X × L) is said to be label- π L+ ,Z+ ,A+ = π L+ ,Z+ ,A+ (14)
L∈L+
partitioned if it can be written as the following product
(L) (L)
π L+ ,Z+ ,A+ = Ω π L ; f B
(L∩B+ )
, A+ (L) (15)
π L (X) = π L (X ∩ (X × L)) ,
L∈L
where
where L is some partition of the label space L, and each
(L) (L+ , Z+ , A+ ) = arg min DKL π + , π L+ ,Z+ ,A+ (16)
factor π L is a labeled RFS density
on F(X × L). Note that L+ ∈P(L+ )
for all X ∈ F(X × L), X = L∈L X ∩ (X × L),1 and hence Z+ ∈P(Z+ )
A+ ∈A(L+ ,Z+ )
{F(X × L)}L∈L is also a partition of F(X × L). We call π L ,
(L)
denoted by its factors {π L }L∈L , an L-partitioned labeled RFS s.t. |L| ≤ Lmax , ∀L ∈ L+ (17)
(L)
density. Further, if each factor π L is a GLMB, then π L is said to and DKL (·, ·) denotes the Kullback-Leibler divergence (KLD).
be an L-partitioned GLMB on F(X × L). We denote by GL (L) Here Lmax is a user parameter determined by the available
the space of all L-partitioned GLMBs on F(X × L). computational resources. Higher values of Lmax result in a more
Suppose that the current filtering density π L is an L- accurate approximation to the multi-object filtering density,
partitioned GLMB on F(X × L). Then, the prediction to the however, more memory and faster processing will be required
next time is also an L-partitioned GLMB. However, the resulting to achieve real-time performance. Smaller values yield a coarser
filtering density generally does not take on the same form. While approximation, but real-time performance is achievable with less
the new filtering density π + is still a GLMB that, in principle, computational resources. Choosing Lmax = 1 is equivalent to
can be computed [36]–[38], a direct computation via expansion running parallel Bernoulli filters [3, Section 14.7].
of the product and subsequent update is not scalable in practice. The combinatorial optimization problem in (16) is intractable,
Moreover, due to object births, deaths and transitions, the current since the space of partitions is prohibitively large. We propose
partition L of L is unsuitable as a basis for approximating the an approximate two-step solution which is tractable for large-
new filtering density. scale problems. The first step involves choosing a suboptimal
Nonetheless, when estimated objects in the new filtering den- partition of L+ , subject to a constraint on the maximum group
sity also occur in separate groups, as do the sets of measurements cardinality. The second step involves parallel computation of the
which are associated with different groups of objects, it is possi- factors in the new filtering density, directly as a product of factors
ble to exploit this structure to improve computational efficiency. from the old filtering density. This strategy avoids the explicit
To maintain scalability and parallelisability, we approximate the expansion of the product, and subsequent refactorization after
new filtering density as a label-partitioned GLMB. This entails update, which would be intractable. These steps are described
selection of the optimal partition of labels and measurements, in the following two subsections.
and the optimal association of groups of labels to groups of
measurements, in some statistical sense. Intuitively the selection
of the partitions and associations should result in negligible B. Label Partitioning
statistical dependence between labels and measurements across We aim to find a partition of the label space such that any pair
different groups. of labels that do not appear in the same group are approximately
Let P(L), P(L+ ) and P(Z+ ) denote the sets of all partitions statistically independent. In the standard multi-object tracking
of the current label space L, the new label space L+ and the model, as defined in Sections II-A and II-B, this statistical
new measurements Z+ respectively. Suppose that the current dependence arises solely via the uncertainty in the unknown
association between measurements and objects. That is, if a
1 The disjoint union notation (), in expressions involving unions over a particular detection could have originated from one of several
partition, would be equivalent to the standard union. However, we use it to objects, then there will be a statistical dependence between those
serve as a reminder that the constituent sets of the union are disjoint. objects in the multi-object filtering density.
2758 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 68, 2020
Intuitively, tracks that are well-separated in the measurement following separable form
space will have low statistical dependence, because the proba- ⎛ ⎞
bility that these tracks give rise to closely spaced measurements (L) (L)
(L) (L)
is extremely low. In this context, “well-separated” means that ĝ ⎝ Z+ | X+ ⎠ g Z+ |X + (21)
the distance between tracks in the measurement space is large L∈L+ L∈L+ L∈L+
compared to the measurement noise and the uncertainty in the which facilitates a fast parallel evaluation of the label-partitioned
objects’ predicted location. This is the key property that we ex- GLMB filtering density, as described in subsection III-C.
ploit in order to partition the label space in a way that minimises A naive approach to partitioning L+ , subject to the conditions
the amount of potential measurement sharing between objects above, will have computational complexity O(|L+ |2 ) since all
in different groups. possible label pairs must be examined to determine whether their
In principle, this can be achieved by analysing the distribu- gating regions intersect. This is clearly infeasible for large-scale
tions of the predicted measurements corresponding to all objects tracking problems. Fortunately, techniques in computational
represented in a GLMB. Suppose each factor has the form geometry can be applied to dramatically reduce the computa-
tional complexity of the partitioning, thereby making it feasible
(L) (I,c) (c)
πL = wL,L , pL,L (18) for such problems. A method for efficient implementation is
(I,c)∈F(L)×C(L) discussed in Section IV-A.
Then for each label ∈ L+ we compute the distribution of the
C. Computing Label-Partitioned GLMB Filtering Density
predicted measurement
Suppose the current filtering density is an L-partitioned
p̃+ (z, ) =
g(z|·, ), p+ (·, ) (19) GLMB, and that we are given a new partition L+ of L+ . The
goal is to approximate the filtering density at the next time as
via the distribution of the predicted state an L+ -partitioned GLMB. To achieve this in a way that is scal-
able to problems involving a very large number of objects, we
consider approximating the current filtering density according
p+ (x, ) ∝ 1B+ () rB () pB (x, ) to a modified partition structure S = {L ∩ L : L ∈ L+ }, i.e. we
take each element of the new partition L+ , and intersect it with
+ 1L () 1I (L) ()
the current label set L. The current filtering density can then be
L∈L (I (L) ,c(L) )∈
approximated as an S-partitioned GLMB
F(L)×C(L)
(S)
(I (L) ,c(L) ) (c(L) ) πS = πS (22)
× wL,L pL,L (·) , f (x|·, ) . (20) S∈S
that mimimises DKL (π L ; π S ). This approximation is given
explicitly in Proposition 1 (see Appendix A for proof).
The distribution p̃+ (·, ) can be used to construct a “measure- Proposition 1: Given an L-partitioned GLMB π L =
ment gating region,” B() ⊆ Z for each label ∈ L+ , which (L)
contains the majority of the probability mass for the predicted {π L }L∈L on F(X × L), and suppose that S is another
measurement. These gating regions are the basis for partitioning partition of L. Then the S-partitioned labeled RFS density
(S)
the next label space L+ . A partition L+ = {L1 , . . . , L|L| } is π S = {π S }S∈S that minimises DKL (π L ; π S ), is an
formed by splitting L+ into groups of labels whose correspond- S-partitioned GLMB, with GLMB factors
ing gating regions intersect (either directly, or indirectly via (L,S)
(S)
a sequence of labels in the same group), and where there is π S X (S) = π L,S X (S)
no intersection between gating regions for labels in different L∈L
groups. Furthermore, for computational feasibility, the gating where
regions must be sufficiently small such that L+ can be partitioned (L,S)
π L,S =
(H,c) (c)
wL,S,L,S , pL,S,L,S
into groups no larger than some predefined cardinality threshold (H,c)∈F(L∩S)×C(L)
Lmax . That is, L must satisfy the following three conditions: (H,c)
(H∪W,c)
1) For all L ∈ L+ , and for any i , j ∈ L, either wL,S,L,S = wL,L
B(i ) ∩ B(j )
= ∅ or there exists {1 , . . . , n } ⊆ L W ∈F(L−S)
such that B(i ) ∩ B(1 )
= ∅, B(1 ) ∩ B(2 )
= (c) (c)
∅,. . . , B(n−1 ) ∩B(n )
= ∅, B(n ) ∩ B(j )
= ∅ pL,S,L,S (x, ) = 1()pL,L (x, )
2) [ ∈Li B()] ∩ [ ∈Lj B()] = ∅, for all i, j ∈ for each (L, S) ∈ L × S such that L ∩ S
= ∅.
{1, . . . |L+ |}, i
= j, Given the approximation (22) to the current filtering density,
3) |L| ≤ Lmax for all L ∈ L+ . and the separable multi-object likelihood (21), the joint predic-
Under these conditions, the label space is partitioned such tion and update can be applied to all factors of π S+ in parallel,
that there is no overlap between the regions of Z corresponding yielding the new L+ -partitioned GLMB filtering density,
to the groups of labels represented by L+ , taking into account
(J)
the prediction and likelihood. Consequently the multi-object π L+ ,+ = π L+ ,+ . (23)
filtering density at the next time should exhibit negligible sta- J∈L+
tistical dependence between different groups of the partition An explicit expression for (23) is given in Proposition 2 (see Ap-
(L)
{X + = X + ∩ F(X × L) : L ∈ L+ }. Hence, we can assume pendix A for proof). Further details for efficient implementation
that the multi-object likelihood can be well-approximated by the are discussed in Sections IV-B and IV-C.
BEARD et al.: SOLUTION FOR LARGE-SCALE MULTI-OBJECT TRACKING 2759
then the distribution of the predicted measurement is also a
Gaussian mixture
U ()
p̃+ (z; ) = w(u) () N z, m̃(u) () , P̃ (u) () , (24)
u=1
-
l l
l - m̃(u) () = HF m(u) () , (25)
l -
P̃ (u) () = R + H(Q + F P F T )H T . (26)
For tractability, the distribution of the predicted measurement is
l - l -
then approximated as a uniform mixture
U ()
p̃+ (z, ) ≈ U z; B (u) () (27)
ll l
u=1
l -
where U (·; B (u) ()) is a uniform distribution on an axis-aligned
hyper-rectangle B (u) () that should correspond to the region
l - l - l - where N (·, m̃(u) (), P̃ (u) ()) has significant mass. An efficient
method for computing this is to consider the ellipsoidal gate
centered on m̃(u) () and shaped according to P̃ (u) (), which
Fig. 1. High-level flow diagram of the large-scale GLMB filter. has probability PG of containing the received measurement. The
axis-aligned hyper-rectangle B (u) () can be chosen as a tight
bounding box on the ellipsoidal gate, which can be computed in
Proposition 2: Given a separable multi-object likelihood
simple closed form as a function of m̃(u) (), P̃ (u) () and PG .
(21), where L+ is partition of L+ , and suppose that the current
For any L ⊆ L+ , define the prediction gate
multi-object filtering density is an S -partitioned GLMB of the
(J∩L) ⎡ ⎤
form π S = {π S }J∈L+ where U ()
B(L) = ⎣ B (u) ()⎦ . (28)
(J∩L) (I,c) (c)
πS = wS,J∩L , pS,J∩L . ∈L u=1
(I,c)∈F(J∩L)×C(J)
We seek a partition L+ of L+ which maximises |L+ | and satisfies
Then the multi-object filtering density at the next time is the
(J)
L+ -partitioned GLMB π L+ ,+ = {π L+ ,+ }J∈L+ where B(I) ∩ B(J) = ∅, ∀I, J ∈ L+ , I
= J. (29)
A solution can always be found via the following procedure
(J) (J∩L) (J∩B ) (J)
π L+ ,+ = Ω π S ; f B + , Z+ with O(RT logd RT + RI + RL ) complexity, where d is the
dimension of the measurement space, RT is the total number
(B ) of hyper-rectangles, RI is the number of intersecting hyper-
and Ω(·; f B + , Z+ ) : GL → GL+ is the joint prediction and
update operator. rectangle pairs, and RL is the total number of labels at the current
time. A segment-tree [44]–[46] is first constructed containing all
hyper-rectangles, which is used to find all intersecting prediction
IV. IMPLEMENTATION gates, with complexity O(RT log RT + RI ). A graph is then
In this section we describe in more detail our implementation constructed, consisting of one node for each label at the current
of a large-scale GLMB filter, based on the concepts introduced time, and an edge for every pair of labels whose prediction gates
in the previous section. The algorithm is composed of several intersect. The connected components of the graph are found
modules as shown in Fig. 1. The details of each of these modules via depth-first search, which has O(RL ) complexity, and can
are discussed in the following subsections. be further accelerated via parallel processing. The connected
components then give the desired partition of L+ . However, if
the additional constraint |L| ≤ Lmax , ∀L ∈ L+ is not satisfied, it
A. Step 1: Label Space Partition Selection is necessary to reduce the value for PG and repeat the procedure
We proceed under the standard assumptions of linear- until a feasible solution is found.
Gaussian transition and measurement models
B. Step 2: Birth Factorisation and GLMB Repartitioning
f+ (x+ |x, ) = N (x+ ; F x, Q),
Once an appropriate label space partition L+ of L+ has been
g (z|x, ) = N (z; Hx, R). (B )
found, we proceed to factorise the LMB birth density f B + as
If the prior distribution for track is a Gaussian mixture a product over the partition B+ = {L ∩ B+ : L ∈ L+ } of B+ ,
and to approximate the current filtering density πL as a product
U ()
over the partition S = {L ∩ L : L ∈ L+ } of L.
p (x; ) = w(u) () N x, m(u) () , P (u) () The LMB birth is exactly factorised as a B+ -partitioned LMB
(B ) (B )
u=1 f B + = {f B + }B+ ∈B+ . The minimum KLD approximation
2760 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 68, 2020
(S) In addition, pruning is necessary for entire factors of the
to π L as an S-partitioned GLMB, π S = {π S }S∈S , is ob-
tained via Proposition 1. Consider a given S ∈ S, then for each GLMB filtering density, also with the aim of improving the
(L,S) algorithm’s computational efficiency. The idea is to eliminate
L ∈ L, we calculate the GLMB π L,S for labels which are com-
entire factors which have negligible contribution to the new
(S) (L,S)
mon to both S and L. The expression for π S = L∈L π L,S filtering density. A simple criterion is the probability that a factor
is given by expanding the product over L ∈ L. In practice, this has zero cardinality. Factors for which this probability exceeds
does not necessary require enumerating all pairs (L, S), because a certain threshold are simply deleted. An alternative is to retain
some combinations may satisfy L ∩ S = ∅ resulting in a trivial only a fixed number of factors with the highest probability of
(L,S) having non-zero cardinality.
expression for π L,S . It is possible to identify pairs for which
L ∩ S
= ∅ by querying an appropriate map structure used to
represent the GLMB factors with logarithmic complexity. In E. Step 5: Measurement Driven Birth
(S)
addition, the computation of the final product for π S can The association probabilities computed in step 2 capture the
be efficiently implemented using a k-shortest path algorithm likelihood that the given measurement originated from any one
or stochastic sampling strategy to truncate the result without of the existing objects. These probabilities are now used to
explicit expansion. Note that this factorisation and repartitioning construct a labelled multi-Bernoulli distribution, which will
step is trivially parallelisable in each of the factors. serve as the birth density for the next iteration of the filter.
Each measurement with association probability below some
pre-defined threshold is used to generate a component of the
C. Step 3: Parallel Propagation of Label-Partitioned GLMB LMB distribution. The measurement itself, along with prior
Given the new partition L+ of L+ , in preparation for the distributions on the unobserved state components, are used to
update, it is necessary to compute for each L ∈ L+ , the non generate a birth density for each component. This so-called
(L) “measurement-driven” approach requires fewer prior assump-
intersecting measurement sets Z+ = Z+ ∩ B(L) that fall in-
tions regarding the initial state of newborn objects than static
side the region B(L) found in Step 1. This will automati-
(I) (J) birth models.
cally satisfy Z+ ∩ Z+ = ∅ since B(I) ∩ B(J) = ∅ for any
I, J ∈ L+ . A K-dimensional (K-D) tree data structure [47], can
(L) F. Step 6: Estimate Extraction
be used to find Z+ efficiently with O(RU |Z+ |1−1/d + RZ )
A parallelisable strategy for extracting labelled estimates of
where d is the dimension
complexity,
of the measurement space,
the current object states is to process each factor independently
RU = ∈L U () and RZ = ∈L |Z+ ∩ B()|.
Since both the likelihood and prediction can be written using standard approaches. The estimates for a particular factor
as products over the same partition, the factors of the L+ - can be obtained by first finding the maximum a-posteriori (MAP)
partitioned GLMB filtering density can be computed in par- cardinality estimate for the number of objects, then finding the
allel independently as shown in Proposition 2. This is a key highest weighted component with the relevant cardinality, and
feature that allows us to address large-scale multi-target tracking finally selecting either the MAP/EAP estimates for each label.
problems, by utilising modern multi-core architectures. For each The overall multi-object estimate is obtained by taking the union
L ∈ L+ , computation of the filtering density is given by Proposi- of the estimates from all factors. These are subsequently used
tion 2, via the GLMB joint prediction and update operator Ω with to update the track estimates, by matching up the estimates with
(L∩L) (L∩B ) (L) corresponding labels across different times.
inputs π S , f B + ,Z+ . Our implementation follows [38],
making use of a Gibbs sampler to efficiently generate GLMB
components with high filtering weights, while also maintaining V. MULTI-OBJECT TRACKING PERFORMANCE METRIC
diversity across the generated samples. In [41], the optimal sub-pattern assignment (OSPA) metric
Finally, an “association probability” is evaluated for each was proposed as a mathematically consistent and physically
measurement in Z+ , which is used to generate the LMB birth meaningful distance between two sets of points. This has found
density for the subsequent iteration of the filter. For a mea- widespread use in the evaluation of multi-object tracking per-
surement z ∈ Z+ that falls inside the bounding region B(L), formance, where it has become a common practice to present
this probability can be computed after the update of the factor a plot of the OSPA distance between the estimated and true
corresponding to L, by summing the weights of all GLMB com- multi-object states versus time. While this can provide an in-
ponents in which the measurement was assigned to an object. dication of the multi-object tracking performance, it does not
For measurements that do not fall inside any bounding region, fully account for errors between the estimated and true sets
the corresponding association probability is set to zero. of tracks. Specifically, the OSPA distance between multi-object
states does not penalise phenomena such as track switching and
fragmentation in a consistent fashion.
D. Step 4: Pruning Label-Partitioned GLMB Filtering Density In [39], a metric called OSPA for tracks (OSPA-T) was
To improve computational efficiency, for each factor of the proposed, along with a method for its approximate computation.
GLMB filtering density we prune its constituent components by The disadvantage of this approach is that the approximation
removing those whose contributions are deemed to be insignifi- no longer satisfies the axioms of a metric, and thus it may
cant. In previous implementations of the GLMB filter [36]–[38], not behave as one would expect. A number of drawbacks of
a simple component pruning procedure was used, whereby a the approximate OSPA-T were discussed in [40], in which the
fixed number of highest weighted components were retained authors defined another metric that alleviates these drawbacks.
after each iteration, or components with weights below a certain However, this metric is computationally intractable for most
threshold were deleted. practical problems. In [48], two metrics inspired by the CLEAR
BEARD et al.: SOLUTION FOR LARGE-SCALE MULTI-OBJECT TRACKING 2761
MOT heuristics for tracking performance evaluation in computer
vision were proposed. The drawbacks of these metrics were
pointed out in [49], and two other metrics were proposed. How-
ever, these metrics are not suitable for evaluating multi-object
tracking performance in large-scale scenarios.
Since the OSPA distance between two sets is constructed from
the base-distance between the elements of the sets, a simple way
of using it to evaluate multi-object tracking performance is to
choose the base-distance to be a distance between tracks. When
this base-distance is also constructed via OSPA, the result is the
OSPA metric on an OSPA base-distance, which we call OSPA(2) Fig. 2. Hypothetical scenario with two well-separated tracks inside a relatively
to distinguish it from the standard use. A meaningful OSPA(2) long time window.
depends on a meaningful base-distance between tracks.
This section presents the OSPA(2) metric together with a phys-
ically meaningful base-distance between tracks. We also develop where
a scalable algorithm for computing this OSPA(2) distance (ex- ⎧
actly), capable of evaluating large-scale tracking performance. ⎨0, |φ| = |ψ| = 0
We begin by defining the following notation: d(c) (φ, ψ) = c, |φ| =
|ψ| .
r T = {1, 2, . . . , K} is a finite space of time indices (rep- ⎩
min c, d φ(1) , ψ (1) , |φ| = |ψ| = 1
resenting times {t1 , t2 , . . . , tK }), which includes all time (32)
indices from the beginning to the end of the scenario.
r X is the single-object state space, and F(X) is the space of Note that in (31) {x(t)} is a singleton if t ∈ Dx , or {x(t)}
finite subsets of X. is empty if t ∈ / Dx (and likewise for {y(t)}). In this case, the
r U is the space of all functions mapping time indices in T to OSPA distance defined in (30) reduces to (32). Hence, the order
state vectors in X, i.e. U = {f : T → X}. We refer to each parameter p becomes redundant, and is omitted.
element of U as a track. The base-distance (31) exhibits counter-intuitive be-
r For any f ∈ U, its domain, denoted by Df ⊆ T, is the set haviour [43], wherein a pair of well-separated tracks may have
of time instants at which the object exists. a smaller distance than expected. If there are two short tracks
Also, recall that for a function d(·, ·) to be called a “metric,” inside a much longer window, then averaging over this window
it must satisfy the following four properties: may result in a relatively small distance, even if the tracks are
P1. d(x, y) ≥ 0 (non-negativity), separated by a distance of c. For example, consider the scenario
P2. d(x, y) = 0 ⇐⇒ x = y (identity), illustrated in Fig. 2. Inside the window of length T = 100, there
P3. d(x, y) = d(y, x) (symmetry), are exactly two tracks, x with domain Dx = {91 : 95}, and y
P4. d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality). with domain Dy = {96 : 100}. If the base-distance is calculated
(c)
The OSPA distance dp (φ, ψ) between φ, ψ ∈ F(X) with according to (31), i.e. as an average over t ∈ {1, . . ., T } then
order p and cutoff c is defined as follows [41]. For φ = d˜(c) (x, y) = c/10. Furthermore, as the length of the window
{φ(1) , φ(2) , . . . , φ(m) } and ψ = {ψ (1) , ψ (2) , . . . , ψ (n) }, m ≤ n increases, the value of this base-distance decreases, which is
counter intuitive, as the two tracks do not overlap in time, and
are indeed very far apart. Thus it would actually be expected that
d(c)
p (φ, ψ) the base-distance assign the maximum penalty c.
$ $ %%1/p This issue can be resolved by constructing the distance
1
m p
min d¯(c) φ(i) , ψ (π(i)) + cp (n − m) (30) d˜(c) (x, y) between two tracks x, y ∈ U as the mean OSPA
n π∈Πn i=1 distance between the set of states defined by x and y, over all
times t ∈ Dx ∪ Dy , i.e.
where d¯(c) (φ(i) , ψ (i) ) min(c, d(φ(i) , ψ (i) )), in which d(·, ·) is d(c) ({x(t)},{y(t)})
, Dx ∪ Dy
= ∅
(c) (c) ˜(c)
d (x, y) = t∈Dx ∪Dy |Dx ∪Dy | .
a metric on the space X. If m > n, then dp (φ, ψ) dp (ψ, φ).
(c) (c) 0, Dx ∪ D y = ∅
Additionally, dp (∅, φ) c, and dp (∅, ∅) 0. (33)
Note that in (30), the factor of 1/n, which normalises the
distance by the number of objects, is crucial for the OSPA to Averaging over Dx ∪ Dy , instead of {1, . . ., T } as per (31),
have the intuitive interpretation as a per-object error. results a base-distance with intuitively consistent behaviour.
For the example in Fig. 2, this choice of base-distance gives
d˜(c) (x, y) = c. Notice that even if the window is expanded, the
A. Base-Distance Between Tracks distance still evaluates to the cutoff value c, as we would expect.
We now use the OSPA distance (30) to construct a base- In order to use (33) as a base-distance between tracks, we need
distance as a metric on the space U of tracks. One simple to establish that it defines a metric on U. That is, it must satisfy
base-distance is [42]: the properties P1-P4 as previously mentioned.
Proposition 3: Let d(c) (·, ·) be a metric on the finite subsets
of X, such that the distance between a singleton and an empty
1 (c)
T
d˜(c) (x, y) = d ({x(t)} , {y(t)}) (31) set assumes the maximum attainable value c. Then the distance
T t=1 between two tracks as defined by (33) is also a metric.
2762 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 68, 2020
The proof of this proposition involves some rather lengthy assignable pairs of tracks in O(|wk |m log n) time. Once these
algebraic manipulations, and as such, it has been relegated to have been found, a sparse optimal assignment algorithm can
Appendix B. This result establishes that (33) is indeed a metric be used to obtain the lowest-cost matching between the ground
on the space U, when d(c) (·, ·) is defined according to (32). truth and estimated tracks. Such algorithms can solve sparse
Before proceeding to define OSPA(2) , we make two important assignment problems with a much lower average complexity
observations regarding the properties of this base-distance. than is possible under the dense optimal assignment formulation.
r Since d(c) (·, ·) ≤ c, the distance between tracks saturates 2) Visualisation of OSPA(2) : In practice, it is desirable to
at the value c, i.e. d˜(c) (·, ·) ≤ c. examine the tracking performance as a function of time, so that
r For two tracks x and y such that Dx = Dy , (33) can be trends in algorithm behaviour can be analysed in response to
interpreted as a mean square error (MSE) between x and changing scenario conditions. This can be achieved by plotting
y. Hence, this base-distance can be regarded as a general- αk (X, Y ; wk ) = dˇ(c)
p (Xwk , Ywk ) (35)
isation of the MSE for tracks of different domains.
as a function of k, where wk is a set of time indices that varies
B. OSPA(2) for Tracks with k, and
The distance between two tracks as defined in Section V- Xwk = {x|wk : x ∈ X and Dx ∩ wk
= ∅} , (36)
A is both a metric on the space U, and bounded by Ywk = {y|wk : y ∈ Y and Dy ∩ wk
= ∅} , (37)
the value c. It is therefore suitable to serve as a base-
distance for the OSPA metric on the space of finite sets of where f |w denotes the restriction of f to domain w.
tracks F(U). Let X = {x(1) , x(2) , . . . , x(m) } ⊆ F(U) and Y = Note that the sets Xwk and Ywk only contain those tracks
{y (1) , y (2) , . . . , y (n) } ⊆ F(U) be two sets of tracks, where m ≤ whose domain overlaps with wk , i.e. any tracks whose domain
lies completely outside the set wk are disregarded. Choosing
n. We define the distance dˇp (X, Y ) between X and Y as the
(c)
different values for the set wk allows us to examine the per-
OSPA with base-distance d˜(c) (·, ·) (the time averaged OSPA formance of tracking algorithms over different time scales. For
given by equation (33)). That is, example, a straightforward approach is to set wk = {k − N +
1, k − N + 2, . . . , k}, so that at time k, the set wk consists of
dˇ(c)
p (X, Y ) only the latest N time steps. In this case, choosing a small
$ $ %%1/p value for N will indicate the tracking performance over shorter
1 m p
min ˜
d (c) (i) (π(i))
x ,y +c (n − m)
p
, time periods, while larger values will reveal the longer-term
n π∈Πn i=1 tracking performance. This choice is highly dependent on the
(34) application at hand. For example, in real-time surveillance, we
may only be interested in tracking objects over a period of a
where c is the cutoff and p is the order parameter. If m > n, few minutes, as older information may be considered irrelevant.
then dˇp (X, Y ) dˇp (Y, X). Note also that dˇp (∅, X) c,
(c) (c) (c) In this case, a small value for N would suffice to capture the
important aspects of the tracking performance. However, in an
and dˇp (∅, ∅) 0. We call this distance OSPA-on-OSPA or
(c)
off-line scenario analysis application, we might require accurate
OSPA(2) , which can be interpreted as the time-averaged per-track trajectory estimates over much longer time periods, in which
error. case a larger value for N would be more appropriate.
1) Efficient Evaluation of OSPA(2) : Evaluating (34) involves Furthermore, examining the same scenario using OSPA(2)
the following three steps: with different window lengths could reveal important insights
1) Compute an m × n cost matrix C, with entries Ci,j = into the relationship between long-term and short-term per-
d˜(c) (x(i) , y (j) ), according to (33). formance. For example, the design of a multi-object tracking
2) Solve a 2-D assignment problem with cost matrix C, to system often involves trade-offs between estimation accuracy
find the minimum cost 1-1 assignment of columns to rows. and response time, and comparing OSPA(2) with long and short
3) Use the result of step 2 to evaluate dˇp (X, Y ) via (34).
(c)
windows could help to characterise the nature of this trade-off.
A basic implementation of this procedure would require Note that computing the OSPA(2) on a sliding window as
computing the base-distance between all pairs of tracks in X described above, converges to the traditional OSPA (for sets of
and Y , which has complexity O(|wk |mn). Step 2 then requires points) as N becomes smaller. For N = 1 the OSPA(2) becomes
solving a dense optimal assignment problem with complexity identical to the traditional OSPA.
O((m + n)3 ). This would preclude its use in large-scale track- It is important to understand that the OSPA(2) distance has a
ing scenarios involving millions of objects, as the cost matrix different interpretation to that of the traditional OSPA distance.
would consume too much memory, and the assignment problem Whereas the traditional OSPA distance captures the error be-
would be infeasible. tween the true and estimated multi-target states at a single instant
Fortunately, in many practical applications, the base-distance in time, the OSPA(2) distance captures the error between the true
between most pairs of tracks will saturate at the cutoff value c. and estimated sets of tracks over a range of time instants, as
This can be exploited to dramatically reduce the computational determined by the choice of wk . Therefore, careful consideration
complexity, making it feasible for large-scale problems. Firstly, must be given to the design of wk , and the user must be mindful
recall that the time averaging in the base-distance (33) is carried of this when interpreting the results.
out only over the union of the track domains. Consequently, 3) Behavior of the OSPA(2) : The OSPA(2) metric exhibits
the base-distance between any pair of tracks that have no cor- intuitively consistent behavior when faced with various types
responding states closer than a distance of c, must saturate of common tracking errors. For example, errors in localisation,
at c. Such pairs can be considered unassignable, and efficient cardinality, track fragmentation and track label switching, all
spatial searching algorithms can be applied to extract only the yield expected increases in the OSPA(2) distance (illustrations
BEARD et al.: SOLUTION FOR LARGE-SCALE MULTI-OBJECT TRACKING 2763
are shown in [42], [43]). Some of the key points that distinguish
the OSPA(2) from the traditional OSPA distance are as follows:
r A track that is dropped and later regained with the same
identity, yields a smaller increase in the OSPA(2) than a
track that it dropped and regained with a different identity.
r When visualised according to the method described in the
previous section, a longer window effectively sustains the
influence of cardinality and track labelling errors for a
longer duration, i.e. the metric “remembers” mistakes that
were committed by the tracker further into the past.
r A greater frequency of track fragmentation and labelling
errors results in a higher OSPA(2) distance.
Illustrations of how the OSPA(2) metric respond to specific Fig. 3. True and estimated cardinality for large-scale tracking scenario.
scenario events can be found in [42], [43]. Also included is
a larger synthetic example, in which the frequency of track
fragmentation and label switching is varied in order to observe
its influence on the OSPA(2) distance.
VI. NUMERICAL RESULTS
In this section, we demonstrate the proposed GLMB filter
implementation on a simulated large-scale multi-object track-
ing scenario in which the peak number of objects appearing
simultaneously exceeds one million. The scenario runs for 1000
time steps, on a rectangular 64 km by 36 km surveillance region.
The single-object state consists of 2D position and velocity, i.e.
state vectors have the form [ x ẋ y ẏ ]T , with units of metres and
Fig. 4. OSPA and OSPA(2) distance for large-scale tracking scenario.
metres per second.
New objects are generated throughout the scenario by sam-
2
pling from an LMB distribution at each time step. This dis- where ⊗ denotes the Kronecker product, and wk ∼ N (0, σw I2 )
tribution consists of 20,000 components, where the density of is a 2 × 1 independent and identically distributed (i.i.d.) Gaus-
the i-th component is a Gaussian N (·; m(i) , P ) with m(i) = sian process noise vector. In the current scenario, the sampling
[ x(i) 0 y (i) 0 ]T and P = diag[50, 5] ⊗ I2 , where the positional interval is T = 1s, and the process noise standard deviation
elements of m(i) are sampled according to x(i) ∼ U (0, 64000) is σw = 0.2m/s2 . The peak cardinality of approximately 1.2
and y (i) ∼ U (0, 36000). At a given time step k, the existence million objects occurs at time 700, when the mean object density
probabilities of all components in the birth model are set to is around 520 per km2 .
a common value rB,+ , but different values for rB,+ are used We simulate data from a position sensor, corrupted by noise,
within different time intervals according to missed detections and false alarms. An object that exists at time
k with state xk is detected with probability PD = 0.88, in which
) case the object generates a measurement
0.15, k ∈ [1, 400] ∪ [501, 700] * +
rB,+ = , 1 0 0 0
0.01, k ∈ [401, 500] ∪ [701, 1000] z= x + vk ,
0 0 1 0 k
which simulates a variable rate of object birth. With 20,000 birth and vk ∼ N (0, σv2 I2 ) is a 2 × 1 i.i.d. Gaussian measurement
components, the value rB,+ = 0.15 equates to an average birth noise vector, with σv = 5m. Each set of measurements also
rate of 3000 objects per time step, and rB,+ = 0.01 gives an contains false alarms, the number of which is Poisson dis-
average birth rate of 200 objects per time step. It is imperative to tributed with a mean of 200 per km2 (i.e. an overall mean of
note that the filter is not provided with any information about the 200 × 64 × 36 = 460, 800 per scan), and a spatial distribution
locations of the birth components nor their probabilities. Instead, that is uniform across the surveillance region.
it uses a measurement-based approach to adaptively construct For the filter, the number of GLMB components gener-
an LMB birth distribution after each iteration. ated during the update of the i-th factor in the filtering den-
At each time step, objects that existed at the previous time sity is max(min(|L(i) |3 , 5000), 500), i.e. the size of the fac-
survive with probability PS = 0.999, i.e. one out of every 1000 tor’s label space cubed, lower bounded at 500 and upper
objects spontaneously dies on average. This survival probability bounded at 5000. In the pruning step, this is reduced to
is known by the filter. An object that has state xk at time k and max(min( 15 |L(i) |3 , 1000), 100). In the label partitioning step,
survives to time k + 1, takes on a new state according to the the probability threshold for computing the size of the hyper-
discrete white noise acceleration model rectangles representing the bounding regions for each track is
PG = 0.99. The maximum size of the label space of any single
xk+1 = F xk + Gwk , factor is set to Lmax = 20. If the constraint imposed by Lmax
* + * 2 + cannot be satisfied with the chosen value for PG , the partitioning
1 T T /2
F = ⊗ I2 , G = ⊗ I2 , routine is repeated by progressively reducing PG by 20% until
0 1 T the constraint can be satisfied.
2764 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 68, 2020
Fig. 5. Estimated tracks at time step 700 when there is a peak cardinality of 1.2 million objects. The inset is a magnified view of a 2 km by 1 km region.
The tracking algorithm was coded in C++, making use of commit more labelling errors at times when there are many
OpenMP to implement parallel processing wherever possible. new objects appearing. Fig. 5 show a snapshot of the estimated
We executed the algorithm on a machine with four 16-core AMD tracks produced by the proposed large-scale GLMB tracker at
Opteron 6376 processors (for a total of 64 physical processor time 700 when the peak cardinality of approximately 1.2 million
cores), and 256 GB of memory. On this hardware configuration, objects is reached. Two illustrative videos are also provided in
the peak time taken to process a frame of measurements was supplementary materials, one showing only the measurements
approximately 5 minutes when the cardinality was highest, but in time, and the other showing the estimates in time.
the algorithm ran considerably faster than this at times when
there were fewer objects in the scene. The peak memory us- VII. CONCLUSION
age of the algorithm was approximately 50 GB. To evaluate
tracking performance, the OSPA(2) metric was coded in Matlab, We have presented an efficient and scalable implementation
using the “parfor” construct to parallelise some aspects of the of the generalised labeled multi-Bernoulli filter, that is capable
computation. The average time taken to evaluate each point in of estimating the trajectories of a very large number of objects
the OSPA(2) curve was approximately 1 minute. simultaneously, in the order of millions per frame. The pro-
The true and estimated cardinality is shown in Fig. 3, and posed method makes efficient use of the available computational
Fig. 4 shows OSPA(2) distance, as well as the traditional OSPA resources, by decomposing large-scale tracking problems into
distance. For the OSPA calculations, the cutoff was set to smaller independent sub-problems. The decomposition is car-
c = 50m, the order was p = 1, and for the OSPA(2) , a sliding ried out via marginalisation of high-dimensional multi-object
window over the latest 50 time steps was used to evaluate each densities, using a technique that is shown to be optimal in the
point in the curve. From the cardinality plot, it can be seen sense that it minimises the Kullback-Leibler divergence for a
that the estimated cardinality lags behind the true cardinality given partition of the object label space. This allows the algo-
during the times when new targets are being born. This is to rithm to fully exploit the potential of highly parallel processing,
be expected, as the measurement-driven birth model needs to as afforded by modern multi-core computing architectures. Due
consider a very large number of potential birth tracks at each to its relatively low processing time and memory requirements,
scan. To avoid initiating too many false tracks, the filter delays simulations show that the proposed technique is capable of
the initiation of tracks until there is more data to confirm the tracking in excess of a million objects simultaneously, running
presence of an object. The OSPA(2) plot shows increased error on standard off-the-shelf computing equipment. Additionally,
during the periods in which new targets are appearing, due to the we have introduced a new way of using the OSPA metric to
delay in initiating new tracks. At other times, the error stabilises capture multi-object tracking error (rather than filtering error),
to approximately 2.5m per object per unit time. As we would and applied it to evaluate the performance of our large-scale
expect, the OSPA(2) error is consistently higher than the OSPA multi-object tracker.
error, since it penalises incorrect labelling behaviour which is
not captured by the OSPA distance. Notably, the difference APPENDIX
between the two curves is greatest during the times when the
A. GLMB Approximation and Factorisation
true cardinality is increasing. This is to be expected, since track
initiation is arguably one of the most challenging aspects of The proofs of Propositions 1 and 2, draw on the preliminary
multi-object tracking, and we can thus expect the tracker to results Lemma 4, Lemma 5 and Corollary 6 below.
BEARD et al.: SOLUTION FOR LARGE-SCALE MULTI-OBJECT TRACKING 2765
Lemma 4: Consider a labeled RFS density π on F(X × L), (L̄)
× Δ X (L̄) δI L X (L) X (L̄) [p(c) ]X δX (L̄)
and a partition L of L. The L-partition labeled RFS density π L
that minimises the Kullback-Leibler divergence DKL (π; π L ) is (L)
(L) = Δ(X (L) ) w(J∪U,c) δJ (L(X (L) ))[p(c) ]X
given by π L = {π L }L∈L , where J∈F(L) U ∈F(L̄) c∈C
(L̄)
Δ(X (L̄) )δU L X (L̄) [p(c) ]X δX (L̄)
(L)
π L (X ) = π (X) δ(X − X (L) ).
(L)
×
Proof: The proof follows that for the vector case [50], where the last line follows from decomposing the sum over
except that set integrals replace standard integrals. Since F(L), into sums over F(L) and its complement F(L̄).
L is a partition of L, {X (L) = X ∩ (X × L)}L∈L is a Using Lemma 3 of [36],
partition of X, and δ(X − X (L) ) can be replaced with
({})
(L̄)
Δ(X (L̄) )δU (L(X (L̄) ))[p(c) ]X δX (L̄)
∈L−L δX . Expanding the Kullback-Leibler divergence
between π and an arbitrary L-partitioned labeled RFS den- * +H
sity π̃ L and regrouping yields DKL (π; π̃ L ) = DKL (π; π L ) + = δU (H) p (c)
(x, ·)dx
(L) (L)
L∈L DKL (π L ; π̃ L ). The result follows by noting that the H∈F(L̄)
(L) (L)
marginals π̃ L = π L result in the minimum divergence.
= δU (H) = 1.
Lemma 5: Consider a GLMB π=
{(w(I,c) , p(c) )}(I,c)∈F(L)×C on F(X × L), and a partition L of H∈F(L̄)
(L) (L)
L. The L-partitioned labeled RFS density π L = {π L }L∈L Moreover, noting that the argument of p(c) in [p(c) ]X is
that minimises DKL (π; π L ), is an L-partitioned GLMB, with (L) (c) (L)
restricted to X × L, we have [p(c) ]X = [pL,L ]X , and hence
GLMB factors the desired result.
Treating L as L, and C(L) as C in Lemma 5 yields:
(L) (J,c) (c)
πL = wL,L , pL,L
(J,c)∈F(L)×C Corollary 6: Given L ∈ F(L), a GLMB in GL of the form
(J,c) (I,c) (c)
wL,L = w(J∪U,c) π (L) = (wL , pL ) ,
(I,c)∈F(L)×C(L)
U ∈F(L−L)
(c) and any S ∈ F(L) such that L ∩ S
= ∅. The marginal
pL,L (x, ) = 1L ()p(c) (x, ) .
π (L,S) (X (L∩S) ) = π (L) (X (L) )δ(X (L) − X (S) )
Proof: Using the delta form for GLMB [36],
X is a GLMB in GL∩S given by
π (X) = Δ (X) w(I,c) δI (L (X)) p(c)
(H,c) (c)
(I,c)∈F(L)×C π (L,S) = wL,S , pL,S
(H,c)∈F(L∩S)×C(L)
it follows from Lemma 4 that for any X (L) ∈ F(X × L), (H,c)
wL,S =
(H∪W,c)
wL
W ∈F(L−S)
(L)
π L (X (L) ) = π (X) δ X − X (L) (c) (c)
pL,S (x, ) = 1L∩S ()pL (x, ).
= π(X (L) X (L̄) )δX (L̄) Proof of Proposition 1: Applying Lemma 4, the optimal
approximation is given by the product of its marginals
(S)
where L̄ = L − L, and X (L̄) = X − X (L) = X ∩ (X × L̄). π S (X) = π S (X (S) ),
Substituting for π(X (L) X (L̄) ) gives S∈S
(S)
(L)
π L (X (L) ) π S (X (S) ) = π L (X) δ(X − X (S) ).
= Δ X (L) X (L̄) w(I,c) Since {X (L) }L∈L is a partition of X, observe that {(X (L) −
(I,c)∈F(L)×C X (S) )}L∈L is a partition of X − X (S) , hence the integration
(L̄)
can be performed over
× δI L X (L) X (L̄) [p(c) ]X X δX (L̄)
(L)
δ(X − X (S) ) = δ(X (L) − X (S) ).
= Δ(X (L) )Δ X (L̄)
L∈L
w(I,c)
Substituting for π L and regrouping yields
(I,c)∈F(L)×C
(L,S)
(L̄)
(S)
π S (X (S) ) = π L,S (X (S) ),
× δI L X (L) X (L̄) [p(c) ]X [p(c) ]X δX (L̄)
(L)
L∈L
(L)
= Δ(X (L) ) w(I,c) [p(c) ]X (L,S)
π L,S (X (L∩S) ) =
(L)
π L (X (L) )δ(X (L) − X (S) ).
(I,c)∈F(L)×C
2766 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 68, 2020
(L,S) It remains to verify metric property P4 (the triangle inequal-
Applying Corollary 6 to evaluate the marginal π L,S yields the
desired result. ity), which is accomplished via induction. Since d(c) (·, ·) is a
Proof of Proposition 2: Given a partition L+ of L+ , define metric, it is clear that the distance between the tracks at a single
the corresponding partition S of L and B+ of B+ time index 1 (representing time t1 ), satisfies the triangle inequal-
ity. Let us assume that the distance between the tracks over time
S = {J ∩ L : J ∈ L+ } , indices {1, 2, . . . , k} (representing {t1 , t2 , . . . , tk }), satisfies the
B+ = {J ∩ B+ : J ∈ L+ } . triangle inequality. We now proceed to show that the distance
between the tracks over time indices {1, 2, . . . , k, k + 1} also
(J) (J)
Let Z+ J∈L+ Z+ , and X+ J∈L+ X + . Then the next satisfies the triangle inequality.
posterior is given by When at least one of the sets Dx ∪ Dy , Dy ∪ Dz , Dz ∪ Dx
is empty, the triangle inequality for tracks over time indices
π L+ ,+ (X + |Z+ ) ∝ g (Z+ |X + ) f B (X B,+ ) 1, 2, . . . , k, k + 1 can be easily verified, since d(c) (·, ·) is a
metric. Hence, we consider the case where Dx ∪ Dy , Dy ∪ Dz ,
× f S (X S,+ |X S ) π S (X S ) δX S , Dz ∪ Dx are all non-empty.
Some notations are needed for compactness. Let us denote
X S,+ = X + ∩ (X × L) , the cardinalities of the basic sets in Dx ∪ Dy ∪ Dz by
X B,+ = X + ∩ (X × B+ ) .
N |Dx ∩ Dy ∩ Dz | , (38)
Noting that S is a partition of L, it follows that the transition Nx̆ |Dx − Dy − Dz | , Np |Dx ∩ Dy − Dz | , (39)
factors into
(J∩L) (J∩L)
Ny̆ |Dy − Dz − Dx | , Nq |Dy ∩ Dz − Dx | , (40)
f S (X S,+ |X S ) = f S (X S,+ |X S ).
J∈L+
Nz̆ |Dz − Dx − Dy | , Nr |Dz ∩ Dx − Dy | , (41)
Similarly, noting that B+ is a partition of B+ , it follows that the as illustrated in the diagram below.
birth factors into
(B )
(J∩B ) (J∩B )
f B + (X B,+ ) = f B + (X B,+ + ).
J∈L+
Substituting
(J∩L) (J∩L)
π S (X S ) = πS (X S ),
J∈L+
(J) (J)
g (Z+ |X + ) = g(Z+ |X + ),
J∈L+
yields
(J) (J) (J)
π L+ ,+ (X + |Z+ ) = πL+ ,+ (Z+ |X + ),
J∈L+
(J) (J) (J) (J) (J) (J∩B ) (J∩B )
π L+ ,+ (Z+ |X + ) = g(Z+ |X + )f B + (X B,+ + )
(J∩L) (J∩L) (J∩L) (J∩L) Furthermore, we adopt the following abbreviations:
× f S (X S,+ |X S )π S (X S )δX S .
S N + Np + N q + N r , (42)
Applying Proposition 1 in [38], for each J ∈ L+ gives the final
result. T S + |Dx ∪ Dy ∪ Dz | = 2S + Nx̆ + Ny̆ + Nz̆ , (43)
P |Dx ∪ Dy | = S + Nx̆ + Ny̆ , (44)
B. Proof of Proposition 3 (Metric for Tracks)
Q |Dy ∪ Dz | = S + Ny̆ + Nz̆ , (45)
Firstly, since d(c) (·, ·) ≥ 0 and |Dx ∪ Dy | ≥ 0, all terms in the
summation over t ∈ Dx ∪ Dy are clearly non-negative. Hence R |Dz ∪ Dx | = S + Nx̆ + Nz̆ , (46)
d˜(c) (·, ·) satisfies metric property P1.
Second, d˜(c) (x, y) = 0 if and only if x = y = ∅, or p d(c) ({x(t)} , {y(t)}) , (47)
t∈Dx ∪Dy
d ({x(t)}, {y(t)}) = 0 for all t ∈ Dx ∪ Dy . Since d(c) (·, ·) is
(c)
a metric, d(c) ({x(t)}, {y(t)}) = 0 ⇔ {x(t)} = {y(t)}. Hence p d(c) ({x (k + 1)} , {y (k + 1)}) , (48)
d˜(c) (x, y) = 0 ⇔ x = y, satisfying metric property P2.
Third, since d(c) (·, ·) is a metric, and Dx ∪ Dy = Dy ∪ Dx , all q d(c) ({y(t)} , {z(t)}) , (49)
t∈Dy ∪Dz
terms in (35) are symmetric in their arguments. Hence d˜(c) (·, ·)
satisfies metric property P3. q d(c) ({y (k + 1)} , {z (k + 1)}) (50)
BEARD et al.: SOLUTION FOR LARGE-SCALE MULTI-OBJECT TRACKING 2767
r d(c) ({z(t)} , {x(t)}) , (51) (ii) {z(k + 1)} = ∅, and {x(k + 1)}
= ∅ or {y(k + 1)}
= ∅,
t∈Dz ∪Dx i.e.
r d(c) ({z (k + 1)} , {x (k + 1)}) . r q+c p+c r+c q p+c
(52) + ≥ or + ≥ .
R Q+1 P +1 R+1 Q P +1
The sum p over (non-empty) Dx ∪ Dy is decomposed into: (iii) At least two of {x(k + 1)}, {y(k + 1)} and {z(k + 1)}
r p̂N consisting of N terms on (Dx ∩ Dy ∩ Dz );
r p̂N consisting of Np terms on (Dx ∩ Dy − Dz ); and are non-empty, i.e.
r (Npq + Nr + Nx̆ + Ny̆ )c, since d({x(k)}, {y(k)}) = c on r + r q + q p + p
+ ≥ .
(Dx ∪ Dy ) − (Dx ∩ Dy ). R+1 Q+1 P +1
Similar decompositions also apply to q, and r. Hence, For case (i)
p p̂N + p̂Np + (Nq + Nr + Nx̆ + Ny̆ ) c, r+c q+c p
+ −
R+1 Q+1 P
q q̂N + q̂Nq + (Nr + Np + Ny̆ + Nz̆ ) c, P (Q + 1) r + P (Q + 1) c (R + 1) P q + (R + 1) P c
= +
r r̂N + r̂Nr + (Np + Nq + Nz̆ + Nx̆ ) c. P (Q + 1) (R + 1) P (Q + 1) (R + 1)
(Q + 1) (R + 1) p
The following bounds are required for the proof −
P (Q + 1) (R + 1)
p̂N ≤ q̂N + r̂N , (53) P (Q + 1) r + (R + 1) P c P c + (Q + R + 1) P c
= +
p ≤ p̂N + (P − N ) c ≤ P c, (54) P (Q + 1) (R + 1) P (Q + 1) (R + 1)
(QR + Q + R + 1) p
q ≥ q̂N + (Nr + Np + Ny̆ + Nz̆ ) c q̂N + Q c, (55) −
P (Q + 1) (R + 1)
r ≥ r̂N + (Np + Nq + Nz̆ + Nx̆ ) c r̂N + R c, (56) P Qr + RP q − QRp + P r + P q + P c
=
P (Q + 1) (R + 1)
Note that: the triangle inequality (53) holds because p̂N , q̂N ,
(Q + R + 1) P c − (Q + R + 1) p
r̂N are, respectively, sums of distances between x and y, y and +
z, z and x, over all time indices in Dx ∩ Dy ∩ Dz ; (54) holds P (Q + 1) (R + 1)
because p̂Np ≤ Np c, Np + Nq + Nr + Nx̆ + Ny̆ = P − N , and P (r + q + c) + (Q + R + 1) (P c − p)
≥ ≥ 0,
p̂N ≤ N c; and (55), (56) hold because q̂Nq , r̂Nr ≥ 0. P (Q + 1) (R + 1)
The following identities (follows directly from the definitions)
where we used the triangle inequality (65) and the bound P c ≥ p
P + Q = T + Ny̆ , (57) from (54).
For case (ii )
= R + S + 2Ny̆ , (58)
r q+c p+c
+ −
Q + R = T + Nz̆ , (59) R Q+1 P +1
R + P = T + Nx̆ , (60) (P + 1) (Q + 1) r
=
(P + 1) (Q + 1) R
N + Q + R − Q = Np + Nx̆ + Nz̆ , (61)
R (P + 1) q + R (P + 1) c − (Q + 1) Rp − (Q + 1) Rc
N + Q + R − P = Np + 2Nz̆ , (62) +
(P + 1) (Q + 1) R
are also required for the proof. Note that so far, all of the variables (P Q + P + Q + 1) r
=
we have defined are non-negative. (P + 1) (Q + 1) R
Adopting the above notation, the properties of d(c) (·, ·) and
˜ ·) can be expressed as R (P + 1) q − (Q + 1) Rp + R (P + 1) c − (Q + 1) Rc
the triangle inequality for d(·, +
(P + 1) (Q + 1) R
c ≥ p , q , r , (63) (P Q + P + Q + 1) r + RP q + Rq − QRpc
=
r + q − p ≥ 0, (64) (P + 1) (Q + 1) R
r q p (P − Q) Rc − Rp
+ ≥ or equivalently P Qr + RP q − QRp ≥ 0. +
R Q P (P + 1) (Q + 1) R
(65)
P Qr + RP q − QRp + (P + Q + 1) r + Rq
=
We need to prove that the triangle inequality holds for the (P + 1) (Q + 1) R
following three cases (note that the result holds trivially when
(P − Q) Rc − Rp
{x(k + 1)} = {y(k + 1)} = {z(k + 1)} = ∅): +
(i) {x(k + 1)} = {y(k + 1)} = ∅, and {z(k + 1)}
= ∅, i.e. (P + 1) (Q + 1) R
r+c q+c (P + Q + 1) r + Rq − Rp + (P − Q) Rc
+
p
≥ . ≥
R+1 Q+1 P (P + 1) (Q + 1) R
2768 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 68, 2020
where the last line follows from the triangle inequality (65). Into (67), we substitute the three identities P + Q = T + Ny̆ ,
Using the bounds (54), (55), (56) for p, q, and r, and the identity Q + R = T + Nz̆ and R + P = T + Nx̆ , from (57), (59) and
P + Q = R + S + 2Ny̆ from (58), we have (60) respectively. We also use the upper/lower bounds on p, q,
and r, from (54), (55) and (56) respectively. This yields the
(P + Q + 1) r + Rq − Rp + (P − Q) Rc following expression
≥ (R + S + 2Ny̆ + 1) r̂N + R c + R q̂N + Q c
(P + Q + 1) r + (R + P + 1) q − (Q + R + 1) p
− R [p̂N + (P − N ) c] + (P − Q) Rc
≥ (T + Ny̆ + 1) r̂N + R c + (T + Nx̆ + 1) q̂N + Q c
= + Rr̂N +(S + 2Ny̆ + 1) r̂N +R (R + S + 2Ny̆ + 1) c
+ Rq̂N +Q Rc − (T + Nz̆ + 1) [p̂N + (P − N ) c]
−Rp̂N −(P −N ) Rc+(P − Q) Rc
, -. / , -. / , -. / =+ (T + 1) r̂N + Ny̆ r̂N + (T + 1) R c + Ny̆ R c
≥ 0 +0
+R Rc+Q Rc +(N −Q) Rc + (T +1) q̂N + Nx̆ q̂N + (T + 1) Q c + Nx̆ Q c
−(T +1) p̂N − Nz̆ p̂N −(T + 1) (P −N ) c − Nz̆ (P −N ) c
= R + Q + N − Q Rc , -. / , -. / , -. / , -. /
≥ 0 −Nz̆ N c+ (T +1)(Np +2Nz̆ ) c−Nz̆ P c+Nz̆ N c
= [Np + Nx̆ + Nz̆ ] Rc ≥ 0,
= [(T + 1) (Np + 2Nz̆ ) − Nz̆ P ] c
where we used r̂N + q̂N − p̂N ≥ 0 from (53) and R + Q +
N − Q = Np + Nx̆ + Nz̆ from (61). ≥ [(T + 1) Nz̆ − Nz̆ P ] c
For case (iii) = (S + Nz̆ + 1) Nz̆ c,
r + r q + q p + p
+ − where we used r̂N + q̂N − p̂N ≥ 0 from (53), Ny̆ r̂N ≥ 0,
R+1 Q+1 P +1
Nx̆ q̂N ≥ 0, −p̂N ≥ −N c from (54), Q + R − P + N =
(P + 1) (Q + 1) (r + r ) (R + 1) (P + 1) (q + q ) Np + 2Nz̆ from (62), Ny̆ R c ≥ 0, Nx̆ Q c ≥ 0, Np + 2Nz̆ ≥
= +
(P + 1) (Q + 1) (R + 1) (P + 1) (Q + 1) (R + 1) Nz̆ , and T − P = S + Nz̆ from (43) and (44).
(Q + 1) (R + 1) (p + p ) For (68), note from (57), (59) and (60), that P + Q = T +
− Ny̆ , Q + R = T + Nz̆ and R + P = T + Nx̆ . Additionally,
(P + 1) (Q + 1) (R + 1)
(P Q + P + Q + 1) r (RP + R + P + 1) q P Q = (S + Nx̆ + Ny̆ ) (S + Ny̆ + Nz̆ )
= +
(P + 1) (Q + 1) (R + 1) (P + 1) (Q + 1) (R + 1)
= SS + SNx̆ + 2SNy̆ + SNz̆
(QR + Q + R + 1) p (P Q + P + Q + 1) r
− + + Nx̆ Ny̆ + Ny̆ Nz̆ + Nz̆ Nx̆ + Ny̆ Ny̆ ,
(P + 1) (Q + 1) (R + 1) (P + 1) (Q + 1) (R + 1)
(RP + R + P + 1) q (QR + Q + R + 1) p QR = (S + Ny̆ + Nz̆ ) (S + Nx̆ + Nz̆ )
+ −
(P + 1) (Q + 1) (R + 1) (P + 1) (Q + 1) (R + 1) = SS + SNx̆ + SNy̆ + 2SNz̆
(P Qr + RP q − QRp) + (r + q − p ) + Nx̆ Ny̆ + Ny̆ Nz̆ + Nz̆ Nx̆ + Nz̆ Nz̆ ,
= (66)
(P + 1) (Q + 1) (R + 1)
RP = (S + Nx̆ + Nz̆ ) (S + Nx̆ + Ny̆ )
(P + Q + 1) r + (R + P + 1) q − (Q + R + 1) p
+ (67) = SS + 2SNx̆ + SNy̆ + SNz̆
(P + 1) (Q + 1) (R + 1)
(P Q + P + Q) r +(RP + R + P ) q −(QR + Q + R) p + Nx̆ Ny̆ + Ny̆ Nz̆ + Nz̆ Nx̆ + Nx̆ Nx̆ .
+ .
(P + 1) (Q + 1)(R + 1)
(68) Hence, expanding (68) results in (69) shown at the bottom of
this page, where we used the triangle inequality (64) for p ,
Note that (66) ≥ 0 from the triangle inequalities (65) and (64). q , r .
It remains to be shown that (67) + (68) ≥ 0. Finally, since c ≥ p , it follows that (67) + (68) ≥ 0.
(P Q + P + Q) r + (RP + R + P ) q − (QR + Q + R) p
= + SSr + SNx̆ r + 2SNy̆ r + SNz̆ r + Nx̆ Ny̆ r + Ny̆ Nz̆ r + Nz̆ Nx̆ r + Ny̆ Ny̆ r + T r + Ny̆ r
+ SSq + 2SNx̆ q + SNy̆ q + SNz̆ q + Nx̆ Ny̆ q + Ny̆ Nz̆ q + Nz̆ Nx̆ q + Nx̆ Nx̆ q + T q + Nx̆ q
− SSp − SNx̆ p − SNy̆ p − 2SNz̆ p − Nx̆ Ny̆ p − Ny̆ Nz̆ p − Nz̆ Nx̆ p − Nz̆ Nz̆ p − T p − Nz̆ p
, -. / , -. / , -. / , -. / , -. / , -. / , -. / , -. / , -. / , -. /
≥ 0 +0 +0 − SNz̆ p +0 +0 +0 − Nz̆ Nz̆ p +0 − Nz̆ p
= − (S + Nz̆ + 1) Nz̆ p , (69)
BEARD et al.: SOLUTION FOR LARGE-SCALE MULTI-OBJECT TRACKING 2769
It is possible to extend the base-distance (33) to the case where [25] B. L. Wang, W. Yi, R. Hoseinnezhad, S. Q. Li, L. J. Kong, and X. B.
the terms of the sum in (33) are raised to the power of q, and Yang, “Distributed fusion with multi-Bernoulli filter based on generalized
covariance intersection,” IEEE Trans. Signal Process., vol. 65, no. 1, pp.
take the qth root of the resulting sum. 242–255, Jan. 2017.
[26] T. C. Li, J. M. Corchado, and S. D. Sun, “On generalized covariance inter-
section for distributed PHD filtering and a simple but better alternative,”
REFERENCES in Proc. IEEE Int. Fusion Conf., 2017, pp. 1–8.
[27] F. Meyer et al., “Message passing algorithms for scalable multitarget
[1] S. Blackman, R. Popoli, Design and Analysis of Modern Tracking Systems,
tracking,” in Proc. IEEE, vol. 106, no. 2, pp. 221–259, Feb. 2018.
Norwood, MA, USA: Artech House, 1999.
[28] C. Fantacci, B.-N. Vo, B.-T. Vo, G. Battistelli, and L. Chisci, “Robust
[2] Y. B.-Shalom, X. Rong Li, Multitarget-Multisensor Tracking: Principles
fusion for multisensor multiobject tracking,” IEEE Signal Process. Lett.,
and Techniques, Bradford, U.K.: YBS Publishing, 1995.
vol. 25, no. 5, pp. 640–644, 2018.
[3] R. P. S. Mahler, Statistical Multisource-Multitarget Information Fusion,
[29] S. Reuter, B.-T. Vo, B.-N. Vo, and K. Dietmayer, “The labeled multi-
Norwood, MA, USA: Artech House, 2007.
Bernoulli filter,” IEEE Trans. Signal Process., vol. 62, no. 12, pp. 3246–
[4] B. A. Jones, D. S. Bryant, B.-T. Vo, and B.-N. Vo, “Challenges of multi-
3260, Jun. 2014.
target tracking for space situational awareness,” in Proc. 18th Int. Conf.
[30] C. Fantacci, and F. Papi, “Scalable multisensor multitarget tracking using
Inf. Fusion, Washington D.C., Jul. 2015, pp. 1278–1285.
the marginalized-GLMB density,” IEEE Signal Process. Lett., vol. 23, no.
[5] H. Klinkrad, “Space debris: Models and risk analysis,” Berlin, Germany:
6, pp. 863–867, Jun. 2016.
Springer, 2006.
[31] X. Wang, A. K. Gostar, T. Rathnayake, B. Xu, and A. B.-Hadiashar,
[6] Committee for the Assessment of the U.S. Air Force’ s Astrodynamic
“Centralized multiple-view sensor fusion using labeled multi-Bernoulli
Standards; Aeronautics and Space Engineering Board; Division on En-
filters,” Signal Process., vol. 150, pp. 75–84, 2018.
gineering and Physical Sciences; National Research Council, Continuing
[32] S. Li, W. Yi, R. Hoseinnezhad, G. Battistelli, B. Wang, and L. Kong,
Kepler’ s Quest: Assessing Air Force Space Command’ s Astrodynamics
“Robust distributed fusion with labeled random finite sets,” IEEE Trans.
Standards, Washington, DC, USA: The National Academies Press, 2012.
Signal Process., vol. 66, no. 2, pp. 278–293, Jan. 2018.
[7] V. Reilly, H. Idrees, and M. Shah, “Detection and tracking of large number
[33] M. Üney, J. Houssineau, E. Delande, S. J. Julier, and D. E. Clark, “Fusion
of targets in wide area surveillance,” in Proc. Eur. Conf. Computer Vision,
of finite-set distributions: Pointwise consistency and global cardinality,”
Springer, Berlin, Heidelberg, Sep. 2010, pp. 186–199.
IEEE Trans. Aerosp. Electron. Syst., vol. 55, no. 6, pp. 2759–2773,
[8] T. Pollard and M. Antone, “Detecting and tracking all moving objects in
Dec. 2019.
wide-area aerial video,” in Proc. IEEE Comput. Soc. Conf. Comput. Vision
[34] S. Li, G. Battistelli, L. Chisci, W. Yi, B. Wang, and L. Kong, “Computation-
Pattern Recognit. Workshops, Jun. 2012, pp. 15–22.
ally efficient multi-agent multi-object tracking with labeled random finite
[9] J. Prokaj, X. Zhao, and G. Medioni, “Tracking many vehicles in wide area
sets,” IEEE Trans. Signal Process., vol. 67, no. 1, pp. 260–275, Jan. 2019.
aerial surveillance,” in Proc. IEEE Comput. Soc. Conf. Comput. Vision
[35] B.-N. Vo, S. Singh, and A. Doucet, “Sequential Monte Carlo methods for
Pattern Recogn. Workshops, Jun. 2012, pp. 37–43.
multitarget filtering with random finite sets,” IEEE Trans. Aerosp. Electron.
[10] N. Chenouard, I. Bloch, and J.-C. Olivo-Marin, “Multiple hypothesis
Syst., vol. 41, no. 4, pp. 1224-1245, Oct. 2005.
tracking for cluttered biological image sequences,” IEEE Trans. Pattern
[36] B.-T. Vo and B.-N. Vo, “Labeled random finite sets and multi-object conju-
Anal. Mach. Intell., vol. 35, no. 11, pp. 2736–3750, Nov. 2013.
gate priors,” IEEE Trans. Signal Process., vol. 61, no. 13, pp. 3460-3475,
[11] E. Meijering, O. Dzyubachyk and I. Smal, “Methods for cell and particle
Jul. 2013.
tracking,” in Methods in Enzymology, vol. 504, New York, NY, USA:
[37] B.-N. Vo, B.-T. Vo, and D. Phung, “Labeled random finite sets and the
Academic, 2012, pp. 183–200.
Bayes multi-target tracking filter,” IEEE Trans. Signal Process., vol. 62,
[12] J. K. Uhlmann, “Algorithms for multiple-target tracking,” Amer. Scientist,
no. 24, pp. 6554–6567, Dec. 2014.
vol. 80, no. 2, pp. 128–141, 1992.
[38] B.-N. Vo, B.-T. Vo, and H. G. Hoang, “An efficient implementation of the
[13] J. B. Collins and J. K. Uhlmann, “Efficient gating in data association with
generalized labeled multi-Bernoulli filter,” IEEE Trans. Signal Process.,
multivariate Gaussian distributed states,” IEEE Trans. Aerosp. Electron.
vol. 65, no. 8, pp. 1975–1987, Apr. 2017.
Syst, vol. 28, no. 3, pp. 909–916, Jul. 1992.
[39] B. Ristic, B.-N. Vo, D. Clark, and B.-T. Vo, “A metric for performance
[14] D. Musicki, B. F. La Scala, and R. J. Evans, “Multi-target tracking in
evaluation of multi-target tracking algorithms,” IEEE Trans. Signal Pro-
clutter without measurement assignment.” in Proc. 43 rd IEEE Conf. Decis.
cess., vol. 59, no. 7, pp. 3452–3457, Jul. 2011.
Control, 2004, pp. 716–721.
[40] T. Vu and R. Evans, “A new performance metric for multiple target tracking
[15] H. Wang, T. Kirubarajan, and Y. Bar-Shalom, “Precision large scale
based on optimal subpattern assignment,” in Proc. 17th Int. Conf. Inf.
air traffic surveillance using IMM/assignment estimators,” IEEE Trans.
Fusion, Salamanca, Spain, Jul. 2014.
Aerosp. Electron. Syst,, vol. 35, no. 1, pp. 255–266, Jan. 1999.
[41] D. Schuhmacher, B.-T. Vo, and B.-N. Vo, “A consistent metric for per-
[16] M. Betke, D. E. Hirsh, A. Bagchi, N. I. Hristov, N. C. Makris, and T. H.
formance evaluation of multi-object filters,” IEEE Trans. Signal Process.,
Kunz, “Tracking large variable numbers of objects in clutter,” in Proc.
vol. 56, no. 8, pp. 3447–3457, Aug. 2008.
IEEE Conf. Computer Vision Pattern Recogn., 2007.
[42] M. Beard, B.-T. Vo, and B.-N. Vo, “OSPA(2) : Using the OSPA metric to
[17] B.-N. Vo, B.-T. Vo, S. Reuter, Q. Lam, and K Dietmayer, “Towards large
evaluate multi-target tracking performance,” in Proc. Int. Conf. Control,
scale multi-target tracking,” in Proc. Int. Soc. Opt. Photon.: Sensors Syst.
Automat. Inf. Sci., Chang Mai, Thailand, Oct. 2017.
Space Appl. VII, 2014, vol. 9085, Art. no. 90850W.
[43] M. Beard, B.-T. Vo, and B.-N. Vo, “Performance evaluation for large-
[18] J. Olofsson, C. Veibäck, and G. Hendeby, “Sea ice tracking with a spatially
scale multi-target tracking algorithms,” in Proc. 21st Int. Conf. Inf. Fusion,
indexed labeled multi-Bernoulli filter,” in Proc. 20th Int. Conf. Inf. Fusion,
Cambridge, UK, Jul. 2018.
Xi’an, China, Jul. 2017.
[44] A. Guttman, “R-trees: A dynamic index structure for spatial searching,”
[19] D. E. Clark, S. J. Julier, R. Mahler, and B. Ristic, “Robust multi-object
in Proc. ACM SIGMOD Int. Conf. Manage. Data, Jun. 1984, vol. 14, pp.
sensor fusion with unknown correlations,” in Proc. Sens. Signal Process.
47–57.
Def., Sep. 2010, pp. 1–5.
[45] Y. Manolopoulos, A. Nanopoulos, and Y. Theodoridis, R-Trees: Theory
[20] M. Uney, D. E. Clark, and S. J. Julier, “Distributed fusion of PHD filters
and Applications, Berlin, Germany: Springer, 2010.
via exponential mixture densities,” IEEE J. Sel. Topics Signal Process.,
[46] A. Zomorodian and H. Edelsbrunner, “Fast software for box intersections,”
vol. 7, no. 3, pp. 521–531, Jun. 2013.
in Proc. 16th Annual Symp. Comput. Geometry, ACM, 2000, pp. 129–138.
[21] G. Battistelli, L. Chisci, C. Fantacci, A. Farina, and A. Graziano, “Consen-
[47] J. Bentley, “Multidimensional binary search trees used for associative
sus CPHD filter for distributed multitarget tracking,” IEEE J. Sel. Topics
searching,” Commun. ACM, vol. 18, no. 9, pp. 509–517, Sep. 1975.
Signal Process., vol. 7, no. 3, pp. 508–520, Jun. 2013.
[48] J. Bento, “A metric for sets of trajectories that is practical and mathemat-
[22] G. Battistelli, L. Chisci, C. Fantacci, A. Farina, and R. Mahler, “Distributed
ically consistent,” 2016, arXiv:1601.03094.
fusion of multitarget densities and consensus PHD/CPHD filters,” in Proc.
[49] A. S. Rahmathullah, Á. F. García-Fernández, and L. Svensson, “A metric
SPIE Def., Security Sens., Baltimore, MD, USA, vol. 9474, 2015.
on the space of finite sets of trajectories for evaluation of multi-target
[23] M. B. Guldogan, “Consensus Bernoulli filter for distributed detection and
tracking algorithms,” 2016, arXiv:1605.01177.
tracking using multi-static doppler shifts,” IEEE Signal Process. Lett., vol.
[50] J. Cardoso, “Dependence, correlation and Gaussianity in independent
24, no. 6, pp. 672–676, Jun. 2014.
component analysis,” J. Mach. Learn. Res., vol. 4, pp. 1177–1203, 2003.
[24] W. Yi, M. Jiang, R. Hoseinnezhad, and B. Wang, “Distributed multisensor
fusion using generalised multi-Bernoulli densities,” IET Radar, Sonar
Navig., vol. 11, no. 3, pp. 434–443, Mar. 2016.