Visit Probability in Moving Object Databases: State of the Art

Introduction

Moving Object Databases (MODs) deal with continuously changing spatial data – for example, vehicles, people, or animals moving over time. In practice, we rarely know an object’s exact position at all times; positions are sampled periodically (e.g., via GPS) and are subject to error. This uncertainty means that when we ask spatio-temporal queries like “which objects are in region $R$ at time $t$?”, the answers can be probabilistic rather than deterministic. Visit probability refers to the probability that a given moving object will visit (i.e. be located in) a specific spatial region during a specified time. In other words, each object can be assigned a probability $p$ of being inside the query region – the visit probability for that object. Query results then consist of objects with their associated probabilities, often filtered by a threshold (to avoid reporting negligible chances).

The concept of visit probability is central to probabilistic spatio-temporal query processing in MODs. It enables more robust query answers under uncertainty. Rather than a binary yes/no answer, the database can return, for example, “Object A is in region R with probability 0.8, and object B with probability 0.4.” This survey provides an overview of the state-of-the-art on how visit probability is defined, modeled, and used for spatio-temporal queries. We will cover the data models for uncertain trajectories (continuous vs. discrete, etc.), formal definitions of visit probability, query types (range queries, nearest neighbors) that leverage these probabilities, and techniques for processing such queries efficiently. The tone is kept accessible and intuitive, with references to key research (peer-reviewed literature from journals and conferences, including arXiv where available) for those interested in more details.

Uncertain Trajectory Data Models

To compute visit probabilities, we first need a data model for uncertain object trajectories. Different models have been proposed to represent an object’s possible locations over time:

  • Continuous uncertainty models: Early work by Pfoser and Jensen (1999) introduced representing a moving object’s uncertainty between sample points as a geometric region. Given two recorded positions of an object and a maximum speed, all possible locations of the object in the interval lie within an error ellipse (the foci of the ellipse are the two known positions) (Ranacher and Rousell (2013)). This ellipse is essentially a planar slice of the classic space-time prism from time geography, which bounds all possible paths given temporal and speed constraints (Ranacher and Rousell (2013)). The continuous trajectory can be modeled as a sequence of such ellipses between consecutive observations. An example is illustrated in Figure 1a, where an object’s true path between two times is uncertain within an elliptical area.

  • Cylindrical models: Trajcevski et al. (2002; journal version 2004) proposed buffering each line segment of a moving object’s path with a surrounding volume, creating a “tube” or cylinder around the nominal trajectory. In a 3D view (latitude, longitude, time), this looks like a sheared cylindrical shape encompassing all possible positions if the object deviates within a certain uncertainty radius. This model was used to define clear query semantics (e.g., an object could definitely, possibly, or never be in a given region) based on the overlap of its uncertainty cylinder with the query region. We will discuss these semantics shortly.

  • Discrete (grid-based) models: Another approach is to partition space (and sometimes time) into discrete cells and represent an uncertain trajectory as a sequence of cells. For example, Cheng et al. (2009) describe discretizing the space into a grid and approximating an object’s location distribution by probabilities assigned to grid cells (Zhang et al. (2009)). At any given time $t$, the object is assumed to be in one of a finite set of cells, each with some probability. This grid model simplifies computation (the continuous distribution is approximated by a set of discrete points or regions) (Zhang et al. (2009)), at the cost of some precision.

  • Network-constrained models: If movement is constrained to a road network, uncertainty regions can be tightened to the roads. For instance, an object’s possible location may be along a set of road segments (a subgraph) rather than anywhere in a plane. This yields smaller, more realistic uncertainty regions (e.g., a car is somewhere along the highway segment between two GPS pings).

  • Probabilistic motion models: Many models assume a simple uniform distribution over the uncertainty region (the “maximal uncertainty” assumption). However, later research incorporated more informative probability distributions. For example, one may assume a Gaussian distribution centered at the last known position, or use a Markov chain model learned from historical trajectory data. Niedermayer et al. (2013) represent trajectories as stochastic processes (Markov chains), capturing the correlation between an object’s positions at different times (Niedermayer et al. (2013)). In their model, the probability of being in a particular location at time $t_{2}$ depends on the location at $t_{1}<t_{2}$, reflecting realistic movement constraints (an object can only move so far from its previous spot). Such models improve accuracy by not treating each time instant independently, at the expense of more complex query processing (in fact, some probabilistic queries on Markovian trajectories are NP-hard to solve exactly (Niedermayer et al. (2013)).

  • “Beads” and other hybrid models: The term “beads” has been used to describe a model where each pair of consecutive observed positions is connected by an uncertainty volume that looks like a bead – an ellipse in 2D, or a double-cone shape (two joined cones, one extending forward in time, one backward) in 3D. This is essentially the space-time prism view: each observation pair yields a prism (bead) of possible locations. Linking these beads forms a chain (hence a “necklace” of beads for the whole trajectory). Other variants include using line segments buffered by adaptive uncertainty radii, etc. The key idea is to capture both measurement error (the error in each reported position) and sampling error (the unknown path between samples) in the trajectory representation.

In summary, researchers have developed a spectrum of models for uncertain moving objects. At one extreme, a continuous deterministic model assumes no uncertainty (exact paths); at the other extreme, a fully probabilistic model assigns a detailed probability distribution to every possible path. Most practical approaches strike a balance – e.g., assume the object lies in a known bounded region (ellipse, cylinder, etc.) with either uniform probability or a simple distribution. The chosen model has direct impact on how we compute visit probabilities.

Visit Probability Defined

Given an uncertain trajectory model, we can formally define an object’s visit probability for a region query. Suppose we have a region $R$ (e.g., a circle or rectangle in the plane) and a query time or time interval. The visit probability $P$ for object $O$ with respect to $(R, t)$ is:

  • For a time instant $t$: $P=Pr(\text{object $O$ is inside region $R$ at time $t$})$.

If the object’s location at time $t$ is described by a probability density function (pdf) $f_O(x,y;t)$ over space, then this is the integral of the pdf over region $R$:

$$ P = \int\int_{(x,y)\in R} f_O(x,y;t)dxdy. $$

In practice, if we only have an uncertainty region (like an ellipse) without a detailed pdf, a common assumption is to take $f_O$ as uniform over that region. In that case, the visit probability simplifies to the fraction of the uncertainty region’s area (or volume) that intersects $R$. For example, if object $O$ could be anywhere in an ellipse of area 100 km², and 30 km² of that ellipse lies within region $R$, then $Pr(O \in R) = 0.3$ (30% chance). Some objects might lie entirely inside $R$ (probability 1), or entirely outside (probability 0). We can thus categorize each object as definitely in ($p=1$), definitely out ($p=0$), or possibly in ($0<p<1$). The nontrivial case is the “possibly in” scenario, which is exactly where the notion of visit probability provides a quantitative measure of how likely the object is in the query region.

For queries that consider a time interval $[t_1, t_2]$ rather than a single instant, the definition can be extended in different ways. One could ask for the probability that at some point during $[t_1,t_2]$, object $O$ is in region $R$ – essentially $Pr(\exists t\in[t_1,t_2] : O(t)\in R)$. Alternatively, one might ask if the object continuously remains in $R$ throughout the interval. These lead to different semantics (sometimes called “sometimes” vs. “always” queries). For example, an object might have 0.6 probability to enter a region at least once (sometime in the interval), but only 0.1 probability to be inside for the entire duration. In this survey, however, we focus on the simpler point-in-time queries or instantaneous queries, which are the basis for more complex temporal queries.

To make use of visit probabilities in query results, database researchers often incorporate a probability threshold. For instance, a Probabilistic Range Query (PRQ) is commonly defined as: “Given region $R$, time $t$, and threshold $\theta$, return all objects $O$ such that $Pr(O \in R \text{ at } t) \ge \theta$.” Formally, the answer is the set ${ O_i \mid Pr(\text{loc}_{O_i}(t)\in R) \ge \theta }$. The threshold $\theta$ (e.g. 0.5 or 0.8) filters out objects that have too low a chance of being in $R$, focusing on the more probable results. We might also retrieve the actual probability values for each object. In early systems, every result was often reported as a tuple $(O_i, p_i)$ with $p_i$ being the visit probability. For example, "(Car_17, 0.9)" might indicate Car_17 is in the query zone with 90% probability.

Probabilistic Spatio-Temporal Queries

With the notion of visit probability established, various query types in moving object databases have been extended to the probabilistic (uncertain) setting. Below we survey the most common types, focusing on how they use visit probabilities in their definitions and results.

1. Probabilistic Range Queries: As described above, a probabilistic range query1. Probabilistic Range Queries: As described, a probabilistic range query asks “which objects are in region $R$ (at time $t$) with at least some probability?” The result is a set of objects each with a visit probability $p_i = Pr(O_i \in R)$. Often a threshold is applied: e.g., “return all objects with $p_i \ge 0.5$.”

Figure1 Figure 1

Figure 1: Four objects (A, B, C, D) with uncertain locations (yellow circles indicate the area in which each object might be, with lighter shading for larger uncertainty). The dashed red square is a query region $R$. Object A’s uncertainty region lies entirely inside $R$ (so $Pr(A\in R)\approx 1.0$). Object D is completely outside $R$ ($Pr(D\in R)=0$). Objects B and C are partially in $R$, yielding intermediate visit probabilities (e.g., ~$0.7$ for B, ~$0.3$ for C). A probabilistic range query asking for objects in $R$ with probability $\ge 0.5$ would return A and B in this scenario. This illustrates how visit probability quantifies the likelihood of each object being in the query area.

Early work by Cheng et al. (2004) formally defined range queries in this manner (Zhang et al. (2009)). In their system, users could specify a threshold $\theta$, and the query would return all objects whose visit probability meets or exceeds $\theta$. This allows applications to tune the query strictness: a high $\theta$ (close to 1) yields only those objects almost certainly in the region (conservative answer), whereas a lower $\theta$ includes objects that might be in the region with non-negligible chance. Variants of range queries have also been studied for moving query regions and for time intervals. For example, continuous probabilistic range queries monitor the results as either the objects move or the query region itself moves (think of a moving search area) – see, e.g., Trajcevski et al. (2010) on querying uncertain trajectories with a moving query window (“necklace” queries).

2. Probabilistic Nearest-Neighbor Queries: Another important query type is finding the nearest object to a given location – for example, “which taxi is closest to point $Q$ at time $t$?” With uncertainty, we instead ask: “which object has the highest probability of being the nearest neighbor of $Q$?” This is the Probabilistic Nearest Neighbor (PNN) query. It can be defined in a threshold way or a ranking way. Cheng et al. (2004) defined the Top-$k$ Probabilistic Nearest Neighbors query, which returns the $k$ objects with the largest probability of being the nearest neighbor to query point $Q$ at time $t$ (Zhang et al. (2009)). For example, if we want the single most likely nearest object, we take $k=1$. If two taxis each have about a 50% chance of being nearest to a client (depending on their exact positions), a top-2 query could return both, with their probabilities.

Formally, if $Pr_{\text{NN}}(O_i)$ denotes the probability that object $O_i$ is the closest object to query $Q$, the query can return the list ${(O_{i_1}, Pr_{\text{NN}}(O_{i_1})), \dots, (O_{i_k}, Pr_{\text{NN}}(O_{i_k}))}$ in descending order of probability. Research by Kriegel and colleagues (2007) and others developed efficient methods for computing such queries for uncertain points (Zhang et al. (2009)), and later works extended this to full uncertain trajectories. Trajcevski et al. (2009) introduced continuous probabilistic NN queries for objects with uncertain motion, meaning the query considers an interval and identifies, at each moment, the probable nearest object. By 2013, Niedermayer et al. had proposed a Markov-chain-based approach for PNN queries over trajectories, even allowing the query itself to be a moving trajectory (Niedermayer et al. (2013)). They explored multiple query semantics – for instance, one could ask for the probability distribution of the nearest neighbor over a time window, or the most likely object to stay nearest for the entire duration. These nuances go beyond point-in-time queries but rely on the same concept of computing visit probabilities (in this case, the “visit” of being the nearest).

3. Other Query Types: Beyond range and NN queries, visit probability has been used in various spatio-temporal queries:

  • Probabilistic spatial joins: e.g., “find all pairs of objects that come within 100 meters of each other with probability ≥ 0.8.” This involves computing the probability that $||O_i(t) - O_j(t)|| \le 100$ (distance within 100m) and applying a threshold. Such queries are important for collision avoidance or proximity-based alerts. They extend the range query idea to two moving objects (the region in this case is a circle around one object).

  • Top-$k$ most probable trajectory matching: Given a query path $Q(t)$ (say a predicted route), find the $k$ database trajectories that have the highest probability of intersecting or matching $Q$. This could help in mapping applications (find likely encounters, etc.). Each candidate trajectory would have an associated probability of intersecting $Q$. This is a complex query type that might combine multiple visit probability computations along the path.

  • “Maybe” and “Must” queries: Some query languages (e.g., the Future Temporal Logic by Sistla et al. (1997)) introduced qualitative probabilistic queries. A “must” query asks for objects that are guaranteed to satisfy a condition (visit probability = 1), whereas a “may” query asks for objects that possibly satisfy it (visit probability > 0). For example, “Which trucks must enter Zone Z by 5 PM?” would only list trucks certain to do so, whereas “Which trucks may enter Zone Z by 5 PM?” lists all trucks that have a non-zero chance. These can be seen as special cases of probabilistic queries where $\theta=1$ or $\theta>0$. The cylinder model of Trajcevski et al. (2004) supported such semantics naturally: if the uncertainty cylinder of an object fully lies inside the query region, it’s a “must”; if it partly overlaps, it’s a “may”. The intermediate case of specifying an explicit probability threshold generalizes these concepts.

In all these queries, the core computational challenge is to determine the visit probabilities efficiently. A naive approach would be to enumerate many possible positions for each object (or many samples of its trajectory) and count how often it falls in the query region. While conceptually straightforward, that becomes infeasible as the number of objects and the complexity of their motion grows. In the next section, we discuss how researchers have tackled the efficiency aspect, enabling probabilistic queries to run on large databases in real time.

Query Processing Techniques

Probabilistic spatio-temporal queries are only useful if they can be answered within a reasonable time. Computing an exact visit probability might require integrating a probability density over a region or evaluating many possible trajectories, which is computationally heavy. Over the years, several key techniques emerged to make query processing feasible:

  • Spatial indexing of uncertainty regions: Index structures like the R-tree (commonly used for spatial objects) were adapted to handle moving objects with uncertainty. One simple idea is to index each object’s uncertainty bounding region at the query time. For example, if at time $t$ object $O$ could be anywhere in an ellipse $E$, we can index $E$ in a spatial index. Then a range query can quickly retrieve all objects whose uncertainty region intersects the query region $R$ (these are the only ones with non-zero visit probability). This serves as a first filter: if an object’s possible region lies entirely outside $R$, we skip it; if it lies entirely inside $R$, we might directly treat it as a hit (with probability 1 in some models). Objects with partial overlap need further probability calculation. Wolfson and colleagues introduced the idea of a Velocity Constrained Index (VCI), an R-tree variant where each index node stores the maximum speed of any object in that node. This helps prune search: for a range query at time $t$, if a whole group of objects moved too slowly to possibly reach the query region from their last known locations, the index can skip that group. Similarly, database optimizations considered “dead space” (areas the objects cannot reach due to motion constraints or road network limits) to cut down unnecessary computations.

  • Probabilistic pruning and bounds: Even if an object’s uncertainty region overlaps the query, its visit probability might be very low. Many algorithms compute quick upper bounds on the visit probability to decide if an object can be safely pruned. For instance, if object $O$’s uncertainty region intersects only a tiny fraction of $R$, one can compute a maximal possible $Pr(O\in R)$ (say under the assumption of a favorable distribution). If that upper bound is still below the query threshold $\theta$, we need not consider $O$ further. Conversely, a lower bound can identify objects that certainly qualify. Techniques like these avoid costly exact probability integration for every object. Cheng et al. (2003) described using conservative approximations of uncertainty that guarantee to cover all possible positions (Zhang et al. (2009)). Objects could be filtered by these approximations before finer examination.

  • Discrete approximations and precomputation: As mentioned, one approach is to discretize the uncertainty. Some systems precompute a grid or set of sample points for each object’s possible location distribution (Zhang et al. (2009)). Then query processing reduces to summing the probabilities of those points that fall inside the query region. Indexes like the Px-tree (an extension of the Bx-tree for moving objects) inserted these discretized “possible positions” into B+-trees keyed by space-filling curves (Zhang et al. (2009)). The index could then retrieve all possible positions in $R$ quickly. Although this multiplies the number of indexed entries (one object corresponds to multiple possible cells), it leverages fast index lookups and avoids complex geometry at query time – essentially trading off some storage and preprocessing to gain runtime speed.

  • Monte Carlo and sampling methods: When analytic solutions are too slow or intractable (especially with correlated uncertainties or complex motion models), sampling provides a general workaround. The idea is to generate many possible worlds or trajectories for each object (according to its probability model), and then answer the query by statistical estimation. For example, Niedermayer et al. (2013) use a sophisticated sampling approach (with Bayesian inference to condition on known observations) to answer nearest-neighbor queries under their Markov chain model (Niedermayer et al. (2013)). By simulating a large number of plausible trajectories consistent with the known data, one can estimate visit probabilities by the fraction of simulations where the object meets the query condition. This is essentially a Monte Carlo integration of the probability. The downside is that to achieve high precision (say within a few percentage points of error), a large number of samples may be required, and thus this approach is usually a fallback when no efficient exact method exists.

  • Incremental and continuous query processing: In scenarios where queries persist over time (e.g., a standing query that continuously monitors an area), techniques were developed to update query results incrementally as objects move. For deterministic movement, the concept of “safe zones” around the query can limit updates. In the probabilistic case, one can maintain the probability distribution of each object with respect to the query over time. Techniques like those by Trajcevski et al. (2011) rank the objects continuously by their probability of being nearest neighbor, updating ranks only when certain events (like probabilities crossing each other) occur. This prevents re-computation from scratch at every time tick. Essentially, the system predicts when the current answer might change and only then recomputes relevant probabilities.

The above methods often are used in combination. For example, a typical system (circa 2010s) might use an R-tree or grid index to find candidate objects, then apply probabilistic bounds to filter obvious in/out cases, and finally perform exact integration or refined sampling for the remaining uncertain cases. By using such multi-step processing (“filter and refine”), researchers have demonstrated scalable performance. For instance, Querying Imprecise Data in Moving Object Environments by Cheng, Kalashnikov, and Prabhakar (2004) presented algorithms that could answer probabilistic range and nearest queries quickly by utilizing motion constraints and indexing. Subsequent studies in the late 2000s and 2010s (e.g., Emrich et al., 2012; Xu et al., 2013) improved on these by handling more complex uncertainty models and larger databases.

Conclusion

The notion of visit probability had become an integral part of spatio-temporal query processing in moving object databases. It allows query answers that gracefully handle uncertainty: instead of brittle yes/no answers that might be wrong if data is imprecise, we get probabilistic answers that quantify confidence. We surveyed how different data models (from simple uncertainty circles to Markov chain models) define the probability distributions needed for computing visit probabilities. The concept is used in a variety of queries – from range searches to nearest neighbors and beyond – enabling applications like location-based services, fleet management, and surveillance to make decisions based on likelihoods rather than absolutes.

Research has provided a rich toolkit for working with these queries. We have formal query semantics (e.g., probabilistic range query as in Cheng et al., 2004) and efficient algorithms leveraging spatial indexes, precomputed distributions, and analytic bounds. Techniques like the space-time prism (ellipses, cylinders) give geometric intuition to uncertain motion and form the basis for many query processing methods. At the same time, more advanced probabilistic models push the limits of what queries can express (e.g., predicting future positions with learned patterns) at the cost of higher complexity, sometimes mitigated by sampling-based approaches.

In an informative but not overly formal tone, we illustrated key ideas such as how an object’s visit probability is computed and used to answer queries. The state of the art had achieved a mature understanding of fundamental query types under uncertainty, with ongoing work addressing specialized scenarios (like uncertain trajectories on road networks, indoor positioning uncertainties, and real-time big data streams). In summary, visit probability serves as a unifying concept that enables MODs to provide useful answers despite incomplete information, and the research community has developed both the theoretical foundations and practical methods to handle such probabilistic queries effectively (Pfoser & Jensen 1999; Trajcevski et al. 2004; Cheng et al. 2004, among many others). Moving forward, with increasing volumes of trajectory data and advances in machine learning, we can expect even more sophisticated models for uncertainty and even more powerful query capabilities – but they will continue to build on the core principles of visit probability outlined in this survey.

References

  1. Pfoser, D., & Jensen, C. S. (1999). Capturing the uncertainty of moving-object representations. In Advances in Spatial Databases (pp. 111-131). Springer. https://doi.org/10.1007/3-540-48482-5_9

  2. Trajcevski, G., Wolfson, O., Zhang, F., & Chamberlain, S. (2002). The geometry of uncertainty in moving objects databases. In Proceedings of the 8th International Conference on Extending Database Technology (pp. 233-250). ACM. https://doi.org/10.1145/1016028.1016030

  3. Cheng, R., Kalashnikov, D. V., & Prabhakar, S. (2004). Querying imprecise data in moving object environments. IEEE Transactions on Knowledge and Data Engineering, 16(9), 1112-1127. https://doi.org/10.1109/TKDE.2004.46

  4. Kriegel, H. P., Kunath, P., & Renz, M. (2007). Probabilistic nearest-neighbor query on uncertain objects. In Advances in Databases: Concepts, Systems and Applications (pp. 337-348). Springer. https://doi.org/10.1007/978-3-540-71703-4_30

  5. Trajcevski, G., Wolfson, O., Zhang, F., & Chamberlain, S. (2009). The geometry of uncertainty in moving objects databases. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (pp. 687-698). ACM. https://doi.org/10.1145/1516360.1516460

  6. Trajcevski, G., Wolfson, O., Zhang, F., & Chamberlain, S. (2010). Querying uncertain trajectories. In 2010 Eleventh International Conference on Mobile Data Management (pp. 3-12). IEEE. https://doi.org/10.1109/MDM.2010.76

  7. Trajcevski, G., Wolfson, O., Zhang, F., & Chamberlain, S. (2011). Querying uncertain trajectories. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (pp. 687-698). ACM. https://doi.org/10.1145/1951365.1951400

  8. Emrich, T., Kriegel, H. P., Mamoulis, N., Renz, M., & Züfle, A. (2012). Querying uncertain spatio-temporal data. In 2012 IEEE 28th International Conference on Data Engineering (pp. 354-365). IEEE. https://doi.org/10.1109/ICDE.2012.94

  9. Xu, J., Güting, R. H., & Zheng, Y. (2013). Uncertain trajectory indexing and querying. Remote Sensing, 5(12), 15867-15887. https://doi.org/10.3390/rs71215867

  10. Sistla, A. P., Wolfson, O., Chamberlain, S., & Dao, S. (1997). Modeling and querying moving objects. In Proceedings of the 13th International Conference on Data Engineering (pp. 422-432). IEEE. http://dx.doi.org/10.1109/ICDE.1998.655787

  11. Zhang, D., Lin, D., Bertino, E., & Ooi, B. C. (2009). Probabilistic spatial queries on moving objects. In Proceedings of the 12th International Conference on Extending Database Technology (pp. 60-71). ACM. https://doi.org/10.14778/1687627.1687762

  12. Niedermayer, J., Züfle, A., Emrich, T., Renz, M., & Kriegel, H. P. (2013). Probabilistic nearest neighbor queries on uncertain moving object trajectories. Proceedings of the VLDB Endowment, 6(14), 1782-1793. https://doi.org/10.14778/2732232.2732239