I need an in-memory two-dimensional index over my data. Usage scenario:

  1. Rare bulk writes: new elements are added in large chunks, and additions are very infrequent compared to reads.

  2. Frequent reads. A range query (a < x < b AND n < y < m) should be fast. I am not giving any metrics for what "fast" means, because it evidently depends on many things that are outside the scope of this question.

  3. All data is in memory.
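
To make the query semantics concrete, here is a minimal sketch of the range predicate as a plain linear scan. The class name `RangeScan` and the `{x, y}` array layout are assumptions made for illustration, not part of my actual code. It is only useful as a correctness baseline to compare any index against:

```java
import java.util.ArrayList;
import java.util.List;

/** Brute-force baseline (hypothetical helper): checks every point against the window. */
final class RangeScan {
    /** Returns all points {x, y} with a < x < b AND n < y < m (bounds are exclusive). */
    static List<double[]> query(double[][] points, double a, double b, double n, double m) {
        List<double[]> hits = new ArrayList<>();
        for (double[] p : points) {
            if (a < p[0] && p[0] < b && n < p[1] && p[1] < m) {
                hits.add(p);
            }
        }
        return hits;
    }
}
```

Any candidate index should return the same set of points as this scan for the same window.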

I have tested a couple of options:

  1. Quadtree. Unfortunately, range queries are not fast enough, especially when the query window intersects multiple high-level quads.
  2. R-tree. Although queries run faster than with the quadtree, it seems too complex to me. Also, from what I gathered from the papers, the R-tree is designed to work with paged data.

What other options should I consider, and which of them gives the best range query performance?

  • The question is too theoretical. Even quadtree should be fast enough in most cases. Maybe you implemented it wrong? What language do you use? Maybe there is too much disk swapping going on. How large is your dataset and how big is your RAM? Commented Dec 1, 2019 at 1:25
  • @Dialecticus, I am using Java. No disk swapping is happening; the dataset is about 10G and fits in the heap (which is about 32G). Also, no GC happens during the test, so it cannot affect query time. Commented Dec 1, 2019 at 14:12

1 Answer


As said in the comments above, quadtrees should be quite fast, but they are a bit difficult to get right with respect to numerical precision when dividing by 2 many times. R-Trees are slower to build and more complex. They are indeed good for disk storage, but in memory you can tune the node size freely until it gives the best performance. If bulk-loading is OK, have a look at STR-R-Trees (sort-tile-recursive), which are slow to build but give the best performance afterwards. Otherwise, R*Trees (RStarTrees) are best in my experience.
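
To illustrate the bulk-loading idea, here is a rough sketch of sort-tile-recursive packing for 2D points. The class name `StrTree` and the `nodeSize` parameter are assumptions made for this example and are not taken from any library. Leaves are tiled the STR way (sort by x, cut into vertical slices, sort each slice by y); the upper levels simply group consecutive nodes, which is a simplification of full STR:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Minimal STR-packed R-tree over 2D points; a sketch, not a tuned implementation. */
final class StrTree {
    /** A node is either a leaf (points != null) or an inner node (children != null). */
    private static final class Node {
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        double[][] points;   // leaf payload: {x, y} pairs
        List<Node> children; // inner payload

        void extend(double x0, double y0, double x1, double y1) {
            minX = Math.min(minX, x0); minY = Math.min(minY, y0);
            maxX = Math.max(maxX, x1); maxY = Math.max(maxY, y1);
        }
    }

    private final Node root;

    /** nodeSize is the tunable fan-out (max entries per node); must be at least 2. */
    StrTree(double[][] input, int nodeSize) {
        if (nodeSize < 2) throw new IllegalArgumentException("nodeSize must be >= 2");
        List<Node> level = packLeaves(input.clone(), nodeSize);
        while (level.size() > 1) level = packLevel(level, nodeSize);
        root = level.isEmpty() ? new Node() : level.get(0);
    }

    /** STR tiling: sort by x, cut into vertical slices, sort each slice by y, fill leaves. */
    private static List<Node> packLeaves(double[][] pts, int m) {
        List<Node> leaves = new ArrayList<>();
        int n = pts.length;
        if (n == 0) return leaves;
        Arrays.sort(pts, Comparator.comparingDouble(p -> p[0]));
        int leafCount = (n + m - 1) / m;
        int slices = (int) Math.ceil(Math.sqrt(leafCount));
        int sliceSize = slices * m; // points per vertical slice
        for (int s = 0; s < n; s += sliceSize) {
            int end = Math.min(s + sliceSize, n);
            Arrays.sort(pts, s, end, Comparator.comparingDouble(p -> p[1]));
            for (int i = s; i < end; i += m) {
                Node leaf = new Node();
                leaf.points = Arrays.copyOfRange(pts, i, Math.min(i + m, end));
                for (double[] p : leaf.points) leaf.extend(p[0], p[1], p[0], p[1]);
                leaves.add(leaf);
            }
        }
        return leaves;
    }

    /** Simplified upper levels: group runs of m consecutive nodes under one parent. */
    private static List<Node> packLevel(List<Node> level, int m) {
        List<Node> parents = new ArrayList<>();
        for (int i = 0; i < level.size(); i += m) {
            Node parent = new Node();
            parent.children = new ArrayList<>(level.subList(i, Math.min(i + m, level.size())));
            for (Node c : parent.children) parent.extend(c.minX, c.minY, c.maxX, c.maxY);
            parents.add(parent);
        }
        return parents;
    }

    /** Collects all points with a < x < b AND n < y < m (exclusive bounds, as in the question). */
    List<double[]> query(double a, double b, double n, double m) {
        List<double[]> out = new ArrayList<>();
        collect(root, a, b, n, m, out);
        return out;
    }

    private static void collect(Node node, double a, double b, double n, double m,
                                List<double[]> out) {
        // Prune subtrees whose bounding box cannot contain a matching point.
        if (!(node.maxX > a && node.minX < b && node.maxY > n && node.minY < m)) return;
        if (node.points != null) {
            for (double[] p : node.points)
                if (a < p[0] && p[0] < b && n < p[1] && p[1] < m) out.add(p);
        } else {
            for (Node c : node.children) collect(c, a, b, n, m, out);
        }
    }
}
```

With something like this, `new StrTree(points, 16).query(a, b, n, m)` answers a window query, and varying the second argument is one way to experiment with node size in memory.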

If you are using Java, you may want to look at my TinSpin index collection, where you can try out multiple indexes, including a PH-Tree (a bit-wise quadtree), which is very fast to build and also very good with (small) window queries. For large window queries (1000 results per window or more), R-Trees are probably a bit better.
