开发者

compact storage for non static binary trees

开发者 https://www.devze.com 2023-04-10 04:52 出处:网络
I have seen array based implementations of static binary trees which do not waste memory for pointers and instead do operations on the current index to go to its parent or children. Is there any artic

I have seen array based implementations of static binary trees which do not waste memory for pointers and instead do operations on the current index to go to its parent or children. Is there any articles that talk about similar methods for binary trees where you will have to insert or delete. I c开发者_运维百科an see that array will no longer be useful for this unless you have an upper bound on the number of insertions that are allowed.


It's always possible to build a binary tree in an array, using simple arithmetic to find child nodes from the parent. A common method (particularly for binary heaps) is to use the following...

left_child_index  = (2 * parent_index) + 1
right_child_index = (2 * parent_index) + 2

So the root node at 0 has children at 1 and 2, the node at 1 has children at 3 and 4 etc.

The downside of this scheme is that while you gain space by not storing pointers, you generally lose space by needing to leave gaps in the array for unused nodes. Binary heaps avoid this by being complete binary trees - every node in the range for the current number of items is valid. This works for heaps, but cannot work for binary search tree operations.

As long as you can resize your arrays (e.g. std::vector in C++), you don't need to place an upper bound on the number of inserts, but you can end up with a lot of gaps in the deeper parts of your array, especially if the tree gets unbalanced.

You also need some way to determine if a position in the array contains a valid node or not - either a flag or a data value that cannot occur in a valid node. Flags could potentially be stored as a packed bit-array, separate from the main nodes.

Another disadvantage is that restructuring the tree means moving data around - not just adjusting pointers. Pointer rotations (needed for many balanced binary trees, such as red-black trees and AVL trees) become potentially very costly operations - they don't just need to move the usual three nodes, but the entire subtree descended from the rotated nodes.

That said, if your items are very small, and if either your tree will stay small or you're OK with a simple unbalanced tree, it's just about possible that this scheme could be useful. This could just about be plausible as a set-of-integers data structure, maybe.

BTW - "plausible" doesn't mean "recommended". Even if you manage to find a case where it's more efficient, I'd find it hard to believe the development time was justified.

Possibly more useful...

Multiway trees contain small arrays of items in each node, rather than the usual one key. They are most often used for database indexes on hard disk. The most well known are B trees, B+ trees and B* trees.

Multiway trees have child node pointers, but for a node that can hold at most n keys, the number of child pointers is typically either n or n+1 - not twice n. Also, a common strategy is to use different node layouts for branch and leaf nodes. Only branch nodes have child pointers. Every data item is in a leaf node, and only leaf nodes contain non-key data. Branch nodes are purely used to guide searches. Since leaf nodes are by far the most numerous nodes, not having child pointers in them is a useful saving.

However - multiway tree nodes are rarely packed full. Again, there's a space overhead for unused array slots. The usual rule is that every node (except the root) must be at least half full. Some implementations put quite a bit of effort into avoiding splitting nodes, and thereby minimizing space overheads, but there is generally an expected overhead roughly proportional to the number of items.

I've also heard of a form of tree that holds multiple keys per node, but only has two child pointers per node. I can't even remember what this is called, I'm afraid.

It's also possible to store (parent pointer, child pointer) pairs in a separate data structure. This is fairly common for representing trees in databases, using a table of (parent ID, child ID) pairs, or a table of (parent ID, sibling index, child ID) triples or whatever. One advantage is that you don't need to store the 'null' pointers.

However, possibly the best option, rather than trying to reduce or eliminate the overhead for storing pointers, is to put that overhead to better use. Threaded binary trees make better use of child pointers for supporting efficient traversals of the tree - http://en.wikipedia.org/wiki/Threaded_binary_tree.


In C++ at least, part of the benefit of using an array instead of individually allocated structures is to avoid the overhead of creating each object. (An array of structures in C++ is continuous in memory, with no headers or allocation alignment issues) Saving one pointer might be small by comparison.

Unfortunately, in Java an array of Objects don't work that way, so using an array won't give you the benefit you might imagine. In C++ a reference to each object is calculated, but in Java a reference to each object is stored in memory, even if they happen to be continuous.

The only thing Java does for you is to use 32-bit references in a 64-bit JVM.

Unless you have a memory limited device, or an exceedingly large data structure (e.g. many millions of elements), you are highly unlikely to notice the difference and you can buy 16 GB for less than £100.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号