# data structures – Are duplicate keys allowed in the definition of binary search trees?

## data structures – Are duplicate keys allowed in the definition of binary search trees?

Many algorithms will specify that duplicates are excluded. For example, the example algorithms in the MIT Algorithms book usually present examples without duplicates. It is fairly trivial to implement duplicates (either as a list at the node, or in one particular direction.)

Most (that Ive seen) specify left children as <= and right children as >. Practically speaking, a BST which allows either of the right or left children to be equal to the root node, will require extra computational steps to finish a search where duplicate nodes are allowed.

It is best to utilize a list at the node to store duplicates, as inserting an = value to one side of a node requires rewriting the tree on that side to place the node as the child, or the node is placed as a grand-child, at some point below, which eliminates some of the search efficiency.

You have to remember, most of the classroom examples are simplified to portray and deliver the concept. They arent worth squat in many real-world situations. But the statement, every element has a key and no two elements have the same key, is not violated by the use of a list at the element node.

So go with what your data structures book said!

Edit:

Universal Definition of a Binary Search Tree involves storing and search for a key based on traversing a data structure in one of two directions. In the pragmatic sense, that means if the value is <>, you traverse the data structure in one of two directions. So, in that sense, duplicate values dont make any sense at all.

This is different from BSP, or binary search partition, but not all that different. The algorithm to search has one of two directions for travel, or it is done (successfully or not.) So I apologize that my original answer didnt address the concept of a universal definition, as duplicates are really a distinct topic (something you deal with after a successful search, not as part of the binary search.)

If your binary search tree is a red black tree, or you intend to any kind of tree rotation operations, duplicate nodes will cause problems. Imagine your tree rule is this:

left < root <= right

Now imagine a simple tree whose root is 5, left child is nil, and right child is 5. If you do a left rotation on the root you end up with a 5 in the left child and a 5 in the root with the right child being nil. Now something in the left tree is equal to the root, but your rule above assumed left < root.

I spent hours trying to figure out why my red/black trees would occasionally traverse out of order, the problem was what I described above. Hopefully somebody reads this and saves themselves hours of debugging in the future!

#### data structures – Are duplicate keys allowed in the definition of binary search trees?

All three definitions are acceptable and correct. They define different variations of a BST.

Your college data structures book failed to clarify that its definition was not the only possible.

Certainly, allowing duplicates adds complexity. If you use the definition left <= root < right and you have a tree like:

``````      3
/
2       4
``````

then adding a 3 duplicate key to this tree will result in:

``````      3
/
2       4

3
``````

Note that the duplicates are not in contiguous levels.

This is a big issue when allowing duplicates in a BST representation as the one above: duplicates may be separated by any number of levels, so checking for duplicates existence is not that simple as just checking for immediate childs of a node.

An option to avoid this issue is to not represent duplicates structurally (as separate nodes) but instead use a counter that counts the number of occurrences of the key. The previous example would then have a tree like:

``````      3(1)
/
2(1)     4(1)
``````

and after insertion of the duplicate 3 key it will become:

``````      3(2)
/
2(1)     4(1)
``````

This simplifies lookup, removal and insertion operations, at the expense of some extra bytes and counter operations.