nosql – HBase row key design for monotonically increasing keys

How should a row key be designed so that the row with key ~10 comes last?

You see the scan output in this order because HBase keeps rowkeys sorted lexicographically, irrespective of insertion order. A rowkey is treated as an array of bytes and keys are compared byte by byte, so the ordering follows the string representation rather than the numeric value. The lowest rowkey appears first in a table. That's why 10 appears before 2, and so on. See the section on Rows on this page to learn more about this.
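To make the ordering concrete, here is a small sketch in plain Python (not the HBase client). It relies on the fact that Python compares `bytes` objects byte by byte, which mirrors the lexicographic comparison described above. The `row~` prefix is just a placeholder for illustration.

```python
# Keys as they might be inserted, in numeric order: row~1 ... row~11.
# The "row~" prefix is a hypothetical placeholder.
inserted = [f"row~{i}".encode("utf-8") for i in range(1, 12)]

# Sorting bytes objects compares them byte by byte, like HBase does for rowkeys.
scan_order = sorted(inserted)

for key in scan_order:
    print(key.decode("utf-8"))
# Prints row~1, row~10, row~11, row~2, row~3, ... row~9
```

The byte for `1` is smaller than the byte for `2`, so every key starting with `1` sorts ahead of `row~2`, no matter how many digits follow.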

When you left-pad the integers with zeros, their natural ordering is preserved under lexicographic sorting, which is why the scan order then matches the order in which you inserted the data. To do that, you can design your rowkeys as suggested by @shutty.
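A minimal sketch of the padding idea, again in plain Python rather than the HBase client. The helper name `padded_key`, the `row` prefix and the width of 4 digits are assumptions for illustration. Pick a width large enough for the biggest number you ever expect to store.

```python
def padded_key(prefix: str, n: int, width: int = 4) -> bytes:
    """Build a fixed-width key such as b'row~0010' (hypothetical helper)."""
    if n < 0 or len(str(n)) > width:
        # A number wider than the padding would break the sort order again.
        raise ValueError(f"{n} does not fit in {width} digits")
    return f"{prefix}~{n:0{width}d}".encode("utf-8")


keys = [padded_key("row", i) for i in range(1, 12)]

# With fixed-width keys, byte order and numeric order agree.
print(sorted(keys) == keys)  # True
```

The width is effectively part of the schema: changing it later means rewriting the existing keys, so it is worth choosing generously up front.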

I'm looking for some recommended or popular ways of designing HBase row keys.

There are some general guidelines to follow in order to devise a good design:

  • Keep the rowkey as small as possible.
  • Avoid using monotonically increasing rowkeys, such as timestamps. This is a poor schema design and leads to RegionServer hotspotting. If you can't avoid them, use some technique, like hashing or salting, to avoid hotspotting.
  • Avoid using Strings as rowkeys if possible. The String representation of a number takes more bytes than its integer or long representation. For example, a long is 8 bytes. You can store an unsigned number up to 18,446,744,073,709,551,615 in those eight bytes. If you stored this number as a String (presuming a byte per character), you would need nearly 3x the bytes.
  • Use some mechanism, like hashing, to get a uniform distribution of rows in case your regions are not evenly loaded. You could also create pre-split tables to achieve this.
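The size argument from the list above can be checked with a short sketch. It uses Python's `struct` module as a stand-in for whatever binary encoding your client uses. The big-endian unsigned format `">Q"` is an assumption chosen so that byte order matches numeric order for non-negative values.

```python
import struct

value = 18_446_744_073_709_551_615  # largest unsigned 64-bit number, as in the text

as_binary = struct.pack(">Q", value)    # fixed 8 bytes, big-endian
as_string = str(value).encode("ascii")  # one byte per decimal digit

print(len(as_binary), len(as_string))  # 8 20

# Big-endian fixed-width encoding also keeps numeric order when sorted as bytes.
numbers = [1, 2, 10, 300, 70_000]
encoded = [struct.pack(">Q", n) for n in numbers]
print(sorted(encoded) == encoded)  # True
```

So a binary key is both smaller and naturally sortable, at the cost of not being human-readable in the shell.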

See this link for more on rowkey design.

HTH

HBase stores rowkeys in lexicographical order, so you can try using this schema with fixed-length rowkeys:

<prefix>~0001
<prefix>~0002
<prefix>~0003
...
<prefix>~0009
<prefix>~0010

Keep in mind that you should also use random prefixes to avoid region hotspotting (when a single region accepts most of the writes while the other regions are idle).
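One way to sketch such a prefix is shown below. Note that it swaps the random prefix for a deterministic one derived from a hash of the key, so that a reader can recompute the prefix for a point lookup. The bucket count of 16, the helper name `salted_key` and the key layout are all assumptions for illustration, not an HBase API.

```python
import hashlib

NUM_BUCKETS = 16  # assumed value; often chosen to match the number of pre-split regions


def salted_key(sequence: int, width: int = 10) -> bytes:
    """Prefix a zero-padded sequence number with a hash-derived bucket (hypothetical helper)."""
    body = f"{sequence:0{width}d}"
    digest = hashlib.sha256(body.encode("utf-8")).digest()
    bucket = digest[0] % NUM_BUCKETS  # same input always maps to the same bucket
    return f"{bucket:02d}~{body}".encode("utf-8")


keys = [salted_key(i) for i in range(1000)]
buckets_used = {key[:2] for key in keys}
print(len(buckets_used), "buckets used")
```

The trade-off is that consecutive sequence numbers no longer sit next to each other, so reading a range in order means running one scan per bucket and merging the results.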

Monotonically increasing keys aren't a good schema for HBase.
You can read more here:
http://hbase.apache.org/book/rowkey.design.html

There is also a link there to OpenTSDB, which solves this problem.
