Generate a Hash from string in Javascript

Generate a Hash from string in Javascript

String.prototype.hashCode = function() {
  var hash = 0, i, chr;
  if (this.length === 0) return hash;
  for (i = 0; i < this.length; i++) {
    chr   = this.charCodeAt(i);
    hash  = ((hash << 5) - hash) + chr;
    hash |= 0; // Convert to 32bit integer
  }
  return hash;
};

Source:
http://werxltd.com/wp/2010/05/13/javascript-implementation-of-javas-string-hashcode-method/

Many of the answers here are the same String.hashCode hash function taken from Java. It dates back to 1981 from Gosling Emacs, is extremely weak, and makes zero sense performance-wise in modern JavaScript. In fact, implementations could be significantly faster by using ES6 Math.imul, but no one took notice. We can do much better than this, at essentially identical performance.

Heres one I did—cyrb53, a simple but high quality 53-bit hash. Its quite fast, provides very good* hash distribution, and because it outputs 53 bits, has significantly lower collision rates compared to any 32-bit hash.

const cyrb53 = function(str, seed = 0) {
    let h1 = 0xdeadbeef ^ seed, h2 = 0x41c6ce57 ^ seed;
    for (let i = 0, ch; i < str.length; i++) {
        ch = str.charCodeAt(i);
        h1 = Math.imul(h1 ^ ch, 2654435761);
        h2 = Math.imul(h2 ^ ch, 1597334677);
    }
    h1 = Math.imul(h1 ^ (h1>>>16), 2246822507) ^ Math.imul(h2 ^ (h2>>>13), 3266489909);
    h2 = Math.imul(h2 ^ (h2>>>16), 2246822507) ^ Math.imul(h1 ^ (h1>>>13), 3266489909);
    return 4294967296 * (2097151 & h2) + (h1>>>0);
};

*It is roughly similar to the well-known MurmurHash/xxHash algorithms. It uses a combination of multiplication and Xorshift to generate the hash, but not as thorough. As a result its faster than either would be in JavaScript and significantly simpler to implement, but may not pass all tests in SMHasher. This is not a cryptographic hash function, so dont use this for security purposes.

Like any proper hash, it has an avalanche effect, which basically means small changes in the input have big changes in the output making the resulting hash appear more random:

501c2ba782c97901 = cyrb53(a)
459eda5bc254d2bf = cyrb53(b)
fbce64cc3b748385 = cyrb53(revenge)
fb1d85148d13f93a = cyrb53(revenue)

You can optionally supply a seed (unsigned integer, 32-bit max) for alternate streams of the same input:

76fee5e6598ccd5c = cyrb53(revenue, 1)
1f672e2831253862 = cyrb53(revenue, 2)
2b10de31708e6ab7 = cyrb53(revenue, 3)

Technically, it is a 64-bit hash, that is, two uncorrelated 32-bit hashes computed in parallel, but JavaScript is limited to 53-bit integers. If convenient, the full 64-bit output can be used by altering the return statement with a hex string or array.

return [h2>>>0, h1>>>0];
// or
return (h2>>>0).toString(16).padStart(8,0)+(h1>>>0).toString(16).padStart(8,0);
// or 
return 4294967296n * BigInt(h2) + BigInt(h1);

Be aware that constructing hex strings drastically slows down batch processing. The array is much more efficient, but obviously requires two checks instead of one. I also included BigInt, which should be slightly faster than String, but still much slower than Array or Number.


Just for fun, heres TinySimpleHash, the smallest hash I could come up with thats still decent. Its a 32-bit hash in 89 chars with better quality randomness than even FNV or DJB2:

TSH=s=>{for(var i=0,h=9;i<s.length;)h=Math.imul(h^s.charCodeAt(i++),9**9);return h^h>>>9}

Generate a Hash from string in Javascript

EDIT

based on my jsperf tests, the accepted answer is actually faster: http://jsperf.com/hashcodelordvlad

ORIGINAL

if anyone is interested, here is an improved ( faster ) version, which will fail on older browsers who lack the reduce array function.

hashCode = function(s){
  return s.split().reduce(function(a,b){a=((a<<5)-a)+b.charCodeAt(0);return a&a},0);              
}

one-liner arrow function version :

hashCode = s => s.split().reduce((a,b)=>{a=((a<<5)-a)+b.charCodeAt(0);return a&a},0)

Leave a Reply

Your email address will not be published.