cz.cuni.jagrlib
Class HashFunction

java.lang.Object
  extended by cz.cuni.jagrlib.HashFunction

public final class HashFunction
extends java.lang.Object

General purpose hash function algorithms library. Original author: Arash Partow (2002), http://www.partow.net/ Adopted by: Josef Pelikan (2006). Copyright notice: Common Public License (http://www.opensource.org/licenses/cpl.php).

Since:
0.24
See Also:
HashFunctions.java

Nested Class Summary
static class HashFunction.HashType
          Implemented hash-functions.
 
Field Summary
protected  byte[] byteArray
           
protected  int endIndex
          Points after the data segment to be processed.
protected  char[] charArray
           
protected  int index
          Index to actually processed array.
protected  int[] intArray
           
protected  long[] longArray
           
protected  long param
          Hash-function parameter.
protected  int permutation
          Permutation ordinal number (for coordinate hashing).
protected  short[] shortArray
           
protected  short[] staticArray
           
protected  java.lang.String string
          Hash data: String.
protected  HashFunction.HashType type
          Current hash-function type.
 
Constructor Summary
HashFunction()
           
HashFunction(HashFunction.HashType type)
           
HashFunction(HashFunction.HashType type, long parameter)
           
 
Method Summary
protected  long APHash()
          An algorithm produced by original author of this library: Arash Partow.
protected  long BKDRHash()
          This hash function comes from Brian Kernighan and Dennis Ritchie's book "The C Programming Language".
protected  long DEKHash()
          An algorithm proposed by Donald E.
protected  void distribute(int x)
          Distributes integer value (32-bit) into two staticArray items.
protected  void distribute(long x)
          Distributes long value (64-bit) into four staticArray items.
protected  long DJBHash()
          An algorithm produced by Professor Daniel J.
protected  long ELFHash()
          Similar to the PJW Hash function, but tweaked for 32-bit processors.
protected  void finishDistribution()
          Finish staticArray distribution.
protected  long hash()
          Internal hashing function (switches between the implementations).
 long hash(byte[] a)
          Hashing the whole byte-array.
 long hash(byte[] a, int from, int to)
          Hashing the segment from byte-array.
 long hash(double x, double y)
          Hashing double coordinate tuple.
 long hash(double x, double y, double z)
          Hashing double coordinate triple.
 long hash(float x, float y)
          Hashing float coordinate tuple.
 long hash(float x, float y, float z)
          Hashing float coordinate triple.
 long hash(char[] a)
          Hashing the whole char-array.
 long hash(char[] a, int from, int to)
          Hashing the segment from char-array.
 long hash(int[] a)
          Hashing the whole int-array.
 long hash(int[] a, int from, int to)
          Hashing the segment from int-array.
 long hash(int x, int y)
          Hashing integer coordinate tuple.
 long hash(int x, int y, int z)
          Hashing integer coordinate triple.
 long hash(long[] a)
          Hashing the whole long-array.
 long hash(long[] a, int from, int to)
          Hashing the segment from long-array.
 long hash(short[] a)
          Hashing the whole short-array.
 long hash(short[] a, int from, int to)
          Hashing the segment from short-array.
 long hash(java.lang.String s)
          Hashing the whole string.
 long hash(java.lang.String s, int from, int to)
          Hashing the substring.
protected  void initDistribution()
          Initialize staticArray distribution.
protected  long JSHash()
          A bitwise hash function written by Justin Sobel.
protected  long LCGHash()
          LCG-based hash function (LCG is from "Numerical Recipes in C").
 long maxHash()
          Hash-function returns values: 0 to maxHash()-1.
protected  int next()
          Returns the next item (char, number).
protected  void permutate()
          Permutates the staticArray[ 0 .. endIndex-1 ] by permutation order "permutation".
protected  long PJWHash()
          This hash algorithm is based on work by Peter J.
protected  long RSHash()
          A simple hash function from Robert Sedgwicks Algorithms in C book.
protected  long SDBMHash()
          This is the algorithm of choice which is used in the open source SDBM project.
 long setParam(long parameter)
           
 int setPerm(int perm)
           
 void setType(HashFunction.HashType type)
           
 void setVariation(int var)
          Sets hash-function variation.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

type

protected HashFunction.HashType type
Current hash-function type.


param

protected long param
Hash-function parameter. While used by BKDR, it should be: 31 131 1313 13131 131313, ...


permutation

protected int permutation
Permutation ordinal number (for coordinate hashing).


index

protected int index
Index to actually processed array.


endIndex

protected int endIndex
Points after the data segment to be processed.


string

protected java.lang.String string
Hash data: String.


byteArray

protected byte[] byteArray

shortArray

protected short[] shortArray

charArray

protected char[] charArray

intArray

protected int[] intArray

longArray

protected long[] longArray

staticArray

protected short[] staticArray
Constructor Detail

HashFunction

public HashFunction()

HashFunction

public HashFunction(HashFunction.HashType type)

HashFunction

public HashFunction(HashFunction.HashType type,
                    long parameter)
Method Detail

initDistribution

protected final void initDistribution()
Initialize staticArray distribution. Use distribute() methods afterwards.


distribute

protected final void distribute(int x)
Distributes integer value (32-bit) into two staticArray items.


distribute

protected final void distribute(long x)
Distributes long value (64-bit) into four staticArray items.


finishDistribution

protected final void finishDistribution()
Finish staticArray distribution. Prepares staticArray[] for hash() methods.


next

protected final int next()
Returns the next item (char, number).


hash

protected final long hash()
Internal hashing function (switches between the implementations).


permutate

protected final void permutate()
Permutates the staticArray[ 0 .. endIndex-1 ] by permutation order "permutation".


setType

public final void setType(HashFunction.HashType type)

setParam

public final long setParam(long parameter)

setPerm

public final int setPerm(int perm)

setVariation

public final void setVariation(int var)
Sets hash-function variation.

Parameters:
var - Variation ordinal number (0, 1, ...).

maxHash

public final long maxHash()
Hash-function returns values: 0 to maxHash()-1.


hash

public final long hash(java.lang.String s)
Hashing the whole string.


hash

public final long hash(java.lang.String s,
                       int from,
                       int to)
Hashing the substring.


hash

public final long hash(byte[] a)
Hashing the whole byte-array.


hash

public final long hash(byte[] a,
                       int from,
                       int to)
Hashing the segment from byte-array.


hash

public final long hash(short[] a)
Hashing the whole short-array.


hash

public final long hash(short[] a,
                       int from,
                       int to)
Hashing the segment from short-array.


hash

public final long hash(char[] a)
Hashing the whole char-array.


hash

public final long hash(char[] a,
                       int from,
                       int to)
Hashing the segment from char-array.


hash

public final long hash(int[] a)
Hashing the whole int-array.


hash

public final long hash(int[] a,
                       int from,
                       int to)
Hashing the segment from int-array.


hash

public final long hash(long[] a)
Hashing the whole long-array.


hash

public final long hash(long[] a,
                       int from,
                       int to)
Hashing the segment from long-array.


hash

public final long hash(int x,
                       int y)
Hashing integer coordinate tuple.


hash

public final long hash(int x,
                       int y,
                       int z)
Hashing integer coordinate triple.


hash

public final long hash(float x,
                       float y)
Hashing float coordinate tuple.


hash

public final long hash(float x,
                       float y,
                       float z)
Hashing float coordinate triple.


hash

public final long hash(double x,
                       double y)
Hashing double coordinate tuple.


hash

public final long hash(double x,
                       double y,
                       double z)
Hashing double coordinate triple.


LCGHash

protected final long LCGHash()
LCG-based hash function (LCG is from "Numerical Recipes in C").


RSHash

protected final long RSHash()
A simple hash function from Robert Sedgwicks Algorithms in C book. I've added some simple optimizations to the algorithm in order to speed up its hashing process.


JSHash

protected final long JSHash()
A bitwise hash function written by Justin Sobel.


PJWHash

protected final long PJWHash()
This hash algorithm is based on work by Peter J. Weinberger of AT&T Bell Labs.


ELFHash

protected final long ELFHash()
Similar to the PJW Hash function, but tweaked for 32-bit processors. Its the hash function widely used on most UNIX systems.


BKDRHash

protected final long BKDRHash()
This hash function comes from Brian Kernighan and Dennis Ritchie's book "The C Programming Language". It is a simple hash function using a strange set of possible seeds which all constitute a pattern of 31....31...31 etc, it seems to be very similar to the DJB hash function.


SDBMHash

protected final long SDBMHash()
This is the algorithm of choice which is used in the open source SDBM project. The hash function seems to have a good over-all distribution for many different data sets. It seems to work well in situations where there is a high variance in the MSBs of the elements in a data set.


DJBHash

protected final long DJBHash()
An algorithm produced by Professor Daniel J. Bernstein and shown first to the world on the usenet newsgroup comp.lang.c. It is one of the most efficient hash functions ever published.


DEKHash

protected final long DEKHash()
An algorithm proposed by Donald E. Knuth in The Art Of Computer Programming Volume 3, under the topic of sorting and search chapter 6.4.


APHash

protected final long APHash()
An algorithm produced by original author of this library: Arash Partow. See http://www.partow.net/programming/hashfunctions/ Arash writes: I took ideas from all of the above hash functions making a hybrid rotative and additive hash function algorithm based around four primes 3,5,7 and 11. There isn't any real mathematical analysis explaining why one should use this hash function instead of the others described above other than the fact that I tired to resemble the design as close as possible to a simple LFSR. An empirical result which demonstrated the distributive abilities of the hash algorithm was obtained using a hash-table with 100003 buckets, hashing The Project Gutenberg Etext of Webster's Unabridged Dictionary, the longest encountered chain length was 7, the average chain length was 2, the number of empty buckets was 4579.