[ Home | Twitter | GitHub | LinkedIn | Resume/CV ]

Having fun with hash collisions

Finding murmur3 hash collisions at 200 GH/s

Preamble 🧾

Friend and fellow blogger Sam recently published his third visual guide and the focus this time is on hashing. It is not a pre-requisite to read that before this, but I strongly recommend you do. If only for the delightfully addicting animations. Honestly, they are sublime. Sadly, you will not find any animations here, so get your fix there and then come back here for a little bit of fun finding murmur3 hash collisions.

I will note that this post is entirely for fun, and nothing that is showcased here is intended to end up in your production code. As always all the code is over on GitHub.

The post was bourne from the fact I was asked help to find collisions and then I got addicted to see how many hashes per second I could eek out.

Snakes and ladders 🐍

Sam kindly shared the Python script. On my ageing machine (i7-6700k) I get about 3,311,080 hashes per ten seconds:-

import time
import mmh3
from uuid import uuid4

start = time.perf_counter()
inc = 0

while True:
   val = str(uuid4())
   h = mmh3.hash(val)
   if h == 1228476406:
      print(val)
   stop = time.perf_counter()
   duration = stop - start
   inc = inc + 1
   if duration >= 10:
      print(inc)
      break

I ran it for about thirty minutes and I didn’t find any collisions. My day-to-day is in C#, so let’s get on more familiar ground.

Sharpening our toolkit 🔪

A first pass at naively converting the Python code to C# code, results in 37,759,012 hashes per ten seconds:-

while (true)
{
    var guidString = Guid.NewGuid().ToString();
    var stringBytes = Encoding.ASCII.GetBytes(guidString);
    var computeHash = murmurHash32.ComputeHash(stringBytes);
    var int32 = BitConverter.ToInt32(computeHash);

    if (int32 == 1228476406)
    {
        Console.WriteLine(guidString);
    }
}

Which isn’t bad, but I think there are improvements to be had. Firstly, moving from UTF8 to ASCII gets us to 39,656,732 hashes per ten seconds:-

39,656,732 hashes per ten seconds
Took: 10,000 ms
Allocated: 7,435,745 kb
Peak Working Set: 26,180 kb
Gen 0 collections: 1820
Gen 1 collections: 1
Gen 2 collections: 0

I ran this version with 4-6 instances and managed to find twenty or so collisions in around thirty minutes.

Random bytes 🎲

Right now, we are using Guid.NewGuid() to get v4 GUID. We don’t really care about getting a v4 just a GUID-like value:-

byte[] guidBytes = new byte[16];
var random = new Random();

while (true)
{
    Array.Clear(guidBytes);
    random.NextBytes(guidBytes);

    var guidString = new Guid(guidBytes).ToString();
    // trunc
}

Randomly filling the byte array and then constructing a GUID gets to 49,788,883 hashes per ten seconds:-

49,788,883 hashes per ten seconds
Took: 10,000 ms
Allocated: 9,335,517 kb
Peak Working Set: 26,416 kb
Gen 0 collections: 2285
Gen 1 collections: 1
Gen 2 collections: 0

Switching the byte array out to a Span<byte> nets us a nice little bump:-

Span<byte> guidBytes = stackalloc byte[16];
var random = new Random();

while (true)
{
    guidBytes.Clear();
    random.NextBytes(guidBytes);

    var guidString = new Guid(guidBytes).ToString();
    // trunc
}

53,015,764 hashes per ten seconds
Took: 10,016 ms
Allocated: 9,940,562 kb
Peak Working Set: 26,448 kb
Gen 0 collections: 2434
Gen 1 collections: 1
Gen 2 collections: 0

Faster hash implementation 🏃‍♂️

We can eek out a little more performance if we force the string into a span too:-

Span<byte> guidBytes = stackalloc byte[16];
Span<byte> stringBytes = stackalloc byte[36];

while (true)
{
    guidBytes.Clear();
    stringBytes.Clear();
    random.NextBytes(guidBytes);

    var guidString = new Guid(guidBytes).ToString().AsSpan();
    Encoding.ASCII.GetBytes(guidString, stringBytes);
    // trunc
}

At this point the next biggest thing we could do is remove all strings (because they are evil). But before we do that let us see if there are other implementations of the MurmurHash3, currently we are using FastHashes. We will be using BenchmarkDotNet to ensure that we accurately benchmark other implementations.

A quick Google lands me on a post by Oren that is then further improved in the comments by Stu. I am dubbing this implementation OrenAndStu. Benchmarking both versions with an iteration of one million, we can see the OrenAndStu implementation is much faster than the FastHashes implementation:-

Method	Mean	Error	StdDev	Gen0	Allocated
A_FastHashes	18.247 ms	0.3597 ms	0.4142 ms	7625.0000	32000043 B
B_OrenAndStu	8.658 ms	0.0483 ms	0.0428 ms	-	9 B

The challenge is now to find a version that is faster than the OrenAndStu implementation. Being lazy I just installed a few packages from nuget, and let BenchmarkDotNet do the hard work:-

Method	Mean	Error	StdDev	Gen0	Allocated
A_FastHashes	17.884 ms	0.2983 ms	0.2791 ms	7625.0000	32000043 B
B_OrenAndStu	8.612 ms	0.0099 ms	0.0087 ms	-	9 B
C_HashDepot	12.139 ms	0.0868 ms	0.0769 ms	-	9 B
D_MurmurHash_Net	12.587 ms	0.0677 ms	0.0634 ms	-	9 B
E_MurmurHash_net_core	68.301 ms	0.6198 ms	0.5798 ms	15250.0000	64000123 B
F_System_Data_HashFunction_MurmurHash	98.631 ms	1.9528 ms	2.2489 ms	68800.0000	288000192 B
G_JeremyEspresso_MurmurHash	7.840 ms	0.0371 ms	0.0328 ms	-	9 B

The only implementation that is faster than OrenAndStu is JeremyEspresso. Switching to that breaks the 60,000,000 barrier:-

60,358,517 hashes per ten seconds
Took: 10,000 ms
Allocated: 5,658,711 kb
Peak Working Set: 25,992 kb
Gen 0 collections: 1385
Gen 1 collections: 1
Gen 2 collections: 0

Bytes all the way down 🐢

A lot of stings (and therefore allocations) are typically not great for performance. Currently we are:-

Generating random bytes
Forcing the random bytes into a GUID
Stringing the GUID
Then getting the bytes of the string’ed GUID.

Ideally if we can keep it bytes all the way then we should remove most of the allocations, and hopefully achieve a speed boost. To do this we will need to peek into the GUID class itself and steal a few internal methods:-

Span<byte> guidBytes = stackalloc byte[16];
Span<byte> stringBytes = stackalloc byte[36];

while (true)
{
    guidBytes.Clear();
    stringBytes.Clear();
    random.NextBytes(guidBytes);

    Ugly.GuidBytesToRegularBytes(guidBytes, stringBytes);

    ReadOnlySpan<byte> stringBytes1 = stringBytes;
    var int32 = MurmurHash.MurmurHash3.Hash32(ref stringBytes1, 0);
}

This way we keep it bytes all the way down and this results in a minor speed boost but it does eliminate pretty much all the allocations:-

64,052,282 hashes per ten seconds
Took: 10,031 ms
Allocated: 110 kb
Peak Working Set: 21,224 kb
Gen 0 collections: 0
Gen 1 collections: 0
Gen 2 collections: 0

GUIDless 🔤

Given that we are interested in finding any inputs that have a specific hash (i.e. collisions) we don’t need to stick with GUIDs. Any input is a valid candidate. Let’s try generating an eight-byte array from an alphanumeric input (a-z, A-Z, 0-9) and see what happens:-

Span<byte> stringBytes = stackalloc byte[8];
var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".Select(x => (byte)x).ToArray();

while (true)
{
    stringBytes.Clear();
    stringBytes[0] = chars[random.Next(0, chars.Length)];
    stringBytes[1] = chars[random.Next(0, chars.Length)];
    stringBytes[2] = chars[random.Next(0, chars.Length)];
    stringBytes[3] = chars[random.Next(0, chars.Length)];
    stringBytes[4] = chars[random.Next(0, chars.Length)];
    stringBytes[5] = chars[random.Next(0, chars.Length)];
    stringBytes[6] = chars[random.Next(0, chars.Length)];
    stringBytes[7] = chars[random.Next(0, chars.Length)];
    // trunc
}

156,659,240 hashes per ten seconds
Took: 10,047 ms
Allocated: 102 kb
Peak Working Set: 22,132 kb
Gen 0 collections: 0
Gen 1 collections: 0
Gen 2 collections: 0

What about if we remove the alphanumeric input and just have the eight byte array randomly filled as before?

Span<byte> stringBytes = stackalloc byte[8];

while (true)
{
    stringBytes.Clear();
    random.NextBytes(stringBytes);
}

329,339,000 hashes per ten seconds
Took: 10,031 ms
Allocated: 102 kb
Peak Working Set: 21,076 kb
Gen 0 collections: 0
Gen 1 collections: 0
Gen 2 collections: 0

Whilst we reach an insane level of hashes per ten seconds, a lot of the collisions are not really usable as they are not easily (if at all?) printable. I would probably say the alphanumeric version is probably the best one to use.

(Hash)cat amongst the pigeons 🕊

So far we have been bottlenecked by CPU. But this kind of workload isn’t suited for CPUs really, GPUs are where this workload can be really tackled. Hashcat is a command line only program that is designed to crack password hashes (fun story we used it internally to crack production passwords - but that is for another time); there is an extensive man page to consult.

There is a slight gotcha here in that hashcat expects the target to be in a certain format that is usually hash:salt. Running hashcat.exe --hash-info -m 27800 shows that the plaintext value of hashcat has a hash of 23e93f65. This is a little bit problematic, as so far we’ve been using an integer to represent the hash. If we run the value hashcat through one of the versions we get an integer of 602488677, and then to get the same string we need to generate a hexadecimal string after we reverse the hashed bytes array:-

var buffer = Encoding.ASCII.GetBytes("hashcat");
var computeHash = new FastHashes.MurmurHash32().ComputeHash(buffer);
Console.WriteLine(BitConverter.ToString(computeHash.Reverse().ToArray()).Replace("-", "").ToLower());
Console.WriteLine(BitConverter.ToInt32(computeHash, 0));

I am pretty sure we need to reverse the hashed array because of the endianness of the system, so double check if you are playing along. Once we do the same for one of the known input values of the target hash-int of 1228476406 we know that the hex string we are looking for is 49390ff6.

.\hashcat.exe -a 3 -m 27800 49390ff6:00000000 ?a?a?a?a?a?a?a?a --keep-guessing --runtime=10

On my ancient HD6950 spits out 31 inputs that have the hash collisions we are looking for. Hashcat also generates stats at the end of the run:-

Session..........: hashcat
Status...........: Aborted (Runtime)
Hash.Mode........: 27800 (MurmurHash3)
Hash.Target......: 49390ff6:00000000
Time.Started.....: Sun Jun 18 00:04:22 2023 (10 secs)
Time.Estimated...: Sun Jun 25 01:15:33 2023 (7 days, 1 hour; Runtime limit exceeded)
Kernel.Feature...: Optimized Kernel
Guess.Mask.......: ?a?a?a?a?a?a?a?a [8]
Guess.Queue......: 1/1 (100.00%)
Speed.#1.........: 10892.3 MH/s (4.10ms) @ Accel:128 Loops:256 Thr:64 Vec:4
Recovered........: 0/1 (0.00%) Digests
Progress.........: 107105746944/6634204312890625 (0.00%)
Rejected.........: 0/107105746944 (0.00%)
Restore.Point....: 0/7737809375 (0.00%)
Restore.Sub.#1...: Salt:0 Amplifier:544512-544768 Iteration:0-256
Candidate.Engine.: Device Generator
Candidates.#1....: Uz#erane -> C3$B?tan

Renting a GPU 💸

Given my GPU was released in 2010; I’m pretty sure the seven day remaining estimate can be obliterated with modern day hardware. I decided to rent a RTX 3090 from genesiscloud.com (transparency note - that is a referral link - you get 50 USD in free credits). Spinning up a new instance is pretty easy, installing hashcat on ubuntu was not easy as I am mostly used to Windows but I eventually got it working running (yes I did copy and paste a bunch of commands from Google until I got it working 😉) the exact same Hashcat workload spits out 433 inputs!

Session..........: hashcat
Status...........: Aborted (Runtime)
Hash.Mode........: 27800 (MurmurHash3)
Hash.Target......: 49390ff6:00000000
Time.Started.....: Sat Jun 17 23:21:21 2023 (10 secs)
Time.Estimated...: Sun Jun 18 07:57:16 2023 (8 hours, 35 mins; Runtime limit exceeded)
Kernel.Feature...: Optimized Kernel
Guess.Mask.......: ?a?a?a?a?a?a?a?a [8]
Guess.Queue......: 1/1 (100.00%)
Speed.#1.........:   214.3 GH/s (6.32ms) @ Accel:64 Loops:1024 Thr:256 Vec:1
Recovered........: 0/1 (0.00%) Digests
Progress.........: 2112133758976/6634204312890625 (0.03%)
Rejected.........: 0/2112133758976 (0.00%)
Restore.Point....: 1343488/7737809375 (0.02%)
Restore.Sub.#1...: Salt:0 Amplifier:714752-715776 Iteration:0-1024
Candidate.Engine.: Device Generator
Candidates.#1....: UcX}eTON -> xw(>N223
Hardware.Mon.#1..: Temp: 49c Fan:  0% Util: 99% Core:1890MHz Mem:9501MHz Bus:16

Lesson learned. If you need to find hash collisions (for non-cryptographic hashes), don’t bother writing your own code. Just jump straight to hashcat!

[ Home | Twitter | GitHub | LinkedIn | Resume/CV ]