What Is a Hash Function? What Is It Used For and Why Is It Important?

In information technology and cybersecurity, the terms “hash” and “hash function” come up constantly. Hundreds of hashing algorithms exist. You have probably run into some of the most widely used ones, such as MD5, SHA-1, the SHA-2 family (which includes SHA-256 and SHA-512), SHA-3, or bcrypt. So what is a hash, and why should you have a basic understanding of this mathematical tool?

In general, a hash function is a mathematical construct with a few specific properties. First, it is a one-way function: any data you put in is transformed into a result that cannot practically be transformed back to recover the original input. Second, the output has the same length regardless of the length or contents of the input. A few special-purpose functions are exceptions, including extendable-output functions (XOFs), whose output length is chosen by the caller. Third, even the slightest change in the input causes a substantial change in the output.
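The second and third properties are easy to observe for yourself. The following is a minimal sketch using Python's standard hashlib module; the sample inputs and the helper names (sha256_hex, differing_bits) are our own choices for illustration.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Fixed output length: a 1-byte input and a 1 MB input both give
# a 64-character hex digest (256 bits).
short = sha256_hex(b"a")
long_ = sha256_hex(b"a" * 1_000_000)
print(len(short), len(long_))

# Changing a single character gives an unrelated-looking digest;
# typically around half of the 256 bits flip.
d1 = hashlib.sha256(b"hash function").digest()
d2 = hashlib.sha256(b"hash functioN").digest()
print(differing_bits(d1, d2), "of 256 bits differ")

# An XOF such as SHAKE128 lets the caller choose the output length in bytes.
xof_16 = hashlib.shake_128(b"a").digest(16)
xof_64 = hashlib.shake_128(b"a").digest(64)
print(len(xof_16), len(xof_64))
```

The one-way property cannot be demonstrated by running code; it rests on the fact that no practical method of inverting the function is known.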

A secure hash function additionally guarantees high collision resistance: the chance of finding two different inputs that produce the same output hash is negligible. The function should also have an appropriate calculation speed. Calculation that is too slow impairs usability, while calculation that is too fast can make brute-forcing the input and searching for collisions feasible. These properties are especially important for password protection.
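To see why speed matters, consider a rough sketch of a brute-force search against a fast, general-purpose hash. The 4-digit PIN and the function names (fast_hash, brute_force_pin) are hypothetical and chosen only to keep the search space tiny.

```python
import hashlib
from itertools import product
from typing import Optional

def fast_hash(secret: str) -> str:
    """A single round of SHA-256: very cheap to compute."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()

def brute_force_pin(target_hash: str, digits: int = 4) -> Optional[str]:
    """Try every numeric PIN of the given length until one hashes to target_hash."""
    for combo in product("0123456789", repeat=digits):
        candidate = "".join(combo)
        if fast_hash(candidate) == target_hash:
            return candidate
    return None  # the secret was not a PIN of this length

# Hypothetical leaked hash of a 4-digit PIN.
leaked = fast_hash("4821")
print(brute_force_pin(leaked))  # prints 4821
```

All 10,000 candidates are checked almost instantly. A deliberately slower function multiplies the attacker's cost for every guess, which is why dedicated password-hashing functions such as bcrypt are tunable.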

Password Protection and Authentication

One of the most important applications of hashing algorithms is password protection. When you sign in to a web service such as your webmail or e-shop account, the website usually sends your password to the server as plaintext inside encrypted HTTPS traffic. The server should not store your password in plaintext, because attackers who breach the server could then obtain the passwords directly. Instead, the server usually stores a hash of your password and compares it to the hash of the password you log in with. An even better practice is to use the Secure Remote Password (SRP) protocol. With SRP, the server stores only a verifier derived from the password, and neither the password nor its hash travels over the network.
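Storing and checking a password hash on the server could look roughly like the sketch below. It makes several assumptions that go beyond the text above: it uses PBKDF2-HMAC-SHA256 from Python's standard library as the slow hash, adds a random per-user salt (a common practice so that identical passwords do not produce identical hashes), and picks an illustrative iteration count. It does not implement SRP.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor for illustration; tune for your hardware

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_digest)

# At registration the server stores only the salt and the digest.
salt, stored = hash_password("correct horse battery staple")

# At login the submitted password is hashed again and compared.
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

The plaintext password is never written to storage, so a breach of the stored data exposes only salts and digests.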

Another use of hashes is in public key authentication and signing, as used for example by the SSH protocol or by PGP signing and encryption of emails and files. Every public key has its own fingerprint, which is a hash value computed over the encoded key material (SHA-1 for PGP, SHA-256 for SSH, or MD5 for legacy SSH versions). Fingerprints can be used to verify the identity of a key manually.
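As an illustration, the sketch below computes SSH-style fingerprints from a public key line of the form "type base64-blob comment". It reflects our understanding of how OpenSSH displays fingerprints (SHA-256 as unpadded base64, legacy MD5 as colon-separated hex); the function names are our own, and the demo line wraps placeholder bytes rather than a real key.

```python
import base64
import hashlib

def _key_blob(public_key_line: str) -> bytes:
    """Decode the base64 key blob, the second field of a public key line."""
    return base64.b64decode(public_key_line.split()[1])

def ssh_fingerprint_sha256(public_key_line: str) -> str:
    """SHA-256 over the decoded blob, shown as base64 without '=' padding."""
    digest = hashlib.sha256(_key_blob(public_key_line)).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")

def ssh_fingerprint_md5(public_key_line: str) -> str:
    """Legacy style: MD5 over the decoded blob, shown as colon-separated hex."""
    hexdigest = hashlib.md5(_key_blob(public_key_line)).hexdigest()
    return "MD5:" + ":".join(hexdigest[i:i + 2] for i in range(0, len(hexdigest), 2))

# Placeholder bytes, NOT a real key; substitute a line from an actual .pub file.
demo_blob = base64.b64encode(b"placeholder bytes, not a real key").decode("ascii")
demo_line = f"ssh-ed25519 {demo_blob} user@example"

print(ssh_fingerprint_sha256(demo_line))
print(ssh_fingerprint_md5(demo_line))
```

For a real key, the result can be compared with the fingerprint your SSH client shows when it asks you to confirm a host key.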

Data Verification

One more use we will discuss here is verifying the authenticity and integrity of files you download and store. How is a hash value useful for data? An identical copy of a file produces the same hash value as the original, so we can use hashing to detect whether a file has been corrupted or whether an adversary has tampered with it. For example, if an attacker injects malicious code into the file, the hashes will not match. You can easily calculate the hash of any of your files yourself:

Windows PowerShell command:

Get-FileHash <filepath> -Algorithm <MD5/SHA1/SHA256/…>

Linux bash command:

<md5/sha1/sha256/…>sum <filepath>
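The same check can also be scripted. Here is a minimal Python sketch that hashes a file in chunks, so large files do not have to fit in memory, and compares the result with a published value. The function names and the chunk size are our own choices.

```python
import hashlib

def file_hash(path: str, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Return the hex digest of the file at path, read in fixed-size chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: str, published_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's hash with a published one, ignoring case and whitespace."""
    return file_hash(path, algorithm) == published_hex.strip().lower()
```

You would call matches_published with the path of the downloaded file and the hash string copied from the publisher's website; a result of False means the file differs from the one the publisher hashed.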

Cybercrime forensic investigators use this process as well. They need to validate the evidence collected for forensic analysis to confirm that they did not accidentally change the files during the analysis, since any such change would render the investigation results unusable. Malware protection services also compare the hashes of suspicious files against databases of known malicious file hashes as one of their tools.
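A simplified sketch of the evidence-validation idea: record a hash for every file at collection time, then recompute the hashes later and list anything that differs. This is only an illustration with names of our own choosing (build_manifest, changed_files), not a description of any particular forensic tool.

```python
import hashlib
from pathlib import Path
from typing import Dict, List

def sha256_file(path: Path, chunk_size: int = 65536) -> str:
    """Hex SHA-256 of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file under root (by relative path) to its SHA-256 hash."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def changed_files(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return files that were modified, added, or removed since the manifest was built."""
    current = build_manifest(root)
    names = manifest.keys() | current.keys()
    return sorted(n for n in names if manifest.get(n) != current.get(n))
```

An empty list from changed_files indicates that every file still hashes to the value recorded at collection time.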

References

FIPS PUB 202: SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions

To Password or Not To Password