Wednesday, April 29, 2015

How to get a hash of a file in Exilir

As the next step in learning about elixir I wanted to add a file validator module to my elixgrep project. Of course this requires as a first step taking the cryptographic hash of a file. I found this handy blog entry, but it didn't answer the whole problem. For small files, you can just read in the whole file into a string and hash it.

iex> :crypto.hash(:sha256,File.read!("./known_hosts.txt")) |> Base.encode16       "97368E46417DF00CB833C73457D2BE0509C9A404B255D4C70BBDC792D248B4A2" 

But there reaches a point were the file size is large enough that loading the whole contents in memory isn't performant and in some cases not feasible. My next idea was to use File.stream!
iex> File.stream!("./known_hosts.txt") |>
Enum.reduce(:crypto.hash_init(:sha256),
fn(line, acc) -> :crypto.hash_update(acc,line) end ) |>
:crypto.hash_final |> Base.encode16
"97368E46417DF00CB833C73457D2BE0509C9A404B255D4C70BBDC792D248B4A2" 

However there is still a problem with this in that it assumes the file has appropriate line endings. For a cryptographic hash it makes more sense to divide up the file into equal byte length chunks. File.stream!/3 has two default arguements, modes and lines_or_bytes, if you want to stream in by byte_length use this form.

 iex> File.stream!("./known_hosts.txt",[],2048) |> Enum.reduce(:crypto.hash_init(:sha256),
fn(line, acc) -> :crypto.hash_update(acc,line) end ) |> :crypto.hash_final |> Base.encode16 "97368E46417DF00CB833C73457D2BE0509C9A404B255D4C70BBDC792D248B4A2" 


 Now the interesting question becomes is there an optimal byte size to use for this hashing? STAY TUNED...

No comments: