Thursday, May 14, 2015

Benchmarking Elixir File Hashing Step 1. Creating test data

In my previous post I outlined how to get a cryptographic hash of a file using Elixir. For large files, you want to chunk the file rather than read the whole thing into memory, so it would be good to determine whether there is an optimal chunk size to pass into the hash functions.
 iex> File.stream!("./known_hosts.txt",[],2048)
|> Enum.reduce(:crypto.hash_init(:sha256),fn(line, acc) -> :crypto.hash_update(acc,line) end )
|> :crypto.hash_final |> Base.encode16
"97368E46417DF00CB833C73457D2BE0509C9A404B255D4C70BBDC792D248B4A2"

The first step in doing this is to create test data files of a specific length. An obvious approach is to simply open /dev/random and read until you've got enough sample data. A simple shell example.
 head -c 1024 < /dev/random > test.data 

Translating that to Elixir looks like
  iex(1)> File.stream!("/dev/random",[],1024) |> Enum.take(1)
  ** (File.Error) could not stream /dev/random: illegal operation on a directory
      (elixir) lib/file/stream.ex:81: anonymous fn/2 in Enumerable.File.Stream.reduce/3
      (elixir) lib/stream.ex:1012: anonymous fn/5 in Stream.resource/3
      (elixir) lib/enum.ex:1740: Enum.take/2

That sure looks like a bug: /dev/random is not a directory but a character special device file. While the error message is misleading, there is no actual bug. Erlang will not open files for reading that it considers dangerous to the overall scheduler. Since character special device files usually block on I/O, Erlang errs on the side of caution and refuses to open them. There is an exception in the Erlang code for /dev/null, since that is considered safe for the scheduler. This post goes into the details.

Reading Device Files in Erlang

There are several ways to get around this problem. The first solution that springs to mind is actually one of the more difficult ones to do in Elixir. In many languages there is a system call that you can use to execute shell commands. Elixir has System.cmd, but it is relatively limited. You can specify the command to execute and the argument list, but because no shell is involved you cannot use shell-based I/O redirection.
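A small sketch of that limitation, assuming an echo executable is on the PATH (the output file name is made up for the illustration). The ">" is handed to echo as an ordinary argument, so nothing is redirected and no file is created.

```elixir
# Hypothetical file name, used only to show that no redirection happens.
target = "system_cmd_demo_output.txt"

# System.cmd/2 takes the executable and a list of arguments. No shell
# parses the arguments, so ">" is just another string passed to echo.
{output, exit_status} = System.cmd("echo", ["hello", ">", target])

IO.inspect(output)                # "hello > system_cmd_demo_output.txt\n"
IO.inspect(exit_status)           # 0
IO.inspect(File.exists?(target))  # false
```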

The most straightforward Elixir solution is to use the rand_bytes function from the Erlang crypto library.
  iex(1)> File.write("test.data",:crypto.rand_bytes(1024))
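
For the benchmark we will want several files of known length. Here is a rough sketch of generating them, with made-up sizes and file names. It uses :crypto.strong_rand_bytes/1 rather than rand_bytes, since newer Erlang/OTP releases no longer ship :crypto.rand_bytes/1.

```elixir
# Sizes are in bytes; both the sizes and the file names are assumptions.
sizes = [1024, 65_536, 1_048_576]

paths =
  for size <- sizes do
    path = "test_#{size}.data"
    # strong_rand_bytes/1 returns exactly `size` random bytes as a binary.
    File.write!(path, :crypto.strong_rand_bytes(size))
    # Sanity check: the file on disk has the length we asked for.
    %File.Stat{size: ^size} = File.stat!(path)
    path
  end

IO.inspect(paths)
```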


But that isn't much fun, and while it solves this problem it doesn't give us a tool for interacting with external programs. We'll look at more general solutions in the next post...

Wednesday, April 29, 2015

How to get a hash of a file in Elixir

As the next step in learning about Elixir I wanted to add a file validator module to my elixgrep project. Of course, the first step is taking the cryptographic hash of a file. I found this handy blog entry, but it didn't answer the whole problem. For small files, you can just read the whole file into a string and hash it.

iex> :crypto.hash(:sha256,File.read!("./known_hosts.txt")) |> Base.encode16
"97368E46417DF00CB833C73457D2BE0509C9A404B255D4C70BBDC792D248B4A2"

But there comes a point where the file is large enough that loading the whole contents into memory isn't performant and in some cases isn't feasible. My next idea was to use File.stream!
iex> File.stream!("./known_hosts.txt") |>
Enum.reduce(:crypto.hash_init(:sha256),
fn(line, acc) -> :crypto.hash_update(acc,line) end ) |>
:crypto.hash_final |> Base.encode16
"97368E46417DF00CB833C73457D2BE0509C9A404B255D4C70BBDC792D248B4A2" 

However, there is still a problem with this: it assumes the file has appropriate line endings, and a binary file with no newlines would be read as one huge line. For a cryptographic hash it makes more sense to divide the file into equal byte-length chunks. File.stream!/3 has two optional arguments, modes and lines_or_bytes; if you want to stream by byte length, use this form.

 iex> File.stream!("./known_hosts.txt",[],2048) |> Enum.reduce(:crypto.hash_init(:sha256),
fn(line, acc) -> :crypto.hash_update(acc,line) end ) |> :crypto.hash_final |> Base.encode16
"97368E46417DF00CB833C73457D2BE0509C9A404B255D4C70BBDC792D248B4A2"
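
The same pipeline can be wrapped up with the chunk size as a parameter. This is only a sketch: the FileHash module, its sha256/2 function and the demo file name are made up here. It keeps the File.stream!(path, modes, bytes) argument order used above; recent Elixir releases document the order as File.stream!(path, bytes, modes), so adjust as needed.

```elixir
defmodule FileHash do
  # Hypothetical helper: streams `path` in `chunk_size`-byte pieces and
  # feeds each piece into an incremental SHA-256 context.
  def sha256(path, chunk_size \\ 2048) do
    path
    |> File.stream!([], chunk_size)
    |> Enum.reduce(:crypto.hash_init(:sha256), fn chunk, acc ->
      :crypto.hash_update(acc, chunk)
    end)
    |> :crypto.hash_final()
    |> Base.encode16()
  end
end

# Whatever the chunk size, the digest should match hashing the file in one go.
path = "file_hash_demo.data"
File.write!(path, :crypto.strong_rand_bytes(10_000))
whole = :crypto.hash(:sha256, File.read!(path)) |> Base.encode16()

IO.inspect(FileHash.sha256(path, 512) == whole)
IO.inspect(FileHash.sha256(path, 4096) == whole)
```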


Now the interesting question becomes: is there an optimal byte size to use for this hashing? STAY TUNED...

Thursday, April 16, 2015

Elixir functions are not Functions

I was banging my head against a simple elixir test this morning. It's a test that's been in my code for months and that never worked for some reason. This is a boiled down example:


defmodule RaiseTest do
        def testraise do
                raise "This is an error"
        end
end

defmodule RaiseTestTest do
  use ExUnit.Case
  test "assert_raise works" do
    assert_raise(RuntimeError, RaiseTest.testraise )
  end
end

mix test

  1) test assert_raise works (RaiseTestTest)
     test/raise_test_test.exs:4
     ** (RuntimeError) This is an error
     stacktrace:
       (raise_test) lib/raise_test.ex:4: RaiseTest.testraise/0
       test/raise_test_test.exs:5


Finished in 0.03 seconds (0.03s on load, 0.00s on tests)
1 tests, 1 failures
My error was finally pointed out to me this morning on the elixir list, and it opened my eyes to a consistent flaw in my thinking about Elixir. The fix, of course, is to actually provide a function to assert_raise; what I was providing was an expression that could return anything.

  test "assert_raise works" do
    assert_raise(RuntimeError, fn -> RaiseTest.testraise end )
  end

The key misunderstanding in my head was that "function", when used in the Elixir context, actually maps to what I think of as a function reference from my years of C programming. Calling a module function is an expression that can return anything. The test failed because it was calling RaiseTest.testraise to see if it returned a function it could use in the test, and that call raised before assert_raise ever got to run.
When an Elixir function requires a "function" as an argument, it really means a closure that can be executed.
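
A quick way to see the difference outside of a test file is the sketch below. It imports ExUnit.Assertions directly so it can run as a plain script, and it also shows the capture syntax (&Module.fun/arity), which is another way of getting a function value and does not appear in the test above.

```elixir
import ExUnit.Assertions

defmodule RaiseTest do
  def testraise do
    raise "This is an error"
  end
end

# Both of these are function values; neither one calls testraise yet.
wrapped = fn -> RaiseTest.testraise() end
captured = &RaiseTest.testraise/0

IO.inspect(is_function(wrapped, 0))   # true
IO.inspect(is_function(captured, 0))  # true

# assert_raise/2 invokes the function itself and returns the rescued exception.
error = assert_raise(RuntimeError, captured)
IO.inspect(error.message)  # "This is an error"
```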

Friday, April 3, 2015

Remember JSON is always valid YAML

While I don't mind YAML for relatively simple data files, I find it very difficult to use when there are more than one or two levels of nesting. For a relatively complex config file, I find JSON much easier to reason about and to change.

For current versions of YAML, JSON is a subset of the YAML standard. Thus any compliant YAML parser will also parse JSON.

The place where I use this trick the most often is in .kitchen.yml files used by TestKitchen when writing Chef cookbooks.  I start out by taking an existing .kitchen.yml file and converting it to
.kitchen.json using this website http://codebeautify.org/yaml-to-json-xml-csv. I then copy the
.kitchen.json to .kitchen.yml.


Tuesday, February 24, 2015

Writing Effective Examples in Elixir docstrings.

I had an email exchange on the elixir-core mailing list recently that I think is worth preserving as a blog post. It shows a great approach to how to write examples for functions with complex output.

We had a meetup this weekend for hacking the docs. The intent was to add examples to the standard library. 

I started looking around in code.ex and the question I run into is the following:

Should examples be functionally accurate or readable? 

Here's what I mean. I was attempting to add an example for 

Code.get_docs. 

This is what I came up with. 

## Examples

      iex> Code.get_docs(Atom, :docs)
      [{{:to_char_list, 1}, 36, :def, [{:atom, [], nil}],
      "Converts an atom to a char list.\\n\\nInlined by the compiler.\\n\\n## Examples\\n\\n    iex> Atom.to_char_list(:\"An atom\")\\n    'An atom'\\n\\n"},
      {{:to_string, 1}, 20, :def, [{:atom, [], nil}],
      "Converts an atom to a string.\\n\\nInlined by the compiler.\\n\\n## Examples\\n\\n    iex> Atom.to_string(:foo)\\n    \"foo\"\\n\\n"}]
      
      iex(1)> Code.get_docs(Atom, :all )
      [docs: [{{:to_char_list, 1}, 36, :def, [{:atom, [], nil}],
      "Converts an atom to a char list.\\n\\\nInlined by the compiler.\\n\\n## Examples\\n\\n    iex> Atom.to_char_list(:\"An atom\")\\n    'An atom'\\n\\n"},
      {{:to_string, 1}, 20, :def, [{:atom, [], nil}],
      "Converts an atom to a string.\\n\\nInlined by the compiler.\\n\\n## Examples\\n\\n    iex> Atom.to_string(:foo)\\n    \"foo\"\\n\\n"}],
      moduledoc: {1,"Convenience functions for working with atoms.\\n\\nSee also `Kernel.is_atom/1`.\\n"}]
 
It's really messy to read and I'm still working on getting the quoting correct for ex_doc. 
Atom is about the simplest standard module available, so picking anything else just 
gets worse. Also it suffers from the problem that it doesn't automatically update when
the docs for Atom update. 

My initial thought was to dynamically create a Sample module and then use Code.get_docs
on that example ( i.e. similar to the test for Code.get_docs ).  

So I guess the question is: 

Should any examples in the documentation exactly match the results from running the code in iex? 

i.e. while the standard lib does not use doctest as far as I know, any examples included should be
doctest aware. Or is it okay to simplify the results for clarity? 

- Booker C. Bense

José Valim 
Feb 23
Should any examples in the documentation exactly match the results from running the code in iex?

Yes. But you can always "cheat"!

# Get the doc for the first function
iex> [fun|_] = Code.get_docs(Atom, :docs) |> Enum.sort()
# Each clause is in the format
iex> {{_function, _arity}, _line, _kind, _signature, text} = fun
# Let's get the first line of the text
iex> String.split(text, "\n") |> Enum.at!(0)
"NOW YOU HAVE A STRING"

So you are showing the whole process without printing it all the way.


This is a really nice way to take very complex dense output and recast it in a simple example that makes the return value of the function much clearer. 
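
The same trick carries over to your own docstrings. Below is a sketch with an entirely made-up Inventory module whose return value is too noisy to print in full: the example pattern matches its way down to one small value, and that is the only thing shown.

```elixir
defmodule Inventory do
  @doc """
  Returns every stocked item as `{name, count, metadata}`.

  ## Examples

      iex> [first | _] = Inventory.report() |> Enum.sort()
      iex> {name, _count, _metadata} = first
      iex> name
      :apples

  """
  def report do
    # Hypothetical data, deliberately verbose to mimic dense output.
    [
      {:pears, 12, %{supplier: "Orchard Co", updated: {2015, 2, 23}}},
      {:apples, 40, %{supplier: "Orchard Co", updated: {2015, 2, 20}}}
    ]
  end
end

# The same steps the docstring example walks through.
[first | _] = Inventory.report() |> Enum.sort()
{name, _count, _metadata} = first
IO.inspect(name)  # :apples
```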

Thursday, January 8, 2015

Using ex_doc to create a github web site for your elixir project.

1. Use mix docs to create docs dir.

You'll need to add these to the deps section for your project in mix.exs

[{:earmark, "~> 0.1", only: :dev},
     {:ex_doc, "~> 0.5", only: :dev}]

Then run

mix deps.get
mix deps.compile
mix docs

This will create a doc subdir with HTML documentation for your project, very similar to
that used by the standard Elixir docs.

What you need to do now is install those HTML pages into the special git branch that
allows you to create a user.github.io/project website.

2. Follow manual gh_pages instructions on github.

https://help.github.com/articles/creating-project-pages-manually/

git clone git@github.com:user/project.git project-gh-pages

cd project-gh-pages

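git checkout --orphan  gh-pages

git rm -rf .

( The orphan branch starts out with the project files still staged, so remove them before copying in the docs )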

cp -r ../project/doc/* .
git add --all

( Note this is about the only time I would ever use git add --all )

git commit -a -m "Ex doc output"
git push origin gh-pages

After a few minutes the web pages should show at the url

http://user.github.io/project

Friday, November 14, 2014

Direct link to host in Nagios via Apache Rewrite

One of the people using our nagios server asked the other day:

"I know this doesn't work, but can  https://nagios.our.site/ourhost go directly to the page for host?"

I explained that the frames inside the current version of Nagios that we use were simply
links to cgi urls like

https://nagios.our.site/cgi-bin/status.cgi?type=1&host=ourhost

and that you could type that url into the browser and get a direct link. As I was explaining this I realized that it would not be too hard to use Apache Rewrite to create a url like this

https://nagios.our.site/host/ourhost

This is what I added to the nagios.conf file in /etc/httpd/conf.d to accomplish this.

 RewriteEngine On
 RewriteLog /etc/httpd/logs/ssl_rewrite_log

 # Rewrite host/foo to basic cgi view url

 RewriteRule   ^/host/(.*\.cgi)$               /cgi-bin/$1      [L,PT]        
 RewriteRule   ^/host/([a-z0-9\-]+)$           /cgi-bin/status.cgi?host=$1 [L,PT]
    

The first rule rewrites any links clicked from within the initial page so that they go to the appropriate CGI; the second rewrites /host/ourhost to a CGI URL pointing to the basic status page. The flags are important: L means "last rule, stop attempting to rewrite the URL", and PT ("pass through") means the result is not a plain file, but needs to go back through the URI engine to be resolved to a CGI.
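
To check which paths each pattern catches without restarting Apache, the two regular expressions can be tried out in Elixir, whose Regex module is also PCRE-based. This is only an approximation of mod_rewrite (the rewrite function is made up, and it assumes the second pattern's quantifier is a plain +), but it shows the order in which the rules apply.

```elixir
cgi_rule = ~r{^/host/(.*\.cgi)$}
host_rule = ~r{^/host/([a-z0-9\-]+)$}

# Hypothetical stand-in for the two RewriteRules: first match wins,
# mirroring the L flag; anything else is left untouched.
rewrite = fn path ->
  cond do
    match = Regex.run(cgi_rule, path) ->
      "/cgi-bin/" <> Enum.at(match, 1)

    match = Regex.run(host_rule, path) ->
      "/cgi-bin/status.cgi?host=" <> Enum.at(match, 1)

    true ->
      path
  end
end

IO.inspect(rewrite.("/host/ourhost"))     # "/cgi-bin/status.cgi?host=ourhost"
IO.inspect(rewrite.("/host/status.cgi"))  # "/cgi-bin/status.cgi"
IO.inspect(rewrite.("/nagios/"))          # "/nagios/"
```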

It's a simple hack, but people seem to like it. Which is probably more a statement about the Nagios dashboard than the utility of this tweak.