Elixir Trickery: Cheating on Structs, And Why It Pays Off

While we can't say cheating on anyone is okay, we're less absolute when it comes to occasionally cheating on Elixir.

Structs are there for a reason (we'll start from a brief overview), and that's certainly not for us to cheat on them. But we can if we have to - and we'll sometimes even justify that and get away with it!

Today's article will come in handy especially for those who are interested in developing libraries for Elixir and making them usable across different dependency versions, which is always a problem when writing code intended to be pluggable into different applications.


    Welcome to Elixir Trickery, a series of articles telling stories about utilizing little-known language features, applying out-of-the-box thinking to programming, and going inventive and creative in your coding.

    Introduction to Structs

    What is a struct? According to Elixir's Getting Started tutorial, structs are extensions built on top of maps that provide compile-time checks and default values. Maps, in turn, are one of Elixir's basic data structures, providing a means to store key-value pairs. To recap, or just to show them off:

    # Defining a map
    > map = %{:key => :value}
    %{key: :value} # alternative syntax when a key is an Atom
    
    # Retrieving a value from a map
    > Map.get(map, :key)
    :value
    > map.key
    :value
    > map[:key] # This is the Access behaviour - we'll talk about it later
    :value
    
    # Trying to retrieve nonexistent key
    > Map.get(map, :foo)
    nil
    > map.foo
    ** (KeyError) key :foo not found in: %{key: :value}
    > map[:foo]
    nil

    Values can be retrieved from maps in three ways: Map.get/3 (the optional third argument is a default value), the "dot" syntax (which is, as you can see, quite strict, because it fails when the given key isn't in the map), and the [] syntax, which is courtesy of Elixir's Access behaviour - we'll return to that.
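
    The difference between "missing" and "defaulted" is easy to see in a short sketch. Map.fetch/2 is not used elsewhere in this article; it appears here only as a stricter sibling of Map.get/3, and the variable names are purely illustrative:

```elixir
map = %{key: :value}

# Map.get/3 returns the given default (instead of nil) for a missing key
with_default = Map.get(map, :foo, :fallback)

# Map.fetch/2 tells a missing key apart from a key that holds nil
found = Map.fetch(map, :key)
missing = Map.fetch(map, :foo)
```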

    When it comes to updating maps, you're really not doing what you might be used to in all sorts of different languages, because - since values in Elixir are immutable - you're creating a new map.

    # Returning a new map with a new key, or an updated value under :key
    > Map.put(map, :new_key, :new_value)
    %{key: :value, new_key: :new_value}
    > Map.put(map, :key, :new_value)
    %{key: :new_value}
    
    # Merging maps
    > Map.merge(map, %{new_key: :new_value, foo: :bar})
    %{foo: :bar, key: :value, new_key: :new_value}
    
    # Shorthand for returning a new map with updated value under :key
    > %{map | key: :new_value}
    %{key: :new_value}
    
    # ...the shorthand doesn't work for putting new keys, though:
    > %{map | new_key: :value}
    ** (KeyError) key :new_key not found in: %{key: :value}

    Maps can be pattern matched on:

    # Pattern matching on a map
    > %{key: matched_value} = map
    %{key: :value}
    > matched_value
    :value
    
    # Pattern matching on a map in a function argument
    > function = fn %{key: matched_value} ->
    >   String.upcase(matched_value)
    > end
    #Function<6.128620087/1 in :erl_eval.expr/5>
    > function.(%{key: "Awesome!"})
    "AWESOME!"

    The pattern matching part is particularly awesome, because you can pattern match on nested maps as well:

    > map = %{outer_key: :outer_value, inner_map: %{inner_key: :inner_value}}
    > %{outer_key: outer_match, inner_map: %{inner_key: inner_match}} = map
    > inner_match
    :inner_value
    > outer_match
    :outer_value

    Finally, Structs!

    Now, structs are an extension of maps. Defining a struct like this:

    defmodule CuriosumTime do
      defstruct [:hour, :minute, :second]
    end

    ...allows you to create maps on steroids, that is, maps that must only contain specific keys. In this case, we've created a module named CuriosumTime, which uses the Kernel.defstruct/1 macro to define a set of fields that all structs following the CuriosumTime contract will be restricted to. How to use this restriction? Here's an example:

    > time1 = %CuriosumTime{hour: 21, minute: 37, second: 42}
    %CuriosumTime{hour: 21, minute: 37, second: 42}
    
    # Missing values will be filled with nil
    > time2 = %CuriosumTime{}
    %CuriosumTime{hour: nil, minute: nil, second: nil}
    
    # Unknown keys will be rejected
    > %CuriosumTime{foo: 1}
    ** (KeyError) key :foo not found

    As you can see, default values for defined struct keys are nil, unless you use defstruct with a keyword list:

    defstruct [hour: 12, minute: 0, second: 0] # [] can be omitted

    ...so that these will default to what you've specified. With our original definition, which has no defaults, accessing the structs' keys returns the following:

    > time1.hour
    21
    
    > time2.hour
    nil
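
    To see explicit defaults at work, here is a minimal sketch. DefaultedTime is a hypothetical module made up for this illustration, and the structs are built with Kernel.struct/2 (covered later in this article), which in this case gives the same result as writing %DefaultedTime{} by hand:

```elixir
# DefaultedTime is a hypothetical module used only for this illustration
defmodule DefaultedTime do
  defstruct hour: 12, minute: 0, second: 0
end

# Fields that are not given keep the defaults from defstruct
noon = struct(DefaultedTime)
half_past = struct(DefaultedTime, minute: 30)
```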

    The standard way to retrieve values under struct keys is to use the dot syntax, because it won't let you retrieve the value of a nonexistent key. You can also use Map.get/3 if you need to. How about the [] syntax, though?

    > time1[:hour]
    ** (UndefinedFunctionError) function CuriosumTime.fetch/2 is undefined (CuriosumTime does not implement the Access behaviour)
        CuriosumTime.fetch(%CuriosumTime{hour: 21, minute: 37, second: 42}, :hour)

    This is because the [] syntax goes through Elixir's Access behaviour, which for a struct ends up calling CuriosumTime.fetch/2, one of the behaviour's callbacks. For a struct to be accessible with [], you need to implement this behaviour in your struct's module, which means e.g. defining the fetch/2 function - we won't get into much detail on it, but let a library named StructAccess serve as an example of how you can do that.
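
    As a rough sketch of the idea (not how StructAccess itself is implemented), defining fetch/2 alone is enough to make reads with [] work. AccessibleTime is a hypothetical module; a complete Access implementation would also define get_and_update/3 and pop/2, which are left out here:

```elixir
# AccessibleTime is a hypothetical module, not part of the original examples
defmodule AccessibleTime do
  defstruct [:hour, :minute, :second]

  # Called when the struct is read with the [] syntax.
  # Must return {:ok, value} or :error.
  def fetch(time, key), do: Map.fetch(time, key)
end

time = struct!(AccessibleTime, hour: 21, minute: 37, second: 42)

hour = time[:hour]
unknown = time[:foo]
```

    Since fetch/2 returns :error for a key the struct doesn't have, time[:foo] quietly evaluates to nil, just like it would for a plain map.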

    To cap off our brief introduction to structs, let's stress that you can also pattern match on a value being a specific struct type:

    def process_time(%CuriosumTime{} = time) do # our custom time struct
      # ...
    end
    
    def process_time(%Time{} = time) do # Elixir's native time struct
      # ...
    end

    This is useful for cases where you need a single function to process differently structured data.
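
    A runnable variant of the idea might look as follows. TimeProcessor and to_seconds/1 are hypothetical names chosen for this sketch; the point is only that the struct type in the function head decides which clause runs:

```elixir
# Same definition as earlier in the article
defmodule CuriosumTime do
  defstruct [:hour, :minute, :second]
end

# TimeProcessor and to_seconds/1 are hypothetical names for this sketch
defmodule TimeProcessor do
  # Clause chosen when the argument is our custom struct
  def to_seconds(%CuriosumTime{hour: h, minute: m, second: s}) do
    h * 3600 + m * 60 + s
  end

  # Clause chosen when the argument is Elixir's native Time struct
  def to_seconds(%Time{} = time) do
    Time.diff(time, ~T[00:00:00])
  end
end

custom = TimeProcessor.to_seconds(struct!(CuriosumTime, hour: 1, minute: 1, second: 1))
native = TimeProcessor.to_seconds(~T[01:01:01])
```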

    And lastly, which matters for our further reasoning, it's important to know that internally, a struct is just a map with the __struct__ key referring to a specific module. Simple, ain't it?

    > time = %CuriosumTime{hour: 10, minute: 0, second: 0}
    > time.__struct__
    CuriosumTime
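
    A few more checks make the "just a map" claim tangible. This is only a sketch; Map.from_struct/1 doesn't appear elsewhere in the article and is used here to show what remains once the __struct__ key is gone:

```elixir
# Same definition as earlier in the article
defmodule CuriosumTime do
  defstruct [:hour, :minute, :second]
end

time = struct!(CuriosumTime, hour: 10, minute: 0, second: 0)

# A struct passes is_map/1 and lists :__struct__ among its keys
is_plain_map = is_map(time)
keys = Map.keys(time)

# Map.from_struct/1 drops the :__struct__ key, leaving a bare map
bare = Map.from_struct(time)
```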

    Pattern matching: %StructName{} vs. %{__struct__: StructName}

    As we've noted, in Elixir, the defstruct construct is used to define a specific structure that describes which keys a map is required to contain, as well as their default values. For example:

    > defmodule Dog, do: defstruct breed: :mongrel, age: nil
    > dog = %{__struct__: Dog, age: 5, breed: :husky}
    %Dog{age: 5, breed: :husky}

    What's underlying is just an ordinary Map where Dog is put under the :__struct__ key. This means that you can match it with both of the following syntaxes:

    > %Dog{} = dog
    %Dog{age: 5, breed: :husky}
    > %{__struct__: Dog} = dog
    %Dog{age: 5, breed: :husky}

    Is the %Dog{} syntax just a syntactic sugar, then? Well, not exactly. Suppose you have an animal variable and you want to check whether it is a Dog or a Cat... but you don't have the Cat struct defined yet.

    > case animal do
    >   %Dog{} -> IO.puts("Woof!")
    >   %Cat{} -> IO.puts("Meow!")
    > end
    ** (CompileError) iex:37: Cat.__struct__/0 is undefined, cannot expand struct Cat
    
    > case animal do
    >   %{__struct__: Dog} -> IO.puts("Woof!")
    >   %{__struct__: Cat} -> IO.puts("Meow!")
    > end
    Woof!
    :ok

    See the difference? The %Cat{} syntax introduces an additional compile-time check for the actual existence of the matched struct, whereas when simply matching on the __struct__ key, Dog and Cat are just plain Erlang atoms!

    This can make a huge difference when developing a library that needs to be compatible with multiple versions of a dependency - for instance, when dealing with an Ecto.Query's from key, which was a tuple in Ecto 2, but is an Ecto.Query.FromExpr struct (undefined in Ecto 2) from Ecto 3 on.
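
    Here's a sketch of what such version-agnostic matching could look like. QuerySource and table/1 are hypothetical names, and the exact shapes are assumptions based on the description above (a {table, schema} tuple under :from in Ecto 2, the same tuple under the FromExpr's :source field in Ecto 3), so verify them against the Ecto versions you actually target. The stand-in values are built by hand, which lets the example run without Ecto installed:

```elixir
# QuerySource and table/1 are hypothetical names for this sketch
defmodule QuerySource do
  # Assumed Ecto 3 shape: :from holds an Ecto.Query.FromExpr struct
  # that carries the {table, schema} tuple under :source
  def table(%{__struct__: Ecto.Query, from: %{__struct__: Ecto.Query.FromExpr} = from}) do
    {table, _schema} = from.source
    table
  end

  # Assumed Ecto 2 shape: :from is the {table, schema} tuple itself
  def table(%{__struct__: Ecto.Query, from: {table, _schema}}), do: table
end

# Hand-built stand-ins, so the sketch compiles and runs without Ecto
ecto3_like = %{
  __struct__: Ecto.Query,
  from: %{__struct__: Ecto.Query.FromExpr, source: {"users", nil}}
}

ecto2_like = %{__struct__: Ecto.Query, from: {"users", nil}}
```

    Because Ecto.Query.FromExpr is only ever mentioned as an atom, this module compiles no matter which Ecto version (if any) is present.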

    Cheats (never) prosper

    Just as you can cheat on Elixir when it comes to using struct definitions, you can also do it with the keys of a defined struct. Consider the following example, where we define a struct that has an enforced key - note that it is merely a compile-time check and doesn't come with any kind of validation, hence we're able to do this:

    defmodule Foo do
      @enforce_keys [:bar]
      defstruct @enforce_keys
    end
    
    good_foo = %Foo{bar: 1337} # OK
    bad_foo = %Foo{} # error - enforced key missing
    bad_foo = %Foo{bar: 1337, baz: 42} # error - key not found
    cheat_foo = %{__struct__: Foo} # apparently OK!
    cheat_foo = %{__struct__: Foo, bar: 1337, baz: 42} # apparently OK!
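
    It's worth seeing what the cheat actually leaves you with. In this sketch (is_struct/2 requires Elixir 1.11 or later), the hand-built map is accepted as a Foo, yet the enforced key simply isn't there:

```elixir
defmodule Foo do
  @enforce_keys [:bar]
  defstruct @enforce_keys
end

cheat_foo = %{__struct__: Foo}

# The map is recognized as a Foo struct...
passes_guard = is_struct(cheat_foo, Foo)

# ...even though the enforced :bar key is missing altogether
has_bar = Map.has_key?(cheat_foo, :bar)
```

    In other words, any code that later calls cheat_foo.bar will raise a KeyError, so the cheat is best kept to places where you fully control what happens to the value.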

    Fine, but where to look for practical applications of this hack? Library developers usually avoid removing keys when creating new library versions, but this may not always be the case. While it's rare, it might turn out that an expected list of a struct's fields, often representing e.g. configuration options, will have an item removed or renamed in a future library revision. This might not sound exciting, but, realistically, you could find it handy in the future when pattern matching against such structs.
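
    As an illustration, imagine a dependency whose config struct renamed a field between releases. Everything named here is hypothetical: SomeLib.Config, its :timeout field in the older release and :timeout_ms in the newer one, as well as the ConfigCompat module. Writing %SomeLib.Config{timeout_ms: ms} would fail to compile against the older release, while matching on the __struct__ key compiles against both:

```elixir
# ConfigCompat, SomeLib.Config and both field names are hypothetical
defmodule ConfigCompat do
  # Newer release of the imaginary library: field is called :timeout_ms
  def timeout(%{__struct__: SomeLib.Config, timeout_ms: ms}), do: ms

  # Older release: the same setting lives under :timeout
  def timeout(%{__struct__: SomeLib.Config, timeout: ms}), do: ms
end

# Hand-built stand-ins for structs coming from either release
new_style = %{__struct__: SomeLib.Config, timeout_ms: 5_000}
old_style = %{__struct__: SomeLib.Config, timeout: 3_000}
```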

    Structs from Maps: Kernel.struct/2

    When dealing with data coming from external sources, perhaps provided from an import or an external API, the need to sanitize the data often arises, and structs provide the basic means to do this.

    Suppose that you've parsed a dataset into a map. You can call Kernel.struct/2 to turn it into a specific struct, and, importantly, you can control how unknown keys are handled.

    Specifically, there are two similar functions defined in Kernel: struct/2 will filter out keys undefined in the struct's defstruct definition, and will not fail on missing keys defined in @enforce_keys. By contrast, struct!/2 is stricter, failing when it encounters an unknown key or when an enforced key is missing.

    defmodule CuriosumTime do
      @enforce_keys [:hour, :minute, :second]
      defstruct @enforce_keys
    end

    Since @enforce_keys is just a module attribute, you can directly reuse it in defstruct/1; alternatively, you can just provide a plain list, if you only want specific keys to be enforced.

    > data = %{hour: 12, minute: 30, millisecond: 45} # missing :second, extra :millisecond
    
    > struct(CuriosumTime, data)
    %CuriosumTime{hour: 12, minute: 30, second: nil}
    
    > struct!(CuriosumTime, data)
    ** (KeyError) key :millisecond not found in: %CuriosumTime{hour: 12, minute: nil, second: nil}
    
    > struct!(CuriosumTime, data |> Map.delete(:millisecond))
    ** (ArgumentError) the following keys must also be given when building struct CuriosumTime: [:second]
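
    If you'd rather get a tagged tuple than an exception, one option is to wrap struct!/2 in a small helper. SafeStruct and build/2 are hypothetical names; this is a sketch of the idea, not an established API:

```elixir
defmodule CuriosumTime do
  @enforce_keys [:hour, :minute, :second]
  defstruct @enforce_keys
end

# SafeStruct and build/2 are hypothetical names for this sketch
defmodule SafeStruct do
  # Returns {:ok, struct} or {:error, message} instead of raising
  def build(module, data) do
    {:ok, struct!(module, data)}
  rescue
    # KeyError: unknown key given, ArgumentError: enforced key missing
    e in [KeyError, ArgumentError] -> {:error, Exception.message(e)}
  end
end

complete = SafeStruct.build(CuriosumTime, %{hour: 12, minute: 30, second: 45})
unknown_key = SafeStruct.build(CuriosumTime, %{hour: 12, minute: 30, millisecond: 45})
missing_key = SafeStruct.build(CuriosumTime, %{hour: 12, minute: 30})
```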

    Interestingly, a well-adopted library for parsing JSON data named Poison contains a decode!/2 function that will do the struct wrapping for you directly from a JSON dataset when passing a specific :as option. However, it looks to be flawed. While the following examples indicate that it's working:

    defmodule CuriosumTime do
      defstruct [:hour, :minute, :second]
    end
    
    json = ~s([
      {
        "hour": 12,
        "minute": 30,
        "second": 40
      },
      {
        "hour": 23,
        "minute": 15,
        "second": 50
      }
    ])
    
    > Poison.decode!(json)
    [
      %{"hour" => 12, "minute" => 30, "second" => 40},
      %{"hour" => 23, "minute" => 15, "second" => 50}
    ]
    
    > Poison.decode!(json, keys: :atoms)
    [%{hour: 12, minute: 30, second: 40}, %{hour: 23, minute: 15, second: 50}]
    
    > Poison.decode!(json, keys: :atoms, as: [%CuriosumTime{}])
    [
      %CuriosumTime{hour: 12, minute: 30, second: 40},
      %CuriosumTime{hour: 23, minute: 15, second: 50}
    ]

    ...problems arise when trying to use @enforce_keys:

    defmodule CuriosumTime do
      @enforce_keys [:hour, :minute, :second]
      defstruct @enforce_keys
    end
    
    > Poison.decode!(json)
    # same as above
    
    > Poison.decode!(json, keys: :atoms)
    # same as above
    
    > Poison.decode!(json, keys: :atoms, as: [%CuriosumTime{}])
    ** (ArgumentError) the following keys must also be given when building struct CuriosumTime: [:hour, :minute, :second]

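    Something's looking rather off here. Note that the error message is the one Elixir raises whenever a bare %CuriosumTime{} is built without its enforced keys, so the template passed to :as cannot even be constructed. The takeaway is that while Poison.decode!/2 is fine to use with most of its options, when it comes to creating structs, and not just maps, from your JSON data, it's better to use Kernel.struct/2 to process data in a way that you control.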
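
    One way to do that is sketched below: take the string-keyed map that a plain decode returns, pick out only the known fields, and let struct!/2 enforce the rest. TimeDecoder and from_decoded/1 are hypothetical names, and the input map is written by hand so that the example doesn't depend on Poison:

```elixir
defmodule CuriosumTime do
  @enforce_keys [:hour, :minute, :second]
  defstruct @enforce_keys
end

# TimeDecoder and from_decoded/1 are hypothetical names for this sketch
defmodule TimeDecoder do
  @fields [:hour, :minute, :second]

  # Accepts a string-keyed map, such as one element of a decoded JSON list
  def from_decoded(%{} = decoded) do
    # Keep only the known fields; unknown JSON keys are ignored and
    # no new atoms are ever created from external input
    attrs =
      for key <- @fields,
          {:ok, value} <- [Map.fetch(decoded, Atom.to_string(key))],
          do: {key, value}

    # struct!/2 raises if any enforced key is still missing
    struct!(CuriosumTime, attrs)
  end
end

decoded = %{"hour" => 12, "minute" => 30, "second" => 40, "extra" => true}
time = TimeDecoder.from_decoded(decoded)
```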

    To go further...

    So we've discussed what Elixir has at its core about structs - they're very useful and used extensively throughout all sorts of well-adopted libraries such as Ecto, where each object retrieved from the database is represented as a struct.

    There are also several cool ways to build upon structs. As you may have noticed, structs are untyped, which means that our CuriosumTime struct can take :ten, "Ten" or anything as the hour - hell, in fact, Elixir's native Time struct also can. If you're into typed structs, it might be worth having a look at a library named typed_struct - though be aware that it relies on typespecs, which are not a true replacement for the type systems known from statically typed languages.
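
    To illustrate that caveat with plain typespecs (this sketch does not use typed_struct itself, and TypedTime is a hypothetical module): the @type below documents the intended field types, yet nothing stops odd values at runtime. At best, a static analysis tool such as Dialyzer may flag the mismatch.

```elixir
# TypedTime is a hypothetical module used only for this illustration
defmodule TypedTime do
  defstruct [:hour, :minute, :second]

  # Documents the intended types; not checked when the struct is built
  @type t :: %__MODULE__{
          hour: 0..23 | nil,
          minute: 0..59 | nil,
          second: 0..59 | nil
        }
end

# Builds without complaint despite violating the typespec
odd = struct!(TypedTime, hour: :ten, minute: "Ten", second: 0)
```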

    If you've got something interesting to add to the topic of structs - let us know and drop a comment below!

    Michał Buszkiewicz, Elixir Developer
    Michał Buszkiewicz Curiosum Founder & CTO
