Storing data as an array type is not as popular as for example creating another table with the association. Arrays are useful in case when you are working with array-like data. It allows for a simpler schema for one-to-many relations and avoid lookup tables. Arrays can be also faster than using external tables and relations, but only for given operations. If we need just to retrieve all data from an array, then it can be done with a single query, rather than having to perform joins. Let's see some usages of arrays!

Table of contents

    We are going to work with the given table:

     postgres=# CREATE TABLE posts(
     name TEXT,
     tags INTEGER[]);

    And we are going to insert that data:

     postgres=# INSERT INTO posts VALUES('How to revert commit in Git', '{1, 2}');
     INSERT 0 1
     postgres=# INSERT INTO posts VALUES('How to get the struct type in Elixir', '{3, 2}');
     INSERT 0 1

    The [] after INTEGER means, that we are creating list of integers. We could also specify the size of that array by passing the value, eg: tags INTEGER[4] allows us to add only 4 values.

    Processing data

    Our values look like this right now:

     postgres=# SELECT * FROM posts;
     -[ RECORD 1 ]------------------------------
     name | How to revert commit in Git
     tags | {1,2}
     -[ RECORD 2 ]------------------------------
     name | How to get the struct type in Elixir
     tags | {3,2}

    While being used appropriately, we can perform amazing operations on it. To see records with tag' = '1, we may use that query:

     postgres=# SELECT * FROM posts WHERE 1 = ANY(tags);
     -[ RECORD 1 ]---------------------
     name | How to revert commit in Git
     tags | {1,2}

    If we would like to find 'posts' with 'tag' only equal to exact 1, 2 that query will help us:

     postgres=# SELECT * FROM posts WHERE '{1, 2}' = tags;
     -[ RECORD 1 ]---------------------
     name | How to revert commit in Git
     tags | {1,2}

    unnest

    Another useful function is also unnest. This function will expand an array to a set of rows. Let's see an example:

     postgres=# SELECT unnest(tags) FROM posts;
     -[ RECORD 1 ]
     unnest | 1
     -[ RECORD 2 ]
     unnest | 2
     -[ RECORD 3 ]
     unnest | 3
     -[ RECORD 4 ]
     unnest | 2

    As we may see, this function has split our tags to separate rows. But what if we would like to have also 'post' name here?

     postgres=# SELECT name, unnest(tags) AS "tag" FROM posts;
     -[ RECORD 1 ]------------------------------
     name | How to revert commit in Git
     tag  | 1
     -[ RECORD 2 ]------------------------------
     name | How to revert commit in Git
     tag  | 2
     -[ RECORD 3 ]------------------------------
     name | How to get the struct type in Elixir
     tag  | 3
     -[ RECORD 4 ]------------------------------
     name | How to get the struct type in Elixir
     tag  | 2

    Well, as you can see, it is not a big deal :). Let's imagine a situation, where we would like to filter posts by tag. In that case, we would need all, unique tags. How would we accomplish it?

     postgres=# SELECT DISTINCT unnest(tags) AS "tag" FROM posts ORDER BY 1;
     -[ RECORD 1 ]
     tag | 1
     -[ RECORD 2 ]
     tag | 2
     -[ RECORD 3 ]
     tag | 3

    As you may see, the unnested data may be treated as ordinary integers in that case.

    Code example in Elixir

    We are going to handle the same database for that example. Unfortunately, Ecto does not provide us with the unnest function, so we have to use fragment() function. Let's create a list_tags function for filtering. For that, I am going to use the composable query model - if you have not heard about it, I highly encourage you to read our valuable blog post: How to compose and refactor Ecto queries with Queries Modules

    post_queries.ex ...

    def unnest_tags(query \\ base()) do
        query
        |> select([posts: p],
            fragment("unnest(tags)")
        )
    end
    
    def order_by_tag(query \\ base()) do
        query
        |> order_by(1)
    end
    
    def distinct_list(query \\ base()) do
        query
        |> distinct(true)
    end

    posts.ex

    alias Repo
    alias PostQueries
    ...
    def list_tags do
        PostQueries.all()
        |> PostQueries.unnest_tags()
        |> PostQueries.order_by_tag()
        |> PostQueries.distinct_list()
        |> Repo.all()
    end

    Thanks to that, we will be given the list of distinct tags. If we use tags just as a piece of extra information, then is good (and faster!) to use arrays. Otherwise, if we would like to e.g. list tags, then we should definitely consider using an external table.

    Summary

    As you can see, stored data as an array are also a great option to omit creating one-to-many relations and using joins. However, the choice of using arrays must be carefully considered. Arrays can be used, when we are 100% sure we do not need any complex nested relationships. As presented in the above examples, the arrays may be used for storing lists of scalar values when you're sure that listing them globally won't be needed. Another case that we should take into account is portability - if we consider working with another database, then we should not use arrays (e.g. MySQL does not provide array data type). In general, it's best to choose the storage method that best fits your needs. What are your thoughts about storing data as an array type? Would you like to hear more about databases? Let us know in the comments!

    Mateusz Osiński - Elixir Developer
    Mateusz Osiński Elixir Developer

    Read more
    on #curiosum blog

    Continuous integration (CI) Elixir

    Mastering Elixir CI pipeline

    Developer time is precious for business. Don't waste it on things that can be easily automated. One of these things is a part of the code review process - quality checks. Continuous integration (CI) is where this automation is put in place, and in this article, I'd love to walk you through our setup.