Storing data as an array type is not as popular as for example creating another table with the association.

Table of contents

    Arrays are useful in case when you are working with array-like data. It allows for a simpler schema for one-to-many relations and avoid lookup tables. Arrays can be also faster than using external tables and relations, but only for given operations. If we need just to retrieve all data from an array, then it can be done with a single query, rather than having to perform joins. Let's see some usages of arrays!

    We are going to work with the given table:

     postgres=# CREATE TABLE posts(
     name TEXT,
     tags INTEGER[]);

    And we are going to insert that data:

     postgres=# INSERT INTO posts VALUES('How to revert commit in Git', '{1, 2}');
     INSERT 0 1
     postgres=# INSERT INTO posts VALUES('How to get the struct type in Elixir', '{3, 2}');
     INSERT 0 1

    The [] after INTEGER means, that we are creating list of integers. We could also specify the size of that array by passing the value, eg: tags INTEGER[4] allows us to add only 4 values.

    Processing data

    Our values look like this right now:

     postgres=# SELECT * FROM posts;
     -[ RECORD 1 ]------------------------------
     name | How to revert commit in Git
     tags | {1,2}
     -[ RECORD 2 ]------------------------------
     name | How to get the struct type in Elixir
     tags | {3,2}

    While being used appropriately, we can perform amazing operations on it. To see records with tag' = '1, we may use that query:

     postgres=# SELECT * FROM posts WHERE 1 = ANY(tags);
     -[ RECORD 1 ]---------------------
     name | How to revert commit in Git
     tags | {1,2}

    If we would like to find 'posts' with 'tag' only equal to exact 1, 2 that query will help us:

     postgres=# SELECT * FROM posts WHERE '{1, 2}' = tags;
     -[ RECORD 1 ]---------------------
     name | How to revert commit in Git
     tags | {1,2}

    unnest

    Another useful function is also unnest. This function will expand an array to a set of rows. Let's see an example:

     postgres=# SELECT unnest(tags) FROM posts;
     -[ RECORD 1 ]
     unnest | 1
     -[ RECORD 2 ]
     unnest | 2
     -[ RECORD 3 ]
     unnest | 3
     -[ RECORD 4 ]
     unnest | 2

    As we may see, this function has split our tags to separate rows. But what if we would like to have also 'post' name here?

     postgres=# SELECT name, unnest(tags) AS "tag" FROM posts;
     -[ RECORD 1 ]------------------------------
     name | How to revert commit in Git
     tag  | 1
     -[ RECORD 2 ]------------------------------
     name | How to revert commit in Git
     tag  | 2
     -[ RECORD 3 ]------------------------------
     name | How to get the struct type in Elixir
     tag  | 3
     -[ RECORD 4 ]------------------------------
     name | How to get the struct type in Elixir
     tag  | 2

    Well, as you can see, it is not a big deal :). Let's imagine a situation, where we would like to filter posts by tag. In that case, we would need all, unique tags. How would we accomplish it?

     postgres=# SELECT DISTINCT unnest(tags) AS "tag" FROM posts ORDER BY 1;
     -[ RECORD 1 ]
     tag | 1
     -[ RECORD 2 ]
     tag | 2
     -[ RECORD 3 ]
     tag | 3

    As you may see, the unnested data may be treated as ordinary integers in that case.

    Code example in Elixir

    We are going to handle the same database for that example. Unfortunately, Ecto does not provide us with the unnest function, so we have to use fragment() function. Let's create a list_tags function for filtering. For that, I am going to use the composable query model - if you have not heard about it, I highly encourage you to read our valuable blog post: How to compose and refactor Ecto queries with Queries Modules

    post_queries.ex ...

    def unnest_tags(query \\ base()) do
        query
        |> select([posts: p],
            fragment("unnest(tags)")
        )
    end
    
    def order_by_tag(query \\ base()) do
        query
        |> order_by(1)
    end
    
    def distinct_list(query \\ base()) do
        query
        |> distinct(true)
    end

    posts.ex

    alias Repo
    alias PostQueries
    ...
    def list_tags do
        PostQueries.all()
        |> PostQueries.unnest_tags()
        |> PostQueries.order_by_tag()
        |> PostQueries.distinct_list()
        |> Repo.all()
    end

    Thanks to that, we will be given the list of distinct tags. If we use tags just as a piece of extra information, then is good (and faster!) to use arrays. Otherwise, if we would like to e.g. list tags, then we should definitely consider using an external table.

    Summary

    As you can see, stored data as an array are also a great option to omit creating one-to-many relations and using joins. However, the choice of using arrays must be carefully considered. Arrays can be used, when we are 100% sure we do not need any complex nested relationships. As presented in the above examples, the arrays may be used for storing lists of scalar values when you're sure that listing them globally won't be needed. Another case that we should take into account is portability - if we consider working with another database, then we should not use arrays (e.g. MySQL does not provide array data type). In general, it's best to choose the storage method that best fits your needs. What are your thoughts about storing data as an array type? Would you like to hear more about databases? Let us know in the comments!

    FAQ

    What are arrays in PostgreSQL?

    Arrays in PostgreSQL are a way to store multiple values in a single column, facilitating simpler schemas for one-to-many relations and avoiding lookup tables.

    How are arrays created in PostgreSQL?

    Arrays are created by specifying the array data type in the column definition, such as INTEGER[] for a list of integers.

    What are the benefits of using arrays in PostgreSQL?

    Arrays offer faster operations for specific use cases like retrieving all data with a single query, avoiding joins, and simplifying data structures.

    How can you perform operations on arrays in PostgreSQL?

    Operations on arrays can include checking for the presence of a value, comparing array contents directly, or using functions like unnest to convert arrays into separate rows.

    What is the unnest function in PostgreSQL?

    unnest is a PostgreSQL function that expands an array into a set of rows, making it easier to work with individual array elements.

    How can arrays impact database portability?

    Arrays should be used cautiously as they can affect database portability; for instance, MySQL does not support the array data type.

    When should arrays be used in PostgreSQL?

    Arrays are best used when dealing with list-like data that doesn't require complex nested relationships or when global listing of array elements is not needed.

    Mateusz Osiński
    Mateusz Osiński Elixir Developer

    Read more
    on #curiosum blog

    Continuous integration (CI) Elixir

    Mastering Elixir CI pipeline

    Developer time is precious for business. Don't waste it on things that can be easily automated. One of these things is a part of the code review process - quality checks. Continuous integration (CI) is where this automation is put in place.