Dealing with arrays in PostgreSQL

Storing data as an array type is not as popular as for example creating another table with the association. Arrays are useful in case when you are working with array-like data. It allows for a simpler schema for one-to-many relations and avoid lookup tables. Arrays can be also faster than using external tables and relations, but only for given operations. If we need just to retrieve all data from an array, then it can be done with a single query, rather than having to perform joins. Let's see some usages of arrays!

Table of contents

We are going to work with the given table:

 postgres=# CREATE TABLE posts(
 name TEXT,
 tags INTEGER[]);

And we are going to insert that data:

 postgres=# INSERT INTO posts VALUES('How to revert commit in Git', '{1, 2}');
 INSERT 0 1
 postgres=# INSERT INTO posts VALUES('How to get the struct type in Elixir', '{3, 2}');
 INSERT 0 1

The [] after INTEGER means, that we are creating list of integers. We could also specify the size of that array by passing the value, eg: tags INTEGER[4] allows us to add only 4 values.

Processing data

Our values look like this right now:

 postgres=# SELECT * FROM posts;
 -[ RECORD 1 ]------------------------------
 name | How to revert commit in Git
 tags | {1,2}
 -[ RECORD 2 ]------------------------------
 name | How to get the struct type in Elixir
 tags | {3,2}

While being used appropriately, we can perform amazing operations on it. To see records with tag' = '1, we may use that query:

 postgres=# SELECT * FROM posts WHERE 1 = ANY(tags);
 -[ RECORD 1 ]---------------------
 name | How to revert commit in Git
 tags | {1,2}

If we would like to find 'posts' with 'tag' only equal to exact 1, 2 that query will help us:

 postgres=# SELECT * FROM posts WHERE '{1, 2}' = tags;
 -[ RECORD 1 ]---------------------
 name | How to revert commit in Git
 tags | {1,2}

unnest

Another useful function is also unnest. This function will expand an array to a set of rows. Let's see an example:

 postgres=# SELECT unnest(tags) FROM posts;
 -[ RECORD 1 ]
 unnest | 1
 -[ RECORD 2 ]
 unnest | 2
 -[ RECORD 3 ]
 unnest | 3
 -[ RECORD 4 ]
 unnest | 2

As we may see, this function has split our tags to separate rows. But what if we would like to have also 'post' name here?

 postgres=# SELECT name, unnest(tags) AS "tag" FROM posts;
 -[ RECORD 1 ]------------------------------
 name | How to revert commit in Git
 tag  | 1
 -[ RECORD 2 ]------------------------------
 name | How to revert commit in Git
 tag  | 2
 -[ RECORD 3 ]------------------------------
 name | How to get the struct type in Elixir
 tag  | 3
 -[ RECORD 4 ]------------------------------
 name | How to get the struct type in Elixir
 tag  | 2

Well, as you can see, it is not a big deal :). Let's imagine a situation, where we would like to filter posts by tag. In that case, we would need all, unique tags. How would we accomplish it?

 postgres=# SELECT DISTINCT unnest(tags) AS "tag" FROM posts ORDER BY 1;
 -[ RECORD 1 ]
 tag | 1
 -[ RECORD 2 ]
 tag | 2
 -[ RECORD 3 ]
 tag | 3

As you may see, the unnested data may be treated as ordinary integers in that case.

Code example in Elixir

We are going to handle the same database for that example. Unfortunately, Ecto does not provide us with the unnest function, so we have to use fragment() function. Let's create a list_tags function for filtering. For that, I am going to use the composable query model - if you have not heard about it, I highly encourage you to read our valuable blog post: How to compose and refactor Ecto queries with Queries Modules

post_queries.ex ...

def unnest_tags(query \\ base()) do
    query
    |> select([posts: p],
        fragment("unnest(tags)")
    )
end

def order_by_tag(query \\ base()) do
    query
    |> order_by(1)
end

def distinct_list(query \\ base()) do
    query
    |> distinct(true)
end

posts.ex

alias Repo
alias PostQueries
...
def list_tags do
    PostQueries.all()
    |> PostQueries.unnest_tags()
    |> PostQueries.order_by_tag()
    |> PostQueries.distinct_list()
    |> Repo.all()
end

Thanks to that, we will be given the list of distinct tags. If we use tags just as a piece of extra information, then is good (and faster!) to use arrays. Otherwise, if we would like to e.g. list tags, then we should definitely consider using an external table.

Summary

As you can see, stored data as an array are also a great option to omit creating one-to-many relations and using joins. However, the choice of using arrays must be carefully considered. Arrays can be used, when we are 100% sure we do not need any complex nested relationships. As presented in the above examples, the arrays may be used for storing lists of scalar values when you're sure that listing them globally won't be needed. Another case that we should take into account is portability - if we consider working with another database, then we should not use arrays (e.g. MySQL does not provide array data type). In general, it's best to choose the storage method that best fits your needs. What are your thoughts about storing data as an array type? Would you like to hear more about databases? Let us know in the comments!

FAQ

What are arrays in PostgreSQL?

Arrays in PostgreSQL are a way to store multiple values in a single column, facilitating simpler schemas for one-to-many relations and avoiding lookup tables.

How are arrays created in PostgreSQL?

Arrays are created by specifying the array data type in the column definition, such as INTEGER[] for a list of integers.

What are the benefits of using arrays in PostgreSQL?

Arrays offer faster operations for specific use cases like retrieving all data with a single query, avoiding joins, and simplifying data structures.

How can you perform operations on arrays in PostgreSQL?

Operations on arrays can include checking for the presence of a value, comparing array contents directly, or using functions like unnest to convert arrays into separate rows.

What is the `unnest` function in PostgreSQL?

unnest is a PostgreSQL function that expands an array into a set of rows, making it easier to work with individual array elements.

How can arrays impact database portability?

Arrays should be used cautiously as they can affect database portability; for instance, MySQL does not support the array data type.

When should arrays be used in PostgreSQL?

Arrays are best used when dealing with list-like data that doesn't require complex nested relationships or when global listing of array elements is not needed.

Dealing with arrays in PostgreSQL

Processing data

unnest

Code example in Elixir

Summary

FAQ

What are arrays in PostgreSQL?

How are arrays created in PostgreSQL?

What are the benefits of using arrays in PostgreSQL?

How can you perform operations on arrays in PostgreSQL?

What is the `unnest` function in PostgreSQL?

How can arrays impact database portability?

When should arrays be used in PostgreSQL?

Don't forget to share this post!

Read more
on #curiosum blog

Mastering Elixir CI pipeline

Elixir vs Ruby in Depth Comparison

Maximize Efficiency with the Top Property Maintenance Software Solutions

Dealing with arrays in PostgreSQL

Processing data

unnest

Code example in Elixir

Summary

FAQ

What are arrays in PostgreSQL?

How are arrays created in PostgreSQL?

What are the benefits of using arrays in PostgreSQL?

How can you perform operations on arrays in PostgreSQL?

What is the unnest function in PostgreSQL?

How can arrays impact database portability?

When should arrays be used in PostgreSQL?

Don't forget to share this post!

Read more on #curiosum blog

Mastering Elixir CI pipeline

Elixir vs Ruby in Depth Comparison

Maximize Efficiency with the Top Property Maintenance Software Solutions

What is the `unnest` function in PostgreSQL?

Read more
on #curiosum blog