Dealing with arrays in PostgreSQL
Storing data as an array type is not as popular as for example creating another table with the association. Arrays are useful in case when you are working with array-like data. It allows for a simpler schema for one-to-many relations and avoid lookup tables. Arrays can be also faster than using external tables and relations, but only for given operations. If we need just to retrieve all data from an array, then it can be done with a single query, rather than having to perform joins. Let's see some usages of arrays!
We are going to work with the given table:
postgres=# CREATE TABLE posts(
name TEXT,
tags INTEGER[]);
And we are going to insert that data:
postgres=# INSERT INTO posts VALUES('How to revert commit in Git', '{1, 2}');
INSERT 0 1
postgres=# INSERT INTO posts VALUES('How to get the struct type in Elixir', '{3, 2}');
INSERT 0 1
The []
after INTEGER
means, that we are creating list of integers. We could also specify the size of that array by passing the value, eg:
tags INTEGER[4]
allows us to add only 4 values.
Processing data
Our values look like this right now:
postgres=# SELECT * FROM posts;
-[ RECORD 1 ]------------------------------
name | How to revert commit in Git
tags | {1,2}
-[ RECORD 2 ]------------------------------
name | How to get the struct type in Elixir
tags | {3,2}
While being used appropriately, we can perform amazing operations on it.
To see records with tag' = '1
, we may use that query:
postgres=# SELECT * FROM posts WHERE 1 = ANY(tags);
-[ RECORD 1 ]---------------------
name | How to revert commit in Git
tags | {1,2}
If we would like to find 'posts' with 'tag' only equal to exact 1, 2
that query will help us:
postgres=# SELECT * FROM posts WHERE '{1, 2}' = tags;
-[ RECORD 1 ]---------------------
name | How to revert commit in Git
tags | {1,2}
unnest
Another useful function is also unnest
. This function will expand an array to a set of rows. Let's see an example:
postgres=# SELECT unnest(tags) FROM posts;
-[ RECORD 1 ]
unnest | 1
-[ RECORD 2 ]
unnest | 2
-[ RECORD 3 ]
unnest | 3
-[ RECORD 4 ]
unnest | 2
As we may see, this function has split our tags to separate rows. But what if we would like to have also 'post' name here?
postgres=# SELECT name, unnest(tags) AS "tag" FROM posts;
-[ RECORD 1 ]------------------------------
name | How to revert commit in Git
tag | 1
-[ RECORD 2 ]------------------------------
name | How to revert commit in Git
tag | 2
-[ RECORD 3 ]------------------------------
name | How to get the struct type in Elixir
tag | 3
-[ RECORD 4 ]------------------------------
name | How to get the struct type in Elixir
tag | 2
Well, as you can see, it is not a big deal :). Let's imagine a situation, where we would like to filter posts by tag. In that case, we would need all, unique tags. How would we accomplish it?
postgres=# SELECT DISTINCT unnest(tags) AS "tag" FROM posts ORDER BY 1;
-[ RECORD 1 ]
tag | 1
-[ RECORD 2 ]
tag | 2
-[ RECORD 3 ]
tag | 3
As you may see, the unnested data may be treated as ordinary integers in that case.
Code example in Elixir
We are going to handle the same database for that example.
Unfortunately, Ecto does not provide us with the unnest function, so we have to use fragment()
function. Let's create a list_tags
function for filtering.
For that, I am going to use the composable query model - if you have not heard about it, I highly encourage you to read our valuable blog post:
How to compose and refactor Ecto queries with Queries Modules
post_queries.ex
...
def unnest_tags(query \\ base()) do
query
|> select([posts: p],
fragment("unnest(tags)")
)
end
def order_by_tag(query \\ base()) do
query
|> order_by(1)
end
def distinct_list(query \\ base()) do
query
|> distinct(true)
end
posts.ex
alias Repo
alias PostQueries
...
def list_tags do
PostQueries.all()
|> PostQueries.unnest_tags()
|> PostQueries.order_by_tag()
|> PostQueries.distinct_list()
|> Repo.all()
end
Thanks to that, we will be given the list of distinct tags. If we use tags just as a piece of extra information, then is good (and faster!) to use arrays. Otherwise, if we would like to e.g. list tags, then we should definitely consider using an external table.
Summary
As you can see, stored data as an array are also a great option to omit creating one-to-many relations and using joins. However, the choice of using arrays must be carefully considered. Arrays can be used, when we are 100% sure we do not need any complex nested relationships. As presented in the above examples, the arrays may be used for storing lists of scalar values when you're sure that listing them globally won't be needed.
Another case that we should take into account is portability - if we consider working with another database, then we should not use arrays (e.g. MySQL does not provide array data type).
In general, it's best to choose the storage method that best fits your needs.
What are your thoughts about storing data as an array
type? Would you like to hear more about databases? Let us know in the comments!