Lateral Joins Instead of Window Functions
Every time I need to use a window function, I cringe a little. They’re not that hard to use. But I find them unintuitive, and I use them so infrequently that I always have difficulty figuring out what I’m partitioning over which rows.
And most times, when I reach for a window function, I’m not trying to do
anything complex. I’m usually calculating something for each row of my
results. If only Postgres had a for_each
function… 🤔
The problem
Let’s dive into a concrete problem. Suppose we have the following users
and
events
tables:
Users | ||
id | name | |
1 | John | john@example.com |
2 | Jean | jean@example.com |
… | ||
100030 | Sandra | sandra@example.com |
Events | ||
user_id | name | created_at |
1 | log_in | 2021-02-23 |
2 | log_in | 2021-02-23 |
9 | purchase | 2021-02-24 |
… | ||
999 | log_in | 2021-12-23 |
Someone asks you to get the latest log_in
event for a given user (e.g. id =
35
).
No problem. You’ve done an ORDER BY _ DESC LIMIT 1
query many a time.
SELECT user_id, created_at
FROM events
WHERE events.name = "log_in"
AND user_id = 35
ORDER BY created_at DESC
LIMIT 1
You probably didn’t even break a sweat.
But now, what query would you write if someone asks you to get a list of all
user emails with their latest log_in
events?
SELECT users.email, events.created_at AS latest_log_in
FROM users
INNER JOIN events
...
WHERE events.name = "log_in"
🤨 Not as easy, right?
It’d be nice if we could use our first query inside our second query. Something like this:
-- pseudocode
SELECT users.email, events.created_at AS latest_log_in
FROM users
INNER JOIN (
SELECT created_at
FROM events
WHERE events.name = "log_in"
AND user_id = 35
ORDER BY created_at DESC
LIMIT 1
) as events
The problem is that we have to do that for each user, not just user 35
. How
can we do that?
As it turns out, lateral joins are just the thing we need.
Using LATERAL joins
A LATERAL item can appear… within a JOIN tree… [It] can… refer to any items that are on the left-hand side of a JOIN that it is on the right-hand side of.
Using JOIN LATERAL
allows us to reference values from the main query inside
the subquery. So, we can access the users.id
from within our subquery and
remove the hard-coded user_id
:
SELECT users.email, events.created_at AS latest_log_in
FROM users
JOIN LATERAL (
SELECT created_at
FROM events
WHERE events.user_id = users.id
AND events.name = "log_in"
ORDER BY created_at DESC
LIMIT 1
) events ON true
With JOIN LATERAL
, Postgres will grab each row in FROM
, and it will evaluate
the LATERAL
item using that row’s value. Then it’ll join the tables as usual.
So, we select all of our users, and for each row in the users
table, we
evaluate the most recent log_in
event for that given user. Then we join those
results.
Isn’t that amazing?! 🤯
More resources
These resources helped me understand LATERAL
joins. I hope they’re helpful to
you too: