Best Practices·5 minutes read

Leetcode runtime analysis is not always so great on real-time cases

For some time I have been solving some SQL, Python and Java questions on LeetCode. However, some problems do not worth time posting but this case seems to be different. I guarantee that after this posts you will know why Leetcode runtime analysis is not always so great on real-time cases.

Linas Kapočius

Solutions Architect at Corgineering.com

December 30, 2024

Leetcode runtime analysis is not always so great on real-time cases

You can trust me that queries above gives identical results, code passes all tests. However, one is better than other regarding efficiency and could be cheaper in real world scenarios. So the differences lie in the questions:

1. Should we filter data before or after aggregation? (where vs having).

Filtering unneeded rows beforehand is always better because internally SQL does not need to group the whole pool of data. It takes just a little part of it. That's why partitioning by date is usually a very good idea while working on Data Lake.

However, it does not mean having clause is useless. For example, finding specific days when active users count was between 50 and 200 without having clause would be kind of hard to read and to write.

select
    activity_date as day,
    count(distinct user_id) as active_users
from
    activity
where
    activity_date > "2019-06-27" and
    activity_date <= "2019-07-27"
group by
    activity_date

2. Filtering logic by using additional SQL functions like DATEDIFF.

Leetcode runtime analysis would tell you that code without those methods is more efficient. Furthermore, explanation is quite easy to understand. We are adding additional calculation before filtering. However, Kaggle, Leetcode and all those platforms do not take into account two important aspects: code readability and abstractions.

select
    activity_date as day,
    count(distinct user_id) as active_users
from
    activity
group by
    activity_date
having
    datediff("2019-07-27", activity_date) between 0 and 29

In this case, reading where clause is quite easy but we do not understand why those days (2019-06-27 and 2019-07-27) have been chosen without any prior knowledge. But by adding DATEDIFF we could exactly tell that we are taking this period intentionally and not just some random timeframe. Also, by using bold where clause we needed to calculate 30 day period in our heads. So code is not as dynamic as in the first example.

As you can tell I am not entirely convinced which method is better. As a data engineer I need to think not just about time complexity (what Leetcode does almost flawlessly) but space complexity also important, so think of runtime analysis as just a one aspect of the whole picture. Don't forget it while reviewing someones' code.

This article is part of our Best Practices series. Check out our other articles.

**1. Should we filter data before or after aggregation? (where vs having). **

**2. Filtering logic by using additional SQL functions like DATEDIFF. **

1. Should we filter data before or after aggregation? (where vs having).

2. Filtering logic by using additional SQL functions like DATEDIFF.