Here’s the problem: I have 1 timeline, but 2 types of events appear on this timeline. I want to use events of type 1 as borders in my timeline, these borders will separate events of type 2 into groups.
For example: I want to know which bird sings first every morning after sunrise. I have a list of sunrises for the last month and a list of singing events. How do I get the result without doing a 2-level nested loop?
A 2-level loop will always work (but it will be the slowest)
I want to end up with a list of pairs of the sunrise time and the first song time.
A loop like this will always work:
for sunrise in sunrises:
for bird_song in bird_songs:
# if the song happened before the sunrise, check the next song
if bird_song.start_at < sunrise.at:
continue
# if the song happened on the next day, then let's move to the next sunrise
if bird_song.start_at > sunrise.day_ends_at:
break
sunrise.first_bird_song = bird_song
# once we found our first song, we can move on to the next sunrise
break
But it is looping over the bird_songs
too many times. For each sunrise, we start looping over bird_songs
from the 1st song on. Time-wise a classical 2-level loop is considered to take about $O(N^2)$ time.
True, in our case, we loop over 2 different lists, we have S number of sunrises and B number of bird songs, so we get O(S * B). Then we also break
out from the loop when we find the correct song, so we don’t really do the full B number of loops for every sunrise. But these estimations are always made with the worst-case scenario in mind, which is in our case: we have 10 million bird songs and all of them happen on the first day and before the sunrise, which brings us right into O(S * B).
One loop, two counters moving it forward
To make the above loop faster, we would need to loop over sunrises
and bird_songs
in alternation. We need to stop the bird_songs
loop when we find the desired song, move sunrises
1 iteration further and then continue looping over bird_songs
at the exact location, where we stopped looping before.
sunrise_counter = 0
song_counter = 0
while sunrise_counter < len(sunrises) and song_counter < len(bird_songs):
bird_song = bird_songs[song_counter]
sunrise = sunrises[sunrise_counter]
# if the song happened before the sunrise, check the next song
if bird_song.start_at < sunrise.at:
song_counter += 1
continue
# if the song happened on the next day, then let's move to the next sunrise
if bird_song.start_at > sunrise.day_ends_at:
sunrise_counter += 1
continue
sunrise.first_bird_song = bird_song
# once we found our first song, we can move on to the next sunrise and next song
song_counter += 1
sunrise_counter += 1
This while
-loop is extremely similar to the 2-level loop from above. It has the same if
-statements and the same comments
, but its time complexity is $O(S + B)$ instead of $(S * B)$. With 30 sunrises and 10M bird songs, we land at ~10M operations instead of 300M operations.
The biggest drawback of this approach: it’s much more difficult to read and maintain. This too is an important aspect of the code.
Choose the one, which brings you more benefits.