I am starting a series of technical essays with aim to describe advanced techniques/heuristics in algorithms’ design, and this article is going to be the opening one. This topic is often under-represented in textbooks and algorithm handbooks. Quite often you can find in there the final solution, but you get zero hints about how the author actually invented this algorithm. In addition, the logical, deductive way of explanations how the solution was found is more a wish of brain to stick labels and see direct road, rather than objective reflections about how all elements of puzzle were actually put together. The internal kitchen, the secret laboratory of those subtle thought processes — that is what we will be looking for.

The technical complexity of problems which I picked up as examples is not very high, the full solution can be designed and coded in 10–15 minutes, and due to this very fact one can quite likely meet these (or similar) problems on interview at FAANG companies

Let’s get started! Today we will take a look at one simple problem tied with re-shuffling the parts of string.

You are given a binary string s. In one second, all occurrences of 01 are simultaneously replaced with 10. This process repeats until no occurrences of 01 exist.

Return the number of seconds needed to complete this process.

And here is an example:

t = 0 sec : 001011

t = 1 sec : 010101

t = 2 sec : 101010

t = 3 sec : 110100

t = 4 sec : 111000

We are expecting to get 4 seconds as an answer

This problem is from the class of “transformations of something (binary strings in our case) in accordance to some rules”. Usually we are asked about the number of steps to reach some specific state, or any other questions about the process and its result.

This problem can be solved in many ways, but it’s a good idea to start from the most straightforward one — a brute force approach. To do that we have first of all to identify the action (transformation of string in our case) and then to build the simulation of this process. Working simulation is going to be a tool which gives us the answer to all our questions.

For our case we have to code two methods — the first one to check that we indeed have reached the final state, and the second one — to implement transformation itself.

The implementation of the first method is obvious — we have to iterate through all characters comparing two neighboring ones, looking for the pattern 01. If found, this means we do not have a solution yet, because it’s easy to see that the final string — the result of transformation steps — can only contain patterns 11, or 00, or 10 (true — if there is no 01 in string, it cannot be transformed anymore in accordance with described procedure, can it?).

The transformation procedure itself is straightforward as well — on each step we have to make a copy of input string and generate a new one on its basis in accordance with rules defined in problem statement.

Obviously we have to count the number of steps taken — because this is exactly what we are asked.

What is the time and space complexity of this approach? Well, we are using a double buffer for initial string, so if the size of string is n characters, we will need an additional space of size n, hence the space complexity is O(n). We can improve it to O(1), but it’s out of scope of this article.

As for the time complexity, we’re going through all characters in the given string on each iteration-transformation, and in worst case scenario we will need to perform n such iterations, so the answer is O(n²) which is not very fast.

Now we have to take a short break, look around and answer one simple question:

why did we consider brute force approach at all if it’s so slow and inefficient?

The answer is — to create a correct solution which works and which gives us the correct answer to any input. As a result we can generate a golden set of test cases to validate the correctness of more sophisticated solutions, with more subtle and non-trivial logic. Sometimes (not always, but sometimes) to find a mathematical proof for correctness of some non-trivial algorithm could be more difficult than to solve the problem itself. The validation of such algorithm on input-output pairs from the generated golden set can be the easiest way to get some more or less solid evidences of correctness of this algorithm (this is also known as empirical proof of algorithm — in real world of advanced mathematics one can meet those proofs more often than you think)

In order to demonstrate the logic and to visualize the step-by-step dynamic of transformations, I used a brute force approach to generate 20 steps of transformations for the following random string (it contains 20 ones and 17 zeros)

1001111111110001011001110000000110101

And here is a spaghetti-like visualization of these transformations. I marked with yellow color the paths of 1s on the their way to final position, so you can literally see the dynamics of their moves.

It’s easy to see, that the answer to the question asked in this problem is the length of the rightmost path, which shows the trajectory of the rightmost 1.

It’s obvious, that the shortest path should be a diagonal with length equals to the number of 0s to the left of the aforementioned rightmost 1.

In this case we have a 17 such zeros, and the lower boundary for our answer is 17.

But we can see that the actual length of path is increased due to some delays in moves of 1s (I marked these delays with red color). This is because one can move 1 to the left if and only if the left-side neighbor is 0. One can see that in case of contiguous sequence of 1s we have a blocker, or queue, which actually causes these delays.

Also it’s easy to see that these queues are gradually disappearing on each transformation, and this disappearing is happening with speed modulated by the flow of 0s and 1s (the queue is growing when we meet 1s and shrinking otherwise)

This gives us the hint how to build a super efficient algorithm to crack the problem.

First, we will start from evaluation of the lower boundary for the answer, identifying the rightmost 1 and counting the number of 0s to the left of it.

Next, iterating through all characters in string starting from the beginning till the last 1, we have to evaluate the possible delays in movement of 1s due to contiguous sequence of 1s. So each time we meet such continuous blocks we are increasing the excess delay, otherwise in case of 0s we are decreasing the excess. The answer to the question asked in problem will be our lower boundary plus the excess delay we’ve just calculated on the last step.

What is the time and space complexity of this approach? Well, we are not using any buffers for initial string now, so the space complexity is O(1), i.e. constant space.

As for the time complexity, we have to iterate through all chars in string twice, so it’s going to be O(2n) which is close to theoretically best possible solution O(n).

And this is what we are eating today thinking about binary strings:

Pasta Puttanesca con aglio, olive e carpisti

References

[1] the source code of solutions — brute force and optimized

[2] a recipe for todays dinner: https://www.simplyrecipes.com/recipes/pasta_puttanesca/