Software quality notes: nested "for" loops considered harmful

2020-09-30
Dev
Software quality

While working on Exomars as a software engineer, I was notified an issue to fix. I thought it was an interesting one, and spent some moments to think about how it could have been prevented in the first place. I care about code quality, and always try to build reliable software. For this purpose, I do not believe in rockstar developers (we are all fallible humans), but rather in (automatically enforced) processes and tools. In this article, I want to share my thoughts about this bug in particular, but more articles about quality in general will probably follow.

Issue

The bug was located in the thermal control algorithms. It is a low criticality part (thermal is slow to react and change), and less tested area than GNC.

Here is the code, slightly reworked to remove unnecessary fluff and keep only the relevant portion.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29


/// Our spacecraft has a list of heaters. They are grouped by two, for redundancy.
typedef struct _s_heater {
    uint32_t nominal;
    uint32_t redundant;
} s_heater;

/// Function to switch off a list of heaters.
void TCS_HEAT_switchOffHeaters(
    uint32_t len, uint32_t *heatersOff,
    s_heater spacecraftHeaters[TCS_MAX_HEATERS]
) {
    uint32_t i = 0, j = 0;
    int32_t heaterId = -1;

    // For all the heaters to switch off
    for (i=0; i<len; i++)
    {
        // Search its id in the spacecraft's heaters list
        for (j=0; j<TCS_MAX_HEATERS; j++)
        {
            if (spacecraftHeaters[i].nominal == heatersOff[j])
            {
                heaterId = i;
            }
        }

        /* ... more code: searching heater id as redundant, switching off... */
    }
}

Even with sharp eyes, the bug is very difficult to see. Moreover, the compiler and advanced static analysis tools are probably unable to help us with these arrays (the fact that the parameter uint32_t *heaters is dynamic makes it even worse).

Here is the answer: the bug is line 21, where indexes i and j are mixed up.

Actually, this is made up, it is not even the bug which was reported to me. The bug I had was similar (issues with array indexes): for (j=0; i<TCS_MAX_HEATERS; j++). Again, indexes are mixed up, in an even worse way this time.

The particular bug does not matter much, as long as it involves two for loops. I wanted to talk about this kind of bugs because double for loops are quite common, and there are several ways to avoid entirely this class of bugs.

Protection measures

Variable name

The easiest and most obvious measure is to use clearer variable identifiers.

On one hand, using unambiguous names such as i_group and i_heater would probably have prevented the bug. On the other hand, the bug is still possible because this measure is very primitive and depends entirely on the (sometimes stupid) human brain.

To fully avoid the bug, the measure needs to be enforced at all times, by a static analysis tool or, even better, the compiler.

Variable scope

A first attempt to use the compiler to prevent us from using the incorrect variable can be achieved with scope: one can declare the variables in the smallest scope. In our example, this would prevent j from being used in the outer loop. Sadly, this is imperfect, because the other way around, using i in the inner loop, is still possible.

Many (old) people will advise against this and require variables to be declared at the beginning of the function. This probably comes from old C standards, such as ANSI C (C90), where variables must be declared immediately after an opening brace. I believe this is wrong, because it is an outdated custom which is not relevant anymore. Indeed, since C99 and onwards, variables can be declared anywhere.

In any case, even if this measure is not very effective, it is still better than nothing. Good protection is defense in depth, and is made of several layers.

Function length and purpose

The next measure uses the scope again, but more effectively.

Functions can be made shorter, with a more limited role. For example, a searchHeater() function would allow splitting the two for loops in two different functions, preventing the use of the wrong index.

An advantage of shorter functions is that it reduces the complexity, by having less variables and less branches. But too many small functions (here, 5-10 lines) could also be a disadvantage, as it could make the code more complicated with more indirections and layers of abstractions. Everything is a trade-off.

Iterators

The last and, in my opinion, best protection measure is to not use indexes at all. Yes, I kinda cheat, I just remove the issue entirely.

I rewrote the above code in Rust, using Rust’s Iterator feature. The Python version, which supports iterators as well, would look similar, with index() instead of position().

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21


/// Our spacecraft has a list of heaters. They are grouped by two, for redundancy.
struct Heater {
    nominal: u32,
    redundant: u32,
}

/// Function to switch off a list of heaters.
fn TCS_HEAT_switchOffHeaters(
    switchHeaters: [Heater ; TCS_MAX_HEATERS],
    heatersOff: &[u32],
) {
    let mut heaterId: Option<usize> = None;

    // For all the heaters to switch off
    for &heater_off in heatersOff.iter() {
        // Search its id in the spacecraft's heaters list
        heaterId = spacecraftHeaters.iter().position(|h| h.nominal == heater_off);

        /* ... more code: searching heater id as redundant, switching off... */
    }
}

It can be observed that the indexes have disappeared. The first for loop now uses a reference (a pointer) to the heater and the second loop has simply been removed.

Iterators are powerful. This kind of high level constructions avoid entire classes of bugs. In this case, no mistakes can be made with indexes, like in C. It can also be noted that the default value of heaterId is not -1 (which is error prone: what if it is a uint?), but None. It is similar to Python’s None, except that Rust is strongly typed, therefore None can not be implicitly casted to 0.

Iterators might be possible to implement in C, but it would be very complex (function pointers?), and definitely not as safe as when built in the language, like in Rust.

Conclusion

Years after years and projects after projects, it has been demonstrated that manual reviews and inspections are fragile and unreliable. Moreover, this kind of bugs are hard to find by tools (compilers or static analyzers).

Therefore, another method must be used. The best one is to remove the issue altogether by using higher-level constructions such as Iterators. This often has a runtime cost, but in Rust’s case, it is small or even negligible when compared to C.