Fixpoint

2025-11-09

October Busyboxing with grep and other bug fixes

Filed under: Data, Gales Linux, Software — Jacob Welsh @ 17:51

The Fixpoint series on Busybox code study and correction continues.

This time, the most damaging thorn that pricked me was in its implementation of grep, the standard regular-expression-powered search and filtering program. What appears to have been a shoddy fix for an earlier bug, a leak of context lines between successive input files, turned the otherwise inconsequential overflow of a 32-bit signed integer used for counting input lines into a critical loss of output lines. If one made the mistake of only changing the counter variable to an otherwise more sensible unsigned integer type, not only would data still potentially be lost on wraparound, but the original bug would flare up again.

This made for my first foray into the Busybox test suite, where I added test cases to demonstrate the original problem in several variants, thus building confidence that the final fix wasn't stepping on one or another old rusty rake. Thankfully, the testing and test writing processes weren't difficult to navigate.

To squeeze a bit more juice, I went on to grab some previously noted low-hanging fruits: ill-defined signal flagging behavior across various applets, and a recalcitrant stty refusing to display terminal size configuration when it happens to be zero—getting so confused in the process as to protest that the TTY device is not a TTY.

In further machine-generated accidental comedy, I checked in on Google's latest "AI Mode" for any signs of intelligent life outside the proverbial castle walls. It confidently assured me that no discussion of such a bug existed, not even my own public disclosure two months prior—and no, I haven't been hiding the web view of the logs with robots.txt or anything. It went on babbling about various things I hadn't asked about, such as unrelated Busybox components and unrelated grep implementations, having decided that the question I should have asked was about any sort of security vulnerabilities—at least provided that they'd been duly and officially stamped as such. See, that would have been the easier question to answer, not requiring any original analysis—almost as if there weren't in fact any emergent intelligence at work, just a psychotically confident regurgitation machine dressed in ironic little warning stickers. Wouldn't you like to date one?

Most outrageous of all was the claim that the overflow I'd so graciously pointed out was unlikely to exist, due to the range of 64-bit integers. As if one can begin to contemplate probabilities on the basis of random unjustified—or more like hallucinated—assumptions.(i)

Without further ado, here be patches:

The changes will be included in the next Gales tarball release. Enjoy!

  1. End of ironic use of em dashes. See, they've become one of the cheaper tells for AI-generated text, as unlike most people these days writing on keyboards, recent ChatGPT models have tended to make liberal use of them, even when told to desist. [^]
  2. busybox/findutils: fix loss of grep output after 2^31 input lines and extend line counters to 64 bits.

    grep dropped matching lines after counter overflow, even without the -n, -m or -c modes which use the counts, due to the awkward (linenum < 1) exit in print_line. This change addresses the original problem of leaking context lines in a stronger way, by resetting the context buffer between files. Thus, wrapping line counters no longer affect the modes that don't use them; for the rest, the counters and their displays expand to 64-bit unsigned.

    The initial allocation of the context buffer now takes advantage of calloc, saving a dedicated overflow check.

    Finally, this addresses a potential format string vulnerability in the usage_pod build program, and a local pointer with non-obvious initialization in grep_file, giving a warning-free grep rebuild.

    Text size +42 bytes on amd64. [^]

  3. busybox/testsuite: add tests for the grep context buffer leak, whose earlier attempted fix caused data loss at 2^31 input lines.

    These successfully demonstrate that the original bug was real, that a first attempt at just switching to uint64_t leaves things broken in yet another way, and that the current stronger approach of clearing the buffer on each new file resolves it for all those known cases. [^]

  4. busybox: use volatile sig_atomic_t for global written from signal handler, per standard C.

    This could fix a theoretical optimization-dependent situation where an applet fails to detect a signal because bb_got_signal isn't refreshed from memory.

    Previously noted in: http://jfxpt.com/2022/busybox-microcom-the-code-review/

    Text size +18 bytes on amd64, Gales configuration. [^]

  5. busybox/coreutils: fix garbled stty output & nonsensical error messages, such as when querying a serial console device; prune ifdefs and straighten some error reporting.

    Example of the primary pathology:

    $ stty -a -F /dev/ttyUSB0
    speed 115200 baud;stty: /dev/ttyUSB0
    line = 0;
    intr = ^C; ...
    
    $ stty -a -F /dev/ttyUSB0 >/dev/null
    stty: /dev/ttyUSB0: Not a tty
    
    $ stty -F /dev/ttyUSB0 size
    stty: /dev/ttyUSB0

    Now shows:

    $ stty -a -F /dev/ttyUSB0
    speed 115200 baud; rows 0; columns 0; line = 0;
    intr = ^C; ...

    Cutting through the perror_on_device* wrappers to the better-fitting busybox routines brings the joint benefits of fewer format string warnings and smaller source and machine code. The final error report, with "cannot perform all requested operations", is further improved by skipping perror, because errno is not meaningful in that case.

    Text size -65 bytes on amd64. [^]

Powered by MP-WP. Copyright Jacob Welsh.