Fixpoint

2022-10-20

BusyBox microcom, the code review

Filed under: Gales Linux, Hardware, JWRD, Software — Jacob Welsh @ 05:49

The code to be examined is a BusyBox applet (i.e. virtual program, one of the many cohabitants of the single busybox executable) written by one Vladimir Dronnikov and first added to its codebase in 2007, providing in its own words a "bare bones 'talk to modem' program - similar to 'cu -l $device'", and "inspired by mgetty's microcom". The name evokes "minicom", a popular but decidedly non-minimalist serial communications program, and clearly all the world is missing now is the corresponding maxicom and megacom.

It came to my attention only today, while provisioning a router for a customer and asking myself - not for the first time but more insistently now - "is there seriously no 'cu' ('call up') implementation for BusyBox?" My surprise in finding one could be ascribed to its being similar to the thing I was looking for yet misleadingly named after the thing I was NOT looking for. Then again, in fairness, it could just as well be ascribed to my still incomplete surveying of BusyBox, this thing that my OS is to some degree built around.

The basic requirements of such a program are simple: to forward bytes received from here to there, to simultaneously forward bytes received from there to here, and to provide just enough user interface to set it up, tear it down, and allow sending the out-of-band "break" signal provided by serial ports and recognized as a special interrupt by some equipment.

If you're into plumbing analogies, there are two pipes for data flowing in opposite directions, and since each pipe has an inlet and an outlet, there are four endpoints in total.

The "here" and "there" mentioned above will typically refer in Unix terms to two distinct TTY device nodes, one representing the terminal emulator that displays text on screen and receives keyboard input, the other representing a physical serial port that transmits and receives the character data to and from a remote machine. The user is thereby able to control the remote machine as if his display and keyboard were directly attached to it, but with considerably more flexibility, considerably simpler hardware interfaces required on that machine, and all the beauty and power that comes from working with text streams instead of shoveling pixel grids.

Because the standard serial port was commonly used as the local interface to a standalone modem in the era of dial-up networking, some implementations of the concept added on various features to support modem initialization and dialing, allowing similar control of a truly remote system across the analog telephone network.

In 2017 when I was more single-mindedly ensconced in "to hell with the whole stinking computing industry, I'll rewrite the whole thing myself" mode, I did a quick implementation of the concept in a Python script.(i) I've been hesitant to publish it because of rough edges like leaky exception handling, lack of documentation and a truly minimal interface; moreover it's a dubious fit for Gales Linux since Python is such a heavy dependency while this would make most sense to include in the base system. Still, it's what I use to this day and provides a concise, known-good reference, weighing in at 119 total lines.

On the other end of the spectrum, a "cu" implementation found in the mainstream Linux world is provided by Taylor-UUCP.(ii) Its "cu.c" file alone consumes a whopping 2,168 lines; the full codebase, counting autoconfism and similar copypasta (because it fucking counts), a jaw-dropping 130,605.(iii)

Those 600-some words of introduction out of the way, we're now equipped to look at the item in question. At 183 lines, it earns right away the distinction of being something I can even consider reviewing, and apparently that's saying something these days (and no, I don't mean something about me).

Unfortunately, the praise I can offer it pretty much ends there.

The terminal initialization is tricky to follow, with obtusely named functions xget1 and xset1, invoking an external BusyBox routine get_termios_and_make_raw, which doesn't seem to have a clear definition of what "raw" means, given the various flags that are still passed in and have to be explained in comments. I didn't delve too deep; the termios API is going to be ugly no matter what, and it's a minor point.

It has a -X option to "disable special meaning of NUL and Ctrl-X from stdin", while the help text fails to mention what that special meaning might be. It turns out Ctrl-X directs it to quit(iv) while the NUL character - which if you look up the ASCII table is produced by Ctrl-@, and in my experience for some odd reason also by Ctrl-Space - directs it to send a break.
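The mapping from control keys to bytes can be sketched in a couple of lines. This assumes the conventional ASCII rule that holding Ctrl keeps only the low five bits of the key's code. The helper name ctrl is made up for illustration. That the same mask is what turns Ctrl-Space into NUL is a plausible explanation rather than something verified against any particular terminal.

```python
def ctrl(key: str) -> int:
    """Return the byte conventionally sent for Ctrl+<key> (low five bits of the code)."""
    return ord(key.upper()) & 0x1F


# Ctrl-@ and Ctrl-X are the two codes microcom treats specially;
# Ctrl-C and Ctrl-D are the ones it has to pass through untouched.
for key in "@XCD ":
    print("Ctrl-%r -> 0x%02x" % (key, ctrl(key)))
```

Under that rule '@' (0x40) and Space (0x20) both collapse to 0x00, while 'X' (0x58) becomes 0x18.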

This is not the interface a user of "cu" or "ssh" would expect, which is to enter an escape/control mode when ~ (tilde) is pressed at the start of a line, i.e. immediately following a Return. It's true that simple control keys would be easier to learn and faster to use, however this comes at the cost of an egregious conflict with popular Unix programs such as emacs and pico/nano, and others inspired by emacs input conventions. Finally, I submit that the design is broken just in its own right because it can't be nested. If by some awkward but unavoidable circumstance you ended up having to use two instances in a chain, there would be no way to pass the control codes through to the inner one, short of the quite disruptive route of quitting the outer one and restarting it with -X, and then how exactly would you quit it again, with Ctrl-X now being passed through?
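To make the nesting argument concrete, here is a minimal sketch of a tilde escape filter. The command letters are assumptions borrowed from ssh's conventions ("~." to quit, "~B" to send a break, "~~" for one literal tilde), and the class name TildeEscape is hypothetical. None of this is code from microcom or cu.

```python
class TildeEscape:
    """Recognise '~' escapes, but only when they directly follow a Return."""

    def __init__(self):
        self.at_line_start = True   # a fresh session counts as start of line
        self.pending = False        # True once '~' was seen at line start

    def feed(self, data: bytes):
        """Return (bytes_to_forward, commands) for one chunk of keyboard input."""
        out = bytearray()
        commands = []
        for b in data:
            ch = bytes([b])
            if self.pending:
                self.pending = False
                if ch == b".":
                    commands.append("quit")
                elif ch == b"B":
                    commands.append("break")
                elif ch == b"~":
                    out += b"~"              # '~~' forwards a single tilde
                else:
                    out += b"~" + ch         # not an escape after all
                    self.at_line_start = ch in b"\r\n"
            elif self.at_line_start and ch == b"~":
                self.pending = True          # hold the tilde back for now
                self.at_line_start = False
            else:
                out += ch
                self.at_line_start = ch in b"\r\n"
        return bytes(out), commands


outer, inner = TildeEscape(), TildeEscape()
forwarded, outer_cmds = outer.feed(b"\r~~.")   # typed by the operator
print(forwarded, outer_cmds)                   # the outer instance passes '~.' along
print(inner.feed(forwarded))                   # the inner instance acts on it
```

Because the doubled tilde is forwarded as a single one, each additional level of nesting costs just one more "~", which is exactly what a fixed control key cannot offer.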

There's a "-d DELAY" option to "wait up to DELAY ms for TTY output before sending every next byte to it". By which they mean, near as I can tell, before sending any further bytes to it. I can't tell what you'd need this behavior for, but I suppose someone must have and it doesn't add too much complexity.

The "-t TIMEOUT" option is also a bit perplexing; I'd expect a timeout to refer to an interval without response from the remote system, but it actually counts from the last input from either end.

The next thing that jumps out at me is the use of lock files, following the pattern /var/lock/LCK..device-file-basename. The intent seems to be to prevent accidental opening of the same port by multiple microcom instances or other programs (mgetty is mentioned again). That would be nice and all, but it's flaky in several ways. It requires all programs involved to voluntarily observe the same locking protocol, which seemingly requires root or uucp group permissions to work. If creating the lock file fails for reasons other than its already existing, e.g. /var/lock doesn't exist or permission is denied, the failure is silently ignored. Finally, there's no provision for cleanup: if the process is killed or the system is reset before it has a chance to remove its lock file, then on its next invocation it will falsely conclude the device is locked and remain thus wedged pending manual intervention. This is one reason that flock or similar kernel-supported file locking is nice, though I'm uncertain if it would work on a device node (or TTY specifically).
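For comparison, a lock-file routine that addresses two of those complaints might look roughly like the sketch below. It reports every failure other than "already locked by a live process", and it clears a lock whose recorded owner no longer exists. The function names are hypothetical, and the ten-character PID format is an assumption taken from common UUCP-style practice rather than from microcom's source. The stale-lock removal is itself racy if two processes attempt it at the same moment, which is part of why kernel-supported locking is attractive.

```python
import os


def holder_alive(path):
    """Guess whether the process whose PID is recorded in the lock file still exists."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
        os.kill(pid, 0)              # signal 0 delivers nothing, it only checks existence
    except (ValueError, PermissionError):
        return True                  # unparseable file or foreign owner: assume it is held
    except (FileNotFoundError, ProcessLookupError):
        return False                 # the lock vanished or its owner is gone
    return True


def acquire_lock(device, lock_dir="/var/lock"):
    """Create LCK..<basename> atomically and return its path; raise on any failure."""
    path = os.path.join(lock_dir, "LCK.." + os.path.basename(device))
    for _ in range(2):               # the second pass only follows a stale-lock cleanup
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
        except FileExistsError:
            if holder_alive(path):
                raise                # genuinely locked by someone else
            try:
                os.unlink(path)      # stale: owner died without cleaning up
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as f:
            f.write("%10d\n" % os.getpid())
        return path
    raise FileExistsError(path)
```

A missing /var/lock or a permission problem surfaces here as FileNotFoundError or PermissionError instead of being swallowed, so the caller can decide whether to proceed unlocked or bail out.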

Getting finally to the core of the program, its input/output loop looks questionable too. The design is single-threaded, multiplexing I/O on the various devices using the poll call. Nothing wrong with that in principle, but there are a couple of pitfalls to watch out for and it seems the authors didn't.

If reading from stdin blocks unexpectedly after the poll reports it readable(v) then it would block not merely the outbound but also the inbound flow (from target TTY to stdout). Worse, if writing to stdout blocks at all, such as during an output flood over a slow link, it would likewise block the outbound flow (from stdin to target TTY), preventing the human operator from controlling the remote machine at the very moment it's most needed.

The solution is to note that write can block just as well as read and that poll/select must always be paired with nonblocking I/O calls. Embarrassingly, Unix still doesn't have such calls, so the userspace programmer has to work around this by setting a nonblocking mode of one form or another (the O_NONBLOCK file status flag, for instance), which may affect other processes sharing the device, and so requires cleanup and possibly further fallback tricks.
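A minimal sketch of such a loop follows, under some loudly stated assumptions: the function name relay, the BUF_LIMIT constant and the "stop when any source reaches end of file" policy are all invented here, and this is neither microcom's code nor the cu.py mentioned earlier. Each direction gets its own queue, every descriptor is put into nonblocking mode for the duration, and the original modes are restored on the way out.

```python
import os
import select

BUF_LIMIT = 65536  # pause reading a direction while this much is still queued


def relay(directions):
    """Copy bytes along each (src_fd, dst_fd) pair until some source hits EOF."""
    saved = {}
    for pair in directions:
        for fd in pair:
            saved.setdefault(fd, os.get_blocking(fd))
            os.set_blocking(fd, False)   # the flag is shared with other users of the open file
    queues = [bytearray() for _ in directions]
    eof = False
    try:
        while not eof or any(queues):
            # Ask for readability only where there is queue space, and for
            # writability only where something is waiting to go out.
            want = {}
            for (src, dst), q in zip(directions, queues):
                if not eof and len(q) < BUF_LIMIT:
                    want[src] = want.get(src, 0) | select.POLLIN
                if q:
                    want[dst] = want.get(dst, 0) | select.POLLOUT
            poller = select.poll()
            for fd, mask in want.items():
                poller.register(fd, mask)
            ready = dict(poller.poll())
            readable = select.POLLIN | select.POLLHUP | select.POLLERR
            for (src, dst), q in zip(directions, queues):
                if want.get(src, 0) & select.POLLIN and ready.get(src, 0) & readable:
                    try:
                        chunk = os.read(src, 4096)
                    except BlockingIOError:
                        chunk = None     # poll was optimistic; nothing to read after all
                    if chunk == b"":
                        eof = True
                    elif chunk:
                        q += chunk
                if q:
                    try:
                        del q[:os.write(dst, q)]   # drop only what was actually written
                    except BlockingIOError:
                        pass             # destination is full; retry after the next poll
    finally:
        for fd, was_blocking in saved.items():
            os.set_blocking(fd, was_blocking)
```

With tty as the serial port's descriptor, a call such as relay([(0, tty), (tty, 1)]) would cover both pipes of the plumbing analogy. A stalled stdout then merely lets the inbound queue fill up to BUF_LIMIT while keystrokes continue to reach the port.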

The signal handling looks correct, except that, on looking into the record_signo BusyBox function, the global variable it writes the signal number into is not declared volatile sig_atomic_t as required by the C standard for the behavior to be defined; the comment protesting that "all known arches use small ints for signals" is a far cry from a citation of some other standard that might define it. I suppose this would be worth a fix at the BusyBox level regardless of what we do with microcom.

On the bright side, I suspect all these issues can be resolved without too much difficulty, though it may mean removing features and changing the UI. And although the impression is that they got more wrong than right, fixing what's there would likely still be faster than writing a new applet from scratch.

  1. I initially called it "ser_utils.py" but renamed it to "cu.py" and removed some overhead when it became clear that the other envisioned utilities hadn't been written and wouldn't benefit perceptibly from code sharing anyway.
  2. That's Ian Lance Taylor, of GCC/Cygnus fame; UUCP refers to a larger suite of Unix-to-Unix serial communications programs.
  3. By my reckoning, its tarball would take over 30 seconds to send, gzipped and base64 encoded, over a top-of-the-line 56 kbit/s modem. It's a suite of programs for moving bytes from point A to point B. What, is half a C compiler hiding in there?!
  4. This is important because you can't use Ctrl-C or Ctrl-D, as it's designed to pass all input literally to the serial port so that those codes can be used to control the remote programs.
  5. This is allowed behavior and can happen for a variety of reasons, such as TCP checksum failure or simply another process meanwhile consuming the available bytes. An argument could perhaps be made that it can't *reasonably* happen in the intended use case, though no such argument is to be found in the code.


Powered by MP-WP. Copyright Jacob Welsh.