Fixpoint

2021-11-07

MySQL in Gales 3: debug, rebug

Filed under: Gales Linux, JWRD, MySQL, Software — Jacob Welsh @ 08:38

Continuing from where the advanced quality assurance process of "start it and see if it blows up" had returned in the affirmative, it was clearly time to pay an old debt and get a proper debugger working on Gales.

In that department, the GNU debugger GDB, like its sister projects GCC and Binutils with which it cooperates, is unchallenged as the premier solution in the Linux environment, with support for a wide range of languages on the frontend and CPU architectures on the backend, and a comprehensive command set for inspecting and modifying the internal state of running processes as well as analyzing crashed ones post-mortem. I went with version 7.12.1 with minimal deliberation and for reasons I can't recall; what I can observe now is that it's the last in the 7.x series, dating to 2017, after which the build requirements were upped to "A C++-11 compiler (for instance, GCC 4.8 or later)", although it's unclear whether our GCC 4.7.4's almost-complete C++11 support would in fact suffice. The larger long-term consideration, I figure, will be picking a version close enough to our chosen GCC and Binutils that they can be re-integrated into a unified tree, deduplicating their not insubstantial shared components.(i) On the bright side, the debugger being a late-stage development-time tool that doesn't get further built upon or incorporated into things,(ii) I expect we can fine-tune the version choice with minimal downstream damage when the time comes.

I got the gport working after a few minor hacks not quite worth going into - build system features not behaving as advertised basically - except to note that Gales still needs a "makeinfo" so I can stop relying on precompiled Info pages from release tarballs, as well as an "info" program to actually browse the poor things. Either that or a decision to go for HTML conversion or some such. I disabled various obscure extensions (python, guile and xml in my gdb?? Nothing good can come of this), ensured it was rebuilding its flex/bison outputs properly from source, and got it to use the system readline, while system zlib was foiled again by compressBound().

Putting my new gdb to use, I soon found that musl had been built without extended debug info (the gcc -g flag), putting a limit on how much intelligible output you get if the backtrace hits a libc function. Unlike autotools-based builds, the musl build scripts don't include this by default, but there was a helpful option at the ready to enable it. The size penalty wasn't bad at all - it usually isn't with C; it's C++ where you tend to get assaulted with binaries an order of magnitude fatter - so I'm going to keep it in.(iii)

With the right tools in place, the previously opaque mysqld segfault became transparent. It was failing to access its data directory - reasonably enough, as I hadn't created any such directory - but then crashing when trying to report the error. The culprit was the "standard" library function strerror_r(), and the situation is so ugly that I'd like to quote for the record what the glibc-flavored manpages have to say on the matter:

The strerror_r() function is similar to strerror(), but is thread safe. This function is available in two versions: an XSI-compliant version specified in POSIX.1-2001 (available since glibc 2.3.4, but not POSIX-compliant until glibc 2.13), and a GNU-specific version (available since glibc 2.0). The XSI-compliant version is provided with the feature test macros settings shown in the SYNOPSIS; otherwise the GNU-specific version is provided. If no feature test macros are explicitly defined, then (since glibc 2.4) _POSIX_C_SOURCE is defined by default with the value 200112L, so that the XSI-compliant version of strerror_r() is provided by default.

The XSI-compliant strerror_r() is preferred for portable applications. It returns the error string in the user-supplied buffer buf of length buflen.

The GNU-specific strerror_r() returns a pointer to a string containing the error message. This may be either a pointer to a string that the function stores in buf, or a pointer to some (immutable) static string (in which case buf is unused). If the function stores a string in buf, then at most buflen bytes are stored (the string may be truncated if buflen is too small and errnum is unknown). The string always includes a terminating null byte ('\0').

In other words, on a glibc system there are two incompatible functions with the same name. You have no idea which one you're getting until you examine both the glibc version and the cocktail of macros defined at the point the header is included, which may come from the source file in question, from some other header, from the compiler defaults according to the standards mode it's invoked with, or from the user or build system. The "portable" one won't work on older glibc, and you can't use the plain old strerror() in multi-threaded code.(iv)
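Footnote (iv) asks why strerror() can't simply hand back entries from an array of constant error strings. The sketch below illustrates that idea under some assumptions. It uses Linux errno values and only a handful of table entries. The message texts are the familiar English ones, not copied from any particular libc. `const_strerror` and `err_table` are hypothetical names, not libc functions.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative subset only; a real table would cover every errno value
   the platform defines. Designated initializers index the array by the
   errno constants themselves, leaving NULL holes in between. */
static const char *const err_table[] = {
    [EPERM]   = "Operation not permitted",
    [ENOENT]  = "No such file or directory",
    [EACCES]  = "Permission denied",
    [ETXTBSY] = "Text file busy",
};

/* Hypothetical helper: thread-safe by construction, since it only reads
   constant data and never writes to a shared static buffer. */
const char *const_strerror(int err)
{
    size_t n = sizeof err_table / sizeof err_table[0];
    if (err < 0 || (size_t)err >= n || err_table[err] == NULL)
        return "Unknown error";
    return err_table[err];
}
```

Because nothing is ever written, no `_r` variant or caller-supplied buffer is needed. The price is that the returned text is fixed at build time, which is exactly the internationalization trade-off the footnote speculates about.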

By contrast, from musl we find some almost breathable air:

I have found that _GNU_SOURCE is defined when I build snort on OpenWrt (OpenWrt uses musl). This causes the build to fail because snort expects strerror_r to return a char * since _GNU_SOURCE is defined.

Is this a bug in musl?

No, musl explicitly does not support the GNU interfaces that conflict with standard interfaces by the same name.

MySQL tried to deal with the mess in much the same way: assuming GNU behavior when _GNU_SOURCE is defined. But this doesn't actually make any sense, because the idea of these "feature test macros" is that they're to be defined by the programmer to communicate the desired standard to the system headers - not the other way around!
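The sketch below shows the macros flowing in the intended direction. It requests POSIX.1-2008 before any header is included and then uses the int-returning XSI strerror_r(). It assumes musl, or glibc 2.13 or later, with nothing else defining `_GNU_SOURCE`. `describe_error` is a hypothetical name, not anything from MySQL.

```c
/* The programmer states the desired standard up front, before any
   #include, rather than headers telling the program what it got. With no
   _GNU_SOURCE in effect, glibc and musl both declare the XSI flavor:
   int strerror_r(int errnum, char *buf, size_t buflen). */
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: fill buf with the message for err and return buf.
   The XSI strerror_r() returns 0 on success; on failure it returns a
   positive error number (glibc >= 2.13, musl) or -1 with errno set
   (older glibc). Either way we fall back to a generic text rather than
   trust whatever may be left in buf. */
const char *describe_error(int err, char *buf, size_t len)
{
    if (len == 0)
        return "";
    int rc = strerror_r(err, buf, len);
    if (rc != 0)
        snprintf(buf, len, "Unknown error %d", err);
    return buf;
}
```

If `_GNU_SOURCE` were defined anyway, say on the compiler command line, glibc would switch the declaration to the `char *`-returning flavor. The `int rc = ...` line would then at least draw a compiler warning about converting a pointer to an integer. That beats misreading a return value at run time.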

I could have dealt with this specific case in one way or another, but since MySQL's setting _GNU_SOURCE in the first place was based on an assumption that Linux means Glibc and I didn't know what else might be broken in the same vein, I thought it best to cut to the root or at least set things on firmer ground by removing the _GNU_SOURCE definitions (there were a couple, all done as special cases for Linux), indicating that we want to use the portable code wherever there's a choice and steer clear of further GNUisms. In hindsight I still suspect this was the right choice, but it did come with an unexpected cost of triggering - in combination with further multi-platform hacks and bad assumptions, naturally - an even more subtle and latent failure down the line.

~ To be continued ~

  1. I found Sokolov's An Introduction to the Cygnus Tree (archived) an interesting historical read, though I lack enough of a clue to weigh it against the offense-taking replies from people who were there at the time, or the distinct possibility that the hallucinations of greatness were strong on both sides. In any case the current practice is that Binutils and GDB are developed mozillated in a single Git tree while GCC retains its own with some degree of overlap. [^]
  2. At least in the C world. Lisp takes a very different view on this. [^]
  3. It might not be apparent how one goes about applying changes to musl or other base system components on Gales, as they're outside the gports system. My current practice is to open up the full Gales repository, which I'll have around anyway as that's where the gports are, re-run the bootstrap process (which I now have largely automated), and manually copy the results to the live system. For musl that literally just means cp libc.a /lib, while for binaries that are in use (mapped into a running process's address space) you may need to use "install", as in install busybox /bin, or otherwise do an atomic replacement with "mv" or "ln", because "cp" will attempt to overwrite the destination in place, which is blocked by the kernel (you'll get a "Text file busy" error; I gather "text" is used in the sense of an executable's "text" section, i.e. not proper text at all but machine code).

    I suppose a good old "make install" would be in order here, but it'll take some thought and this just hasn't come up that much yet. [^]

  4. Why not - why can't we just require strerror to use the obvious and perfectly thread-safe mechanism of an array of constant error strings? I expect it's to allow for internationalization, you know, so that subtle terms of art like "Text file busy" can be butchered into the vernacular of illiterates who have no use for them anyway.

    On the other hand, from the purely technical perspective we can observe that parallel programming, while tricky in the best of conditions, is especially treacherous in C because of manual memory management, hence all these "_r" variants getting bolted on in later years. [^]
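Footnote iii's point about replacing in-use binaries comes down to swapping the directory entry rather than rewriting the file in place. Within one filesystem, mv does this by calling rename(). The sketch below is a minimal version of that idea. It assumes a POSIX system, where rename() atomically replaces an existing destination. `replace_atomically` and its temporary-name parameter are hypothetical, and the temporary file must sit on the same filesystem as the destination, since rename() fails with EXDEV across filesystems.

```c
#include <stdio.h>

/* Hypothetical helper: give dest the new contents in data so that any
   process opening dest sees either the old file or the new one, never a
   half-written mix. Returns 0 on success, -1 on failure. */
int replace_atomically(const char *dest, const char *tmp,
                       const void *data, size_t len)
{
    FILE *f = fopen(tmp, "wb");
    if (f == NULL)
        return -1;
    /* Write the complete new contents under the temporary name first. */
    if (fwrite(data, 1, len, f) != len) {
        fclose(f);
        remove(tmp);
        return -1;
    }
    if (fclose(f) != 0) {
        remove(tmp);
        return -1;
    }
    /* On POSIX systems rename() swaps the directory entry in one step. A
       program still running the old binary keeps its mapping of the old
       inode, and dest itself is never opened for writing, so there is no
       "Text file busy" error. */
    if (rename(tmp, dest) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

This only covers atomicity with respect to other processes. Surviving a power cut would additionally call for flushing the data to disk before the rename.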

4 Comments

  1. I went with version 7.12.1 with minimal deliberation and for reasons I can't recall;

    Is this because it was started in 2020 and you're writing about it now?

    The larger long-term consideration, I figure, will be picking a version close enough to our chosen GCC and Binutils that they can be re-integrated back into a unified tree, deduplicating their not insubstantial shared components.

    Alright. That Sokolov article was a trip. With such inefficient process, no wonder buggy code is emitted.

    Either that or a decision to go for HTML conversion or some such.

    I think this is the smarter long-term play, i.e. rather than supporting all the various ways manuals and documentation has been emitted historically, convert them in HTML. This brings to mind a comment from MP I didn't manage to track down about the pain of traditional documentation and the lack of footnotes and the mp-wp select.

    Thanks for explaining the current process to replace pieces in the base system in footnote iii.

    since MySQL's setting _GNU_SOURCE in the first place was based on an assumption that Linux means Glibc and I didn't know what else might be broken in the same vein, I thought it best to cut to the root or at least set things on firmer ground by removing the _GNU_SOURCE definitions

    Alright, makes sense to me.

    Comment by Robinson Dorion — 2021-11-10 @ 13:35

  2. Is this because it was started in 2020 and you're writing about it now?

    This was from August 2020, yes. As to why minimal deliberation, I don't know, perhaps we weren't communicating well at the time. Looks like that had improved since earlier in the year but still it was a bit of a sleepy & sad time with retreats back to shadows, old things, old places and old ways.

    With such inefficient process, no wonder buggy code is emitted.

    I'm not sure about a direct link from inefficiency to bad code. Perhaps more like both coming from a common root of irresponsibility, or of people just not being very good. And there's use of automation as a drug (suppressing pain signals).

    rather than supporting all the various ways manuals and documentation has been emitted historically, convert them in HTML.

    There are tools for this, might even come standard with the texinfo package; example of typical results. (Oh hey, there's Fred Fish again.) There are manpage collections also rendered online in various places so converters from that format might be usable too. Not sure to what degree these produce *maintainable* output for doing a full one-time conversion as opposed to just being an output stage in a pipeline that maintains the original source format.

    djb also went this way - observing the party was moving to the web - which is why there are no man pages in daemontools or djbdns though there are for the earlier qmail.

    a comment from MP I didn't manage to track down about the pain of traditional documentation and the lack of footnotes and the mp-wp select.

    I distinctly remember this and also sadly can't find it in the logs, in Fixpoint comments or in googling a couple other blogs.

    MP-WP conversion might be a bit more involved than mere HTML conversion. And traditional documentation does have some nice properties that I'd really not want to lose, that might take some thinking. Like how it comes standard with the code (or in some separate file but in any case easy to grab in full), can be maintained in lockstep with the code, and is readily indexed and navigable on the target system, in text mode, in a revision that matches the installed code. (Incidentally Oracle's let these rot regarding the MySQL reference manual - the sources and build tools for current versions are not published that I could find.)

    Comment by Jacob Welsh — 2021-11-11 @ 01:20

  3. (Oh hey, there's Fred Fish again.)

    I'd better quote it or that ref will get lost to the shifting sands:

    This is the Tenth Edition, for GDB (GDB) Version 12.0.50.20211110-git.

    Copyright (C) 1988-2021 Free Software Foundation, Inc.

    This edition of the GDB manual is dedicated to the memory of Fred Fish. Fred was a long-standing contributor to GDB and to Free software in general. We will miss him.

    Comment by Jacob Welsh — 2021-11-11 @ 01:25

  4. a comment from MP I didn't manage to track down about the pain of traditional documentation and the lack of footnotes and the mp-wp select.

    Rereading processes eventually found it and now finally bring it back to where it had been wanted: http://thetarpit.org/2020/a-journey-through-the-gales-installation-process#comment-238

    Comment by Jacob Welsh — 2022-06-18 @ 21:25


Powered by MP-WP. Copyright Jacob Welsh.