Fixpoint

2021-11-06

MySQL in Gales 2: bundles of joy

Filed under: Gales Linux, JWRD, MySQL, Software — Jacob Welsh @ 05:21

The sarcasm will be contained to the title for this one.

We left off where I'd cleared out a neglected broken appendage along with the body it came attached to, namely MySQL's internal fatal signal interception and backtracing code, on the theory that if it's that broken we're going to need a full-scale debugger anyway. It wouldn't be long at all before that played out, too, but meanwhile there were further wrinkles in just getting the code to compile.

My senses were on alert for the presence of bundled third-party libraries, as I'd seen earlier in CMake. This is a common situation in large projects, because it makes sense from the perspective of an independent software publisher aiming to ship a release that works in a variety of environments where those libraries may be absent, or worse, present but incompatible with the version used by the developer. It's a sort of admission that there isn't really such a thing as independence when it comes to software, only better or worse maintained illusions at most. At the same time, bundled libs are often the last thing you want when trying to assemble an integrated, coherent, efficient and maintainable system, for the obvious reason that the required resources are multiplied by however many copies of that almost-same code get pulled in by different sources. That's often, not always, and it's worth noting that Gales does not take the dogmatic approach of many earlier Linux distributions on this. That said, I'm not seeing any principled cases of bundling noted yet in the gports tree, just a few undesired ones that I haven't got around to fixing, and there are likely more unseen; so perhaps a bit more dogma is actually in order!

The most obvious bundlings in MySQL 5.6.38, sporting their very own CMake variables to select alternate versions, are:

  • libedit: from NetBSD, 1992-2011; imported 2003. Locally patched and merged with upstream updates a few times. Used for input line editing in the mysql shell, originally as an alternative to GNU Readline until the latter was dropped for licensing reasons.
  • libevent: from Niels Provos et al, 2002-2012; imported 2011 or earlier.(i) Locally patched then seemingly replaced with upstream updates a few times. A multi-platform async networking library, apparently including DNS and HTTP clients, previously seen under suspicion by association with tmux trouble.
  • yassl: from Sawtooth Consulting, 2003; imported 2005. Locally patched and merged with upstream updates rather frequently - as in 406 commits over 12 years - until in subsequent releases they gave up and switched to demanding OpenSSL from the system.
  • zlib: from Mark Adler and Jean-loup Gailly, 1995-2017; imported 2002. Locally patched then seemingly replaced with upstream updates a few times. Widely used implementation of standard lossless compression formats; used here for various compression and crc32 checksum features.
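For a sense of what the "crc32 checksum features" compute, here is a minimal bitwise CRC-32 in C using the reflected polynomial 0xEDB88320, which is the same CRC-32 that zlib's crc32() produces and that gzip and PNG use. This is a sketch, not zlib's table-driven implementation, and the name crc32_bitwise is made up; it does borrow zlib's convention of feeding the previous result back in, so a buffer can be checksummed in pieces.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and final
   XOR 0xFFFFFFFF). Pass 0 to start; pass a previous result to continue over
   more data, as with zlib's crc32(). crc32_bitwise is a hypothetical name. */
uint32_t crc32_bitwise(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;                        /* undo the final XOR of the previous call */
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)    /* one polynomial step per bit, LSB first */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;                       /* final XOR */
}
```

The standard check value applies: crc32_bitwise(0, (const unsigned char *)"123456789", 9) gives 0xCBF43926, and splitting that input across two chained calls gives the same result.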

Beyond those bearing the impression of optionality are a few others that have been more deeply absorbed into the tree and date back at least to its BitKeeper import in 2000:

  • dbug: from Fred Fish of Enhanced Software Technologies, 1987. Used for build-time optional debug instrumentation.
  • regex: from Henry Spencer of University of Toronto, 1992-1994. A log message that seems to capture the situation best:
    The MySQL server uses Henry Spencer's library for regular expressions to support the REGEXP/RLIKE string operator. This changeset adapts a recent fix from the upstream for better 32-bit compatiblity [sic]. (Note that we cannot simply use the current upstream version as a drop-in replacement for the version used by the server as the latter has been extended to understand MySQL charsets etc.)

  • strings: from Richard A. O'Keefe, 1984, though most of its current contents appear to be later additions.
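Henry Spencer's library implements the POSIX regcomp()/regexec() interface, and REGEXP/RLIKE succeeds if the pattern matches anywhere in the string. As a rough sketch of that operator's core, here it is on top of the C library's own POSIX regex rather than MySQL's modified copy (which, per the log message above, has charset extensions a plain POSIX implementation lacks). The function name rlike is hypothetical, and I've assumed extended (ERE) syntax, which is what I understand the MySQL 5.6 manual to describe.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical REGEXP/RLIKE-style predicate over the system's POSIX regex.
   Returns 1 if pattern matches anywhere in subject, 0 if not, -1 if the
   pattern fails to compile. No MySQL charset or collation handling. */
int rlike(const char *subject, const char *pattern)
{
    regex_t re;
    /* ERE syntax; REG_NOSUB because only match/no-match is wanted. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    /* regexec is unanchored unless the pattern uses ^ or $. Any non-zero
       result (normally REG_NOMATCH) is treated as "no match" here. */
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

So rlike("hello world", "o w") is 1 while rlike("abc", "^b") is 0, mirroring how REGEXP differs from LIKE in not needing to match the whole value.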

I was dubious right away about "libedit" as my own experience of the mysql shell had always been with one or another "readline" based version; it's hardly my problem that Oracle wants to ship prebuilt binaries while their lawyers don't like the GPL. Further, readline is already present in Gales, finding ample usage by the likes of bash,(ii) bc, gdb, sqlite and python. But it looked like they had made a full break with the ornery GNU: the only option on the matter exposed through CMake was whether to use the bundled or system version of libedit. Having bigger fish to fry, I took a note and let it proceed with the bundled one, figuring that would at least be the one most tested and likely to work (heh).

As to "libevent", it appeared to be used only in connection with one InnoDB Memcached plugin. Memcached being a whole separate thing that we're not currently interested in - nor do I anticipate that we'd likely ever be - I happily struck it from the CMakeLists and scripted the full removal of its code just to be sure. This has turned out fine so far.

In place of "yassl" or OpenSSL, if there had to be any SSL at all, I would of course want to use my existing LibreSSL gport; and that project did after all aim to stay mostly OpenSSL compatible. The usage of this library is in two main branches: just for basic cryptographic primitives (possibly just SHA1 for password hashing), and for the whole monstrous SSL/TLS "protocol" which it makes available for client-server communication. It appears the latter was formerly optional, as at least some of the preprocessor logic to switch it off is still in place, but the current build system offers no such optionality. Fortunately adapting it to LibreSSL wasn't too bad; the one thing that came up was some CMake code demanding a specific major API version number as reported by OpenSSL, whereas LibreSSL had ditched all such madness, incrementing and freezing the number at 2. Thus all that was needed was to change the offending line to check for 2 instead of 1. I suppose deleting the check altogether would have worked just as well, but it would have taken things out of "sed one-liner" territory and I wasn't inclined to do a more thorough patch at such an early stage.
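OpenSSL 1.x encodes its version in OPENSSL_VERSION_NUMBER as a hex literal of the form 0xMNNFFPPS, so the major version is the first hex digit after "0x"; LibreSSL pins that define at 0x20000000L. Here is a C rendering of the kind of test involved, purely as an illustration: the helper name is hypothetical and this is not the actual MySQL CMake code. Where the original check in effect required the result to be 1, the one-line fix made it 2.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: given one header line defining OPENSSL_VERSION_NUMBER,
   return the first hex digit of its value (the major version in the
   0xMNNFFPPS layout), or -1 if the line doesn't look like that define. */
int openssl_major_from_define(const char *line)
{
    const char *p = strstr(line, "OPENSSL_VERSION_NUMBER");
    if (p == NULL)
        return -1;
    p = strstr(p, "0x");               /* start of the hex literal */
    if (p == NULL || !isxdigit((unsigned char)p[2]))
        return -1;
    int c = tolower((unsigned char)p[2]);
    return isdigit(c) ? c - '0' : c - 'a' + 10;
}
```

For a LibreSSL line such as `#define OPENSSL_VERSION_NUMBER 0x20000000L` this returns 2; for OpenSSL 1.0.2g's 0x1000207fL it returns 1.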

We come finally to "zlib" and this should have been an easy one, right? I mean, it's a pleasantly small library with a narrow and clearly defined purpose, that's been around for ages and is used by just about everything. Out of a general stodgy conservative bias and lacking further information, I had used a rather old version for the zlib gport: 1.1.4, the last bugfix release in its series, released in 2002 with its predecessor in 1998. I was aware of subsequent security fixes to its "gzio" feature but dubious that the feature was actually needed (why not just pipe to gzip if you want a streaming interface?), so had simply removed the code, and this had pretty much worked out OK. MySQL, however, insisted on using newer API additions: the compressBound() function and its companion deflateBound(). Good call on my part, bad on theirs, and if you want the proof then consider this scar tissue from the 5.6.42 release notes:

The zlib library bundled with MySQL has been upgraded from version 1.2.3 to version 1.2.11. MySQL implements compression with the help of the zlib library.

The zlib compressBound() function in zlib 1.2.11 returns a slightly higher estimate of the buffer size required to compress a given length of bytes than it did in zlib version 1.2.3. The compressBound() function is called by InnoDB functions that determine the maximum row size permitted when creating compressed InnoDB tables or inserting rows into compressed InnoDB tables. As a result, CREATE TABLE ... ROW_FORMAT=COMPRESSED or INSERT and UPDATE operations with row sizes very close to the maximum row size that were successful in earlier releases could now fail.

Such as for example...? I guess they don't know, and so far I'm quite unconvinced that "were successful in earlier releases" shouldn't be properly read as "would have appeared successful but caused internal server memory or table corruption or other delayed failures in earlier releases".
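To put numbers on "slightly higher", here are the two compressBound() formulas as I read them in compress.c for zlib 1.2.3 and 1.2.11; they should be checked against the actual sources before being relied upon. The fits() predicate is a hypothetical stand-in for an InnoDB-style "worst-case compressed size must fit" check, not InnoDB's actual logic.

```c
/* compressBound() as found (I believe) in zlib 1.2.3's compress.c. */
unsigned long bound_zlib_1_2_3(unsigned long n)
{
    return n + (n >> 12) + (n >> 14) + 11;
}

/* compressBound() as found (I believe) in zlib 1.2.11's compress.c:
   an extra (n >> 25) term and a constant of 13 instead of 11. */
unsigned long bound_zlib_1_2_11(unsigned long n)
{
    return n + (n >> 12) + (n >> 14) + (n >> 25) + 13;
}

/* Hypothetical size check: does the worst-case compressed size of len
   bytes fit within limit? */
int fits(unsigned long (*bound)(unsigned long), unsigned long len,
         unsigned long limit)
{
    return bound(len) <= limit;
}
```

For a 16384-byte input the older formula gives 16384 + 4 + 1 + 11 = 16400 and the newer one 16384 + 4 + 1 + 0 + 13 = 16402, so against a hypothetical limit of 16400, fits() flips from 1 to 0. Below 32 MiB (where n >> 25 is zero) the difference is a constant 2 bytes.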

So at the moment I can't use my system zlib; I may end up updating it to the latest once I can take a proper look at it. Since bumping the MySQL version to 5.6.45, at least I won't have the known-bad bundled compressBound() in play.
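Deciding between a system and a bundled zlib comes down to whether the system one has the needed API. A hypothetical helper for that: a numeric dotted-version comparison (so that 1.2.11 sorts after 1.2.3, which a plain string comparison gets wrong) plus a predicate assuming compressBound() first appeared in zlib 1.2.0. That cutoff is my assumption and worth confirming against zlib's ChangeLog; a real build check might sooner just try linking against the symbol.

```c
#include <stdlib.h>

/* Compare dotted numeric versions ("1.1.4", "1.2.11") component by
   component, like strcmp: <0, 0 or >0. Missing components count as 0 and
   anything after the last numeric component (e.g. "-rc1") is ignored. */
int version_cmp(const char *a, const char *b)
{
    int done_a = 0, done_b = 0;
    while (!done_a || !done_b) {
        char *end;
        long x = 0, y = 0;
        if (!done_a) {
            x = strtol(a, &end, 10);
            if (*end == '.') a = end + 1; else done_a = 1;
        }
        if (!done_b) {
            y = strtol(b, &end, 10);
            if (*end == '.') b = end + 1; else done_b = 1;
        }
        if (x != y)
            return x < y ? -1 : 1;
    }
    return 0;
}

/* Hypothetical predicate. ASSUMPTION: compressBound() arrived in zlib 1.2.0. */
int zlib_has_compress_bound(const char *version)
{
    return version_cmp(version, "1.2.0") >= 0;
}
```

With these, zlib_has_compress_bound("1.1.4") is 0, matching the 1.1.4 gport's lack of the function, while "1.2.11" passes.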

These items taken care of or at least discovered, I reached the point of a first successful build and the moment of truth. I typed in a "./mysqld", gingerly pressed Enter and... it instantly crashed and burned with a Segmentation Fault.

C programming, y'know?

~ To be continued ~

  1. The 130k-line commit adding it among other things under plugin/innodb_memcached bears the description of simply "Rebase mysql-5.6-labs-innodb-memcached based on mysql-5.6.2-m5-release branch", these unresolved identifiers suggesting lost history.
  2. Albeit bundled; as I understand, readline began its life as the input editing code in bash and then branched off as a reusable item. The default ksh shell in Gales provides its own input editing.

5 Comments

  1. [...] Continuing from where the advanced quality assurance process of "start it and see if it blows up" had returned in the affirmative, it was clearly time to pay an old debt and get a proper debugger working on Gales. [...]

    Pingback by MySQL in Gales 3: debug, rebug « Fixpoint — 2021-11-07 @ 08:38

  2. The sarcasm will be contained to the title for this one.

    Cooler read, but just as informative.

    Having bigger fish to fry, I took a note and let it proceed with the bundled one, figuring that would at least be the one most tested and likely to work (heh).

    Alright. That "heh" has me hanging off a cliff.

    Memcached being a whole separate thing that we're not currently interested in - nor do I anticipate that we'd likely ever be - I happily struck it from the CMakeLists and scripted the full removal of its code just to be sure. This has turned out fine so far.

    Alright, sounds fine.

    Thus all that was needed was to change the offending line to check for 2 instead of 1. I suppose deleting the check altogether would have worked just as well, but taken things out of "sed one-liner" territory and I wasn't inclined to do some more thorough patch at such an early stage.

    So at the moment I can't use my system zlib; I may end up updating it to the latest once I can take a proper look at it. Since bumping the MySQL version to 5.6.45 at least I won't have the known-bad bundled compressBound() in play.

    Alright for now, like you said, bigger fish to fry.

    Comment by Robinson Dorion — 2021-11-09 @ 17:35

  3. That "heh" has me hanging off a cliff.

    Keep holding on!

    Actually, there was already some libedit drama at this stage of the tale, just getting it to build, that I missed. First it had some shell script generated code thing that misfired; here was my patch from the time in full:

    pdksh quirk perhaps: "set - x y z" leaves the "-" as $1 (and there's no reason to have it here).
    
    Caused headers to be generated with broken, colliding ifdef guards.
    
     -jfw
    
    --- a/cmd-line-utils/libedit/makelist.sh
    +++ b/cmd-line-utils/libedit/makelist.sh
    @@ -62,7 +62,7 @@
         ;;
    
     -h)
    -    set - `echo $FILES | sed -e 's/\\./_/g'`
    +    set `echo $FILES | sed -e 's/\\./_/g'`
         hdr="_h_`basename $1`"
         cat $FILES | $AWK '
     	BEGIN {
    

    Then in libedit/chartype.h there was some ifdef madness for unicode detection that deliberately bombed because it didn't know about musl, the fix being to delete it on the assumption that indeed musl uses unicode for its "wide characters" rather than "some other funky encoding"; the offending part:

    /* Ideally we should also test the value of the define to see if it
     * supports non-BMP code points without requiring UTF-16, but nothing
     * seems to actually advertise this properly, despite Unicode 3.1 having
     * been around since 2001... */
    
    /* XXXMYSQL : Added FreeBSD & AIX to bypass this check.
      TODO : Verify if FreeBSD & AIX stores ISO 10646 in wchar_t. */
    #if !defined(__NetBSD__) && !defined(__sun) \
      && !(defined(__APPLE__) && defined(__MACH__)) \
      && !defined(__FreeBSD__) && !defined(_AIX)
    #ifndef __STDC_ISO_10646__
    /* In many places it is assumed that the first 127 code points are ASCII
     * compatible, so ensure wchar_t indeed does ISO 10646 and not some other
     * funky encoding that could break us in weird and wonderful ways. */
    	#error wchar_t must store ISO 10646 characters
    #endif
    #endif
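What the #error is guarding can also be probed directly. A minimal sketch, with hypothetical function names: check that a few wide character literals equal their Unicode code points, including one outside the BMP when wchar_t is wide enough to hold it as a single unit, and separately report whether __STDC_ISO_10646__ is defined. Note this only tests the compiler's encoding of wide literals; what the C library's multibyte conversion functions produce is a separate question.

```c
#include <wchar.h>

/* Hypothetical probe: do wide character literals hold ISO 10646 (Unicode)
   code points directly? Returns 1 if every check passes, else 0. This
   tests the compiler's wide literal encoding, not libc conversions. */
int wchar_is_iso10646(void)
{
    /* ASCII, plus one Latin-1 and one non-Latin-1 BMP character. */
    if (L'A' != 0x41 || L'\u00e9' != 0xE9 || L'\u20ac' != 0x20AC)
        return 0;
#if WCHAR_MAX >= 0x10FFFF
    /* A non-BMP code point in a single wchar_t, i.e. no UTF-16 surrogates. */
    if (L'\U0001F600' != 0x1F600)
        return 0;
#endif
    return 1;
}

/* Does the implementation advertise ISO 10646 wchar_t via the C99 macro? */
int iso10646_advertised(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

A toolchain can pass the first probe while the second returns 0, which is exactly the case where a hard #error on the macro alone is more paranoid than necessary.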
    

    Comment by Jacob Welsh — 2021-11-09 @ 20:17

  4. pdksh quirk perhaps: "set - x y z" leaves the "-" as $1 (and there's no reason to have it here).

    Ftr, there is at least a theoretical reason and the better fix would have been replacing the single with a double dash.
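The "set --" form tells the shell that everything following is positional parameters, even when the first word begins with a dash. As for why a stray "-" in $1 is so damaging, here's a small C model of the guard-naming step in the patch context above: basename of the first word, prefixed with "_h_". The function names are hypothetical and the basename handling is simplified (no trailing-slash handling). With $1 wrongly "-", every generated header gets the same guard, and it isn't a valid identifier to begin with.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Model of makelist.sh's -h mode: hdr="_h_`basename $1`", where $1 is the
   first file name with dots already turned into underscores by sed.
   Simplified basename: just drops any leading directory part. */
char *guard_name(const char *first_arg, char *out, size_t outsz)
{
    const char *slash = strrchr(first_arg, '/');
    const char *base = slash ? slash + 1 : first_arg;
    snprintf(out, outsz, "_h_%s", base);
    return out;
}

/* Can the guard be used as a macro name, i.e. is it a C identifier? */
int is_c_identifier(const char *s)
{
    if (!(*s == '_' || isalpha((unsigned char)*s)))
        return 0;
    for (s++; *s; s++)
        if (!(*s == '_' || isalnum((unsigned char)*s)))
            return 0;
    return 1;
}
```

With a hypothetical first argument of vi_c, guard_name yields "_h_vi_c", a usable identifier unique to that list; with "-" it yields "_h_-" for every header alike.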

    Comment by Jacob Welsh — 2021-11-09 @ 20:28

  5. [...] [...]

    Pingback by #jwrd Logs for Nov 2021 « Fixpoint — 2021-11-10 @ 00:07



Powered by MP-WP. Copyright Jacob Welsh.