A perplexing curiosity from the depths of Scheme definitions

2024-11-14

A perplexing curiosity from the depths of Scheme definitions

Filed under: Scheme, Software — Jacob Welsh @ 02:22

I figure I know the Scheme programming language pretty well. Its fifth-revision spec famously fits in 50 typeset pages, I've been through it all and a few of its references too.⁽ⁱ⁾ The "metacircular evaluator" expressing its core interpretation rules in terms of an existing Scheme fits on a sufficiently generous blackboard. It has dozens of published or proprietary implementations, and I've implemented it a few times over myself at varying levels of detail and completeness, along with various application code for it. Still, I'm occasionally running into situations where I can only scratch my head and wonder how on earth they're supposed to be handled.

Consider the following code:

(define (whatsit) 1)

It defines "whatsit" as a procedure of no arguments which returns the number 1. Or to spell it out in more detail in the language of the spec: it establishes the identifier "whatsit" as a variable, binding it at top level to a location, in which location is stored a procedure, which procedure will return the value 1 when applied with no arguments.

(define (getit) (whatsit))

This one similarly defines a procedure of no arguments, which returns the result of calling whatsit. Thus, what exactly it returns will depend on what is stored at the location named by "whatsit" at the time; it could be reassigned later as another procedure (or even as some other value, though this would produce an error when getit attempts to "call" that value).

The next one, however, does not assign a new value to that location:

(define-syntax whatsit (syntax-rules () ((whatsit) 2)))

Instead, it reestablishes "whatsit" as a syntactic keyword (aka macro), binding it not to a location but to a new type of syntax. The interpretation of that syntax is that, within the scope of its definition, which is to say globally, any expression of the form (whatsit) is replaced by the literal number 2.⁽ⁱⁱ⁾

As programming languages go, Scheme is extremely liberal with respect to identifiers, having very little in the way of "reserved words": you're quite free to redefine builtin procedures, redefine variables as macros, and even redefine builtin syntactic keywords as variables or macros, locally or globally.⁽ⁱⁱⁱ⁾ This may produce some bizarre effects, though at minimum you're guaranteed that it won't break any of the other builtin items.

So after interpreting these three definitions in order, what then is the result of evaluating (getit): 1, or 2 ?

Note that the definition of getit is within the scope of the syntax definition, because they're both global. This would seem to argue for 2. If the "inner whatsit" remains as a variable reference to the procedure returning 1, then the name has not truly been re-bound at top level as a syntactic keyword; its old variable binding has merely been hidden away from subsequent code. And this would seem inconsistent with the well-understood behavior of redefining a top-level variable as a new value.

However, in typical implementation, the syntax definition does not exist at the time the procedure body of getit is processed - "compiled" or "macro expanded" or however it might be called - nor at the time that procedure is "closed" in its environment and assigned to its location, if the two events differ. The redefinition of whatsit changes not just the value that getit is to return but the way in which it's to obtain that value. It would seem that accommodating this interpretation would entail a significant performance penalty at the very least, as the type of the operator (variable or syntax) must be re-examined, and macros potentially re-expanded, each time a form is evaluated. It would also demand a much closer coupling between the macro expansion and evaluation mechanisms, which would throw a wrench in my own implementation. Those costs notwithstanding, I can't for the life of me imagine why anyone would want to write code that would depend on this behavior. I think I'm comfortable taking the bet that ~nobody else did this "the right way" either and so even to the extent I might some day care about porting Scheme code from existing systems, it's unlikely to be a compatibility problem.

Thus, Gales Scheme will return 1, I've set the test case accordingly and don't plan on making it a "known failure to raise awareness" ; if it's a bug in somebody's worldview then so far it's a feature in mine.

For all the historical roots, extensive research and appearances of continuity and current activity in the space, its reference library is rather sadly decaying and much of it never made the transition to proper computer text format either. [^]
If this sounds exactly the same as what the procedural version does except for the number, it's just because of the simplicity of the example. A procedure is evaluated each time it is called, with reference to the values stored in its environment at the time, whereas a macro can be thought of as performing a one-time transformation on the program code itself, the results of which may then be evaluated as usual any number of times. Thus, macros can do things that procedures can't (and vice versa, although a macro can always insert a procedure call). [^]
The current Gales Scheme releases don't implement local syntax binding constructs or allow variables to shadow syntactic keywords; the upcoming release corrects this among many macro-related issues. [^]

5 Comments »

To me it seems one of those cases where the "very liberal" could do with a warning at least that something is indeed being hidden by a new definition, regardless even of what is then considered the correct resolution for that getit.

What would a (define (grabit) (whatsit)) return if defined after all the 3 definitions you list?

Comment by Diana Coman — 2024-11-14 @ 09:04
@Diana Coman: grabit would return 2. The most relevant text I think would be (5.1):

A Scheme program consists of a sequence of expressions, definitions, and syntax definitions. ... Definitions and syntax definitions occurring at the top level of a program can be interpreted declaratively. They cause bindings to be created in the top level environment or modify the value of existing top-level bindings. Expressions occurring at the top level of a program are interpreted imperatively; they are executed in order when the program is invoked or loaded, and typically perform some kind of initialization.

It's a bit confusing but if the "declarative" things can modify existing bindings, the intent seems to be that all these things are executed in order. Thus the effects of the syntax definition are visible at minimum to subsequent expressions.

Comment by Jacob Welsh — 2024-11-14 @ 16:38
Just tested on MIT Scheme, which is a rather optimized implementation. It agrees that (grabit) returns 2 but in fact raises an error on the call to (getit): "Variable reference to a syntactic keyword: whatsit". So, it does not recompile existing procedures based on the change but it does validate that variables are still variables.

Comment by Jacob Welsh — 2024-11-14 @ 17:07
[...] the last one, the code involved here will be pretty simple, but it'll need a bit more context loaded up. Or [...]

Pingback by Another perplexing curiosity from the depths of Scheme macros « Fixpoint — 2024-11-26 @ 06:09
[...] This is for what looks like the second of two articles on Scheme uncertainties, questions. The first one was about - well, they both involve top-level definitions. The second one has more to do with the [...]

Pingback by That, in a word, is the corner that I'm painted into, even though this seems an incredibly obscure situation of having a top-level macro which inserts a top-level definition, which may or may not be previously defined and depends on the interpreter's inte — 2024-11-26 @ 21:53

RSS feed for comments on this post. TrackBack URL

Fixpoint

2024-11-14

A perplexing curiosity from the depths of Scheme definitions

5 Comments »

Leave a comment