Fixpoint

2024-11-26

That, in a word, is the corner that I'm painted into, even though this seems an incredibly obscure situation of having a top-level macro which inserts a top-level definition, which may or may not be previously defined and depends on the interpreter's interpretation of the standard as to whether all variables are bound to a location.

Filed under: Historia, JWRD, Paidagogia, Scheme, Software — Jacob Welsh @ 21:52

After a first drafting session for my last article on the ambiguity of identifiers in macro-inserted top-level definitions in Scheme, I seemed to have become stuck on the introductory matter and lost sight of where I was going with it. Robinson was kind enough to give me an audience to talk through it all afresh, and it turned out quite the hefty impromptu exploratory lecture.

I ended up transcribing it in full before I was up to facing the article head-on again; after completing the latter, it doesn't even seem like that much information and I'm not sure what the hangup was. Perhaps an unmet need to explicitly gather various scattered bits which had previously come together only subconsciously in the thick of the hacking process, along with a self-reinforcing emotional side of frustration with the situation.

Here for your vicarious enjoyment is a front-row seat to the full flow, edited to lighten the load of verbal fillers, hedging words,(i) and real-time self-corrections.

~~~

J: This is for what looks like the second of two articles on Scheme uncertainties, questions. The first one was about - well, they both involve top-level definitions. The second one has more to do with the macro system. Never mind the first one, because it's not all that related.

It basically comes down to just one or two lines of code, and asking what should go there: it's not quite clear why what's there is there; it seems to work but doesn't seem quite theoretically justified and then when I look into it, it explodes on me. The questions and... dunno if confusions, but complications at least; and yet it doesn't seem like such an important thing. Like it's a pretty obscure situation where it would matter, and it comes down to some fine details of interpretation of the standard because it's not really using it the intended way, and it wouldn't be smart to rely on a particular behavior in that situation. At least that's how it seems at the moment.

R: How do you know what is the intended way?

J: Mm, reading between the lines. What's the spirit of the law?

R: Is there a Federalist Papers for the Scheme R5RS?

J: Well there's the Lambda Papers. Those are earlier though.

So, I wrote a little bit about the macro system in the last article, on what are macros. A macro is similar to a procedure in that it's a means of abstraction: a means of writing some code and then reusing it in multiple places by reference. But the way the macro works is pretty different from a procedure; basically it processes source code into new code, every time it's used. And macros appear in various programming languages, even C through the preprocessor. In that case they're not that well integrated with the language proper, they're really just making textual transformations on the code before it even gets to the compilation stage. But it allows you to do things like define some particular constant or expression which then can be automatically copied and pasted various times, and in a bit of a structured way (like, you can have macros that take arguments).

But Scheme has the macros more closely integrated with the language itself, with the compilation. And one of the particularly interesting features of the R5RS standard, of the high-level macro facility - I think it was first introduced in R4 Scheme but then became mandatory in R5 - is what they call hygienic macros. And this deals with a couple different problems that come up with trying to use macros in practice, where there's basically two sides of it, two directions that it goes. The general issue is capturing of variable names.

Let's say you have a macro that adds some code that defines a temporary variable name for its own purposes. Well if this were a procedure that's all well and good, it would just be a local variable and it doesn't affect the calling code at all. But in a macro, because it's mixing its own code in with the calling code - with the user's code - if the macro defines a new variable then that variable would be visible in the code that's using it. And so you have name conflicts there. Like, if the macro defines a temporary variable that shares the same name as some variable you used in your code, then the macro use would capture that variable so it doesn't refer to what it was intended to refer to. And this creates the need to be very careful about what's in the macro and what's in the macro use, and you can't really separate the two, you have to keep them both in mind when you're working with it.
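
[For a concrete picture, a made-up example, not from the compiler under discussion: a SWAP! macro that needs a temporary of its own, here called "tmp":

    (define-syntax swap!
      (syntax-rules ()
        ((swap! a b)
         (let ((tmp a))      ; "tmp" is the macro's own temporary
           (set! a b)
           (set! b tmp)))))

    ;; With hygiene this works even when the user's own variable happens
    ;; to be called "tmp"; a naive textual expansion would instead wrap
    ;; the user's "tmp" in the macro's LET and scramble the result.
    (let ((tmp 1) (x 2))
      (swap! tmp x)
      (list tmp x))          ; => (2 1) under hygienic expansion
]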

The other situation, then, is when the macro refers to some variable in its own environment, and that same name is used in some outer scope by the calling code, so the macro's use of that variable does not refer to the intended thing but rather to the user's thing. So a macro capturing a name used by the caller, or the caller capturing a name used by the macro.

R: Who gets priority there, preference?

J: Either one of these situations would be a problem if it's not specially handled.

R: So is that specially handled in the user code? Or in the macro, or interpreter, or compiler code?

J: Well with the hygienic macro system, the idea is that both of these issues are resolved because the compiler will handle it in a particular way, which I haven't got into yet. But in order to appreciate what that mechanism is you have to first appreciate what the problem is.

You asked which takes precedence; it depends on how the code is structured. When you have a binding construct like a LET or a LAMBDA (a procedure), that creates a new scope, and so the names defined by that construct are defined within the region of that code block; you can have the same name defined at multiple levels and then when you refer to that name, it refers to the innermost level at which it was redefined. So if the name is defined at one level and the macro is called inside that, then that name could capture the macro's use of the same. Whereas if you have the macro use, and in its arguments a reference is made to a name defined by the macro, then the macro is creating that outer scope which is capturing the name in the inner code.
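
[That is, under ordinary lexical scoping the innermost definition wins; a trivial illustration:

    (define x 'top)
    (let ((x 'outer))
      (let ((x 'inner))
        x))                  ; => inner: the innermost binding of X is seen
]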

So to look at the standard here... there's basically these two bullet points that explain how this is supposed to work. "All macros defined using the pattern language are hygienic and referentially transparent and so preserve Scheme's lexical scoping."

And these two terms basically correspond to those two directions that I told you about. So the first one: "If the macro transformer inserts a binding for an identifier", a variable or a keyword - where the macro is creating its own temporary variable - "the identifier will in effect be renamed throughout its scope to avoid conflicts with other identifiers. Note that a define at top level may or may not introduce a binding", see some other section. And this is where eventually the fine print is going to come in. But the basic idea here is that, say the macro uses the name "temp" for its temporary variable, to which it assigns let's say the result of the expression that the user passed in; then when the macro use is transcribed - that is, is expanded, the user's code is inserted into the macro, pasted in basically - it will be as if the macro did not use the name temp but temp12345 or some generated unique name.

R: Mm.

J: And each time the macro uses that name, it will be renamed in the same way, so its uses refer to the same thing. But if the user uses that same name "temp", it won't conflict.
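
[Roughly, given a hypothetical two-argument OR written with a temporary called "temp":

    (define-syntax my-or
      (syntax-rules ()
        ((my-or a b)
         (let ((temp a))
           (if temp temp b)))))

    ;; (my-or #f temp) behaves as if the macro's variable had been spelled
    ;; temp12345 or some such generated name:
    ;;   (let ((temp12345 #f)) (if temp12345 temp12345 temp))
    ;; so the user's own "temp" comes through untouched:
    (let ((temp 'mine))
      (my-or #f temp))       ; => mine
]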

R: Mm, that's smart.

J: Good, so that's the easier one. You can say maybe it's possible that the user code accidentally uses the same temp12345 but you can make it so it's unlikely or whatever.

R: Result of a keksum?

J: Mhm. [chuckles] My previous interpreter did not implement these hygiene rules, so there were some cases where I had to be careful about this sort of thing, and when I used a temp name I just in the code wrote temp-abcdd, like a random looking name just to make it unlikely to capture anything. Sort of doing this manually.

The second part, then, is that "if the macro transformer inserts a free reference to an identifier" - and a free reference means it's referring to a variable that is not defined within the scope of the macro itself, it's defined at some higher level; the reference "escapes" that particular scope, so it's said to be "free", as opposed to a "bound" variable. If the macro inserts a free reference to an identifier, "the reference refers to the binding that was visible where the transformer was specified"; that is to say, where the macro was defined, "regardless of any local bindings that may surround a use of the macro." So this is similar to the way that variable references are supposed to work in Scheme, where if a function refers to a variable that's not defined within that function itself, then it refers to the innermost definition in the region of code where that function definition occurs. So you can tell just by reading the function definition in its context what that variable refers to, regardless of what variables may be defined at the place where the function is called.

This brings that same property to macros, so that the reference refers to the binding visible where the macro was defined, regardless of any local bindings that may surround the use of the macro. Now this one is a little less clear how you would implement it. Because you can't just rename, generate a random name for that reference, because it is actually referring to a variable that exists in the macro's definition environment. You can't just rename the reference because then it wouldn't refer back to what it's supposed to. And... maybe you could rename it in the user code or something like that, but not sure about that, it'd probably have some other problems.(ii)
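
[The required behavior, setting aside implementation, can be shown with another made-up macro whose expansion makes a free reference to a procedure bound where the macro is defined:

    (define (report x) (display x) (newline))   ; visible at the definition site

    (define-syntax log-it
      (syntax-rules ()
        ((log-it e) (report e))))               ; free reference to REPORT

    ;; A local binding of the same name around the macro use must not
    ;; capture that reference:
    (let ((report 'not-a-procedure))
      (log-it "hello"))      ; still calls the top-level REPORT
]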

The standard then doesn't tell you how to implement this, but they do have some references and one of those was a paper by I guess one of the authors of the standard,(iii) explaining how this is done because a lot of people had questions about it. Basically the way that both of these problems are addressed is through a more general sort of renaming, where instead of identifiers just being symbols (words), they become compound objects, like a list or vector or something, and the compiler or interpreter or system as a whole has to be modified to accommodate this more general sort of name, this more complex kind of name which may have multiple components to it.

For the first one you can create a generated name by taking the original name and putting it in this container structure with some unique identifier, reflecting the use of that macro. So now this overall object as a whole is clearly distinct from the original name, but it can be matched up with other occurrences of itself. Then where this mechanism really comes in is for the free references part. All the identifiers inserted by a macro, in its macro use, are renamed in this way, and so when one of these identifiers is resolved, if it's defined by the macro then the use will have the same renamed structure as the point where it's defined in the macro, because they're substituted in the same way. Whereas if it was not defined by the macro, that is, it was coming in from the macro's definition environment, then that complex name will not be found in the environment [of use] and it will then be unpacked to look at the original, unrenamed name, and that will be resolved according to the macro's environment of definition, rather than point of use. So there's some other pointer in the system there to keep track of where the macro was defined, so you can track it back.

So with this shift in worldview of what names even are, you gain more flexibility to do this, and at the time the macro is transcribed, the system doesn't need to know or care whether this is a bound name or a free name or in what context or whatever; it applies the same renaming to all the identifiers inserted by the macro (as distinct from identifiers in the user code that the macro is referring to). And then when the compilation or interpretation proceeds to the point of needing to see what these generated names refer to, then it will find in the environment whether this was defined here or whether it needs to be unpacked and looked up somewhere else.
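
[A sketch of what such compound names might look like, with made-up names and shapes rather than the compiler's actual ones: a rename wraps the inserted name together with a tag unique to the macro use and a link back to the macro's definition environment, and resolution tries the whole object before unwrapping it. LOOKUP here is a hypothetical environment search returning #f when nothing is found:

    (define (make-rename name tag def-env) (vector 'rename name tag def-env))
    (define (rename? x)
      (and (vector? x) (= (vector-length x) 4) (eq? (vector-ref x 0) 'rename)))
    (define (rename-name r) (vector-ref r 1))    ; the name as the macro wrote it
    (define (rename-tag r) (vector-ref r 2))     ; identifies this macro use
    (define (rename-env r) (vector-ref r 3))     ; where the macro was defined

    (define (resolve id env)
      (or (lookup id env)                        ; bound by the macro itself?
          (if (rename? id)                       ; no: unwrap one layer and
              (resolve (rename-name id)          ; retry in the macro's
                       (rename-env id))          ; definition environment
              #f)))                              ; plain symbol, unbound
]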

So that's the long introductory section and I have a lot of that written up already.

But now the question comes up of where exactly... so in the interpreting or compiling at various points you now have to handle these generalized name objects instead of simply symbols. And in what places then do you compare these as structures as a whole, or where do you unpack them to look at the name that's within? And because I'm implementing this in the compiler - a Scheme program that's separate from and running on top of the interpreter - the idea is that it's supposed to transform all of the code into some lower level or simpler, more primitive Scheme code which then is run directly by the interpreter.

R: So the compiler takes Scheme code and turns it into more efficient Scheme code that can be run by the interpreter?

J: Yeah, maybe not more efficient(iv) but more low-level, like using just a few building blocks provided by the interpreter, and so it provides the more advanced functionality. Such as macros, and...

R: And that was all opened up by this latest work? Or this has been...

J: That's sort of been the case, I mean I had a macro system before which was how some of the derived syntax types were implemented. For example, there's no need for AND and OR and COND and these various things in the core Scheme language, all you really need is an IF-construct and the other things can be defined in terms of that, as macros.

R: Sounds cool.

J: In the syntax library. So this has been with the interpreter all along, but it didn't have these hygiene properties, so now I've added that in, and buffed it up in some other ways.
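
[For instance, a simplified COND, not the syntax library's actual definition, can be reduced to IF by a recursive macro; this sketch omits the (test) and (test => receiver) clause forms that the real derived COND supports:

    (define-syntax my-cond
      (syntax-rules (else)
        ((my-cond) (if #f #f))                        ; no clause true: unspecified
        ((my-cond (else e1 e ...)) (begin e1 e ...))
        ((my-cond (test e1 e ...) clause ...)
         (if test (begin e1 e ...) (my-cond clause ...)))))

    (let ((x -5))
      (my-cond ((> x 0) 'positive)
               ((< x 0) 'negative)
               (else 'zero)))        ; => negative
]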

Where were we?

So yes, because I have this separation, there's the compiler and there's the interpreter, and I try not to... there has to be some awareness between the two; the compiler needs to know how the interpreter works, like what kinds of forms it can evaluate, but I try not to have too much coupling between them. Like I don't want to have to change the interpreter too much in order to implement new features in the compiler.

One key task that is done by the interpreter is resolving global variable names - global variable references. It has a runtime environment structure that keeps track of all the names that have been defined and the values that have been stored in them, so if there's a reference to a variable which is not found in a local binding construct, then that means it's a global variable reference, something defined at the top level of the program. And these definitions can happen in any order as long as the reference to a variable is not *evaluated* before it's been defined. But you may have to compile that reference before it has been defined.

R: Hm.

J: So this for example is how you'd write a recursive procedure, or... you can refer to some variable that you know will be defined in some function that you're writing; the order of the definitions doesn't matter that much. But when the code that refers to that variable is evaluated, like is actually executed, then the binding has to exist, otherwise it's an unbound variable; it's an error. And so at compile time you don't necessarily know what all variables are going to exist in the global environment; this is seen at runtime. But the interpreter assumes that all names are straightforward symbols, and that's how it does its lookups in the environment. So when the compiler sees that it's creating a reference to a global variable, it needs to unpack that and not have some complicated renamed object but just a plain name, so that it will refer to the correct thing when it's interpreted. Or so it seems.

The code that does that is in here, when you're creating a global denotation - I had to introduce this concept of "denotations" to keep straight the different kinds of names, macros and macro-inserted identifiers and things. So this is when you have a statement that defines some global variable, it's going to add this denotation to the compiler's representation of this global environment. Which may not be strictly necessary at the moment, because there is this lazy lookup; that's what I was describing before where a name doesn't have to exist until it's actually used; the resolution of that name is deferred until the point of use, so it's done lazily. But then once the lookup is done the first time it sort of gets remembered, and it's more efficient thereafter.(v)

So I have this function to create a global denotation; it creates this list with the symbol "global" to denote what it is, and then it creates this global variable reference object. The treatment of variable references in the compiler at the moment is a bit of a holdover from the previous system, where there's this special type of object called a "variable reference" which is supported directly by the interpreter, which was implemented as an optimization so you don't have to repeat these variable lookups every time they're used. And so I'm still using that mechanism even with this new expanded compiler. Then in the next version that becomes a bit less of a special case and it's just another kind of compiled expression type of which there are various. But for now I have these variable reference objects and there's this global-var-ref function, which is provided as a builtin to create such objects. And it needs to take a plain symbol, so I unpack that using the original-name function on this name object that was passed in; original-name is a recursive function because these renamed objects can be recursively renamed. A macro can insert another macro use, or even another macro definition. And so this renaming... you might have a rename of a rename of a rename of a variable.

R: Turtles all the way down.

J: Right. So the original-name function will fetch the name that's inside, check if it's a plain name or a rename, continue fetching inside until you get down to the root.
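
[A rough reconstruction of the two pieces being described, reusing the made-up rename accessors sketched earlier; the real code surely differs in detail, with GLOBAL-VAR-REF standing for the interpreter builtin just mentioned:

    (define (original-name id)
      (if (rename? id)
          (original-name (rename-name id))   ; keep unwrapping renames
          id))                               ; until a plain symbol remains

    (define (global-denotation name)
      (list 'global                          ; tag saying what kind of thing it is
            (global-var-ref (original-name name))))
]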

But that's a bit of a dubious process really and it sort of needs justification of why you want to go all the way down, like it's such a special case. Why is that the right thing to do? So that's the flag that came up when I was reviewing this. Because that's not really the normal process; that's not the way that renames are usually supposed to be interpreted. You're supposed to... if you don't find it then you jump to the macro's environment of definition and you take the prior name and you repeat the process. So this seemed like some sort of special short-circuit, where we're going all the way to the root of the name in order to create this global variable reference. So how might that go wrong?

Well, one thing that came up was with these top-level definitions. At this point I want to pull up the code, not in the diff format but in its state so as to take a closer look at it; so this is in... ok it's in library, global-denotation.

R: I'll get a refill on water.

J: Alright, can I pause this? Looks like not.(vi)

Here's the same code we were looking at over there. Where's this global-denotation function used? Only two points, where we're compiling a definition, a "define" statement. In one case we've already fished out the original name, and are then explicitly using that for the global denotation, so we wouldn't even need that in the function itself, but this perhaps also needs some justifying. And in the second case we haven't done that, so that's why it does the original-name itself. And what is this second case?

Well it's the fourth of a series of possibilities... basically the define statement complicates a lot of things, it's not really a core part of the pretty, theoretical language, but it's a very useful thing in practice. There are both top-level and internal definitions, which have slightly different rules; top-level definitions to some extent are the same things as an assignment statement, as a SET!. But with a SET!, you're not allowed to assign a variable that has not yet been defined, that is unbound. Sort of the same as referring to a variable that's not been defined. But you are allowed to DEFINE a variable, obviously, that has not been defined. Whereas if it has already been defined and then you say DEFINE on it again, then you're simply reassigning a new value to that same variable.
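
[That is, in this interpreter's treatment:

    (define x 1)     ; fine: introduces X at top level
    (define x 2)     ; also fine: X exists, so this simply reassigns it
    (set! x 3)       ; fine: X is bound
    (set! y 4)       ; error: Y has never been defined
]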

So at the high level here there's two types of definitions, a top-level and an internal; they're handled in different ways; the top level at this point is the more complicated. So we look for the top-level environment frame that it's going to be inserted into, to verify that it's an interactive environment - that it's somewhere that we're allowed to make new definitions. We look up the binding for that name in that environment, and this is what does that whole macro rename unpacking process to resolve it to the correct denotation, whatever that may be.

First case is if that didn't find anything; that means we have a DEFINE statement at the top level of the program for a variable that has not previously been seen by the compiler. So we are going to add a new binding for it, in the compilation environment. Because the compiler is separated from the interpreter, they each have their own view of the environment; the interpreter has the runtime environment, that changes as the program executes according to what it does and the way in which things are called; whereas the compilation environment is a similar sort of structure, but each piece of code is only compiled once although it can run multiple times. And so the compilation environment is not to keep track of the values stored in the variables, it's really just keeping track of what the different names are. Are they variables, are they syntactic keywords, whatever. Are they macros?

Alright so, I have a comment saying that adding this binding is not strictly necessary at present, but it does allow to share the new variable reference object. So perhaps this whole case could be removed. Now if the binding was found, for the name that we're redefining, well then we *are* redefining it, it's an existing variable and the code here does nothing; there's nothing to change in the compilation environment. So this name already referred to a global variable, now we're redefining, it still refers to a global, there's nothing that needs to change.

However, if it is a special - that means a special built-in form, like LAMBDA, IF, DEFINE, etc.: these builtin syntactic keywords - and it is the special for DEFINE... okay this is just an error check, doesn't matter that much at the moment, for our purposes right now.

Okay otherwise, that means the variable was found in the compilation environment, but it was not a global variable. That means that it's either a special or a macro keyword; it's a kind of syntactic keyword whether builtin or user-defined, and so we need to change the compiler's view of that variable to reflect that it is no longer actually a variable but it's a macro. Or whatever. Yeah, a macro.

R: Or a special.

J: Ah, no no no - this is for define, so it WAS a macro or special and now it will be a variable.

R: Mm.

J: So Scheme is very flexible in this way, you can redefine things all over the place; and so in this case you are introducing a top-level definition that uses the same name as, let's say, some macro previously defined. And so, maybe we put a warning in here, maybe not, sort of depends on how we want to look at things, but it is allowed; and the compiler now needs to know that this is no longer a special keyword, because special keywords change the way the compiler compiles it. It needs to know that it's just a variable and so it'll just compile it as a variable reference.

So we alter that binding to be creating this global denotation object for the name. And what is the name? Well the name is simply... the name being defined, according to the DEFINE statement we're looking at; but that may be a macro-renamed complex name. But by the time it's got up here to the top-level environment, basically this environment lookup process here has resolved the true meaning of the name. Hm, I suppose we could have this take the CAR of the binding to get that original name... sorta the same thing either way. But since the name variable here has not been unpacked, unboxed, etc., this function will need to do that, to get to the symbol at the bottom of it.

Or does it? Well, basically, is this supposed to refer to the original name, or to the however-many times renamed version? When would it make a difference? Well, this is a DEFINE statement; the statement may have been entered directly in the code, or it may have been inserted by a macro use. If it was inserted by a macro use, then we can consult that text in the standard that I read earlier to understand what the name is supposed to mean. The name inserted by a macro use, what's it supposed to refer to?

Well, "if a macro transformer inserts a binding for an identifier" is one case; the other case is the macro transformer "inserts a free reference to an identifier". So in order to determine, we have to figure out which of these is at work. Are we inserting a new binding for an identifier, or are we inserting a free reference to an existing identifier?

Because if it's the first, then the identifier needs to be renamed to avoid conflicts with other uses of that same name. If it's the second, then it needs to refer to the original meaning of that name. In this case, in top-level definitions, these are sort of opposite behaviors. Whereas in the more normal case where you're not redefining existing top-level keywords, there's no such confusion. Either this is a binding added by the macro, or it's not. But at top level, we get to this fine print here, that DEFINE at top level may or may not introduce a binding, see this other section. Well we'd better look at that.

Top level versus internal definitions... but the part that they're talking about is... well okay, "at the top level of a program a definition has essentially the same effect as the assignment expression, if the variable is bound." So if the variable's already defined at top level then redefining it is the same as assigning it using the SET! keyword (set, exclamation mark). If the variable is not bound, however, then "the definition will bind the variable to a new location before performing the assignment, whereas it would be an error to perform a set on an unbound variable", like I was describing earlier; however "some implementations of Scheme may use an initial environment in which all possible variables are bound to locations, most of which contain undefined values. Top-level definitions in such an implementation are truly equivalent to assignments."

How is this possible, how could all possible variables be bound to locations, when there are infinite possible names? Well basically it would work as if this were the case and any variable that you use automatically has some location associated with it. So, merely by saying set A equals 2, merely by compiling that statement, the name A has already been assigned a location, which is conveniently available to assign a value.
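
[Under that permissive reading, something like this would be accepted, A's location springing into existence merely from being mentioned:

    (set! a 2)       ; no prior DEFINE needed in such an implementation
    (+ a 1)          ; => 3
]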

So, there are some reasons that you might implement it this way. It's sort of the classic Lispy way [where symbols themselves are compound objects with a "value slot" and possibly "function slot"]. My implementation does not take that approach, so I do recognize a difference between the define and the set, so it is an error to set a variable that has not previously been defined.

R: Mhm.

J: I just find it more tidy, more rigorous. That's the interpretation that I've taken all along. But so this is a possibility.

Going back to where we were, "a define at the top level may or may not introduce a binding", this is section 5.2. So, when does a define at top level introduce a binding? Well if you have that other kind of implementation where all names are implicitly bound, then a define at top level will never introduce a binding. But if you have the other interpretation, then a define at top level introduces a binding if the name was not already bound.

So then the resolution between which of these paragraphs applies depends on whether that variable was already bound at the top level, or whether this is defining a new variable. If it was already defined, then we take the second approach, and so the name refers to the binding that was visible where the transformer was specified, hm... we insert a definition for a name that was already defined, and so the name refers to that top-level variable, that previously existed. Because that's what it meant in the macro's environment of definition. That means then that we do NOT want to use a renamed thing because we want to use a name that will track back to the original name, at the top level.

On the other hand, if the variable was not previously defined, then we're inserting a new binding, much as in a local binding construct, LET or whatever, LAMBDA; if that's the case then the identifier must be renamed to *avoid* conflicts with other identifiers. So if some code coming after this top-level definition inserted by a macro use then defines its own variable of this same name, that needs to not conflict with the name that was inserted by the macro. Because then the macro implementation and the later code are interfering with each other, conflicting over this name. So in this case it *does* need to be a renamed object, so that the interpreter will *not* resolve it to the same variable as the later code.

Okay, so if this is correct so far, then we need to distinguish between these cases based on whether the variable was previously defined at the top level; but as I just said, because of lazy lookup, the compiler's not in a position to know whether the variable is defined at runtime, because it depends on the later code - it depends on what happens between the compilation and the evaluation of that code.(vii) So that, in a word, is the corner that I'm painted into [laughing], even though this seems an incredibly obscure situation of having a top-level macro which inserts a top-level definition, which may or may not be previously defined and depends on the interpreter's interpretation of the standard as to whether all variables are bound to a location!

[both laughing]

So this is why the article isn't out yet.

R: There's the title.

J: The whole sentence!

[ongoing laughter]

R: "In a word," 15 minutes later...

J: Well compared to the first 45 minutes it was a short word.

R: Yeah but you needed all 15 minutes to get to that word.

J: Yes. And so... "and so, my friends..." The question of should this be fishing out the internal name or not... there's some other comment on this situation that I had, if I can find it.

There's too much functionality implemented in here now, it's too hard to find things in the code.

R: Hm. Problem of abundance.

J: "Cop-out", was that the word I used? No... it's not that one...

R: "Lazy lookup"?

J: Nah... was it the section name, 5.2? No... "defined"... here it is, I think - no...

"I haven't found how this might happen, but neither am I certain that it can't" - something else.

R: It's almost like, somebody that's trying to create code that fucks it up, like deliberately. They would have a problem with this. Like the proof that you're wrong in this edge case. I don't know.

J: Yeah, it sorta seems that way.

R: It seems like someone hunting out a bug.

J: Mhm. I wonder if... I think I made a test case for this, that's probably where the comment is that I'm looking for.

Here it is.

Yes. So, here's the test case where I'm defining a syntax; that means defining a macro. The name of the macro is "define-my-var", so this is a macro that's going to define a variable named my-var, whenever you invoke this macro; it has no arguments and it inserts a (define my-var 1).

Okay, it so happens that in this test framework the name my-var was already defined above, therefore this does not insert a new binding, but modifies the existing one: the referential transparency requirement. So I use the macro, I say (define-my-var). This statement then is rewritten as that statement, so this becomes (define my-var 1); that's what the macro does. That definition then is evaluated and the my-var is supposed to refer to the original name, and that means if we then check the value in my-var it should have been updated to this number 1, rather than the symbol that was in it before. That's the test case. And that works correctly, at the moment. Because I took the approach of taking the original name.
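
[Approximately, though the actual test file surely spells it differently, with 'original standing in for whatever symbol was there before:

    (define my-var 'original)            ; my-var already bound at top level

    (define-syntax define-my-var
      (syntax-rules ()
        ((define-my-var) (define my-var 1))))   ; macro inserts a top-level DEFINE

    (define-my-var)                      ; expands to (define my-var 1)
    my-var                               ; expected => 1: the inserted MY-VAR
                                         ; tracks back to the existing binding
]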

Otherwise, if it was not defined previously, then it's inserting a new binding, so it's supposed to use a renamed variable, which would thus not be visible as the name my-var to later expressions. That's the collision avoidance requirement. "Unfortunately, the present compiler is not in a position to tell the difference; it doesn't track all bindings in the runtime interaction environment, as that's the interpreter's concern."

R: Ah, there's "cop-out".

J: "There's the cop-out that implementations are allowed to treat all names as bound at top level, such that define really is equivalent to set!", and that is in effect what we're doing. So, that's how we're treating the renaming; if we're considering that all top-level definitions are not inserting a new binding but are referring to an existing binding, then the behavior is correct; but we're not consistent in that treatment. Because that's not actually the way that we treat defines. "That said, the difference seems rather spurious anyway, more of an unintended implication of the rules than something that would be sane to rely on."

So that's about where it's at.

I made a test case for the opposite behavior, where we define a syntax again called "define-my-var", where in this case my-var was not defined previously. We call that macro, so it inserts the definition; it defines my-var to something, and because that's new it's supposed to rename such that a subsequent reference to the name my-var should produce an error, because as far as the innocent user of this macro is concerned, he did not define any variable named my-var and therefore it should not have been defined; this was just some internal detail that the macro did to fuck him up. That's sorta the charitable interpretation of the intent of the feature, or why it might be desirable to work in this way.
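
[Again approximately, with my-var this time never defined beforehand:

    (define-syntax define-my-var
      (syntax-rules ()
        ((define-my-var) (define my-var 1))))

    (define-my-var)                      ; inserts a binding for a *renamed* my-var
    my-var                               ; under a strict reading of the hygiene
                                         ; rule, an unbound-variable error; under
                                         ; the "define is set!" treatment described
                                         ; below, it sees the macro's 1 instead
]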

I have written macros that insert definitions, even top-level definitions. A good example was in my Scheme package system that I used in gbw-signer. There's these different modules in different source files; there's the bignum, the ecdsa - those are the main ones, and there's the gbw-signer main program, and I made an "import" macro which allows you to fish out the names defined in one of those packages and define them at the top level of the program. And that always worked fine. I also wanted the same import macro to work for internal definitions so that you could import things from one module into another module; that never worked because the prior compiler had problems in its macro implementation and it didn't mix properly with internal definitions. That is resolved now, as a result of the last month's work, so that macro now works for top-level or internal definitions. However in that macro, the names being defined are actually provided by the user code. So there's no ambiguity as far as renaming of macro-inserted identifiers because it's not macro-inserted. It's like, you say (import foo); well, it's understood that you're defining the name "foo" in the environment in which you're using that macro.

So I've not run into this situation in practice, only something somewhat like it.

And I've looked at some FAQs, I dunno if I found any mailing list discussions, a lot of that stuff's getting hard to find. But I didn't find anything that addresses this particular...

R: Is there, like archives to download? Is it all like downloadable archives for mailing lists?

J: Um, I think there's maybe some, then some that maybe you could wget. There's different sources though, there's different places to look, different sites. I think I may have some things archived, but not exactly sure what and I don't have them in a readily searchable form.(viii)

R: So what do you see as the way out of the corner you painted yourself into? What are the different ways you could take this?

J: Well, certainly the easiest thing is to just leave it as is; document the behavior. Which is probably the way that it's gonna go, for now at least. I wonder if - that other comment was, like, not sure about - it says that the compiler doesn't track the bindings in the runtime environment because that's the interpreter's concern, but is there more to it than that? Because I was saying... top-level definitions can be introduced later, after the first one is compiled; and then, no, but that would apply to code that's in a procedure or something, hm, something where the evaluation is deferred. Just thinking out loud.

I'm just thinking like even if they increase the integration between the interpreter and the compiler, so that let's say the compiler does have visibility of the full runtime interaction environment - I think that would be enough, actually. I think that would fix it. Because at the time that you're compiling an expression, you would know at least all the things that had been compiled up to that point.

Well, so it might fix it, more pondering necessary if I want to go that way.

R: You want to talk through it?

J: ... sure.

So the idea is like, if we want to completely correctly resolve this situation, according to all the interpretations aforementioned, such that whether the name was already defined determines whether the macro-inserted identifier is renamed: what would matter is whether the name is defined at the time that this macro-inserted definition is evaluated; and top-level items (expressions, definitions, etc.) are compiled and evaluated sequentially. That is, one is compiled, then it's evaluated, then the next is compiled, then it's evaluated. It has to be that way, for other reasons.

R: Does that get rid of the lazy lookups?

J: No. Because, let's say, you can have a top-level definition which defines a procedure, in whose body a reference is made to some variable that has not yet been defined. That's fine because you're just defining the procedure, you're not executing it. Then when you call that procedure, at that point that referenced variable then will have to be defined.
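
[For example:

    (define (f n) (g (* n 2)))   ; fine to define F now: G is only referenced
    (define (g n) (+ n 1))       ; G appears later
    (f 3)                        ; => 7; G's binding is looked up only here
]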

So I think because their compilation and interpretation are interleaved in this way, at the top level of the program, at the time you're compiling a top-level definition, you will have visibility of all top-level names that have been previously defined; and so you know whether the new name being defined is bound or unbound. I really think it would work. However another complication then is, we're essentially now allowing these complex generated name objects to be placed in the runtime environment as the names, and now essentially the interpreter C code may have to be changed in some ways to accommodate this, not sure.

Well yes, because suppose you find that the macro is inserting a new identifier, and so it has to rename it; and then say the macro makes a reference to that identifier, that reference is similarly renamed; that will be compiled as a global variable reference, which uses the lazy lookup, which means that when the interpreter does that lazy lookup, it's now not merely comparing symbols but it's comparing potentially complex renamed objects, and so the interpreter now also needs to gain the logic for how to structurally compare those objects rather than just comparing by pointer equality. Unless they are interned, which was one possibility I had in the compiler that I didn't pursue, because it seemed like a sort of an optimization, a premature optimization.
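
[That is, roughly, something along these lines would have to live in the interpreter; shown here in Scheme for illustration, reusing the hypothetical rename accessors from earlier:

    (define (name=? a b)
      (cond ((and (symbol? a) (symbol? b)) (eq? a b))
            ((and (rename? a) (rename? b))
             (and (eqv? (rename-tag a) (rename-tag b))
                  (name=? (rename-name a) (rename-name b))))
            (else #f)))
]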

So there's going to be at least those two issues to deal with: making sure that the interpreter is set up to handle these renamed objects, and allowing the compiler to see the top-level runtime environment, to know which names are defined or not. That's the cost side, and then the benefit side is that we are able to distinguish this incredibly obscure case, where I can't as yet see why anyone would need it done "correctly".

R: So maybe it's just best for now to write it all out, and say this is how it works, and it could be changed, and this is how it would be done in the current state of the code, but... I'm happy with the corner I painted myself into. Or like I'm satisfied, this is how I would solve it if it needed to be but I don't see that it needs to be.

J: Yeah.

R: And especially like, yeah we want more people to be using this, but for right now it's like, you're using it, so as long as it's clear how it works, you're not gonna write code that has bad assumptions about this.

J: Right.

And really if you were writing code that depends on this behavior, then that same code would also be dependent on this treatment of all names as bound at top level. So, you're writing code that won't work portably on different legal Scheme implementations anyway. So really seems like it's fine, just mildly inconsistent.

Yeah that's the similarity to the other issue that I already wrote about, which also involved top-level defines I think. And redefining top-level names, right, because I defined a procedure, I defined whatsit as a variable, I then redefined whatsit as a macro, and that then changes the meaning of this previous function definition, even after it's been compiled. So the similarity here is that, we're not redefining it as a macro but we're having a macro-inserted definition which may or may not refer to the name written on the tin, depending on the prior state of that name. That's why these two issues sort of came up together, and I first thought they'd come out in one article.

R: Well I think like, part of this problem's the standards writers didn't want to take the decision, they left it kind of ambiguous.

J: Basically that means that there were already implementations that did it in different ways, they could see a legitimate argument either way...

R: Mmm.

J: So they're not forcing it down anyone's throat, they're just describing the state of the world.

R: Mmmm. And you just gotta pick the side, basically.

Now these Scheme systems that they were referencing when they wrote the standard, what happened to those?

J: There are many Scheme implementations still around; MIT Scheme that I was playing with earlier is probably pretty close to a lot of these guys. There's a lot, really; the boiled-down simplicity of the core language at least makes it amenable to implementing rapidly, which is sorta what got me sucked into it in the first place. But then you get into all these details...(ix)

But there were different experimental implementations, like it was first implemented on MacLisp I believe, in other words on an existing Lisp system of the time, and then ported to different... to PCs and to whatever else.

But yeah, I do wish I had the comments section and the trackbacks to this standard.

R: Yeah. When was this standard written?

J: 95? 98.

R: They had wordpress back then.

J: Dialup modem? They had Usenet. I wish I had a Usenet archive. Somehow Google stole it all and put it behind their stupidwalls.

R: I bet you could find it on a torrent. That'd be interesting.

J: How did that work? Like there used to be... maybe usenet was a company at first, then it became more federated(x) and it was sort of like a collection of mailing lists but more like blogs in that it would be stored on a central news server and then you'd just go retrieve it whenever you connected to the Net, so it more fit the dialup world. And then a Usenet node would... sorta like Bitcoin nodes they'd all have all the history of all the newsgroups.(xi) But of course the content grew and grew, and there were the binaries and people were sharing porn on it and so on, and this became too much of a load, so some servers would start to filter what groups they would carry...(xii)

"We are one of the most extensive archives of the Usenet newsgroups on the Internet, archiving hundreds of millions of Usenet posts." Archive.org... the usenet historical... "the first Usenet archive was called DejaNews and assembled a database of Usenet postings going back to 1995; DejaNews was acquired by Google and renamed Google Groups... after continued updating is now by far the best newsgroup archive available with database going back to 1982" - but we can't download it.

Anyway, that'd be one forum where discussions were had but there were others.

R: Something happened with the archive.org, where it was in read-only mode recently.

J: What's read-only mode? Like it's not doing...

R: You could look up stuff that had already been saved but you couldn't save stuff.

J: Oh you mean archive.is or whatever?

R: No, the Wayback Machine, on archive.org.

J: Did they have an interface where you could request them to save something?

R: I think so.

J: Ok. I just thought they did a crawl.

R: Well, in any case they stopped crawling.

J: Mhm. But they're back?

R: I don't know if they're back. That was something some people were fearing like "oh look, they're getting rid of the Wayback Machine before they steal the election."

J: Heh.

  1. My personal favorites seem to be the ever-popular "uhh" as well as "so", "like", "sort of", "a bit", and "basically"; sometimes even "essentially". [^]
  2. For one, it would quite invert the flow of processing because the "user" code that creates the binding in question is *outside* the macro use. [^]
  3. Jonathan Rees, The Scheme of Things: Implementing Lexically Scoped Macros, 1993. Rees is listed among the Editors in R5RS, which does not cite this paper directly but rather some related ones. [^]
  4. Yet! [^]
  5. The technique is memoization. [^]
  6. Thanks, com.danielkim.soundrecorder_130.apk, quite some progress we've made since my dad's cassette tape recorder of the 90's. [^]
  7. This climactic statement of the problem is not actually quite correct, as we'll explore further on. [^]
  8. Note to self: cs.cmu.edu-ai-repository.tar.gz, ftp.cs.indiana.edu-scheme-repository.tar.bz2 [^]
  9. Fwiw, even with all the pesky details so far I still think it's one of the more high-leverage languages to implement, in terms of value relative to cost; at minimum it still wins on uniformity of grammar, absence of associativity and precedence issues, and low count of required special syntactic forms. If it weren't a high-level hygienic macro language I was struggling with, then it would be something perhaps even more complex like templates or generic packages, or the downstream fallout from lacking such things. However, I'd certainly say I underestimated the size of the mountain when I started on the climb. [^]
  10. No; I've learned that it started out with message exchanges between universities, independent of the ARPANET and certainly well before the commercial Internet. [^]
  11. Not really, it was more "ring buffer" style with new posts evicting the oldest ones on a per-newsgroup quota basis, with possibly just a few days average retention time. Hence why complete archives are so hard to come by. Also unlike blogs in that newsgroups were unmoderated by default, saved from the slide into meaningless drivel and advertising, to the extent that they were and for as long as they were, only by the exclusivity of the system as a whole, which was well on the way out by the notorious "eternal September" of 1993 where AOL opened the floodgates to the ever-unwashed masses. [^]
  12. Believe it or not, the thing is still around in town-dump format, with various providers seeming to do a brisk business driven mostly by filesharers who now complain about the freeloaders who use the system for their personal encrypted big data backups, to the tune of hundreds of terabytes added daily and providers now claiming years of binary retention. [^]
