Implementer’s Notes on Supporting ECMAScript 3rd and 5th Editions

Having implemented an interpreter for the 1st edition of ECMAScript, adding support for more recent versions seemed like the next logical step for mjs.

As you may or may not know (in which case you can read more at e.g. A Brief History of JavaScript) there were never any ECMAScript 2nd or 4th editions, so “only” the 3rd and 5.1th editions need work.

Early on I decided to keep support for older versions even while implementing the newer ones as I thought it would be neat to be able to switch the interpreted version at runtime (or at least on start-up). In the grand scheme of things this turned out to not cause too much extra work, and it helped improve conformance for both the older and latest versions.

Speaking of conformance, I don’t claim full conformance with any of the standards even though I’ve tried my best. Turns out understanding a computer language specification is hard. Doubly so when you want to reach the same conclusions as your fellow implementers. A large part of testing has been comparing my implementation to existing ones and poring over the specification(s) to figure out who was wrong. This is of course a little bit dangerous as there is no such thing (that I know of at least) as a reference ES5.1 implementation to check against - most actually implement bits beyond ES5.1 and/or have exhibit legacy behavior in some corner cases.

The implementation also relies heavily on the C++ standard library for things like number parsing, calendar and regular expression support. While I’ve tried to make sure all the tested platforms behave correctly, there will are places where conformance suffers because of my laziness.

Snapshots of the code where I thought I was done with the various versions can be browsed here:

This document was written at around the point of initial ES5 support being done.

If you notice anything wrong in the text or with the code, let me know (though I don’t promise to follow up).

What Went Right

As mentioned in the introduction, keeping support for older ECMAScript editions turned out to be the right decision. It helped retain the value of the work put into supporting the older editions and prepared the source code and myself for the mindset needed for strict mode, where many rules change for individual functions (and sometimes expressions). Seeing as how the ECMAScript specification writers also seem to like to flip-flop on decisions (see the quirks section), this also keeps code alive and tested that will become useful when implementing later versions.

Implementing new features as complete “slices” (where all the necessary lexer, parser and interpreter changes are made for the feature before moving on the next one), rather than making say all lexer changes, then all parser changes, etc., helped in giving a sense of progress and accomplishment during the otherwise daunting task of supporting a new version.

When starting work on a new ECMAScript edition I initially gathered a list of new features and changes and grouped them in a way I though made sense to implement them in. I also made notes on which features might be interdependent, so I could either implement them together or make appropriate stubs or placeholders. Things that look like they might just be additions to the standard library sometimes turn out to hide major restructurings in other parts of the code, while language extensions that look big turn out to be easily implementable in isolation.

Taking a cue from oil shell I’ve tried to keep the lexer “dumb” (though still versioned), keep most context-sensitive logic in the parser and let it direct mode changes and handle validation. There’s room for improvement in their interaction as the parser sometimes duplicates logic found in the browser for instance when recognizing octal literals in strict mode string literals. Given that ES1 with its semi-colon insertion rules already more or less mandates an impure parser it seemed like the natural place to add more complexity.

Overall - except for the cases noted below - the structure of the code after finishing initial ES1 support was very conductive for implementing the later versions. Most changes could be made incrementally and the overall structure of the code is mostly unchanged. With some minor additions the garbage collector, representation of values, and object model could be adapted.

Implementing support for ES3 went like a (comparable) breeze and apart from regular expression literals making the lexer context sensitive (see above note) the language changes seemed for the most part logical and reasonable. Most of the additions are either sorely missed conveniences like array and object literals, function definitions as expressions or simple language constructs found in similar languages: do..while, switch, strict equality comparisons and support for exceptions.

What Went Wrong

ES5… Well “wrong” is probably too strong a word (but fits with the classic game postmortem headlines), but supporting ES5 definitely caused more headaches than ES3. On the surface it might seem like there weren’t that many changes between ES3 and ES5.1 apart from a bunch of new standard library features, but actually supporting all the features required a complete rework (which is at the time of writing not finalized) of how object properties are handled.

In ES1 and ES3 object properties are simple: Sure the user can add, change and remove properties, but only simple ones and they always behave the same (enumerable, writable and configurable). This all changes in ES5 where evil evil functions like Object.defineProperty and moral equivalents allow the user to impede all my attempts at optimizations and cleverness. Combined with the addition of getters and setters has left all my dreams of having an optimized Array object in temporary ruins.

The appearance of getters and setters also means that the most standard library functions (which are all implemented in C++) are littered with stale pointer use bugs. A large task in exposing and correcting these awaits, or admitting failure and inhibiting garbage collection while a native function is executing. An example of this kind of problem looks like this:

// Unsafe in ES5 mode!
void copy_property(string name, object_ptr o1, object o2) {
    o1->put(name, o2->get(name.view()));
}

Copying a property from one object to another, what could possibly go wrong? Nothing in ES1/ES3, but in ES5 if name refers to a property with a user-defined getter, a garbage collection could be triggered, moving the object pointed to by o1 and leaving the result of o1->stale (if the compiler chose to evaluate it before the call to o2->get!).

Fixing it is a simple matter of rewriting the expression to make sure the call to o2->get is fully evaluated before o1->:

// Safe in all modes (assuming valid arguments)
void fixed_copy_property(string name, object_ptr o1, object o2) {
    const value v = o2->get(name.view());
    o1->put(name, v);
}

Finding all of these new problems - especially when they’re not guaranteed to occur due to differences in the generated code - is going to be an ongoing challenge for a good while, I fear.

It also seems like, that while adding support for ES5 the need to access the global context arose in more places than before. I may be too emotionally attached to my decision to not have an easily accessible global context to change it now, but my recommendation for other ECMAScript implementers would be to just give in, and keep it in thread-local storage for easy retrieval. Know which hill to die on. Unlike me.

When running the test suite it can also be felt that the number of tests have grown a lot while the attention to speed has been put on the back burner. This is of course a natural development as I have been prioritizing correctness, but the code feels in need of a round of optimizations. Luckily I now have better test coverage to ensure the hard work on conformance isn’t compromised.

A Small Collection of ECMAScript Quirks

What follows is a loose collection of notes of funny/sad/interesting ECMAScript behavior I’ve noticed. It’s highly likely that I’m just wrong and have misinterpreted the specification(s) and/or relied too much on the observed behavior of other interpreters.

For more “fun” have a look at some of the test cases in the source code.

Or come up with an appropriate set of tests for the 54 steps of Array.prototype.splice (ES3, 15.4.4.12) or try to figure out new ways my eternal nemesis Object.defineProperty can hurt me and break my code.

Labelled Statements

Starting out soft (and maybe controversial) here and I know they’re in Java, but really they shouldn’t have been part of ECMAScript. The need for them is rare and the code can always be structured in ways that make them unnecessary. Their rarity also means that the future maintainer of the code will probably be stumbled at first when encountering them.

Elision of Array Literal Items

ECMAScript allows you to elide items in an array literal:

[1,,2,]

produces a list of length 3 with [0] === 1 and [2] === 2 but [1] not defined (i.e. the array doesn’t have a 1 property). Multiple items can be elided in this way. Also notice that the final comma is a trailing comma, which doesn’t contribute to the length of the array, not an elided item (it seems ancient versions of Internet Explorer didn’t implement this rule)!

I don’t feel too strongly either way about this feature, but don’t find myself wanting or needing it often and where it might make sense - say for a spare lookup table - I’d worry about the readability.

Switch Statements

In ECMAScript the case clauses of switch statements are allowed to be full expressions, which means that:

switch (2) {
case (console.log('A'),0):
case (console.log('B'),1):
case (console.log('C'),2):
case (console.log('C'),3):
}

prints A, B and C.

Another quirk is that:

switch (0){default:}

is legal in ECMAScript but a syntax error in most other languages (that I’m familiar with). A - possibly empty - statement is usually required after default:.

Flip-flopping on Decisions

Should format control characters be stripped from the source code before processing? ES1 is silent, ES3 says yes, ES5 says no.
ES3: Let’s reserve a bunch of keywords! ES5: Nah, let’s not (abstract etc.)
Redefining properties in object literals? OK in ES3 and non-strict ES5, SyntaxError under some conditions in strict mode. Except it seems to be changed back to the old behavior in ES6.

Naming of Property Attributes

In ECMAScript 1st and 3rd editions properties can have certain attributes:

ReadOnly
DontEnum
DontDelete

in the 5th edition these are now inverted and called:

Writable
Enumerable
Configurable

While the first two are arguably better names, I’m not (yet?) fond of “Configurable”. There is more to a property being “Configurable” than what used to be implied by (not DontDelete) in the new property model in ES5, and perhaps the “subtle” change without giving me a good mental model is what is fueling my suspicion towards this change.

Direct Call to `eval`

Like the heading says calls to eval behaves differently on whether the call is made directly (eval(...)) or indirectly (e=eval; e(...); or (0,eval)(...)). I guess this is to allow more advanced interpreters to pre-evaluate some direct calls to eval? Either way it feels dirty to have this distinction.

Strict Mode

I’m all for strict mode and advise you to employ whenever possible. That said some of the rules seem pretty arbitrary and some of them required (in my implementation) some nasty hacks to realize. Some cases of note (again be aware that it’s likely that I’m simply misreading the specification):

'use strict'; function f(a,a){} // SyntaxError

Great, another error caught early! Duplicate parameter names aren’t allowed.

'use strict'; f=Function('a','a',''); // Should also fail, right?

Doesn’t throw an error though. The function body needs to be strict for it to fail.

A slightly amusing example is:

function f() { '\012';'use strict'; }

is specified to fail on the octal escape sequence because they are not allowed in strict mode function bodies. Yeah, that’s another special case in the parser…