author | Alan Dipert
<alan@dipert.org> 2020-02-12 09:25:46 UTC |
committer | Alan Dipert
<alan@dipert.org> 2020-02-12 09:25:46 UTC |
parent | 55b7cd3011f3bfea936f663123d921a025929116 |
paper/jacl-els-2020.tex | +351 | -63 |
diff --git a/paper/jacl-els-2020.tex b/paper/jacl-els-2020.tex index 33db01f..d6aecb5 100644 --- a/paper/jacl-els-2020.tex +++ b/paper/jacl-els-2020.tex @@ -145,12 +145,13 @@ In this paper, I will describe JACL's design and the current state of its implementation. I will also outline major remaining work, such as introduction of the Common Lisp Object System (CLOS) and the ability to produce optimized deliverables. + \section{Previous Work} Many Lisps exist that either compile to JavaScript or are interpreted -by a JavaScript program. The vast majority of these were created for -educational purposes. In order to focus my comparison, I will only -discuss Lisps that meet either of the following criteria: +by a JavaScript program. In order to focus my comparison, I will +survey only Lisps that have either demonstrated industrial utility or +that implement a significant subset of Common Lisp. \begin{itemize} \item They implement a significant subset of Common Lisp. @@ -166,26 +167,26 @@ to target JavaScript. Parenscript is not bootstrapped and its compiler is not written in JavaScript, and so it relies on a hosting Common Lisp system for -compilation. It offers no REPL. +compilation. -By default, JavaScript generated by Parenscript relies on no -additional data structures or functionality other than that provided -natively by JavaScript. Even with this limitation, Parenscript -supports a surprising amount of Common Lisp, including some lambda -list support, and support for multiple values. +Only JavaScript types are available to Parenscript programs at +runtime, and so Parenscript is more of a syntax frontend for +JavaScript than it is an interactive Lisp system. -If runtime support is enabled, a few sequence functions become -available to Parenscript programs. +While Parenscript is not positioned to facilitate large-scale SPA +development, it remains popular way for adding dynamic +JavaScript-behaviors to static Web sites. \subsection{SLip} % http://lisperator.net/slip/impl SLip [ref] is perhaps the most ambitious Common Lisp-on-JavaScript -system to date with respect to the sheer number of features it -offers. However, it intentionally diverges from Common Lisp in some -respects. Based originally on the compiler and bytecoded VM presented -in PAIP [ref], SLip features include: +system created to date. It offers a stunning array of powerful +features. + +Based originally on the compiler and bytecoded VM presented in PAIP +[ref], SLip features include: \begin{itemize} \item Self-hosting compiler @@ -195,36 +196,37 @@ in PAIP [ref], SLip features include: \item Green threads \item Package system \item TinyCLOS [ref] - \item Partial condition system with \texttt{HANDLER-CASE} and \texttt{HANDLER-BIND} - \item GNU Emacs-like browser-based editor and interactive development environment + \item Resident Emacs-like editor and interactive development + environment \end{itemize} Because of interpretation overhead, performance relative to JavaScript is necessarily worse. -Generated code is also larger than comparable JavaScript code, so -applications would take longer for users to download than their -JavaScript equivalents. - -FASLs can be created that represent Lisp as a series of VM -instructions encoded as JavaScript function calls. These are faster to -load in the browser than Lisp source code because the browser uses its -native parser to load them. - The compiler features a peephole optimizer based on the one in PAIP. There is no way to tree-shake the Lisp image given an entrypoint, or to otherwise produce standalone JavaScript executable that a static whole-program optimizer such as Google Closure could work with. -These performance and code-size limitations render SLip impractical -for the type of of industrial SPA development JACL is designed to -facilitate. +Lisp files may be batch-compiled to FASLs. FASLs represent code as +JavaScript code instead of as Lisp data. The browser is able to load +FASLs faster than SLip code because the browser's native JavaScript +parser is much faster than SLip's reader. + +Despite the ability to produce FASLs, the interpreted nature of SLip +precludes the system from producing competitively fast or small +application deliverables. Consequently, SLip does not satisfy the JACL +project goal of facilitating large-scale industrial SPA development. \subsection{JSCL} Of existing Common Lisps, JSCL is the one aligned most closely with -the JACL project goal. Its features include the following: +the JACL project goal. Unlike Parenscript, JSCL is a genuine Lisp +system. And unlike SLip, JSCL compiles directly to JavaScript instead +of to an interpreted bytecode. + +Its features include the following: \begin{itemize} \item Self-hosting compiler targeting JavaScript @@ -235,59 +237,345 @@ the JACL project goal. Its features include the following: \item \texttt{CLOS} \end{itemize} +The JSCL system and its features are organized hierarchically. I will +now investigate how the lowest level Lisp and JavaScript operations +are supported by JSCL. + +\subsubsection{JavaScript interoperation} + JSCL's support for JavaScript interoperation consists of the following conventions and facilities: \begin{itemize} - \item Unboxed numbers: JSCL fixnums and flonums both map to + \item \textbf{Unboxed numbers}: JSCL fixnums and flonums both map to JavaScript's native \texttt{number} type. This allows for numeric performance competitive with JavaScript. - \item Functions are JavaScript functions: JSCL functions compile to - concrete JavaScript functions that may be invoked from JavaScript - without any special calling convention. - \item The \texttt{\#\textbackslash j} pseudo-symbol syntax which I will - elaborate on below. + \item \textbf{Native functions}: JSCL functions compile to concrete + JavaScript functions that may be invoked from JavaScript. + \item \textbf{\texttt{\#\textbackslash j} syntax}: The pseudo-symbol + reader syntax, which I will elaborate on below. \end{itemize} -The JSCL reader parses tokens such as \texttt{\#\textbackslash - j:window:console:log} as segmented references to hierarchical -objects in JavaScript's global environment. +\subsubsection{\texttt{\#\textbackslash j} syntax} + +The JSCL reader parses tokens like \linebreak\texttt{\#\textbackslash + j:window:console:log} as case-sensitive references to hierarchical +objects within JavaScript's global environment. Case preservation is +necessary because JavaScript identifiers are case-sensitive. In a value context, references evaluate to the referenced JavaScript -object. \texttt{\#\textbackslash j:window:console:log} evaluates to +object. For example, \texttt{\#\textbackslash j:window:console:log} evaluates to the same object that the JavaScript expression \texttt{window.console.log} does. -In function position of operation forms --- as in +In function position of operation forms --- as in \linebreak \texttt{(\#\textbackslash j:window:console:log $"$hello$"$)} --- references are taken as the name of a foreign function to call. The compiler generates code that coerces the arguments from Lisp to -JavaScript types and applies them to the named foreign function. In -this example, the Lisp string \texttt{$"$hello$"$} is converted to a -JavaScript string before being passed to the function +JavaScript types and applies them to the named foreign function. + +In the preceding example, the Lisp string \texttt{$"$hello$"$} is +converted to a JavaScript string before being passed to the function \linebreak \texttt{window.console.log} for display in the JavaScript console. -TODO Problems with JSCL REPL -TODO Problems with JSCL interop -TODO Problems with JSCL compiler organization +JSCL has a distinct string type because unlike Common Lisp strings, +JavaScript strings are immutable. -\subsection{ClojureScript} +Argument type coercion is automatic, cannot be disabled, and can be +defeated by compound structures. For example, should a JavaScript +array containing Lisp strings be passed as an argument, the inner Lisp +strings would not be converted to JavaScript strings. + +\subsubsection{Multiple values} + +Despite being native JavaScript function objects, JSCL functions +require a special calling convention. As JavaScript functions, JSCL +functions always require one more argument than they declare. This +required first argument is the function to which any returned values +should be applied. + +This argument is the basis of a clever and robust mechanism for +supporting multiple value returns. + +The JSCL compiler supplies this argument at function call sites +automatically. The value of the supplied function depends on the +calling context. If the called function is called in a context in +which only a single value is required, the first argument is a +function that returns its first argument and discards all others. + +If the function is called in a context that demands multiple values, +the compiler supplies as the first argument a function that returns +all of its argument as a tagged array. + +JSCL functions that appear as arguments in a \texttt{\#\textbackslash + j} syntax foreign function call are wrapped in a JavaScript function +that supplies the first required argument. In this way, JavaScript +code may call the functions it receives through FFI without supplying +the first argument. + +This arrangement implies the reasonable limitation that JavaScript +functions will never receive a tagged multiple values array from +called JSCL functions. + +However, it does require that all JSCL functions passed to JavaScript, +by any means, must be suitably wrapped. + +\subsubsection{REPL} + +JSCL includes a reader, compiler, and printer, and evaluation is +performed by JavaScript's \texttt{eval} function. Between these, a +Read Eval Print Loop (REPL) is possible, and the JSCL distribution +includes a reference implementation. + +However, JSCL only supports reading from string-backed input streams. +Input streams are not an abstraction supported by JavaScript in Web +browsers. With a few obscure exceptions [foot], JavaScript programs +may only receive input asynchronously. + +Effectively, the JSCL reader consumes strings, not streams in the +traditional sense. An error is signaled if the end of the string is +encountered before the reader has finished reading a datum. + +Because input strings may not contain partial data, the REPL +necessitates a ``pre''-reader process that accumulates characters in +response to asynchronous input events, and invokes the reader only +once a complete form --- as a string --- has been accumulated. + +Such a pre-reader can be found in the JSCL REPL implementation. It +handles standard Lisp atoms and list syntax, but would be stymied by +reader macros. As such, the pre-reader is a separate, degenerate +reader that limits what's capable of being read by the underlying, +full-featured synchronous reader. + +Because JSCL includes no demonstration of embedding the REPL in an +application, and because JSCL itself is compiled in batch through a +synchronous process, the limitations of the pre-reader appear not to +encumbered those who have used JSCL to date. However, the lack of a +full-fledged asynchronous, embeddable reader severely limits JSCL as +an interactive Lisp development environment. + +\subsubsection{Compiler organization} + +JSCL compilation involves the following two high-level stages: + +\begin{itemize} + \item Conversion from Lisp to JavaScript AST represented as + S-expressions. + \item Conversion from JavaScript AST to JavaScript strings. +\end{itemize} + +The first stage, the conversion from Lisp to JavaScript Abstract +Syntax Trees (AST), is where the implementation of Lisp's special +forms in terms of JavaScript language constructs and runtime support +is performed. This is done in a single pass in which macro expansion, +lexical analysis, and JavaScript AST generation all happen. The +lexical environment is maintained in a special variable as the +compiler descends into Lisp code. + +Code for \texttt{TAGBODY}/\texttt{GO} is generated in the first +stage, and the generated code is much slower than comparable +JavaScript code for looping. + +Only the general, dynamic case of \texttt{TAGBODY}/\texttt{GO} is +implemented. Every control transfer initiated by \texttt{GO} results +in a JavaScript exception being thrown, an expensive operation. + +Since many Common Lisp operators have \emph{implicit tagbodies}, and +since most other iteration operators are expressed in terms of +\texttt{TAGBODY}/\texttt{GO}, this performance problem pervades the +JSCL system. + +Unfortunately, the first stage of the JSCL compiler does not define +its own AST, and so there is no framework for implementing +sophisticated, staged optimizations, such as those that could mitigate +performance problems like this one. + +Optimizations such as this are particularly important for a Lisp to +JavaScript compiler to perform, because they are exactly the kind that +a JavaScript optimizer could not perform, provided only JavaScript +code. + +Curiously, the arithmetic optimizations performed by the second, +code-generation stage are \emph{exactly} the kind that JavaScript +optimizers excel at. + +If JSCL oriented itself toward generating code optimizable by third +party tools, then its own internals could be simplified to a degree. + +\section{Design and implementation} + +From a design perspective, JACL is an effort to balance the +requirements of an interactive and practical Lisp development +environment with the constraints imposed by the Web browser platform. + +JACL proposes several design innovations in pursuit of this balance. + +\subsection{Asynchronous reader} + +The basis for interactive development in Lisp is undeniably the REPL, +but as JSCL's ``pre-reader'' conundrum demonstrates, even the direct +approach to this simple mechanism is hampered by JavaScript's +asynchronous model of input. + +To resolve this difficulty, JACL's reader facility is completely +asynchronous. It is the ``pre-reader'' taken to its inevitable +conclusion. + +JACL's reader is implemented as a JavaScript class, +\texttt{Reader}. \texttt{Reader} instances are parameterized by an +input source. One such input source is the \texttt{BufferedStream} +class. The input source asynchronously notifies the reader instance +when characters are available. The reader incrementally consumes these +characters. Once the reader has accumulated a Lisp datum, it notifies +its consumers of the availability of the datum. + +The JACL reader implementation makes extensive use of modern +asynchronous JavaScript programming techniques including Promises, +iterators, async functions, async iterators, and the \texttt{await} +keyword. + +This language support for asynchronous programming reduces verbosity +of asynchronous code, while also allowing it to be nearly as easy to +follow as comparable synchronous code. For example, the following is +an example of instantiaing a \texttt{BufferedStream} and +\texttt{Reader}, sending in characters asynchronously after a delay, +and then displaying the resulting Lisp data to the JavaScript console. + +\begin{verbatim} +let bs = new BufferedStream(), + rdr = new Reader(bs); + +window.setTimeout(() => { + bs.writeEach("123 "); +}, 1000); + +console.log(await rdr.read()); +\end{verbatim} + +In the preceding example, \texttt{window.setTimeout()} is used to +enqueue a JavaScript function for execution after 1000 +milliseconds. The enqueued function writes input to the +\texttt{BufferedStream} \texttt{bs}. + +Before the enqueued function is invoked, execution proceeds to the +\texttt{console.log} call, but is suspended by the \texttt{await} +keyword. + +The \texttt{await} keyword expects a Promise object on its right side, +and will suspend JavaScript execution until the Promise has +``resolved'', or notified its subscribers that the pending computation +it represents has completed. \texttt{rdr.read()} returns such a +Promise. + +Once \texttt{rdr} has completed a form, the \texttt{await} evaluates +to the number \texttt{123} and the number is printed to the JavaScript +console. + +The ``read'' portion of JACL's REPL is satisfied by first establishing +\texttt{BufferedStream} and \texttt{Reader} objects. Then, in an +asynchronous loop, objects are consumed from the \texttt{Reader}, +analyzed, compiled, and evaluated. + +Concurrently, characters may be sent to the \texttt{BufferedStream} +instantiated by the REPL by calling its \texttt{write()} or +\texttt{writeEach()} methods. + +Neither character input nor read object consumption impede other +JavaScript operations, so the JACL REPL model supports embedding in +applications. + +It is hoped that in the future, JACL will be written in itself, and so +all of these techniques will be accessible from Lisp. + +\subsection{Chrome Devtools REPL} + +A browser-based REPL facilitates experimentation with the language by +interested people, from the comfort of their Web browsers. + +It's also a useful debugging feature of a deployed application. + +However, most developers already have a preferred text editor and a +refined REPL interaction workflow, so it's not within the JACL project +scope to build a resident text editor or IDE in the style of +SLip. Even if a resident editor was a goal, the file system access +restrictions imposed by the browser would present significant +challenges. + +Instead, JACL offers an alternative development REPL approach that +requires minimal host tooling: the DevTools-based REPL. + +For each tab, Google Chrome is capable of hosting a WebSocket-based +debug endpoint that implements the DevTools Protocol. The Chrome +DevTools Protocol is a standard for JSON object interchange over +WebSocket that makes all of the browser's debugging facilities +available remotely, over the network. + +JACL leverages this feature of Chrome to deliver a command-line REPL +client that may be run on the developer's host machine. The workflow +is the following: + +\begin{enumerate} + \item Run Google Chrome from the shell with the + \texttt{--remote-debugging-port} parameter. + \item Navigate to the Web site hosting the JACL system you wish to + interact with. + \item Run \texttt{jacl-repl} from the shell. + \item Be presented with a Lisp prompt. +\end{enumerate} + +As a simple command-line application, \texttt{jacl-repl} can be run in +various contexts. For example, it could be run within an Emacs +``inferior-lisp'' buffer, and then Lisp forms could sent from other +Emacs buffers for evaluation in the REPL. + +\texttt{jacl-repl} is currently an R script, but a standalone binary +executable is imagined in the future. + +An advantage of the reader approach taken by JACL with respect to the +implementation of \texttt{jacl-client} is that the REPL client is +completely ignorant of Lisp syntax and semantics. \texttt{jacl-client} +merely buffers and sends characters over the WebSocket and into the +Lisp system, and displays characters output from Lisp. While +\texttt{jacl-client} presents a synchronous input/output interface, it +is actually the frontend for bidirectional asynchronous data transfer. + +\subsection{Analyzing compiler} + +The JACL compiler is organized to facilitate high-level optimizations, +such as those necessary for efficient compilation of +\texttt{TAGBODY}/\texttt{GO}. + +Low level optimizations are left to (optional) JavaScript optimizers +that can be applied to delivered executables, which I will outline +next. + +The compiler has one analysis pass that results in an AST. The AST is +then converted to JavaScript strings by a code generation step. + +Each AST node is a JavaScript object with at least the following +keys: + +\begin{itemize} + \item \texttt{op}: The node name, a JavaScript string. + \item \texttt{env}: An \texttt{Environment} object that represents the node's lexical environment. + \item \texttt{parent}: The node's parent; this is \texttt{null} for the root. + \item \texttt{form}: The node's original source data, a Lisp datum. +\end{itemize} + +TODO + +\subsection{Delivery} + +TODO + +\section{Conclusions and Future Work} + +TODO + +\section{Acknowledgments} + +TODO -%% \subsection{Template Styles} -%% -%% The primary parameter given to the ``\verb|acmart|'' document class is -%% the {\itshape template style} which corresponds to the kind of publication -%% or SIG publishing the work. This parameter is enclosed in square -%% brackets and is a part of the {\verb|documentclass|} command: -%% \begin{verbatim} -%% \documentclass[STYLE]{acmart} -%% \end{verbatim} -%% -%% Journals use one of three template styles. All but three ACM journals -%% use the {\verb|acmsmall|} template style: -%% \begin{itemize} -%% \item {\verb|acmsmall|}: The default journal template style. -%% \item {\verb|acmlarge|}: Used by JOCCH and TAP. %% \item {\verb|acmtog|}: Used by TOG. %% \end{itemize} %%