git » jacl.git » commit 81a7b92

paper

author Alan Dipert
2020-02-12 09:25:46 UTC
committer Alan Dipert
2020-02-12 09:25:46 UTC
parent 55b7cd3011f3bfea936f663123d921a025929116

paper

paper/jacl-els-2020.tex +351 -63

diff --git a/paper/jacl-els-2020.tex b/paper/jacl-els-2020.tex
index 33db01f..d6aecb5 100644
--- a/paper/jacl-els-2020.tex
+++ b/paper/jacl-els-2020.tex
@@ -145,12 +145,13 @@ In this paper, I will describe JACL's design and the current state of
 its implementation. I will also outline major remaining work, such as
 introduction of the Common Lisp Object System (CLOS) and the ability
 to produce optimized deliverables.
+
 \section{Previous Work}
 
 Many Lisps exist that either compile to JavaScript or are interpreted
-by a JavaScript program. The vast majority of these were created for
-educational purposes. In order to focus my comparison, I will only
-discuss Lisps that meet either of the following criteria:
+by a JavaScript program. In order to focus my comparison, I will
+survey only Lisps that have either demonstrated industrial utility or
+that implement a significant subset of Common Lisp.
 
 \begin{itemize}
   \item They implement a significant subset of Common Lisp.
@@ -166,26 +167,26 @@ to target JavaScript.
 
 Parenscript is not bootstrapped and its compiler is not written in
 JavaScript, and so it relies on a hosting Common Lisp system for
-compilation. It offers no REPL.
+compilation.
 
-By default, JavaScript generated by Parenscript relies on no
-additional data structures or functionality other than that provided
-natively by JavaScript. Even with this limitation, Parenscript
-supports a surprising amount of Common Lisp, including some lambda
-list support, and support for multiple values.
+Only JavaScript types are available to Parenscript programs at
+runtime, and so Parenscript is more of a syntax frontend for
+JavaScript than it is an interactive Lisp system.
 
-If runtime support is enabled, a few sequence functions become
-available to Parenscript programs.
+While Parenscript is not positioned to facilitate large-scale SPA
+development, it remains popular way for adding dynamic
+JavaScript-behaviors to static Web sites.
 
 \subsection{SLip}
 
 % http://lisperator.net/slip/impl
 
 SLip [ref] is perhaps the most ambitious Common Lisp-on-JavaScript
-system to date with respect to the sheer number of features it
-offers. However, it intentionally diverges from Common Lisp in some
-respects. Based originally on the compiler and bytecoded VM presented
-in PAIP [ref], SLip features include:
+system created to date. It offers a stunning array of powerful
+features.
+
+Based originally on the compiler and bytecoded VM presented in PAIP
+[ref], SLip features include:
 
 \begin{itemize}
   \item Self-hosting compiler
@@ -195,36 +196,37 @@ in PAIP [ref], SLip features include:
   \item Green threads
   \item Package system
   \item TinyCLOS [ref]
-  \item Partial condition system with \texttt{HANDLER-CASE} and \texttt{HANDLER-BIND}
-  \item GNU Emacs-like browser-based editor and interactive development environment
+  \item Resident Emacs-like editor and interactive development
+    environment
 \end{itemize}
 
 Because of interpretation overhead, performance relative to JavaScript
 is necessarily worse.
 
-Generated code is also larger than comparable JavaScript code, so
-applications would take longer for users to download than their
-JavaScript equivalents.
-
-FASLs can be created that represent Lisp as a series of VM
-instructions encoded as JavaScript function calls. These are faster to
-load in the browser than Lisp source code because the browser uses its
-native parser to load them.
-
 The compiler features a peephole optimizer based on the one in
 PAIP. There is no way to tree-shake the Lisp image given an
 entrypoint, or to otherwise produce standalone JavaScript executable
 that a static whole-program optimizer such as Google Closure could
 work with.
 
-These performance and code-size limitations render SLip impractical
-for the type of of industrial SPA development JACL is designed to
-facilitate.
+Lisp files may be batch-compiled to FASLs. FASLs represent code as
+JavaScript code instead of as Lisp data. The browser is able to load
+FASLs faster than SLip code because the browser's native JavaScript
+parser is much faster than SLip's reader.
+
+Despite the ability to produce FASLs, the interpreted nature of SLip
+precludes the system from producing competitively fast or small
+application deliverables. Consequently, SLip does not satisfy the JACL
+project goal of facilitating large-scale industrial SPA development.
 
 \subsection{JSCL}
 
 Of existing Common Lisps, JSCL is the one aligned most closely with
-the JACL project goal. Its features include the following:
+the JACL project goal. Unlike Parenscript, JSCL is a genuine Lisp
+system. And unlike SLip, JSCL compiles directly to JavaScript instead
+of to an interpreted bytecode.
+
+Its features include the following:
 
 \begin{itemize}
   \item Self-hosting compiler targeting JavaScript
@@ -235,59 +237,345 @@ the JACL project goal. Its features include the following:
   \item \texttt{CLOS}
 \end{itemize}
 
+The JSCL system and its features are organized hierarchically. I will
+now investigate how the lowest level Lisp and JavaScript operations
+are supported by JSCL.
+
+\subsubsection{JavaScript interoperation}
+
 JSCL's support for JavaScript interoperation consists of the following
 conventions and facilities:
 
 \begin{itemize}
-  \item Unboxed numbers: JSCL fixnums and flonums both map to
+  \item \textbf{Unboxed numbers}: JSCL fixnums and flonums both map to
     JavaScript's native \texttt{number} type. This allows for numeric
     performance competitive with JavaScript.
-  \item Functions are JavaScript functions: JSCL functions compile to
-    concrete JavaScript functions that may be invoked from JavaScript
-    without any special calling convention.
-  \item The \texttt{\#\textbackslash j} pseudo-symbol syntax which I will
-    elaborate on below.
+  \item \textbf{Native functions}: JSCL functions compile to concrete
+    JavaScript functions that may be invoked from JavaScript.
+  \item \textbf{\texttt{\#\textbackslash j} syntax}: The pseudo-symbol
+    reader syntax, which I will elaborate on below.
 \end{itemize}
 
-The JSCL reader parses tokens such as \texttt{\#\textbackslash
-  j:window:console:log} as segmented references to hierarchical
-objects in JavaScript's global environment.
+\subsubsection{\texttt{\#\textbackslash j} syntax}
+
+The JSCL reader parses tokens like \linebreak\texttt{\#\textbackslash
+  j:window:console:log} as case-sensitive references to hierarchical
+objects within JavaScript's global environment. Case preservation is
+necessary because JavaScript identifiers are case-sensitive.
 
 In a value context, references evaluate to the referenced JavaScript
-object.  \texttt{\#\textbackslash j:window:console:log} evaluates to
+object. For example, \texttt{\#\textbackslash j:window:console:log} evaluates to
 the same object that the JavaScript expression
 \texttt{window.console.log} does.
 
-In function position of operation forms --- as in
+In function position of operation forms --- as in \linebreak
 \texttt{(\#\textbackslash j:window:console:log $"$hello$"$)} ---
 references are taken as the name of a foreign function to call. The
 compiler generates code that coerces the arguments from Lisp to
-JavaScript types and applies them to the named foreign function. In
-this example, the Lisp string \texttt{$"$hello$"$} is converted to a
-JavaScript string before being passed to the function
+JavaScript types and applies them to the named foreign function.
+
+In the preceding example, the Lisp string \texttt{$"$hello$"$} is
+converted to a JavaScript string before being passed to the function \linebreak
 \texttt{window.console.log} for display in the JavaScript console.
 
-TODO Problems with JSCL REPL
-TODO Problems with JSCL interop
-TODO Problems with JSCL compiler organization
+JSCL has a distinct string type because unlike Common Lisp strings,
+JavaScript strings are immutable.
 
-\subsection{ClojureScript}
+Argument type coercion is automatic, cannot be disabled, and can be
+defeated by compound structures. For example, should a JavaScript
+array containing Lisp strings be passed as an argument, the inner Lisp
+strings would not be converted to JavaScript strings.
+
+\subsubsection{Multiple values}
+
+Despite being native JavaScript function objects, JSCL functions
+require a special calling convention. As JavaScript functions, JSCL
+functions always require one more argument than they declare. This
+required first argument is the function to which any returned values
+should be applied.
+
+This argument is the basis of a clever and robust mechanism for
+supporting multiple value returns.
+
+The JSCL compiler supplies this argument at function call sites
+automatically. The value of the supplied function depends on the
+calling context. If the called function is called in a context in
+which only a single value is required, the first argument is a
+function that returns its first argument and discards all others.
+
+If the function is called in a context that demands multiple values,
+the compiler supplies as the first argument a function that returns
+all of its argument as a tagged array.
+
+JSCL functions that appear as arguments in a \texttt{\#\textbackslash
+  j} syntax foreign function call are wrapped in a JavaScript function
+that supplies the first required argument. In this way, JavaScript
+code may call the functions it receives through FFI without supplying
+the first argument.
+
+This arrangement implies the reasonable limitation that JavaScript
+functions will never receive a tagged multiple values array from
+called JSCL functions.
+
+However, it does require that all JSCL functions passed to JavaScript,
+by any means, must be suitably wrapped.
+
+\subsubsection{REPL}
+
+JSCL includes a reader, compiler, and printer, and evaluation is
+performed by JavaScript's \texttt{eval} function. Between these, a
+Read Eval Print Loop (REPL) is possible, and the JSCL distribution
+includes a reference implementation.
+
+However, JSCL only supports reading from string-backed input streams.
+Input streams are not an abstraction supported by JavaScript in Web
+browsers. With a few obscure exceptions [foot], JavaScript programs
+may only receive input asynchronously.
+
+Effectively, the JSCL reader consumes strings, not streams in the
+traditional sense. An error is signaled if the end of the string is
+encountered before the reader has finished reading a datum.
+
+Because input strings may not contain partial data, the REPL
+necessitates a ``pre''-reader process that accumulates characters in
+response to asynchronous input events, and invokes the reader only
+once a complete form --- as a string --- has been accumulated.
+
+Such a pre-reader can be found in the JSCL REPL implementation. It
+handles standard Lisp atoms and list syntax, but would be stymied by
+reader macros. As such, the pre-reader is a separate, degenerate
+reader that limits what's capable of being read by the underlying,
+full-featured synchronous reader.
+
+Because JSCL includes no demonstration of embedding the REPL in an
+application, and because JSCL itself is compiled in batch through a
+synchronous process, the limitations of the pre-reader appear not to
+encumbered those who have used JSCL to date. However, the lack of a
+full-fledged asynchronous, embeddable reader severely limits JSCL as
+an interactive Lisp development environment.
+
+\subsubsection{Compiler organization}
+
+JSCL compilation involves the following two high-level stages:
+
+\begin{itemize}
+  \item Conversion from Lisp to JavaScript AST represented as
+    S-expressions.
+  \item Conversion from JavaScript AST to JavaScript strings.
+\end{itemize}
+
+The first stage, the conversion from Lisp to JavaScript Abstract
+Syntax Trees (AST), is where the implementation of Lisp's special
+forms in terms of JavaScript language constructs and runtime support
+is performed. This is done in a single pass in which macro expansion,
+lexical analysis, and JavaScript AST generation all happen. The
+lexical environment is maintained in a special variable as the
+compiler descends into Lisp code.
+
+Code for \texttt{TAGBODY}/\texttt{GO} is generated in the first
+stage, and the generated code is much slower than comparable
+JavaScript code for looping.
+
+Only the general, dynamic case of \texttt{TAGBODY}/\texttt{GO} is
+implemented. Every control transfer initiated by \texttt{GO} results
+in a JavaScript exception being thrown, an expensive operation.
+
+Since many Common Lisp operators have \emph{implicit tagbodies}, and
+since most other iteration operators are expressed in terms of
+\texttt{TAGBODY}/\texttt{GO}, this performance problem pervades the
+JSCL system.
+
+Unfortunately, the first stage of the JSCL compiler does not define
+its own AST, and so there is no framework for implementing
+sophisticated, staged optimizations, such as those that could mitigate
+performance problems like this one.
+
+Optimizations such as this are particularly important for a Lisp to
+JavaScript compiler to perform, because they are exactly the kind that
+a JavaScript optimizer could not perform, provided only JavaScript
+code.
+
+Curiously, the arithmetic optimizations performed by the second,
+code-generation stage are \emph{exactly} the kind that JavaScript
+optimizers excel at.
+
+If JSCL oriented itself toward generating code optimizable by third
+party tools, then its own internals could be simplified to a degree.
+
+\section{Design and implementation}
+
+From a design perspective, JACL is an effort to balance the
+requirements of an interactive and practical Lisp development
+environment with the constraints imposed by the Web browser platform.
+
+JACL proposes several design innovations in pursuit of this balance.
+
+\subsection{Asynchronous reader}
+
+The basis for interactive development in Lisp is undeniably the REPL,
+but as JSCL's ``pre-reader'' conundrum demonstrates, even the direct
+approach to this simple mechanism is hampered by JavaScript's
+asynchronous model of input.
+
+To resolve this difficulty, JACL's reader facility is completely
+asynchronous. It is the ``pre-reader'' taken to its inevitable
+conclusion.
+
+JACL's reader is implemented as a JavaScript class,
+\texttt{Reader}. \texttt{Reader} instances are parameterized by an
+input source. One such input source is the \texttt{BufferedStream}
+class. The input source asynchronously notifies the reader instance
+when characters are available. The reader incrementally consumes these
+characters. Once the reader has accumulated a Lisp datum, it notifies
+its consumers of the availability of the datum.
+
+The JACL reader implementation makes extensive use of modern
+asynchronous JavaScript programming techniques including Promises,
+iterators, async functions, async iterators, and the \texttt{await}
+keyword.
+
+This language support for asynchronous programming reduces verbosity
+of asynchronous code, while also allowing it to be nearly as easy to
+follow as comparable synchronous code. For example, the following is
+an example of instantiaing a \texttt{BufferedStream} and
+\texttt{Reader}, sending in characters asynchronously after a delay,
+and then displaying the resulting Lisp data to the JavaScript console.
+
+\begin{verbatim}
+let bs = new BufferedStream(),
+    rdr = new Reader(bs);
+
+window.setTimeout(() => {
+  bs.writeEach("123 ");
+}, 1000);
+
+console.log(await rdr.read());
+\end{verbatim}
+
+In the preceding example, \texttt{window.setTimeout()} is used to
+enqueue a JavaScript function for execution after 1000
+milliseconds. The enqueued function writes input to the
+\texttt{BufferedStream} \texttt{bs}.
+
+Before the enqueued function is invoked, execution proceeds to the
+\texttt{console.log} call, but is suspended by the \texttt{await}
+keyword.
+
+The \texttt{await} keyword expects a Promise object on its right side,
+and will suspend JavaScript execution until the Promise has
+``resolved'', or notified its subscribers that the pending computation
+it represents has completed. \texttt{rdr.read()} returns such a
+Promise.
+
+Once \texttt{rdr} has completed a form, the \texttt{await} evaluates
+to the number \texttt{123} and the number is printed to the JavaScript
+console.
+
+The ``read'' portion of JACL's REPL is satisfied by first establishing
+\texttt{BufferedStream} and \texttt{Reader} objects. Then, in an
+asynchronous loop, objects are consumed from the \texttt{Reader},
+analyzed, compiled, and evaluated.
+
+Concurrently, characters may be sent to the \texttt{BufferedStream}
+instantiated by the REPL by calling its \texttt{write()} or
+\texttt{writeEach()} methods.
+
+Neither character input nor read object consumption impede other
+JavaScript operations, so the JACL REPL model supports embedding in
+applications.
+
+It is hoped that in the future, JACL will be written in itself, and so
+all of these techniques will be accessible from Lisp.
+
+\subsection{Chrome Devtools REPL}
+
+A browser-based REPL facilitates experimentation with the language by
+interested people, from the comfort of their Web browsers.
+
+It's also a useful debugging feature of a deployed application.
+
+However, most developers already have a preferred text editor and a
+refined REPL interaction workflow, so it's not within the JACL project
+scope to build a resident text editor or IDE in the style of
+SLip. Even if a resident editor was a goal, the file system access
+restrictions imposed by the browser would present significant
+challenges.
+
+Instead, JACL offers an alternative development REPL approach that
+requires minimal host tooling: the DevTools-based REPL.
+
+For each tab, Google Chrome is capable of hosting a WebSocket-based
+debug endpoint that implements the DevTools Protocol. The Chrome
+DevTools Protocol is a standard for JSON object interchange over
+WebSocket that makes all of the browser's debugging facilities
+available remotely, over the network.
+
+JACL leverages this feature of Chrome to deliver a command-line REPL
+client that may be run on the developer's host machine. The workflow
+is the following:
+
+\begin{enumerate}
+  \item Run Google Chrome from the shell with the
+    \texttt{--remote-debugging-port} parameter.
+  \item Navigate to the Web site hosting the JACL system you wish to
+    interact with.
+  \item Run \texttt{jacl-repl} from the shell.
+  \item Be presented with a Lisp prompt.
+\end{enumerate}
+
+As a simple command-line application, \texttt{jacl-repl} can be run in
+various contexts. For example, it could be run within an Emacs
+``inferior-lisp'' buffer, and then Lisp forms could sent from other
+Emacs buffers for evaluation in the REPL.
+
+\texttt{jacl-repl} is currently an R script, but a standalone binary
+executable is imagined in the future.
+
+An advantage of the reader approach taken by JACL with respect to the
+implementation of \texttt{jacl-client} is that the REPL client is
+completely ignorant of Lisp syntax and semantics. \texttt{jacl-client}
+merely buffers and sends characters over the WebSocket and into the
+Lisp system, and displays characters output from Lisp. While
+\texttt{jacl-client} presents a synchronous input/output interface, it
+is actually the frontend for bidirectional asynchronous data transfer.
+
+\subsection{Analyzing compiler}
+
+The JACL compiler is organized to facilitate high-level optimizations,
+such as those necessary for efficient compilation of
+\texttt{TAGBODY}/\texttt{GO}.
+
+Low level optimizations are left to (optional) JavaScript optimizers
+that can be applied to delivered executables, which I will outline
+next.
+
+The compiler has one analysis pass that results in an AST. The AST is
+then converted to JavaScript strings by a code generation step.
+
+Each AST node is a JavaScript object with at least the following
+keys:
+
+\begin{itemize}
+  \item \texttt{op}: The node name, a JavaScript string.
+  \item \texttt{env}: An \texttt{Environment} object that represents the node's lexical environment.
+  \item \texttt{parent}: The node's parent; this is \texttt{null} for the root.
+  \item \texttt{form}: The node's original source data, a Lisp datum.
+\end{itemize}
+
+TODO
+
+\subsection{Delivery}
+
+TODO
+
+\section{Conclusions and Future Work}
+
+TODO
+
+\section{Acknowledgments}
+
+TODO
 
-%% \subsection{Template Styles}
-%% 
-%% The primary parameter given to the ``\verb|acmart|'' document class is
-%% the {\itshape template style} which corresponds to the kind of publication
-%% or SIG publishing the work. This parameter is enclosed in square
-%% brackets and is a part of the {\verb|documentclass|} command:
-%% \begin{verbatim}
-%%   \documentclass[STYLE]{acmart}
-%% \end{verbatim}
-%% 
-%% Journals use one of three template styles. All but three ACM journals
-%% use the {\verb|acmsmall|} template style:
-%% \begin{itemize}
-%% \item {\verb|acmsmall|}: The default journal template style.
-%% \item {\verb|acmlarge|}: Used by JOCCH and TAP.
 %% \item {\verb|acmtog|}: Used by TOG.
 %% \end{itemize}
 %%