| author | Alan <alan@minerva.local> 2026-05-02 20:40:27 UTC |
| committer | Alan <alan@minerva.local> 2026-05-02 20:40:27 UTC |
| parent | d9c207523a1525c62396823efdd3b16c44591d5a |
| specs/stage2.md | +604 | -454 |
| specs/stage3.md | +761 | -0 |
diff --git a/specs/stage2.md b/specs/stage2.md index fe3ad5c..1117aff 100644 --- a/specs/stage2.md +++ b/specs/stage2.md @@ -1,37 +1,146 @@ # Larquil Bootstrap Language (Stage 2) -Stage 2 is the first Larquil stage whose purpose is language growth rather than +Stage 2 is the first Larquil stage whose purpose is language architecture, not only compiler replacement. -Purpose: evolve the Stage 1 self-hosting compiler toward a small Lisp that can -build its own surface language from a tiny kernel. +Purpose: evolve the Stage 1 self-hosting compiler toward a real Lisp compiler +while preserving Stage 0 and Stage 1 bootstrap source compatibility. -Goal: keep the existing `function` form as the primitive code form, make -function bodies a bytecode template language with expression holes, add source -quotation, add namespaces, add closures, and make macros powerful enough that -most future language forms can be written in Larquil itself. +Goal: establish Larquil's semantic compiler pipeline, symbol/namespace model, +internal environment/binding discipline, closure groundwork, and low-level JVM +power boundary. The existing `function` body template language remains valid, +but it is no longer treated as direct backend emission. Template code and +ordinary expression code both enter explicit compiler IR before bytecode is +emitted. This is still a bootstrap language. It is not the final Larquil or Riptide language. --- +# Design Lineage + +Stage 2 is guided by two compatible compiler traditions. + +The Nanopass tradition says to build a compiler as many small, explicit passes +over well-defined intermediate languages. A change in compiler knowledge +should usually appear as a change in IR shape or a pass-local environment, not +as a hidden side effect inside a monolithic lowering routine. 
+ +The SICL/Cleavir tradition says to make environment-sensitive meaning explicit +early, then move through representation levels deliberately: + +```text +concrete source -> resolved AST -> high-level IR -> medium/backend IR -> code +``` + +For Larquil, these are not competing ideas. Stage 2 treats Cleavir-like IR +levels as the durable semantic boundaries, and expects nanopass-sized +transformations inside and between those boundaries. + +The important synthesis: + +- symbols are syntax +- environments assign meaning +- internal bindings accumulate compiler knowledge +- high-level IR still speaks in Larquil values and operations +- JVM representation details appear only after semantic analysis +- user-facing low-level power is preserved as a checked source sublanguage, not + as direct access to the compiler backend + +--- + +# Consulted References + +The following materials motivate this design. They are not normative, but they +explain the architectural pressure behind this stage. + +- Dipanwita Sarkar, Oscar Waddell, R. Kent Dybvig, "A Nanopass Framework for + Compiler Education", Journal of Functional Programming, 2005. + DOI: `10.1017/S0956796805005605`. + The central lesson is that many fine-grained passes with explicit input and + output languages are easier to understand, test, and evolve than a few + monolithic passes. + +- Andrew W. Keep, R. Kent Dybvig, "A Nanopass Framework for Commercial Compiler + Development", ICFP 2013. + DOI: `10.1145/2544174.2500618`. + The relevant lesson is that nanopass organization is not only pedagogical. + Chez Scheme's nanopass rewrite used many more passes than the previous + compiler while preserving practical compile time and improving generated code + quality. + +- Nanopass Framework documentation and tutorials. + Reference: `https://nanopass.org/documentation.html`. 
+ The useful abstraction is `define-language` plus `define-pass`: the grammar + of an intermediate language is part of the design, and each pass states what + language it consumes and produces. + +- Irène Durand, Robert Strandh, "Bootstrapping Common Lisp using Common Lisp", + European Lisp Symposium, 2019. + Reference: `https://zenodo.org/records/2634314`. + The key bootstrapping idea for Larquil is isolation between host and target + environments through explicit global environments, rather than accidental + reliance on the host's current image. + +- Robert Strandh, "SICL: Building blocks for implementers of Common Lisp + systems", 2010. + Reference: `https://dept-info.labri.fr/~strandh/sicl.pdf`. + The relevant ideas are modular implementation-independent layers, as few + lower-layer primitives as practical, explicit declaration-manipulation + modules, and high-quality errors in terms of source code rather than expanded + or lowered code. + +- Cleavir documentation. + Reference: `https://metamodular.com/SICL/cleavir.pdf`. + Cleavir's CST -> AST -> HIR -> MIR -> LIR organization is the main model for + Larquil's semantic layering. In particular, HIR keeps operations at the Lisp + object level so type inference and source-level optimization happen before + representation details dominate. + +- Robert Strandh, "Partial Inlining Using Local Graph Rewriting". + Reference: `https://metamodular.com/SICL/partial-inlining.pdf`. + The important compiler lesson is that lexical names are converted to unique + objects before optimization, and that a graph-like HIR enables local + transformations without requiring source-level reconstruction. + +- Ashley and Dybvig, "A Practical and Flexible Flow Analysis for Higher-Order + Languages", ACM TOPLAS, 1998. + DOI: `10.1145/291891.291898`. + This motivates making flow/control facts explicit enough that later passes can + support higher-order optimization, assignment, and control operators. 
+ +- Common Lisp compiler practice, especially delayed ordinary runtime reference + resolution, declarations, compiler macros, and separation between source + symbols and compiler environments. + Larquil is not copying Common Lisp's package system, but it does preserve the + useful distinction between reading symbols and resolving references. + +--- + # Core Principles 1. Preserve Stage 0 and Stage 1 source compatibility where practical. -2. Keep `function` as the primitive callable form. -3. Treat function bodies as bytecode templates. -4. Add `expr` as the bridge from bytecode templates back into source - expression compilation. -5. Add quotation and reader support for macro-writing syntax. -6. Add namespaces as the global source-level organization model. -7. Do not expose a public Var or Binding object in Stage 2. -8. Delay ordinary namespace reference resolution in the Common Lisp style. -9. Make macro expansion an explicit compiler phase. -10. Keep higher-level forms out of the primitive kernel unless they cannot be - bootstrapped from `function`, `expr`, quotation, namespaces, and macros. - -The existing function body language is the bytecode template substrate. +2. Keep the public runtime vocabulary small. +3. Keep `function`, `quote`, and `expr` as the only primitive source forms. +4. Treat existing function bodies as low-level template regions, not direct + bytecode emission. +5. Preserve near-JVM power for users and bootstrap code through checked + low-level operations. +6. Make reader output, resolved code, high-level IR, JVM IR, and class emission + distinct compiler levels. +7. Resolve executable symbols into compiler-private references before lowering. +8. Attach future declarations and inferred facts to internal bindings and IR + values, not to source symbols. +9. Support closures through `function`. +10. Delay ordinary namespace reference resolution in the Common Lisp style. +11. Keep `load-function` as legacy helper lookup, not namespace lookup. 
+12. Do not introduce a broad permanent surface language in this stage. + +The central Stage 2 deliverable is not `expr` by itself. +The central deliverable is the compiler spine that lets low-level template code, +ordinary expressions, closures, namespaces, and future surface forms share one +semantic path toward JVM code. --- @@ -43,37 +152,58 @@ Stage 2 keeps the Stage 1 runtime model where possible: - String literal -> `String` - Boolean literal -> `Boolean` - Nil/null literal -> `null` -- Name -> Larquil name object -- List -> `List<Object>` -- Function -> callable object implementing the Larquil function ABI +- Symbol -> Larquil `Symbol` +- List -> `List<Object>`, with `java.util.ArrayList` as the bootstrap representation +- Function -> `IFunction` - Namespace -> Larquil namespace object -Stage 2 replaces the Stage 0 idea of a simple symbol with a name value. -A name is source data. +Stage 2 keeps the Stage 0/Stage 1 `Symbol` vocabulary and extends symbols with +optional namespace qualification. +The representation should leave room for generated-symbol identity, but Stage 2 +does not need to expose a generated-symbol facility. +A symbol is source data. It is not a mutable variable cell. It does not itself carry a namespace binding. +It does not carry type information. +It does not carry compiler resolution information. -Minimum name information: +Minimum symbol information: ```java -final class Name { - final String namespace; // null for unqualified names +final class Symbol { + final String namespace; // null for unqualified symbols final String name; - final Object identity; // optional implementation-private identity for gensyms + final Object identity; // optional implementation-private identity } ``` The exact representation is implementation-defined. 
The observable requirements are: -- two reader-created unqualified names with the same spelling compare equal -- two reader-created qualified names with the same namespace and local name +- two reader-created unqualified symbols with the same spelling compare equal +- two reader-created qualified symbols with the same namespace and local name compare equal -- a generated name produced by `gensym` does not collide with any - reader-created source name -- printed generated names may be diagnostic strings, but equality must not +- if generated symbols are present internally, they do not collide with any + reader-created source symbol +- printed generated symbols may be diagnostic strings, but equality must not depend only on those diagnostic strings +Stage 2 does not introduce public `Name`, `Var`, `Binding`, `Slot`, cons cell, +condition object, or new function protocol objects. +Implementations may use private records internally. + +Lists in Stage 2 are source-form and bootstrap compiler containers. +They are not a final cons-cell ontology. + +Closure objects produced by capturing `function` forms still implement +`IFunction`. 
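The observable symbol-equality requirements listed above can be illustrated with a minimal sketch. It is not the Stage 2 `Symbol` class: `SymbolSketch`, `read`, and `generated` are hypothetical names, and treating a non-null `identity` as the marker for a generated symbol is an assumption made only for this example.

```java
// Hypothetical sketch of the equality rules; not the actual Stage 2 Symbol class.
final class SymbolSketch {
    final String namespace; // null for unqualified symbols
    final String name;
    final Object identity;  // assumption: non-null only for generated symbols

    private SymbolSketch(String namespace, String name, Object identity) {
        this.namespace = namespace;
        this.name = name;
        this.identity = identity;
    }

    // Reader-created symbol: equality is structural on (namespace, name).
    static SymbolSketch read(String namespace, String name) {
        return new SymbolSketch(namespace, name, null);
    }

    // Generated symbol: a fresh identity object keeps it distinct from every
    // reader-created symbol, whatever its diagnostic spelling.
    static SymbolSketch generated(String diagnosticName) {
        return new SymbolSketch(null, diagnosticName, new Object());
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof SymbolSketch)) return false;
        SymbolSketch that = (SymbolSketch) other;
        if (identity != null || that.identity != null) {
            // Generated symbols compare by identity, never by printed text.
            return identity == that.identity;
        }
        return java.util.Objects.equals(namespace, that.namespace)
            && name.equals(that.name);
    }

    @Override
    public int hashCode() {
        return identity != null
            ? System.identityHashCode(identity)
            : java.util.Objects.hash(namespace, name);
    }
}
```

Because equality here never consults a current namespace, the sketch is also consistent with the rule that the reader produces the same symbol regardless of compiler state.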
+ +Truth remains Stage 0-compatible until a later stage explicitly changes it: + +- `null` is falsey +- `Boolean.FALSE` is falsey +- all other values are truthy + --- # Reader Behavior @@ -84,25 +214,21 @@ The reader recognizes: - string literals - booleans, if enabled by the Stage 2 runtime - `null`, if enabled by the Stage 2 runtime -- names -- qualified names +- symbols +- qualified symbols - lists - quote reader syntax -- quasiquote reader syntax -- unquote reader syntax - line comments Reader examples: ```lisp -foo ; Name(null, "foo") -larquil.core/+ ; Name("larquil.core", "+") +foo ; Symbol(null, "foo") +larquil.core/+ ; Symbol("larquil.core", "+") 123 ; Long "abc" ; String (a b c) ; List<Object> 'foo ; (quote foo) -`(if ,x y z) ; (quasiquote (if (unquote x) y z)) -`(do ,@body) ; (quasiquote (do (unquote-splicing body))) ``` Line comments begin with `;` and continue to the end of the line. @@ -116,11 +242,11 @@ String literal escapes remain at least: Unsupported string escapes are reader errors. -## Name Grammar +## Symbol Grammar -Stage 2 names are case-sensitive. +Stage 2 symbols are case-sensitive. -Names must admit ordinary Lisp operator spellings such as: +Symbols must admit ordinary Lisp operator spellings such as: ```lisp + @@ -128,11 +254,14 @@ Names must admit ordinary Lisp operator spellings such as: * < <= += set! +even? +&body ``` -A qualified name has exactly one `/`. -The namespace name is on the left and the local name is on the right: +A qualified symbol has exactly one `/`. +The namespace name is on the left and the local symbol name is on the right: ```lisp larquil.core/map @@ -156,9 +285,63 @@ larquil.boot my.app.main ``` -Source names and generated JVM names are separate concepts. -The compiler may munge source names into JVM class, field, or method names, -but that munging does not change source name equality or namespace lookup. +Source symbols and generated JVM names are separate concepts. 
+The compiler may munge source symbols into JVM class, field, or method names, +but that munging does not change source symbol equality or namespace lookup. + +## Reader And Current Namespace + +The reader does not consult the current namespace. + +Reading: + +```lisp +foo +``` + +always produces: + +```text +Symbol(null, "foo") +``` + +Reading: + +```lisp +larquil.core/foo +``` + +always produces: + +```text +Symbol("larquil.core", "foo") +``` + +Quotation preserves that reader result: + +```lisp +'foo ; unqualified symbol +'larquil.core/foo ; qualified symbol +``` + +The current namespace participates in expression resolution, not reading. +This deliberately avoids Common Lisp-style read-time package interning in +Stage 2. + +Stage 2 has no `pkg:sym`, no `pkg::sym`, no keyword package, no read-time +current-package qualification, no symbol value cells, no symbol function cells, +and no symbol property lists. + +## Symbol Equality And Interning + +Reader-created symbols with the same `(namespace, name)` compare equal. +Implementations may intern or canonicalize reader-created symbols, but +interning is an implementation detail. + +If generated symbols are present internally, they compare by generated +identity. +A generated symbol is never equal to a reader-created symbol, even if its +diagnostic printed text matches a source spelling. --- @@ -174,26 +357,18 @@ quote `quote` is primitive because it suppresses evaluation and returns source data. -Quasiquote syntax is not primitive evaluator syntax. -The reader expands it to ordinary macro-call forms: +`function` is primitive because it creates Larquil callable values and is the +existing bootstrap boundary for low-level code. -```lisp -`x ; (quasiquote x) -,x ; (unquote x) -,@x ; (unquote-splicing x) -``` - -`quasiquote` is a bootstrap macro. -`unquote` and `unquote-splicing` are meaningful only while that macro expands -quasiquoted source. 
-Using `unquote` or `unquote-splicing` outside a quasiquote expansion is an -error. +`expr` is primitive only as a boundary form inside low-level template regions. +It allows ordinary expression compilation to occur at a specific point in a +template body. +It is not intended to become the center of the final language. Stage 2 does not make these forms primitive: ```lisp def -defmacro fn let if @@ -205,31 +380,20 @@ return-from return ``` -Those forms are expected to be bootstrapped as macros. - --- # Source File Structure A source file contains a sequence of top-level forms. -Stage 2 top-level forms are processed left-to-right for macro expansion and -load-time effects. +Stage 2 top-level forms are processed left-to-right for load-time effects. Ordinary runtime references to later namespace values are allowed. -Macro references to later macro definitions are not available unless an earlier -compile-time action has installed the macro. -Initial top-level forms accepted before the macro layer is bootstrapped: +Top-level forms accepted in Stage 2: - named legacy `function` forms - top-level IIFEs - literal values, which are ignored for load-time effects -- explicit bootstrap compile-time registration forms recognized by the Stage 2 - compiler - -After macros are bootstrapped, top-level macro forms such as `def`, -`defmacro`, and `in-namespace` may expand to those primitive load-time or -compile-time actions. 
Initial namespace: @@ -317,10 +481,10 @@ Examples: Rules: -- parameter names are lexical names -- parameter names must be unqualified -- duplicate parameter names in one function are errors -- function bodies are bytecode template bodies +- parameter symbols are lexical symbols +- parameter symbols must be unqualified +- duplicate parameter symbols in one function are errors +- function bodies are low-level template regions - every existing Stage 1 instruction remains valid in a function body - `expr` is valid in a function body - labels are local to one function body @@ -331,15 +495,20 @@ Named top-level `function` remains accepted for bootstrap convenience. It keeps the Stage 1 helper-function role. It is not the general namespace definition form. -Later `def` and `fn` forms may be macros over primitive `function`. -Namespace-level definitions should be expressed through `def` or through the -manual namespace-assignment code that bootstraps `def`. +Namespace-level definitions should be expressed through explicit namespace +assignment code in this stage. --- -# Function Template Bodies +# Low-Level Template Regions -A function template body is a sequence of template forms. +A function body is a low-level template region. + +This region preserves near-JVM power for end users and bootstrap code. +It is intentionally more capable than ordinary high-level expression syntax. +It is the answer to a problem Clojure leaves to Java interop or Java source: +Larquil should allow carefully written source to express low-level JVM actions +directly when that is the right tool. Template forms include: @@ -367,7 +536,7 @@ Stage 1 instruction example: (return)) ``` -Template plus expression hole example after `+` is available: +Template plus expression boundary example after `+` is available: ```lisp (function add2 (x) @@ -375,9 +544,23 @@ Template plus expression hole example after `+` is available: (return)) ``` -The template instruction language remains stack-oriented. 
-`expr` is the only Stage 2 bridge from template code into source expression -compilation. +Template instructions do not bypass compiler IR. +They are parsed into low-level operation nodes, checked, and then lowered. + +The low-level region must be checked for: + +- lexical scope of `load` and `store` +- local label uniqueness +- branch target locality +- operand stack effect +- JVM category-1/category-2 stack consistency +- object versus raw JVM value consistency +- method and field descriptor validity +- no branch into or out of expression-lowered internal control flow +- JVM verifier compatibility after lowering + +The first implementation may check only the subset needed for existing +instructions, but the architecture must admit these checks as explicit passes. `load-function` remains a legacy template instruction. It resolves named helper functions in the Stage 1 compatibility model. @@ -388,7 +571,7 @@ through explicit namespace helper calls emitted by bootstrap code. --- -# `expr` Holes +# `expr` Boundaries Syntax: @@ -396,17 +579,19 @@ Syntax: (expr source-expression) ``` -`expr` is valid only in a function template body. +`expr` is valid only in a low-level template region. Semantics: -1. The compiler macroexpands `source-expression`. -2. The compiler resolves it in the current lexical and namespace environment. -3. The compiler emits code for the expression at the current template position. -4. The emitted expression leaves exactly one object value on the operand stack. +1. The compiler resolves `source-expression` in the current lexical and + namespace environment. +2. The expression is compiled through the same resolved AST and Larquil HIR + path as any other Stage 2 expression. +3. The expression is lowered at the current template position. +4. The lowered expression leaves exactly one object value on the operand stack. 5. Control continues with the next template form. 
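The rule that an `expr` boundary leaves exactly one object value on the operand stack, together with the operand stack effect check named earlier, can be pictured as a small pass over a straight-line template region. This is a rough sketch under stated assumptions: `StackEffectSketch`, `Op`, and the pop/push counts are hypothetical, and branches, labels, and JVM category-2 values are deliberately ignored.

```java
// Hypothetical straight-line stack-depth check; not the Stage 2 compiler pass.
final class StackEffectSketch {
    // An illustrative template operation: how many values it pops and pushes.
    static final class Op {
        final String name;
        final int pops;
        final int pushes;

        Op(String name, int pops, int pushes) {
            this.name = name;
            this.pops = pops;
            this.pushes = pushes;
        }
    }

    // An `expr` boundary consumes nothing and leaves exactly one object value.
    static Op expr() {
        return new Op("expr", 0, 1);
    }

    // Walks the region in order and returns the final stack depth.
    // Throws if any operation would pop more values than are present.
    static int check(java.util.List<Op> body) {
        int depth = 0;
        for (Op op : body) {
            if (depth < op.pops) {
                throw new IllegalStateException("stack underflow at " + op.name);
            }
            depth = depth - op.pops + op.pushes;
        }
        return depth;
    }
}
```

A fuller pass would have to merge stack states at branch targets, which is one reason the spec asks for these checks to exist as explicit passes rather than as side effects of lowering.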
-Example after arithmetic functions or macros have been bootstrapped: +Example after arithmetic functions or helpers have been bootstrapped: ```lisp (function print-add2 (x) @@ -430,11 +615,12 @@ This example assumes `+` is available: Rules: - `expr` must have exactly one source expression operand -- an `expr` expression must leave exactly one runtime value -- an `expr` expression must not branch into or out of the containing template - body except through code generated as part of that expression +- an `expr` expression must leave exactly one runtime object value +- an `expr` expression must not branch into or out of the containing low-level + template region except through compiler-represented structured control - expression-generated internal labels must not collide with user-written template labels +- `expr` is a compatibility bridge, not a general block construct --- @@ -449,11 +635,11 @@ Syntax: Semantics: -- quoted names become Larquil name objects -- quoted qualified names preserve namespace qualification +- quoted symbols become Larquil `Symbol` objects +- quoted qualified symbols preserve namespace qualification - quoted lists become runtime list structure - quoted integers, strings, booleans, and null become themselves -- quote does not resolve names +- quote does not resolve symbols Examples: @@ -476,88 +662,32 @@ reads as: (quote x) ``` -Quoted names are not strings: +Quoted symbols are not strings: ```lisp -'foo ; name object "foo" ; string +'foo ; symbol ``` --- -# Quasiquotation - -Quasiquote reader syntax is provided for macro authoring. -It reads to ordinary forms that are expanded by the bootstrap `quasiquote` -macro. 
- -Reader syntax: - -```lisp -`datum -,expr -,@expr -``` - -Reader expansion: - -```lisp -`x ; (quasiquote x) -,x ; (unquote x) -,@x ; (unquote-splicing x) -``` - -The bootstrap `quasiquote` macro must support: - -- names -- qualified names -- primitive literals -- lists -- unquote inside lists -- unquote-splicing inside lists - -Example: - -```lisp -`(if ,test ,then ,else) -``` - -constructs a list whose first element is the name `if`, whose remaining -elements are the values of `test`, `then`, and `else`. - -Example: - -```lisp -`(do ,@body) -``` - -constructs a list whose first element is the name `do`, followed by the -elements of the list value `body`. - -Rules: - -- `unquote` is valid only while expanding `quasiquote` -- `unquote-splicing` is valid only while expanding a list inside `quasiquote` -- malformed quasiquote forms are macro-expansion errors -- nested quasiquote may be deferred if Stage 2 bootstrapping does not need it - ---- - # Namespaces -A namespace is the source-level global container for values and macro -transformers. +A namespace is the source-level global container for runtime values. Stage 2 specifies namespaces as source semantics. -It does not specify a public Var or Binding object. +It does not specify a public `Var` or `Binding` object. A namespace has: - a canonical name -- a map from local names to runtime values -- a map from local names to macro transformers +- a map from local symbol names to runtime values - implementation-private metadata as needed +Namespace maps are keyed by the unqualified local part of a symbol. +For a qualified symbol, the namespace part selects the namespace and the local +part selects the entry inside that namespace. + The implementation may store private slots or binding records internally. Those are not Stage 2 source-level values. @@ -577,24 +707,22 @@ user The current namespace is compiler/load state. Stage 2 does not require a primitive source form for changing it. 
Bootstrap code may change it through explicit runtime/compiler helper calls. -After macros are bootstrapped, a surface form such as `in-namespace` may be -defined as a macro. -Qualified name: +Qualified symbol: ```lisp larquil.core/list ``` -means local name `list` in namespace `larquil.core`. +means local symbol `list` in namespace `larquil.core`. -Resolution order for ordinary expression names: +Resolution order for ordinary expression symbols: 1. lexical locals and parameters 2. current namespace value table -3. delayed namespace reference if no current value is known +3. delayed current-namespace reference if no current value is known -Qualified names bypass lexical lookup: +Qualified symbols bypass lexical lookup: ```lisp (function f (list) @@ -602,10 +730,10 @@ Qualified names bypass lexical lookup: (return)) ``` -The `expr` reference above denotes the namespace value `list` in +The `expr` reference above denotes the namespace value for symbol `list` in `larquil.core`, not the parameter `list`. -Unqualified lexical names shadow current namespace names: +Unqualified lexical symbols shadow current namespace symbols: ```lisp (function f (list) @@ -617,6 +745,58 @@ The `expr` reference above denotes the parameter `list`. --- +# Internal Environments And Bindings + +Stage 2 does not expose public binding objects, but the compiler should use +private binding and reference records internally. + +This is the compiler-side distinction: + +```text +Symbol source data created by the reader +Binding compiler-private object naming what a symbol means +Reference compiler-private use of a binding +Namespace runtime/source-level global container +``` + +After resolution, executable symbol occurrences must not remain raw `Symbol` +objects in compiler IR. 
+They become one of: + +- lexical reference +- namespace reference +- delayed namespace reference +- legacy helper-function reference, where Stage 1 compatibility requires it + +Internal bindings are the unit to which later compiler knowledge attaches: + +- declared type +- inferred type +- arity or call shape +- mutability +- phase +- capture status +- source location +- namespace resolution state +- representation choice + +These records are not public Larquil values. +They do not imply a Clojure-style `Var` model. +They do not imply Common Lisp symbol value cells or function cells. + +This design leaves a clear path for future declaration forms: + +```lisp +;; illustrative future direction, not Stage 2 primitive syntax +(declare n larquil.core/Long) +``` + +A declaration should affect the environment and the relevant internal binding. +It should not mutate the source `Symbol`. +The declaration may later be checked, refined, or erased by subsequent passes. + +--- + # Expression Semantics `expr` compiles Stage 2 source expressions. @@ -628,12 +808,7 @@ Primitive expression forms: (function (<param> ...) template-form...) ``` -All other list expressions are processed as follows: - -1. If the first element is a name that names an installed macro in the current - expansion environment, the form is macroexpanded and compilation continues - on the expansion. -2. Otherwise the form is a function call. +All other list expressions are function calls. Function call semantics: @@ -661,34 +836,16 @@ The second form resolves `+` in namespace `larquil.core`. The compiler may optimize statically known calls, but the source semantics are ordinary operator-position evaluation in a Lisp-1 value namespace. +--- + # Delayed Namespace Resolution Stage 2 follows a Common Lisp-style delayed resolution model for ordinary -runtime names. 
- -When the compiler sees a non-lexical name that is not currently defined in the -namespace, it may record a namespace reference instead of failing immediately. - -Example intended surface after `def`, `fn`, and `if` are bootstrapped: - -```lisp -(def even? - (fn (n) - (if (= n 0) - true - (odd? (- n 1))))) - -(def odd? - (fn (n) - (if (= n 0) - false - (even? (- n 1))))) -``` +runtime symbols. -This source should not require a Clojure-style forward declaration for `odd?`. -The delayed reference is a namespace reference created by the use of `odd?` -inside the value assigned by `def even?`. -It is not a consequence of named top-level `function` by itself. +When the compiler sees a non-lexical symbol that is not currently defined in +the namespace, it may record a delayed namespace reference instead of failing +immediately. Rules: @@ -697,125 +854,11 @@ Rules: unit ends - evaluating a still-unbound namespace reference at runtime is an undefined-name error -- macro names are not delayed in the same way; a macro must be installed before - a form using that macro is expanded - ---- - -# Macro Expansion - -Stage 2 has a macro-expansion phase before expression lowering. - -Macro transformers are ordinary Larquil functions registered in a namespace -macro table. +- delayed namespace references are explicit compiler IR nodes, not string + lookups scattered through JVM lowering -A transformer receives source data and returns replacement source data. -The exact bootstrap ABI may be implementation-defined, but it must be -documented before Stage 2 is considered complete. - -Recommended initial ABI: - -```lisp -(function (form env) - ...) -``` - -where: - -- `form` is the full source form being expanded -- `env` is an expansion environment object or `null` until an environment - object exists -- the return value is replacement source - -`defmacro` is not primitive in Stage 2. 
-It is expected to be manually bootstrapped using explicit compile-time -registration actions and template instructions. - -Macro registration used by later forms in the same compilation unit must happen -during compilation, not only when the generated loader is run. -The first implementation may provide a bootstrap compile-time execution path -for the explicit registration code. - -Conceptual manual macro registration body: - -```lisp -(function () - (expr - (function (form env) - ;; macro transformer body - ...)) - (store transformer) - (expr 'when) - (load transformer) - (invokestatic "com/tailrecursion/larquil/stage2/Runtime" - "defineMacro" - "(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;") - (return)) -``` - -The exact helper owner/name/descriptors are provisional. -The spec requirement is that bootstrap source can arrange for this registration -body to run during compilation, before later forms that use the macro are -expanded. -It must not depend on ordinary load-time IIFE execution. - -Once bootstrapped, `defmacro` can be a macro whose expansion emits compile-time -macro registration code. - -Example intended surface after bootstrap: - -```lisp -(defmacro when (test &body body) - `(if ,test - (do ,@body) - null)) -``` - -This is not primitive Stage 2 syntax. -It is an example of the language Stage 2 is meant to enable. - ---- - -# Gensym - -Stage 2 provides a monotonically increasing gensym facility for macros. - -Initial access may be through raw template instructions calling runtime or -compiler helpers. 
- -After bootstrap, the facility should be available as: - -```lisp -larquil.boot/gensym -``` - -Example intended use: - -```lisp -(larquil.boot/gensym) -(larquil.boot/gensym "tmp") -``` - -Rules: - -- each call returns a generated name -- generated names cannot collide with reader-created source names -- generated names remain usable in quoted and quasiquoted forms -- generated name equality is by generated identity, not only by printed text -- gensym output is unique within one compiler/load session - -Example intended macro pattern: - -```lisp -(defmacro with-temp (value &body body) - (let ((t (larquil.boot/gensym "tmp"))) - `((function (,t) - ,@body) - ,value))) -``` - -The example uses macro-defined `defmacro` and `let`. -They are not primitive forms. +This avoids a Clojure-style forward declaration requirement for ordinary +runtime values while still letting the compiler keep precise reference objects. --- @@ -848,125 +891,174 @@ The anonymous function captures `n`. Rules: - non-capturing functions may compile to singleton helper classes -- capturing functions compile to closure objects or generated classes with - captured environment fields -- mutable captured locals must preserve shared mutation semantics once `set!` - is bootstrapped -- capture analysis is a compiler pass, not a macro responsibility +- capturing functions compile to `IFunction` closure objects or generated + classes with captured environment fields +- mutable captured locals must preserve shared mutation semantics once local + assignment is added +- capture analysis is a compiler pass, not an expression-lowering side effect +- capture facts attach to internal bindings and HIR values, not source symbols Stage 2 does not require final performance decisions for closure representation. It requires correct behavior and a compiler structure that can later optimize -non-capturing and non-escaping closures. +non-capturing, non-escaping, and stack-allocatable closures. 
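The two closure compilation strategies in the rules above can be sketched in Java. This is an illustration only: the real `IFunction` signature is not given in this section, so `IFunctionSketch` and its `invoke(Object...)` method are hypothetical stand-ins, as are `IdentityFn` and `AdderClosure`.

```java
// Hypothetical stand-in for the Larquil function ABI.
interface IFunctionSketch {
    Object invoke(Object... args);
}

// Non-capturing function: it has no environment, so one shared instance suffices.
final class IdentityFn implements IFunctionSketch {
    static final IdentityFn INSTANCE = new IdentityFn();

    private IdentityFn() {}

    @Override
    public Object invoke(Object... args) {
        return args[0];
    }
}

// Capturing function: each evaluation of the enclosing `function` form
// allocates an object whose field holds the captured value.
final class AdderClosure implements IFunctionSketch {
    private final Object n; // captured environment field

    AdderClosure(Object n) {
        this.n = n;
    }

    @Override
    public Object invoke(Object... args) {
        // Integer literals are Long in the runtime model, so unbox as Long.
        return (Long) n + (Long) args[0];
    }
}
```

The captured field is `final` here, which only works while captured locals are immutable. Once local assignment exists, shared mutation semantics would need something like a shared box per mutable captured binding.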
--- -# Derived Forms +# Compiler Architecture -The following forms are expected to be defined as macros, not primitive forms: +Stage 2 establishes explicit IR strata. -```lisp -def -defmacro -fn -let -let* -if -do -set! -while -block -return-from -return +The implementation may use more nanopass-sized languages and passes than the +names listed here, but it should preserve these semantic boundaries: + +```text +Lread reader data +Lsurface primitive forms recognized +Lresolved executable symbols resolved to internal references +Lhir Larquil high-level IR +Ljvm JVM-oriented IR +Lclass emitted classfile bytes ``` -Expected `let` expansion shape: +## `Lread`: Reader Data -```lisp -(let ((x a) - (y b)) - body...) -``` +`Lread` contains only reader-produced data: -expands to: +- `Symbol` +- `ArrayList` +- strings +- integers +- booleans +- `null` -```lisp -((function (x y) - body...) - a - b) -``` +No executable symbol has meaning yet. +The current namespace has not affected unqualified symbols. +Source locations should be retained when practical. -Expected `let*` expansion shape: +## `Lsurface`: Surface AST -```lisp -(let* ((x a) - (y (+ x 1))) - body...) -``` +`Lsurface` recognizes the primitive forms: -expands to nested immediate function calls. +- `function` +- `expr` +- `quote` -Expected `fn` expansion shape: +At this level: -```lisp -(fn (x y) - body...) -``` +- malformed primitive forms are reported in source terms +- function parameter lists are checked +- low-level template regions are identified +- quote data is isolated from executable code +- raw symbols may still appear in executable positions -expands to: +## `Lresolved`: Resolved AST -```lisp -(function (x y) - body...) -``` +`Lresolved` replaces executable symbol occurrences with compiler-private +references. 
-Expected `def` behavior: +At this level: -```lisp -(def answer 42) -``` +- lexical references are distinct from namespace references +- delayed namespace references are explicit +- legacy helper references are distinct from namespace references +- qualified namespace references bypass lexical lookup +- unqualified lexical references shadow namespace references +- source symbols inside quoted data remain symbols -expands to code that evaluates `42`, assigns the resulting value into the -current namespace under name `answer`, and returns the value. +`Lresolved` is the first level where future declarations can be meaningfully +attached to bindings. -Expected `if` behavior: +## `Lhir`: Larquil High-Level IR -`if` may expand to generated labels and template branches. -The exact macro expansion may change as the compiler IR improves. +`Lhir` is the main semantic IR. ---- +It should still speak in Larquil-level values and operations: -# Compilation Model +- function literals +- calls +- returns +- lexical references +- namespace references +- delayed namespace references +- quoted literals +- closure captures +- low-level template operations +- labels and branches +- expression-boundary results -Stage 2 compiler structure should be explicit. +`Lhir` is the right level for: -Minimum pipeline: +- closure analysis +- simple call analysis +- source-level inlining later +- declaration checking later +- type inference later +- escape analysis later +- control-flow normalization later -1. Read source into raw forms with names, qualified names, and reader - expansions for quote and quasiquote syntax preserved. -2. Expand the bootstrap `quasiquote` macro where macro source uses - quasiquote syntax. -3. Expand macros using namespace macro tables. -4. Resolve lexical references versus namespace references. -5. Analyze functions for parameters, locals, free variables, and captures. -6. Lower function template bodies and `expr` holes into JVM-oriented IR. -7. 
Emit class files through the Larquil-owned classfile writer. +Template instructions and `expr`-compiled expressions both feed into `Lhir`. +This is the key rule that prevents Stage 2 from becoming a direct bytecode +emitter with a few special cases. -Required invariants: +## `Ljvm`: JVM-Oriented IR -- after macro expansion, no unexpanded macro calls remain in compiled source -- `expr` lowering leaves exactly one object value on the stack -- template labels are local to one function body -- expression-generated labels cannot collide with template labels -- lexical references shadow unqualified namespace references -- qualified namespace references bypass lexical lookup -- delayed namespace references do not require forward declarations -- macro expansion is ordered and phase-sensitive +`Ljvm` introduces JVM representation details: -The compiler may use nanopass-style internal languages. -The important requirement is that parsing, macro expansion, resolution, -analysis, lowering, and byte emission are not collapsed into one opaque pass. +- operand stack state +- JVM local slots +- category-1/category-2 value widths +- object versus primitive/raw values +- field and method descriptors +- class and method layout +- exception table requirements, when added +- verifier-visible control flow + +Low-level template operations should be closest to this level, but they still +arrive through checked IR nodes. + +`Ljvm` is the right level for bytecode-specific verification and emission +preparation. + +## `Lclass`: Classfile Emission + +`Lclass` is emitted bytes or an equivalent classfile data structure. + +No semantic decisions should first appear here. +If classfile emission needs to know a fact, that fact should have been made +explicit in an earlier IR level. + +--- + +# Compiler Pass Discipline + +Stage 2 should prefer many small passes over monolithic compilation routines. + +Indicative passes: + +1. read source forms +2. recognize primitive surface forms +3. 
validate function parameter lists +4. isolate quoted data +5. construct lexical environments +6. resolve executable symbols +7. record delayed namespace references +8. parse low-level template operations +9. lower expression boundaries to HIR +10. construct HIR control flow +11. analyze locals and captures +12. verify low-level region boundaries +13. lower HIR to JVM IR +14. verify JVM stack/local effects +15. emit classfile structures +16. write classfile bytes + +The exact pass list may change. +The important requirement is that each pass has a narrow purpose and a stated +input/output representation. + +Verification passes should be easy to enable during compiler development. +Optimization passes should be easy to disable. --- @@ -981,10 +1073,6 @@ Load-time effects include: - installing namespace values emitted by explicit bootstrap definition code - executing top-level IIFEs - executing load-time portions of bootstrap code -- assigning namespace values emitted by macro-defined `def` - -Macro registration needed by later source forms must happen at compile time, -not only at load time. Ordinary value definitions may happen at load time. References to later values are allowed if the value is installed before the @@ -992,27 +1080,66 @@ reference is evaluated. --- +# Longer-Range Direction + +Stage 2 should deliberately enable, but not implement, the next layer of Lisp +surface growth. 
+ +Expected later work includes: + +- macro expansion +- quasiquote, unquote, and unquote-splicing reader syntax +- generated symbols for macro hygiene +- definition forms +- ordinary expression-bodied function syntax +- `fn`, `let`, `if`, `do`, and assignment as derived forms +- declaration syntax +- type inference and type-directed JVM lowering +- local exits and structured control +- richer low-level JVM source regions + +The Stage 2 contribution to all of that is architectural: + +- source symbols are separate from resolved references +- internal bindings exist before lowering +- HIR exists before JVM representation +- low-level operations are represented and checked +- `expr` gives old template code a path into expression compilation +- classfile emission is no longer where semantic decisions are made + +This lets the next layer define new surface syntax without replacing the +compiler spine again. + +--- + # Error Behavior Reader errors: - malformed strings - unsupported string escapes -- malformed qualified names +- malformed qualified symbols - unmatched parentheses -Compile-time errors: +Surface-form errors: +- malformed `function` +- malformed `expr` +- malformed `quote` - duplicate function parameters +- qualified parameter symbols + +Compile-time errors: + - duplicate template labels in one function - invalid template instruction shape -- `expr` outside a function template body +- invalid method or field descriptor shape +- `expr` outside a low-level template region - `expr` with the wrong number of operands -- macro transformer does not return source data -- macro used before it is installed in the expansion phase -- unquote outside quasiquote expansion -- unquote-splicing outside list quasiquote expansion +- expression lowering that does not leave one object value +- template branch into or out of expression-internal control flow - closure capture shape the compiler cannot represent +- low-level region shape the compiler cannot verify Compile-time 
warnings: @@ -1027,6 +1154,10 @@ Runtime errors: - invoking a non-callable value - JVM verification or linkage errors caused by invalid generated bytecode +Stage 2 should preserve source location where practical so errors can be +reported in terms of source forms rather than only lowered template or bytecode +operations. + --- # Examples @@ -1041,7 +1172,11 @@ Runtime errors: (return)) ``` -## Function With Expression Hole +This remains accepted. +Internally, the template forms become low-level operation nodes before JVM +emission. + +## Function With Expression Boundary ```lisp (function add3 (x) @@ -1050,6 +1185,8 @@ Runtime errors: ``` This example assumes `+` is defined by the bootstrap core. +The `expr` expression is resolved and lowered through the same compiler path as +other expression code. ## Closure @@ -1064,6 +1201,9 @@ This example assumes `+` is defined by the bootstrap core. (return)) ``` +The inner function captures `n`. +The capture is represented before JVM lowering. + ## Quoted Source ```lisp @@ -1072,15 +1212,10 @@ This example assumes `+` is defined by the bootstrap core. (return)) ``` -## Quasiquoted Macro Output - -```lisp -`((function (,name) - ,@body) - ,value) -``` +Quoted source remains source data. +The symbols inside the quoted form are not resolved. -## Qualified Name +## Qualified Symbol ```lisp (function f (x) @@ -1088,9 +1223,12 @@ This example assumes `+` is defined by the bootstrap core. (return)) ``` +The qualified symbol bypasses lexical lookup. 
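
Note on the resolution rules exercised by the examples above: lexical shadowing, qualified bypass, and delayed namespace references can be sketched as a single lookup function. This is a minimal Python illustration under stated assumptions: symbols are plain strings, a qualified symbol is written `namespace/name`, and the record names `LexicalRef` and `NamespaceRef` are hypothetical stand-ins for the compiler-private reference records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalRef:
    name: str


@dataclass(frozen=True)
class NamespaceRef:
    namespace: str
    name: str
    delayed: bool  # True when no value is installed yet


def resolve(symbol, lexical_scope, current_ns, installed):
    """Resolve one executable symbol to a reference record.

    symbol        -- "name" or "namespace/name" (assumed spelling)
    lexical_scope -- set of names bound by enclosing functions
    current_ns    -- namespace used for unqualified symbols
    installed     -- set of (namespace, name) pairs that already have values
    """
    if "/" in symbol and symbol != "/":
        # Qualified symbols bypass lexical lookup entirely.
        namespace, name = symbol.split("/", 1)
        return NamespaceRef(namespace, name, (namespace, name) not in installed)
    if symbol in lexical_scope:
        # Unqualified lexical references shadow namespace references.
        return LexicalRef(symbol)
    # Otherwise the symbol is a namespace reference, delayed if not yet installed.
    return NamespaceRef(current_ns, symbol, (current_ns, symbol) not in installed)


shadowed = resolve("x", {"x"}, "user", {("user", "x")})
qualified = resolve("user/x", {"x"}, "user", {("user", "x")})
forward = resolve("odd?", set(), "user", set())
```

The `delayed` flag is what lets a later definition satisfy the reference without a forward declaration.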
+ ## Raw Bootstrap Definition -Before `def` exists, a value may be installed by explicit top-level code: +Before definition syntax exists, a value may be installed by explicit top-level +code: ```lisp ((function () @@ -1098,15 +1236,16 @@ Before `def` exists, a value may be installed by explicit top-level code: (store value) (expr 'answer) (load value) - (invokestatic "com/tailrecursion/larquil/stage2/Runtime" + (invokestatic "com/tailrecursion/larquil/stage0/BootstrapRuntime" "defineValue" "(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;") (return))) ``` -The exact helper names are provisional. +The exact helper names are provisional bootstrap support. +The example intentionally does not require a permanent Stage 2 runtime package. The required capability is explicit namespace value assignment without a -primitive `def` form. +primitive definition form. --- @@ -1116,13 +1255,15 @@ Stage 2 implementation should preserve the bootstrap style: - prefer Larquil source over Java support where practical - keep Java runtime support explicit and small -- keep namespace and macro helper APIs narrow +- keep namespace helper APIs narrow - keep compiler data Larquil-shaped where possible - make compiler passes explicit and testable +- preserve source locations where practical - do not expose implementation-private namespace slots as public language objects -- do not introduce a large permanent surface language before macros can define - it +- do not attach compiler meaning to source symbols +- do not introduce a large permanent surface language +- do not let low-level template code bypass IR verification Temporary host support must be documented if Stage 2 source depends on it. 
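
Note on IR verification of template regions: the label rules listed under Error Behavior (labels local to one function body, duplicates rejected) amount to a small check over one function's instruction list. The Python sketch below is illustrative only. It assumes instructions are lists such as `["label", "L1"]`, and it treats `jump-if-false` as the only branch operation, since that is the only branch spelled out in these specs; `BRANCH_OPS` and `check_labels` are hypothetical names.

```python
# Assumed set of branch operations; a real instruction table would be larger.
BRANCH_OPS = {"jump-if-false"}


def check_labels(body):
    """Validate labels within one function body and return the defined set."""
    defined = set()
    for instr in body:
        if instr and instr[0] == "label":
            name = instr[1]
            if name in defined:
                raise ValueError(f"duplicate template label: {name}")
            defined.add(name)
    for instr in body:
        if instr and instr[0] in BRANCH_OPS:
            target = instr[1]
            if target not in defined:
                # Labels are local, so an unknown target cannot be resolved elsewhere.
                raise ValueError(f"branch to undefined label: {target}")
    return defined


good_body = [
    ["expr", "test"],
    ["jump-if-false", "L_else"],
    ["expr", "then"],
    ["return"],
    ["label", "L_else"],
    ["expr", "else"],
    ["return"],
]
good_labels = check_labels(good_body)
```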
@@ -1134,28 +1275,37 @@ Stage 2 is done when all items below are true: - existing Stage 1-compatible examples still compile and run - the reader parses quoted forms -- the reader expands quasiquote, unquote, and unquote-splicing syntax -- the reader parses qualified names -- malformed qualified names are reader errors -- `quote` returns source data with names preserved -- generated names do not collide with reader-created names -- `larquil.boot/gensym` or equivalent bootstrap access exists +- the reader parses qualified symbols +- the reader never qualifies unqualified symbols using the current namespace +- quoted unqualified symbols remain unqualified +- malformed qualified symbols are reader errors +- `quote` returns source data with symbols preserved +- no public `Name`, `Var`, or `Binding` object is required +- private compiler binding/reference records exist after resolution +- executable symbols do not survive as raw symbols after resolution +- lexical references shadow unqualified namespace references +- qualified namespace references bypass lexical lookup +- unresolved ordinary namespace references do not require forward declarations +- delayed namespace references are explicit compiler nodes +- unresolved ordinary namespace references warn or fail as documented - `function` remains backward-compatible for Stage 1 bodies +- function bodies are represented as low-level template regions +- template instructions parse into IR nodes before bytecode emission +- template labels are local to one function body +- duplicate template labels are rejected - `expr` works inside function bodies - `expr` rejects invalid placement and arity +- `expr` lowering leaves exactly one object value - function bodies can mix template instructions and `expr` - anonymous `function` can be used as a function value - non-capturing functions compile and run - capturing functions compile and run -- lexical references shadow unqualified namespace references -- qualified namespace 
references bypass lexical lookup -- unresolved ordinary namespace references do not require forward declarations -- unresolved ordinary namespace references warn or fail as documented -- macro transformers can be manually registered -- registered macros expand later forms in the same compilation unit -- `defmacro` can be bootstrapped through manual macro registration -- `def` can be bootstrapped as a macro -- `fn`, `let`, and `if` can be bootstrapped as macros -- compiler pipeline has distinct reader, macro expansion, resolution, analysis, - lowering, and emission steps +- capturing functions still implement `IFunction` +- closure capture information is explicit before JVM lowering +- compiler pipeline distinguishes reader data, surface AST, resolved AST, + Larquil HIR, JVM IR, and classfile emission +- compiler IR records function literals, calls, namespace references, lexical + locals, captures, template forms, labels, branches, and `expr` boundaries +- low-level template regions are checked for label locality and stack effect at + least to the level required by existing Stage 1 instructions - Stage 2 compiler artifacts pass JVM verification when loaded diff --git a/specs/stage3.md b/specs/stage3.md new file mode 100644 index 0000000..9b465a6 --- /dev/null +++ b/specs/stage3.md @@ -0,0 +1,761 @@ +# Larquil Bootstrap Language (Stage 3) + +Stage 3 builds on the Stage 2 compiler spine. + +Purpose: bootstrap the first macro-defined Larquil surface language. + +Goal: add ordered macro expansion, quasiquote reader syntax, generated symbols, +and a small set of derived source forms without expanding the primitive +language kernel. + +Stage 3 remains a bootstrap language. +It is not the final Larquil or Riptide language. + +--- + +# Core Principles + +1. Preserve Stage 2 source compatibility. +2. Keep `function`, `expr`, and `quote` as the primitive source forms. +3. Insert macro expansion into the Stage 2 compiler pipeline before + environment resolution. +4. 
Bootstrap `defmacro`; do not make it primitive. +5. Define higher-level forms as macros over Stage 2 capabilities. +6. Keep namespace lookup separate from legacy `load-function`. +7. Use generated-symbol identity for macro hygiene. +8. Keep macro expansion ordered and phase-sensitive. +9. Keep internal binding/reference records private to the compiler. +10. Preserve low-level template regions as the user-facing JVM power boundary. + +--- + +# Relationship To Prior Stages + +Stage 3 retains the Stage 2 runtime vocabulary: + +- `Symbol` +- `List<Object>` with `ArrayList` as the bootstrap representation +- `IFunction` +- `Namespace` +- `null` +- Java `Boolean` + +Stage 3 retains Stage 2 symbol reading: + +- the reader does not consult the current namespace +- unqualified symbols remain unqualified +- qualified symbols preserve namespace and local parts +- quoted symbols are not resolved +- executable symbols are resolved by the compiler after macro expansion + +Stage 3 retains Stage 2 compiler strata and inserts one new stratum: + +```text +Lread reader data +Lsurface primitive forms and reader-expanded forms recognized +Lexpanded macro calls expanded +Lresolved executable symbols resolved to internal references +Lhir Larquil high-level IR +Ljvm JVM-oriented IR +Lclass emitted classfile bytes +``` + +Low-level template regions are still the low-level source sublanguage. + +--- + +# Reader Additions + +Stage 3 adds reader syntax for quasiquote, unquote, and unquote-splicing. + +Reader syntax: + +```lisp +`datum +,expr +,@expr +``` + +Reader expansion: + +```lisp +`x ; (quasiquote x) +,x ; (unquote x) +,@x ; (unquote-splicing x) +``` + +These are reader transformations only. +`quasiquote`, `unquote`, and `unquote-splicing` are ordinary symbols in source +forms. +The macro layer gives them meaning. 
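
Note on the reader expansion above: since the reader only rewrites syntax and assigns no meaning, the three prefixes reduce to wrapping the next form in a two-element list. A minimal Python sketch follows, assuming symbols are represented as strings, lists as Python lists, and integers as the only numeric literal; `tokenize` and `read` are hypothetical names, and string literals are left out for brevity.

```python
import re

# ",@" must be tried before "," so the longer prefix wins.
TOKEN = re.compile(r"\s*(,@|[()`,]|[^\s()`,]+)")

PREFIXES = {"`": "quasiquote", ",": "unquote", ",@": "unquote-splicing"}


def tokenize(text):
    """Split source text into parens, prefix characters, and atoms."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def read(tokens):
    """Read one form from the front of the token list, consuming it."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token in PREFIXES:
        # Pure reader transformation: `x becomes (quasiquote x), and so on.
        return [PREFIXES[token], read(tokens)]
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(read(tokens))
        if not tokens:
            raise SyntaxError("unmatched parentheses")
        tokens.pop(0)  # drop the closing ")"
        return items
    if token == ")":
        raise SyntaxError("unmatched parentheses")
    return int(token) if re.fullmatch(r"-?\d+", token) else token


form = read(tokenize("`(if ,test (do ,@body) null)"))
```

The result contains only ordinary lists and symbols, so a later macro layer can give `quasiquote` its meaning without any reader support.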
+ +Examples: + +```lisp +`(if ,test ,then ,else) +`(do ,@body) +``` + +Rules: + +- the reader expands syntax only +- `unquote` is meaningful only while expanding `quasiquote` +- `unquote-splicing` is meaningful only while expanding a list inside + `quasiquote` +- malformed quasiquote forms are macro-expansion errors +- nested quasiquote may be deferred if bootstrap source does not need it + +--- + +# Macro Tables + +A namespace may contain a macro table in addition to its runtime value table. + +The macro table maps local symbol names to macro transformer functions. +The table is namespace/compiler state. +It is not a public `Var`, `Binding`, or `Slot` object. + +Macro lookup: + +- unqualified macro names are looked up in the current namespace macro table +- qualified macro names are looked up in the named namespace macro table +- macro lookup in operator position happens before ordinary lexical/operator + resolution +- macro lookup happens before ordinary function-call lowering + +There is no delayed resolution for macro names. +A macro must be installed before a form using that macro is expanded. + +--- + +# Macro Transformers + +Macro transformers are Larquil functions. + +Initial transformer ABI: + +```lisp +(function (form env) + ...) 
+``` + +Arguments: + +- `form` is the full source form being expanded +- `env` is an expansion environment object, or `null` until such an object + exists + +Return value: + +- replacement source data + +Rules: + +- transformer input contains `Symbol`, `ArrayList`, strings, integers, + booleans, and `null` +- transformer output must be valid source data +- returned source is expanded again until no macro call remains at the current + expansion point +- a macro returning `'foo` returns an unqualified symbol +- that symbol resolves at the expansion site unless the macro qualifies it or + generates a fresh symbol +- macros should emit qualified symbols for intended global references +- macros should use generated symbols for fresh locals + +--- + +# Manual Macro Registration + +`defmacro` is not primitive. +It is bootstrapped through explicit compile-time registration. + +Macro registration used by later forms in the same compilation unit must happen +during compilation, not only when the generated loader is run. + +Conceptual manual registration body: + +```lisp +(function () + (expr + (function (form env) + ;; transformer body + ...)) + (store transformer) + (expr 'when) + (load transformer) + (invokestatic "com/tailrecursion/larquil/stage0/BootstrapRuntime" + "defineMacro" + "(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;") + (return)) +``` + +The exact helper owner/name/descriptors are provisional bootstrap support. +The required behavior is that bootstrap source can arrange for this body to run +during compilation before later forms that use the macro are expanded. + +Once this exists, `defmacro` can be defined as a macro whose expansion emits +compile-time macro registration code. + +--- + +# Gensym + +Stage 3 exposes generated symbols for macro hygiene. 
+ +After bootstrap, the facility should be available as: + +```lisp +larquil.boot/gensym +``` + +Examples: + +```lisp +(larquil.boot/gensym) +(larquil.boot/gensym "tmp") +``` + +Rules: + +- each call returns a generated symbol +- generated symbols cannot collide with reader-created source symbols +- generated symbols remain usable in quoted and quasiquoted forms +- generated symbol equality is by generated identity, not only by printed text +- gensym output is unique within one compiler/load session + +--- + +# Derived Forms + +The following forms are expected to be defined as macros: + +```lisp +def +defmacro +fn +let +let* +if +do +set! +while +block +return-from +return +``` + +These forms may become compiler-recognized later for diagnostics or +optimization. +They should not become permanent primitives merely for implementation +convenience. + +--- + +# `defmacro` + +Example: + +```lisp +(defmacro when (test &body body) + `(if ,test + (do ,@body) + null)) +``` + +Expected behavior: + +1. create a transformer function +2. install it in the current namespace macro table +3. make it available to later forms in the same compilation unit +4. return an implementation-defined compile-time value + +`defmacro` itself is defined only after manual macro registration is available. + +Minimum macro lambda-list support: + +- fixed positional parameters +- optional `&body` as an alias for the remaining forms + +Example: + +```lisp +(defmacro ignore1 (x) + null) + +(defmacro when (test &body body) + ...) +``` + +Full Common Lisp lambda lists are not required in Stage 3. + +--- + +# `def` + +Syntax: + +```lisp +(def name value) +``` + +Example: + +```lisp +(def answer 42) +``` + +Expected behavior: + +1. evaluate `value` +2. assign the result into the current namespace value table under `name` +3. 
return the value + +Before `def` exists, the same capability may be expressed by explicit top-level +code: + +```lisp +((function () + (expr 42) + (store value) + (expr 'answer) + (load value) + (invokestatic "com/tailrecursion/larquil/stage0/BootstrapRuntime" + "defineValue" + "(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;") + (return))) +``` + +The exact helper names are provisional bootstrap support. + +--- + +# `fn` + +Syntax: + +```lisp +(fn (param ...) + body...) +``` + +Expected expansion shape: + +```lisp +(function (param ...) + body...) +``` + +Example: + +```lisp +(fn (x) + (larquil.core/+ x 1)) +``` + +The first implementation may expand `fn` bodies into Stage 2-compatible +`function` bodies with `expr` boundaries where needed. +For example, a single-expression body may lower as: + +```lisp +(function (param ...) + (expr body) + (return)) +``` + +Multiple expression bodies may require `do` to be installed first, or may be +temporarily restricted during bootstrap. +The long-range direction is ordinary expression-bodied functions. + +--- + +# `let` And `let*` + +Parallel `let`: + +```lisp +(let ((x a) + (y b)) + body...) +``` + +Expected expansion shape: + +```lisp +((function (x y) + body...) + a + b) +``` + +Rules: + +- initializers evaluate in the incoming environment +- bindings are established only for the body +- lexical shadowing follows from function parameter scope + +As with `fn`, an initial implementation may lower the function body through +`expr` and `do`: + +```lisp +((function (x y) + (expr (do body...)) + (return)) + a + b) +``` + +Sequential `let*`: + +```lisp +(let* ((x a) + (y (larquil.core/+ x 1))) + body...) 
+``` + +Expected expansion shape: + +```lisp +(let ((x a)) + (let ((y (larquil.core/+ x 1))) + body...)) +``` + +--- + +# `if` + +Syntax: + +```lisp +(if test then else) +``` + +Expected behavior: + +- evaluate `test` +- if truthy, evaluate `then` +- otherwise evaluate `else` +- return the selected branch value + +Truth remains Stage 2-compatible: + +- `null` is falsey +- `Boolean.FALSE` is falsey +- all other values are truthy + +The first implementation may expand `if` to generated labels, temporaries, and +low-level template branches. +The compiler may later recognize the expansion for destination-aware lowering. + +Stage 2-compatible expansion shape: + +```lisp +((function () + (expr test) + (jump-if-false else) + (expr then) + (return) + (label else) + (expr else) + (return))) +``` + +The actual expansion must generate labels that cannot collide with user labels. + +--- + +# `do` + +Syntax: + +```lisp +(do form...) +``` + +Expected behavior: + +- evaluate forms left-to-right +- discard intermediate values +- return the final value +- return `null` when there are no forms + +Example: + +```lisp +(do + (print "start") + (larquil.core/+ 1 2)) +``` + +Stage 2-compatible expansion shape: + +```lisp +((function () + (expr form1) + (pop) + ... + (expr final-form) + (return))) +``` + +For an empty `do`, the expansion returns `null`. + +--- + +# `set!` + +Syntax: + +```lisp +(set! place value) +``` + +Stage 3 only requires lexical variable assignment. + +Example: + +```lisp +(let ((x 1)) + (set! x 2) + x) +``` + +Rules: + +- `place` must resolve to a mutable lexical binding +- assigning an immutable binding is a compile-time error +- assigning an unresolved namespace reference is not required +- namespace assignment should use explicit namespace helper calls until a place + model exists + +--- + +# Local Exits + +The expected surface forms are: + +```lisp +(block name body...) 
+(return-from name value) +(return value) +``` + +Stage 3 may defer full implementation of these forms. + +Local-only cases may expand to generated labels and temporaries. +Nonlocal exits across closures require additional compiler/runtime support and +must not be hidden behind an unsound macro expansion. + +If `return` is provided, it should be equivalent to returning from the nearest +implicit block established by a function body, or from an explicitly specified +policy if that implicit block rule is not yet implemented. + +--- + +# Delayed Runtime Resolution + +Stage 3 keeps the Stage 2 delayed namespace reference model for ordinary +runtime values. + +Example: + +```lisp +(def even? + (fn (n) + (if (larquil.core/= n 0) + true + (odd? (larquil.core/- n 1))))) + +(def odd? + (fn (n) + (if (larquil.core/= n 0) + false + (even? (larquil.core/- n 1))))) +``` + +This source should not require a Clojure-style forward declaration for `odd?`. +The delayed reference is a namespace reference created by the use of `odd?` +inside the value assigned by `def even?`. +It is not a consequence of named top-level `function` by itself. + +Rules: + +- unresolved ordinary runtime references are allowed during compilation +- a compilation unit may warn about references that remain unresolved when the + unit ends +- evaluating a still-unbound namespace reference at runtime is an + undefined-name error +- macro names are not delayed in the same way + +--- + +# Compiler Model + +Stage 3 preserves the Stage 2 IR boundaries and adds macro expansion before +resolution. + +Minimum pipeline: + +1. read source into `Lread` +2. expand quote/quasiquote reader syntax into ordinary forms +3. recognize primitive surface forms and reader-expanded forms +4. run ordered macro expansion to produce `Lexpanded` +5. resolve executable symbols to compiler-private references +6. lower to Larquil HIR +7. analyze functions, locals, captures, and low-level regions +8. lower HIR to JVM IR +9. 
emit class files + +Required invariants: + +- after macro expansion, no unexpanded macro calls remain in compiled source +- macro expansion is ordered and phase-sensitive +- macro expansion happens before ordinary symbol resolution +- quoted data remains unexpanded data unless a macro explicitly constructs new + source from it +- generated symbols remain distinct from reader-created symbols +- executable symbols do not survive as raw symbols after resolution +- delayed namespace references do not require forward declarations +- closure capture information is explicit before JVM lowering +- low-level template regions still pass through IR verification + +--- + +# Error Behavior + +Reader errors: + +- malformed strings +- unsupported string escapes +- malformed qualified symbols +- malformed reader quasiquote syntax +- unmatched parentheses + +Macro-expansion errors: + +- macro used before it is installed +- macro transformer does not return valid source data +- `unquote` outside quasiquote expansion +- `unquote-splicing` outside list quasiquote expansion +- malformed derived-form syntax + +Compile-time errors: + +- duplicate function parameters +- duplicate template labels in one function +- invalid template instruction shape +- `expr` outside a low-level template region +- `expr` with the wrong number of operands +- assignment to an immutable lexical binding +- closure capture shape the compiler cannot represent + +Compile-time warnings: + +- unresolved namespace references remaining at compilation-unit end +- implementation-specific warnings for dynamic namespace calls or missed direct + calls may be added later + +Runtime errors: + +- wrong function arity +- evaluating an unbound namespace reference +- invoking a non-callable value +- JVM verification or linkage errors caused by invalid generated bytecode + +--- + +# Examples + +## Macro Definition + +```lisp +(defmacro when (test &body body) + `(if ,test + (do ,@body) + null)) +``` + +## Function Definition + 
+```lisp +(def add1 + (fn (x) + (larquil.core/+ x 1))) +``` + +## Local Binding + +```lisp +(let ((x 1) + (y 2)) + (larquil.core/+ x y)) +``` + +## Recursive Definitions + +```lisp +(def fact + (fn (n) + (if (larquil.core/<= n 1) + 1 + (larquil.core/* n (fact (larquil.core/- n 1)))))) +``` + +## Low-Level Escape Hatch + +```lisp +(def print + (function (x) + (getstatic "java/lang/System" "out" "Ljava/io/PrintStream;") + (load x) + (invokevirtual "java/io/PrintStream" "println" "(Ljava/lang/Object;)V") + (aconst-null) + (return))) +``` + +--- + +# Implementation Discipline + +Stage 3 implementation should preserve the bootstrap style: + +- prefer Larquil source over Java support where practical +- keep Java runtime support explicit and small +- keep namespace and macro helper APIs narrow +- keep compiler data Larquil-shaped where possible +- make compiler passes explicit and testable +- preserve source locations where practical +- do not expose implementation-private namespace slots as public language + objects +- do not attach compiler meaning to source symbols +- do not make macro-defined forms permanent primitives merely for convenience +- do not let low-level template code bypass IR verification + +Temporary host support must be documented if Stage 3 source depends on it. 
+ +--- + +# Stage 3 Done Checklist + +Stage 3 is done when all items below are true: + +- Stage 2-compatible examples still compile and run +- the reader expands quasiquote, unquote, and unquote-splicing syntax +- `quasiquote` expands lists, symbols, primitive literals, `unquote`, and + `unquote-splicing` +- malformed quasiquote forms are errors +- generated symbols do not collide with reader-created symbols +- `larquil.boot/gensym` or equivalent bootstrap access exists +- macro transformers can be manually registered +- registered macros expand later forms in the same compilation unit +- macro use before macro definition is an expansion error +- `defmacro` can be bootstrapped through manual macro registration +- `def` can be bootstrapped as a macro +- `fn`, `let`, `let*`, `if`, and `do` can be bootstrapped as macros +- lexical `set!` can be bootstrapped or clearly deferred with an error +- ordinary runtime forward references do not require declarations +- compiler pipeline distinguishes reader data, surface AST, expanded source, + resolved AST, Larquil HIR, JVM IR, and classfile emission +- macro expansion preserves Stage 2 symbol and namespace semantics +- low-level template regions still pass Stage 2 verification +- Stage 3 compiler artifacts pass JVM verification when loaded
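
Note on the `quasiquote` checklist item: the required behavior over lists, symbols, primitive literals, `unquote`, and `unquote-splicing` can be sketched as a recursive walk. This Python illustration makes simplifying assumptions: it interprets a template directly against a dictionary of values instead of emitting list-construction code as a real macro would, it restricts `unquote` operands to plain symbols, and it ignores nested quasiquote. The name `quasi` is hypothetical.

```python
def quasi(template, env):
    """Fill a reader-expanded quasiquote template from env (symbol -> value)."""
    if not isinstance(template, list):
        # Symbols and primitive literals are returned unchanged.
        return template
    if len(template) == 2 and template[0] == "unquote":
        return env[template[1]]
    if template and template[0] == "unquote-splicing":
        raise SyntaxError("unquote-splicing outside list quasiquote expansion")
    result = []
    for item in template:
        is_splice = (isinstance(item, list) and len(item) == 2
                     and item[0] == "unquote-splicing")
        if is_splice:
            # Splice the elements of the value into the enclosing list.
            result.extend(env[item[1]])
        else:
            result.append(quasi(item, env))
    return result


# Template for the body of the `when` macro: `(if ,test (do ,@body) null)
when_template = ["if", ["unquote", "test"],
                 ["do", ["unquote-splicing", "body"]],
                 "null"]
expanded = quasi(when_template,
                 {"test": ["ready?"], "body": [["print", "go"], ["launch"]]})
```

A top-level `unquote-splicing` is rejected here, matching the rule that it is meaningful only inside a list being expanded by `quasiquote`.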