git » larquil.git » commit 42a1381

Split and refine Stage 2 and Stage 3 specs

author Alan
2026-05-02 20:40:27 UTC
committer Alan
2026-05-02 20:40:27 UTC
parent d9c207523a1525c62396823efdd3b16c44591d5a

specs/stage2.md +604 -454
specs/stage3.md +761 -0

diff --git a/specs/stage2.md b/specs/stage2.md
index fe3ad5c..1117aff 100644
--- a/specs/stage2.md
+++ b/specs/stage2.md
@@ -1,37 +1,146 @@
 # Larquil Bootstrap Language (Stage 2)
 
-Stage 2 is the first Larquil stage whose purpose is language growth rather than
+Stage 2 is the first Larquil stage whose purpose is language architecture, not
 only compiler replacement.
 
-Purpose: evolve the Stage 1 self-hosting compiler toward a small Lisp that can
-build its own surface language from a tiny kernel.
+Purpose: evolve the Stage 1 self-hosting compiler toward a real Lisp compiler
+while preserving Stage 0 and Stage 1 bootstrap source compatibility.
 
-Goal: keep the existing `function` form as the primitive code form, make
-function bodies a bytecode template language with expression holes, add source
-quotation, add namespaces, add closures, and make macros powerful enough that
-most future language forms can be written in Larquil itself.
+Goal: establish Larquil's semantic compiler pipeline, symbol/namespace model,
+internal environment/binding discipline, closure groundwork, and low-level JVM
+power boundary.  The existing `function` body template language remains valid,
+but it is no longer treated as direct backend emission.  Template code and
+ordinary expression code both enter explicit compiler IR before bytecode is
+emitted.
 
 This is still a bootstrap language.
 It is not the final Larquil or Riptide language.
 
 ---
 
+# Design Lineage
+
+Stage 2 is guided by two compatible compiler traditions.
+
+The Nanopass tradition says to build a compiler as many small, explicit passes
+over well-defined intermediate languages.  A change in compiler knowledge
+should usually appear as a change in IR shape or a pass-local environment, not
+as a hidden side effect inside a monolithic lowering routine.
+
+The SICL/Cleavir tradition says to make environment-sensitive meaning explicit
+early, then move through representation levels deliberately:
+
+```text
+concrete source -> resolved AST -> high-level IR -> medium/backend IR -> code
+```
+
+For Larquil, these are not competing ideas.  Stage 2 treats Cleavir-like IR
+levels as the durable semantic boundaries, and expects nanopass-sized
+transformations inside and between those boundaries.
+
+The important synthesis:
+
+- symbols are syntax
+- environments assign meaning
+- internal bindings accumulate compiler knowledge
+- high-level IR still speaks in Larquil values and operations
+- JVM representation details appear only after semantic analysis
+- user-facing low-level power is preserved as a checked source sublanguage, not
+  as direct access to the compiler backend
+
+---
+
+# Consulted References
+
+The following materials motivate this design.  They are not normative, but they
+explain the architectural pressure behind this stage.
+
+- Dipanwita Sarkar, Oscar Waddell, R. Kent Dybvig, "A Nanopass Framework for
+  Compiler Education", Journal of Functional Programming, 2005.
+  DOI: `10.1017/S0956796805005605`.
+  The central lesson is that many fine-grained passes with explicit input and
+  output languages are easier to understand, test, and evolve than a few
+  monolithic passes.
+
+- Andrew W. Keep, R. Kent Dybvig, "A Nanopass Framework for Commercial Compiler
+  Development", ICFP 2013.
+  DOI: `10.1145/2544174.2500618`.
+  The relevant lesson is that nanopass organization is not only pedagogical.
+  Chez Scheme's nanopass rewrite used many more passes than the previous
+  compiler while preserving practical compile time and improving generated code
+  quality.
+
+- Nanopass Framework documentation and tutorials.
+  Reference: `https://nanopass.org/documentation.html`.
+  The useful abstraction is `define-language` plus `define-pass`: the grammar
+  of an intermediate language is part of the design, and each pass states what
+  language it consumes and produces.
+
+- Irène Durand, Robert Strandh, "Bootstrapping Common Lisp using Common Lisp",
+  European Lisp Symposium, 2019.
+  Reference: `https://zenodo.org/records/2634314`.
+  The key bootstrapping idea for Larquil is isolation between host and target
+  environments through explicit global environments, rather than accidental
+  reliance on the host's current image.
+
+- Robert Strandh, "SICL: Building blocks for implementers of Common Lisp
+  systems", 2010.
+  Reference: `https://dept-info.labri.fr/~strandh/sicl.pdf`.
+  The relevant ideas are modular implementation-independent layers, as few
+  lower-layer primitives as practical, explicit declaration-manipulation
+  modules, and high-quality errors in terms of source code rather than expanded
+  or lowered code.
+
+- Cleavir documentation.
+  Reference: `https://metamodular.com/SICL/cleavir.pdf`.
+  Cleavir's CST -> AST -> HIR -> MIR -> LIR organization is the main model for
+  Larquil's semantic layering.  In particular, HIR keeps operations at the Lisp
+  object level so type inference and source-level optimization happen before
+  representation details dominate.
+
+- Irène Durand, Robert Strandh, "Partial Inlining Using Local Graph Rewriting".
+  Reference: `https://metamodular.com/SICL/partial-inlining.pdf`.
+  The important compiler lesson is that lexical names are converted to unique
+  objects before optimization, and that a graph-like HIR enables local
+  transformations without requiring source-level reconstruction.
+
+- J. Michael Ashley, R. Kent Dybvig, "A Practical and Flexible Flow Analysis
+  for Higher-Order Languages", ACM TOPLAS, 1998.
+  DOI: `10.1145/291891.291898`.
+  This motivates making flow/control facts explicit enough that later passes can
+  support higher-order optimization, assignment, and control operators.
+
+- Common Lisp compiler practice, especially delayed resolution of ordinary
+  runtime references, declarations, compiler macros, and separation between
+  source symbols and compiler environments.
+  Larquil is not copying Common Lisp's package system, but it does preserve the
+  useful distinction between reading symbols and resolving references.
+
+---
+
 # Core Principles
 
 1. Preserve Stage 0 and Stage 1 source compatibility where practical.
-2. Keep `function` as the primitive callable form.
-3. Treat function bodies as bytecode templates.
-4. Add `expr` as the bridge from bytecode templates back into source
-   expression compilation.
-5. Add quotation and reader support for macro-writing syntax.
-6. Add namespaces as the global source-level organization model.
-7. Do not expose a public Var or Binding object in Stage 2.
-8. Delay ordinary namespace reference resolution in the Common Lisp style.
-9. Make macro expansion an explicit compiler phase.
-10. Keep higher-level forms out of the primitive kernel unless they cannot be
-    bootstrapped from `function`, `expr`, quotation, namespaces, and macros.
-
-The existing function body language is the bytecode template substrate.
+2. Keep the public runtime vocabulary small.
+3. Keep `function`, `quote`, and `expr` as the only primitive source forms.
+4. Treat existing function bodies as low-level template regions, not direct
+   bytecode emission.
+5. Preserve near-JVM power for users and bootstrap code through checked
+   low-level operations.
+6. Make reader output, resolved code, high-level IR, JVM IR, and class emission
+   distinct compiler levels.
+7. Resolve executable symbols into compiler-private references before lowering.
+8. Attach future declarations and inferred facts to internal bindings and IR
+   values, not to source symbols.
+9. Support closures through `function`.
+10. Delay ordinary namespace reference resolution in the Common Lisp style.
+11. Keep `load-function` as legacy helper lookup, not namespace lookup.
+12. Do not introduce a broad permanent surface language in this stage.
+
+The central Stage 2 deliverable is not `expr` by itself.
+The central deliverable is the compiler spine that lets low-level template code,
+ordinary expressions, closures, namespaces, and future surface forms share one
+semantic path toward JVM code.
 
 ---
 
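A minimal Java sketch, not taken from this commit, of what principle 6 ("distinct compiler levels") could look like when each level is its own type and each pass declares the level it consumes and produces. The names `ReadForm`, `ResolvedForm`, `Hir`, and `PipelineSketch` are hypothetical, and the passes are placeholders that only tag their input.

```java
import java.util.function.Function;

// Hypothetical level types: one class per compiler level, so a pass cannot
// accidentally accept data from the wrong level.
final class ReadForm {
    final Object datum;
    ReadForm(Object datum) { this.datum = datum; }
}

final class ResolvedForm {
    final String text;
    ResolvedForm(String text) { this.text = text; }
}

final class Hir {
    final String text;
    Hir(String text) { this.text = text; }
}

final class PipelineSketch {
    // Placeholder passes; real passes would transform IR, not build strings.
    static final Function<ReadForm, ResolvedForm> RESOLVE =
        f -> new ResolvedForm("resolved(" + f.datum + ")");
    static final Function<ResolvedForm, Hir> TO_HIR =
        r -> new Hir("hir(" + r.text + ")");

    // Composition only type-checks when output and input levels line up.
    static final Function<ReadForm, Hir> FRONT_END = RESOLVE.andThen(TO_HIR);

    static String run(Object datum) {
        return FRONT_END.apply(new ReadForm(datum)).text;
    }

    public static void main(String[] args) {
        System.out.println(run("foo")); // prints hir(resolved(foo))
    }
}
```

Swapping the order of the two passes in `andThen` would fail to compile, which is the property the level boundaries are meant to give.
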
@@ -43,37 +152,58 @@ Stage 2 keeps the Stage 1 runtime model where possible:
 - String literal -> `String`
 - Boolean literal -> `Boolean`
 - Nil/null literal -> `null`
-- Name -> Larquil name object
-- List -> `List<Object>`
-- Function -> callable object implementing the Larquil function ABI
+- Symbol -> Larquil `Symbol`
+- List -> `List<Object>`, with `java.util.ArrayList` as the bootstrap representation
+- Function -> `IFunction`
 - Namespace -> Larquil namespace object
 
-Stage 2 replaces the Stage 0 idea of a simple symbol with a name value.
-A name is source data.
+Stage 2 keeps the Stage 0/Stage 1 `Symbol` vocabulary and extends symbols with
+optional namespace qualification.
+The representation should leave room for generated-symbol identity, but Stage 2
+does not need to expose a generated-symbol facility.
+A symbol is source data.
 It is not a mutable variable cell.
 It does not itself carry a namespace binding.
+It does not carry type information.
+It does not carry compiler resolution information.
 
-Minimum name information:
+Minimum symbol information:
 
 ```java
-final class Name {
-    final String namespace; // null for unqualified names
+final class Symbol {
+    final String namespace; // null for unqualified symbols
     final String name;
-    final Object identity;  // optional implementation-private identity for gensyms
+    final Object identity;  // optional implementation-private identity
 }
 ```
 
 The exact representation is implementation-defined.
 The observable requirements are:
 
-- two reader-created unqualified names with the same spelling compare equal
-- two reader-created qualified names with the same namespace and local name
+- two reader-created unqualified symbols with the same spelling compare equal
+- two reader-created qualified symbols with the same namespace and local name
   compare equal
-- a generated name produced by `gensym` does not collide with any
-  reader-created source name
-- printed generated names may be diagnostic strings, but equality must not
+- if generated symbols are present internally, they do not collide with any
+  reader-created source symbol
+- printed generated symbols may be diagnostic strings, but equality must not
   depend only on those diagnostic strings
 
+Stage 2 does not introduce public `Name`, `Var`, `Binding`, `Slot`, cons cell,
+condition object, or new function protocol objects.
+Implementations may use private records internally.
+
+Lists in Stage 2 are source-form and bootstrap compiler containers.
+They are not a final cons-cell ontology.
+
+Closure objects produced by capturing `function` forms still implement
+`IFunction`.
+
+Truth remains Stage 0-compatible until a later stage explicitly changes it:
+
+- `null` is falsey
+- `Boolean.FALSE` is falsey
+- all other values are truthy
+
 ---
 
 # Reader Behavior
@@ -84,25 +214,21 @@ The reader recognizes:
 - string literals
 - booleans, if enabled by the Stage 2 runtime
 - `null`, if enabled by the Stage 2 runtime
-- names
-- qualified names
+- symbols
+- qualified symbols
 - lists
 - quote reader syntax
-- quasiquote reader syntax
-- unquote reader syntax
 - line comments
 
 Reader examples:
 
 ```lisp
-foo              ; Name(null, "foo")
-larquil.core/+   ; Name("larquil.core", "+")
+foo              ; Symbol(null, "foo")
+larquil.core/+   ; Symbol("larquil.core", "+")
 123              ; Long
 "abc"            ; String
 (a b c)          ; List<Object>
 'foo             ; (quote foo)
-`(if ,x y z)     ; (quasiquote (if (unquote x) y z))
-`(do ,@body)     ; (quasiquote (do (unquote-splicing body)))
 ```
 
 Line comments begin with `;` and continue to the end of the line.
@@ -116,11 +242,11 @@ String literal escapes remain at least:
 
 Unsupported string escapes are reader errors.
 
-## Name Grammar
+## Symbol Grammar
 
-Stage 2 names are case-sensitive.
+Stage 2 symbols are case-sensitive.
 
-Names must admit ordinary Lisp operator spellings such as:
+Symbols must admit ordinary Lisp operator spellings such as:
 
 ```lisp
 +
@@ -128,11 +254,14 @@ Names must admit ordinary Lisp operator spellings such as:
 *
 <
 <=
+=
 set!
+even?
+&body
 ```
 
-A qualified name has exactly one `/`.
-The namespace name is on the left and the local name is on the right:
+A qualified symbol has exactly one `/`.
+The namespace name is on the left and the local symbol name is on the right:
 
 ```lisp
 larquil.core/map
@@ -156,9 +285,63 @@ larquil.boot
 my.app.main
 ```
 
-Source names and generated JVM names are separate concepts.
-The compiler may munge source names into JVM class, field, or method names,
-but that munging does not change source name equality or namespace lookup.
+Source symbols and generated JVM names are separate concepts.
+The compiler may munge source symbols into JVM class, field, or method names,
+but that munging does not change source symbol equality or namespace lookup.
+
+## Reader And Current Namespace
+
+The reader does not consult the current namespace.
+
+Reading:
+
+```lisp
+foo
+```
+
+always produces:
+
+```text
+Symbol(null, "foo")
+```
+
+Reading:
+
+```lisp
+larquil.core/foo
+```
+
+always produces:
+
+```text
+Symbol("larquil.core", "foo")
+```
+
+Quotation preserves that reader result:
+
+```lisp
+'foo              ; unqualified symbol
+'larquil.core/foo ; qualified symbol
+```
+
+The current namespace participates in expression resolution, not reading.
+This deliberately avoids Common Lisp-style read-time package interning in
+Stage 2.
+
+Stage 2 has no `pkg:sym`, no `pkg::sym`, no keyword package, no read-time
+current-package qualification, no symbol value cells, no symbol function cells,
+and no symbol property lists.
+
+## Symbol Equality And Interning
+
+Reader-created symbols with the same `(namespace, name)` compare equal.
+Implementations may intern or canonicalize reader-created symbols, but
+interning is an implementation detail.
+
+If generated symbols are present internally, they compare by generated
+identity.
+A generated symbol is never equal to a reader-created symbol, even if its
+diagnostic printed text matches a source spelling.
 
 ---
 
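The equality rules above can be sketched in Java as follows. This is an illustration under assumptions, not the representation used by Larquil: the class name `SymbolSketch`, the factory methods, and the convention that reader-created symbols carry a `null` identity are all hypothetical.

```java
import java.util.Objects;

// Hypothetical sketch; the spec leaves the real representation
// implementation-defined.
final class SymbolSketch {
    final String namespace; // null for unqualified symbols
    final String name;
    final Object identity;  // assumption: null for reader-created symbols

    private SymbolSketch(String namespace, String name, Object identity) {
        this.namespace = namespace;
        this.name = Objects.requireNonNull(name);
        this.identity = identity;
    }

    // What a reader would produce for `foo` or `larquil.core/foo`.
    static SymbolSketch read(String namespace, String name) {
        return new SymbolSketch(namespace, name, null);
    }

    // Internal generated symbol; the hint is only diagnostic text.
    static SymbolSketch generated(String hint) {
        return new SymbolSketch(null, hint, new Object());
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SymbolSketch)) return false;
        SymbolSketch other = (SymbolSketch) o;
        // If either side is generated, only a shared identity makes them equal.
        if (identity != null || other.identity != null) {
            return identity == other.identity;
        }
        return Objects.equals(namespace, other.namespace)
            && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return identity != null
            ? System.identityHashCode(identity)
            : Objects.hash(namespace, name);
    }

    public static void main(String[] args) {
        System.out.println(read(null, "foo").equals(read(null, "foo")));      // true
        System.out.println(generated("foo").equals(read(null, "foo")));       // false
        System.out.println(generated("tmp").equals(generated("tmp")));        // false
    }
}
```

Because equality never consults interning, an implementation stays free to intern reader-created symbols or not, as the section allows.
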
@@ -174,26 +357,18 @@ quote
 
 `quote` is primitive because it suppresses evaluation and returns source data.
 
-Quasiquote syntax is not primitive evaluator syntax.
-The reader expands it to ordinary macro-call forms:
+`function` is primitive because it creates Larquil callable values and is the
+existing bootstrap boundary for low-level code.
 
-```lisp
-`x     ; (quasiquote x)
-,x     ; (unquote x)
-,@x    ; (unquote-splicing x)
-```
-
-`quasiquote` is a bootstrap macro.
-`unquote` and `unquote-splicing` are meaningful only while that macro expands
-quasiquoted source.
-Using `unquote` or `unquote-splicing` outside a quasiquote expansion is an
-error.
+`expr` is primitive only as a boundary form inside low-level template regions.
+It allows ordinary expression compilation to occur at a specific point in a
+template body.
+It is not intended to become the center of the final language.
 
 Stage 2 does not make these forms primitive:
 
 ```lisp
 def
-defmacro
 fn
 let
 if
@@ -205,31 +380,20 @@ return-from
 return
 ```
 
-Those forms are expected to be bootstrapped as macros.
-
 ---
 
 # Source File Structure
 
 A source file contains a sequence of top-level forms.
 
-Stage 2 top-level forms are processed left-to-right for macro expansion and
-load-time effects.
+Stage 2 top-level forms are processed left-to-right for load-time effects.
 Ordinary runtime references to later namespace values are allowed.
-Macro references to later macro definitions are not available unless an earlier
-compile-time action has installed the macro.
 
-Initial top-level forms accepted before the macro layer is bootstrapped:
+Top-level forms accepted in Stage 2:
 
 - named legacy `function` forms
 - top-level IIFEs
 - literal values, which are ignored for load-time effects
-- explicit bootstrap compile-time registration forms recognized by the Stage 2
-  compiler
-
-After macros are bootstrapped, top-level macro forms such as `def`,
-`defmacro`, and `in-namespace` may expand to those primitive load-time or
-compile-time actions.
 
 Initial namespace:
 
@@ -317,10 +481,10 @@ Examples:
 
 Rules:
 
-- parameter names are lexical names
-- parameter names must be unqualified
-- duplicate parameter names in one function are errors
-- function bodies are bytecode template bodies
+- parameter symbols are lexical symbols
+- parameter symbols must be unqualified
+- duplicate parameter symbols in one function are errors
+- function bodies are low-level template regions
 - every existing Stage 1 instruction remains valid in a function body
 - `expr` is valid in a function body
 - labels are local to one function body
@@ -331,15 +495,20 @@ Named top-level `function` remains accepted for bootstrap convenience.
 It keeps the Stage 1 helper-function role.
 It is not the general namespace definition form.
 
-Later `def` and `fn` forms may be macros over primitive `function`.
-Namespace-level definitions should be expressed through `def` or through the
-manual namespace-assignment code that bootstraps `def`.
+Namespace-level definitions should be expressed through explicit namespace
+assignment code in this stage.
 
 ---
 
-# Function Template Bodies
+# Low-Level Template Regions
 
-A function template body is a sequence of template forms.
+A function body is a low-level template region.
+
+This region preserves near-JVM power for end users and bootstrap code.
+It is intentionally more capable than ordinary high-level expression syntax.
+It is the answer to a problem Clojure leaves to Java interop or Java source:
+Larquil should allow carefully written source to express low-level JVM actions
+directly when that is the right tool.
 
 Template forms include:
 
@@ -367,7 +536,7 @@ Stage 1 instruction example:
   (return))
 ```
 
-Template plus expression hole example after `+` is available:
+Template plus expression boundary example after `+` is available:
 
 ```lisp
 (function add2 (x)
@@ -375,9 +544,23 @@ Template plus expression hole example after `+` is available:
   (return))
 ```
 
-The template instruction language remains stack-oriented.
-`expr` is the only Stage 2 bridge from template code into source expression
-compilation.
+Template instructions do not bypass compiler IR.
+They are parsed into low-level operation nodes, checked, and then lowered.
+
+The low-level region must be checked for:
+
+- lexical scope of `load` and `store`
+- local label uniqueness
+- branch target locality
+- operand stack effect
+- JVM category-1/category-2 stack consistency
+- object versus raw JVM value consistency
+- method and field descriptor validity
+- no branch into or out of expression-lowered internal control flow
+- JVM verifier compatibility after lowering
+
+The first implementation may check only the subset needed for existing
+instructions, but the architecture must admit these checks as explicit passes.
 
 `load-function` remains a legacy template instruction.
 It resolves named helper functions in the Stage 1 compatibility model.
@@ -388,7 +571,7 @@ through explicit namespace helper calls emitted by bootstrap code.
 
 ---
 
-# `expr` Holes
+# `expr` Boundaries
 
 Syntax:
 
@@ -396,17 +579,19 @@ Syntax:
 (expr source-expression)
 ```
 
-`expr` is valid only in a function template body.
+`expr` is valid only in a low-level template region.
 
 Semantics:
 
-1. The compiler macroexpands `source-expression`.
-2. The compiler resolves it in the current lexical and namespace environment.
-3. The compiler emits code for the expression at the current template position.
-4. The emitted expression leaves exactly one object value on the operand stack.
+1. The compiler resolves `source-expression` in the current lexical and
+   namespace environment.
+2. The expression is compiled through the same resolved AST and Larquil HIR
+   path as any other Stage 2 expression.
+3. The expression is lowered at the current template position.
+4. The lowered expression leaves exactly one object value on the operand stack.
 5. Control continues with the next template form.
 
-Example after arithmetic functions or macros have been bootstrapped:
+Example after arithmetic functions or helpers have been bootstrapped:
 
 ```lisp
 (function print-add2 (x)
@@ -430,11 +615,12 @@ This example assumes `+` is available:
 Rules:
 
 - `expr` must have exactly one source expression operand
-- an `expr` expression must leave exactly one runtime value
-- an `expr` expression must not branch into or out of the containing template
-  body except through code generated as part of that expression
+- an `expr` expression must leave exactly one runtime object value
+- an `expr` expression must not branch into or out of the containing low-level
+  template region except through compiler-represented structured control
 - expression-generated internal labels must not collide with user-written
   template labels
+- `expr` is a compatibility bridge, not a general block construct
 
 ---
 
@@ -449,11 +635,11 @@ Syntax:
 
 Semantics:
 
-- quoted names become Larquil name objects
-- quoted qualified names preserve namespace qualification
+- quoted symbols become Larquil `Symbol` objects
+- quoted qualified symbols preserve namespace qualification
 - quoted lists become runtime list structure
 - quoted integers, strings, booleans, and null become themselves
-- quote does not resolve names
+- quote does not resolve symbols
 
 Examples:
 
@@ -476,88 +662,32 @@ reads as:
 (quote x)
 ```
 
-Quoted names are not strings:
+Quoted symbols are not strings:
 
 ```lisp
-'foo        ; name object
 "foo"       ; string
+'foo        ; symbol
 ```
 
 ---
 
-# Quasiquotation
-
-Quasiquote reader syntax is provided for macro authoring.
-It reads to ordinary forms that are expanded by the bootstrap `quasiquote`
-macro.
-
-Reader syntax:
-
-```lisp
-`datum
-,expr
-,@expr
-```
-
-Reader expansion:
-
-```lisp
-`x     ; (quasiquote x)
-,x     ; (unquote x)
-,@x    ; (unquote-splicing x)
-```
-
-The bootstrap `quasiquote` macro must support:
-
-- names
-- qualified names
-- primitive literals
-- lists
-- unquote inside lists
-- unquote-splicing inside lists
-
-Example:
-
-```lisp
-`(if ,test ,then ,else)
-```
-
-constructs a list whose first element is the name `if`, whose remaining
-elements are the values of `test`, `then`, and `else`.
-
-Example:
-
-```lisp
-`(do ,@body)
-```
-
-constructs a list whose first element is the name `do`, followed by the
-elements of the list value `body`.
-
-Rules:
-
-- `unquote` is valid only while expanding `quasiquote`
-- `unquote-splicing` is valid only while expanding a list inside `quasiquote`
-- malformed quasiquote forms are macro-expansion errors
-- nested quasiquote may be deferred if Stage 2 bootstrapping does not need it
-
----
-
 # Namespaces
 
-A namespace is the source-level global container for values and macro
-transformers.
+A namespace is the source-level global container for runtime values.
 
 Stage 2 specifies namespaces as source semantics.
-It does not specify a public Var or Binding object.
+It does not specify a public `Var` or `Binding` object.
 
 A namespace has:
 
 - a canonical name
-- a map from local names to runtime values
-- a map from local names to macro transformers
+- a map from local symbol names to runtime values
 - implementation-private metadata as needed
 
+Namespace maps are keyed by the unqualified local part of a symbol.
+For a qualified symbol, the namespace part selects the namespace and the local
+part selects the entry inside that namespace.
+
 The implementation may store private slots or binding records internally.
 Those are not Stage 2 source-level values.
 
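As a rough illustration of the namespace shape described above, the following Java sketch keys each namespace's value table by the local part of a symbol and lets the namespace part of a qualified symbol select the table. `NamespaceSketch`, `NamespaceRegistrySketch`, and the use of `IllegalStateException` for an undefined name are assumptions made for the example, not Larquil's runtime API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical namespace: a canonical name plus a map from local names to values.
final class NamespaceSketch {
    final String name;
    private final Map<String, Object> values = new HashMap<>();

    NamespaceSketch(String name) { this.name = name; }

    void assign(String local, Object value) { values.put(local, value); }

    // containsKey, not get: null is a legal runtime value, so a null result
    // alone cannot distinguish "bound to null" from "unbound".
    boolean isBound(String local) { return values.containsKey(local); }

    Object lookup(String local) {
        if (!values.containsKey(local)) {
            throw new IllegalStateException("undefined name: " + name + "/" + local);
        }
        return values.get(local);
    }
}

// Hypothetical registry of namespaces by canonical name.
final class NamespaceRegistrySketch {
    private final Map<String, NamespaceSketch> namespaces = new HashMap<>();

    NamespaceSketch findOrCreate(String name) {
        return namespaces.computeIfAbsent(name, NamespaceSketch::new);
    }

    // The namespace part selects the namespace; the local part selects the entry.
    Object lookupQualified(String namespace, String local) {
        NamespaceSketch ns = namespaces.get(namespace);
        if (ns == null) {
            throw new IllegalStateException("unknown namespace: " + namespace);
        }
        return ns.lookup(local);
    }

    public static void main(String[] args) {
        NamespaceRegistrySketch registry = new NamespaceRegistrySketch();
        registry.findOrCreate("larquil.core").assign("list", "LIST-FN");
        System.out.println(registry.lookupQualified("larquil.core", "list")); // LIST-FN
    }
}
```
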
@@ -577,24 +707,22 @@ user
 The current namespace is compiler/load state.
 Stage 2 does not require a primitive source form for changing it.
 Bootstrap code may change it through explicit runtime/compiler helper calls.
-After macros are bootstrapped, a surface form such as `in-namespace` may be
-defined as a macro.
 
-Qualified name:
+Qualified symbol:
 
 ```lisp
 larquil.core/list
 ```
 
-means local name `list` in namespace `larquil.core`.
+means local symbol `list` in namespace `larquil.core`.
 
-Resolution order for ordinary expression names:
+Resolution order for ordinary expression symbols:
 
 1. lexical locals and parameters
 2. current namespace value table
-3. delayed namespace reference if no current value is known
+3. delayed current-namespace reference if no current value is known
 
-Qualified names bypass lexical lookup:
+Qualified symbols bypass lexical lookup:
 
 ```lisp
 (function f (list)
@@ -602,10 +730,10 @@ Qualified names bypass lexical lookup:
   (return))
 ```
 
-The `expr` reference above denotes the namespace value `list` in
+The `expr` reference above denotes the namespace value for symbol `list` in
 `larquil.core`, not the parameter `list`.
 
-Unqualified lexical names shadow current namespace names:
+Unqualified lexical symbols shadow current namespace symbols:
 
 ```lisp
 (function f (list)
@@ -617,6 +745,58 @@ The `expr` reference above denotes the parameter `list`.
 
 ---
 
+# Internal Environments And Bindings
+
+Stage 2 does not expose public binding objects, but the compiler should use
+private binding and reference records internally.
+
+This is the compiler-side distinction:
+
+```text
+Symbol      source data created by the reader
+Binding     compiler-private object naming what a symbol means
+Reference   compiler-private use of a binding
+Namespace   runtime/source-level global container
+```
+
+After resolution, executable symbol occurrences must not remain raw `Symbol`
+objects in compiler IR.
+They become one of:
+
+- lexical reference
+- namespace reference
+- delayed namespace reference
+- legacy helper-function reference, where Stage 1 compatibility requires it
+
+Internal bindings are the unit to which later compiler knowledge attaches:
+
+- declared type
+- inferred type
+- arity or call shape
+- mutability
+- phase
+- capture status
+- source location
+- namespace resolution state
+- representation choice
+
+These records are not public Larquil values.
+They do not imply a Clojure-style `Var` model.
+They do not imply Common Lisp symbol value cells or function cells.
+
+This design leaves a clear path for future declaration forms:
+
+```lisp
+;; illustrative future direction, not Stage 2 primitive syntax
+(declare n larquil.core/Long)
+```
+
+A declaration should affect the environment and the relevant internal binding.
+It should not mutate the source `Symbol`.
+The declaration may later be checked, refined, or erased by subsequent passes.
+
+---
+
 # Expression Semantics
 
 `expr` compiles Stage 2 source expressions.
@@ -628,12 +808,7 @@ Primitive expression forms:
 (function (<param> ...) template-form...)
 ```
 
-All other list expressions are processed as follows:
-
-1. If the first element is a name that names an installed macro in the current
-   expansion environment, the form is macroexpanded and compilation continues
-   on the expansion.
-2. Otherwise the form is a function call.
+All other list expressions are function calls.
 
 Function call semantics:
 
@@ -661,34 +836,16 @@ The second form resolves `+` in namespace `larquil.core`.
 The compiler may optimize statically known calls, but the source semantics are
 ordinary operator-position evaluation in a Lisp-1 value namespace.
 
+---
+
 # Delayed Namespace Resolution
 
 Stage 2 follows a Common Lisp-style delayed resolution model for ordinary
-runtime names.
-
-When the compiler sees a non-lexical name that is not currently defined in the
-namespace, it may record a namespace reference instead of failing immediately.
-
-Example intended surface after `def`, `fn`, and `if` are bootstrapped:
-
-```lisp
-(def even?
-  (fn (n)
-    (if (= n 0)
-        true
-        (odd? (- n 1)))))
-
-(def odd?
-  (fn (n)
-    (if (= n 0)
-        false
-        (even? (- n 1)))))
-```
+runtime symbols.
 
-This source should not require a Clojure-style forward declaration for `odd?`.
-The delayed reference is a namespace reference created by the use of `odd?`
-inside the value assigned by `def even?`.
-It is not a consequence of named top-level `function` by itself.
+When the compiler sees a non-lexical symbol that is not currently defined in
+the namespace, it may record a delayed namespace reference instead of failing
+immediately.
 
 Rules:
 
@@ -697,125 +854,11 @@ Rules:
   unit ends
 - evaluating a still-unbound namespace reference at runtime is an
   undefined-name error
-- macro names are not delayed in the same way; a macro must be installed before
-  a form using that macro is expanded
-
----
-
-# Macro Expansion
-
-Stage 2 has a macro-expansion phase before expression lowering.
-
-Macro transformers are ordinary Larquil functions registered in a namespace
-macro table.
+- delayed namespace references are explicit compiler IR nodes, not string
+  lookups scattered through JVM lowering
 
-A transformer receives source data and returns replacement source data.
-The exact bootstrap ABI may be implementation-defined, but it must be
-documented before Stage 2 is considered complete.
-
-Recommended initial ABI:
-
-```lisp
-(function (form env)
-  ...)
-```
-
-where:
-
-- `form` is the full source form being expanded
-- `env` is an expansion environment object or `null` until an environment
-  object exists
-- the return value is replacement source
-
-`defmacro` is not primitive in Stage 2.
-It is expected to be manually bootstrapped using explicit compile-time
-registration actions and template instructions.
-
-Macro registration used by later forms in the same compilation unit must happen
-during compilation, not only when the generated loader is run.
-The first implementation may provide a bootstrap compile-time execution path
-for the explicit registration code.
-
-Conceptual manual macro registration body:
-
-```lisp
-(function ()
-  (expr
-    (function (form env)
-      ;; macro transformer body
-      ...))
-  (store transformer)
-  (expr 'when)
-  (load transformer)
-  (invokestatic "com/tailrecursion/larquil/stage2/Runtime"
-                "defineMacro"
-                "(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;")
-  (return))
-```
-
-The exact helper owner/name/descriptors are provisional.
-The spec requirement is that bootstrap source can arrange for this registration
-body to run during compilation, before later forms that use the macro are
-expanded.
-It must not depend on ordinary load-time IIFE execution.
-
-Once bootstrapped, `defmacro` can be a macro whose expansion emits compile-time
-macro registration code.
-
-Example intended surface after bootstrap:
-
-```lisp
-(defmacro when (test &body body)
-  `(if ,test
-       (do ,@body)
-       null))
-```
-
-This is not primitive Stage 2 syntax.
-It is an example of the language Stage 2 is meant to enable.
-
----
-
-# Gensym
-
-Stage 2 provides a monotonically increasing gensym facility for macros.
-
-Initial access may be through raw template instructions calling runtime or
-compiler helpers.
-
-After bootstrap, the facility should be available as:
-
-```lisp
-larquil.boot/gensym
-```
-
-Example intended use:
-
-```lisp
-(larquil.boot/gensym)
-(larquil.boot/gensym "tmp")
-```
-
-Rules:
-
-- each call returns a generated name
-- generated names cannot collide with reader-created source names
-- generated names remain usable in quoted and quasiquoted forms
-- generated name equality is by generated identity, not only by printed text
-- gensym output is unique within one compiler/load session
-
-Example intended macro pattern:
-
-```lisp
-(defmacro with-temp (value &body body)
-  (let ((t (larquil.boot/gensym "tmp")))
-    `((function (,t)
-        ,@body)
-      ,value)))
-```
-
-The example uses macro-defined `defmacro` and `let`.
-They are not primitive forms.
+This avoids a Clojure-style forward declaration requirement for ordinary
+runtime values while still letting the compiler keep precise reference objects.
 
 ---
 
@@ -848,125 +891,174 @@ The anonymous function captures `n`.
 Rules:
 
 - non-capturing functions may compile to singleton helper classes
-- capturing functions compile to closure objects or generated classes with
-  captured environment fields
-- mutable captured locals must preserve shared mutation semantics once `set!`
-  is bootstrapped
-- capture analysis is a compiler pass, not a macro responsibility
+- capturing functions compile to `IFunction` closure objects or generated
+  classes with captured environment fields
+- mutable captured locals must preserve shared mutation semantics once local
+  assignment is added
+- capture analysis is a compiler pass, not an expression-lowering side effect
+- capture facts attach to internal bindings and HIR values, not source symbols
 
 Stage 2 does not require final performance decisions for closure
 representation.
 It requires correct behavior and a compiler structure that can later optimize
-non-capturing and non-escaping closures.
+non-capturing, non-escaping, and stack-allocatable closures.
 
 ---
 
-# Derived Forms
+# Compiler Architecture
 
-The following forms are expected to be defined as macros, not primitive forms:
+Stage 2 establishes explicit IR strata.
 
-```lisp
-def
-defmacro
-fn
-let
-let*
-if
-do
-set!
-while
-block
-return-from
-return
+The implementation may use more nanopass-sized languages and passes than the
+names listed here, but it should preserve these semantic boundaries:
+
+```text
+Lread      reader data
+Lsurface   primitive forms recognized
+Lresolved  executable symbols resolved to internal references
+Lhir       Larquil high-level IR
+Ljvm       JVM-oriented IR
+Lclass     emitted classfile bytes
 ```
 
-Expected `let` expansion shape:
+## `Lread`: Reader Data
 
-```lisp
-(let ((x a)
-      (y b))
-  body...)
-```
+`Lread` contains only reader-produced data:
 
-expands to:
+- `Symbol`
+- `ArrayList`
+- strings
+- integers
+- booleans
+- `null`
 
-```lisp
-((function (x y)
-   body...)
- a
- b)
-```
+No executable symbol has meaning yet.
+The current namespace has not affected unqualified symbols.
+Source locations should be retained when practical.
 
-Expected `let*` expansion shape:
+## `Lsurface`: Surface AST
 
-```lisp
-(let* ((x a)
-       (y (+ x 1)))
-  body...)
-```
+`Lsurface` recognizes the primitive forms:
 
-expands to nested immediate function calls.
+- `function`
+- `expr`
+- `quote`
 
-Expected `fn` expansion shape:
+At this level:
 
-```lisp
-(fn (x y)
-  body...)
-```
+- malformed primitive forms are reported in source terms
+- function parameter lists are checked
+- low-level template regions are identified
+- quote data is isolated from executable code
+- raw symbols may still appear in executable positions
 
-expands to:
+## `Lresolved`: Resolved AST
 
-```lisp
-(function (x y)
-  body...)
-```
+`Lresolved` replaces executable symbol occurrences with compiler-private
+references.
 
-Expected `def` behavior:
+At this level:
 
-```lisp
-(def answer 42)
-```
+- lexical references are distinct from namespace references
+- delayed namespace references are explicit
+- legacy helper references are distinct from namespace references
+- qualified namespace references bypass lexical lookup
+- unqualified lexical references shadow namespace references
+- source symbols inside quoted data remain symbols
 
-expands to code that evaluates `42`, assigns the resulting value into the
-current namespace under name `answer`, and returns the value.
+`Lresolved` is the first level where future declarations can be meaningfully
+attached to bindings.
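+
+As a rough illustration of these rules, the sketch below annotates one
+function body with the reference kind each symbol would receive.
+The names `f` and `g` are hypothetical, and the annotations describe intent
+rather than a required internal representation.
+
+```lisp
+(function f (x)
+  (expr (g x 'x))   ; g  -> namespace reference, delayed if g is not yet defined
+                    ; x  -> lexical reference to the parameter x
+                    ; 'x -> quoted data, stays an unresolved Symbol
+  (return))
+```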
 
-Expected `if` behavior:
+## `Lhir`: Larquil High-Level IR
 
-`if` may expand to generated labels and template branches.
-The exact macro expansion may change as the compiler IR improves.
+`Lhir` is the main semantic IR.
 
----
+It should still speak in Larquil-level values and operations:
 
-# Compilation Model
+- function literals
+- calls
+- returns
+- lexical references
+- namespace references
+- delayed namespace references
+- quoted literals
+- closure captures
+- low-level template operations
+- labels and branches
+- expression-boundary results
 
-Stage 2 compiler structure should be explicit.
+`Lhir` is the right level for:
 
-Minimum pipeline:
+- closure analysis
+- simple call analysis
+- source-level inlining later
+- declaration checking later
+- type inference later
+- escape analysis later
+- control-flow normalization later
 
-1. Read source into raw forms with names, qualified names, and reader
-   expansions for quote and quasiquote syntax preserved.
-2. Expand the bootstrap `quasiquote` macro where macro source uses
-   quasiquote syntax.
-3. Expand macros using namespace macro tables.
-4. Resolve lexical references versus namespace references.
-5. Analyze functions for parameters, locals, free variables, and captures.
-6. Lower function template bodies and `expr` holes into JVM-oriented IR.
-7. Emit class files through the Larquil-owned classfile writer.
+Template instructions and `expr`-compiled expressions both feed into `Lhir`.
+This is the key rule that prevents Stage 2 from becoming a direct bytecode
+emitter with a few special cases.
 
-Required invariants:
+## `Ljvm`: JVM-Oriented IR
 
-- after macro expansion, no unexpanded macro calls remain in compiled source
-- `expr` lowering leaves exactly one object value on the stack
-- template labels are local to one function body
-- expression-generated labels cannot collide with template labels
-- lexical references shadow unqualified namespace references
-- qualified namespace references bypass lexical lookup
-- delayed namespace references do not require forward declarations
-- macro expansion is ordered and phase-sensitive
+`Ljvm` introduces JVM representation details:
 
-The compiler may use nanopass-style internal languages.
-The important requirement is that parsing, macro expansion, resolution,
-analysis, lowering, and byte emission are not collapsed into one opaque pass.
+- operand stack state
+- JVM local slots
+- category-1/category-2 value widths
+- object versus primitive/raw values
+- field and method descriptors
+- class and method layout
+- exception table requirements, when added
+- verifier-visible control flow
+
+Low-level template operations should be closest to this level, but they still
+arrive through checked IR nodes.
+
+`Ljvm` is the right level for bytecode-specific verification and emission
+preparation.
+
+## `Lclass`: Classfile Emission
+
+`Lclass` is emitted bytes or an equivalent classfile data structure.
+
+No semantic decisions should first appear here.
+If classfile emission needs to know a fact, that fact should have been made
+explicit in an earlier IR level.
+
+---
+
+# Compiler Pass Discipline
+
+Stage 2 should prefer many small passes over monolithic compilation routines.
+
+Indicative passes:
+
+1. read source forms
+2. recognize primitive surface forms
+3. validate function parameter lists
+4. isolate quoted data
+5. construct lexical environments
+6. resolve executable symbols
+7. record delayed namespace references
+8. parse low-level template operations
+9. lower expression boundaries to HIR
+10. construct HIR control flow
+11. analyze locals and captures
+12. verify low-level region boundaries
+13. lower HIR to JVM IR
+14. verify JVM stack/local effects
+15. emit classfile structures
+16. write classfile bytes
+
+The exact pass list may change.
+The important requirement is that each pass has a narrow purpose and a stated
+input/output representation.
+
+Verification passes should be easy to enable during compiler development.
+Optimization passes should be easy to disable.
 
 ---
 
@@ -981,10 +1073,6 @@ Load-time effects include:
 - installing namespace values emitted by explicit bootstrap definition code
 - executing top-level IIFEs
 - executing load-time portions of bootstrap code
-- assigning namespace values emitted by macro-defined `def`
-
-Macro registration needed by later source forms must happen at compile time,
-not only at load time.
 
 Ordinary value definitions may happen at load time.
 References to later values are allowed if the value is installed before the
@@ -992,27 +1080,66 @@ reference is evaluated.
 
 ---
 
+# Longer-Range Direction
+
+Stage 2 should deliberately enable, but not implement, the next layer of Lisp
+surface growth.
+
+Expected later work includes:
+
+- macro expansion
+- quasiquote, unquote, and unquote-splicing reader syntax
+- generated symbols for macro hygiene
+- definition forms
+- ordinary expression-bodied function syntax
+- `fn`, `let`, `if`, `do`, and assignment as derived forms
+- declaration syntax
+- type inference and type-directed JVM lowering
+- local exits and structured control
+- richer low-level JVM source regions
+
+The Stage 2 contribution to all of that is architectural:
+
+- source symbols are separate from resolved references
+- internal bindings exist before lowering
+- HIR exists before JVM representation
+- low-level operations are represented and checked
+- `expr` gives old template code a path into expression compilation
+- classfile emission is no longer where semantic decisions are made
+
+This lets the next layer define new surface syntax without replacing the
+compiler spine again.
+
+---
+
 # Error Behavior
 
 Reader errors:
 
 - malformed strings
 - unsupported string escapes
-- malformed qualified names
+- malformed qualified symbols
 - unmatched parentheses
 
-Compile-time errors:
+Surface-form errors:
 
+- malformed `function`
+- malformed `expr`
+- malformed `quote`
 - duplicate function parameters
+- qualified parameter symbols
+
+Compile-time errors:
+
 - duplicate template labels in one function
 - invalid template instruction shape
-- `expr` outside a function template body
+- invalid method or field descriptor shape
+- `expr` outside a low-level template region
 - `expr` with the wrong number of operands
-- macro transformer does not return source data
-- macro used before it is installed in the expansion phase
-- unquote outside quasiquote expansion
-- unquote-splicing outside list quasiquote expansion
+- expression lowering that does not leave one object value
+- template branch into or out of expression-internal control flow
 - closure capture shape the compiler cannot represent
+- low-level region shape the compiler cannot verify
 
 Compile-time warnings:
 
@@ -1027,6 +1154,10 @@ Runtime errors:
 - invoking a non-callable value
 - JVM verification or linkage errors caused by invalid generated bytecode
 
+Stage 2 should preserve source location where practical so errors can be
+reported in terms of source forms rather than only lowered template or bytecode
+operations.
+
 ---
 
 # Examples
@@ -1041,7 +1172,11 @@ Runtime errors:
   (return))
 ```
 
-## Function With Expression Hole
+This remains accepted.
+Internally, the template forms become low-level operation nodes before JVM
+emission.
+
+## Function With Expression Boundary
 
 ```lisp
 (function add3 (x)
@@ -1050,6 +1185,8 @@ Runtime errors:
 ```
 
 This example assumes `+` is defined by the bootstrap core.
+The `expr` expression is resolved and lowered through the same compiler path as
+other expression code.
 
 ## Closure
 
@@ -1064,6 +1201,9 @@ This example assumes `+` is defined by the bootstrap core.
   (return))
 ```
 
+The inner function captures `n`.
+The capture is represented before JVM lowering.
+
 ## Quoted Source
 
 ```lisp
@@ -1072,15 +1212,10 @@ This example assumes `+` is defined by the bootstrap core.
    (return))
 ```
 
-## Quasiquoted Macro Output
-
-```lisp
-`((function (,name)
-    ,@body)
-  ,value)
-```
+Quoted source remains source data.
+The symbols inside the quoted form are not resolved.
 
-## Qualified Name
+## Qualified Symbol
 
 ```lisp
 (function f (x)
@@ -1088,9 +1223,12 @@ This example assumes `+` is defined by the bootstrap core.
   (return))
 ```
 
+The qualified symbol bypasses lexical lookup.
+
 ## Raw Bootstrap Definition
 
-Before `def` exists, a value may be installed by explicit top-level code:
+Before definition syntax exists, a value may be installed by explicit top-level
+code:
 
 ```lisp
 ((function ()
@@ -1098,15 +1236,16 @@ Before `def` exists, a value may be installed by explicit top-level code:
   (store value)
   (expr 'answer)
   (load value)
-  (invokestatic "com/tailrecursion/larquil/stage2/Runtime"
+  (invokestatic "com/tailrecursion/larquil/stage0/BootstrapRuntime"
                 "defineValue"
                 "(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;")
   (return)))
 ```
 
-The exact helper names are provisional.
+The exact helper names are provisional bootstrap support.
+The example intentionally does not require a permanent Stage 2 runtime package.
 The required capability is explicit namespace value assignment without a
-primitive `def` form.
+primitive definition form.
 
 ---
 
@@ -1116,13 +1255,15 @@ Stage 2 implementation should preserve the bootstrap style:
 
 - prefer Larquil source over Java support where practical
 - keep Java runtime support explicit and small
-- keep namespace and macro helper APIs narrow
+- keep namespace helper APIs narrow
 - keep compiler data Larquil-shaped where possible
 - make compiler passes explicit and testable
+- preserve source locations where practical
 - do not expose implementation-private namespace slots as public language
   objects
-- do not introduce a large permanent surface language before macros can define
-  it
+- do not attach compiler meaning to source symbols
+- do not introduce a large permanent surface language
+- do not let low-level template code bypass IR verification
 
 Temporary host support must be documented if Stage 2 source depends on it.
 
@@ -1134,28 +1275,37 @@ Stage 2 is done when all items below are true:
 
 - existing Stage 1-compatible examples still compile and run
 - the reader parses quoted forms
-- the reader expands quasiquote, unquote, and unquote-splicing syntax
-- the reader parses qualified names
-- malformed qualified names are reader errors
-- `quote` returns source data with names preserved
-- generated names do not collide with reader-created names
-- `larquil.boot/gensym` or equivalent bootstrap access exists
+- the reader parses qualified symbols
+- the reader never qualifies unqualified symbols using the current namespace
+- quoted unqualified symbols remain unqualified
+- malformed qualified symbols are reader errors
+- `quote` returns source data with symbols preserved
+- no public `Name`, `Var`, or `Binding` object is required
+- private compiler binding/reference records exist after resolution
+- executable symbols do not survive as raw symbols after resolution
+- lexical references shadow unqualified namespace references
+- qualified namespace references bypass lexical lookup
+- unresolved ordinary namespace references do not require forward declarations
+- delayed namespace references are explicit compiler nodes
+- unresolved ordinary namespace references warn or fail as documented
 - `function` remains backward-compatible for Stage 1 bodies
+- function bodies are represented as low-level template regions
+- template instructions parse into IR nodes before bytecode emission
+- template labels are local to one function body
+- duplicate template labels are rejected
 - `expr` works inside function bodies
 - `expr` rejects invalid placement and arity
+- `expr` lowering leaves exactly one object value
 - function bodies can mix template instructions and `expr`
 - anonymous `function` can be used as a function value
 - non-capturing functions compile and run
 - capturing functions compile and run
-- lexical references shadow unqualified namespace references
-- qualified namespace references bypass lexical lookup
-- unresolved ordinary namespace references do not require forward declarations
-- unresolved ordinary namespace references warn or fail as documented
-- macro transformers can be manually registered
-- registered macros expand later forms in the same compilation unit
-- `defmacro` can be bootstrapped through manual macro registration
-- `def` can be bootstrapped as a macro
-- `fn`, `let`, and `if` can be bootstrapped as macros
-- compiler pipeline has distinct reader, macro expansion, resolution, analysis,
-  lowering, and emission steps
+- capturing functions still implement `IFunction`
+- closure capture information is explicit before JVM lowering
+- compiler pipeline distinguishes reader data, surface AST, resolved AST,
+  Larquil HIR, JVM IR, and classfile emission
+- compiler IR records function literals, calls, namespace references, lexical
+  locals, captures, template forms, labels, branches, and `expr` boundaries
+- low-level template regions are checked for label locality and stack effect at
+  least to the level required by existing Stage 1 instructions
 - Stage 2 compiler artifacts pass JVM verification when loaded
diff --git a/specs/stage3.md b/specs/stage3.md
new file mode 100644
index 0000000..9b465a6
--- /dev/null
+++ b/specs/stage3.md
@@ -0,0 +1,761 @@
+# Larquil Bootstrap Language (Stage 3)
+
+Stage 3 builds on the Stage 2 compiler spine.
+
+Purpose: bootstrap the first macro-defined Larquil surface language.
+
+Goal: add ordered macro expansion, quasiquote reader syntax, generated symbols,
+and a small set of derived source forms without expanding the primitive
+language kernel.
+
+Stage 3 remains a bootstrap language.
+It is not the final Larquil or Riptide language.
+
+---
+
+# Core Principles
+
+1. Preserve Stage 2 source compatibility.
+2. Keep `function`, `expr`, and `quote` as the primitive source forms.
+3. Insert macro expansion into the Stage 2 compiler pipeline before
+   environment resolution.
+4. Bootstrap `defmacro`; do not make it primitive.
+5. Define higher-level forms as macros over Stage 2 capabilities.
+6. Keep namespace lookup separate from legacy `load-function`.
+7. Use generated-symbol identity for macro hygiene.
+8. Keep macro expansion ordered and phase-sensitive.
+9. Keep internal binding/reference records private to the compiler.
+10. Preserve low-level template regions as the user-facing JVM power boundary.
+
+---
+
+# Relationship To Prior Stages
+
+Stage 3 retains the Stage 2 runtime vocabulary:
+
+- `Symbol`
+- `List<Object>` with `ArrayList` as the bootstrap representation
+- `IFunction`
+- `Namespace`
+- `null`
+- Java `Boolean`
+
+Stage 3 retains Stage 2 symbol reading:
+
+- the reader does not consult the current namespace
+- unqualified symbols remain unqualified
+- qualified symbols preserve namespace and local parts
+- quoted symbols are not resolved
+- executable symbols are resolved by the compiler after macro expansion
+
+Stage 3 retains Stage 2 compiler strata and inserts one new stratum:
+
+```text
+Lread      reader data
+Lsurface   primitive forms and reader-expanded forms recognized
+Lexpanded  macro calls expanded
+Lresolved  executable symbols resolved to internal references
+Lhir       Larquil high-level IR
+Ljvm       JVM-oriented IR
+Lclass     emitted classfile bytes
+```
+
+Low-level template regions are still the low-level source sublanguage.
+
+---
+
+# Reader Additions
+
+Stage 3 adds reader syntax for quasiquote, unquote, and unquote-splicing.
+
+Reader syntax:
+
+```lisp
+`datum
+,expr
+,@expr
+```
+
+Reader expansion:
+
+```lisp
+`x     ; (quasiquote x)
+,x     ; (unquote x)
+,@x    ; (unquote-splicing x)
+```
+
+These are reader transformations only.
+`quasiquote`, `unquote`, and `unquote-splicing` are ordinary symbols in source
+forms.
+The macro layer gives them meaning.
+
+Examples:
+
+```lisp
+`(if ,test ,then ,else)
+`(do ,@body)
+```
+
+Rules:
+
+- the reader expands syntax only
+- `unquote` is meaningful only while expanding `quasiquote`
+- `unquote-splicing` is meaningful only while expanding a list inside
+  `quasiquote`
+- malformed quasiquote forms are macro-expansion errors
+- nested quasiquote may be deferred if bootstrap source does not need it
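+
+A small worked case, assuming only the reader expansions listed above: both
+examples are read as ordinary nested lists before any macro runs.
+
+```lisp
+;; reader input
+`(if ,test ,then ,else)
+
+;; reader output (plain source data)
+(quasiquote (if (unquote test) (unquote then) (unquote else)))
+
+;; reader input
+`(do ,@body)
+
+;; reader output (plain source data)
+(quasiquote (do (unquote-splicing body)))
+```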
+
+---
+
+# Macro Tables
+
+A namespace may contain a macro table in addition to its runtime value table.
+
+The macro table maps local symbol names to macro transformer functions.
+The table is namespace/compiler state.
+It is not a public `Var`, `Binding`, or `Slot` object.
+
+Macro lookup:
+
+- unqualified macro names are looked up in the current namespace macro table
+- qualified macro names are looked up in the named namespace macro table
+- macro lookup in operator position happens before ordinary lexical/operator
+  resolution
+- macro lookup happens before ordinary function-call lowering
+
+There is no delayed resolution for macro names.
+A macro must be installed before a form using that macro is expanded.
+
+---
+
+# Macro Transformers
+
+Macro transformers are Larquil functions.
+
+Initial transformer ABI:
+
+```lisp
+(function (form env)
+  ...)
+```
+
+Arguments:
+
+- `form` is the full source form being expanded
+- `env` is an expansion environment object, or `null` until such an object
+  exists
+
+Return value:
+
+- replacement source data
+
+Rules:
+
+- transformer input contains `Symbol`, `ArrayList`, strings, integers,
+  booleans, and `null`
+- transformer output must be valid source data
+- returned source is expanded again until no macro call remains at the current
+  expansion point
+- a macro returning `'foo` returns an unqualified symbol
+- that symbol resolves at the expansion site unless the macro qualifies it or
+  generates a fresh symbol
+- macros should emit qualified symbols for intended global references
+- macros should use generated symbols for fresh locals
+
+---
+
+# Manual Macro Registration
+
+`defmacro` is not primitive.
+It is bootstrapped through explicit compile-time registration.
+
+Macro registration used by later forms in the same compilation unit must happen
+during compilation, not only when the generated loader is run.
+
+Conceptual manual registration body:
+
+```lisp
+(function ()
+  (expr
+    (function (form env)
+      ;; transformer body
+      ...))
+  (store transformer)
+  (expr 'when)
+  (load transformer)
+  (invokestatic "com/tailrecursion/larquil/stage0/BootstrapRuntime"
+                "defineMacro"
+                "(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;")
+  (return))
+```
+
+The exact helper owner/name/descriptors are provisional bootstrap support.
+The required behavior is that bootstrap source can arrange for this body to run
+during compilation before later forms that use the macro are expanded.
+
+Once this exists, `defmacro` can be defined as a macro whose expansion emits
+compile-time macro registration code.
+
+---
+
+# Gensym
+
+Stage 3 exposes generated symbols for macro hygiene.
+
+After bootstrap, the facility should be available as:
+
+```lisp
+larquil.boot/gensym
+```
+
+Examples:
+
+```lisp
+(larquil.boot/gensym)
+(larquil.boot/gensym "tmp")
+```
+
+Rules:
+
+- each call returns a generated symbol
+- generated symbols cannot collide with reader-created source symbols
+- generated symbols remain usable in quoted and quasiquoted forms
+- generated symbol equality is by generated identity, not only by printed text
+- gensym output is unique within one compiler/load session
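+
+A typical use might look like the sketch below.
+It assumes `defmacro`, `let`, and `if` are already installed as macros, and
+`or2` is a hypothetical macro name used only for illustration.
+
+```lisp
+(defmacro or2 (a b)
+  ;; Bind the first operand once, under a symbol that source code cannot
+  ;; write, so the temporary cannot capture a user variable.
+  (let ((tmp (larquil.boot/gensym "tmp")))
+    `(let ((,tmp ,a))
+       (if ,tmp ,tmp ,b))))
+```
+
+Because the temporary is a generated symbol, an expression passed as `b`
+cannot accidentally refer to it, whatever names that expression uses.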
+
+---
+
+# Derived Forms
+
+The following forms are expected to be defined as macros:
+
+```lisp
+def
+defmacro
+fn
+let
+let*
+if
+do
+set!
+while
+block
+return-from
+return
+```
+
+These forms may become compiler-recognized later for diagnostics or
+optimization.
+They should not become permanent primitives merely for implementation
+convenience.
+
+---
+
+# `defmacro`
+
+Example:
+
+```lisp
+(defmacro when (test &body body)
+  `(if ,test
+       (do ,@body)
+       null))
+```
+
+Expected behavior:
+
+1. create a transformer function
+2. install it in the current namespace macro table
+3. make it available to later forms in the same compilation unit
+4. return an implementation-defined compile-time value
+
+`defmacro` itself is defined only after manual macro registration is available.
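+
+Given the `when` definition above, one expansion step would rewrite a call as
+sketched below.
+The names `ready`, `start`, and `stop` are hypothetical.
+Since `if` and `do` are themselves expected to be macros, expansion would
+continue on the result.
+
+```lisp
+;; source form
+(when ready (start) (stop))
+
+;; after one expansion step of when
+(if ready
+    (do (start) (stop))
+    null)
+```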
+
+Minimum macro lambda-list support:
+
+- fixed positional parameters
+- an optional `&body` parameter that binds the remaining forms
+
+Example:
+
+```lisp
+(defmacro ignore1 (x)
+  null)
+
+(defmacro when (test &body body)
+  ...)
+```
+
+Full Common Lisp lambda lists are not required in Stage 3.
+
+---
+
+# `def`
+
+Syntax:
+
+```lisp
+(def name value)
+```
+
+Example:
+
+```lisp
+(def answer 42)
+```
+
+Expected behavior:
+
+1. evaluate `value`
+2. assign the result into the current namespace value table under `name`
+3. return the value
+
+Before `def` exists, the same capability may be expressed by explicit top-level
+code:
+
+```lisp
+((function ()
+  (expr 42)
+  (store value)
+  (expr 'answer)
+  (load value)
+  (invokestatic "com/tailrecursion/larquil/stage0/BootstrapRuntime"
+                "defineValue"
+                "(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;")
+  (return)))
+```
+
+The exact helper names are provisional bootstrap support.
+
+---
+
+# `fn`
+
+Syntax:
+
+```lisp
+(fn (param ...)
+  body...)
+```
+
+Expected expansion shape:
+
+```lisp
+(function (param ...)
+  body...)
+```
+
+Example:
+
+```lisp
+(fn (x)
+  (larquil.core/+ x 1))
+```
+
+The first implementation may expand `fn` bodies into Stage 2-compatible
+`function` bodies with `expr` boundaries where needed.
+For example, a single-expression body may lower as:
+
+```lisp
+(function (param ...)
+  (expr body)
+  (return))
+```
+
+Multiple expression bodies may require `do` to be installed first, or may be
+temporarily restricted during bootstrap.
+The long-range direction is ordinary expression-bodied functions.
+
+---
+
+# `let` And `let*`
+
+Parallel `let`:
+
+```lisp
+(let ((x a)
+      (y b))
+  body...)
+```
+
+Expected expansion shape:
+
+```lisp
+((function (x y)
+   body...)
+ a
+ b)
+```
+
+Rules:
+
+- initializers evaluate in the incoming environment
+- bindings are established only for the body
+- lexical shadowing follows from function parameter scope
+
+As with `fn`, an initial implementation may lower the function body through
+`expr` and `do`:
+
+```lisp
+((function (x y)
+   (expr (do body...))
+   (return))
+ a
+ b)
+```
+
+Sequential `let*`:
+
+```lisp
+(let* ((x a)
+       (y (larquil.core/+ x 1)))
+  body...)
+```
+
+Expected expansion shape:
+
+```lisp
+(let ((x a))
+  (let ((y (larquil.core/+ x 1)))
+    body...))
+```
+
+---
+
+# `if`
+
+Syntax:
+
+```lisp
+(if test then else)
+```
+
+Expected behavior:
+
+- evaluate `test`
+- if truthy, evaluate `then`
+- otherwise evaluate `else`
+- return the selected branch value
+
+Truth remains Stage 2-compatible:
+
+- `null` is falsey
+- `Boolean.FALSE` is falsey
+- all other values are truthy
+
+The first implementation may expand `if` to generated labels, temporaries, and
+low-level template branches.
+The compiler may later recognize the expansion for destination-aware lowering.
+
+Stage 2-compatible expansion shape:
+
+```lisp
+((function ()
+  (expr test)
+  (jump-if-false else-label)
+  (expr then)
+  (return)
+  (label else-label)
+  (expr else)
+  (return)))
+```
+
+The actual expansion must generate labels that cannot collide with user labels.
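+
+One way to meet that requirement is sketched below.
+It assumes `defmacro` and `let` are installed before `if`, that quasiquote
+expansion is available, and that template labels accept generated symbols.
+None of these assumptions is fixed by this spec.
+
+```lisp
+(defmacro if (test then else)
+  ;; A fresh label per expansion keeps nested or repeated uses of if from
+  ;; colliding with each other or with labels written by the user.
+  (let ((else-label (larquil.boot/gensym "else")))
+    `((function ()
+        (expr ,test)
+        (jump-if-false ,else-label)
+        (expr ,then)
+        (return)
+        (label ,else-label)
+        (expr ,else)
+        (return)))))
+```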
+
+---
+
+# `do`
+
+Syntax:
+
+```lisp
+(do form...)
+```
+
+Expected behavior:
+
+- evaluate forms left-to-right
+- discard intermediate values
+- return the final value
+- return `null` when there are no forms
+
+Example:
+
+```lisp
+(do
+  (print "start")
+  (larquil.core/+ 1 2))
+```
+
+Stage 2-compatible expansion shape:
+
+```lisp
+((function ()
+  (expr form1)
+  (pop)
+  ...
+  (expr final-form)
+  (return)))
+```
+
+For an empty `do`, the expansion returns `null`.
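+
+One possible shape for the empty case, assuming the `aconst-null` template
+instruction used in the low-level example later in this document is
+available:
+
+```lisp
+((function ()
+  (aconst-null)
+  (return)))
+```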
+
+---
+
+# `set!`
+
+Syntax:
+
+```lisp
+(set! place value)
+```
+
+Stage 3 only requires lexical variable assignment.
+
+Example:
+
+```lisp
+(let ((x 1))
+  (set! x 2)
+  x)
+```
+
+Rules:
+
+- `place` must resolve to a mutable lexical binding
+- assigning an immutable binding is a compile-time error
+- assigning an unresolved namespace reference is not required
+- namespace assignment should use explicit namespace helper calls until a place
+  model exists
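+
+The sketch below shows the case that matters most for closures: an assignment
+to a captured local.
+It assumes `def`, `fn`, `let`, and `do` are installed and that `let` bindings
+are mutable, as the example above implies; `make-counter` is a hypothetical
+name.
+
+```lisp
+(def make-counter
+  (fn ()
+    (let ((n 0))
+      ;; The inner fn captures n. Each call should observe the value left
+      ;; by the previous call, per the shared mutation rule for captures.
+      (fn ()
+        (do
+          (set! n (larquil.core/+ n 1))
+          n)))))
+```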
+
+---
+
+# Local Exits
+
+The expected surface forms are:
+
+```lisp
+(block name body...)
+(return-from name value)
+(return value)
+```
+
+Stage 3 may defer full implementation of these forms.
+
+Local-only cases may expand to generated labels and temporaries.
+Nonlocal exits across closures require additional compiler/runtime support and
+must not be hidden behind an unsound macro expansion.
+
+If `return` is provided, it should return from the nearest implicit block
+established by a function body.
+If that implicit block rule is not yet implemented, `return` must follow an
+explicitly specified policy instead.
+
+---
+
+# Delayed Runtime Resolution
+
+Stage 3 keeps the Stage 2 delayed namespace reference model for ordinary
+runtime values.
+
+Example:
+
+```lisp
+(def even?
+  (fn (n)
+    (if (larquil.core/= n 0)
+        true
+        (odd? (larquil.core/- n 1)))))
+
+(def odd?
+  (fn (n)
+    (if (larquil.core/= n 0)
+        false
+        (even? (larquil.core/- n 1)))))
+```
+
+This source should not require a Clojure-style forward declaration for `odd?`.
+The delayed reference is a namespace reference created by the use of `odd?`
+inside the value assigned by `def even?`.
+It is not a consequence of named top-level `function` by itself.
+
+Rules:
+
+- unresolved ordinary runtime references are allowed during compilation
+- a compilation unit may warn about references that remain unresolved when the
+  unit ends
+- evaluating a still-unbound namespace reference at runtime is an
+  undefined-name error
+- macro names are not delayed in the same way
+
+---
+
+# Compiler Model
+
+Stage 3 preserves the Stage 2 IR boundaries and adds macro expansion before
+resolution.
+
+Minimum pipeline:
+
+1. read source into `Lread`
+2. expand quote/quasiquote reader syntax into ordinary forms
+3. recognize primitive surface forms and reader-expanded forms
+4. run ordered macro expansion to produce `Lexpanded`
+5. resolve executable symbols to compiler-private references
+6. lower to Larquil HIR
+7. analyze functions, locals, captures, and low-level regions
+8. lower HIR to JVM IR
+9. emit class files
+
+Required invariants:
+
+- after macro expansion, no unexpanded macro calls remain in compiled source
+- macro expansion is ordered and phase-sensitive
+- macro expansion happens before ordinary symbol resolution
+- quoted data remains unexpanded data unless a macro explicitly constructs new
+  source from it
+- generated symbols remain distinct from reader-created symbols
+- executable symbols do not survive as raw symbols after resolution
+- delayed namespace references do not require forward declarations
+- closure capture information is explicit before JVM lowering
+- low-level template regions still pass through IR verification
+
+---
+
+# Error Behavior
+
+Reader errors:
+
+- malformed strings
+- unsupported string escapes
+- malformed qualified symbols
+- malformed reader quasiquote syntax
+- unmatched parentheses
+
+Macro-expansion errors:
+
+- macro used before it is installed
+- macro transformer does not return valid source data
+- `unquote` outside quasiquote expansion
+- `unquote-splicing` outside list quasiquote expansion
+- malformed derived-form syntax
+
+Compile-time errors:
+
+- duplicate function parameters
+- duplicate template labels in one function
+- invalid template instruction shape
+- `expr` outside a low-level template region
+- `expr` with the wrong number of operands
+- assignment to an immutable lexical binding
+- closure capture shape the compiler cannot represent
+
+Compile-time warnings:
+
+- unresolved namespace references remaining at compilation-unit end
+- implementation-specific warnings for dynamic namespace calls or missed direct
+  calls may be added later
+
+Runtime errors:
+
+- wrong function arity
+- evaluating an unbound namespace reference
+- invoking a non-callable value
+- JVM verification or linkage errors caused by invalid generated bytecode
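+
+A few of the cases above, sketched as source that is expected to be rejected.
+The names `bad-unquote` and `bad-params` are hypothetical, `add1` is the
+one-argument function from the Examples section, and `;` is assumed to start
+a line comment.
+
+```lisp
+;; Macro-expansion error: `unquote` outside quasiquote expansion.
+(def bad-unquote
+  (fn (x)
+    ,x))
+
+;; Compile-time error: duplicate function parameters.
+(def bad-params
+  (fn (x x)
+    x))
+
+;; Runtime error: wrong function arity (`add1` takes one argument).
+(add1 1 2)
+```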
+
+---
+
+# Examples
+
+## Macro Definition
+
+```lisp
+(defmacro when (test &body body)
+  `(if ,test
+       (do ,@body)
+       null))
+```
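+
+A sketch of one use of this macro and the source it would be expected to
+expand to.  The call is hypothetical: `ready` and `print` are assumed to be
+bound, and `;` is assumed to start a line comment.
+
+```lisp
+;; Macro call as written in source.
+(when ready
+  (print 1)
+  (print 2))
+
+;; Expected expansion: `test` fills the `if` condition and the body forms
+;; are spliced into `do`.
+(if ready
+    (do (print 1)
+        (print 2))
+    null)
+```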
+
+## Function Definition
+
+```lisp
+(def add1
+  (fn (x)
+    (larquil.core/+ x 1)))
+```
+
+## Local Binding
+
+```lisp
+(let ((x 1)
+      (y 2))
+  (larquil.core/+ x y))
+```
+
+## Recursive Definitions
+
+```lisp
+(def fact
+  (fn (n)
+    (if (larquil.core/<= n 1)
+        1
+        (larquil.core/* n (fact (larquil.core/- n 1))))))
+```
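+
+Because delayed namespace references do not require forward declarations,
+mutually recursive definitions should work the same way.  A minimal sketch
+with hypothetical names `count-down-a` and `count-down-b`, assuming `;`
+starts a line comment:
+
+```lisp
+;; `count-down-b` is referenced here before it is defined.
+(def count-down-a
+  (fn (n)
+    (if (larquil.core/<= n 0)
+        0
+        (count-down-b (larquil.core/- n 1)))))
+
+(def count-down-b
+  (fn (n)
+    (if (larquil.core/<= n 0)
+        0
+        (count-down-a (larquil.core/- n 1)))))
+```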
+
+## Low-Level Escape Hatch
+
+```lisp
+(def print
+  (function (x)
+    (getstatic "java/lang/System" "out" "Ljava/io/PrintStream;")
+    (load x)
+    (invokevirtual "java/io/PrintStream" "println" "(Ljava/lang/Object;)V")
+    (aconst-null)
+    (return)))
+```
+
+---
+
+# Implementation Discipline
+
+Stage 3 implementation should preserve the bootstrap style:
+
+- prefer Larquil source over Java support where practical
+- keep Java runtime support explicit and small
+- keep namespace and macro helper APIs narrow
+- keep compiler data Larquil-shaped where possible
+- make compiler passes explicit and testable
+- preserve source locations where practical
+- do not expose implementation-private namespace slots as public language
+  objects
+- do not attach compiler meaning to source symbols
+- do not make macro-defined forms permanent primitives merely for convenience
+- do not let low-level template code bypass IR verification
+
+Temporary host support must be documented if Stage 3 source depends on it.
+
+---
+
+# Stage 3 Done Checklist
+
+Stage 3 is done when all items below are true:
+
+- Stage 2-compatible examples still compile and run
+- the reader expands quasiquote, unquote, and unquote-splicing syntax
+- `quasiquote` expansion handles lists, symbols, primitive literals,
+  `unquote`, and `unquote-splicing`
+- malformed quasiquote forms are errors
+- generated symbols do not collide with reader-created symbols
+- `larquil.boot/gensym` or equivalent bootstrap access exists
+- macro transformers can be manually registered
+- registered macros expand later forms in the same compilation unit
+- macro use before macro definition is an expansion error
+- `defmacro` can be bootstrapped through manual macro registration
+- `def` can be bootstrapped as a macro
+- `fn`, `let`, `let*`, `if`, and `do` can be bootstrapped as macros
+- lexical `set!` is either bootstrapped or explicitly deferred, in which case
+  using it is an error
+- ordinary runtime forward references do not require declarations
+- compiler pipeline distinguishes reader data, surface AST, expanded source,
+  resolved AST, Larquil HIR, JVM IR, and classfile emission
+- macro expansion preserves Stage 2 symbol and namespace semantics
+- low-level template regions still pass Stage 2 verification
+- Stage 3 compiler artifacts pass JVM verification when loaded