ALAN — Alan's Language Aptitude iNstrument
ALAN is a fully self-contained artificial-language aptitude assessment inspired by DLAB-style tasks. It generates a consistent micro-grammar, produces a 32-item multiple-choice test, renders a booklet and answer key, and validates every form against strict grammatical and psychometric properties. The goal is to measure how quickly and accurately someone can infer and apply unfamiliar language rules—skills that map to disciplined software reasoning (spec reading, edge-case handling, protocol compliance). Test generation guarantees that no correct answer meaning or surface repeats across items.
What This Is
- Purpose: Measure rapid rule inference, pattern generalization, and attention to fine-grained grammatical cues—abilities correlated with learning new syntactic systems and with disciplined software engineering (spec reading, refactoring, edge-case handling).
- Format: 32 multiple-choice items across sections that introduce rules, then test them with strictly grammatical distractors that differ by exactly one semantic/morphosyntactic feature (minimal pairs).
- Artifacts produced: generated_test.json (the canonical test), test_booklet.txt (questions only), and answer_key.txt (answers with explanations). The booklet header embeds the seed, git SHA, and generation parameters in a compact monospace “Test Version” line so any form can be reproduced (see the snippet after this list).
- Dependencies: a bundled ./python (Python 3 compiled as a Cosmopolitan/APE fat binary) is included; no external libraries are required.
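To check which form you are holding, you can also read the generation metadata straight from the canonical JSON. The field names below (meta, seed, git_sha, items) are assumptions inferred from this README; the authoritative schema lives in meta_schema.py.

```python
import json

# Inspect the generation metadata recorded in the canonical test file.
# Field names ("meta", "seed", "git_sha", "items") are assumed; see meta_schema.py
# for the authoritative schema.
with open("generated_test.json") as f:
    test = json.load(f)

meta = test.get("meta", {})
print("seed:   ", meta.get("seed"))
print("git sha:", meta.get("git_sha"))
print("items:  ", len(test.get("items", [])))
```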
Why It Works (Theory & Inspirations)
- DLAB-style artificial grammar learning: Tasks that require inferring a controlled micro-grammar are classic measures of language-learning aptitude. ALAN uses a deterministic grammar with prefix stacking, fixed word order, and minimal irregularities to elicit rule induction rather than memorization.
- Psychometric design: Each distractor is a grammatical minimal pair differing by one feature (tense, number, gender, role, adjective scope, regular vs irregular). This reduces guessing via surface errors and increases discriminative power.
- Reliability controls: Property-based tests enforce grammar validity, one-correct-answer semantics, irregular contrasts, structural diversity (ditransitives, feminine-plural receivers, adjective-bearing NPs), and coverage quotas.
- Construct alignment with software practice: Success requires precise rule following, rapid pattern spotting, and handling edge cases (irregulars)—abilities useful in commercial software roles (debugging, code review, protocol/spec compliance).
Background Reading (Psychometrics & AGL)
- Artificial Grammar Learning: Reber, A. S. (1967). “Implicit learning of artificial grammars.” Journal of Verbal Learning and Verbal Behavior.
- Language aptitude and grammar inference: Ellis, N. C. (2005). “At the interface: Dynamic interactions of explicit and implicit language knowledge.” Studies in Second Language Acquisition.
- Item response theory and minimal pairs: Hambleton, R. K., & Swaminathan, H. (1985). Item Response Theory: Principles and Applications.
- DLAB overview: Wikipedia – Defense Language Aptitude Battery
- Distractor design and discrimination: Haladyna, T. M., & Downing, S. M. (1989). “Validating multiple-choice test items.”
Quick Start
```bash
./python main.py --out generated_test.json
./python render_text.py --in generated_test.json --test-out test_booklet.txt --key-out answer_key.txt
make run               # same as above (uses bundled ./python)
cat test_booklet.txt   # view the booklet
cat answer_key.txt     # view the key
```
- Each run is written to runs/<timestamp>_seed<...>_hardness<...>/generated_test.json (and copied to --out for convenience). Override the base folder with --run-dir or the subfolder name with --run-name. Rendered outputs are also mirrored into the run directory so each run folder is self-contained.
- PDF output: requires pdflatex (from TeX Live). Example:

```bash
./python render_text.py --in generated_test.json \
  --test-out test_booklet.txt --key-out answer_key.txt \
  --test-pdf test_booklet.pdf --key-pdf answer_key.pdf
```

If pdflatex is missing, the script skips PDF generation and reports the issue; the Markdown .txt outputs still render correctly. The booklet/key are rendered as Markdown for the .txt outputs, and LaTeX is generated directly for PDF builds.
Administering ALAN
- Prepare materials: run make run to produce test_booklet.txt and answer_key.txt. Print or distribute the booklet only.
- Time: 25–30 minutes is typical for 32 items; you can standardize at 30 minutes for comparability.
- Instructions to candidates:
- “You will see a small dictionary, a short rule cheat sheet, and examples. Every question has four options; exactly one is correct. All sentences follow the published rules—no tricks. Work quickly but carefully.”
- “This measures how well you can infer and apply new language rules; no linguistics background is required.”
- Environment: Quiet room, no external aids. Paper or on-screen is fine.
- Scoring: 1 point per correct item, no guessing penalty. Max = 32.
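Scoring can be automated in a few lines of Python. The sketch below assumes you transcribe the letter answers from answer_key.txt into a dict; the two entries shown are placeholders, not real key values.

```python
# Minimal scoring sketch: 1 point per correct item, no guessing penalty, max 32.
# The answer letters here are placeholders; transcribe the real key from answer_key.txt.
answer_key = {1: "B", 2: "D"}   # ... continue through item 32
responses = {1: "B", 2: "A"}    # the candidate's marked answers

score = sum(1 for item, correct in answer_key.items() if responses.get(item) == correct)
print(f"Score: {score} / {len(answer_key)}")
```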
Interpreting Scores (Commercial Software Context)
These bands are informal heuristics, assuming proctored conditions and naïve candidates:
- 27–32 (Excellent): Strong rule-induction and precision. Likely excels at roles requiring rapid onboarding to new codebases, complex refactors, API/protocol design, formal verification, or compiler/infra work.
- 22–26 (Strong): Solid pattern learning and attention to detail. Suited to backend/product engineering, systems integration, data engineering; should pick up new stacks quickly.
- 17–21 (Moderate): Adequate but may need more scaffolding. Good for roles with clearer guardrails (feature work, QA automation, internal tooling) where patterns are stable.
- ≤16 (Developing): May struggle with opaque specs or fast-changing systems. Benefits from mentorship, pairing, and stronger process/linters.
Do not use the score as a sole hiring gate; treat it as one data point alongside interviews, work samples, and references.
How the Grammar Works (Cheat Sheet Summary)
- Word order: DOER – RECEIVER – (THEME if ‘give’) – VERB. Verb is last.
- Prefix stack on nouns: na (receiver) + mem (feminine) + leko (plural) + noun; doer adds suffix mur (see the sketch after this list).
- Adjectives: Follow the noun they modify.
- Tense: present = bare verb; past = verb + mimu, except the irregular chase past = rontmimu.
- Irregular plural: boy plural = letul (regular would be lekotul).
- Receiver marking: na- applies to the whole NP (including mem/leko).
- Feminine plural: memleko + noun for feminine humans only.
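For intuition, here is a minimal Python sketch of the noun-phrase stack described above. It is illustrative only (the canonical renderer is language_spec.realize_sentence), and the stem tul for “boy” is inferred from the lekotul/letul contrast.

```python
# Illustrative sketch of the noun prefix stack (na + mem + leko + noun, doer suffix -mur).
# Not the project's renderer; see language_spec.realize_sentence for the canonical rules.

IRREGULAR_PLURALS = {"tul": "letul"}  # "boy" pluralizes irregularly (regular would be lekotul)

def render_np(stem, *, receiver=False, feminine=False, plural=False, doer=False):
    if plural and stem in IRREGULAR_PLURALS:
        form = IRREGULAR_PLURALS[stem]   # irregular plural replaces leko + stem
    elif plural:
        form = "leko" + stem
    else:
        form = stem
    if feminine:
        form = "mem" + form              # feminine marker sits inside na- and outside leko-
    if receiver:
        form = "na" + form               # na- scopes over the whole stack
    if doer:
        form += "mur"                    # doer role is a suffix, not a prefix
    return form

print(render_np("tul", doer=True))                   # -> tulmur
print(render_np("tul", receiver=True, plural=True))  # -> naletul (irregular plural)
```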
Generation & Validation Pipeline
- Canonical rendering: All surfaces are built from feature structures through language_spec.realize_sentence.
- Minimal-pair distractors: Each distractor clones the correct feature bundle and flips exactly one feature (tense, number, gender where applicable, adjective presence, role swap, or irregular toggle). Anything ungrammatical or semantically duplicate is rejected (see the sketch after this list).
- Property tests (enforced on every generation):
- Exactly one correct meaning per item; meanings across A–D are unique.
- Correct answers are globally unique in both meaning and surface across all items.
- All options are grammatical (word order, stack order, na-scope, doer -mur, adjective-after-noun).
- Distractors at semantic distance = 1 from target.
- Irregulars (letul, rontmimu) appear in contrastive contexts; distribution quotas enforced.
- Structural diversity quotas for ditransitives, plurals, adjectives, feminine plurals.
- No prefix/suffix ordering violations.
- Tense/number/gender surfaces remain distinct (no collapses).
- JSON output validated against the canonical feature-structure schema (meta_schema.py).
- Regeneration: main.py synthesizes a test via a deterministic backtracking generator and only writes output if all property tests pass. Seeds are recorded in meta for reproducibility. Use ./python main.py --seed 424242 --out generated_test.json. Set BACKTRACK_TIMEOUT (seconds) to bound search time (default 20s). Use --run-dir to choose a base dir (default runs/), and --run-name to override the subdirectory (otherwise timestamp+params).
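The minimal-pair flip above works, in spirit, like the following sketch. The feature names and bundle shape are hypothetical; the real implementation lives in generator.py, and every candidate is rendered through language_spec.realize_sentence before being accepted.

```python
import copy

# Hypothetical feature bundle for a correct answer; the real schema is defined in meta_schema.py.
correct = {"tense": "past", "receiver_number": "plural",
           "receiver_gender": "fem", "adjective": True}

def flip_one(features, feature, value):
    """Clone the correct bundle and change exactly one feature (semantic distance = 1)."""
    distractor = copy.deepcopy(features)
    distractor[feature] = value
    return distractor

candidates = [
    flip_one(correct, "tense", "present"),
    flip_one(correct, "receiver_number", "singular"),
    flip_one(correct, "adjective", False),
]
# In the real pipeline each candidate is then rendered canonically and rejected if it is
# ungrammatical or duplicates another option's meaning or surface.
```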
Backtracking Synthesis
Generation is now backed by a deterministic backtracking generator:
- CLI: ./python main.py --seed 424242 --out generated_test.json (runs backtracking then property_tests).
- Hardness presets (coverage targets): --hardness easy|medium|hard|extreme (default: medium). Higher settings raise quotas for irregulars, plurals, adjectives, fem-plurals, and ditransitives while still passing property tests, and limit reuse of identical noun/adjective “clues” to reduce redundancy.
- Library: generator.generate_test(spec, blueprint, concepts, rng, seed, git_sha, hardness) to build a test dict that respects section constraints, global uniqueness, and passes property_tests.validate_data.
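A library-level wiring sketch is shown below. Only generate_test and validate_data are documented above; where the spec, blueprint, and concepts objects come from (presumably language_spec.py and test_blueprint.py) is an assumption, so they are left as parameters and this should be treated as a template rather than a drop-in script.

```python
"""Wiring sketch for generator.generate_test; argument order follows the signature above."""
import random

import generator
import property_tests

def build_test(spec, blueprint, concepts, seed=424242, hardness="medium", git_sha="unknown"):
    # spec/blueprint/concepts must be supplied by the caller (their export names in
    # language_spec.py / test_blueprint.py are not documented here).
    rng = random.Random(seed)
    test = generator.generate_test(spec, blueprint, concepts, rng, seed, git_sha, hardness)
    property_tests.validate_data(test)  # gatekeeper checks (grammaticality, uniqueness, quotas)
    return test
```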
Proctoring Guidance
- Keep the cheat sheet and dictionary visible with the booklet; candidates should not need prior linguistics knowledge.
- Do not give the answer key to candidates. Collect booklets before revealing answers.
- If remote, time-box and supervise; ask candidates to share screen if feasible.
Mapping to Roles (Examples)
- Infra/Platform/Compilers: Look for 27–32; high irregular handling and minimal-pair reasoning align with spec-heavy work.
- Backend/Product: 22–26 suggests strong fit; quick uptake of API contracts and data models.
- QA/Automation/Release: 17–21 can be effective with processes; use score to tailor onboarding (more scaffolding).
- Entry/Support: ≤16 indicates need for structured training; avoid dropping into ambiguous, underspecified projects.
Research & Inspirations
- Artificial grammar learning (AGL): Classic paradigm for measuring rule induction (Reber, 1967; more recent AGL studies). ALAN adapts AGL principles to a morphosyntactic micro-grammar.
- DLAB-style aptitude tests: Uses controlled artificial languages to avoid prior knowledge effects and to test rapid pattern extraction.
- Psychometric good practice: Minimal-pair distractors, single-key answers, controlled difficulty progression, and automated validation to reduce construct-irrelevant variance.
Files Overview
language_spec.py — Grammar, lexicon, canonical renderer, irregulars.
test_blueprint.py — Section structure and default blueprint.
generator.py — Feature-based item generation and minimal-pair distractors via deterministic backtracking search.
property_tests.py — Gatekeeper checks (grammaticality, uniqueness, quotas).
meta_schema.py — Lightweight JSON schema validator for the canonical feature-structure format.
render_text.py — Converts JSON to booklet and answer key.
main.py — CLI to generate JSON; retries until all properties pass.
Makefile — make run builds everything; make clean removes artifacts.
answer_key.txt, test_booklet.txt, generated_test.json — outputs from the last generation.
python — bundled Python 3 APE binary used by the Makefile and scripts.
Taking the Test (Candidate View)
- Read the cheat sheet and examples; note prefix order, adjective position, verb last, and irregulars.
- For each question, compare options as minimal pairs: check the tense marker, plural/gender markers, role markers (na + stack), adjective placement, and irregular forms.
- Exactly one option fits the target meaning under the published rules.
Limitations & Ethics
- This is a single data point; do not use it as a sole hiring filter.
- Cultural and linguistic neutrality is intended but not guaranteed; ensure accessibility accommodations as needed.
- Scores can be affected by test-taking anxiety or unfamiliarity with such tasks; interpret cautiously.