| author | Alan Dipert <alan@dipert.org> 2025-12-04 06:00:05 UTC |
| committer | Alan Dipert <alan@dipert.org> 2025-12-04 06:00:05 UTC |
| parent | b75806ad09157a11825fbfc45579b21105011168 |
| README.md | +15 | -4 |
diff --git a/README.md b/README.md index 2a48c24..21944e3 100644 --- a/README.md +++ b/README.md @@ -3,18 +3,21 @@ ALAN is a fully self-contained artificial-language aptitude assessment inspired by DLAB-style tasks. It generates a consistent micro-grammar, produces a 32-item multiple-choice test, renders a booklet and answer key, and validates every form against strict grammatical and psychometric properties. The goal is to measure how quickly and accurately someone can infer and apply unfamiliar language rules—skills that map to disciplined software reasoning (spec reading, edge-case handling, protocol compliance). ## What This Is + - **Purpose:** Measure rapid rule inference, pattern generalization, and attention to fine-grained grammatical cues—abilities correlated with learning new syntactic systems and with disciplined software engineering (spec reading, refactoring, edge-case handling). - **Format:** 32 multiple-choice items across sections that introduce rules, then test them with strictly grammatical distractors that differ by exactly one semantic/morphosyntactic feature (minimal pairs). - **Artifacts produced:** `generated_test.json` (canonical test), `test_booklet.txt` (questions only), `answer_key.txt` (answers with explanations). - **Dependencies:** Python 3 only, no external libraries. ## Why It Works (Theory & Inspirations) + - **DLAB-style artificial grammar learning:** Tasks that require inferring a controlled micro-grammar are classic measures of language-learning aptitude. ALAN uses a deterministic grammar with prefix stacking, fixed word order, and minimal irregularities to elicit rule induction rather than memorization. - **Psychometric design:** Each distractor is a grammatical minimal pair differing by one feature (tense, number, gender, role, adjective scope, regular vs irregular). This reduces guessing via surface errors and increases discriminative power. 
- **Reliability controls:** Property-based tests enforce grammar validity, one-correct-answer semantics, irregular contrasts, structural diversity (ditransitives, feminine-plural receivers, adjective-bearing NPs), and coverage quotas. - **Construct alignment with software practice:** Success requires precise rule following, rapid pattern spotting, and handling edge cases (irregulars)—abilities useful in commercial software roles (debugging, code review, protocol/spec compliance). ## Background Reading (Psychometrics & AGL) + - Artificial Grammar Learning: Reber, A. S. (1967). “Implicit learning of artificial grammars.” *Journal of Verbal Learning and Verbal Behavior*. [Wikipedia](https://en.wikipedia.org/wiki/Artificial_grammar_learning) - Language aptitude and grammar inference: Ellis, N. C. (2005). “At the interface: Dynamic interactions of explicit and implicit language knowledge.” *Studies in Second Language Acquisition*. - Item response theory and minimal pairs: Hambleton, R. K., & Swaminathan, H. (2010). *Item Response Theory: Principles and Applications*. [Wikipedia](https://en.wikipedia.org/wiki/Item_response_theory) @@ -27,6 +30,7 @@ make run # generates JSON, booklet, and key cat test_booklet.txt # view the booklet cat answer_key.txt # view the key ``` + - **Optional PDF output:** Requires `pandoc` plus a PDF engine (wkhtmltopdf or weasyprint recommended). Example: ```bash python3 render_text.py --in generated_test.json \ @@ -36,6 +40,7 @@ cat answer_key.txt # view the key If no PDF engine is available, the script will skip PDF generation and report the issue. The booklet/key are rendered as Markdown, so bullets/headings convert cleanly to PDF when an engine is present. ## Generate Different Hardness Levels + - **Standard (balanced):** ```bash python3 main.py --seed 424242 --out generated_test.json @@ -66,6 +71,7 @@ cat answer_key.txt # view the key If PDF engines are missing, PDF output is skipped; the Markdown text still renders correctly. 
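The skip-and-report behavior described for PDF output can be sketched as follows. This is a minimal illustration, not the actual logic of `render_text.py`: the names `PDF_ENGINES`, `find_pdf_engine`, and `plan_pdf_output` are hypothetical. The only external facts assumed are that `shutil.which` locates executables on `PATH` and that pandoc accepts a `--pdf-engine` option.

```python
import shutil
from typing import Callable, Optional, Sequence

# Engines named in the README; the order of preference here is an assumption.
PDF_ENGINES = ("wkhtmltopdf", "weasyprint")


def find_pdf_engine(
    engines: Sequence[str] = PDF_ENGINES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first engine found on PATH, or None if none is installed."""
    for name in engines:
        if which(name):
            return name
    return None


def plan_pdf_output(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Decide whether to render a PDF or skip, and say why (hypothetical helper)."""
    if which("pandoc") is None:
        return "skip: pandoc not found"
    engine = find_pdf_engine(which=which)
    if engine is None:
        return "skip: no PDF engine found"
    return f"render: pandoc --pdf-engine={engine}"


if __name__ == "__main__":
    # Reports the decision for the current machine.
    print(plan_pdf_output())
```

Passing `which` as a parameter keeps the check testable on machines that do not have pandoc installed.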
## Administering ALAN + 1. **Prepare materials:** Run `make run` to produce `test_booklet.txt` and `answer_key.txt`. Print or distribute the booklet only. 2. **Time:** 25–30 minutes is typical for 32 items; you can standardize at 30 minutes for comparability. 3. **Instructions to candidates:** @@ -75,6 +81,7 @@ If PDF engines are missing, PDF output is skipped; the Markdown text still rende 5. **Scoring:** 1 point per correct item, no guessing penalty. Max = 32. ## Interpreting Scores (Commercial Software Context) + These bands are informal heuristics, assuming proctored conditions and naïve candidates: - **27–32 (Excellent):** Strong rule-induction and precision. Likely excels at roles requiring rapid onboarding to new codebases, complex refactors, API/protocol design, formal verification, or compiler/infra work. - **22–26 (Strong):** Solid pattern learning and attention to detail. Suited to backend/product engineering, systems integration, data engineering; should pick up new stacks quickly. @@ -84,6 +91,7 @@ These bands are informal heuristics, assuming proctored conditions and naïve ca Do **not** use the score as a sole hiring gate; treat it as one data point alongside interviews, work samples, and references. ## How the Grammar Works (Cheat Sheet Summary) + - **Word order:** DOER – RECEIVER – (THEME if ‘give’) – VERB. Verb is last. - **Prefix stack on nouns:** `na` (receiver) + `mem` (feminine) + `leko` (plural) + noun; doer adds suffix `mur`. - **Adjectives:** Follow the noun they modify. @@ -93,6 +101,7 @@ Do **not** use the score as a sole hiring gate; treat it as one data point along - **Feminine plural:** `memleko + noun` for feminine humans only. ## Generation & Validation Pipeline + - **Canonical rendering:** All surfaces are built from feature structures through `language_spec.realize_sentence`. 
- **Minimal-pair distractors:** Each distractor clones the correct feature bundle and flips exactly one feature (tense, number, gender where applicable, adjective presence, role swap, or irregular toggle). Anything ungrammatical or semantically duplicate is rejected. - **Property tests (enforced on every generation):** @@ -107,22 +116,26 @@ Do **not** use the score as a sole hiring gate; treat it as one data point along - **Regeneration:** `main.py` will retry seeds until all properties pass; otherwise it fails loudly. Hardness knobs (`--min-irregular`, `--min-irregular-contrast`, `--min-ditransitive`, `--min-plural`, `--min-adjective`, `--min-fem-plural`, `--min-feature-load`) can be scaled in one go with `--hardness-multiplier` (e.g., `2.0` to roughly double thresholds). All chosen params are recorded in `generation_params` in the JSON and printed in the booklet for reproducibility. ## Proctoring Guidance + - Keep the cheat sheet and dictionary visible with the booklet; candidates should not need prior linguistics knowledge. - Do not give the answer key to candidates. Collect booklets before revealing answers. - If remote, time-box and supervise; ask candidates to share screen if feasible. ## Mapping to Roles (Examples) + - **Infra/Platform/Compilers:** Look for 27–32; high irregular handling and minimal-pair reasoning align with spec-heavy work. - **Backend/Product:** 22–26 suggests strong fit; quick uptake of API contracts and data models. - **QA/Automation/Release:** 17–21 can be effective with processes; use score to tailor onboarding (more scaffolding). - **Entry/Support:** ≤16 indicates need for structured training; avoid dropping into ambiguous, underspecified projects. ## Research & Inspirations + - **Artificial grammar learning (AGL):** Classic paradigm for measuring rule induction (Reber, 1967; more recent AGL studies). ALAN adapts AGL principles to a morphosyntactic micro-grammar. 
- **DLAB-style aptitude tests:** Uses controlled artificial languages to avoid prior knowledge effects and to test rapid pattern extraction. - **Psychometric good practice:** Minimal-pair distractors, single-key answers, controlled difficulty progression, and automated validation to reduce construct-irrelevant variance. ## Files Overview + - `language_spec.py` — Grammar, lexicon, canonical renderer, irregulars. - `test_blueprint.py` — Section structure and default blueprint. - `test_generator.py` — Feature-based item generation and minimal-pair distractors. @@ -134,15 +147,13 @@ Do **not** use the score as a sole hiring gate; treat it as one data point along - `answer_key.txt`, `test_booklet.txt`, `generated_test.json` — outputs from the last generation. ## Taking the Test (Candidate View) + - Read the cheat sheet and examples; note prefix order, adjective position, verb last, and irregulars. - For each question, compare options as minimal pairs: check tense marker, plural/gender markers, role markers (`na` + stack), adjective placement, and irregular forms. - Exactly one option fits the target meaning under the published rules. ## Limitations & Ethics + - This is a single data point; do not use it as a sole hiring filter. - Cultural and linguistic neutrality is intended but not guaranteed; ensure accessibility accommodations as needed. - Scores can be affected by test-taking anxiety or unfamiliarity with such tasks; interpret cautiously. - ---- - -For questions or contributions, open an issue or PR in this repository. ALAN is intentionally small, transparent, and reproducible to keep the construct clear and auditable.***
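To make the cheat sheet's prefix stack and the pipeline's one-feature flip concrete, here is a small sketch. It is not the repository's `language_spec.realize_sentence` or `test_generator.py` code. The `NounFeatures` class, both functions, and the stem `tal` are hypothetical. The sketch also assumes prefixes concatenate directly onto the stem, leaves out themes, and does not model the feminine-humans-only restriction.

```python
from dataclasses import dataclass, fields, replace
from typing import List


@dataclass(frozen=True)
class NounFeatures:
    stem: str               # hypothetical stem, not taken from ALAN's lexicon
    receiver: bool = False  # False means the noun is the doer
    feminine: bool = False
    plural: bool = False


def realize_noun(f: NounFeatures) -> str:
    """Build the surface form in the documented order: na + mem + leko + noun."""
    parts = []
    if f.receiver:
        parts.append("na")
    if f.feminine:
        parts.append("mem")
    if f.plural:
        parts.append("leko")
    parts.append(f.stem)
    if not f.receiver:
        parts.append("mur")  # doer suffix
    return "".join(parts)


FLIPPABLE = ("receiver", "feminine", "plural")


def single_flip_distractors(key: NounFeatures) -> List[NounFeatures]:
    """Clone the key once per feature, flipping exactly that feature."""
    return [replace(key, **{name: not getattr(key, name)}) for name in FLIPPABLE]


def feature_distance(a: NounFeatures, b: NounFeatures) -> int:
    """Count the fields on which two feature bundles differ."""
    return sum(getattr(a, f.name) != getattr(b, f.name) for f in fields(a))


key = NounFeatures("tal", receiver=True, feminine=True, plural=True)
distractors = single_flip_distractors(key)

# Minimal-pair property: every distractor is one feature away from the key.
assert all(feature_distance(key, d) == 1 for d in distractors)

print(realize_noun(key))          # namemlekotal
for d in distractors:
    print(realize_noun(d))        # memlekotalmur, nalekotal, namemtal
```

Because each distractor is derived from the key's feature bundle rather than by editing the surface string, every option stays well-formed under the same rules and differs from the key in a single, nameable feature.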