
ALAN — Alan's Language Aptitude iNstrument

ALAN is a fully self-contained artificial-language aptitude assessment inspired by DLAB-style tasks. It generates a consistent micro-grammar, produces a 32-item multiple-choice test, renders a booklet and answer key, and validates every form against strict grammatical and psychometric properties. The goal is to measure how quickly and accurately someone can infer and apply unfamiliar language rules—skills that map to disciplined software reasoning (spec reading, edge-case handling, protocol compliance). Test generation guarantees that no correct-answer meaning or surface form repeats across items.

What This Is

Why It Works (Theory & Inspirations)

Background Reading (Psychometrics & AGL)

Quick Start

./python main.py --out generated_test.json
./python render_text.py --in generated_test.json --test-out test_booklet.txt --key-out answer_key.txt
make run      # same as above (uses bundled ./python)
cat test_booklet.txt   # view the booklet
cat answer_key.txt     # view the key

If PDF engines are missing, PDF output is skipped; the Markdown text still renders correctly.
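To poke at the output programmatically, here is a short Python sketch that only assumes generated_test.json is valid JSON; no particular internal schema is assumed:

import json

with open("generated_test.json") as f:
    test = json.load(f)

# List the top-level keys without assuming anything about the schema.
if isinstance(test, dict):
    for key in sorted(test):
        print(key)
else:
    print(type(test).__name__)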

Administering ALAN

  1. Prepare materials: Run make run to produce test_booklet.txt and answer_key.txt. Print or distribute the booklet only.
  2. Time: 25–30 minutes is typical for 32 items; you can standardize at 30 minutes for comparability.
  3. Instructions to candidates:
     - “You will see a small dictionary, a short rule cheat sheet, and examples. Every question has four options; exactly one is correct. All sentences follow the published rules—no tricks. Work quickly but carefully.”
     - “This measures how well you can infer and apply new language rules; no linguistics background is required.”
  4. Environment: Quiet room, no external aids. Paper or on-screen is fine.
  5. Scoring: 1 point per correct item, no guessing penalty. Max = 32. (A minimal scoring sketch follows this list.)
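The sketch below (Python) assumes responses and the key are both sequences of option letters (A–D) in item order; the on-disk format of answer_key.txt is not specified here, so the lists are illustrative placeholders rather than parsed files.

def score_responses(responses, key):
    """1 point per exact match, no guessing penalty."""
    return sum(1 for given, correct in zip(responses, key) if given == correct)

# Illustrative 32-item key and answer sheet (placeholders, not real data).
key = ["B", "D", "A", "C"] * 8
responses = ["B", "D", "A", "A"] * 8
print(f"Score: {score_responses(responses, key)} / {len(key)}")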

Interpreting Scores (Commercial Software Context)

These bands are informal heuristics, assuming proctored conditions and naïve candidates:

  - 27–32 (Excellent): Strong rule-induction and precision. Likely excels at roles requiring rapid onboarding to new codebases, complex refactors, API/protocol design, formal verification, or compiler/infra work.
  - 22–26 (Strong): Solid pattern learning and attention to detail. Suited to backend/product engineering, systems integration, and data engineering; should pick up new stacks quickly.
  - 17–21 (Moderate): Adequate, but may need more scaffolding. Good for roles with clearer guardrails (feature work, QA automation, internal tooling) where patterns are stable.
  - ≤16 (Developing): May struggle with opaque specs or fast-changing systems. Benefits from mentorship, pairing, and stronger process/linters.
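If you record results in a script, a tiny band-lookup sketch in Python that mirrors the heuristics above (the function name and labels are purely illustrative):

def score_band(score):
    """Map a 0-32 ALAN score to the informal interpretation bands."""
    if score >= 27:
        return "Excellent"
    if score >= 22:
        return "Strong"
    if score >= 17:
        return "Moderate"
    return "Developing"

print(score_band(24))  # Strong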

Do not use the score as a sole hiring gate; treat it as one data point alongside interviews, work samples, and references.

How the Grammar Works (Cheat Sheet Summary)

Generation & Validation Pipeline

Backtracking Synthesis

Generation is now backed by a deterministic backtracking generator:

  - CLI: ./python main.py --seed 424242 --out generated_test.json (runs backtracking, then property_tests).
  - Hardness presets (coverage targets): --hardness easy|medium|hard|extreme (default: medium). Higher settings raise quotas for irregulars, plurals, adjectives, fem-plurals, and ditransitives while still passing property tests, and limit reuse of identical noun/adjective “clues” to reduce redundancy.
  - Library: call generator.generate_test(spec, blueprint, concepts, rng, seed, git_sha, hardness) to build a test dict that respects section constraints, preserves global uniqueness, and passes property_tests.validate_data. A hedged usage sketch follows this list.
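A minimal usage sketch for the library path, assuming generator and property_tests are importable from the repo root. The data file names (spec.json, blueprint.json, concepts.json), the git_sha placeholder, and the exact call shape of validate_data are assumptions for illustration; only the generate_test argument order above is taken from this README.

import json
import random

import generator
import property_tests

def load(path):
    # Hypothetical helper: the real project may bundle spec/blueprint/concepts differently.
    with open(path) as f:
        return json.load(f)

spec = load("spec.json")
blueprint = load("blueprint.json")
concepts = load("concepts.json")

seed = 424242
rng = random.Random(seed)

# Arguments follow the documented order: spec, blueprint, concepts, rng, seed, git_sha, hardness.
test = generator.generate_test(spec, blueprint, concepts, rng, seed, "0000000", "medium")

# Assumed to raise (or report) on any violated grammatical/psychometric property.
property_tests.validate_data(test)

with open("generated_test.json", "w") as f:
    json.dump(test, f, indent=2)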

Proctoring Guidance

Mapping to Roles (Examples)

Research & Inspirations

Files Overview

Taking the Test (Candidate View)

Limitations & Ethics