AnimalTaskSim

A three-circuit neural architecture that reproduces the full behavioral fingerprint of an IBL mouse.

Not reward maximization — behavioral fidelity. The agent hesitates on hard trials, repeats rewarded choices, and occasionally lapses on easy ones, matching all five metrics of real mouse decision-making.

Python 3.11+ · MIT License · 104 tests

Beyond reward maximization

Standard RL benchmarks ask: how much reward can an agent earn? But animal behavior is richer. Mice slow down on hard trials [1]. They repeat actions after wins and switch after losses [4]. Both mice and monkeys make occasional mistakes on easy problems.

AnimalTaskSim measures behavioral fingerprints — accuracy vs. difficulty (psychometric curves), reaction time vs. difficulty (chronometric slopes [3]), how the last trial influences the next (history effects [4]), and mistakes on easy trials (lapse patterns).

Guiding Principles

  • Fidelity over flash — copy lab timing and priors exactly
  • Fingerprints over reward — match bias, RT, history, lapse statistics
  • Reproducibility — deterministic seeds, saved configs, schema-validated logs
  • Infrastructure as science — measurement bugs are scientific bugs

Three-circuit architecture

The model decomposes perceptual decision-making into three functionally independent circuits that mirror known brain organization [3].

Circuit 1

Evidence Accumulation

An LSTM processes stimulus features and outputs DDM parameters — drift rate, decision bound, starting-point bias, noise, non-decision time. A differentiable Euler-Maruyama DDM simulator then accumulates stochastic evidence over 120 steps, producing both a choice and a reaction time.
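The accumulation step can be sketched in plain NumPy. This is an illustrative stand-in, not the project's implementation: parameter names are assumptions, and the real version is written in PyTorch so gradients flow through the unrolled steps.

```python
import numpy as np

def simulate_ddm(drift, bound, start_bias, noise, ndt,
                 dt=0.01, n_steps=120, rng=None):
    """One Euler-Maruyama drift-diffusion trial.

    Returns (choice, rt): choice in {-1, +1} (0 only if evidence is
    exactly zero at timeout), rt in seconds including non-decision time.
    """
    rng = rng or np.random.default_rng()
    x = start_bias * bound                      # biased starting point
    for step in range(1, n_steps + 1):
        # dx = v*dt + sigma*sqrt(dt)*N(0, 1)   (Euler-Maruyama update)
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        if abs(x) >= bound:                     # evidence hit a bound
            return float(np.sign(x)), ndt + step * dt
    return float(np.sign(x)), ndt + n_steps * dt  # forced choice at timeout

choice, rt = simulate_ddm(drift=2.0, bound=1.0, start_bias=0.0,
                          noise=1.0, ndt=0.1,
                          rng=np.random.default_rng(0))
```

Because the choice and reaction time both emerge from the same stochastic trajectory, the model produces correlated accuracy and RT for free.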

Circuit 2

History Processing

Two separate MLPs process the previous trial through independent win and lose pathways. An attention gate (1 - |stimulus|) suppresses history influence when sensory evidence is strong, preventing mode collapse during joint training.
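The gating logic reduces to a one-liner. Below is a hypothetical scalar stand-in for the two MLP pathways; it assumes the stimulus is scaled to [-1, 1], so history dominates near zero contrast and vanishes at full contrast.

```python
def gated_history_bias(stimulus, win_bias, lose_bias, prev_rewarded):
    """Attention-gated history bias: (1 - |stimulus|) scales the
    output of the win or lose pathway."""
    gate = 1.0 - abs(stimulus)          # strong evidence -> gate near 0
    pathway = win_bias if prev_rewarded else lose_bias
    return gate * pathway

# At zero contrast the full history bias passes through;
# at full contrast it is gated out entirely.
full = gated_history_bias(0.0, win_bias=0.5, lose_bias=-0.2, prev_rewarded=True)
none = gated_history_bias(1.0, win_bias=0.5, lose_bias=-0.2, prev_rewarded=True)
```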

Circuit 3

Attentional Lapse

On ~5% of trials, a stochastic Bernoulli gate causes the agent to disengage and guess randomly. Fixed as a non-learnable parameter — a learnable version was exploited by the optimizer to ~15%, using random guessing as a shortcut on hard trials.
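A minimal sketch of the gate, assuming the lapse simply overrides the DDM's choice with a fair coin flip (a hypothetical helper, not the project's code):

```python
import numpy as np

LAPSE_RATE = 0.05   # fixed, non-learnable (see text)

def apply_lapse(ddm_choice, rng):
    """Bernoulli lapse gate: on a lapse trial the agent disengages
    and guesses at random, ignoring the accumulated evidence."""
    if rng.random() < LAPSE_RATE:       # gate fires with prob. LAPSE_RATE
        return int(rng.choice([-1, 1])) # uniform random guess
    return ddm_choice

rng = np.random.default_rng(0)
choices = [apply_lapse(1, rng) for _ in range(10_000)]
flip_frac = choices.count(-1) / len(choices)  # roughly half of ~5% lapses
```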

Why a differentiable DDM simulator? Early experiments used analytical DDM equations [6]. The agent exploited the gradient landscape — pushing bounds toward infinity and drift toward zero. The Euler-Maruyama simulator eliminates this by unrolling evidence accumulation as stochastic steps through PyTorch's autograd.

Results: IBL mouse behavioral match

All five behavioral metrics fall within the per-session reference distribution derived from 10 sessions (8,406 trials) [1]. Values are 5-seed mean ± std.

| Metric | What it measures | Agent | IBL mouse | Note |
|---|---|---|---|---|
| Psychometric slope | accuracy vs. difficulty | 17.84 ± 2.08 | 20.0 ± 5.7 | Within per-session reference range |
| Chronometric slope | reaction time vs. difficulty | -37.7 ± 2.4 ms/unit | -51 ± 64 ms/unit | Harder trials take longer (evidence accumulation) |
| Win-stay | repeat after reward | 0.734 ± 0.022 | 0.72 ± 0.08 | Asymmetric history bias matches animal |
| Lose-shift | switch after error | 0.444 ± 0.017 | 0.47 ± 0.10 | Weaker than win-stay, as in real mice |
| Lapse rate | errors on easy trials | 0.086 ± 0.049 | 0.08 ± 0.07 | Occasional disengagement on trivial stimuli |

Agent (blue) vs IBL mouse (gray) behavioral fingerprint. (a) Psychometric curve: probability of rightward choice as a function of signed contrast. (b) Chronometric curve: median reaction time decreases with stimulus strength. (c) History effects: win-stay and lose-shift rates above chance [4]. Shaded regions show ± SEM across 5 random seeds.
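The two history metrics are simple conditional frequencies over consecutive trials. A hypothetical standalone helper showing the standard computation from a session log:

```python
def history_rates(choices, rewards):
    """Win-stay and lose-shift rates from per-trial choices (+1/-1)
    and rewards (1 = rewarded, 0 = not)."""
    win_stay = win_n = lose_shift = lose_n = 0
    for prev_c, prev_r, cur_c in zip(choices, rewards, choices[1:]):
        if prev_r:                            # previous trial rewarded
            win_n += 1
            win_stay += int(cur_c == prev_c)  # stayed with the winner
        else:                                 # previous trial unrewarded
            lose_n += 1
            lose_shift += int(cur_c != prev_c)  # switched after a loss
    return win_stay / win_n, lose_shift / lose_n

ws, ls = history_rates([1, 1, -1, -1, 1], [1, 0, 0, 1, 0])
```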

Key findings

This result required 70+ experiments across five agent architectures. Three findings shaped the final architecture.

Co-evolution requirement

Evidence circuits trained without history cannot accommodate history injection post-hoc. Adding history to a model calibrated at drift 6.0 degraded psychometric slope from 12.76 to 10.1. Co-evolution training — both circuits learning together from initialization — recovered performance at drift 9.0.

Parallels a prediction from developmental neuroscience: sensory and reward circuits must mature together.

Protocol fidelity

The IBL protocol uses five contrast levels: {0, 0.0625, 0.125, 0.25, 1.0}. Our environment incorrectly included a sixth level (0.5). Removing this single spurious stimulus — with no model changes — improved psychometric slope by 44% (12.38 to 17.84) and win-stay by 4%.
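The corrected contrast set is small enough to state inline. The sampler below is a hypothetical helper: the sign convention (negative = left) and uniform sampling over levels are illustrative assumptions, not the block-prior structure of the real protocol.

```python
import numpy as np

# The five IBL contrast levels; the spurious 0.5 level is excluded.
IBL_CONTRASTS = (0.0, 0.0625, 0.125, 0.25, 1.0)

def sample_signed_contrast(rng):
    """Draw a signed contrast: magnitude from the IBL set, random side."""
    magnitude = rng.choice(IBL_CONTRASTS)
    side = rng.choice((-1.0, 1.0))      # negative = left stimulus
    return float(side * magnitude)

rng = np.random.default_rng(0)
magnitudes = {abs(sample_signed_contrast(rng)) for _ in range(500)}
```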

A simulation that doesn't exactly match the experimental protocol systematically biases all downstream metrics.

12 failure modes documented

Across 70+ experiments, we identified 12 critical failure modes: psychometric collapse from over-complex curricula, history effects stuck at chance from timing bugs, lapse-rate exploitation, six months spent optimizing non-existent targets, and more.

Each failure required a specific architectural or training solution. Full narrative in FINDINGS.md.

Known limitations

  • History effects are injected, not learned. The win-stay and lose-shift tendencies are hand-set hyperparameters that bypass the history networks. The architecture can express animal-like history effects, but it cannot yet discover them from data.
  • Single task validation. Results are validated on IBL mouse 2AFC only. The macaque RDM task produces correct intra-trial dynamics but lacks the history effects that are the primary focus.
  • Lapse variance across seeds. Lapse rates range from 0.043 to 0.156 across five validation seeds, suggesting the lapse mechanism interacts with training dynamics in ways not fully understood.

Two canonical experiments

Task-faithful Gymnasium environments that mirror lab protocols and timing.

Mouse 2AFC

Laboratory mice

Mice discriminate visual gratings at varying contrast levels. Correct choice yields water reward; incorrect yields a brief timeout.

What the agent sees

Receives a contrast value and chooses left or right. Block priors and lapse regimes match the real IBL protocol.

Reference data

8,406 trials across 10 sessions (International Brain Laboratory) [1]

Metrics: psychometric slope · chronometric slope · win-stay / lose-shift · lapse rate

Macaque RDM

Rhesus macaques

Macaques judge the net direction of random-dot motion displays. Higher coherence means easier trials. The monkey controls when to commit.

What the agent sees

Observes a coherence stream over 80 time steps, then decides when and which direction to commit. Produces both a choice and a reaction time.

Reference data

2,611 trials from Roitman & Shadlen (Shadlen Lab) [2][5]

Metrics: psychometric slope · chronometric slope · bias · lapse rate

Built with

Python · PyTorch · Gymnasium · Stable-Baselines3 · Pydantic · SciPy · NumPy · Pandas · Matplotlib

104 tests. Schema-validated .ndjson logging. Deterministic seeding. 70+ tracked experiments in the registry.
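The schema-validated logging step can be sketched with a simplified stand-in for the project's Pydantic models. Field names here are illustrative assumptions, not the real log schema.

```python
import io
import json

# Simplified stand-in for the Pydantic-validated trial schema.
TRIAL_SCHEMA = {"trial": int, "contrast": float, "choice": int,
                "correct": bool, "rt_ms": float}

def log_trial(fh, record):
    """Validate a trial record against the schema, then write it as
    one JSON object per line (.ndjson)."""
    for field, ftype in TRIAL_SCHEMA.items():
        if not isinstance(record.get(field), ftype):
            raise TypeError(f"{field!r} must be {ftype.__name__}")
    fh.write(json.dumps(record) + "\n")

buf = io.StringIO()
log_trial(buf, {"trial": 0, "contrast": 0.25, "choice": 1,
                "correct": True, "rt_ms": 312.0})
```

One object per line keeps logs append-only and streamable, and validation at write time catches malformed trials before they can bias downstream metrics.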

Quick start

An interactive CLI guides you through task selection, agent configuration, training, evaluation, and dashboard generation.

1. Select a task
2. Choose an agent
3. Review results

```shell
# Clone and install
git clone https://github.com/ermanakar/animaltasksim.git
cd animaltasksim
pip install -e ".[dev]"

# Interactive workflow — train, evaluate, dashboard
python scripts/run_experiment.py
```

References

  1. International Brain Laboratory et al. (2021). Standardized and reproducible measurement of decision-making in mice. Neuron, 109(7), 1166-1180.
  2. Roitman, J. D., & Shadlen, M. N. (2002). Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. Journal of Neuroscience, 22(21), 9475-9489.
  3. Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: theory and data for two-choice decision tasks. Neural Computation, 20(4), 873-922.
  4. Urai, A. E., et al. (2019). Mechanisms of choice history biases in perceptual decisions. Nature Communications, 10(1), 1983.
  5. Britten, K. H., et al. (1992). The analysis of visual motion: direction-selective neurons in area MT of the macaque. Journal of Neuroscience, 12(12), 4745-4765.
  6. Navarro, D. J., & Fuss, I. G. (2009). Fast and accurate calculations for first-passage times in Wiener diffusion models. Journal of Mathematical Psychology, 53(4), 222-230.

Explore the research

Open source, MIT licensed, and documenting both successes and failures.