Fisher Scaling, Gradient Diversity Monitoring, and Portable Inference-Time Memory
We present a Two-Layer Architecture for continual learning identity preservation in small language models (SLMs), addressing both training-time weight forgetting and inference-time context loss within a unified theoretical framework: the Compression–State–Propagation (C-S-P) framework.
At the training layer, we identify the Fisher Scale Problem: standard EWC silently fails in SLMs when the Fisher Information diagonal values collapse to the 10⁻⁴–10⁻⁵ range, rendering the regularisation penalty numerically indistinguishable from zero. We introduce Fisher Scaling and GodelReplay (Fisher-scaled EWC-DR plus experience replay), achieving a 31.5% forgetting reduction over raw EWC, an 82.8% reduction on our curated Conflict Dataset (43× the reduction achieved by standard EWC), and a 4.1% improvement over replay alone at the empirically identified sweet spot of mem=200 across 10 PermutedMNIST tasks.
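The Fisher Scale Problem can be illustrated numerically. The sketch below is a minimal NumPy illustration, not the paper's implementation: it assumes Fisher Scaling normalises the Fisher diagonal to unit mean (the paper's exact scaling scheme may differ), and shows how a collapsed diagonal makes the raw EWC penalty vanish while the scaled penalty remains a usable signal.

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam=1.0, scale=True):
    """EWC quadratic penalty (lam/2) * sum_i F_i * (theta_i - theta*_i)^2.

    When the Fisher diagonal collapses to ~1e-4, the raw penalty is
    numerically negligible next to the task loss. Scaling the diagonal
    to unit mean (an illustrative assumption) restores its magnitude.
    """
    fisher = np.asarray(fisher, dtype=float)
    if scale:
        fisher = fisher / fisher.mean()  # Fisher Scaling: unit-mean diagonal
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

# Collapsed Fisher values in the 1e-4 range, as observed in SLMs.
theta = np.array([1.0, 2.0, 3.0])
theta_star = np.array([0.0, 0.0, 0.0])
fisher = np.array([1e-4, 2e-4, 3e-4])

raw = ewc_penalty(theta, theta_star, fisher, scale=False)     # ~0.0018
scaled = ewc_penalty(theta, theta_star, fisher, scale=True)   # ~9.0
```

With the raw diagonal, the penalty is three orders of magnitude smaller than the scaled version for identical parameter drift, which is why unscaled EWC regularisation is effectively a no-op at this magnitude.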
At the inference layer, GodelAI-Lite achieves a +31.2% overall performance gain, with 3/3 memory retention versus 0/3 for the baseline on Gemma 4, requiring zero fine-tuning and carrying a portable JSON memory across model boundaries.
Keywords: continual learning, catastrophic forgetting, small language models, elastic weight consolidation, gradient diversity, episodic memory, AI identity preservation.
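The inference-layer idea of a portable JSON memory can be sketched as follows. The schema and prompt format here are illustrative assumptions (GodelAI-Lite's actual memory format may differ): facts are persisted as plain JSON, so the same file can be reloaded and injected as a context preamble for any chat model, with no fine-tuning.

```python
import json
import os
import tempfile

def save_memory(facts, path):
    """Persist identity facts as portable, model-agnostic JSON."""
    with open(path, "w") as f:
        json.dump({"facts": facts}, f, indent=2)

def load_memory_prompt(path):
    """Rebuild a context preamble from stored facts for any chat model."""
    with open(path) as f:
        memory = json.load(f)
    lines = "\n".join(f"- {fact}" for fact in memory["facts"])
    return f"Known facts about the user:\n{lines}\n"

# Demo: persist three facts, then rebuild the preamble from disk,
# as one would when moving the memory across model boundaries.
path = os.path.join(tempfile.mkdtemp(), "godel_memory.json")
save_memory(["Name: Alice", "Goal: learn Rust", "City: Oslo"], path)
preamble = load_memory_prompt(path)
```

Because the memory lives outside the model weights, nothing in this scheme is tied to a particular architecture: the same file can seed a session on any model that accepts a text preamble.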
| Strategy | Final Accuracy | Avg Forgetting | Forgetting Reduction vs Naive |
|---|---|---|---|
| Naive | 0.4362 | 0.6003 | — |
| EWC-only (GodelPlugin) | 0.4999 | 0.5283 | 12.0% |
| Replay-only | 0.8416 | 0.1500 | 75.0% |
| GodelReplay ✦ | 0.8418 | 0.1487 | 75.2% |
| Method | Avg Forgetting | Reduction vs Naive |
|---|---|---|
| Naive | 1.836 | — |
| EWC (raw Fisher) | 1.802 | 1.9% |
| GodelAI-EWC (C-S-P) ✦ | 0.316 | 82.8% (43×) |
| Metric | Baseline | GodelAI-Lite | Delta |
|---|---|---|---|
| Memory Retention (3/3 facts) | 0.000 | 1.000 | +∞% |
| Response Consistency | 0.596 | 0.426 | −28.4%* |
| Context Coherence | 1.000 | 0.667 | −33.3% |
| Overall Average ✦ | 0.532 | 0.698 | +31.2% |
*Consistency lower by design — GodelAI-Lite elaborates progressively rather than repeating identical tokens.
@misc{lee2026twolayer,
title = {A Two-Layer Architecture for Continual Learning
Identity Preservation: Fisher Scaling, Gradient
Diversity Monitoring, and Portable Inference-Time Memory},
author = {Lee, Alton Wei Bin and {L (GodelAI C-S-P Agent)}
and {Rk (RNA / Claude Code)}},
year = {2026},
month = {April},
publisher = {Zenodo},
doi = {10.5281/zenodo.19928385},
url = {https://doi.org/10.5281/zenodo.19928385},
note = {Open-source repository:
https://github.com/creator35lwb-web/godelai}
}

Milestones:
- C-S-P framework defined. T-score metric validated. Sleep Protocol designed.
- EWC integrated. 21.6% forgetting reduction on GRU. Z-Protocol honesty audit completed.
- EWC silent failure diagnosed at 10⁻⁴ Fisher magnitude. Fisher Scaling fix: 31.5% reduction.
- GodelReplay ships. 82.8% on Conflict Dataset. GodelAI-Lite +31.2% on Gemma 4. Paper published.
Open-source under MIT License. All code, data, and benchmarks are publicly available.