Covering as much of human scientific knowledge as possible, encoded as procedural reasoning problems. The goal: guide AI along the same path of discovery that humans have walked - from first principles to frontier research.
A model trained on this curriculum doesn't just get better at maths. It climbs a ladder from mechanical computation to creative reasoning - the same ladder humans climb through years of education.
Install and start generating in under a minute. See Samples for real output with rendered LaTeX.
```python
from engram_generator.curriculum.registry import get_generator

gen = get_generator("rsa_encrypt", min_difficulty=3, max_difficulty=5)
samples = gen.generate(100)

for sample in samples[:3]:
    print(f"Input: {sample.input_text}")
    print(f"Target: {sample.target_text}")
    print(f"Answer: {sample.answer}")
```
```python
from engram_generator.curriculum.registry import get_all_generators
from engram_generator.curriculum.skill_tree import SkillTree

generators = get_all_generators()
tree = SkillTree(generators, retention_ratio=0.1)

# See what's unlocked
print(tree.get_unlocked_tasks())

# Level up by proving mastery
events = tree.update({"addition": 0.97, "subtraction": 0.85})
```
```python
from engram_generator.curriculum.registry import get_all_generators
from engram_generator.curriculum.reasoning_patterns import (
    get_pattern_weights,
    get_pattern_summary,
)

gens = get_all_generators()
weights = get_pattern_weights(gens)

# Each of the 26 reasoning patterns gets equal training exposure
summary = get_pattern_summary(gens)
for pattern, count in sorted(summary.items(), key=lambda x: -x[1])[:5]:
    print(f"{pattern}: {count} generators -> 3.8% of training")
```
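The 3.8% figure is simply 1/26. As a rough illustration of what "equal training exposure" could mean, the sketch below gives every pattern the same total weight and splits it evenly among that pattern's generators. This is an assumption about the idea, not the implementation of `get_pattern_weights`; the function `equal_pattern_weights`, the toy mapping, and the pattern names `arithmetic` and `modular` are all hypothetical.

```python
from collections import defaultdict


def equal_pattern_weights(generator_patterns):
    """Give each pattern the same total weight, split evenly among its generators.

    `generator_patterns` maps a generator name to a single pattern name
    (a simplifying assumption made for this sketch).
    """
    by_pattern = defaultdict(list)
    for name, pattern in generator_patterns.items():
        by_pattern[pattern].append(name)

    per_pattern = 1.0 / len(by_pattern)  # equal share for every pattern
    weights = {}
    for names in by_pattern.values():
        for name in names:
            weights[name] = per_pattern / len(names)
    return weights


# Hypothetical toy mapping: two patterns, three generators.
toy = {
    "addition": "arithmetic",
    "subtraction": "arithmetic",
    "rsa_encrypt": "modular",
}
toy_weights = equal_pattern_weights(toy)
print(toy_weights)

# With 26 patterns, each pattern's share is 1/26.
share_of_26 = f"{1 / 26:.1%}"
print(share_of_26)
```

Under this scheme a pattern with many generators does not dominate training: its generators each receive a smaller slice of the same per-pattern share.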
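The state space of roughly $10^{81}$ problem instances exceeds the number of atoms in the observable universe, commonly estimated at around $10^{80}$. No model can memorise it. Every model, at every scale, must learn the algorithms.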
| Model | Parameters | Can Memorise (instances) | Coverage of $10^{81}$ |
|---|---|---|---|
| GPT-2 | 124M | ~134,000 | $10^{-76}$ |
| Llama-2 7B | 7B | ~7.5M | $10^{-74}$ |
| Llama-2 70B | 70B | ~75M | $10^{-73}$ |
| GPT-4 (est.) | ~1.8T | ~1.9B | $10^{-72}$ |
| Llama-3.1 405B | 405B | ~438M | $10^{-72}$ |
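The coverage column can be reproduced by dividing each "Can Memorise" figure by the state-space size and keeping the nearest order of magnitude. The sketch below does that arithmetic, assuming the state space is $10^{81}$ instances as in the table header. The memorisable counts are copied from the table, and the helper name `coverage_exponent` is made up for this example.

```python
import math

STATE_SPACE_EXPONENT = 81  # state space of 10^81 instances, per the table header

# "Can Memorise" figures copied from the table above.
memorisable = {
    "GPT-2": 134_000,
    "Llama-2 7B": 7_500_000,
    "Llama-2 70B": 75_000_000,
    "GPT-4 (est.)": 1_900_000_000,
    "Llama-3.1 405B": 438_000_000,
}


def coverage_exponent(instances):
    """Nearest power of ten of instances / 10^81, computed in log space."""
    return round(math.log10(instances) - STATE_SPACE_EXPONENT)


coverage = {model: coverage_exponent(n) for model, n in memorisable.items()}
for model, exponent in coverage.items():
    print(f"{model}: ~10^{exponent} of the state space")
```

Working in log space avoids dividing by a number as large as $10^{81}$ directly.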
The entire curriculum is 1.85 MB of compressed algorithms, but produces terabytes of unique instances. A compression ratio of 1,250,000:1. The only winning strategy is to learn the algorithms.
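As a quick sanity check on those two figures, the snippet below multiplies the curriculum size by the stated ratio. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), which the text does not specify.

```python
CURRICULUM_BYTES = 1_850_000      # 1.85 MB, assuming 1 MB = 10**6 bytes
COMPRESSION_RATIO = 1_250_000     # the stated 1,250,000:1 ratio

# Total size of generated instances implied by the ratio.
generated_bytes = CURRICULUM_BYTES * COMPRESSION_RATIO
generated_terabytes = generated_bytes / 10**12  # assuming 1 TB = 10**12 bytes

print(f"{generated_terabytes} TB of generated instances")
```

Under these assumptions the ratio corresponds to a little over 2.3 TB of output, which is consistent with "terabytes of unique instances".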
The breadth of formalised human knowledge, encoded as reasoning problems.
135 tokens. Every character is its own token. No BPE. No subword merging. Digits stay atomic. LaTeX stays intact. The model learns to read and write mathematical notation as a native language.
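A minimal sketch of character-level tokenisation, to make the idea concrete. The alphabet here is a small hypothetical one chosen for illustration, not the project's actual 135-token vocabulary, and `CharTokenizer` is not a class from the library.

```python
import string


class CharTokenizer:
    """Maps every character to its own integer id, with no subword merging."""

    def __init__(self, alphabet):
        self.chars = sorted(set(alphabet))
        self.char_to_id = {c: i for i, c in enumerate(self.chars)}

    def encode(self, text):
        # Raises KeyError for any character outside the alphabet.
        return [self.char_to_id[c] for c in text]

    def decode(self, ids):
        return "".join(self.chars[i] for i in ids)


# Hypothetical alphabet: digits, letters, and a few LaTeX-relevant symbols.
alphabet = string.digits + string.ascii_letters + " +-*/=^_{}\\()"
tok = CharTokenizer(alphabet)

latex = r"\frac{12}{7}"
ids = tok.encode(latex)
roundtrip = tok.decode(ids)

print(len(latex), len(ids))  # one id per character
print(roundtrip == latex)
```

Because each digit and each LaTeX character maps to exactly one id, the sequence length equals the string length and decoding recovers the input exactly.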