§13 Training Large Event Models / 13 · Live · Global

For teams training and fine-tuning frontier models

Train or fine-tune
point-in-time LLMs.

Static-corpus models hallucinate dated facts and saturate on benchmarks that leaked into their pretraining.

NOSIBLE WORLD is a dated, multilingual, replayable archive that fixes both.

§01/ 06

Static corpus vs point-in-time

Static corpora age. The world does not.

A frontier model trained on text that ends in October still answers questions asked in March. It either hallucinates a dated fact, or recites a benchmark answer that leaked into its training corpus. Dated retrieval fixes both.

[ Static · contaminated ] public eval · pre-2024 snapshot

Question

Which 1968 novel was adapted into the 1982 film Blade Runner?

Model answer

Do Androids Dream of Electric Sheep?

// match in pretraining corpus

The question and the answer both appear verbatim in the pretraining set, so the model is reciting from memory rather than reasoning from evidence.

[ NOSIBLE · point-in-time ] as_of = 2026-04-15

Question

As of 2026-04-15, what fraction of Humanity’s Last Exam can the top open model solve?

Model answer

Resolves to events published before 2026-04-15. Cites each source by publication minute.

✓ fact dated to publication minute

The answer resolves to a record published after the static eval snapshot and before the as_of date. Replay the prompt at any earlier as_of date and the answer changes accordingly.
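A minimal sketch of that replay behaviour, assuming each event is a record with a publication timestamp. The Event class, its field names, the replay helper, and the sample records are illustrative assumptions, not the NOSIBLE API or real archive data.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    # Hypothetical record shape; field names are assumptions for illustration.
    published_at: datetime  # first-publication timestamp, minute resolution
    source: str
    claim: str


def replay(events: list[Event], as_of: datetime) -> list[Event]:
    """Return only the events published strictly before `as_of`, oldest first."""
    visible = [e for e in events if e.published_at < as_of]
    return sorted(visible, key=lambda e: e.published_at)


def utc(year: int, month: int, day: int, hour: int = 0, minute: int = 0) -> datetime:
    """Build a timezone-aware UTC timestamp at minute resolution."""
    return datetime(year, month, day, hour, minute, tzinfo=timezone.utc)


# Placeholder events, not real archive records.
EVENTS = [
    Event(utc(2025, 4, 10, 9, 30), "example.org/a", "claim A"),
    Event(utc(2026, 4, 2, 14, 5), "example.org/b", "claim B"),
    Event(utc(2026, 5, 1, 8, 0), "example.org/c", "claim C"),
]

# The same question replayed at two as_of dates sees different latest evidence.
latest_at_2025 = replay(EVENTS, utc(2025, 12, 31))[-1]
latest_at_2026 = replay(EVENTS, utc(2026, 4, 15))[-1]
print(latest_at_2025.claim, latest_at_2026.claim)
```

The strict comparison mirrors "published before" in the panel above; whether the boundary minute itself is included is a design choice this sketch does not settle.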

§02/ 06

Temporal reasoning & contamination · 2024–2026

Recent papers converge on the same fix.

Frontier LLMs lag humans on temporal reasoning, and static benchmarks now leak into pretraining. The fix is consistent across these papers: pin the training cutoff and score on events that resolve after it.

ARXIV · 2025 · Pham, Nguyen, Zunjare, Chen, Tseng & Vu

SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models

Direct FreshQA successor where frontier LLMs score near zero on questions about post-cutoff events.

ARXIV · 2025 · Liu, Han, Yu, Li & You

Time-R1: Towards Comprehensive Temporal Reasoning in LLMs

Three-stage RL curriculum trains a 3B model that beats DeepSeek-R1 on future event prediction beyond knowledge cutoff.

ARXIV · 2025 · He, Lv, Manela & Wu

Chronologically Consistent Large Language Models

Trains ChronoBERT and ChronoGPT only on text predating each cutoff, showing that strict temporal separation preserves performance on standard NLP benchmarks.

ARXIV · 2025 · He, Lv, Manela & Wu

Instruction Tuning Chronologically Consistent Language Models

Instruction-tunes a chronologically consistent model family with fixed open weights, the SFT step most point-in-time papers skip.

ACL 2025 ORAL · 2025 · Li, Armandpour, Mirzadeh et al.

TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining

114 Common Crawl dumps as a time-stratified pretraining benchmark, comparing meta-schedules and replay ratios for continual learning.

ARXIV · 2025 · Tan, Frati, Rao, Zhao & Suarez

TARDIS: Mitigating Temporal Misalignment via Representation Steering

Unsupervised representation editing shifts activations toward a target time period with no weight updates.

ARXIV · 2025 · Park, Zhang & Tanaka

New News: System-2 Fine-tuning for Robust Integration of New Knowledge

Documents the fine-tune versus in-context gap when models try to internalize fresh events. Self-QA pushes news into weights.

ARXIV · 2026 · Benhenda

Look-Ahead-Bench: A Standardized Benchmark of Look-ahead Bias in Point-in-Time LLMs for Finance

Finance-grade benchmark for point-in-time LLMs, measuring alpha decay across regimes to separate prediction from memorization.

▮ Field note

If your training corpus ends in October, your model lives in October. The world does not.

§03/ 06

NOSIBLE WORLD · the dated corpus

Every event dated to the minute. Every claim replayable to that minute.

One hundred million events mined from the open web, each one carrying a verified first-publication timestamp, persistent actor identifiers, full source evidence, and labels from seven independent ontologies.

100M+ Events

300K+ Sources

95 Languages

30 years Point-in-time

▮ Post-cutoff cases · 2024 to 2026

Regulation
Brussels · 2024·08
EU AI Act enters into force, imposing dataset and copyright disclosure on foundation models.
Capability
San Francisco · 2024·12
OpenAI announces o3 with frontier gains on ARC-AGI, resetting the reasoning benchmark frontier.
AI release
Hangzhou · 2025·01
DeepSeek-R1 open-weights release reprices US AI majors on a single Monday.
Copyright
N.D. Cal. · 2025·08
Bartz v. Anthropic settles for $1.5B over roughly 500K pirated training-corpus titles.
Discovery
SDNY · 2025·11
NYT v. OpenAI: court orders production of 20M ChatGPT logs to plaintiffs.
Benchmark
HLE · 2026·04
Humanity's Last Exam climbs from 10% to 46% in twelve months; static evals saturated.

▮ §04 · Why dates matter

Static corpora start aging the day they ship, and the models trained on them inherit the date. NOSIBLE WORLD is the fix: an open-web archive with every event dated to the publication minute, replayable to any past as_of.

Pretrain on the open web, dated to the minute.

§05/ 06

What you build

Datasets from one ledger.

Each one wires into your existing training and evaluation stack.

§01

Point-in-time pretraining slice

A chronologically ordered token stream where every document carries a verified first-publication timestamp. Replay it byte-for-byte at any past as_of date your eval requires.
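One way to picture a byte-for-byte replayable slice is a deterministic sort plus a digest. The sketch below is an assumption about how such a check could look, not the NOSIBLE implementation: the Doc record, the tie-break on doc_id, and the SHA-256 check are all hypothetical choices.

```python
import hashlib
from datetime import datetime, timezone
from typing import NamedTuple


class Doc(NamedTuple):
    # Hypothetical document record; names are assumptions for illustration.
    doc_id: str
    published_at: datetime
    text: str


def pretraining_slice(docs: list[Doc], as_of: datetime) -> bytes:
    """Concatenate documents published before `as_of` in a deterministic order."""
    visible = [d for d in docs if d.published_at < as_of]
    # Ties on the timestamp are broken by doc_id, so output never depends on input order.
    visible.sort(key=lambda d: (d.published_at, d.doc_id))
    return "\n".join(d.text for d in visible).encode("utf-8")


def slice_digest(docs: list[Doc], as_of: datetime) -> str:
    """SHA-256 of the slice; equal digests mean the replay is byte-identical."""
    return hashlib.sha256(pretraining_slice(docs, as_of)).hexdigest()


def day(month: int, dom: int) -> datetime:
    """Timezone-aware UTC midnight in 2025 (placeholder dates)."""
    return datetime(2025, month, dom, tzinfo=timezone.utc)


# Placeholder documents, deliberately out of chronological order.
docs = [Doc("b", day(3, 1), "second"), Doc("a", day(1, 1), "first"), Doc("c", day(9, 1), "third")]
cutoff = day(6, 1)

stream = pretraining_slice(docs, cutoff)
same_bytes = slice_digest(docs, cutoff) == slice_digest(list(reversed(docs)), cutoff)
print(stream, same_bytes)
```

Storing the digest next to each eval run lets a later replay at the same as_of be checked against the original slice.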

§02

Contamination-resistant eval set

Pin a training cutoff, then score against events that resolve after it. Forward-window questions grow with the ledger. Compatible with the ForecastBench and AntiLeak-Bench protocols.
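The pin-and-score idea can be sketched in a few lines. The Question record, the forward_window and accuracy helpers, and the exact-match scoring are assumptions for illustration; this does not reproduce the ForecastBench or AntiLeak-Bench protocols.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class Question:
    # Hypothetical eval item; field names are assumptions for illustration.
    prompt: str
    answer: str
    resolved_on: date  # date the underlying event resolved


def forward_window(questions: list[Question], cutoff: date, as_of: date) -> list[Question]:
    """Keep questions that resolved after the training cutoff and no later than `as_of`."""
    return [q for q in questions if cutoff < q.resolved_on <= as_of]


def accuracy(questions: list[Question], predict: Callable[[str], str]) -> float:
    """Exact-match accuracy of `predict` over the given questions."""
    if not questions:
        return 0.0
    hits = sum(predict(q.prompt).strip() == q.answer for q in questions)
    return hits / len(questions)


# Placeholder questions, not real benchmark items.
questions = [
    Question("q1", "yes", date(2024, 5, 1)),  # resolved before the cutoff: excluded
    Question("q2", "no", date(2025, 2, 1)),
    Question("q3", "yes", date(2025, 8, 1)),
]
window = forward_window(questions, cutoff=date(2024, 10, 31), as_of=date(2025, 12, 31))
score = accuracy(window, predict=lambda prompt: "yes")  # trivial stand-in for a model
print([q.prompt for q in window], score)
```

Moving as_of forward admits newly resolved questions without touching the cutoff, which is how the forward window grows over time.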

§03

Fine-tuning data with timestamps

Instruction-response pairs where every cited fact carries its publication time, source, and language. The model is trained to refuse when the evidence post-dates its corpus.
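A hedged sketch of what one such training record could look like. The build_example helper, the dictionary keys, and the refusal wording are hypothetical; only the idea of attaching publication time, source, and language to each cited fact, and refusing on post-cutoff evidence, comes from the description above.

```python
import json
from datetime import datetime, timezone

# Hypothetical refusal text; the real wording is not specified on this page.
REFUSAL = "I cannot answer: the evidence was published after my corpus ends."


def build_example(question: str, fact: dict, corpus_cutoff: datetime) -> dict:
    """Build one instruction-response pair; refuse if the fact post-dates the cutoff."""
    published_at = datetime.fromisoformat(fact["published_at"])
    if published_at > corpus_cutoff:
        response, citations = REFUSAL, []
    else:
        response = fact["text"]
        citations = [{k: fact[k] for k in ("published_at", "source", "language")}]
    return {"instruction": question, "response": response, "citations": citations}


cutoff = datetime(2025, 1, 1, tzinfo=timezone.utc)

# Placeholder facts, not real archive records.
early_fact = {"text": "placeholder fact", "published_at": "2024-06-01T12:30:00+00:00",
              "source": "example.org/a", "language": "en"}
late_fact = dict(early_fact, published_at="2025-03-01T12:30:00+00:00")

answered = build_example("placeholder question", early_fact, cutoff)
refused = build_example("placeholder question", late_fact, cutoff)
print(json.dumps(answered, indent=2))
print(json.dumps(refused, indent=2))
```

Keeping refusals in the same record format as answers means a single data loader can serve both kinds of example.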

§06 · Get started

NOSIBLE WORLD: a dated, replayable corpus for training and evaluating point-in-time LLMs.

Start Trial→
