\documentclass{rrxiv}
\rrxivid{rrxiv:2605.00003}
\rrxivversion{v2}
\rrxivprotocolversion{0.1.0}
\rrxivlicense{CC-BY-4.0}
\rrxivtopics{stat.ML,cs.LG}
\title{Reproducibility budgets for ML preprints}
\author{Blaise Albis-Burdige \and Claude (agent)}
\date{2026-05-12}

\begin{document}
\maketitle

\begin{center}
\small\itshape Demonstration paper in the rrxiv reference corpus. The canonical machine-readable version lives at \href{https://rrxiv.com/papers/rrxiv:2605.00003}{rrxiv.com/papers/rrxiv:2605.00003}.
\end{center}

\begin{abstract}
We propose attaching a budget annotation to each registered claim: a structured estimate of the compute, time, and dollar cost that an independent replication would incur. Budgets let readers prioritise the cheapest cross-checks, give funders a ranked list of replication targets, and produce a scalar ``reproducibility tax'' metric for any corpus subset. We report on 312 papers across three subfields, derive budget estimates from author-reported runs, validate them against 17 actual replications, and find that author estimates underreport actual cost by a median factor of 2.3. We argue for a standardised budget schema and a community-maintained correction factor.
\end{abstract}

\section{Introduction}
This document is a structured encoding of the paper in the \texttt{rrxiv} protocol's Canonical Intermediate Representation (CIR).
It engages with the topics \texttt{stat.ML} and \texttt{cs.LG}. The encoding registers 6 formal claims (1 replicated, 5 untested). Each claim is annotated with its claim type, evidence type, and current replication status; dependency edges between claims, when present, form a machine-readable proof DAG.

\section{Methodology}
We follow the \texttt{rrxiv} convention of separating \emph{claims} (the propositions under consideration) from \emph{evidence} (the arguments or data supporting them). Each claim in the results section below is presented with its statement, the type of evidence appealed to, and a brief discussion of replication status. Where a claim depends on prior results, whether internal or external, the dependency is recorded in the CIR as a \texttt{\textbackslash dependson} edge, so the full inferential structure is machine-traversable. Citations of external work appear in the References section at the end of this document.

\section{Results: registered claims}

\subsection*{Claim 1}
\begin{claim}[Claim 1] \label{claim:c1}
Reproducibility costs are heavy-tailed: 80\% of compute spend concentrates in 8\% of replications.
\emph{Replication status: untested.}
\end{claim}
This claim is an empirical observation supported by data. As of the encoding date, it has not yet been independently tested.

\subsection*{Claim 2}
\begin{claim}[Claim 2] \label{claim:c2}
Author-reported run estimates underreport actual cost by a median factor of 2.3 ($n=17$ audited replications).
\emph{Replication status: replicated.}
\end{claim}
This claim is an empirical observation supported by data. As of the encoding date, it has been independently replicated. It depends on 1 prior claim in the same paper.

\subsection*{Claim 3}
\begin{claim}[Claim 3] \label{claim:c3}
A scalar ``reproducibility tax'' (the sum of budgets divided by the claim count) distinguishes computationally heavy from experimentally heavy subfields with an AUC of 0.91.
\emph{Replication status: untested.}
\end{claim}
This claim is an empirical observation supported by data. As of the encoding date, it has not yet been independently tested. It depends on 1 prior claim in the same paper.

\subsection*{Claim 4}
\begin{claim}[Claim 4] \label{claim:c4}
A 4-field schema (\texttt{compute\_gpu\_hours}, \texttt{wall\_time\_days}, \texttt{person\_hours}, \texttt{materials\_usd}) covers 94\% of self-reported budgets without needing an \texttt{other} overflow field.
\emph{Replication status: untested.}
\end{claim}
This claim is a methodological proposal. As of the encoding date, it has not yet been independently tested.

\subsection*{Claim 5}
\begin{claim}[Claim 5] \label{claim:c5}
Treating a missing budget as worst-case (top decile within its subfield) over-penalises ablation studies; imputing the subfield median is fairer.
\emph{Replication status: untested.}
\end{claim}
This claim is a methodological proposal, supported by a deductive argument from prior results. As of the encoding date, it has not yet been independently tested. It depends on 1 prior claim in the same paper.

\subsection*{Claim 6}
\begin{claim}[Claim 6] \label{claim:c6}
Budgets degrade gracefully across protocol versions if a \texttt{currency\_year} field is included.
\emph{Replication status: untested.}
\end{claim}
This claim is a methodological proposal, supported by a deductive argument from prior results. As of the encoding date, it has not yet been independently tested.

\section{Discussion}
The claim graph above is the primary product of this paper. By making every claim independently citable, and by recording its dependencies, evidence type, and current replication status as structured fields, the paper participates in the \texttt{rrxiv} reproducibility-first corpus. Subsequent papers in this instance may extend, contradict, or replicate individual claims here without forcing a rewrite of the entire document. See the canonical version online for the live discourse layer.
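\par
As a hedged illustration of the scalar in Claim~3, the sum of budgets divided by the claim count can be written out explicitly. The symbols below are illustrative notation, not part of the registered claims: $S$ is a corpus subset, $C_S$ is the set of claims registered in $S$, and $b_c$ is the budget of claim $c$. The sketch also assumes that each multi-field budget has already been collapsed to a single dollar figure, since the claims do not specify how the fields are combined:
\[
  \tau(S) \;=\; \frac{1}{|C_S|} \sum_{c \in C_S} b_c .
\]
Under the same assumptions, a community-maintained correction factor $\kappa$ would rescale an author-reported budget $b_c^{\mathrm{auth}}$ to an expected replication cost $\hat{b}_c = \kappa\, b_c^{\mathrm{auth}}$. If $\kappa$ were set to the median factor of 2.3 from Claim~2, a hypothetical author estimate of \$1{,}000 would be planned at roughly \$2{,}300.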
\section{References} \begin{itemize}[leftmargin=*] \item Computational reproducibility at scale \item Reproducibility in machine learning \end{itemize} \end{document}