feat(ErdosProblems): All 1179 Erdős conjectures formalized#3422
feat(ErdosProblems): All 1179 Erdős conjectures formalized#3422ryantuck wants to merge 821 commits intogoogle-deepmind:mainfrom
Conversation
|
I do believe the maintainers have access to latest ( and even in house Google ) AI tools that can do this for themselves if they like, the issue would be the reviewing -- which might persist here. But moreover it kind of defeats the point of the repository as a collaborative effort in my humble opinion ( and again it seems redundant because this can just be done by them without any open source input , simply the reviewer team and an AI can accomplish it, without ever opening individual issues, unless I am missing something ? ) |
|
Speaking for myself, I'm not entirely opposed to AI generating all formalizations of problems that need to be formalized, though, doing this in batch like this makes it impossible for the maintainers to review. The reason why I'm not entirely opposed is because there's still a lot of problems to be formalized outside of the Erdos Problems. I'll note a few things that caught my attention perusing through the changes:
Obviously, let's hear what the maintainers have to say about this. But my general thoughts are, 1) this probably has to be broken into smaller PRs for review, 2) please don't open 800 PRs at once (we just had a fiasco yesterday of a rogue agent opening up 1000s of PRs and we probably don't want that again) if anything just release them like 10 at a time for maintainers to review, 3) there probably needs to be some human intervention involved to review, agree on readability, and confirm accuracy on these formalizations. That being said, this is still a great starting point for some of the formalizations by getting pen on paper. It just needs a little bit more oomph to get it over the line. |
|
Thanks @ryantuck! Many things look good here. Other comments already made some good points, but let me give a bit more of an explanation. Let's start with an example, the problem with the lowest number that is of status In that file one can provide a trivial proof of the theorem you formalised: @[category research open, AMS 11]
theorem erdos_7 : answer(True) ↔
∃ S : Finset (ℤ × ℕ), IsCoveringSystem S ∧ HasDistinctModuli S ∧ ∀ p ∈ S, Odd p.2 := by
norm_num(config := {singlePass:=1})[IsCoveringSystem,HasDistinctModuli]
exact ⟨{(0,1)},by simp_all⟩That shows that -- in this case -- some definition/formalisation of a statement is wrong. That does not mean that all of the formalisations are mathematically incorrect, but the looking at the first open one in the list clearly has a bug. (I'm mainly looking at "open" conjectures, and I do expect that many of the "solved" ones are currently easier to formalise) Some other aspect in general concern more the style, but it it not only about "how to format docstrings" but plays also into mathematical correctness:
I do think what you are doing is useful and welcome, I just want to make sure we can incorporate the suggested formalisations without introducing too many misformalisations. Concrete suggestions how to move forward from here:
Then 1. and 3. in turn can also be used to improve our Does that sound good? I hope this really helps to finish of https://github.com/google-deepmind/formal-conjectures/milestone/1 faster, and we will make a good effort to review quickly enough... |
those were obtain by quickly running AlphaProof on some of the problems added in google-deepmind#3422
|
I think that AI Agents are underutilized for this repository. This is part of the reason that I have advocated for a AGENTS.md (added by @Paul-Lez) and have created a custom Issue Generation Agent to alleviate the bottleneck of not knowing what problems we should formalize (or have already). I also did this with the potential thought of having a AI Agent take these issues as a entry point, so if your approach works works sufficiently we can also think about expanding it to general issues of this repo, instead of just the erdos problem website. As with all software systems, also Agentic Systems are a iterative process. As @mo271 has pointed out, we can do this iteration and if our confidence in the AI Agent System increases, we can potentially increase the throughput. I have read your CHECKLIST.md and FORMALIZE_CONJECTURE.md and they are mainly focused on stylistic guidelines, not on mathematicial correctness and @mo271 has already found one misformalization. To increase the quality and alleviate the reviewer bottleneck further, I would suggest implementing a fourth check "Math Review", mainly oriented towards checking the mathematical correctness of the formalization. A rough overview, of what would be a sensible for this check:
Furthermore I think it might be the most bang for the review buck, if you first create PRs for problems, which have a short formalization. But I like that you have gone ahead and implemented this idea and I hope it works good and if it does, one can potentially think about shifting the contributions of the open source community to reviewing the mathematical correctness of such formalization. |
|
@ryantuck some discussion here: https://leanprover.zulipchat.com/#narrow/channel/524981-Formal-conjectures/topic/How.20to.20proceed.20on.20.22All.201179.20Erd.C5.91s.20conjectures.20formalized.22/with/578080877 It would be great if we could turn this pull request into something useful! |
|
I left some comments on the example PR that you shared here. @ryantuck |
Good discussion so far, please feel free to reuse my results or start from scratch :)Hello everyone, thanks for the input, glad this has sparked interesting discussion, lots of good points raised here. I really appreciate the feedback from @YanYablonovskiy @danielchin @mo271 @franzhusch. Agree there are a few logistical hurdles in the way of getting the entire set of problems fully-formalized. The last thing I want is for the existence of this PR to complicate, discourage, or delay further work on formalizing these problems, especially by those who are more steeped in the math than I am. I'm very pleased with having set out to do something ambitious and getting this far in the first place. Therefore, given the not-quite-ready-to-be-merged nature of it in its current state, I'd be happy to close this PR and re-open a later one after a few more rounds of AI-powered refinement on a bulk set of problems. Continuing AI pipeline refinementsI plan to continue hacking away at the AI formalization pipeline and improve the quality of the average formalization before submitting any subsequent PRs. Once these get a bit further along I will share the updates either in bulk form or in the suggested smaller batches of similar problems. From the feedback, I've identified a handful of checks for Claude to focus on during an additional round of iteration. I'm currently running all 800 problems through a pipeline that produces a detailed review of the current formalization and subsequently addresses those various fixes. See below for details, and see ryantuck#1 for an example of doing so for problem 7, mentioned above. Initial results are encouraging - the model appears to be fairly thorough when assessing viability of formalization and critically reviewing completeness. (@franzhusch - I saw your feedback on this and appreciate the insights, will certainly adapt the code accordingly!) I'm prompting the model with:
And subsequently,
Review Math [Current Agent Prompt]Assume the role of a PhD-level mathematician. The goal is to review a particular Erdos Problem formalization (problem number NUM) to answer the following questions. Produce a review document called ai-review/NUM.md.
|
|
A quick note on (2) in your current agent prompt. For citations and docstrings, it may be better to pull data from https://www.erdosproblems.com/latex/NUM instead. The citations are well formatted there and you can more or less copy and paste the docstrings from there as well (with some minor edits). The one nit I've been getting from reviewers specifically for docstrings is to turn the latex special characters in names to the actual character (i.e., |
Citations: - Add missing [Fr93] Freud reference (density ≥ 19/36 disproof) - Add Adenwalla's upper bound observation to docstring - Add cross-reference to Problem google-deepmind#839 (finitary version) - Add page number (p. 43) to [Er92c] - Expand [CoPh96] with correct title and journal (SIAM J. Discrete Math.) Readability: - Use unqualified `Icc` in ConsecutiveSumFree body (consistent with `open Finset`) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Section 1 (Code Reuse):
- Remove local definitions `sumset`, `IsAdditiveBasis2`, and
`IsMinimalAdditiveBasis2` that duplicate library code.
- Replace with `Set.IsAsymptoticAddBasisOfOrder 2` from
FormalConjecturesForMathlib/Combinatorics/Additive/Basis.lean,
matching Problem 868's style.
- Express minimal basis condition inline as
`B.IsAsymptoticAddBasisOfOrder 2 ∧ ∀ b ∈ B, ¬(B \ {b}).IsAsymptoticAddBasisOfOrder 2`.
Section 2 (Citations):
- Fix [ErNa88] title from "Minimal asymptotic bases with prescribed
densities" to correct "Partitions of bases into disjoint unions of
bases" per LaTeX source; add journal details (J. Number Theory, 1–9).
- Add missing [Er92c] reference (Erdős, Matematiche 1992).
Sections 3–6: No changes needed (no missing variants, readability
addressed by code reuse, fully formalizable, mathematically correct).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add docstring notes to both theorem statements explaining the ℕ→ℝ cast of playGame's return value, which is needed for comparison with the real-valued bounds εn and (1−ε)n/2. Review found no issues with code reuse (local IsPrimitive is a different concept from library Set.IsPrimitive), citations, variants, or correctness. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code reuse: - Remove local `sumset` and `IsAdditiveBasis2` definitions - Use `Set.IsAsymptoticAddBasisOfOrder` from FormalConjecturesForMathlib - Keep `repCount` (no library equivalent exists) Citations: - Fix [ErNa89] title: "Additive bases with many representations" (was incorrectly "Sets of natural numbers with no minimal asymptotic bases") - Fix [ErNa89] journal: Acta Arith. (was incorrectly Proc. Amer. Math. Soc.) - Add page numbers to both [ErNa88] (1-9) and [ErNa89] (399-406) - Add credits for Wouter van Doorn and Terence Tao per website Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dability Citations: Update [ErNa79] title to "Systems of distinct representatives and minimal bases in additive number theory" and [Na74] to "Minimal bases and maximal nonbases in additive number theory" in J. Number Theory, matching the LaTeX source. The previous [Na74] had an incorrect journal (Acta Arith.) and volume number. Variants: Add two variant theorem statements consistent with problem 868: - erdos_870.variants.k_eq_2: the solved k=2 case by Erdős–Nathanson using the convolution threshold (log(4/3))⁻¹ log n - erdos_870.variants.Hartter_Nathanson: existence of bases without minimal sub-bases, for any order k ≥ 2 Readability: Clarify repCount docstring to say "weakly increasing (non-decreasing)" for precision about the List.Pairwise (· ≤ ·) condition. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add full [Er98] bibliographic reference to the module docstring: Erdős, P., "Some of my new and almost new problems and results in combinatorial number theory", Number theory (Eger, 1996) (1998), 169–180. Sections 1 (code reuse), 3 (variants), 4 (readability), 5 (formalizability), and 6 (correctness) required no changes per the review. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add missing references [Er62c] and [Er98] to the module docstring with full bibliographic details (titles, journals, volumes, pages). Expand existing inline citations [St66], [ENS91], [DeFr99] into complete reference entries. Add cross-references to related Problems 186, 789, and 875 (the infinite analogue). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add all six references from erdosproblems.com/876 to module docstring:
[Er62c], [Er75b], [Er77c], [Er98], [DEM99], [LuSc00]
- Add problem summary and context to module docstring
- Note connection to related Problem 790
- Add Graham's bound as a solved variant (erdos_876_graham):
existence of infinite sum-free set with gap < n^{1+ε}
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add full bibliographic references for [Er84e] and [Er98] to the module docstring. Add cross-reference to OEIS sequence A339378. No code reuse, variant, readability, or correctness issues were identified in the review. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Section 1 (Code Reuse): - Remove local `IsSumFree` definition that duplicated the library version in `FormalConjecturesForMathlib/Combinatorics/Basic.lean` - Import `FormalConjecturesForMathlib.Combinatorics.Basic` and open `Pointwise` - Rewrite `IsMaximalSumFree` to use library `IsSumFree` on `Set ℕ` via Finset-to-Set coercion Section 2 (Citations): - Add full bibliographic entries for [CaEr90], [Er98], [LuSc01], [BLST15], [BLST18] to module docstring - Fix "Luczak" → "Łuczak" throughout Sections 3–6: No actionable issues identified. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove redundant `IsAdmissible` definition; replace with `Set.Pairwise Nat.Coprime` (Mathlib standard) in `IsMaxAdmissible` - Add [Er84e] and [Er98] citations with full bibliographic details - Add known partial results to docstring (Erdős–Van Lint bounds, Part 2 verified for k=2) - Add cross-reference to Problem 878 and OEIS sequence A186736 - Add citation tags [Er84e, Er98] to theorem docstrings Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Expand [Er98] and [HHP07] shorthand references in the module docstring to full bibliographic entries with titles, journals, years, and pages, matching the canonical citation format used in other problem files. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Expand [ErSa97] with full bibliographic details (title, journal) - Add missing [Er98] and [Sa99] citations from website - Fix tripartite variant docstring: attribute proof to [Sa99] (Sárközy 1999) instead of [ErSa97] (the joint 1997 paper) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update [Er98] citation with correct title, venue, and page numbers from LaTeX source - Expand [ELRSS99] citation with journal (Discrete Mathematics) and page numbers (119–135) - Add cross-reference to Problem 13 (distinct subset sums connection) - Add clarifying note about solved/open status discrepancy with website - Add known upper bound variant (log₂ n + ½ log₂ log n + O(1)) - Improve IsPrimitive docstring to mention "antichain under divisibility" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Expand [Er98] citation with full bibliographic details: add conference proceedings info "Number theory (Eger, 1996)" and page range "169–180" to match the canonical citation format used across the codebase. Review summary (all other sections clean): - Section 1 (Code Reuse): No issues - Section 2 (Citations): Fixed incomplete [Er98] reference - Section 3 (Variants): No missing variants - Section 4 (Readability): Acceptable, no changes needed - Section 5 (Formalizability): Unambiguous - Section 6 (Correctness): Correct and complete Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Citations: - Add missing references [Er35] (Erdős 1935) and [ESS67] (Erdős–Sárközy–Szemerédi 1967) - Expand [ESS68] and [Er98] with full titles, journals, and page numbers - Add docstring context on known necessary conditions (sum convergence and density bound) established by [Er35] and [ESS67] - Add cross-reference to Problem 143 (analogous question for reals) Readability: - Clarify GCD condition docstring: "no b_k equals gcd(b_i, b_j) for pairwise distinct indices" instead of "no non-trivial GCD solutions" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add full bibliographic references [Ka01] (Katznelson, 2001) and [PeSc10] (Peres–Schlag, 2010) to the module docstring, and add citation keys to the theorem docstring to match the website's convention. Code reuse (IsLacunary) not changed: the library version in FormalConjecturesForMathlib uses an eventual condition (∀ᶠ k in atTop) which is semantically different from the uniform condition required here; per constraints, no new library definitions created. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix citation tag mismatch: [Er95d] → [ErHa95] to follow the codebase convention for Erdős-Hajnal joint works. The tag [Er95d] is used elsewhere for single-author Erdős papers (e.g., "Problems and results in discrete mathematics"), but problem 895 is attributed to Erdős and Hajnal jointly. Updated both the module docstring reference and the theorem docstring. Review sections addressed: - Section 1 (Code Reuse): No issues - Section 2 (Citations): Fixed tag mismatch [Er95d] → [ErHa95] - Section 3 (Variants): Skipped (optional) - Section 4 (Readability): No issues - Section 5 (Formalizability): No issues - Section 6 (Correctness): No issues Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dability - Add cross-reference to related Problem google-deepmind#490 - Clarify module docstring: replace "The open problem is to determine the exact asymptotic order" with explicit statement that the conjecture is Θ(N²/log N), matching the known lower bound - Fix erdos_896 theorem docstring: clarify that O(N²/log N) is the conjectured bound, not a known result - Add erdos_896.variants.upper_bound: van Doorn's proven upper bound F(A,B) ≪ N²/((log N)^δ (log log N)^{3/2}) where δ ≈ 0.086, tagged as research solved Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… convention Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- [Er95]: Fix journal details to match canonical (Congressus Numerantium **107**) - [Er96b]: Add missing title and full bibliographic details - [Cr03]: Use abbreviated journal name and bold volume per project convention Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…project convention Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add [Er77c], [Er92c], [Er95], [Er96b], [Er97c] references from the website with full bibliographic details matching canonical tags from other problem files. Fix [Cr03] to use period after title and add space in author initials. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… convention Move citations from bullet list in header to standalone entries at end of docstring. Fix [LiSa24] author initial (J. → Y.), title, and add arXiv number. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix [Er95c] title and formatting, add journal details for [Er95], and correct [Ta23b] title to match LaTeX source. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… convention - Use underscores for italic titles instead of asterisks - Fix [Bl25] title to "Control and its applications in additive combinatorics" - Fix [Bl25] author initials (T. → T. F.) and add arXiv number Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… convention Fix [Ch03] volume/year ordering (Journal **vol** (year), pages) and replace hyphens with en-dashes in page ranges. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… convention Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… convention Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rmatting Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… convention Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… convention Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… convention Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… convention Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… convention Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… convention Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… convention Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… convention Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… convention Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… convention Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… convention Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… convention Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Breaking down to smaller PRsAs indicated in my last message and a couple hundred commits on this branch since then, I've processed all problems through two further stages: (1) review and (2) fix issues. I believe they're now as good as I'm going to get them using a pure AI pipeline and Claude Opus 4.6. My goal is to now break this contribution down into a series of manageable PRs for review, and close this one out. 380 Open ProblemsAs discussed, Milestone 1 is focused on open problems. I fetched the canonical dataset from the teorth/erdosproblems repo. I took the union of the set of the 650 open problems listed there (as of 2026-03-16) and my 808 formalizations to get a list of 380 open problems with new AI formalizations. Open Problems by CategoryIt was suggested that PRs would be more manageable if problems were grouped by category, so I had Claude conduct a review and it came up with the following: >>> data = yaml.safe_load(open('open-erdos-problems-by-category.yml'))
>>> for category, problems in data.items():
... print(category, len(problems))
...
additive_combinatorics 42
analysis_polynomials 16
combinatorial_geometry 44
combinatorics_general 30
graph_theory 80
number_theory 124
probability_random 6
ramsey_theory 15
sequences_asymptotics 3
set_theory_infinite_combinatorics 20
>>> print(sum(len(problems) for problems in data.values()))
380Three initial smaller PRsI've opened three PRs for the categories with the lowest representation in the dataset to function as a starting point:
If this approach works well-enough, I'll point Claude at the problem of further breaking down to smaller problem sets for the larger categories. |
All 1179 Erdős conjectures are now formalized
I have utilized Claude (primarily Opus 4.6) to produce formal conjectures for all 805 remaining Erdős problems, completing the formalization of the entire set of 1179 conjectures, and achieving Milestone 1: All open Erdős problems formalized in this repo.
I believe all conjectures adhere to the style guidelines in this repo and
lake builddoes pass, making this a valid and nontrivial contribution to this project.After contributing the conjecture for problem 13 I realized that formalizing the few hundred remaining conjectures basically boiled down to running a for-loop (and enough tokens to pay for it), so set out to accomplish just that.
Note that this PR contains no proofs, only formalizations of the conjectures.
Question for maintainers - how to proceed?
This is a massive PR generated exclusively by frontier AI and appears to be fairly new territory for this project. That being said, I've put significant effort into making it as ready as possible by focusing on strict adherence to style guidelines, believe it's worth the time to consider, and I'd be happy to assist in whatever way makes sense to do so.
Maintainers - Is this PR okay as-is? Should I open up 800+ PRs? Is there a more rigorous
PR_REVIEW.mdthat Claude / Gemini / GPT can use to further review the conjectures to give the maintainers confidence that this isn't a deluge of AI slop? Does it make more sense to put that sort of proof-of-work in prior to merging, or do these appear correct-enough and would it make more sense to continue to apply a similar batch processing on all 1179 conjectures each time a new frontier model drops? Should I just go kick rocks and maintain my own repo?Methodology
There were 374 formalized
.leanconjectures inFormalConjectures/ErdosProblems/when I began, leaving 805 of the problems in the set remaining to work on.I utilized Claude in four separate stages for each problem in the batch:
lake buildpasses (seeFORMALIZE_CONJECTURE.md)AGENTS.md(seeADHERE_TO_DEEPMIND_STYLE.md)AGENTS.mdandREADME.md.The batch workflow I settled on for each stage ultimately was nothing fancy - essentially just opening up a
tmuxsession on the server and leaving something like this running for days (lol):I did have to upgrade to a Claude Max subscription and hit my weekly token limit twice, so would've got this done sooner if I didn't have to wait around for that to reset. Additionally, I tried to parallelize the calls but that ended up torching through my per-session limit too fast, so I fell back on a very basic strategy of just processing the problems one at a time.
All compute was performed on a tiny e2-medium (2 vCPU, 4GB RAM) server in Google Cloud.
The repo containing the work done for this project is here: https://github.com/ryantuck/erdos-ai