Liam's Notes on Cabal Codebase (Draft, will remove name on finalization)
('23, Oct 4)
Hi folks, I hope people don't mind that I'm using repo access to set up a guide to the current codebase and help with navigation.
Right now, my goals are to:
- Help provide a space for my own notes as I traverse the codebase.
- Provide a basis for a future codebase guide to help onboard people who wish to assist with the codebase in the future.
- Provide a public page for people on the project to correct me ergonomically when I'm wrong.
This wiki page is intended to be transient either way: if I give up, I'll delete it; if I finish, these uncollected notes will be reformatted into a broader guide.
As for the codebase itself, this is the central repository for Cabal, the standard GHC Haskell build tool.
The codebase dates all the way back to 2003 or 2004, I believe. While it's hallowed, it is also a product that managed to gain an enormous amount of capability in a short period of time, is seemingly a bit understaffed, and has had to keep up with breaking changes in GHC.
Beyond a question of history, Cabal is also a more ambitious project than it may appear. It is intended to be a unified build tool supporting any Haskell compiler, and this remains a goal. Likewise, as it is a fundamental build tool, it cannot depend on more recent amenities in the Haskell ecosystem, and must often duplicate library code if it wishes to use them.
As of this writing, the total size, if you git clone it, is about 132k SLOC. By the time you read this, it might have grown to 140k SLOC, or been reduced to something closer to 110k SLOC as the provisions for older "v1" commands are removed.
Of the non-repo parts of the codebase, CONTRIBUTING.MD is the official guide to the codebase, containing:
- Build Instructions
- Advice on Tests
- Quality Assurance
- Code style, plus constraints on libraries (only dependencies shipped with GHC releases up to 5 years old, which at the time of this writing includes 8.6.2) and on language extensions (everything but Template Haskell)
README.MD contains the official notes to the project, including contact information and a link to the official manual.
As for the repository itself, the four main parts of the repo are, in order of sub-repository size:
- cabal-install (executable: cabal), the actual CLI tool that Haskell programmers use to build Haskell libraries and applications, whether directly, through Stack, or through Nix.
This is the largest repo by size, clocking in at around 60-70k SLOC, especially since it has backward compatibility for "v1" commands.
- Cabal. The "core" of Cabal. This is what actually executes the build instructions and so on.
Notably, Cabal can actually be used without cabal-install (Cabal was originally used with Setup.hs files), and Cabal is what's getting called by Stack and Nix, not cabal-install.
Moreover, as the oldest and most stable part of Cabal, it is also the part with the highest code quality.
- Cabal-syntax. This module contains the parser for at least the .cabal manifest (I believe, but cannot confirm quickly, that the .project file parser is in the cabal-install module).
- cabal-install-solver. This is the dependency solver for cabal.
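For background on the Setup.hs usage mentioned in the Cabal bullet above, the conventional minimal Setup.hs just hands control to the Cabal library. This is the widely used default form, shown as an illustration rather than a file taken from this repository:

```haskell
-- Setup.hs: delegate every build step (configure, build, install, ...)
-- to the Cabal library's default implementation.
import Distribution.Simple (defaultMain)

main :: IO ()
main = defaultMain
```

It would typically be invoked as `runhaskell Setup.hs configure` followed by `runhaskell Setup.hs build`, with no cabal-install involved.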
Cabal uses its own custom Prelude, but since Cabal is intended to support other implementations of Haskell beyond just GHC, it cannot rely on GHC's -XNoImplicitPrelude to disable the default Prelude. The Haskell Report allows import Prelude () to serve most of this purpose, although that form still imports the Prelude's instances.
Many of the individual repositories contain their own Prelude, but they seem to ultimately point to Distribution.Compat.Prelude in Cabal-syntax.
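A minimal sketch of the import Prelude () idiom follows. To keep it self-contained, a qualified import of the standard Prelude stands in for a project prelude; in the real codebase that second import would be a custom module such as Distribution.Compat.Prelude. The `describe` function is a hypothetical name used only for illustration.

```haskell
module Main (main) where

-- An explicit import with an empty list suppresses the implicit Prelude
-- import in portable Haskell, with no compiler-specific extension needed.
import Prelude ()

-- Stand-in for a custom prelude (assumption: a real project would import
-- its own prelude module here instead).
import qualified Prelude as P

-- No unqualified Prelude names are in scope, yet the `Show Int` instance is
-- still usable: instances always come along, even with an empty import list.
describe :: P.Int -> P.String
describe n = "n = " P.++ P.show n

main :: P.IO ()
main = P.putStrLn (describe 42) -- prints "n = 42"
```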
The first interesting idiosyncrasy you might find in cabal-install is the ultra-sparse main. This is the norm in many Haskell applications, where the actual executable is just a very thin wrapper around a library, allowing for easy reuse of existing code.
It runs getArgs, then passes the args to another main in the library section of the repository.
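The shape of such a thin wrapper can be sketched as below. The names `libraryMain` and `describeArgs` are hypothetical stand-ins defined in the same file so the example runs on its own; in a real project the library entry point would live in a separate library module.

```haskell
import System.Environment (getArgs)

-- Hypothetical helper: what the "library" does with its arguments.
describeArgs :: [String] -> String
describeArgs args = "would dispatch on: " ++ unwords args

-- Hypothetical stand-in for the library-side entry point.
libraryMain :: [String] -> IO ()
libraryMain = putStrLn . describeArgs

-- The executable's main does nothing except fetch the arguments
-- and hand them over.
main :: IO ()
main = getArgs >>= libraryMain
```

Keeping the executable this small means everything of substance stays importable and testable from the library side.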
Here, we have a bit more complexity.
The main, to begin with, is not actually a main, but rather an initialization function.
Most of the initialization calls are self-explanatory, but one point to note is the Response File support.
The args given to the main function are split, then processed via expandResponse, which adds support for response files, a way to work around command-line argument length limits.
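To illustrate the idea behind response files, here is a simplified sketch, not Cabal's expandResponse. It assumes the common convention that an argument of the form @FILE is replaced by the arguments stored in FILE, and it only splits on whitespace, with no quoting or nested expansion. The names `expandArgsWith` and `expandArgs` are hypothetical.

```haskell
import System.Environment (getArgs)

-- Replace each argument of the form @FILE with the whitespace-separated
-- words read from FILE; every other argument passes through unchanged.
-- The file reader is a parameter so the logic can be exercised without
-- touching the disk.
expandArgsWith :: Monad m => (FilePath -> m String) -> [String] -> m [String]
expandArgsWith readContents args = concat <$> mapM expandOne args
  where
    expandOne ('@' : path) | not (null path) = words <$> readContents path
    expandOne arg = pure [arg]

-- The same expansion, reading response files from disk.
expandArgs :: [String] -> IO [String]
expandArgs = expandArgsWith readFile

main :: IO ()
main = do
  args <- getArgs >>= expandArgs
  mapM_ putStrLn args
```

With this approach a caller can put thousands of arguments in a file and pass a single @FILE argument, staying under whatever limit the operating system imposes on the command line.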
The processed result is then passed to the mainWorker function, which, using topHandler as an exception handler, runs commandsRun against the commands list defined in mainWorker's where clause. This produces data that is then pattern-matched into IO actions for actual execution.
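The overall flow (parse the arguments against a command list, pattern-match the result into an IO action, and run it under a top-level exception handler) can be sketched with a toy dispatcher. Everything here is a hypothetical stand-in: `Command`, `Parsed`, `parseCommand`, `topLevelHandler`, and `worker` are illustrative names and do not reflect the actual types or signatures of commandsRun, topHandler, or mainWorker.

```haskell
import Control.Exception (SomeException, displayException, handle)
import System.Environment (getArgs)
import System.IO (hPutStrLn, stderr)

-- Hypothetical, simplified command description.
data Command = Command
  { commandName :: String
  , commandAction :: [String] -> IO ()
  }

-- Hypothetical parse result: plain data that the caller pattern-matches on.
data Parsed
  = ShowHelp
  | UnknownCommand String
  | Run (IO ())

parseCommand :: [Command] -> [String] -> Parsed
parseCommand _ [] = ShowHelp
parseCommand _ ("--help" : _) = ShowHelp
parseCommand cmds (name : rest) =
  case lookup name [(commandName c, commandAction c) | c <- cmds] of
    Just action -> Run (action rest)
    Nothing -> UnknownCommand name

-- Report any exception on stderr instead of crashing with a raw trace.
-- (A real handler would be more selective than catching SomeException.)
topLevelHandler :: IO () -> IO ()
topLevelHandler = handle report
  where
    report :: SomeException -> IO ()
    report e = hPutStrLn stderr ("error: " ++ displayException e)

worker :: [String] -> IO ()
worker args = topLevelHandler $
  case parseCommand commands args of
    ShowHelp -> putStrLn "usage: tool COMMAND [ARGS]"
    UnknownCommand name -> ioError (userError ("unrecognised command: " ++ name))
    Run action -> action
  where
    -- The command table lives in the where clause, as described above.
    commands =
      [ Command "build" (\rest -> putStrLn ("building " ++ unwords rest))
      , Command "clean" (\_ -> putStrLn "cleaning")
      ]

main :: IO ()
main = getArgs >>= worker
```

Separating parsing (pure, returning data) from execution (the pattern match that yields an IO action) keeps the command table easy to extend and the parsing step easy to test.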