| 1 | +--- |
| 2 | +title: Project Quarantine |
| 3 | +description: Handling project quarantine lifecycle status for suspected malware |
| 4 | +authors: |
| 5 | + - miketheman |
| 6 | +date: 2024-12-30 |
| 7 | +tags: |
| 8 | + - security |
| 9 | +--- |
| 10 | + |
| 11 | +Earlier this year, I wrote briefly about new functionality added to PyPI, the |
| 12 | +[ability to quarantine projects](./2024-08-16-safety-and-security-engineer-year-in-review.md#project-lifecycle-status-quarantine). |
| 13 | +This feature allows PyPI administrators to mark a project as potentially harmful, |
| 14 | +and stop users from easily installing it, limiting further harm. |
| 15 | + |
| 16 | +In this post I'll discuss the implementation, and further improvements to come. |
| 17 | + |
| 18 | +<!-- more --> |
| 19 | + |
| 20 | +## Background |
| 21 | + |
| 22 | +Malware on PyPI is a persistent problem. |
| 23 | + |
| 24 | +PyPI has concepts of Projects, Releases, and Files[^1]. |
| 25 | +These are all discrete data models[^2], |
| 26 | +and behave slightly differently based on their characteristics. |
| 27 | +A Project may have 0 or more Releases, and a Release may have 1 or more Files. |
| 28 | + |
| 29 | +Researchers will often report a given Project as malware, |
| 30 | +and will link to a specific location in a File for a given Release, |
| 31 | +per the [PyPI Security process](https://pypi.org/security/). |
| 32 | + |
| 33 | +PyPI will receive malware reports[^3] that are often relevant to an entire Project. |
| 34 | +Simply put: a Project, along with all of its Releases (usually 1) |
| 35 | +and Files (usually 1-2) are all part of a similar campaign, |
| 36 | +and should be removed from PyPI to protect end users. |
| 37 | +This is not universally true, as malware has been added to established, |
| 38 | +mature Projects via a new Release after some sort of account access takeover, |
| 39 | +so there may be a need to consider reporting malware for a given Release/File - |
| 40 | +something not yet fully implemented via Observations |
| 41 | +or the [beta Malware API](./2024-03-06-malware-reporting-evolved.md#via-api). |
| 42 | + |
| 43 | +When reviewing and acting on malware reports, |
| 44 | +PyPI Admins had one main tool at their disposal: |
| 45 | +**complete removal of the Project from the PyPI database**. |
| 46 | +This is often coupled with prohibiting the Project name from being reused. |
| 47 | +PyPI has functionality irrespective of malware to prevent File name reuse. |
| 48 | + |
| 49 | +The impact of these removals can be disruptive, |
| 50 | +and removals are pretty much irrevocable - |
| 51 | +it's the same mechanism PyPI warns project owners about |
| 52 | +when they elect to remove their project from the index[^4]. |
| 53 | + |
| 54 | +Further, the longer a malicious Project remains publicly available, |
| 55 | +the greater the potential for end users to install |
| 56 | +and become victims of said malware. |
| 57 | +With the current full-time security staff for PyPI == 1, |
| 58 | +there is potential for malware to remain installable by users for longer periods of time, |
| 59 | +and asking volunteer PyPI Admins for extra hours of work is not sustainable. |
| 60 | + |
| 61 | +Reducing the time window when a malicious Project/Release/File is available |
| 62 | +for end users to become victims is an improvement, |
| 63 | +and further reduces the incentive for malicious actors |
| 64 | +to use PyPI as their distribution method. |
| 65 | + |
| 66 | +## Implementation |
| 67 | + |
| 68 | +The implementation of Project Quarantine took shape as I learned more about the |
| 69 | +possible states a project could be in. |
| 70 | +I jotted down some basic requirements for the feature: |
| 71 | + |
| 72 | +- Project exists on PyPI that has Releases and Files |
| 73 | +- Project is not installable (hidden from simple index) while in quarantine |
| 74 | +- Project is not modifiable by the project owner while in quarantine |
| 75 | +- Project state is visible to Project Owners, security researchers, and PyPI Administrators |
| 76 | +- Project state can be reverted by a PyPI Administrator to restore general visibility |
| 77 | +- Project can be removed/deleted by a PyPI Administrator |
| 78 | + |
| 79 | +With those in mind, I set out to implement the feature. |
| 80 | + |
| 81 | +### Take a page from the book of Yank |
| 82 | + |
| 83 | +Before this change, PyPI already had a "Yank" feature, |
| 84 | +per [PEP 592](https://peps.python.org/pep-0592/). |
| 85 | + |
| 86 | +A Project with no Releases will be listed in the Simple Repository API[^5], |
| 87 | +but the resulting detail page will not have any links, |
| 88 | +making it effectively uninstallable[^6]. |
| 89 | +One idea was when quarantining a Project, |
| 90 | +we could mark it as having no Releases, |
| 91 | +thus excluding it from the index. |
| 92 | + |
| 93 | +The difference from "yank" is that a yanked Release is still installable by clients, |
| 94 | +and quarantined items should not be installable - |
| 95 | +so we'd have to explore where to make the change and how that would impact clients. |
| 96 | +Yank is also applied to a Release (and all of its Files), not a Project. |
| 97 | +We could apply a change to every Release for a Project, instead of Project-wide, |
| 98 | +and thus set ourselves up for quarantining individual Releases. |
| 99 | + |
| 100 | +This ends up more complex, trying to account for a rare edge case |
| 101 | +where a mature Project has a new Release that needs to be quarantined, |
| 102 | +and would prevent disruption of existing users of prior Releases. |
| 103 | + |
| 104 | +We accept that this might happen, will track closely if and when it does, |
| 105 | +and are deferring implementation until then. |
| 106 | + |
| 107 | +### Create an Observer-only visibility API |
| 108 | + |
| 109 | +I had previously built a new beta API infrastructure |
| 110 | +to allow Observers to report malicious Projects. |
| 111 | +One idea was to add a new authenticated API endpoint |
| 112 | +to allow querying the current list of quarantined Projects, |
| 113 | +and supply links to their Releases and Files for consumption. |
| 114 | + |
| 115 | +Thus, a researcher could download the artifacts in question, |
| 116 | +but not via `pip install ...` |
| 117 | + |
| 118 | +I ended up not pursuing this approach, |
| 119 | +as the beta authenticated APIs are still being developed, |
| 120 | +and I didn't want to add more functionality before we swing back |
| 121 | +and figure out some critical authentication and authorization issues |
| 122 | +needed for the future of management API endpoints. |
| 123 | + |
| 124 | +### Lifecycle Status |
| 125 | + |
| 126 | +The exploration to remove items from the Simple Repository API paid off, |
| 127 | +and pointed me in the direction that turned into `LifecycleStatus`, |
| 128 | +which is a new status applied to a Project. |
| 129 | + |
| 130 | +A state diagram to illustrate the flow of the Project through the states: |
| 131 | + |
| 132 | +```mermaid |
| 133 | +stateDiagram-v2 |
| 134 | + [*] --> None : default Project state |
| 135 | + None --> QuarantineEnter : Project quarantined, no longer in Simple API |
| 136 | + QuarantineEnter --> QuarantineExit : Admin clears Project, for general visibility |
| 137 | + QuarantineExit --> QuarantineEnter : Project re-quarantined (rare) |
| 138 | +``` |
| 139 | + |
| 140 | +Adding `LifecycleStatus` state to the Project model helps |
| 141 | +other functions in the code make single-point decisions, |
| 142 | +and allows for a more complex state machine to be implemented in the future. |
| 143 | +Potential states could include "Archived", "Deprecated", and others. |
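
To make the diagram concrete, here is a minimal sketch of the transitions as a small Python state machine. It is illustrative only and not the Warehouse implementation: the member names and string values, the `ALLOWED_TRANSITIONS` table, and the `transition` and `is_installable` helpers are all assumptions made for this example.

```python
import enum
from typing import Optional


class LifecycleStatus(enum.Enum):
    """Hypothetical mirror of the states in the diagram (values are assumed)."""

    QUARANTINE_ENTER = "quarantine-enter"
    QUARANTINE_EXIT = "quarantine-exit"


# Allowed transitions, keyed by the current status.
# None stands for the default Project state (no lifecycle status set).
ALLOWED_TRANSITIONS = {
    None: {LifecycleStatus.QUARANTINE_ENTER},
    LifecycleStatus.QUARANTINE_ENTER: {LifecycleStatus.QUARANTINE_EXIT},
    LifecycleStatus.QUARANTINE_EXIT: {LifecycleStatus.QUARANTINE_ENTER},
}


def transition(
    current: Optional[LifecycleStatus], new: LifecycleStatus
) -> LifecycleStatus:
    """Return the new status, or raise if the diagram has no such edge."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current!r} to {new!r}")
    return new


def is_installable(status: Optional[LifecycleStatus]) -> bool:
    """Single-point decision: only a quarantined Project is hidden."""
    return status is not LifecycleStatus.QUARANTINE_ENTER
```

Keeping the check in one helper like `is_installable` is the kind of single-point decision described above: adding a future state such as "Archived" would mean extending the enum and the transition table rather than touching every caller.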
| 144 | + |
| 145 | +### Admin Interface |
| 146 | + |
| 147 | +Since the point of the implementation is to allow PyPI Admins to manage the state, |
| 148 | +often during nights, weekends, and holidays, and from a phone web browser, |
| 149 | +I wanted to make the interface as simple as possible. |
| 150 | + |
| 151 | +When developing the Admin interface, I recorded a video to share with the team, |
| 152 | +so they could see the changes in action and provide feedback. |
| 153 | + |
| 154 | +<figure> |
| 155 | + <a href="https://www.loom.com/share/a472c06ab76542fca1ecaaef2a419f3d"> |
| 156 | + <img alt="GIF of screencapture" src="https://cdn.loom.com/sessions/thumbnails/a472c06ab76542fca1ecaaef2a419f3d-with-play.gif"> |
| 157 | + </a> |
| 158 | + <figcaption> |
| 159 | + <a href="https://www.loom.com/share/a472c06ab76542fca1ecaaef2a419f3d"> |
| 160 | + <p>Admin Interface for Quarantine 👩‍💻 - Watch Video</p> |
| 161 | + </a> |
| 162 | + <p> |
| 163 | + Note: Some of the UIs in the video may have changed since the recording, |
| 164 | + and almost all data is mocked. |
| 165 | + </p> |
| 166 | + </figcaption> |
| 167 | +</figure> |
| 168 | + |
| 169 | +As we use the admin interface more, we'll likely find areas to improve, |
| 170 | +and iterate to make the process more efficient. |
| 171 | + |
| 172 | +## Usage |
| 173 | + |
| 174 | +Since August, the Quarantine feature has been in use, |
| 175 | +with PyPI Admins marking ~140 reported projects as Quarantined. |
| 176 | + |
| 177 | + |
| 178 | + |
| 179 | +Of these, **only a single project** has exited Quarantine; the others have been removed. |
| 180 | + |
| 181 | +The one project cleared was a project containing obfuscated code, |
| 182 | +in violation of the PyPI |
| 183 | +[Acceptable Use Policy](https://policies.python.org/pypi.org/Acceptable-Use-Policy/). |
| 184 | +The project owner corrected the violation after being contacted by PyPI Admins. |
| 185 | +I've created some outreach templates to help with this process, |
| 186 | +and have reached out to 20+ project owners to inform them of their violation, |
| 187 | +and to provide guidance on how to correct it. |
| 188 | + |
| 189 | +## Future Improvement - Automation |
| 190 | + |
| 191 | +The next step in the Quarantine feature is to add the ability to |
| 192 | +automatically place a Project in Quarantine when "enough credible reports" are received. |
| 193 | +That's in quotes because we're still working on defining what "enough" and "credible" mean - |
| 194 | +and how to automate the process without causing undue harm to legitimate projects. |
| 195 | + |
| 196 | +To date, we've onboarded a number of security researchers, |
| 197 | +internally known as "Observers", to use a beta API endpoint to submit malware reports. |
| 198 | +We also allow any authenticated PyPI user to submit a malware report |
| 199 | +via a web form on a Project's page (technically a Release... but that's a different story). |
| 200 | +To prevent abuse of the quarantine system, we could require a minimum number |
| 201 | +of Observer reports for a given Project, |
| 202 | +and count at most one non-Observer report toward the threshold. |
| 203 | + |
| 204 | +For example, these combinations of reports for a Project would result in a quarantined project: |
| 205 | + |
| 206 | +- 2+ Observer reports |
| 207 | +- 1 Observer + 1 non-Observer report |
| 208 | + |
| 209 | +This is only one idea so far - we could explore other combinations as they surface. |
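
As a rough sketch of that one idea, the two example combinations can be written as a single predicate. Nothing here is implemented in PyPI: the function name, its parameters, and the threshold of two counted reports are assumptions drawn only from the examples above.

```python
def should_auto_quarantine(observer_reports: int, non_observer_reports: int) -> bool:
    """Hypothetical rule: quarantine once two counted reports exist for a Project.

    At least one report must come from an Observer, and at most one
    non-Observer report counts toward the total.
    """
    counted = observer_reports + min(non_observer_reports, 1)
    return observer_reports >= 1 and counted >= 2
```

Under these assumptions, any number of non-Observer reports alone never triggers a quarantine, which limits the ability of ordinary accounts to abuse the system by mass-reporting a legitimate Project.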
| 210 | + |
| 211 | +The idea behind Auto-Quarantine is to support the concept of |
| 212 | +receiving multiple reports **for the same Project** during nights and weekends, |
| 213 | +and reduce the Project's time alive on PyPI, |
| 214 | +while preserving the ability to revert the state in a non-destructive manner |
| 215 | +in the event of a false-positive. |
| 216 | + |
| 217 | +This will likely also pair with the need to add a "notify admins" feature. |
| 218 | +Probably a webhook to our Slack channel, so we can be notified in real-time |
| 219 | +when a Project is quarantined, and can take action as needed, as well |
| 220 | +as adding more visibility to quarantined projects in the Admin interface. |
| 221 | + |
| 222 | +There are plenty of chewy bits to work on, |
| 223 | +and I'm excited to see how `LifecycleStatus` evolves, |
| 224 | +and share more about it in the future. |
| 225 | + |
| 226 | +<!-- footnotes --> |
| 227 | + |
| 228 | +[^1]: See <https://pypi.org/help/#packages> for more |
| 229 | +[^2]: See <https://github.com/pypi/warehouse/blob/main/warehouse/packaging/models.py> for more |
| 230 | +[^3]: Referred to internally as `Observations(kind="is_malware")` |
| 231 | +[^4]: |
| 232 | + Yes, it's true, some of the database objects can be reconstructed, |
| 233 | + but it is time-consuming and tricky, used only in severe catastrophe situations. |
| 234 | +[^5]: See [Simple repository API - Python Packaging User Guide](https://packaging.python.org/en/latest/specifications/simple-repository-api/) for more |
| 235 | +[^6]: |
| 236 | + Roughly 3% of Projects in the simple index have 0 releases. |
| 237 | + Excluding these would save ~1 MB of the ~29 MB main index HTML response. |