
Commit 07fcf1d

Merge branch 'main' into dm/published-date
2 parents 1f9e954 + 16cc5db commit 07fcf1d


21 files changed, +645 -231 lines changed


Dockerfile

+1-1
@@ -1,6 +1,6 @@
 # First things first, we build an image which is where we're going to compile
 # our static assets with. We use this stage in development.
-FROM node:23.4.0-bookworm AS static-deps
+FROM node:23.5.0-bookworm AS static-deps
 
 WORKDIR /opt/warehouse/src/
 
+237
@@ -0,0 +1,237 @@
---
title: Project Quarantine
description: Handling project quarantine lifecycle status for suspected malware
authors:
- miketheman
date: 2024-12-30
tags:
- security
---

Earlier this year, I wrote briefly about new functionality added to PyPI, the
[ability to quarantine projects](./2024-08-16-safety-and-security-engineer-year-in-review.md#project-lifecycle-status-quarantine).
This feature allows PyPI administrators to mark a project as potentially harmful,
and keep it from being easily installed by users, limiting further harm.

In this post I'll discuss the implementation and the improvements still to come.

<!-- more -->

## Background

Malware on PyPI is a persistent problem.

PyPI has concepts of Projects, Releases, and Files[^1].
These are all discrete data models[^2],
and behave slightly differently based on their characteristics.
A Project may have 0 or more Releases; a Release may have 1 or more Files.
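
The relationship between the three models can be sketched with plain Python dataclasses. This is a minimal illustration with simplified, assumed fields (`name`, `version`, `filename`); it is not Warehouse's actual SQLAlchemy models, which are far richer.

```python
from dataclasses import dataclass, field


@dataclass
class File:
    # Simplified for illustration: real File records also carry hashes, sizes, etc.
    filename: str


@dataclass
class Release:
    version: str
    files: list[File] = field(default_factory=list)

    def validate(self) -> None:
        # Per the text, a Release has 1 or more Files.
        if not self.files:
            raise ValueError(f"Release {self.version} has no Files")


@dataclass
class Project:
    name: str
    # A Project may have 0 or more Releases.
    releases: list[Release] = field(default_factory=list)

    def file_count(self) -> int:
        return sum(len(r.files) for r in self.releases)


# The typical malware report shape: one Project, one Release, a couple of Files.
project = Project(
    name="example-malicious-pkg",
    releases=[
        Release(
            version="0.0.1",
            files=[
                File("example_malicious_pkg-0.0.1.tar.gz"),
                File("example_malicious_pkg-0.0.1-py3-none-any.whl"),
            ],
        )
    ],
)
for release in project.releases:
    release.validate()
print(project.file_count())  # 2
```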

Researchers will often report a given Project as malware,
and will link to a specific location in a File for a given Release,
per the [PyPI Security process](https://pypi.org/security/).

PyPI receives malware reports[^3] that often apply to an entire Project.
Simply put: a Project and all of its Releases (usually 1)
and Files (usually 1-2) are part of a similar campaign,
and should be removed from PyPI to protect end users.
This is not universally true, as malware has been added to established,
mature Projects via a new Release after some sort of account takeover,
so there may be a need to report malware for a specific Release or File,
something not yet fully implemented via Observations
or the [beta Malware API](./2024-03-06-malware-reporting-evolved.md#via-api).

When reviewing and acting on malware reports,
PyPI Admins had one main tool at their disposal:
**complete removal of the Project from the PyPI database**.
This is often coupled with prohibiting the Project name from being reused.
Independently of malware handling, PyPI also prevents File names from being reused.
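
A minimal sketch of that removal path, assuming a hypothetical in-memory store: removing a Project deletes its data and optionally prohibits the name, while every File name ever used stays in a registry so it cannot be uploaded again. The `Index` class and its methods are illustrative only, not Warehouse code.

```python
class Index:
    """Hypothetical in-memory stand-in for the PyPI database."""

    def __init__(self) -> None:
        self.projects: dict[str, list[str]] = {}  # project name -> file names
        self.prohibited_names: set[str] = set()
        self.used_filenames: set[str] = set()  # never cleared, even on removal

    def upload(self, project: str, filename: str) -> None:
        if project in self.prohibited_names:
            raise PermissionError(f"project name {project!r} is prohibited")
        if filename in self.used_filenames:
            raise ValueError(f"file name {filename!r} was already used")
        self.projects.setdefault(project, []).append(filename)
        self.used_filenames.add(filename)

    def remove_project(self, project: str, prohibit: bool = True) -> None:
        # Destructive: the Project's data is gone; used_filenames is left intact.
        self.projects.pop(project, None)
        if prohibit:
            self.prohibited_names.add(project)


index = Index()
index.upload("evil-pkg", "evil_pkg-0.0.1.tar.gz")
index.remove_project("evil-pkg")
```

The point of the sketch is the asymmetry: removal is all-or-nothing, and there is no "undo" path short of reconstructing the deleted rows.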

The impact of these removals can be disruptive,
and removals are effectively irrevocable:
it's the same mechanism PyPI warns project owners about
when they elect to remove their project from the index[^4].

Further, the longer a malicious Project remains publicly available,
the greater the potential for end users to install it
and become victims of said malware.
With the current full-time security staff for PyPI == 1,
there is potential for malware to remain installable by users for longer periods of time,
and asking volunteer PyPI Admins for extra hours of work is not sustainable.

Reducing the time window when a malicious Project/Release/File is available
for end users to become victims is an improvement,
and further reduces the incentive for malicious actors
to use PyPI as their distribution method.

## Implementation

The implementation of Project Quarantine took shape as I learned more about the
possible states a project could be in.
I jotted down some basic requirements for the feature:

- Project exists on PyPI and has Releases and Files
- Project is not installable (hidden from simple index) while in quarantine
- Project is not modifiable by the project owner while in quarantine
- Project state is visible to Project Owners, security researchers, and PyPI Administrators
- Project state can be reverted by a PyPI Administrator to restore general visibility
- Project can be removed/deleted by a PyPI Administrator
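
These requirements map fairly directly onto a handful of permission checks. A minimal sketch, assuming a hypothetical `Role` enum and a boolean `quarantined` flag standing in for the real status field; all names here are illustrative, not Warehouse's.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Role(Enum):
    # Hypothetical roles matching the actors in the requirements list.
    PUBLIC = auto()
    OWNER = auto()
    OBSERVER = auto()  # security researcher
    ADMIN = auto()


@dataclass
class Project:
    name: str
    quarantined: bool = False


def in_simple_index(project: Project) -> bool:
    # Hidden from the simple index (so not installable) while quarantined.
    return not project.quarantined


def owner_can_modify(project: Project) -> bool:
    # Owners cannot modify the Project while it is quarantined.
    return not project.quarantined


def can_view_state(project: Project, role: Role) -> bool:
    # Quarantine state is visible to owners, researchers, and admins.
    if not project.quarantined:
        return True
    return role in {Role.OWNER, Role.OBSERVER, Role.ADMIN}


def can_restore_or_remove(role: Role) -> bool:
    # Only admins can restore general visibility or delete the Project.
    return role is Role.ADMIN


def simple_index(projects: list[Project]) -> list[str]:
    return [p.name for p in projects if in_simple_index(p)]
```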

With those in mind, I set out to implement the feature.

### Take a page from the book of Yank

Prior to this change, PyPI already supported "Yank",
per [PEP 592](https://peps.python.org/pep-0592/).

A Project with no Releases will be listed in the Simple Repository API[^5],
but the resulting detail page will not have any links,
making it effectively uninstallable[^6].
One idea was that, when quarantining a Project,
we could mark it as having no Releases,
and thus exclude it from the index.
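
To see why a Project with no Releases is effectively uninstallable, consider what a Simple Repository API project page contains: one anchor per distribution file. The sketch below renders a minimal PEP 503-style page using only the standard library; the helper name and its inputs are hypothetical, and real pages also carry hash fragments and optional attributes such as `data-requires-python` and `data-yanked`.

```python
from html import escape


def render_project_page(name: str, files: dict[str, str]) -> str:
    """Render a minimal PEP 503-style project page (illustration only).

    `files` maps each distribution filename to its download URL.
    """
    links = "\n".join(
        f'    <a href="{escape(url)}">{escape(filename)}</a><br/>'
        for filename, url in sorted(files.items())
    )
    return (
        "<!DOCTYPE html>\n<html>\n  <body>\n"
        f"    <h1>Links for {escape(name)}</h1>\n"
        f"{links}\n"
        "  </body>\n</html>\n"
    )


# A Project with no Releases still gets a page, but there is nothing to install.
empty_page = render_project_page("no-releases-pkg", {})
print(empty_page.count("<a "))  # 0
```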

The difference from "yank" is that a yanked Release is still installable by clients
(for example, when a user pins that exact version),
while quarantined items should not be installable at all,
so we'd have to explore where to make the change and how that would impact clients.
Yank is also applied to a Release (and all of its Files), not a Project.
We could apply a change to every Release of a Project, instead of Project-wide,
and thus set ourselves up for quarantining individual Releases.
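
A rough sketch of that difference, assuming a heavily simplified installer: per PEP 592, installers ignore yanked releases when a non-yanked version satisfies the request, but an exact pin can still select a yanked release. A quarantined Project never reaches the installer at all. The `Candidate` type and both functions are hypothetical; real resolvers such as pip's are far more involved, and versions here are plain tuples rather than `packaging.version` objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    version: tuple[int, ...]
    yanked: bool = False


def select(
    candidates: list[Candidate], pinned: Optional[tuple[int, ...]] = None
) -> Optional[Candidate]:
    """Pick a release roughly as PEP 592 describes (simplified).

    pinned: if set, the user asked for exactly this version (like `==1.1`).
    """
    if pinned is not None:
        # An exact pin may still select a yanked release.
        matches = [c for c in candidates if c.version == pinned]
        return matches[0] if matches else None
    # Otherwise yanked releases are ignored; pick the newest remaining one.
    usable = [c for c in candidates if not c.yanked]
    return max(usable, key=lambda c: c.version, default=None)


def select_from_index(
    quarantined: bool,
    candidates: list[Candidate],
    pinned: Optional[tuple[int, ...]] = None,
) -> Optional[Candidate]:
    # A quarantined Project exposes no files, so nothing is selectable.
    return None if quarantined else select(candidates, pinned)


releases = [Candidate((1, 0)), Candidate((1, 1), yanked=True)]
```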

This approach is more complex: it accounts for a rare edge case
where a mature Project has a new Release that needs to be quarantined,
and quarantining only that Release would avoid disrupting existing users of prior Releases.

We accept that this might happen, will track very closely if and when it does,
and are deferring implementation until that time.

### Create an Observer-only visibility API

I had previously built new beta API infrastructure
to allow Observers to report malicious Projects.
One idea was to add a new authenticated API endpoint
to allow querying the current list of quarantined Projects,
and supply links to their Releases and Files for consumption.

Thus, a researcher could download the artifacts in question,
but not via `pip install ...`.

I ended up not pursuing this approach,
as the beta authenticated APIs are still being developed,
and I didn't want to add more functionality before we swing back
and resolve some critical authentication and authorization issues
that affect the future of management API endpoints.

### Lifecycle Status

The exploration to remove items from the Simple Repository API paid off,
and pointed me in the direction that turned into `LifecycleStatus`,
a new status applied to a Project.

A state diagram to illustrate the flow of the Project through the states:

```mermaid
stateDiagram-v2
[*] --> None : default Project state
None --> QuarantineEnter : Project quarantined, no longer in Simple API
QuarantineEnter --> QuarantineExit : Admin clears Project, for general visibility
QuarantineExit --> QuarantineEnter : Project re-quarantined (rare)
```

Adding `LifecycleStatus` state to the Project model helps
other functions in the code make single-point decisions,
and allows for a more complex state machine to be implemented in the future.
Potential states could include "Archived", "Deprecated", and others.
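
The transitions in the diagram can be encoded as a small lookup table, so every status change passes through one check. A minimal sketch, assuming a Python `Enum` whose member names mirror the diagram; the string values and the `transition` and `in_simple_api` helpers are illustrative assumptions, not Warehouse's actual definitions. The diagram's `None` state is modeled as Python `None` (no lifecycle status set).

```python
from enum import Enum
from typing import Optional


class LifecycleStatus(Enum):
    # Member names follow the state diagram; the values are assumptions.
    QuarantineEnter = "quarantine-enter"
    QuarantineExit = "quarantine-exit"


# Allowed moves: current status (None = default) -> permitted next statuses.
TRANSITIONS: dict[Optional[LifecycleStatus], set[LifecycleStatus]] = {
    None: {LifecycleStatus.QuarantineEnter},
    LifecycleStatus.QuarantineEnter: {LifecycleStatus.QuarantineExit},
    LifecycleStatus.QuarantineExit: {LifecycleStatus.QuarantineEnter},
}


def transition(
    current: Optional[LifecycleStatus], new: LifecycleStatus
) -> LifecycleStatus:
    """Return the new status, or raise if the diagram does not allow the move."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current} to {new}")
    return new


def in_simple_api(status: Optional[LifecycleStatus]) -> bool:
    # Single-point decision: only a currently quarantined Project is hidden.
    return status is not LifecycleStatus.QuarantineEnter
```

With this shape, a future status such as "Archived" would mean adding a member and its rows to `TRANSITIONS`, rather than touching every caller.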

### Admin Interface

Since the point of the implementation is to let PyPI Admins manage the state,
often during nights, weekends, and holidays, and from a phone's web browser,
I wanted to make the interface as simple as possible.

When developing the Admin interface, I recorded a video to share with the team,
so they could see the changes in action and provide feedback.

<figure>
<a href="https://www.loom.com/share/a472c06ab76542fca1ecaaef2a419f3d">
<img alt="GIF of screencapture" src="https://cdn.loom.com/sessions/thumbnails/a472c06ab76542fca1ecaaef2a419f3d-with-play.gif">
</a>
<figcaption>
<a href="https://www.loom.com/share/a472c06ab76542fca1ecaaef2a419f3d">
<p>Admin Interface for Quarantine 👩‍💻 - Watch Video</p>
</a>
<p>
Note: Some of the UIs in the video may have changed since the recording,
and almost all data is mocked.
</p>
</figcaption>
</figure>

As we use the admin interface more, we'll likely find areas to improve,
and iterate to make the process more efficient.

## Usage

Since August, the Quarantine feature has been in use,
with PyPI Admins marking ~140 reported projects as Quarantined.

![Quarantine Projects Admin Activity](../assets/2024-12-30-quarantine-verdicts.png)

Of these, **only a single project** has exited Quarantine; the others have been removed.

The one cleared project contained obfuscated code,
in violation of the PyPI
[Acceptable Use Policy](https://policies.python.org/pypi.org/Acceptable-Use-Policy/).
The project owner corrected the violation after being contacted by PyPI Admins.
I've created some outreach templates to help with this process,
and have reached out to 20+ project owners to inform them of their violation,
and to provide guidance on how to correct it.

## Future Improvement - Automation

The next step in the Quarantine feature is to add the ability to
automatically place a Project in Quarantine when "enough credible reports" are received.
That's in quotes because we're still working on defining what "enough" and "credible" mean,
and how to automate the process without causing undue harm to legitimate projects.

To date, we've onboarded a number of security researchers,
internally known as "Observers", to use a beta API endpoint to submit malware reports.
We also allow any authenticated PyPI user to submit a malware report
via a web form on a Project's page (technically a Release... but that's a different story).
To prevent abuse of the quarantine system, we could require a minimum number
of Observer reports for a given Project,
and count at most one non-Observer report in the calculation.

For example, these combinations of reports for a Project would result in a quarantined project:

- 2+ Observer reports
- 1 Observer + 1 non-Observer report

This is only one idea so far; we could explore other combinations as they surface.
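
A sketch of that example rule, assuming each report records only who filed it and whether they are an Observer. Non-Observer reports are capped at one, so a flood of ordinary user reports cannot trigger quarantine on its own. The `Report` type, the threshold of 2, and the step that counts each reporter once are illustrative assumptions drawn from the example above, not a settled policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    reporter: str  # hypothetical: username of whoever filed the report
    is_observer: bool


def should_auto_quarantine(reports: list[Report], threshold: int = 2) -> bool:
    """Example rule: 2+ Observer reports, or 1 Observer + 1 non-Observer report."""
    # Assumption: each reporter counts once, however many reports they file.
    observers = {r.reporter for r in reports if r.is_observer}
    others = {r.reporter for r in reports if not r.is_observer}
    # At most one non-Observer report contributes to the score.
    score = len(observers) + min(len(others), 1)
    # Require at least one Observer so user reports alone never suffice.
    return len(observers) >= 1 and score >= threshold
```

Because the outcome is a reversible quarantine rather than a removal, a rule like this can afford to be simple and be tuned as false positives surface.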

The idea behind Auto-Quarantine is to handle
multiple reports **for the same Project** arriving during nights and weekends,
and reduce the Project's time alive on PyPI,
while preserving the ability to revert the state in a non-destructive manner
in the event of a false positive.

This will likely also pair with the need to add a "notify admins" feature,
probably a webhook to our Slack channel, so we can be notified in real time
when a Project is quarantined and take action as needed,
along with more visibility of quarantined projects in the Admin interface.
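
One way such a notification could look, assuming a Slack incoming webhook, which accepts an HTTP POST with a JSON body containing a `text` field. The webhook URL, message wording, and helper names are hypothetical. The request is built separately from sending so it can be inspected without network access.

```python
import json
import urllib.request


def build_quarantine_alert(
    webhook_url: str, project: str, reason: str
) -> urllib.request.Request:
    """Build (but do not send) a Slack incoming-webhook request."""
    payload = {"text": f"`{project}` was auto-quarantined: {reason}"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 5.0) -> int:
    # Performs the network call; returns the HTTP status code.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical URL; a real one comes from the Slack app configuration.
req = build_quarantine_alert(
    "https://hooks.slack.com/services/EXAMPLE", "evil-pkg", "2 Observer reports"
)
```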

There are plenty of chewy bits to work on,
and I'm excited to see how `LifecycleStatus` evolves,
and to share more about it in the future.

<!-- footnotes -->

[^1]: See <https://pypi.org/help/#packages> for more.
[^2]: See <https://github.com/pypi/warehouse/blob/main/warehouse/packaging/models.py> for more.
[^3]: Referred to internally as `Observations(kind="is_malware")`.
[^4]:
    Yes, it's true: some of the database objects can be reconstructed,
    but it is time-consuming and tricky, and only done in severe catastrophe situations.
[^5]: See [Simple repository API - Python Packaging User Guide](https://packaging.python.org/en/latest/specifications/simple-repository-api/) for more.
[^6]:
    Roughly 3% of Projects in the simple index have 0 releases.
    Excluding these would save ~1 MB of the ~29 MB main index HTML response.

requirements/deploy.in

+1-1
@@ -1,2 +1,2 @@
 gunicorn==23.0.0
-ddtrace==2.18.0
+ddtrace==2.18.1
