
Commit ed75808: Adding Data Challenge story (#2568)
1 parent dc3aa8f

5 files changed, +118 -0 lines
---
layout: irispost
title: "Accelerating Discovery at the Revamped Large Hadron Collider with Unprecedentedly Fast Data Flow"
author: Adam Hadhazy
image: /assets/images/posts/2025-06-12-200Gbps-Data-Challenge-image3.jpg
image-whole: true
image-caption: >
  A live exercise at the 2024 IRIS-HEP retreat in Seattle stress-tested how well the systems work when many people perform data analysis in parallel. The picture shows the audience participating in this test. Credit: Matthew Feickert (University of Wisconsin - Madison)
summary: >
  A look at the 200 Gbps data challenge conducted during the 2024 IRIS-HEP retreat, exploring advances in distributed data access and analysis for high-energy physics.
figure-class: center
---

The biggest machine ever built, the Large Hadron Collider (LHC), is getting supercharged. In a few years’ time, particles will start smashing together inside the collider’s 27-kilometer-long ring as part of the High-Luminosity LHC (HL-LHC), a planned upgrade that will deliver 10 times more data than the original collider. This anticipated data avalanche will pose major analytical bottlenecks for researchers as they sift for new insights and, hopefully, breakthroughs in physics.

To facilitate these efforts, an international team recently undertook an exercise geared toward dramatically boosting data flow rates for HL-LHC analyses. The exercise, called the [200 Gbps Challenge](https://iris-hep.org/projects/200gbps.html) (referring to gigabits per second, or billions of bits of data transmitted across a network each second), successfully demonstrated ways to achieve this goal, which represents a speedup on the order of 10 to 100 times over what is presently available. “What we’re accomplishing with this Challenge is simulating what physics analysis will be like in the HL-LHC era,” says [Alex Held](https://dsi.wisc.edu/staff/held-alex/), a research scientist at the University of Wisconsin-Madison’s Data Science Institute and a co-coordinator of the 200 Gbps Challenge. “We’re laying the groundwork and infrastructure to satisfy the analysis needs of physicists in the future.”

In practice, data rates of this 200 Gbps caliber will mean that many physicists can conduct their analyses in near real time, rather than waiting, sometimes for days, for data processing to finish, as is typical today. That discovery-slowing wait would become even more onerous given the HL-LHC’s gargantuan data output.

“A phrase you’ll often hear bandied about is ‘time to science’ or ‘time to insight,’ where we want to reduce the time it takes for a person to analyze data,” says Carl Lundstedt, the Grid System Administrator at the University of Nebraska’s Holland Computing Center, a participating facility in the 200 Gbps Challenge. “The analysis process is very iterative, and if those iteration steps take a long time, it can really obscure the science that the physicist is looking for.”

“We don’t want to go into a situation where we have data coming off the HL-LHC and the analysis pipeline isn’t ready for the physicist to do science,” Lundstedt added.

The 200 Gbps Challenge was fostered by the [Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP)](https://iris-hep.org/), an institute headquartered in the [Princeton Institute for Computational Science and Engineering (PICSciE)](https://researchcomputing.princeton.edu/about/about-picscie) at Princeton University.

For the exercise, researchers sought to analyze 25% of a 180-terabyte dataset in 30 minutes. The HL-LHC is expected to routinely generate such massive datasets describing the particle shrapnel produced by collisions of protons or heavy ions, such as lead, detected within the experiment’s main instruments.
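
That target is where the Challenge gets its name: reading a quarter of 180 terabytes within a 30-minute window requires a sustained rate of roughly 200 gigabits per second. Here is a back-of-the-envelope check, a sketch added for illustration using decimal units, not code from the Challenge itself:

```python
# Back-of-the-envelope check of the 200 Gbps target (decimal units assumed;
# the variable names below are illustrative, not taken from the Challenge).
dataset_tb = 180        # total dataset size, in terabytes
fraction = 0.25         # portion of the dataset to be analyzed
window_s = 30 * 60      # 30-minute analysis window, in seconds

bits_to_read = dataset_tb * fraction * 1e12 * 8      # terabytes -> bits
throughput_gbps = bits_to_read / window_s / 1e9      # bits per second -> Gbps

print(f"Required sustained throughput: {throughput_gbps:.0f} Gbps")  # prints 200
```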

Two of those instruments are [CMS](https://www.home.cern/science/experiments/cms) and [ATLAS](https://home.cern/science/experiments/atlas), each of which is roughly the size of a multi-story building. Nebraska, mentioned earlier, hosts a research group that works extensively with CMS. A second research group that participated in the 200 Gbps Challenge is at the University of Chicago, where ATLAS data analysis is the specialty. The two facilities were selected for the Challenge because they work with different LHC detectors while also possessing varied data analysis infrastructures and capabilities, allowing the team to explore diverse strategies for reaching 200 Gbps.

The primary components in play for data analysis are central processing units (CPUs), where number-crunching tasks such as data decompression take place, and the networks over which data flows to end users. “Broadly speaking, at the University of Chicago, we have lots of CPUs, but we’re limited by the network,” says Held. “In Nebraska, we have fewer CPUs and instead the network is amazing. So we had to look at different approaches to hit our very aggressive 200 gigabit target.”

An initial problem affecting analyses at both sites: the globally distributed nature of LHC and eventual HL-LHC data. Pulling that data together quickly is often not possible. “Reading data from Australia to Chicago might not always be very timely because you’re inherently limited by how fast a connection you have,” says Held. “Plus, if you’re reading data from a hundred different sites all across the globe, one of them might be down for maintenance at any point of the day, and that holds things up.”

To combat this issue, the Challenge team established a large system of data caches, so relevant data could be read once and then stored near where it would be needed for analysis. “You have immediate access to data and it’s very stable and very fast,” says Lincoln Bryant, a Linux System Administrator at the University of Chicago involved in the Challenge.

{% include figure.html
file="/assets/images/posts/2025-06-12-200Gbps-Data-Challenge-image4.png"
alt="A depiction of key ingredients in the data flow. Data distributed at WLCG sites flows to an analysis facility with caches and worker nodes"
caption="A depiction of key ingredients in the data flow: data starts out distributed across the Worldwide LHC Computing Grid (WLCG) and is ingested into a facility through a set of XCache instances acting as caches. Credit: <https://arxiv.org/abs/2503.15464>"
img-style="width: 1000px"
figure-style="width: min-content"
class="left"
%}
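
In practice, an analysis job at the facility reads its input files through one of these nearby caches rather than directly from a file’s home site. The snippet below is a minimal, hypothetical illustration of that access pattern using the uproot Python library, which is widely used in high-energy physics to read ROOT files; the cache hostname, file path, tree name, and branch names are placeholders, not details from the Challenge.

```python
# Hypothetical sketch: reading a ROOT file through an XCache endpoint with uproot.
# The cache hostname, file path, and branch names are placeholders.
import uproot

# Requests go to the local cache; on a miss, the cache fetches the file from
# its origin site on the grid and keeps a copy for subsequent reads.
cache_url = "root://xcache.example.org:1094//store/data/example/events.root"

with uproot.open(cache_url) as f:
    tree = f["Events"]
    # Read only the branches the analysis needs; columnar reads keep I/O small.
    arrays = tree.arrays(["Muon_pt", "Muon_eta"], library="np")
    print(arrays["Muon_pt"][:5])
```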

Optimizing speed at the University of Chicago site required examining the network diagram and moving server racks and other hardware around to ensure there were no slow links creating bottlenecks. “For instance, you might have network paths that are only connected at 40 gigabits, and we have to deepen these or reorganize how the cluster is set up to enable 200 gigabit workflows,” says Bryant.

On the software side, the team sought to maximize the benefits of parallel computing, in which many portions of a dataset needing analysis are handled simultaneously and integrated at the end. To effectively fan out and manage these portions, especially those that are interdependent, the team’s software engineers turned to dynamic task-scheduling programs such as Dask, an open-source project, and TaskVine, developed at the University of Notre Dame.
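
As a rough illustration of that fan-out-and-combine pattern, the sketch below uses Dask’s delayed interface to schedule one task per chunk of input and merge the partial results at the end. It is not the Challenge’s actual analysis code; the file list and per-chunk processing step are placeholders, and it assumes dask and dask.distributed are installed.

```python
# Minimal illustrative sketch of fanning out per-chunk tasks with Dask and
# aggregating the partial results. Inputs and processing are placeholders.
import dask
from dask.distributed import Client

def process_partition(path):
    """Stand-in for reading one chunk of events and producing a partial result."""
    # A real analysis would read columns, apply selections, and fill histograms;
    # here we just return a dummy value derived from the path string.
    return len(path)

if __name__ == "__main__":
    # Starts a local cluster; at an analysis facility this would instead
    # connect to a scheduler coordinating many worker nodes.
    client = Client()

    paths = [f"chunk_{i}.root" for i in range(100)]  # placeholder input files

    # Fan out: one lazy task per chunk, scheduled across the workers.
    partials = [dask.delayed(process_partition)(p) for p in paths]

    # Aggregate: combine the partial results at the end.
    total = dask.delayed(sum)(partials)
    print(total.compute())

    client.close()
```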

With the attentive deployment of these scalable computing libraries, in sync with the rearranged hardware, the Challenge team demonstrated sustained data flow rates of 200 Gbps, illuminating a path forward for next-generation HL-LHC data analysis.

{% include figure.html
file="/assets/images/posts/2025-06-12-200Gbps-Data-Challenge-image1.png"
alt="Graph showing successful delivery and processing of data at sustained 200 Gbps throughput"
caption="Throughput rate of the XCache servers in Flatiron at Nebraska, each color representing one of the eight XCache servers, clearly indicating a sustained rate in excess of 200 Gbps. Credit: Oksana Shadura (University of Nebraska - Lincoln)"
img-style="width: 1000px"
figure-style="width: min-content"
class="left"
%}

“There’s an overall goal in our field of high-energy physics of moving as far as possible in the direction where the bottleneck is not a researcher waiting for computing to happen, but instead where the researcher is carefully looking at results, making sense of them, and figuring out next steps,” says Held. “We’re helping to meet that goal and setting the stage for a highly productive HL-LHC run.”

Looking ahead, the team will continue working on increasing the reliability of their 200 Gbps approach, which, after all, was hammered out over the span of an intensive several-week period last spring. Because the massively parallel computing process disperses many small tasks, errors that crop up in one task can lead to downstream problems. This issue is also commonly encountered and dealt with in conventional analyses, but Held and colleagues hope they can ultimately imbue their setup with excellent stability. “Overall, we want to empower scientists to get their analyses done in a radical new way that’s very quick and worry-free so they can instead just focus on the physics,” says Held.

{% include figure.html
file="/assets/images/posts/2025-06-12-200Gbps-Data-Challenge-image2.png"
alt="Alex Held, a research scientist at the University of Wisconsin-Madison’s Data Science Institute, presents 'The 200 Gbps Challenge: Imagining HL-LHC analysis facilities' at CHEP 2024"
caption="Alex Held, a research scientist at the University of Wisconsin-Madison’s Data Science Institute, presents 'The 200 Gbps Challenge: Imagining HL-LHC analysis facilities' at CHEP 2024. Credit: Peter Elmer (Princeton University)"
img-style="width: 500px"
figure-style="width: min-content"
%}

Based on all they learned during the Challenge, the research team is already setting itself a new benchmark of 400 Gbps, doubling the already unprecedented data rate it has obtained. Remarkably, achieving these even faster speeds is not expected to require significant new hardware or novel software tactics; team members feel they can wring still greater flow rates out of largely what’s at hand with innovative thinking and collective diligence.

“The history of high energy physics has always been to push the large-data frontier,” says Lundstedt. “We’re excited to have this opportunity to contribute to pushing that frontier into the 200-gigabit, even 400-gigabit space.”
