
Commit 57f25fd

Merge pull request #125 from JinZhou5042/jinzhou
profile: jin zhou
2 parents 2177c92 + 66c4d16 commit 57f25fd

2 files changed (+39, -0)

assets/images/team/Jin_Zhou.png

983 KB

pages/postdocs/jinzhou.md

+39
@@ -0,0 +1,39 @@
+---
+layout: postdoc
+pagetype: postdoc
+shortname: jinzhou
+postdoc-name: Jin Zhou
+title: PhD Student
+active: True
+dates:
+  start: 2024-08-15
+  end: 2025-09-15
+photo: /assets/images/team/Jin_Zhou.png
+institution: University of Notre Dame
+
+project_title: Scalable Data Analysis Applications for High Energy Physics
+project_goal: >
+  - Accelerate the execution of CMS analysis applications.
+  - Reduce storage consumption to enable more ambitious computations.
+  - Enhance fault tolerance by breaking long tasks into smaller ones and implementing effective checkpointing strategies.
+
+mentors:
+  - Douglas Thain (Cooperative Computing Lab, University of Notre Dame)
+  - Kevin Lannon (Physics department, University of Notre Dame)
+
+current_status: >
+  <br>
+  <b>2025 Q1</b>
+  <br>
+
+  * Progress
+    * Developed the large-input-first (LIF) algorithm and a pruning algorithm, which reduce storage consumption by over 90% while running hundreds of thousands of tasks.
+    * Enhanced resource allocation and temp file replication on the task scheduler side.
+    * Submitted a paper to IPDPS 2025; it was rejected.
+  * Next steps
+    * Draft a paper on using limited storage effectively to complete very large computations.
+    * Develop an algorithm that divides long-running tasks in DV5 into smaller ones. Smaller tasks reduce the cost of rerunning work after worker evictions but increase the latency of scheduling a large number of tasks, so the goal is to strike a balance between scheduling overhead and fault tolerance.
+    * Develop an algorithm that checkpoints remote temp files in a timely manner to reduce the risk of losing critical files.
+
+---
