blocks migrated empty (no files) in phys03

**Impact of the bug**
CRAB

**Describe the bug**
A block gets migrated from `global` to `phys03` without its files.

**How to reproduce it**
Rare problem. It cannot be reproduced on demand.

**Expected behavior**
The block in phys03 should be an exact replica of the one in global

**Additional context and error message**

I have seen this a few times, but did not report it immediately. Now I have taken steps to document it properly.
* block name `/JetMET0/Run2025G-PromptReco-v1/AOD#2149f597-8e88-4195-b9b9-5633d8ba6258`
* in `phys03` it is present without any files
```
belforte@lxplus802/TC3> dasgoclient --query 'block block=/JetMET0/Run2025G-PromptReco-v1/AOD#2149f597-8e88-4195-b9b9-5633d8ba6258 instance=prod/phys03'   
/JetMET0/Run2025G-PromptReco-v1/AOD#2149f597-8e88-4195-b9b9-5633d8ba6258
belforte@lxplus802/TC3> dasgoclient --query 'file block=/JetMET0/Run2025G-PromptReco-v1/AOD#2149f597-8e88-4195-b9b9-5633d8ba6258 instance=prod/phys03'
belforte@lxplus802/TC3> 
```
* while in `global` there are 184 files
```
belforte@lxplus802/TC3> dasgoclient --query 'block block=/JetMET0/Run2025G-PromptReco-v1/AOD#2149f597-8e88-4195-b9b9-5633d8ba6258 instance=prod/global'
/JetMET0/Run2025G-PromptReco-v1/AOD#2149f597-8e88-4195-b9b9-5633d8ba6258
belforte@lxplus802/TC3> dasgoclient --query 'file block=/JetMET0/Run2025G-PromptReco-v1/AOD#2149f597-8e88-4195-b9b9-5633d8ba6258 instance=prod/global'|wc -l
184
belforte@lxplus802/TC3> 
```
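
The manual comparison above could be scripted along these lines. This is only a sketch: `das_query` and `missing_files` are hypothetical helper names, and the only thing assumed about `dasgoclient` is the `--query` form already used in the transcripts.

```python
import subprocess
from typing import Callable, List, Set


def das_query(query: str) -> List[str]:
    """Run dasgoclient with the same --query form used above; one result per line."""
    out = subprocess.run(
        ["dasgoclient", "--query", query],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def missing_files(block: str,
                  run_query: Callable[[str], List[str]] = das_query,
                  source: str = "prod/global",
                  dest: str = "prod/phys03") -> Set[str]:
    """Return the files listed for `block` in `source` but absent from `dest`."""
    src = set(run_query(f"file block={block} instance={source}"))
    dst = set(run_query(f"file block={block} instance={dest}"))
    return src - dst
```

Given the counts shown above, calling `missing_files` on this block would be expected to return all 184 file names. The `run_query` parameter is there only so the comparison can be exercised without access to DAS.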

When CRAB was trying to migrate this block, migration failed with (from migration server logs):
```
[2026-04-22 08:09:51.344719155 +0000 UTC m=+2904.031899684] migrate.go:1119: insert block dump record failed with DBSError Code:128 Description:Not defined Function:dbs.bulkblocks.checkBlockExist Message:Block /JetMET0/Run2025G-PromptReco-v1/AOD#2149f597-8e88-4195-b9b9-5633d8ba6258 already exists Error: nil
[2026-04-22 08:09:51.393973243 +0000 UTC m=+2904.081153787] migrate.go:1378: update migration request 6499526 to status 3
```

Logs from April 22 are still on cephs, so here's more info.
* migration server pods: example of one attempt to migrate.
(The CRAB Publisher makes a list of parent files that are not present in `phys03`, then asks DBS to migrate the corresponding blocks. If a block is already at the destination, the request is supposed to get migration status 4 and the Publisher is happy. Here it gets status 9 instead, so it deletes that migration ID and tries again, until a CRAB operator detects this and blacklists the task whose publication attempts hit the problem.)
```
belforte@vocms0755/dbs-logs> grep "6499526 to"  dbs2go-phys03-migratio*.log-20260422
dbs2go-phys03-migration-644c89d6cb-rvbn9.log-20260422:[2026-04-22 08:10:23.257140175 +0000 UTC m=+5556.053575592] migrate.go:1378: update migration request 6499526 to status 1
dbs2go-phys03-migration-644c89d6cb-rvbn9.log-20260422:[2026-04-22 08:10:30.368651523 +0000 UTC m=+5563.165086940] migrate.go:1378: update migration request 6499526 to status 3
dbs2go-phys03-migration-644c89d6cb-rvbn9.log-20260422:[2026-04-22 08:11:23.654446037 +0000 UTC m=+5616.450881454] migrate.go:1378: update migration request 6499526 to status 1
dbs2go-phys03-migration-644c89d6cb-rvbn9.log-20260422:[2026-04-22 08:11:24.921014543 +0000 UTC m=+5617.717449960] migrate.go:1378: update migration request 6499526 to status 3
dbs2go-phys03-migration-694ff69b48-tmsd4.log-20260422:[2026-04-22 08:09:48.920184766 +0000 UTC m=+2901.607365308] migrate.go:1378: update migration request 6499526 to status 1
dbs2go-phys03-migration-694ff69b48-tmsd4.log-20260422:[2026-04-22 08:09:51.393973243 +0000 UTC m=+2904.081153787] migrate.go:1378: update migration request 6499526 to status 3
dbs2go-phys03-migration-694ff69b48-tmsd4.log-20260422:[2026-04-22 08:10:49.886143633 +0000 UTC m=+2962.573324164] migrate.go:1378: update migration request 6499526 to status 1
dbs2go-phys03-migration-694ff69b48-tmsd4.log-20260422:[2026-04-22 08:10:51.602646326 +0000 UTC m=+2964.289826859] migrate.go:1378: update migration request 6499526 to status 3
dbs2go-phys03-migration-694ff69b48-tmsd4.log-20260422:[2026-04-22 08:11:49.898859751 +0000 UTC m=+3022.586040282] migrate.go:1378: update migration request 6499526 to status 9
belforte@vocms0755/dbs-logs> 
```
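
Since the grep output is grouped by log file, a small helper can merge it into a single time-ordered view. The sketch below assumes the `file:[timestamp ...] migrate.go:NNNN: update migration request ID to status N` layout seen above, with fixed-width timestamps so that string sorting matches time order; `status_timeline` is a hypothetical name.

```python
import re
from typing import Iterable, List, Tuple

# Matches lines as printed by `grep` over several log files:
# <pod>.log-<date>:[<timestamp> ...] ... update migration request <id> to status <n>
LINE_RE = re.compile(
    r"^(?P<pod>[^:]+)\.log-\d+:"
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) [^\]]*\] "
    r".*update migration request (?P<rid>\d+) to status (?P<status>\d+)"
)


def status_timeline(lines: Iterable[str], request_id: int) -> List[Tuple[str, str, int]]:
    """Return (timestamp, pod, status) tuples for one request, oldest first."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and int(m.group("rid")) == request_id:
            events.append((m.group("ts"), m.group("pod"), int(m.group("status"))))
    return sorted(events)
```

Applied to the lines above for request 6499526, the merged order shows the two pods (`...-tmsd4` and `...-rvbn9`) taking turns setting status 1 and then 3 between 08:09:48 and 08:11:24, followed by status 9 at 08:11:49.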

Looking for this block name in all migration log files for April 22 finds many repeated attempts like the one above.
```
belforte@vocms0755/dbs-logs> grep "/JetMET0/Run2025G-PromptReco-v1/AOD#2149f597-8e88-4195-b9b9-5633d8ba6258" dbs2go-phys03-mig*.log-20260422 > /tmp/ml
belforte@vocms0755/dbs-logs> grep ID /tmp/ml|cut -d{ -f2|cut -d' ' -f 1|sort|uniq 
MIGRATION_REQUEST_ID:6496490
MIGRATION_REQUEST_ID:6499403
MIGRATION_REQUEST_ID:6499461
MIGRATION_REQUEST_ID:6499508
MIGRATION_REQUEST_ID:6499526
MIGRATION_REQUEST_ID:6499616
MIGRATION_REQUEST_ID:6499622
MIGRATION_REQUEST_ID:6499704
MIGRATION_REQUEST_ID:6499716
MIGRATION_REQUEST_ID:6499727
MIGRATION_REQUEST_ID:6499947
MIGRATION_REQUEST_ID:6499987
MIGRATION_REQUEST_ID:6500006
MIGRATION_REQUEST_ID:6500022
MIGRATION_REQUEST_ID:6500028
MIGRATION_REQUEST_ID:6500037
MIGRATION_REQUEST_ID:6500053
MIGRATION_REQUEST_ID:6500059
MIGRATION_REQUEST_ID:6500082
MIGRATION_REQUEST_ID:6500090
MIGRATION_REQUEST_ID:6500114
MIGRATION_REQUEST_ID:6500129
MIGRATION_REQUEST_ID:6500147
MIGRATION_REQUEST_ID:6500163
MIGRATION_REQUEST_ID:6500177
MIGRATION_REQUEST_ID:6500220
MIGRATION_REQUEST_ID:6500257
MIGRATION_REQUEST_ID:6500280
MIGRATION_REQUEST_ID:6500308
belforte@vocms0755/dbs-logs> 
```
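A rough Python equivalent of the `grep | cut | sort | uniq` pipeline above, which also counts how many matching lines mention each request. It assumes only that the log lines carry a `MIGRATION_REQUEST_ID:<number>` token, as the output above shows; `request_id_counts` is a hypothetical name, and unlike the shell pipeline it counts every occurrence of the token on a line, not just the first field after `{`.

```python
import re
from collections import Counter
from typing import Iterable

ID_RE = re.compile(r"MIGRATION_REQUEST_ID:(\d+)")


def request_id_counts(lines: Iterable[str]) -> Counter:
    """Count occurrences of each MIGRATION_REQUEST_ID across the given log lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(int(rid) for rid in ID_RE.findall(line))
    return counts


if __name__ == "__main__":
    # /tmp/ml is the file produced by the grep command above.
    with open("/tmp/ml") as fh:
        for rid, n in sorted(request_id_counts(fh).items()):
            print(rid, n)
```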
The DBS migration pods were continuously restarting that day, due to a memory issue with too many lumis in a different migration. I do not know if that could be the origin of the problem. 
I have been unable to find when the block was inserted in `phys03`.

