Introduce inconsistent fit closure test data #2180

comane · 2024-10-17T14:34:46Z

Reimplements #1682

This pull request introduces a new class, InconsistentCommonData, to handle inconsistencies within closure tests and includes several related changes across multiple files. The most important changes include the addition of the new class, updates to the configuration parsing, and new filtering methods for handling inconsistent closure data.

New Class and Methods:

validphys2/src/validphys/closuretest/inconsistent_closuretest/inconsistent_ct.py: Added InconsistentCommonData class with methods to introduce inconsistencies in closure tests, including systematic_errors property, select_systype_table_indices, rescale_systematics, process_commondata, and export_uncertainties.
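As a rough illustration of what selecting and rescaling systematics could look like, here is a minimal, self-contained sketch. It is not the code in inconsistent_ct.py: `SysType`, `select_columns` and `rescale_columns` are hypothetical names, and the selection rule (a column is selected when both its treatment and its name are in the requested lists) is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass
class SysType:
    """One column of a toy systematics table (hypothetical stand-in)."""
    treatment: str  # e.g. "ADD" or "MULT"
    name: str       # e.g. "CORR", "UNCORR", "SPECIAL"


def select_columns(systypes, treatment_names, names_uncertainties):
    """Indices of columns whose treatment AND name are both requested (assumed rule)."""
    return [
        i
        for i, st in enumerate(systypes)
        if st.treatment in treatment_names and st.name in names_uncertainties
    ]


def rescale_columns(sys_table, columns, factor):
    """Return a copy of sys_table (rows = data points) with the given columns scaled."""
    cols = set(columns)
    return [
        [value * factor if j in cols else value for j, value in enumerate(row)]
        for row in sys_table
    ]


# Toy input: three systematic columns for two data points.
systypes = [SysType("MULT", "CORR"), SysType("ADD", "CORR"), SysType("MULT", "UNCORR")]
sys_table = [[0.5, 0.25, 0.125], [1.0, 2.0, 4.0]]

selected = select_columns(systypes, ["MULT"], ["CORR", "SPECIAL"])
rescaled = rescale_columns(sys_table, selected, 0.0)
```

With the toy input only the first column is MULT and CORR, so only that column is set to zero. The input table is left untouched because the function returns a copy.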

Configuration Updates:

validphys2/src/validphys/config.py: Added parse_inconsistent_data_settings method to parse inconsistent data settings from the YAML file.
validphys2/src/validphys/config.py: Updated produce_filter_data to include inconsistent_fakedata parameter for filtering inconsistent closure data.
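For reviewers unfamiliar with the new runcard block, the sketch below shows the kind of validation such a parser might perform. It is an assumption-laden illustration, not the parse_inconsistent_data_settings method itself: `InconsistentSettings` and `check_inconsistent_settings` are hypothetical names, and the set of required keys is taken from the example runcard in this PR.

```python
from dataclasses import dataclass

# Keys as they appear in the example runcard of this PR.
REQUIRED_KEYS = {
    "treatment_names",
    "names_uncertainties",
    "inconsistent_datasets",
    "sys_rescaling_factor",
}


@dataclass(frozen=True)
class InconsistentSettings:
    """Validated settings (hypothetical container)."""
    treatment_names: tuple
    names_uncertainties: tuple
    inconsistent_datasets: tuple
    sys_rescaling_factor: float


def check_inconsistent_settings(raw):
    """Check that raw is a mapping with exactly the expected keys."""
    if not isinstance(raw, dict):
        raise TypeError("inconsistent_data_settings must be a mapping")
    missing = REQUIRED_KEYS - set(raw)
    if missing:
        raise KeyError(f"missing keys: {sorted(missing)}")
    unknown = set(raw) - REQUIRED_KEYS
    if unknown:
        raise KeyError(f"unknown keys: {sorted(unknown)}")
    return InconsistentSettings(
        treatment_names=tuple(raw["treatment_names"]),
        names_uncertainties=tuple(raw["names_uncertainties"]),
        inconsistent_datasets=tuple(raw["inconsistent_datasets"]),
        sys_rescaling_factor=float(raw["sys_rescaling_factor"]),
    )


settings = check_inconsistent_settings(
    {
        "treatment_names": ["MULT"],
        "names_uncertainties": ["CORR", "SPECIAL"],
        "inconsistent_datasets": ["HERA_NC_318GEV_EM-SIGMARED"],
        "sys_rescaling_factor": 0.0,
    }
)
```

Failing early on missing or misspelled keys is a design choice of this sketch. Whether the actual parser is this strict is not stated in the PR.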

Filtering Methods:

validphys2/src/validphys/filters.py: Added filter_inconsistent_closure_data_by_experiment and _filter_inconsistent_closure_data methods to handle filtering of inconsistent closure data.
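The idea of treating only the listed datasets can be sketched as follows. This is a toy illustration under stated assumptions, not the filters.py implementation: `apply_inconsistency` and `zero_out` are hypothetical helpers, "OTHER_DATASET" is a made-up name, and the data are plain lists rather than commondata objects.

```python
def apply_inconsistency(datasets, inconsistent_datasets, transform):
    """Apply transform only to the datasets named in inconsistent_datasets.

    datasets maps a dataset name to its (toy) systematic uncertainties;
    every other dataset is passed through unchanged.
    """
    targets = set(inconsistent_datasets)
    return {
        name: transform(data) if name in targets else data
        for name, data in datasets.items()
    }


def zero_out(values):
    """Toy transform mimicking a rescaling factor of 0.0."""
    return [0.0 * v for v in values]


datasets = {
    "HERA_NC_318GEV_EM-SIGMARED": [0.5, 1.5],
    "OTHER_DATASET": [2.0, 3.0],  # made-up name, not in the runcard
}
filtered = apply_inconsistency(datasets, ["HERA_NC_318GEV_EM-SIGMARED"], zero_out)
```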

Testing:

validphys2/src/validphys/tests/test_inconsistent_ct.py: Added unit tests for InconsistentCommonData class methods, including tests for the getter and setter of systematic_errors, select_systype_table_indices, rescale_systematics, process_commondata, and export_uncertainties.
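To show the pattern of a getter and setter test, here is a minimal sketch on a toy class. `ToyCommonData` is a hypothetical stand-in and does not reproduce InconsistentCommonData. The copy-on-read behaviour and the shape check in the setter are assumptions chosen for the example.

```python
class ToyCommonData:
    """Hypothetical container exposing systematic_errors as a property."""

    def __init__(self, systematic_errors):
        self._systematic_errors = [list(row) for row in systematic_errors]

    @property
    def systematic_errors(self):
        # Return a copy so callers cannot modify the internal table in place.
        return [list(row) for row in self._systematic_errors]

    @systematic_errors.setter
    def systematic_errors(self, value):
        rows = [list(row) for row in value]
        if len(rows) != len(self._systematic_errors):
            raise ValueError("number of data points must not change")
        self._systematic_errors = rows


def test_systematic_errors_roundtrip():
    cd = ToyCommonData([[1.0, 2.0], [3.0, 4.0]])
    assert cd.systematic_errors == [[1.0, 2.0], [3.0, 4.0]]
    cd.systematic_errors = [[0.0, 2.0], [0.0, 4.0]]
    assert cd.systematic_errors == [[0.0, 2.0], [0.0, 4.0]]


def test_getter_returns_copy():
    cd = ToyCommonData([[1.0, 2.0]])
    cd.systematic_errors[0][0] = 99.0  # modifies the copy only
    assert cd.systematic_errors == [[1.0, 2.0]]
```

Both functions follow the pytest naming convention, so a runner such as pytest would collect them automatically.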

Benchmarking of Code

Test against the old code to check that we reproduce its results, e.g. the same bias-variance ratio.

Test 0.0

Test that an inconsistent closure test fit (using the inconsistent closure test filter) with lambda = 1.0 gives the same results as a consistent closure test:

https://vp.nnpdf.science/VIdN_z0ETx6Ph0szzKLY0g==
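The expectation behind this test can be checked on a toy covariance matrix. The sketch below assumes that lambda multiplies the selected systematic uncertainties and that the covariance is built from uncorrelated statistical errors plus fully correlated systematics in absolute units. This is a simplification for illustration and not the validphys covariance construction. `covariance` and `rescale` are hypothetical helpers.

```python
def covariance(stat, sys_table):
    """Toy covariance: cov_ij = delta_ij * stat_i**2 + sum_k sys_ik * sys_jk."""
    n = len(stat)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        cov[i][i] += stat[i] ** 2
        for j in range(n):
            cov[i][j] += sum(a * b for a, b in zip(sys_table[i], sys_table[j]))
    return cov


def rescale(sys_table, lam):
    """Multiply every systematic uncertainty by lam."""
    return [[lam * value for value in row] for row in sys_table]


stat = [0.5, 0.5]
sys_table = [[1.0, 0.5], [2.0, 0.5]]

cov_full = covariance(stat, sys_table)
cov_lam1 = covariance(stat, rescale(sys_table, 1.0))  # consistent case
cov_lam0 = covariance(stat, rescale(sys_table, 0.0))  # systematics removed
```

In this toy setup lambda = 1.0 reproduces the full covariance exactly, while lambda = 0.0 leaves only the statistical variances on the diagonal.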

Further tests

This is the level of agreement found between the old and new branches for a DIS-only inconsistent closure test with lambda = 0:
https://vp.nnpdf.science/sH-C_csbTGCPCdEGFXj_1w==

I also used parallel_models: true for the new fit. Is this compatible with the level of agreement found between fits run in the standard way and fits run with parallel_models? (@scarlehoff)

Example of an inconsistent closure test runcard:

inconsistent_data_settings:

  treatment_names: [MULT]
  names_uncertainties: [CORR, SPECIAL]

  inconsistent_datasets:
    - HERA_NC_318GEV_EM-SIGMARED
    - HERA_NC_251GEV_EP-SIGMARED
    - HERA_NC_300GEV_EP-SIGMARED
    - HERA_NC_318GEV_EP-SIGMARED

  sys_rescaling_factor: 0.0


closuretest:
  errorsize: 1.0
  fakedata: true
  inconsistent_fakedata: true
  fakenoise: true
  fakepdf: 210223-mw-000_fakepdf
  filterseed: 1
  printpdf4gen: false
  rancutmethod: 0
  rancutprob: 1.0
  rancuttrnval: false

datacuts:
  q2min: 3.49
  t0pdfset: 210223-mw-000_fakepdf
  use_cuts: internal
  w2min: 12.5

dataset_inputs:
- dataset: NMCPD_dw_ite
  frac: 0.75
- dataset: SLACP_dwsh
  frac: 0.75
- dataset: SLACD_dw_ite
  frac: 0.75
- dataset: BCDMSP_dwsh
  frac: 0.75
- dataset: BCDMSD_dw_ite
  frac: 0.75
- dataset: CHORUSNUPb_dw_ite
  frac: 0.75
- dataset: CHORUSNBPb_dw_ite
  frac: 0.75
- cfac:
  - MAS
  dataset: NTVNUDMNFe_dw_ite
  frac: 0.75
- dataset: HERACOMBNCEM
  frac: 0.75
- dataset: HERACOMBNCEP575
  frac: 0.75
- dataset: HERACOMBNCEP820
  frac: 0.75
- dataset: HERACOMBNCEP920
  frac: 0.75
- dataset: HERACOMBCCEP
  frac: 0.75
- dataset: HERACOMB_SIGMARED_C
  frac: 0.75

debug: false

description: DIS only data. Some datasets (see partition) are left out of the fit.
  Partition is chosen as in NNPDF40_hyperopt. An inconsistency of type 2 is introduced
  for some of the in-sample datasets, namely all of the HERA NC ones.

fitting:
  savepseudodata: true
  basis:
  - fl: sng
    largex:
    - 1.498
    - 3.138
    smallx:
    - 1.121
    - 1.154
    trainable: false
  - fl: g
    largex:
    - 3.266
    - 6.214
    smallx:
    - 0.9224
    - 1.149
    trainable: false
  - fl: v
    largex:
    - 1.6
    - 3.588
    smallx:
    - 0.5279
    - 0.8017
    trainable: false
  - fl: v3
    largex:
    - 1.761
    - 3.427
    smallx:
    - 0.2011
    - 0.4374
    trainable: false
  - fl: v8
    largex:
    - 1.589
    - 3.378
    smallx:
    - 0.5775
    - 0.8357
    trainable: false
  - fl: t3
    largex:
    - 1.763
    - 3.397
    smallx:
    - -0.484
    - 1.0
    trainable: false
  - fl: t8
    largex:
    - 1.572
    - 3.496
    smallx:
    - 0.6714
    - 0.9197
    trainable: false
  - fl: t15
    largex:
    - 1.503
    - 3.636
    smallx:
    - 1.073
    - 1.164
    trainable: false
  fitbasis: EVOL

genrep: true

integrability:
  integdatasets:
  - dataset: INTEGXT8
    maxlambda: 1e2
  - dataset: INTEGXT3
    maxlambda: 1e2

maxcores: 4
mcseed: 75955222
nnseed: 2080989803

parameters:
  activation_per_layer:
  - tanh
  - tanh
  - linear
  dropout: 0.0
  epochs: 17000
  initializer: glorot_normal
  integrability:
    initial: 10
    multiplier: null
  layer_type: dense
  nodes_per_layer:
  - 25
  - 20
  - 8
  optimizer:
    clipnorm: 6.073e-06
    learning_rate: 0.002621
    optimizer_name: Nadam
  positivity:
    initial: 184.8
    multiplier: null
  stopping_patience: 0.1
  threshold_chi2: 3.5
positivity:
  posdatasets:
  - dataset: POSF2U
    maxlambda: 1e6
  - dataset: POSF2DW
    maxlambda: 1e6
  - dataset: POSF2S
    maxlambda: 1e6
  - dataset: POSFLL
    maxlambda: 1e6
  - dataset: POSDYU
    maxlambda: 1e10
  - dataset: POSDYD
    maxlambda: 1e10
  - dataset: POSDYS
    maxlambda: 1e10
  - dataset: POSF2C
    maxlambda: 1e6
  - dataset: POSXUQ
    maxlambda: 1e6
  - dataset: POSXUB
    maxlambda: 1e6
  - dataset: POSXDQ
    maxlambda: 1e6
  - dataset: POSXDB
    maxlambda: 1e6
  - dataset: POSXSQ
    maxlambda: 1e6
  - dataset: POSXSB
    maxlambda: 1e6
  - dataset: POSXGL
    maxlambda: 1e6

save: weights.h5

theory:
  theoryid: 200
trvlseed: 376191634


scarlehoff · 2024-11-13T10:12:20Z

> This is the level of agreement found between the old and new branches for a DIS-only inconsistent closure test with lambda = 0:
> https://vp.nnpdf.science/sH-C_csbTGCPCdEGFXj_1w==
>
> I also used parallel_models: true for the new fit. Is this compatible with the level of agreement found between fits run in the standard way and fits run with parallel_models? (@scarlehoff)

I would need to see chi2 / distances / etc. By running in parallel the seeding might happen differently at some stage but the initialization (if everything goes well) should be equivalent.

By eye they look too different to me, but if all metrics are very close it could well be a random fluctuation.

RoyStegeman · 2024-12-05T18:54:04Z

Looking at the plots I share @scarlehoff's impression, but indeed a full compare report including chi2s and distances would be needed to say it more conclusively.

From Stefano's comments during the PC I have the impression that the paper is about to be finished, so I think this PR deserves some priority. It would be very good to have this merged before the paper is on arxiv, because otherwise I'm afraid it may never happen.

comane added the closure tests and validphys labels Oct 17, 2024

comane requested review from scarlehoff, RoyStegeman and giovannidecrescenzo October 17, 2024 14:35

comane force-pushed the introduce_inconsistent_fit_data branch 2 times, most recently from 5d33ab7 to 36371a3 October 18, 2024 22:02

comane added 12 commits October 20, 2024 13:01

added inconsistent_data_settings parser

0d050bf

added module with InconsistentCommonData class

5d0993b

added explicit node for filtering of inconsistent fakedata

afe592e

added _filter_inconsistent_closure_data to filters

6737047

added test for inconsistent coredata commondata class

f967e4b

renamed type err to treatment err

42a082b

add log warning for ict and avoid circular import error

ab96658

changed names to treatment and names_uncertainties

ca16479

adapted inconsistent cd class to write on systematics table

29ef858

override systematic_errors method with a property and added setter to property

58b4e75

reindex systematic errors for inconsistent datasets in the same way as it is done for the central values

21cd934

adapted tests

809dd6d

comane force-pushed the introduce_inconsistent_fit_data branch from 36371a3 to 809dd6d October 20, 2024 12:01

RoyStegeman mentioned this pull request Dec 5, 2024

Introduction of an inconsistency within a Closure Test #1682

Open

giovannidecrescenzo approved these changes Feb 10, 2025

giovannidecrescenzo self-requested a review February 10, 2025 14:27
