T4 Creation + DNN Draft PR #151

Draft
wants to merge 16 commits into
base: CMSSW_14_1_0_pre3_LST_X_LSTCore_realfiles_batch1_devel

jchismar commented Feb 5, 2025

Draft PR for T4 creation and introducing a T4 DNN (using the architecture from the T5 DNN).

GNiendorf commented Feb 7, 2025

At first glance I don't see any mistakes. I'd suggest making the same plots of the inner radius / outer radius distribution, but built only from the actual features vector, for abs(eta_1) < 1.1 or some other cut that selects the barrel.

features = np.array(features_list).T
eta_list = np.array(eta_list).T

That is, make the plot directly from "features" in the cell below this one and see whether you still get good separation between real and fake. You could also drop the eta requirement and make the same plot for comparison. The DNN currently claims it makes no use of the eta_1 feature (or of nearly all the other hit-based features), so I think this would help clarify where the issue is.

You could also remake the same plot after downsampling/etc. as a second check.
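
A minimal sketch of that check, assuming "features" has shape (n_candidates, n_features) after the transpose and that a boolean real/fake array is available alongside it. The column positions for eta_1 and the two radii, the name `radius_ratio_hists`, and the synthetic data are all placeholders for illustration, not the notebook's actual layout.

```python
import numpy as np

def radius_ratio_hists(features, is_real, eta_col, r_in_col, r_out_col,
                       eta_max=1.1, bins=40):
    """Normalized histograms of inner/outer radius for real and fake candidates.

    Column indices are passed in by the caller; which columns hold eta_1 and
    the radii in the real features array is an assumption to verify.
    Pass eta_max=None to drop the eta requirement.
    """
    features = np.asarray(features, dtype=float)
    is_real = np.asarray(is_real, dtype=bool)

    keep = features[:, r_out_col] > 0              # avoid dividing by zero
    if eta_max is not None:
        keep &= np.abs(features[:, eta_col]) < eta_max

    ratio = features[keep, r_in_col] / features[keep, r_out_col]
    real = is_real[keep]

    # Shared bin edges so the two classes are directly comparable.
    edges = np.histogram_bin_edges(ratio, bins=bins)
    h_real, _ = np.histogram(ratio[real], bins=edges, density=True)
    h_fake, _ = np.histogram(ratio[~real], bins=edges, density=True)
    return edges, h_real, h_fake

# Synthetic stand-in data (NOT real T4 candidates): real candidates get a
# narrow inner/outer ratio, fakes a broad one.
rng = np.random.default_rng(0)
n = 5000
is_real = rng.random(n) < 0.5
eta_1 = rng.uniform(-2.5, 2.5, n)
r_in = rng.uniform(50.0, 150.0, n)
spread = np.where(is_real, 0.05, 0.5)
r_out = r_in * np.exp(rng.normal(0.0, 1.0, n) * spread)
features = np.column_stack([eta_1, r_in, r_out])

# Barrel-only selection, then the same histograms without the eta cut.
edges, h_real, h_fake = radius_ratio_hists(features, is_real, 0, 1, 2, eta_max=1.1)
edges_all, h_real_all, h_fake_all = radius_ratio_hists(features, is_real, 0, 1, 2, eta_max=None)
```

The returned edges and densities can be drawn with any plotting tool. Running the same function on the arrays after downsampling gives the second check.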

GNiendorf commented Feb 7, 2025

Or, retrain with just eta_1 and the radii variables and see if eta_1 is used by the DNN then. That would also help clarify things.
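
A rough sketch of the reduced-feature retraining, using a plain logistic regression as a stand-in for the DNN so the example stays self-contained. The column indices, the helper names `select_columns` and `train_logistic`, and the synthetic data are hypothetical; in practice the actual T4 DNN training loop would replace `train_logistic`.

```python
import numpy as np

def select_columns(features, columns):
    """Keep only the listed feature columns (e.g. eta_1 and the radii)."""
    return np.asarray(features, dtype=float)[:, columns]

def train_logistic(X, y, lr=0.5, epochs=500):
    """Stand-in model: logistic regression fit by gradient descent.

    This is NOT the DNN from the PR, only a placeholder for it.
    Returns weights and bias for the standardized inputs.
    """
    mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-12
    Xs = (X - mu) / sigma
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xs @ w + b)))
        grad = p - y
        w -= lr * (Xs.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b, mu, sigma

# Synthetic stand-in data: 5 features, the label depends on columns 0, 3, 4.
rng = np.random.default_rng(1)
X_full = rng.normal(size=(4000, 5))
score = 1.5 * X_full[:, 0] + X_full[:, 3] - X_full[:, 4]
y = (score + rng.normal(scale=0.5, size=4000) > 0).astype(float)

# Hypothetical positions of eta_1 and two radius variables.
subset = [0, 3, 4]
X_sub = select_columns(X_full, subset)

w, b, mu, sigma = train_logistic(X_sub, y)
pred = ((X_sub - mu) / sigma) @ w + b > 0
accuracy = np.mean(pred == y.astype(bool))
```

If eta_1 still shows no weight or importance in the reduced model, the problem is more likely in how the feature is built than in competition from the other inputs.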

GNiendorf commented Feb 7, 2025

As I said in the Skype chat, if you look at your feature importances:

Feature 18 importance: 0.0486
Feature 17 importance: 0.0409
Feature 8 importance: 0.0104
Feature 6 importance: 0.0045
Feature 16 importance: 0.0037
Feature 3 importance: 0.0019
Feature 14 importance: 0.0010
Feature 1 importance: 0.0000
Feature 10 importance: -0.0010
Feature 11 importance: -0.0013
Feature 4 importance: -0.0013
Feature 7 importance: -0.0023
Feature 2 importance: -0.0048
Feature 15 importance: -0.0057
Feature 0 importance: -0.0073
Feature 5 importance: -0.0094
Feature 12 importance: -0.0103
Feature 9 importance: -0.0229
Feature 13 importance: -0.0272

Nearly all of your features have ~0 or negative importance. If you look at the T5 DNN feature importances:

Feature 21 importance: 0.3800
Feature 20 importance: 0.2052
Feature 0 importance: 0.2036
Feature 22 importance: 0.1572
Feature 17 importance: 0.1333
Feature 12 importance: 0.1323
Feature 13 importance: 0.1207
Feature 5 importance: 0.1142
Feature 2 importance: 0.0741
Feature 16 importance: 0.0638
Feature 15 importance: 0.0420
Feature 8 importance: 0.0402
Feature 9 importance: 0.0399
Feature 6 importance: 0.0305
Feature 7 importance: 0.0274
Feature 4 importance: 0.0269
Feature 3 importance: 0.0247
Feature 14 importance: 0.0162
Feature 10 importance: 0.0128
Feature 19 importance: 0.0117
Feature 11 importance: 0.0106
Feature 18 importance: 0.0089
Feature 1 importance: 0.0001

Or the T3 DNN importances:

Feature 5 importance: 0.0285
Feature 13 importance: 0.0258
Feature 0 importance: 0.0237
Feature 2 importance: 0.0187
Feature 9 importance: 0.0184
Feature 12 importance: 0.0161
Feature 3 importance: 0.0123
Feature 4 importance: 0.0098
Feature 11 importance: 0.0093
Feature 10 importance: 0.0089
Feature 7 importance: 0.0079
Feature 8 importance: 0.0056
Feature 6 importance: 0.0014
Feature 1 importance: 0.0000

There, most features are used, and in both cases eta_1 (the first eta value, feature 0) is the third most important feature. So I think narrowing the search there will help solve this.
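
The thread does not say how these importances were computed. Small negative values are consistent with a permutation-style measure (the drop in accuracy when one column is shuffled), so here is a minimal sketch under that assumption. The function name, the toy `predict` callable, and the data are illustrative only.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled.

    `predict` maps an (n, n_features) array to boolean predictions.
    A value near zero means the model does not rely on that column;
    slightly negative values can occur from statistical noise.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=bool)
    baseline = np.mean(predict(X) == y)

    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importances[j] = np.mean(drops)
    return importances

# Toy model that uses columns 0 and 2 and ignores column 1 entirely.
def predict(X):
    return X[:, 0] + 0.5 * X[:, 2] > 0

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))
y = predict(X)

imp = permutation_importance(predict, X, y)

# Print in the same layout as the listings above, most important first.
for j in np.argsort(imp)[::-1]:
    print(f"Feature {j} importance: {imp[j]:.4f}")
```

A column the model never reads comes out at exactly zero here, which is the pattern to compare against for eta_1 in the T4 DNN.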
