Add scaling/normalisation of KPI outputs #77

KasiaKoz · 2024-03-26T14:57:50Z

Resolves: #69
Fixes: #73

This PR updates the KPI outputs to give a scaled/normalised value alongside the actual KPI value.
The output types of the methods change: the majority of them are now maps with two keys, "actual" and "normalised". They are still saved to csv, now with two columns: "actual" and "normalised" (examples below for the integration test files).
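
As a rough illustration of that output shape, here is a minimal sketch. The class and method names are hypothetical, and the `Map<String, Double>` type and comma delimiter are assumptions rather than what the PR necessarily uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only; not code from this PR.
final class KpiOutputSketch {
    // Builds the two-key map described above; LinkedHashMap keeps column order stable.
    static Map<String, Double> kpiOutput(double actual, double normalised) {
        Map<String, Double> output = new LinkedHashMap<>();
        output.put("actual", actual);
        output.put("normalised", normalised);
        return output;
    }

    // Renders the map as a header row followed by a value row.
    // Assumption: a comma delimiter; the real csv writer may differ.
    static String toCsv(Map<String, Double> output) {
        String header = String.join(",", output.keySet());
        String values = output.values().stream()
                .map(Object::toString)
                .collect(Collectors.joining(","));
        return header + "\n" + values;
    }

    public static void main(String[] args) {
        // Values taken from the smol Affordability example in this description.
        System.out.println(toCsv(kpiOutput(0.34, 10.0)));
    }
}
```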

[Bonus] While working on this (specifically, while adding tests), I found a bug in the GHG KPI and fixed it on the spot.

Normalisation

We normalise the KPI outputs using an affine transformation that linearly maps a given interval (unique to each KPI) to [0,10]. I made Normaliser an interface in case we want to change the type of transformation later. The bounds of the target interval ([0,10]) were deliberately kept configurable: this made it easier to use the normaliser in tests in a simpler capacity (as a multiplicative constant). Thinking about it now, maybe I should have mocked it, since the linear normaliser is tested separately; let me know what you think. I think being able to change the scale bounds is still useful if one day we decide to map to [0,1], for example.

LinearNormaliser defaults to [0,10] right now, but I still wanted to specify the mapped interval in MatsimKpiGenerator. Example:

LinearNormaliser normaliser = new LinearNormaliser(0, 10, 0, 100);

maps values between 0 and 100 to values between 0 and 10.

One funny thing you will notice is that sometimes the interval for a KPI is flipped: big values are bad and small values are good. This is handled by the same normaliser (using factors of -1 under the hood), and the normaliser object is specified like this:

LinearNormaliser normaliser = new LinearNormaliser(0, 10, 100, 0);

i.e. mapping values between 100 and 0 to values between 0 and 10.

Let me know if that API is clear; I think of it as [0, 10] -> [100, 0]. I briefly considered a dedicated class for the reverse normaliser, but I thought it might look more confusing, e.g. ReverseLinearNormaliser normaliser = new ReverseLinearNormaliser(0, 10, 0, 100), because you need to pay attention to the name of the class as well as the interval values. That class would also be almost identical to the LinearNormaliser class, with only a couple of small changes. I think what I've got here feels more natural, but I'm keen to get some opinions on that.
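
To make the mapping concrete, here is a minimal sketch of a linear normaliser that accepts both forward and flipped intervals. It is not the LinearNormaliser class from this PR: the class name is hypothetical, it uses a single formula rather than factors of -1, and the clamping of out-of-range values is an assumption (suggested by the 0.0 and 10.0 values in the integration test outputs, but not confirmed here).

```java
// Hypothetical sketch; not the LinearNormaliser class from this PR.
final class SketchLinearNormaliser {
    private final double scaleLower;
    private final double scaleUpper;
    private final double intervalStart;
    private final double intervalEnd;

    // Argument order mirrors the examples in this description:
    // (scale lower, scale upper, interval start, interval end).
    SketchLinearNormaliser(double scaleLower, double scaleUpper,
                           double intervalStart, double intervalEnd) {
        if (intervalStart == intervalEnd) {
            throw new IllegalArgumentException("interval must have non-zero width");
        }
        this.scaleLower = scaleLower;
        this.scaleUpper = scaleUpper;
        this.intervalStart = intervalStart;
        this.intervalEnd = intervalEnd;
    }

    double normalise(double value) {
        // Fraction of the way from intervalStart to intervalEnd. The sign of the
        // denominator takes care of flipped intervals such as [100, 0].
        double fraction = (value - intervalStart) / (intervalEnd - intervalStart);
        // Assumption: values outside the interval are clamped to the scale bounds.
        fraction = Math.max(0.0, Math.min(1.0, fraction));
        return scaleLower + fraction * (scaleUpper - scaleLower);
    }

    public static void main(String[] args) {
        SketchLinearNormaliser forward = new SketchLinearNormaliser(0, 10, 0, 100);
        SketchLinearNormaliser flipped = new SketchLinearNormaliser(0, 10, 100, 0);
        System.out.println(forward.normalise(25));  // 2.5
        System.out.println(flipped.normalise(25));  // 7.5
        System.out.println(flipped.normalise(150)); // 0.0 (clamped)
    }
}
```

With the flipped interval, a small raw value (good) lands near the top of the scale and a large raw value (bad) lands near the bottom, which is the behaviour the constructor call with (0, 10, 100, 0) is meant to express.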

Tests

The majority of the new code is tests for the KPIs that are being normalised. The more complicated KPIs get more tests.
This also meant adding more content to the builders that set up the inputs, so that they can produce different behaviour in the tests. Very happy to be plugging those holes.

Integration tests

The benchmark outputs for the integration tests changed for all of the KPIs being scaled. Details are below.

smol

| KPI | old outputs (content of csv file) | new outputs (content of csv file): actual | new outputs (content of csv file): normalised |
|---|---|---|---|
| Affordability | 0.34 | 0.34 | 10.0 |
| PT Wait Time | 601.0 (~10min) | 601.0 | 4.98 |
| Occupancy | 0.17 | 0.17 | 0.0 |
| GHG | 1.61 (9.67 after bug fix) | 9.67 | 0.0 |
| Travel Time | 33.31818181818182 | 33.32 | 7.09 |
| Access to Mobility services (bus) | 11.11 | 11.11 | 1.11 |
| Access to Mobility services (rail) | 0.0 | 0.0 | 0.0 |
| Access to Mobility services (used PT) | 0.0 | 0.0 | 0.0 |

Congestion, old outputs (content of csv file):

| mode | Mean [delayRatio] |
|---|---|
| bus | 2.83 |
| car | 3.57 |

Congestion, new outputs (content of csv file):

| mode | Mean [delayRatio] | Normalised [Mean [delayRatio]] |
|---|---|---|
| bus | 2.83 | 0.95 |
| car | 3.57 | 0.0 |

drt

| KPI | old outputs (content of csv file) | new outputs (content of csv file): actual | new outputs (content of csv file): normalised |
|---|---|---|---|
| Affordability | 1.98 | 1.98 | 0.0 |
| PT Wait Time | 657.0 (~11mins) | 657.0 | 4.05 |
| Occupancy | 0.09 | 0.09 | 0.0 |
| GHG | 5.1 (35.7 after bug fix) | 35.7 | 0.0 |
| Travel Time | 46.75 | 46.75 | 5.41 |
| Access to Mobility services (bus) | 0.0 | 0.0 | 0.0 |
| Access to Mobility services (rail) | 0.0 | 0.0 | 0.0 |
| Access to Mobility services (used PT) | 11.11 | 11.11 | 1.11 |

Congestion, old outputs (content of csv file):

| mode | Mean [delayRatio] |
|---|---|
| bus | 2.83 |
| drt | 6.24 |

Congestion, new outputs (content of csv file):

| mode | Mean [delayRatio] | Normalised [Mean [delayRatio]] |
|---|---|---|
| bus | 2.83 | 0.95 |
| car | 6.24 | 0.0 |

mfitz

Looks pretty good. I like the normaliser interface. I'm very glad to see increasing unit test coverage.

There are a couple of things I've raised that are quick fixes (the changelog comments, for example). And I'm not keen on some aspects of some of the builders in the tests.

I only skimmed quite a lot of the tests. They're big and complicated. That is largely because of the awkward design of the code under test (which is definitely not the fault of this PR) - we have a God class, and a strange API. Those things are now starting to impact the maintainability of the code, and one symptom of that is how difficult it is to write unit tests around the KPI calculations.

The fixtures and setup of each test are very involved, there are large numbers of assertions in each test, and it's difficult to match the expectations to the fixtures. Given that the KPI calculations are arguably the most important area of the code, that is something that we should address soon.

CHANGELOG.md

src/main/java/com/arup/cml/abm/kpi/KpiCalculator.java

src/main/java/com/arup/cml/abm/kpi/LinearNormaliser.java

src/test/java/com/arup/cml/abm/kpi/builders/ScoringConfigBuilder.java

...est/java/com/arup/cml/abm/kpi/tablesaw/TestTablesawAccessToMobilityWithLinearNormaliser.java

divyasharma-arup

Thanks Kasia! It's looking very good, and it's a straightforward implementation.

I have two comments:

How should a user interact with the normalisation feature to adjust the bounds of the KPIs? Should they modify the MatsimKpiGenerator.java file? It would be helpful to add a couple of sentences to the README on this feature and the expected functionality.
I don't think the Affordability KPI writes correctly in the case where it needs to output -1/-1. Do you mind checking this?

src/main/java/com/arup/cml/abm/kpi/matsim/run/MatsimKpiGenerator.java

src/main/java/com/arup/cml/abm/kpi/tablesaw/TablesawKpiCalculator.java

...est/java/com/arup/cml/abm/kpi/tablesaw/TestTablesawAffordabilityKpiWithLinearNormaliser.java

KasiaKoz

Tidied up changelog and javadoc in LinearNormaliser
Refactored test fixture builders: changed Legs/TripsBuilder -> Legs/TripsTableBuilder, which rely on Leg/TripBuilders that build the Legs/Trips with nicer .withX methods
Also refactored ScoringConfigBuilder to move away from defaulted values towards nicer .withX methods. Defaulted values still remain in a couple of places (plans, network & transit schedule builders); I just ran out of time today
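
For anyone unfamiliar with the .withX style mentioned above, a minimal sketch follows. Every name and field here is invented for illustration; the real TripBuilder in this PR may look quite different.

```java
// Hypothetical value class; the fields are invented for illustration.
final class TripSketch {
    final String personId;
    final String mode;
    final double travelTimeMinutes;

    TripSketch(String personId, String mode, double travelTimeMinutes) {
        this.personId = personId;
        this.mode = mode;
        this.travelTimeMinutes = travelTimeMinutes;
    }
}

// Hypothetical builder: each .withX method sets one field and returns the
// builder, so a test states only the values it cares about, by name.
final class TripSketchBuilder {
    private String personId;
    private String mode;
    private double travelTimeMinutes;

    TripSketchBuilder withPersonId(String personId) {
        this.personId = personId;
        return this;
    }

    TripSketchBuilder withMode(String mode) {
        this.mode = mode;
        return this;
    }

    TripSketchBuilder withTravelTimeMinutes(double travelTimeMinutes) {
        this.travelTimeMinutes = travelTimeMinutes;
        return this;
    }

    TripSketch build() {
        // No silent defaults: fail fast if a required value was never set.
        if (personId == null || mode == null) {
            throw new IllegalStateException("personId and mode must be set");
        }
        return new TripSketch(personId, mode, travelTimeMinutes);
    }

    public static void main(String[] args) {
        TripSketch trip = new TripSketchBuilder()
                .withPersonId("person-1")
                .withMode("bus")
                .withTravelTimeMinutes(33.3)
                .build();
        System.out.println(trip.personId + " " + trip.mode + " " + trip.travelTimeMinutes);
    }
}
```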

src/main/java/com/arup/cml/abm/kpi/LinearNormaliser.java

src/main/java/com/arup/cml/abm/kpi/matsim/run/MatsimKpiGenerator.java

src/main/java/com/arup/cml/abm/kpi/tablesaw/TablesawKpiCalculator.java

src/test/java/com/arup/cml/abm/kpi/builders/LegsBuilder.java

src/test/java/com/arup/cml/abm/kpi/builders/PersonsBuilder.java

mfitz

👍

src/main/java/com/arup/cml/abm/kpi/LinearNormaliser.java

src/test/java/com/arup/cml/abm/kpi/builders/TripBuilder.java

KasiaKoz · 2024-04-08T10:53:25Z

A few somewhat small changes:

added extra test for affordability for the case when we don't have income info or income related subpop
removed a trivial default constructor in TripBuilder
refactored the Access to Mobility tests so that smaller chunks of the outputs get tested at any given time

divyasharma-arup

Thanks Kasia!

mfitz

The tests are looking better now 👍

KasiaKoz added 28 commits March 13, 2024 17:10

add linear scaling factor class

a054240

simplify if statements for reversed scaling factor by multiplying by -1

28eb327

add a couple more test for linear scale factor

1ff7a66

add tests for affordability KPI

bbd765b

add scaling for affordability KPI

66b84ed

refactor to make it more obvious we're testing with linear scaling factor

f092004

no test in test names

f65eaaa

add tests for pt wait time

d6bdf97

add scaling for pt wait time kpi

2ecd790

rename pt wait test

587543f

add test for occupancy rate kpi

d7cb640

add scaling for occupancy rate kpi

c538374

fix typo in occupancy test file

c1e432a

improve vehicles builder

f4af05f

add test for ghg, find and fix bug when calculating per capita

32e160a

update changelog with ghg bug fix #73

43f663b

add scaling for GHG KPI

6d5541b

add tests for Travel Time KPI

70e494d

add scaling to travel time

7ca4cb5

add test for mobility access KPI

18140e3

add scaling for mobility access KPI output

d78d6a6

add more tests for congestion

40f8663

add scaling for congestion KPi output

4de1e03

rename scaling to normalisation

dac3e25

output actual and normalised KPI outputs

db691b2

update changelog with scaling/normalisation

2d608fe

remove more mentions of scaling

cef5bd9

use camel case

81a712b

KasiaKoz requested review from mfitz and divyasharma-arup March 26, 2024 14:58

mfitz requested changes Mar 27, 2024

divyasharma-arup requested changes Mar 27, 2024

KasiaKoz added 3 commits March 28, 2024 17:47

apply PR comment changes: docs tidy

ab5646d

apply PR comment changes: fewer magic numbers

616ac28

apply PR comment changes: refactor builders

bed3a50

KasiaKoz commented Mar 28, 2024

divyasharma-arup previously approved these changes Apr 2, 2024

mfitz previously approved these changes Apr 2, 2024

apply PR comment changes: extra test and remove useless constructor

a93c313

KasiaKoz dismissed stale reviews from mfitz and divyasharma-arup via a93c313 April 8, 2024 10:22

apply PR comment changes: refactor access to mobility tests to assert fewer things at any one time

4e2cc85

KasiaKoz requested review from mfitz and divyasharma-arup April 8, 2024 10:53

divyasharma-arup approved these changes Apr 9, 2024

mfitz approved these changes Apr 9, 2024

KasiaKoz merged commit 317cc32 into main Apr 9, 2024
1 check passed

KasiaKoz deleted the add-scaling-factors branch April 9, 2024 13:55
