Elara is altering link ids #240

D-Dulius · 2025-02-07T11:53:18Z

We have discovered an elara bug while working on the Basildon ABM build. For various validation exercises we use some of the elara outputs, in this specific case link_vehicle_speeds_car_average.geojson.

Comparing this output with the genet network used by the same sim, we can see that links with the same from/to node ids have different link ids.

For example, sim 2019_baseline_new_network_controller_10pc_20250108 according to the matsim config uses network v5_mad_prairie via:

<param name="inputNetworkFile" value="/efs/basildon/network/v5_mad_prairie/plus_wider_te/network.xml"/>

In link_vehicle_speeds_car_average.geojson, link id 402018 has the from/to nodes 5177106139033910521 and 5177106139066155527. In v5_mad_prairie, the link with the same from/to node ids has link id 68892.

Upon finding this, I also remembered seeing this exact same issue flagged as a comment in Alex K's old BERTIE validation jupyter notebooks (some of which we still use and converted into scripts as part of our current BERTIE validation workflow), see: https://github.com/arup-group/te_post_processing/blob/6f141b0cdb94025df21c6174cf468b23ac8ff81f/benchmarking_2023_refresh/Dashboard/SERTM%20Benchmark%20-%20By%20Vehicle%20Type_v2.py#L189C5-L196C63

When I inherited the above, I had no real idea what elara was or what it did, and because the notebooks were so old I assumed that whatever this bug was had been fixed by the time I took over validation for our V2 refresh of BERTIE in 2023. The current workaround in the benchmarking scripts is to join on the from/to node ids to a pre-elara version of the network. This is what I have suggested we do for Basildon while we investigate this issue (elara seems to be altering only link ids; node ids are unaffected, as the example above confirms).
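For anyone picking up the workaround, here is a minimal sketch of the from/to join, assuming both link tables have been loaded into pandas with `link_id`, `from` and `to` columns (the frames, the column names and the `match_on_nodes` helper are illustrative, not the actual benchmarking script). The ids are the ones from the example above, kept as strings so the long node ids are not mangled.

```python
import pandas as pd

# Hypothetical stand-in for the elara output (e.g. link_vehicle_speeds_car_average.geojson).
elara_links = pd.DataFrame({
    "link_id": ["402018"],
    "from": ["5177106139033910521"],
    "to": ["5177106139066155527"],
})

# Hypothetical stand-in for the reference (pre-elara) network links.
reference_links = pd.DataFrame({
    "link_id": ["68892"],
    "from": ["5177106139033910521"],
    "to": ["5177106139066155527"],
})


def match_on_nodes(left, right):
    """Join two link tables on the (from, to) node pair instead of link_id."""
    return left.merge(
        right,
        on=["from", "to"],
        how="left",
        suffixes=("_elara", "_reference"),
        indicator=True,  # adds a _merge column: 'both' or 'left_only'
    )


matched = match_on_nodes(elara_links, reference_links)
print(matched[["from", "to", "link_id_elara", "link_id_reference", "_merge"]])
```

One caveat with this approach: if two parallel links share the same from/to pair, the join returns more than one row for that pair, so those cases need checking by hand.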


divyasharma-arup · 2025-02-10T09:30:01Z

Very quick question - we had talked about indexing the network data because the string information for link_ids makes the datasets very large. Are the link_ids in link_vehicle_speeds_car_average.geojson all integers, or do you see formats such as: 5177106139033910521_5177106139066155527? I'm just wondering if the elara file has been re-indexed for any reason.

divyasharma-arup · 2025-02-10T09:34:35Z

Reading through some of what is posted on slack, it seems like the feature of re-indexing elara outputs to reduce size (and improve run times) means the link_ids are no longer compatible with the original network file. I don't know if we have an "elara" version of the network file @syhwawa, that has the re-indexed link_ids?

D-Dulius · 2025-02-10T09:43:43Z

> Very quick question - we had talked about indexing the network data because the string information for link_ids makes the datasets very large. Are the link_ids in link_vehicle_speeds_car_average.geojson all integers, or do you see formats such as: 5177106139033910521_5177106139066155527? I'm just wondering if the elara file has been re-indexed for any reason.

@divyasharma-arup I can see some link ids with the long node-to-node format, i.e. 5177106139033910521_5177106139066155527, in link_vehicle_speeds_car_average.geojson. I didn't realise elara re-indexing link ids was actually a feature, not a bug! This has caused problems though because I didn't know: I gave the elara output above to some of the team/Dan for their network review, so when those links were handed over to Neil the link ids didn't match the genet network.

Though if elara was re-indexing properly, that should imply there are no non-integer link ids in the elara outputs, is that right?
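A quick way to answer the integer-or-not question is to count the id formats present in the file. This is only a sketch: the two patterns (plain integer, and `from_to` node pair) are assumed from the examples in this thread, and `classify_link_id` / `summarise_id_formats` are made-up helper names.

```python
import re
from collections import Counter

# Formats seen in this thread: plain integers and "<from node>_<to node>" pairs.
INTEGER_ID = re.compile(r"\d+")
NODE_PAIR_ID = re.compile(r"\d+_\d+")


def classify_link_id(link_id):
    """Label a link id as 'integer', 'node_pair' or 'other'."""
    text = str(link_id)
    if INTEGER_ID.fullmatch(text):
        return "integer"
    if NODE_PAIR_ID.fullmatch(text):
        return "node_pair"
    return "other"


def summarise_id_formats(link_ids):
    """Count how many ids fall into each format."""
    return Counter(classify_link_id(link_id) for link_id in link_ids)


# Example ids taken from this thread.
example_ids = ["402018", "68892", "5177106139033910521_5177106139066155527"]
counts = summarise_id_formats(example_ids)
print(dict(counts))
```

On the real data this would be run on the `link_id` column of the geojson once it is loaded; a non-zero `node_pair` or `other` count would mean the ids are not purely integer.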

D-Dulius · 2025-02-10T09:48:44Z

I understand the reasons for reindexing link ids like you say above (run times) but would it not be better to just apply the reindexing at the genet/network creation stage? So we have a consistent network both pre/post elara?

divyasharma-arup · 2025-02-10T10:06:59Z

> Though if elara was re-indexing properly then that should imply there should be no link ids which are non-integers in the elara outputs, is that right?

Assuming this is the reason why the link_ids are different between the two files, yes, I'd expect link_vehicle_speeds_car_average.geojson to only have integer link_ids...and I agree, the re-indexing should be consistent.

Just a note that elara's scope is MATSim outputs (I think). So things like synthesis inputs (population & network) will have their original information -- which we want to maintain for traceability of the process. Therefore, we should probably only be comparing elara outputs with each other. Have you ever worked with output_network.xml? I am hoping that the link_ids will be consistent here, as we definitely assume they are in a lot of our code...!

D-Dulius · 2025-02-10T10:20:16Z

I haven't worked with output_network.xml, no. We now use link_vehicle_speeds_car_average.geojson specifically for benchmarking routed link-based journey times, and because I thought the network was consistent pre/post elara, I gave the elara output to others in the Basildon team for network reviews. There is no way they would know how to use an XML file for this purpose (i.e. the strategic modellers who have been resourced on the project), so this is potentially something for us to think about.

(elara outputs are easier to use for non-CMLers with no programming background as they can just stick it in QGIS)

D-Dulius · 2025-02-10T10:25:15Z

I myself don't really have any experience using/parsing xml files hehe, can geopandas read in xml files?

divyasharma-arup · 2025-02-10T10:30:02Z

ersa object is useful for that: sample script.

Your point is noted that the output_network.xml is not easy to work with (which is why it's parsed in the above). Something for us to think about in regards to whether ersa should be the home for this code of parsing the network or whether it should be elara.

divyasharma-arup · 2025-02-10T10:31:36Z

you can do this with ersa:

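import os

# Scenario, Network and Table are ersa classes; import them from ersa as appropriate for your install.


def create_scenario(path_scenario):
    # Build an ersa Scenario from a MATSim output directory.
    s = Scenario(data={
        'network': Network(path=os.path.join(path_scenario, 'output_network.xml'), crs='27700'),
        'link_logs': Table(
            path=os.path.join(path_scenario, 'vehicle_link_log_all.csv'))
    })
    return s


baseline = create_scenario(path_baseline)  # path_baseline: path to the baseline sim outputs
baseline.network.link_lengths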

D-Dulius · 2025-02-10T10:34:41Z

> you can do this with ersa:
>
> def create_scenario(path_scenario):
>     s = Scenario(data={
>         'network': Network(path=os.path.join(path_scenario, 'output_network.xml'), crs='27700'),
>         'link_logs': Table(
>             path=os.path.join(path_scenario, 'vehicle_link_log_all.csv'))
>     })
>     return s
>
>
> baseline = create_scenario(path_baseline)
> baseline.network.link_lengths

Ah yes of course, forgot ersa loads in the output xml files, in that case I have indirectly worked with xml files via ersa lol, I will take a look at the xml for link ids which were not able to be matched last week

syhwawa · 2025-02-10T15:46:15Z

> Reading through some of what is posted on slack, it seems like the feature of re-indexing elara outputs to reduce size (and improve run times) means the link_ids are no longer compatible with the original network file.

Could you remind me where the conversation/thread about this was, please, @divyasharma-arup?

TBH, I thought the link_id should be the same as in the network file, but it seems it isn't.

Using ersa to load the MATSim output network might be the easiest way, and it's worth examining the elara code to check where the reindexing happens.

divyasharma-arup · 2025-03-17T10:46:43Z

hi @D-Dulius,

I had a look into this, as I know it can drive issues when investigating potential network changes needed as part of validation/calibration. Here is what I found:

I think the Elara link_counts data is based on the MATSim output_network. I've written a quick check to confirm that these two files have matching from/to ids for each link_id. Nothing was returned as mismatching for the whole file. These link_ids appear to be a mix of integers and strings, so it isn't obvious whether any reindexing happened.
I also checked your input network file for the specific from/to ids and I can confirm the link_id is different.
So my theory is that network link ids change as part of the MATSim simulation process. I am not sure why.

Initial loading of data:

import geopandas as gp

link_counts = gp.read_file('/mnt/efs/basildon/2019_baseline_new_network_controller_10pc_20250108/600/elara/link_vehicle_counts_car.geojson')
# create_scenario is the ersa helper defined earlier in this thread
scenario = create_scenario('/mnt/efs/basildon/2019_baseline_new_network_controller_10pc_20250108/600')
network_links = scenario.network.links

Elara Link Counts
/mnt/efs/basildon/2019_baseline_new_network_controller_10pc_20250108/600/elara/link_vehicle_counts_car.geojson

link_id  from                 to
402018   5177106139033910521  5177106139066155527

MATSim Output Network
/mnt/efs/basildon/2019_baseline_new_network_controller_10pc_20250108/600/output_network.xml.gz

                       from                   to  
id                                                                      
402018  5177106139033910521  5177106139066155527

MATSim Input Network
/mnt/efs/basildon/network/v5_mad_prairie/plus_wider_te/network.xml

<link id="68892" from="5177106139033910521" to="5177106139066155527">

Therefore, @syhwawa, is there any way to confirm that there is a process where MATSim changes input link ids when running sims/writing outputs? I'm not sure if this might be part of Network Synthesis or Simulation. Is there any place that stores the "original" link id and the resulting "MATSim" link id?

Without that, we may have to utilise Alex's process of matching from/to ids instead of using link ids to find the "original" Network link ids. We may need to work with model reviewers to reference from/to ids instead of purely the link id, to be confident we're speaking about the same links.
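For reference, the kind of link_id consistency check described above could look roughly like the sketch below. It is not the script that was actually run; it assumes both tables are plain pandas frames with `link_id`, `from` and `to` columns (the ersa network table would first need its `id` index reset and renamed), and `find_node_mismatches` is a made-up helper name.

```python
import pandas as pd


def find_node_mismatches(links_a, links_b):
    """Return link_ids present in both tables whose from/to nodes differ."""
    merged = links_a.merge(links_b, on="link_id", suffixes=("_a", "_b"))
    differs = (merged["from_a"] != merged["from_b"]) | (merged["to_a"] != merged["to_b"])
    return merged.loc[differs, "link_id"].tolist()


# Stand-ins for the elara link counts and the MATSim output network,
# using the single row quoted in this comment.
elara_links = pd.DataFrame({
    "link_id": ["402018"],
    "from": ["5177106139033910521"],
    "to": ["5177106139066155527"],
})
output_network_links = pd.DataFrame({
    "link_id": ["402018"],
    "from": ["5177106139033910521"],
    "to": ["5177106139066155527"],
})

mismatches = find_node_mismatches(elara_links, output_network_links)
print(mismatches)
```

An empty list means every shared link_id points at the same from/to pair in both tables. Link ids that exist in only one table are dropped by the inner join, so they would need a separate check.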

divyasharma-arup · 2025-03-17T11:27:25Z

Also just a note that I've checked different sims that use the same network and it seems the elara output networks match (visually spot checked by plotting specific link_ids). So, if there is an alteration process from input_network -> output_network, it appears to be consistent between different simulation outputs.

D-Dulius · 2025-03-19T09:38:12Z

So in summary, MATSim is altering link ids during simulation, and not elara? @divyasharma-arup

D-Dulius · 2025-03-19T09:41:36Z

Shall we rename the issue and move it somewhere else then? (currently opened as an elara issue)

neilmt · 2025-03-19T10:12:01Z

The link_id values that Gerry was quoting from Tramola for Basildon don't match the link_id values in either the network standard outputs or the elara outputs. E.g. as @anarcoteron noted, what Tramola listed as 556837 is 457563 in both the standard outputs and elara outputs.

divyasharma-arup · 2025-03-19T14:00:29Z

Actually sorry @D-Dulius , I think for this issue we were just looking at the wrong input network file.

This is an original network file, that doesn't have "wider te".

/mnt/efs/basildon/network/v5_mad_prairie/network.xml

<link id="68892" from="5177106139033910521" to="5177106139066155527" freespeed="12.5" capacity="600" permlanes="1" oneway="1" modes="bus,bike,car,walk" length="52.67531441066485">

This is the one that was ultimately used by the sim.
/mnt/efs/basildon/network/v5_mad_prairie/plus_wider_te/network.xml

<link id="68892" from="5177175543941106057" to="5177175544033489349" freespeed="8.34" capacity="600.0" permlanes="1.0" oneway="1" modes="car" length="39.63713396805407">

<link id="402018" from="5177106139033910521" to="5177106139066155527" freespeed="12.5" capacity="600.0" permlanes="1.0" oneway="1" modes="car,bus" length="52.67531441066485">
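To make the difference between the two files easy to reproduce, here is a small sketch that looks up link ids by from/to pair using Python's standard `xml.etree.ElementTree`. The XML strings are cut-down stand-ins built from the `<link>` elements quoted above (other attributes dropped), and `link_ids_between` is a made-up helper name; for the real files, `ET.parse(path).getroot()` could replace `ET.fromstring`.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for v5_mad_prairie/network.xml (no wider TE).
ORIGINAL_NETWORK = """<network><links>
<link id="68892" from="5177106139033910521" to="5177106139066155527" />
</links></network>"""

# Cut-down stand-in for v5_mad_prairie/plus_wider_te/network.xml.
PLUS_WIDER_TE_NETWORK = """<network><links>
<link id="68892" from="5177175543941106057" to="5177175544033489349" />
<link id="402018" from="5177106139033910521" to="5177106139066155527" />
</links></network>"""


def link_ids_between(network_xml, from_node, to_node):
    """Return the ids of all links running from from_node to to_node."""
    root = ET.fromstring(network_xml)
    return [
        link.get("id")
        for link in root.iter("link")
        if link.get("from") == from_node and link.get("to") == to_node
    ]


FROM_NODE = "5177106139033910521"
TO_NODE = "5177106139066155527"
original_ids = link_ids_between(ORIGINAL_NETWORK, FROM_NODE, TO_NODE)
wider_te_ids = link_ids_between(PLUS_WIDER_TE_NETWORK, FROM_NODE, TO_NODE)
print(original_ids, wider_te_ids)
```

Running the same lookup against both files shows the same node pair carrying a different link id in each, which is the pattern seen in the snippets above.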

D-Dulius · 2025-03-19T14:07:26Z

You've lost me hehe, I refer to the wider te network in the issue description when comparing against elara

divyasharma-arup · 2025-03-19T14:07:52Z

Yeah but when I look at the data I think I'm seeing something different from what you originally said.

D-Dulius · 2025-03-19T14:21:00Z

So there is no mismatch between elara and the output matsim network after all?

divyasharma-arup · 2025-03-19T14:21:20Z

I think that's my conclusion, we were just referring to the wrong network file.

D-Dulius · 2025-03-19T14:22:00Z

@neilmt When you flagged these mismatches do you remember which network file you used?

D-Dulius · 2025-03-19T14:22:32Z

I'm going through the old thread but now can't find the specific file path

D-Dulius · 2025-03-19T14:28:50Z

This link points me to v6 of the network:

https://eu-west-1.console.aws.amazon.com/s3/buckets/arup-ukimea-te-abm-prod-efs-archive?region=eu-west-1&bucketType=general&prefix=basildon%2Fnetwork%2Fv6_marinated_dessert%2F&showversions=false&tab=objects

D-Dulius · 2025-03-19T14:29:29Z

D-Dulius · 2025-03-19T14:31:03Z

divyasharma-arup · 2025-03-19T14:31:31Z

I think Daumantas if you can recreate your problem, then we can have a look. Otherwise perhaps we should close this issue.

D-Dulius · 2025-03-19T14:34:46Z

This was something flagged by @neilmt and I can't be certain which network he used to initially compare against the elara output I gave him (I think when I created this issue I assumed it was the same as the one defined in the matsim config above which was not the case), so I'm closing this issue given the above numbers you've exported @divyasharma-arup which show no mismatch.

D-Dulius · 2025-03-19T14:40:25Z

To avoid this problem in the future we should avoid using the standard genet output network_links.parquet as it looks like it does not include the wider TE network whereas elara does if I understand correctly.

D-Dulius · 2025-03-19T14:51:16Z

This should've been the red flag, lol we noticed this earlier! 😬

D-Dulius · 2025-03-19T15:00:39Z

There is something I am still not quite understanding though, if the sim ends up using the wider te network then what are we using the network without the wider te bit for?

neilmt · 2025-03-19T15:43:36Z

> To avoid this problem in the future we should avoid using the standard genet output network_links.parquet as it looks like it does not include the wider TE network whereas elara does if I understand correctly.

I don't think this is the case. There still are discrepancies between elara output link_id values and the standard output link_id values - and the latter values are what should be used for specifying network edits with genet. From memory the discrepancies were often to do with the active mode network - I'll have a look at the discrepancies once I've ticked off the tasks I'm currently looking at.

D-Dulius · 2025-03-19T15:47:28Z

@neilmt What file specifically are you referring to when you say standard output? @divyasharma-arup saw zero link id mismatches comparing elara output networks with the matsim output network file /mnt/efs/basildon/network/v5_mad_prairie/plus_wider_te/network.xml

D-Dulius added the bug label Feb 7, 2025

D-Dulius assigned D-Dulius, divyasharma-arup, syhwawa, neilmt and gac55 Feb 7, 2025

D-Dulius closed this as completed Mar 19, 2025
