Elara is altering link ids #240

Closed
D-Dulius opened this issue Feb 7, 2025 · 33 comments

@D-Dulius

D-Dulius commented Feb 7, 2025

We have discovered an elara bug while working on the Basildon ABM build. For various validation exercises we use some of the elara outputs, in this specific case link_vehicle_speeds_car_average.geojson.

Comparing this output with the genet network used by the same sim, we can see that links with the same from/to node ids have different link ids.

For example, sim 2019_baseline_new_network_controller_10pc_20250108 according to the matsim config uses network v5_mad_prairie via:

<param name="inputNetworkFile" value="/efs/basildon/network/v5_mad_prairie/plus_wider_te/network.xml"/>

Link id 402018 in link_vehicle_speeds_car_average.geojson has the from and to nodes 5177106139033910521 and 5177106139066155527, whereas in v5_mad_prairie the link with the same from and to node ids has id 68892.

Upon finding this, I also remembered seeing this exact same issue flagged as a comment in Alex K's old BERTIE validation jupyter notebooks (some of which we still use and converted into scripts as part of our current BERTIE validation workflow), see: https://github.com/arup-group/te_post_processing/blob/6f141b0cdb94025df21c6174cf468b23ac8ff81f/benchmarking_2023_refresh/Dashboard/SERTM%20Benchmark%20-%20By%20Vehicle%20Type_v2.py#L189C5-L196C63

When I inherited the above I knew little about elara, and because the notebooks were so old I assumed the bug had been fixed by the time I took over validation for our V2 refresh of BERTIE in 2023. The current workaround in the benchmarking scripts is to join on the from, to node ids to a version of the network pre-elara. I have suggested we do the same for Basildon while we investigate this issue (elara seems to alter only link ids; node ids are unaffected, as the example above confirms).
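For reference, the node-pair workaround described above could look roughly like the sketch below. It is a minimal illustration, not the code from the benchmarking scripts: the function names and the dict-based link records are assumptions, and the only data used are the ids quoted in this issue.

```python
# Map elara link ids to reference-network link ids via (from, to) node pairs.
# Function names and record layout are hypothetical, for illustration only.

def build_node_pair_index(network_links):
    """Return {(from_node, to_node): [link_id, ...]} for a reference network."""
    index = {}
    for link in network_links:
        index.setdefault((link["from"], link["to"]), []).append(link["id"])
    return index


def map_link_ids(elara_links, network_links):
    """Return {elara_link_id: [reference_link_id, ...]}, joined on node pairs."""
    index = build_node_pair_index(network_links)
    return {
        link["id"]: index.get((link["from"], link["to"]), [])
        for link in elara_links
    }


# The single link quoted in this issue, as seen in each file.
elara_links = [
    {"id": "402018", "from": "5177106139033910521", "to": "5177106139066155527"},
]
reference_links = [
    {"id": "68892", "from": "5177106139033910521", "to": "5177106139066155527"},
]

mapping = map_link_ids(elara_links, reference_links)
print(mapping)  # {'402018': ['68892']}
```

Each elara id maps to a list because more than one link can share the same node pair; an empty list means the node pair was not found in the reference network.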

Image

Image

@D-Dulius added the bug (Something isn't working) label Feb 7, 2025
@divyasharma-arup
Contributor

Very quick question - we had talked about indexing the network data because the string information for link_ids makes the datasets very large. Are the link_ids in link_vehicle_speeds_car_average.geojson all integers, or do you see formats such as: 5177106139033910521_5177106139066155527? I'm just wondering if the elara file has been re-indexed for any reason.

@divyasharma-arup
Contributor

Reading through some of what is posted on slack, it seems like the feature of re-indexing elara outputs to reduce size (and improve run times) means the link_ids are no longer compatible with the original network file. I don't know if we have an "elara" version of the network file @syhwawa, that has the re-indexed link_ids?

@D-Dulius
Author

D-Dulius commented Feb 10, 2025

> Very quick question - we had talked about indexing the network data because the string information for link_ids makes the datasets very large. Are the link_ids in link_vehicle_speeds_car_average.geojson all integers, or do you see formats such as: 5177106139033910521_5177106139066155527? I'm just wondering if the elara file has been re-indexed for any reason.

@divyasharma-arup I can see some link ids with the long node-to-node format, i.e. 5177106139033910521_5177106139066155527, in link_vehicle_speeds_car_average.geojson. I didn't realise elara re-indexing link ids was actually a feature, not a bug! It has caused problems though, because I gave the elara output above to some of the team/Dan for their network review, and when those links were handed over to Neil the link ids didn't match the genet network.

Though if elara was re-indexing properly, that should imply there are no non-integer link ids in the elara outputs, is that right?

@D-Dulius
Author

I understand the reasons for reindexing link ids like you say above (run times) but would it not be better to just apply the reindexing at the genet/network creation stage? So we have a consistent network both pre/post elara?

@divyasharma-arup
Contributor

> Though if elara was re-indexing properly then that should imply there should be no link ids which are non-integers in the elara outputs, is that right?

Assuming this is the reason why the link_ids are different between the two files, yes, I'd expect link_vehicle_speeds_car_average.geojson to only have integer link_ids...and I agree, the re-indexing should be consistent.
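A quick way to test the "all integers" expectation would be to classify the link ids by format. The sketch below is only an illustration under assumptions: the category names and helper functions are hypothetical, and the sample ids are the ones quoted in this thread rather than a real file.

```python
import re
from collections import Counter

# Plain integer ids (e.g. 402018) versus node-pair ids (e.g. 123_456).
INTEGER_ID = re.compile(r"\d+")
NODE_PAIR_ID = re.compile(r"\d+_\d+")


def classify_link_id(link_id):
    """Return 'integer', 'node_pair' or 'other' for a single link id."""
    text = str(link_id)
    if INTEGER_ID.fullmatch(text):
        return "integer"
    if NODE_PAIR_ID.fullmatch(text):
        return "node_pair"
    return "other"


def summarise_link_ids(link_ids):
    """Count how many link ids fall into each format category."""
    return Counter(classify_link_id(link_id) for link_id in link_ids)


sample_ids = ["402018", "5177106139033910521_5177106139066155527", 68892]
summary = summarise_link_ids(sample_ids)
print(dict(summary))  # {'integer': 2, 'node_pair': 1}
```

If re-indexing had been applied consistently, the `node_pair` and `other` counts for an elara output would be expected to be zero.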

Just a note that elara's scope is MATSim outputs (I think). So things like synthesis inputs (population & network) will have their original information -- which we want to maintain for traceability of the process. Therefore, we should probably only be comparing elara outputs with each other. Have you ever worked with output_network.xml? I am hoping that the link_ids will be consistent here, as we definitely assume they are in a lot of our code...!

@D-Dulius
Author

D-Dulius commented Feb 10, 2025

I haven't worked with output_network.xml, no. We use link_vehicle_speeds_car_average.geojson specifically for benchmarking routed link-based journey times, and because I thought the network was consistent pre/post elara I gave the elara output to others in the Basildon team for network reviews. There is no way they (i.e. the strategic modellers who have been resourced on the project) would know how to use an xml for this purpose, so that is potentially something for us to think about.

(elara outputs are easier to use for non-CMLers with no programming background as they can just stick it in QGIS)

@D-Dulius
Author

I myself don't really have any experience using/parsing xml files hehe, can geopandas read in xml files?

@divyasharma-arup
Contributor

ersa object is useful for that: sample script.

Your point is noted that the output_network.xml is not easy to work with (which is why it's parsed in the above). Something for us to think about in regards to whether ersa should be the home for this code of parsing the network or whether it should be elara.
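On the question of reading the XML without ersa: as far as I know geopandas has no reader for the MATSim network format, but the file is plain XML with `<link id=... from=... to=...>` elements, so the standard library can pull the link attributes out. The sketch below is an assumption-laden illustration, not ersa's or elara's parser: `read_links` is a hypothetical helper, and the node coordinates in the sample are placeholders (only the link attributes come from this thread).

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a MATSim network file. Node x/y values are placeholders.
SAMPLE_NETWORK = """<network>
  <nodes>
    <node id="5177106139033910521" x="0.0" y="0.0"/>
    <node id="5177106139066155527" x="50.0" y="0.0"/>
  </nodes>
  <links>
    <link id="402018" from="5177106139033910521" to="5177106139066155527"
          length="52.67531441066485" modes="car,bus"/>
  </links>
</network>"""


def read_links(xml_source):
    """Return one dict of attributes per <link> element in the XML text."""
    root = ET.fromstring(xml_source)
    return [dict(link.attrib) for link in root.iter("link")]


links = read_links(SAMPLE_NETWORK)
print(links[0]["id"], links[0]["from"], links[0]["to"])
```

For a file on disk, `ET.parse(path)` (with `gzip.open(path)` for a `.xml.gz`) would take the place of `ET.fromstring`, and the resulting list of dicts can be passed straight to a DataFrame constructor.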

@divyasharma-arup
Contributor

you can do this with ersa:

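import os

# Scenario, Network and Table are ersa classes (their import line is not shown in this thread).


def create_scenario(path_scenario):
    """Load the MATSim output network and the vehicle link log for one sim."""
    s = Scenario(data={
        'network': Network(path=os.path.join(path_scenario, 'output_network.xml'), crs='27700'),
        'link_logs': Table(
            path=os.path.join(path_scenario, 'vehicle_link_log_all.csv'))
    })
    return s


# path_baseline is the directory holding the sim outputs
baseline = create_scenario(path_baseline)
baseline.network.link_lengths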

@D-Dulius
Author

> you can do this with ersa:
>
> def create_scenario(path_scenario):
>     s = Scenario(data={
>         'network': Network(path=os.path.join(path_scenario, 'output_network.xml'), crs='27700'),
>         'link_logs': Table(
>             path=os.path.join(path_scenario, 'vehicle_link_log_all.csv'))
>     })
>     return s
>
>
> baseline = create_scenario(path_baseline)
> baseline.network.link_lengths

Ah yes of course, forgot ersa loads in the output xml files, in that case I have indirectly worked with xml files via ersa lol, I will take a look at the xml for link ids which were not able to be matched last week

@syhwawa
Contributor

syhwawa commented Feb 10, 2025

> Reading through some of what is posted on slack, it seems like the feature of re-indexing elara outputs to reduce size (and improve run times) means the link_ids are no longer compatible with the original network file.

Could you remind me where the conversation/thread about this was, please? @divyasharma-arup.

TBH, I thought the link_id should be the same as in the network file, but it seems it isn't.

Using ersa to load the MATSim output network might be the easiest way, and it's worth examining the elara code to check where the reindexing happens.

@divyasharma-arup
Contributor

hi @D-Dulius,

I had a look into this, as I know it can drive issues when investigating potential network changes needed as part of validation/calibration. Here is what I found:

  1. I think Elara link_counts data is based on the MATSim output_network. I've written a quick check to confirm that these two files have matching from, to ids for each link_id. Nothing was returned as mismatching for the whole file. These link_ids appear to be a mix of integers and strings, so it isn't obvious whether any reindexing happened.
  2. I also checked your input network file for the specific from, to ids and I can confirm the link_id is different.
  3. Therefore, my theory is that network link ids change as part of the MATSim simulation process. I am not sure why.

Initial loading of data:

import geopandas as gp

# create_scenario is the ersa helper from the earlier comment
link_counts = gp.read_file('/mnt/efs/basildon/2019_baseline_new_network_controller_10pc_20250108/600/elara/link_vehicle_counts_car.geojson')
scenario = create_scenario('/mnt/efs/basildon/2019_baseline_new_network_controller_10pc_20250108/600')
network_links = scenario.network.links

Elara Link Counts
/mnt/efs/basildon/2019_baseline_new_network_controller_10pc_20250108/600/elara/link_vehicle_counts_car.geojson

link_id  from                 to
402018   5177106139033910521  5177106139066155527

MATSim Output Network
/mnt/efs/basildon/2019_baseline_new_network_controller_10pc_20250108/600/output_network.xml.gz

id      from                 to
402018  5177106139033910521  5177106139066155527

MATSim Input Network
/mnt/efs/basildon/network/v5_mad_prairie/plus_wider_te/network.xml

<link id="68892" from="5177106139033910521" to="5177106139066155527">

Therefore, @syhwawa, is there any way to confirm that there is a process where MATSim changes input link ids when running sims/writing outputs? I'm not sure if this might be part of Network Synthesis or Simulation. Is there any place that stores the "original" link id and the resulting "MATSim" link id?

Without that, we may have to use Alex's process of matching from, to ids instead of link ids to find the "original" network link ids. We may also need model reviewers to reference from, to ids rather than only the link id, so we can be confident we're talking about the same links.
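The quick check mentioned in point 1 was not posted in the thread. A sketch of what such a check might look like is below; it assumes the GeoJSON features carry `link_id`, `from` and `to` properties (inferred from the column names shown above), and `find_mismatches` is a hypothetical helper rather than the code that was actually run.

```python
import json


def find_mismatches(geojson_text, network_node_pairs):
    """Return {link_id: {"elara": pair, "network": pair_or_None}} for every
    feature whose (from, to) pair differs from, or is missing in, the network."""
    mismatches = {}
    for feature in json.loads(geojson_text)["features"]:
        props = feature["properties"]
        link_id = str(props["link_id"])
        elara_pair = (str(props["from"]), str(props["to"]))
        network_pair = network_node_pairs.get(link_id)
        if network_pair != elara_pair:
            mismatches[link_id] = {"elara": elara_pair, "network": network_pair}
    return mismatches


# One-feature stand-in for link_vehicle_counts_car.geojson (geometry omitted).
elara_geojson = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "geometry": None,
        "properties": {
            "link_id": "402018",
            "from": "5177106139033910521",
            "to": "5177106139066155527",
        },
    }],
})

# {link_id: (from, to)} as shown above for output_network.xml.gz.
output_network = {"402018": ("5177106139033910521", "5177106139066155527")}

print(find_mismatches(elara_geojson, output_network))  # {}
```

An empty result corresponds to "nothing returned as mismatching"; a link id that is absent from the network is reported with `"network": None`.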

@divyasharma-arup
Contributor

Also just a note that I've checked different sims that use the same network and it seems the elara output networks match (visually spot checked by plotting specific link_ids). So, if there is an alteration process from input_network -> output_network, it appears to be consistent between different simulation outputs.

@D-Dulius
Author

D-Dulius commented Mar 19, 2025

So in summary, MATSim is altering link ids during simulation, not elara? @divyasharma-arup

@D-Dulius
Author

Shall we rename the issue and move it somewhere else then? (currently opened as an elara issue)

@neilmt

neilmt commented Mar 19, 2025

The link_id values that Gerry was quoting from Tramola for Basildon don't match the link_id values in either the network standard outputs or the elara outputs. E.g. as @anarcoteron noted, what Tramola listed as 556837 is 457563 in both the standard outputs and elara outputs.

@divyasharma-arup
Contributor

Actually sorry @D-Dulius , I think for this issue we were just looking at the wrong input network file.

This is the original network file, which doesn't have "wider te".

/mnt/efs/basildon/network/v5_mad_prairie/network.xml

<link id="68892" from="5177106139033910521" to="5177106139066155527" freespeed="12.5" capacity="600" permlanes="1" oneway="1" modes="bus,bike,car,walk" length="52.67531441066485">

This is the one that was ultimately used by the sim.
/mnt/efs/basildon/network/v5_mad_prairie/plus_wider_te/network.xml

<link id="68892" from="5177175543941106057" to="5177175544033489349" freespeed="8.34" capacity="600.0" permlanes="1.0" oneway="1" modes="car" length="39.63713396805407">

<link id="402018" from="5177106139033910521" to="5177106139066155527" freespeed="12.5" capacity="600.0" permlanes="1.0" oneway="1" modes="car,bus" length="52.67531441066485">

@D-Dulius
Author

You've lost me hehe, I refer to the wider te network in the issue description when comparing against elara

@divyasharma-arup
Contributor

Yeah but when I look at the data I think I'm seeing something different from what you originally said.

@D-Dulius
Author

So there is no mismatch between elara and the output matsim network after all?

@divyasharma-arup
Contributor

I think that's my conclusion, we were just referring to the wrong network file.
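The mix-up above (the same link id pointing at different node pairs in two versions of the network) is the kind of thing a small diff between network files can surface early. The sketch below is a minimal illustration under assumptions: `diff_networks` is a hypothetical helper working on `{link_id: (from, to)}` mappings, and the data are just the three `<link>` entries quoted above.

```python
def diff_networks(base, sim):
    """Compare two {link_id: (from_node, to_node)} mappings.

    Returns (reused_ids, renumbered):
      reused_ids - ids present in both networks but with different node pairs
      renumbered - {sim_id: base_id} for node pairs present in both networks
                   under different ids
    """
    reused_ids = sorted(
        link_id for link_id in base.keys() & sim.keys()
        if base[link_id] != sim[link_id]
    )
    # Assumes at most one link per node pair; with duplicates the last id wins.
    base_id_by_pair = {pair: link_id for link_id, pair in base.items()}
    renumbered = {
        sim_id: base_id_by_pair[pair]
        for sim_id, pair in sim.items()
        if pair in base_id_by_pair and base_id_by_pair[pair] != sim_id
    }
    return reused_ids, renumbered


# v5_mad_prairie/network.xml (link quoted above)
base_network = {
    "68892": ("5177106139033910521", "5177106139066155527"),
}
# v5_mad_prairie/plus_wider_te/network.xml (links quoted above)
sim_network = {
    "68892": ("5177175543941106057", "5177175544033489349"),
    "402018": ("5177106139033910521", "5177106139066155527"),
}

reused_ids, renumbered = diff_networks(base_network, sim_network)
print(reused_ids)   # ['68892']
print(renumbered)   # {'402018': '68892'}
```

Here id 68892 is reused for a different link in the wider TE network, and the link originally numbered 68892 appears as 402018, which matches what the elara output and output network show.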

@D-Dulius
Author

D-Dulius commented Mar 19, 2025

@neilmt When you flagged these mismatches do you remember which network file you used?

@D-Dulius
Author

I'm going through the old thread but now can't find the specific file path

@D-Dulius
Author

Image

@D-Dulius
Author

Image

@divyasharma-arup
Contributor

Daumantas, I think if you can recreate your problem then we can have a look. Otherwise perhaps we should close this issue.

@D-Dulius
Author

D-Dulius commented Mar 19, 2025

This was something flagged by @neilmt, and I can't be certain which network he used to initially compare against the elara output I gave him. When I created this issue I assumed it was the same as the one defined in the matsim config above, which was not the case. I'm closing this issue given the numbers you've exported above, @divyasharma-arup, which show no mismatch.

@D-Dulius
Author

D-Dulius commented Mar 19, 2025

To avoid this problem in the future we should avoid using the standard genet output network_links.parquet as it looks like it does not include the wider TE network whereas elara does if I understand correctly.

@D-Dulius
Author

This should've been the red flag, lol we noticed this earlier! 😬

Image

@D-Dulius
Author

There is something I am still not quite understanding though, if the sim ends up using the wider te network then what are we using the network without the wider te bit for?

@neilmt

neilmt commented Mar 19, 2025

> To avoid this problem in the future we should avoid using the standard genet output network_links.parquet as it looks like it does not include the wider TE network whereas elara does if I understand correctly.

I don't think this is the case. There still are discrepancies between elara output link_id values and the standard output link_id values - and the latter values are what should be used for specifying network edits with genet. From memory the discrepancies were often to do with the active mode network - I'll have a look at the discrepancies once I've ticked off the tasks I'm currently looking at.

@D-Dulius
Author

D-Dulius commented Mar 19, 2025

@neilmt What file specifically are you referring to when you say standard output? @divyasharma-arup saw zero link id mismatches comparing elara output networks with the matsim output network file /mnt/efs/basildon/network/v5_mad_prairie/plus_wider_te/network.xml
