When an entity is loaded from 2 sources, entity data from the 1st source is lost #289

Open
enriquepablo opened this issue Jan 15, 2025 · 5 comments

Comments

@enriquepablo
Contributor

For filtering with trust info, we need to add a few attributes to the discojson format: registrationAuthority and the entity attributes entity-category, entity-category-support and assurance-certification for IdPs, and DiscoveryResponses for SPs.

When the load pipe loads several sources, it accumulates all entities in a single dictionary keyed by entityID, here. This means that when an entity appears in more than one source, only the data from the last source loaded is kept. There is a comment there saying "TODO: merge", but what we have at that point are EntityDescriptor XML elements, which, for example, can carry at most one RegistrationInfo element.
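
The overwrite can be illustrated with a minimal sketch in plain Python. It does not use pyFF's actual store: the `load_sources` helper and the dict-based entity records are hypothetical stand-ins for the EntityDescriptor elements.

```python
# Hypothetical stand-in for the load pipe's accumulation step; not pyFF code.
def load_sources(sources):
    """Accumulate entities from several sources in one dict keyed by entityID."""
    entities = {}
    for source in sources:
        for entity in source:
            # A later source silently replaces an earlier entry with the same entityID.
            entities[entity["entityID"]] = entity
    return entities


ENTITY_ID = "https://idp.example.com/saml2/idp/metadata.php"
source_1 = [{"entityID": ENTITY_ID, "registrationAuthority": "http://www.swamid.se/"}]
source_2 = [{"entityID": ENTITY_ID, "registrationAuthority": "https://www.carsi.edu.cn"}]

merged = load_sources([source_1, source_2])
print(merged[ENTITY_ID]["registrationAuthority"])  # https://www.carsi.edu.cn
```

Only the second source's registrationAuthority survives, which matches the behaviour described in this report.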

Code Version

master

Expected Behavior

We would want to keep all the data in each entity until it is used by discojson.

Current Behavior

Data that differs across sources is lost.

Possible Solution

One possibility would be to parse the entities, e.g. around the line of code referenced above, and keep the information that would otherwise be lost in a new dictionary attached to the store, which could then be accessed in the discojson pipe.
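
A rough sketch of that idea, using only the standard library rather than pyFF's own parsing. The `registration_authorities` dictionary and the `remember_registration_info` and `make_entity` helpers are hypothetical names for illustration; how such a dictionary would be attached to the store is left open.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

MD = "urn:oasis:names:tc:SAML:2.0:metadata"
MDRPI = "urn:oasis:names:tc:SAML:metadata:rpi"

# Hypothetical side dictionary: entityID -> registrationAuthority values seen so far.
registration_authorities = defaultdict(list)


def remember_registration_info(xml_text):
    """Record every registrationAuthority found in one EntityDescriptor."""
    root = ET.fromstring(xml_text)
    entity_id = root.get("entityID")
    # iter() finds RegistrationInfo wherever it sits below the EntityDescriptor.
    for info in root.iter(f"{{{MDRPI}}}RegistrationInfo"):
        authority = info.get("registrationAuthority")
        if authority and authority not in registration_authorities[entity_id]:
            registration_authorities[entity_id].append(authority)
    return entity_id


def make_entity(authority):
    """Build a trimmed-down EntityDescriptor for the example."""
    return (
        f'<md:EntityDescriptor xmlns:md="{MD}" xmlns:mdrpi="{MDRPI}" '
        'entityID="https://idp.example.com/saml2/idp/metadata.php">'
        "<md:Extensions>"
        f'<mdrpi:RegistrationInfo registrationAuthority="{authority}"/>'
        "</md:Extensions>"
        "</md:EntityDescriptor>"
    )


for authority in ("http://www.swamid.se/", "https://www.carsi.edu.cn"):
    entity_id = remember_registration_info(make_entity(authority))

print(registration_authorities[entity_id])
```

Both registration authorities remain available under the shared entityID, even if the entity element itself is later overwritten.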

Steps to Reproduce

  1. Load an entity from 2 sources, with a different registrationAuthority in each case
  2. Try to access both registrationAuthorities in the discojson pipe
@enriquepablo
Contributor Author

Adding a test pipeline to reproduce the issue. Put the 3 files in a directory, adjust the paths in test.yaml, and execute pyff test.yaml. Note that the select pipe in the test has dedup set to False, and we obtain 2 identical copies of the entity JSON; if dedup is set to True (the default), you obtain just a single copy.

Both XML files are identical except for the RegistrationInfo. The RegistrationInfo from the 1st source is lost.

GitHub does not allow me to attach YAML or XML files, so I'll paste them below.

test.yaml

- load:
  - file:///path/to/test/directory/test-idp-1.xml
  - file:///path/to/test/directory/test-idp-2.xml
- select dedup False
- discojson
- publish:
    output: "./test.json"
    raw: true
    update_store: false

test-idp-1.xml

<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata" xmlns:shibmd="urn:mace:shibboleth:metadata:1.0" xmlns:mdrpi="urn:oasis:names:tc:SAML:metadata:rpi"
                     entityID="https://idp.example.com/saml2/idp/metadata.php">
  <md:IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <md:Extensions>
      <mdrpi:RegistrationInfo registrationAuthority="http://www.swamid.se/" registrationInstant="2015-02-11T11:09:51Z">
        <mdrpi:RegistrationPolicy xml:lang="en">http://swamid.se/policy/mdrps</mdrpi:RegistrationPolicy>
      </mdrpi:RegistrationInfo>
      <shibmd:Scope regexp="false">example.com</shibmd:Scope>
      <mdui:UIInfo xmlns:mdui="urn:oasis:names:tc:SAML:metadata:ui">
        <mdui:DisplayName xml:lang="sv">Example universitet</mdui:DisplayName>
        <mdui:DisplayName xml:lang="en">Example University</mdui:DisplayName>
        <mdui:Description xml:lang="sv">Identity Provider för Example universitet</mdui:Description>
        <mdui:Description xml:lang="en">Identity Provider for Example University</mdui:Description>
        <mdui:InformationURL xml:lang="sv">http://www.example.com/</mdui:InformationURL>
        <mdui:InformationURL xml:lang="en">http://www.example.com/english/</mdui:InformationURL>
        <mdui:Logo height="63" width="358">https://www.example.com/static/images/umu_logo.jpg</mdui:Logo>
        <mdui:Logo xml:lang="sv" height="63" width="358">https://www.example.com/static/images/logo.jpg</mdui:Logo>
        <mdui:Logo xml:lang="en" height="63" width="350">https://www.example.com/static/images/logo_eng.jpg</mdui:Logo>
        <mdui:Keywords xml:lang="sv">exempel</mdui:Keywords>
        <mdui:Keywords xml:lang="en">example</mdui:Keywords>
      </mdui:UIInfo>
      <mdui:DiscoHints xmlns:mdui="urn:oasis:names:tc:SAML:metadata:ui">
        <mdui:DomainHint>example.com</mdui:DomainHint>
        <mdui:DomainHint>example.net</mdui:DomainHint>
        <mdui:IPHint>10.0.0.0/8</mdui:IPHint>
      </mdui:DiscoHints>
    </md:Extensions>
    <md:ArtifactResolutionService Binding="urn:oasis:names:tc:SAML:2.0:bindings:SOAP" Location="https://idp.example.com/saml2/idp/ArtifactResolutionService.php" index="0"/>
    <md:SingleLogoutService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect" Location="https://idp.example.com/saml2/idp/SingleLogoutService.php"/>
    <md:NameIDFormat>urn:oasis:names:tc:SAML:2.0:nameid-format:transient</md:NameIDFormat>
    <md:SingleSignOnService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect" Location="https://idp.example.com/saml2/idp/SSOService.php"/>
  </md:IDPSSODescriptor>
  <md:Organization>
    <md:OrganizationName xml:lang="sv">ExempelU</md:OrganizationName>
    <md:OrganizationName xml:lang="en">ExampleU</md:OrganizationName>
    <md:OrganizationDisplayName xml:lang="sv">Exempel Universitetet</md:OrganizationDisplayName>
    <md:OrganizationDisplayName xml:lang="en">The Example University</md:OrganizationDisplayName>
    <md:OrganizationURL xml:lang="sv">http://www.example.com</md:OrganizationURL>
    <md:OrganizationURL xml:lang="en">http://www.example.com/english</md:OrganizationURL>
  </md:Organization>
  <md:ContactPerson contactType="administrative">
    <md:Company>Example University</md:Company>
    <md:SurName>Example helpdesk</md:SurName>
    <md:EmailAddress>[email protected]</md:EmailAddress>
  </md:ContactPerson>
  <md:ContactPerson contactType="technical">
    <md:Company>Example University</md:Company>
    <md:SurName>Example helpdesk</md:SurName>
    <md:EmailAddress>[email protected]</md:EmailAddress>
  </md:ContactPerson>
  <md:ContactPerson contactType="support">
    <md:Company>Example University</md:Company>
    <md:SurName>Servicedesk Example universitet</md:SurName>
    <md:EmailAddress>[email protected]</md:EmailAddress>
  </md:ContactPerson>
</md:EntityDescriptor>

test-idp-2.xml

<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata" xmlns:shibmd="urn:mace:shibboleth:metadata:1.0" xmlns:mdrpi="urn:oasis:names:tc:SAML:metadata:rpi"
                     entityID="https://idp.example.com/saml2/idp/metadata.php">
  <md:IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <md:Extensions>
      <mdrpi:RegistrationInfo registrationAuthority="https://www.carsi.edu.cn" registrationInstant="2020-03-27T09:48:13Z">
        <mdrpi:RegistrationPolicy xml:lang="zh">https://www.carsi.edu.cn/index_zh.htm</mdrpi:RegistrationPolicy>
      </mdrpi:RegistrationInfo>
      <shibmd:Scope regexp="false">example.com</shibmd:Scope>
      <mdui:UIInfo xmlns:mdui="urn:oasis:names:tc:SAML:metadata:ui">
        <mdui:DisplayName xml:lang="sv">Example universitet</mdui:DisplayName>
        <mdui:DisplayName xml:lang="en">Example University</mdui:DisplayName>
        <mdui:Description xml:lang="sv">Identity Provider för Example universitet</mdui:Description>
        <mdui:Description xml:lang="en">Identity Provider for Example University</mdui:Description>
        <mdui:InformationURL xml:lang="sv">http://www.example.com/</mdui:InformationURL>
        <mdui:InformationURL xml:lang="en">http://www.example.com/english/</mdui:InformationURL>
        <mdui:Logo height="63" width="358">https://www.example.com/static/images/umu_logo.jpg</mdui:Logo>
        <mdui:Logo xml:lang="sv" height="63" width="358">https://www.example.com/static/images/logo.jpg</mdui:Logo>
        <mdui:Logo xml:lang="en" height="63" width="350">https://www.example.com/static/images/logo_eng.jpg</mdui:Logo>
        <mdui:Keywords xml:lang="sv">exempel</mdui:Keywords>
        <mdui:Keywords xml:lang="en">example</mdui:Keywords>
      </mdui:UIInfo>
      <mdui:DiscoHints xmlns:mdui="urn:oasis:names:tc:SAML:metadata:ui">
        <mdui:DomainHint>example.com</mdui:DomainHint>
        <mdui:DomainHint>example.net</mdui:DomainHint>
        <mdui:IPHint>10.0.0.0/8</mdui:IPHint>
      </mdui:DiscoHints>
    </md:Extensions>
    <md:ArtifactResolutionService Binding="urn:oasis:names:tc:SAML:2.0:bindings:SOAP" Location="https://idp.example.com/saml2/idp/ArtifactResolutionService.php" index="0"/>
    <md:SingleLogoutService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect" Location="https://idp.example.com/saml2/idp/SingleLogoutService.php"/>
    <md:NameIDFormat>urn:oasis:names:tc:SAML:2.0:nameid-format:transient</md:NameIDFormat>
    <md:SingleSignOnService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect" Location="https://idp.example.com/saml2/idp/SSOService.php"/>
  </md:IDPSSODescriptor>
  <md:Organization>
    <md:OrganizationName xml:lang="sv">ExempelU</md:OrganizationName>
    <md:OrganizationName xml:lang="en">ExampleU</md:OrganizationName>
    <md:OrganizationDisplayName xml:lang="sv">Exempel Universitetet</md:OrganizationDisplayName>
    <md:OrganizationDisplayName xml:lang="en">The Example University</md:OrganizationDisplayName>
    <md:OrganizationURL xml:lang="sv">http://www.example.com</md:OrganizationURL>
    <md:OrganizationURL xml:lang="en">http://www.example.com/english</md:OrganizationURL>
  </md:Organization>
  <md:ContactPerson contactType="administrative">
    <md:Company>Example University</md:Company>
    <md:SurName>Example helpdesk</md:SurName>
    <md:EmailAddress>[email protected]</md:EmailAddress>
  </md:ContactPerson>
  <md:ContactPerson contactType="technical">
    <md:Company>Example University</md:Company>
    <md:SurName>Example helpdesk</md:SurName>
    <md:EmailAddress>[email protected]</md:EmailAddress>
  </md:ContactPerson>
  <md:ContactPerson contactType="support">
    <md:Company>Example University</md:Company>
    <md:SurName>Servicedesk Example universitet</md:SurName>
    <md:EmailAddress>[email protected]</md:EmailAddress>
  </md:ContactPerson>
</md:EntityDescriptor>

@enriquepablo
Contributor Author

enriquepablo commented Feb 3, 2025

Detailed solution

The main requirements are:

  • By default, pyFF's behaviour doesn't change
  • We add options to the load and select pipes to not deduplicate entities that come from different sources and have the same entityID.

Code interventions

  • In the load pipe, entities from all sources are accumulated in a Store.entities dict keyed by entityID, here. With this proposal, if load receives a no-dedup option, it would key the entities by both entityID and md_source.
  • In the select pipe we would use this PR.
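
The proposed keying for the load pipe could look roughly like the sketch below. The `store_key` and `accumulate` functions and the `dedup` parameter name are assumptions made for illustration, not pyFF's actual API.

```python
# Hypothetical sketch of the proposed keying; names do not come from pyFF.
def store_key(entity_id, md_source, dedup=True):
    """Return the dict key for an entity; with dedup off, the source is part of the key."""
    return entity_id if dedup else (entity_id, md_source)


def accumulate(loaded, dedup=True):
    """loaded: iterable of (md_source, entity_id, entity) tuples, in load order."""
    entities = {}
    for md_source, entity_id, entity in loaded:
        entities[store_key(entity_id, md_source, dedup)] = entity
    return entities


ENTITY_ID = "https://idp.example.com/saml2/idp/metadata.php"
loaded = [
    ("test-idp-1.xml", ENTITY_ID, {"registrationAuthority": "http://www.swamid.se/"}),
    ("test-idp-2.xml", ENTITY_ID, {"registrationAuthority": "https://www.carsi.edu.cn"}),
]

default_store = accumulate(loaded)                # current behaviour: one entry
no_dedup_store = accumulate(loaded, dedup=False)  # proposed option: one entry per source
print(len(default_store), len(no_dedup_store))    # 1 2
```

With the default, behaviour is unchanged; with dedup turned off, each source keeps its own copy of the entity.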

Intended Usage

The objective of this proposal is to allow the MDQ service to pre-filter results according to certain entity attributes. In principle, these are: entity-category, entity-category-support, assurance-certification, and registrationAuthority.
So the MDQ server, when it loads the entity metadata, will look for entities with the same entityID, merge their values for the above attributes, and use the merged result to index a single copy of the entity. These attributes are not provided to the frontend, so the entity data sent to the frontend will not change.
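
As a sketch of that merge step, written in Python for illustration only: the MDQ server's real code is not shown in this issue, and the record layout, the `merge_duplicates` function, and the choice to keep the first copy of the entity are all assumptions.

```python
# Illustrative only; field layout and function name are assumptions.
FILTER_ATTRIBUTES = (
    "entity-category",
    "entity-category-support",
    "assurance-certification",
    "registrationAuthority",
)


def merge_duplicates(entities):
    """Collapse entries sharing an entityID, taking the union of the filter attributes."""
    merged = {}
    for entity in entities:
        entity_id = entity["entityID"]
        if entity_id not in merged:
            # Keeping the first copy is an arbitrary choice made for this sketch.
            merged[entity_id] = {
                "entity": entity,
                "index": {attr: [] for attr in FILTER_ATTRIBUTES},
            }
        index = merged[entity_id]["index"]
        for attr in FILTER_ATTRIBUTES:
            for value in entity.get(attr, []):
                if value not in index[attr]:
                    index[attr].append(value)
    return merged


ENTITY_ID = "https://idp.example.com/saml2/idp/metadata.php"
entities = [
    {"entityID": ENTITY_ID, "registrationAuthority": ["http://www.swamid.se/"]},
    {"entityID": ENTITY_ID, "registrationAuthority": ["https://www.carsi.edu.cn"]},
]

result = merge_duplicates(entities)
print(result[ENTITY_ID]["index"]["registrationAuthority"])
```

The index holds the union of values and is used only for filtering, while a single copy of the entity is what would be served.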

Problems

  • The same entityID in 2 different federations may stand for different entities. This problem already exists and is not introduced by this change, so I'd consider it orthogonal to this question.
  • We are merging entity metadata that may differ due to policy. Again, I think this problem is orthogonal to this solution. As things stand right now, assuming for example that OpenAthens is sourced before SWAMID, a user of an IdP registered in both federations who wants to access an OpenAthens SP will receive the IdP entity metadata that was registered with SWAMID. Even more: we are already able to filter by metadata source. So if an SP chooses to pre-filter results by md_source=OpenAthens, it will receive the metadata registered with SWAMID for all entities registered in both.

@enriquepablo
Contributor Author

To insist on the above: SeamlessAccess is not SAML. To start with, there is a mismatch in the uniqueness of entityIDs: in SAML, they are unique per federation, but SeamlessAccess would like them to be universally unique.
So SeamlessAccess is a service on top of SAML that needs to deal with this mismatch in whatever way best allows it to provide the intended service.

If pyFF is allowed to produce a JSON list of entities with duplicates for entities registered in more than one source, it won't be doing anything wrong in the SAML sense - there will be no merging of data from different sources.

Then it is thiss-mdq that will need to deduplicate the IdP list received from pyFF. But thiss-mdq will be serving SeamlessAccess, so it does not need to be (and cannot be) fully SAML compliant. At this point it needs to be useful rather than compliant.

@mikaelfrykholm
Contributor

I have a hard time following the changes needed for this. Can you write this up as a pull request?
I have a feeling it might take pretty substantial changes to get this to work.

@enriquepablo
Contributor Author

This is a POC for this
