The Two Meanings of "Collection" in Arclight
EAD uses @level="collection" as a descriptive attribute indicating the level of description (collection, series, subseries, file, item, etc.). It has no structural meaning — any component at any position in the hierarchy can carry it.
Arclight treats "collection" as structural, synonymous with "the top-level document of an EAD." We should consider making this distinction clearer. Doing so would address:
Proposed fix
Add a method to cover the structural case:
def root_document?
  component_level == 0
end
Either replace all structural uses of SolrDocument#collection? with this new method or alias collection? to the new method.
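To make the distinction concrete, here is a minimal sketch of the two checks side by side. `SketchDocument` and `level_collection?` are hypothetical names used only for illustration; this is a plain-Ruby stand-in, not Arclight's actual SolrDocument, and it assumes root documents are indexed with component_level_isim of 0 as the proposal above implies.

```ruby
# Hypothetical stand-in for SolrDocument, backed by a plain hash of Solr fields.
class SketchDocument
  def initialize(fields)
    @fields = fields
  end

  # Descriptive level as indexed (assumed to come from level_ssm).
  def level
    Array(@fields['level_ssm']).first
  end

  # Depth in the hierarchy (assumed to come from component_level_isim).
  def component_level
    Array(@fields['component_level_isim']).first
  end

  # Structural check: true only for the top-level document of an EAD.
  def root_document?
    component_level == 0
  end

  # Descriptive check (hypothetical name): true when @level is "collection".
  def level_collection?
    level.to_s.downcase == 'collection'
  end
end

# A root described as a fonds, and a nested component described as a collection.
root   = SketchDocument.new('level_ssm' => ['fonds'], 'component_level_isim' => [0])
nested = SketchDocument.new('level_ssm' => ['collection'], 'component_level_isim' => [2])

puts root.root_document?      # structural: root
puts root.level_collection?   # descriptive: not a collection
puts nested.root_document?    # structural: not root
puts nested.level_collection? # descriptive: collection
```

With the two questions separated, a root whose archdesc is a fonds and a nested component described as a collection can each be answered correctly without one check standing in for the other.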
Removing the hardcoded level_ssm: 'collection' lets level_ssm carry the actual @level value for root documents, consistent with how components are indexed.
The one thing to verify before removing the hardcoded value: any place that currently relies on level_ssm: 'collection' to mean "this is a root document" needs to be audited and switched to the root-based check. The breadcrumb partitioner, access_component.rb, the route filter, and within_collection are the known callers.
routes.rb:
get 'collections' => 'catalog#index', defaults: { f: { component_level_isim: [0] } }
catalog_controller.rb:
Note: this could just be leftover cruft to remove, see #1642
config.add_search_field 'within_collection' do |field|
  field.solr_parameters = { fq: '-component_level_isim:0' }
end
Analysis (with some help from Claude Sonnet 4.6)
Indexer forcibly assigns level_ssm: 'collection' to all top-level documents
ead2_config.rb (the indexer for the archdesc/root document) hardcodes:
to_field 'level_ssm' do |_record, accumulator|
  accumulator << 'collection'
end
Every EAD root document gets level_ssm: collection regardless of its actual @level attribute. The real value is preserved only in level_ssim (the facetable field), and even there a synthetic 'Collection' entry is appended unless the actual level was already collection. So a @level="fonds" document ends up with both "Fonds" and "Collection" in level_ssim.
What breaks: An EAD whose archdesc carries @level="fonds", @level="recordgrp", or any non-collection level will be incorrectly presented and faceted as a Collection.
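The effect can be modelled in a few lines of plain Ruby. This is a simplified sketch of the behavior described above, not the real Traject configuration: `root_level_fields` is a hypothetical helper, and `capitalize` stands in for whatever label normalization the indexer actually applies.

```ruby
# Simplified model of how the root indexer is described as filling the level fields.
# root_level_fields is a hypothetical helper, not part of Arclight.
def root_level_fields(ead_level)
  # Assumption: a plain capitalize approximates the real label normalization.
  actual = ead_level.to_s.capitalize
  facet = [actual]
  # The synthetic 'Collection' entry is appended unless it is already present.
  facet << 'Collection' unless actual == 'Collection'
  # level_ssm ignores the real level entirely.
  { 'level_ssm' => ['collection'], 'level_ssim' => facet }
end

p root_level_fields('fonds')
p root_level_fields('collection')
```

In this model the fonds root is indistinguishable from a true collection in level_ssm, and it facets under both values in level_ssim.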
SolrDocument#collection? checks level_ssm, which is always 'collection' for the root
def collection?
  level&.parameterize == 'collection'
end
level reads from level_ssm. Because every root document has level_ssm: collection, collection? is really asking "is this the root document?", not "does this EAD component have @level='collection'?". The method name suggests EAD semantics but implements structural semantics.
What breaks when a non-root component has @level="collection": A nested component indexed by ead2_component_config.rb gets its real level in level_ssm (not hardcoded to 'collection'). So a nested <c @level="collection"> will have level_ssm: collection, making collection? return true for it. But it is not a root document and does not have the fields the rest of the code expects a root to have (_root_ pointing to itself, normalized_title_ssm used as the collection name for sub-components, etc.).
Breadcrumb component partitions by collection? assuming one collection at the top
breadcrumbs_hierarchy_component.rb:
collections, @parents_under_collection = document.parents.partition(&:collection?)
@collection = collections.first
This walks the parent chain and partitions into "collections" vs. "everything else under a collection." It assumes exactly one collection exists in the ancestry and that it sits at the top. With nested @level="collection" components, you get multiple hits from partition, and collections.first picks the wrong one.
arclight/parent.rb:
def collection?
  level == 'collection'
end
This checks the level attribute of the Parent struct, which is populated from parent_levels_ssm. Those levels come from ead2_component_config.rb:
to_field 'parent_levels_ssm' do |_record, accumulator, _context|
  accumulator.concat settings[:parent].output_hash['parent_levels_ssm'] || []
  accumulator.concat settings[:parent].output_hash['level_ssm'] || []
end
The root document's level_ssm is 'collection' (hardcoded), so the breadcrumb logic does work for normal cases. But when there is a nested collection component, level_ssm for that component will also be 'collection', and the breadcrumb partitioner will treat it as a collection boundary — potentially creating two "collection" entries.
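The partition behavior can be reproduced outside Arclight with a small sketch. The `Parent` struct below only mirrors the level check shown above, and the ids and levels are invented for illustration; the real struct and parent chain carry more data.

```ruby
# Minimal stand-in for Arclight::Parent with the same level-based check.
Parent = Struct.new(:id, :level) do
  def collection?
    level == 'collection'
  end
end

# Invented ancestry for a deeply nested item: the root, a series,
# a nested component with @level="collection", and a file.
parents = [
  Parent.new('abc123', 'collection'),
  Parent.new('abc123_ref1', 'series'),
  Parent.new('abc123_ref2', 'collection'),
  Parent.new('abc123_ref3', 'file')
]

# Same partition as the breadcrumb component.
collections, parents_under_collection = parents.partition(&:collection?)

p collections.map(&:id)              # two hits instead of the expected one
p parents_under_collection.map(&:id) # the nested collection is missing from the trail
```

Whichever entry the component then takes as the collection, the other hit is silently dropped from the list of parents under it, so the rendered trail no longer matches the real hierarchy. A depth-based check would leave exactly one entry on the collection side.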
The collections route and within_collection search filter use level_ssim:Collection
routes.rb:
get 'collections' => 'catalog#index', defaults: { f: { level: ['Collection'] } }
catalog_controller.rb:
config.add_search_field 'within_collection' do |field|
  field.solr_parameters = { fq: '-level_ssim:Collection' }
end
Because 'Collection' is synthetically appended to level_ssim for all root documents, these filters work for the common case. But they inadvertently include any component with @level="collection" in the "Collections" listing, and they exclude those same nested components from "within collection" searches.
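As a rough illustration of the difference between filtering on level and filtering on depth, the sketch below applies both selections to three in-memory documents. The documents and ids are invented, and the plain-Ruby `select` calls only approximate what the Solr filter queries do.

```ruby
# Invented documents: a fonds root, a series, and a nested component
# whose @level is "collection".
docs = [
  { 'id' => 'abc123', 'component_level_isim' => [0], 'level_ssim' => %w[Fonds Collection] },
  { 'id' => 'abc123_ref1', 'component_level_isim' => [1], 'level_ssim' => ['Series'] },
  { 'id' => 'abc123_ref2', 'component_level_isim' => [2], 'level_ssim' => ['Collection'] }
]

# Approximates the current route filter: level_ssim:Collection
by_level = docs.select { |d| d['level_ssim'].include?('Collection') }.map { |d| d['id'] }

# Approximates a depth-based filter: component_level_isim:0
by_depth = docs.select { |d| d['component_level_isim'].include?(0) }.map { |d| d['id'] }

p by_level # includes the nested component as well as the root
p by_depth # root only
```

The negated forms used by within_collection behave the same way in reverse: the level-based filter drops the nested component from component searches, while the depth-based one keeps it.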
Issue #1533 (nested collections crashing): the normalized_id failure
The crash on arclight/normalized_id.rb raising IDNotFound happens because a nested <c @level="collection"> in the EAD gets indexed as a component with id built as root_id_ref_id. If the nested collection component itself has sub-components, those sub-components' parent chain includes the nested collection's id. When the breadcrumb/hierarchy code tries to reconstruct that parent chain and resolve the collection document via the _root_ subquery, the nested collection's document record may not have the expected root pointing to itself (it points to the actual root). The cascade of assumptions about what a "collection" document looks like breaks down.