
Arclight conflates structural and descriptive meanings of "collection" #1641

@corylown


The Two Meanings of "Collection" in Arclight

EAD uses @level="collection" as a descriptive attribute indicating the level of description (collection, series, subseries, file, item, etc.). It has no structural meaning — any component at any position in the hierarchy can carry it.

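Arclight treats "collection" as structural, synonymous with "the top-level document of an EAD." We should consider making this distinction clearer. Doing so would address the problems described in the Analysis below, including the nested-collection crash reported in #1533.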

Proposed fix

Add a method to cover the structural case:

def root_document?
  component_level == 0
end

Either replace all structural uses of SolrDocument#collection? with this new method or alias collection? to the new method.
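A minimal sketch of how the structural and descriptive checks could sit side by side, using the alias option. SketchDocument is a hypothetical stand-in for SolrDocument that wraps a plain hash, level_collection? is a hypothetical name for the descriptive check, and the sketch assumes component_level_isim holds 0 for the root document. downcase stands in for ActiveSupport's parameterize.

```ruby
# Hypothetical stand-in for SolrDocument: wraps a plain Solr field hash.
class SketchDocument
  def initialize(fields)
    @fields = fields
  end

  # Assumes component_level_isim is [0] for the root document.
  def component_level
    Array(@fields['component_level_isim']).first
  end

  def level
    Array(@fields['level_ssm']).first
  end

  # Structural: is this the top of the EAD tree?
  def root_document?
    component_level == 0
  end

  # Alias option: existing structural callers keep working unchanged.
  alias collection? root_document?

  # Descriptive (hypothetical name): does this record carry @level="collection"?
  def level_collection?
    level.to_s.downcase == 'collection'
  end
end

root   = SketchDocument.new('component_level_isim' => [0], 'level_ssm' => ['fonds'])
nested = SketchDocument.new('component_level_isim' => [2], 'level_ssm' => ['collection'])
root.collection?          # => true  (structural root, even though @level="fonds")
nested.collection?        # => false
nested.level_collection?  # => true
```

With the alias in place, anything that genuinely means @level="collection" can move to the descriptive check without touching the structural callers.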

Removing the hardcoded level_ssm: 'collection' from ead2_config.rb lets level_ssm carry the actual @level value for root documents, consistent with how components are indexed.
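A minimal sketch of what could replace the hardcoded block, assuming the root indexer reads archdesc/@level directly. root_level_ssm is a hypothetical helper name, and the @otherlevel fallback follows EAD2002 (where @level="otherlevel" is paired with @otherlevel); how Arclight's component indexer labels such values is not covered here. The Traject wiring is left as a comment because it is an untested assumption about the record's XPath.

```ruby
# Hypothetical helper: the value ead2_config.rb could store in level_ssm for
# the root document, taken from archdesc/@level instead of 'collection'.
def root_level_ssm(level, otherlevel = nil)
  # EAD2002: @level="otherlevel" defers to the @otherlevel attribute.
  return otherlevel if level == 'otherlevel' && otherlevel && !otherlevel.empty?

  level
end

# Possible Traject wiring (untested assumption about the XPath):
#   to_field 'level_ssm' do |record, accumulator|
#     archdesc = record.at_xpath('/ead/archdesc')
#     accumulator << root_level_ssm(archdesc['level'], archdesc['otherlevel'])
#   end

root_level_ssm('fonds')                    # => "fonds"
root_level_ssm('otherlevel', 'subfonds')   # => "subfonds"
```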

The one thing to verify before removing the hardcoded value: any place that currently relies on level_ssm: 'collection' to mean "this is a root document" needs to be audited and switched to the root-based check. The breadcrumb partitioner, access_component.rb, the route filter, and within_collection are the known callers.

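The route filter and within_collection search field could then key off component_level_isim instead of level_ssim:

routes.rb: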

get 'collections' => 'catalog#index', defaults: { f: { component_level_isim: [0] } }

catalog_controller.rb:

Note: this may just be leftover cruft that should be removed; see #1642.

config.add_search_field 'within_collection' do |field|
  field.solr_parameters = { fq: '-component_level_isim:0' }
end

Analysis (with some help from Claude Sonnet 4.6)

Indexer forcibly assigns level_ssm: 'collection' to all top-level documents

ead2_config.rb (the indexer for the archdesc/root document) hardcodes:

to_field 'level_ssm' do |_record, accumulator|
  accumulator << 'collection'
end

Every EAD root document gets level_ssm: collection regardless of its actual @level attribute. The real value is preserved only in level_ssim (the facetable field), and even there a synthetic 'Collection' entry is appended unless the actual level is already collection. So a @level="fonds" document ends up with both "Fonds" and "Collection" in level_ssim.

What breaks: an EAD whose archdesc carries @level="fonds", @level="recordgrp", or any other non-collection level is presented and faceted as a Collection.
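To make the behavior above concrete, here is a small model of the root-document fields as described (not Arclight's actual code). capitalize is a simplification of however Arclight builds the facet label, so only simple values like fonds are shown.

```ruby
# Model of root-document level indexing as described above (not Arclight's code).
def current_root_level_fields(ead_level)
  level_ssim = [ead_level.capitalize] # simplified facet label
  # Synthetic entry appended unless the real level is already collection
  level_ssim << 'Collection' unless ead_level == 'collection'
  {
    'level_ssm' => ['collection'], # hardcoded in ead2_config.rb
    'level_ssim' => level_ssim
  }
end

current_root_level_fields('fonds')['level_ssim'] # => ["Fonds", "Collection"]
current_root_level_fields('fonds')['level_ssm']  # => ["collection"]
```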

SolrDocument#collection? checks level_ssm, which is always 'collection' for the root

def collection?
  level&.parameterize == 'collection'
end

level reads from level_ssm. Because every root document has level_ssm: collection, collection? is really asking "is this the root document?", not "does this EAD component have @level='collection'?". The method name suggests EAD semantics, but the implementation is structural.

What breaks when a non-root component has @level="collection": a nested component indexed by ead2_component_config.rb gets its real level in level_ssm (not a hardcoded 'collection'). So a nested <c @level="collection"> will have level_ssm: collection, and collection? returns true for it. But it is not a root document, and it lacks the fields the rest of the code expects a root to have (_root_ pointing to itself, normalized_title_ssm used as the collection name for sub-components, etc.).
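A small model of the current check shows both documents passing. The sample documents and ids are hypothetical, and downcase stands in for ActiveSupport's parameterize.

```ruby
# Minimal model of the current SolrDocument#collection?: it only looks at level_ssm.
def current_collection?(doc)
  level = Array(doc['level_ssm']).first
  level&.downcase == 'collection'
end

# Root whose archdesc is @level="fonds": level_ssm was hardcoded at index time.
root = { 'id' => 'abc', 'level_ssm' => ['collection'], 'component_level_isim' => [0] }
# Nested <c level="collection">: level_ssm carries the real value.
nested = { 'id' => 'abc_c1', 'level_ssm' => ['collection'], 'component_level_isim' => [1] }

matches = [root, nested].select { |d| current_collection?(d) }.map { |d| d['id'] }
# matches => ["abc", "abc_c1"]: both pass, though only one is a root document
```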

Breadcrumb component partitions by collection?, assuming a single collection at the top

breadcrumbs_hierarchy_component.rb:

collections, @parents_under_collection = document.parents.partition(&:collection?)
@collection = collections.first

This walks the parent chain and partitions it into "collections" and "everything else under a collection." It assumes exactly one collection exists in the ancestry and that it sits at the top. With nested @level="collection" components, partition returns multiple hits: collections.first is only the real root if the parents are ordered root-first, and the nested collection is pulled out of @parents_under_collection, so it no longer appears as an intermediate level.
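A sketch of the partition with a nested collection in the chain. SketchParent is a hypothetical stand-in for Arclight::Parent using the same collection? check, and the root-first ordering of the chain is an assumption.

```ruby
# Hypothetical stand-in for Arclight::Parent with the same collection? check.
SketchParent = Struct.new(:id, :level) do
  def collection?
    level == 'collection'
  end
end

# Parent chain for a file inside a nested <c level="collection">,
# assumed to be ordered root-first.
parents = [
  SketchParent.new('abc', 'collection'),    # root (level_ssm hardcoded)
  SketchParent.new('abc_c1', 'collection'), # nested collection component
  SketchParent.new('abc_c2', 'series')
]

collections, parents_under_collection = parents.partition(&:collection?)
collections.map(&:id)              # => ["abc", "abc_c1"]
parents_under_collection.map(&:id) # => ["abc_c2"]

# A structural alternative (again assuming root-first ordering):
# the root is the first parent and everything after it sits under it.
root, *under_root = parents
under_root.map(&:id) # => ["abc_c1", "abc_c2"]
```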

arclight/parent.rb:

def collection?
  level == 'collection'
end

This checks the level attribute of the Parent struct, which is populated from parent_levels_ssm. Those levels come from ead2_component_config.rb:

to_field 'parent_levels_ssm' do |_record, accumulator, _context|
  accumulator.concat settings[:parent].output_hash['parent_levels_ssm'] || []
  accumulator.concat settings[:parent].output_hash['level_ssm'] || []
end

The root document's level_ssm is hardcoded to 'collection', so the breadcrumb logic works in the normal case. But when there is a nested collection component, that component's level_ssm will also be 'collection', and the breadcrumb partitioner will treat it as a collection boundary, potentially producing two "collection" entries.
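A hypothetical model of the parent_levels_ssm to_field block shows how the second 'collection' gets into a descendant's levels: each document inherits its parent's parent_levels_ssm plus its parent's level_ssm. index_chain is an illustrative name and models a single linear chain only.

```ruby
# Hypothetical model of the to_field block above for one linear chain:
# parent_levels_ssm = parent's parent_levels_ssm + parent's level_ssm.
def index_chain(levels)
  levels.each_with_object([]) do |level, docs|
    parent = docs.last
    parent_levels = parent ? parent['parent_levels_ssm'] + parent['level_ssm'] : []
    docs << { 'level_ssm' => [level], 'parent_levels_ssm' => parent_levels }
  end
end

# Root (hardcoded 'collection'), nested <c level="collection">, then a file.
docs = index_chain(%w[collection collection file])
docs.last['parent_levels_ssm'] # => ["collection", "collection"]
```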

The collections route and within_collection search filter use level_ssim:Collection

routes.rb:

get 'collections' => 'catalog#index', defaults: { f: { level: ['Collection'] } }

catalog_controller.rb:

config.add_search_field 'within_collection' do |field|
  field.solr_parameters = { fq: '-level_ssim:Collection' }
end

Because 'Collection' is synthetically appended to level_ssim for all root documents, these filters work for the common case. But they inadvertently include any component with @level="collection" in the "Collections" listing, and they exclude those same nested components from "within collection" searches.
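A rough in-memory comparison of the current filters with the component_level_isim filters from the proposal. The sample documents are hypothetical, and filter_ids only approximates a Solr fq on a multivalued field (field:value keeps matches, -field:value drops them).

```ruby
# Sample index: a fonds-level root plus a nested <c level="collection">.
DOCS = [
  { 'id' => 'abc',    'component_level_isim' => [0], 'level_ssim' => %w[Fonds Collection] },
  { 'id' => 'abc_c1', 'component_level_isim' => [1], 'level_ssim' => %w[Collection] },
  { 'id' => 'abc_c2', 'component_level_isim' => [2], 'level_ssim' => %w[File] }
].freeze

# Approximates fq=field:value, or fq=-field:value when negate is true.
def filter_ids(docs, field, value, negate: false)
  docs.select { |d| Array(d[field]).include?(value) ^ negate }.map { |d| d['id'] }
end

# Current: level_ssim:Collection / -level_ssim:Collection
filter_ids(DOCS, 'level_ssim', 'Collection')               # => ["abc", "abc_c1"]
filter_ids(DOCS, 'level_ssim', 'Collection', negate: true) # => ["abc_c2"]

# Proposed: component_level_isim:0 / -component_level_isim:0
filter_ids(DOCS, 'component_level_isim', 0)               # => ["abc"]
filter_ids(DOCS, 'component_level_isim', 0, negate: true) # => ["abc_c1", "abc_c2"]
```

The structural filter keeps the nested collection out of the "Collections" listing and inside "within collection" results.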

Issue #1533 (nested collections crashing): the normalized_id failure

The crash in arclight/normalized_id.rb (raising IDNotFound) happens because a nested <c @level="collection"> in the EAD gets indexed as a component whose id is built as root_id_ref_id. If the nested collection component has sub-components of its own, their parent chain includes the nested collection's id. When the breadcrumb/hierarchy code tries to reconstruct that parent chain and resolve the collection document via the _root_ subquery, the nested collection's record does not have _root_ pointing to itself (it points to the actual root), as a collection document is expected to. The cascade of assumptions about what a "collection" document looks like breaks down.
