Skip to content

fix(loaders): remove silent flags clobber in make_entry, normalize load_category return signature#507

Open
ayushshukla1807 wants to merge 3 commits intohatnote:migrate-image-to-filerevisionfrom
ayushshukla1807:fix-loaders-flags-clobber-and-category-warnings
Open

fix(loaders): remove silent flags clobber in make_entry, normalize load_category return signature#507
ayushshukla1807 wants to merge 3 commits intohatnote:migrate-image-to-filerevisionfrom
ayushshukla1807:fix-loaders-flags-clobber-and-category-warnings

Conversation

@ayushshukla1807
Copy link
Copy Markdown

Two small bugs I spotted while going through the migration code.

1. Silent flags clobber in make_entry() (loaders.py)

There was a dead block at the bottom of make_entry() that would overwrite the reupload flags dict if an edict happened to carry a top-level flags key. No caller ever sets that key — looked through every import path — so the check was useless at best and potentially destructive if something ever did pass flags in. Removed it.

2. load_category() return inconsistency (loaders.py + rdb.py)

Every other loader (load_full_csv, load_by_filename, etc.) returns (entries, warnings). load_category() was the odd one out — just returned a bare list. The inconsistency was silently swallowing import errors during category imports. Normalized the return to (entries, warnings) and updated the add_entries_from_cat() call site in rdb.py to unpack correctly. Also surfaced any skipped-entry count in the audit log message so organizers can tell if something got dropped.

Tests pass locally:

montage/tests/test_web_basic.py::test_home_client     PASSED
montage/tests/test_web_basic.py::test_multiple_jurors PASSED
montage/tests/test_web_basic.py::test_make_entry_reupload_preserves_original_author PASSED

lgelauff and others added 3 commits April 17, 2026 19:28
The `image` and `oldimage` tables are being removed from Wikimedia
wikireplicas on 28 May 2026 (T28741). Rewrites `get_files()` and
`get_file_info()` in `labs.py` to use the new `file`, `filerevision`,
and `filetypes` tables. All output key aliases (`img_*`, `oi_archive_name`,
`rec_img_*`) are preserved so downstream code is unaffected.

Also adds `file_id` (the Commons file identity) as a nullable column on
the `Entry` model, threaded through `make_entry()` for all new imports.

Key implementation notes:
- GROUP BY non-determinism fixed: replaced with a correlated MIN(fr_id)
  subquery that deterministically selects the earliest non-deleted revision
  per file (original uploader / original timestamp).
- `get_files_legacy()` is a temporary verbatim copy of the old query,
  kept alive solely to power `test_get_files_parity` (xfail) while both
  table sets are live. Remove together with that test after 28 May 2026.
- The new `filetypes` LEFT JOIN can produce NULL MIME values (the old
  `image` table never did). Files with unknown MIME are kept (stored as
  NULL) rather than silently dropped. This surfaces a pre-existing bug in
  `autodisqualify_by_filetype()` where SQL NOT IN on NULL evaluates to
  NULL, silently skipping disqualification — left as a follow-up issue.
- Toolforge runs MariaDB 10.6.22 (partial unique indexes require 10.8+),
  so the migration SQL uses a plain non-unique index on `file_id`;
  import-time uniqueness enforced at the application layer.

Migration SQL: `.claude/features/migrate-image-to-filerevision/migration.sql`
Run on production before deploying.

Closes hatnote#504
Refs T28741 (https://phabricator.wikimedia.org/T28741)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add AND fr.fr_deleted = 0 to the main filerevision JOIN in both
get_files() and get_file_info(). In the old image table schema the
current revision was always live; in filerevision, file_latest can
theoretically point to a suppressed revision. Without this guard such
a file would be excluded from results by the inner JOIN rather than
returning NULL metadata — a regression introduced by the migration.

Also add a comment to make_entry() explaining that CSV-imported entries
intentionally receive file_id=NULL (file_id only comes from wikireplica
imports, not the CSV path).

Refs hatnote#505

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ad_category return signature

Two small bugs in loaders.py noticed while working on the migration:

1. make_entry() had a dead code block on lines 75-76 that would overwrite
   legitimate reupload flags (set a few lines earlier) if an edict happened
   to carry a top-level 'flags' key. No caller path ever sets this key in an
   edict, so the block was useless at best and destructive at worst. Removed.

2. load_category() was the only loader function returning a bare list instead
   of the standard (entries, warnings) tuple used by load_full_csv,
   load_partial_csv, load_by_filename and load_name_list. The inconsistency
   was quietly masking import errors. Normalized it to (entries, warnings) and
   updated the rdb.py call site (add_entries_from_cat) to unpack correctly
   and surface any skipped-entry count in the audit log.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants