oosei25 commented Nov 11, 2025

Motivation

Users (especially students) are often confused when read_stata raises
“Version of given Stata file is {version}” for inputs that aren’t
actually Stata files (e.g., GitHub HTML “blob” pages). This PR makes
that message unambiguous without changing any I/O behavior.

What this changes

  • Replace the _version_error message in pandas/io/stata.py with
    clearer wording that covers both cases:
    • the input is not a valid Stata dataset, or
    • it is a Stata dataset of a version pandas does not support.

No parsing logic changes; only user-facing text.
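To illustrate the kind of confusion described above, here is a minimal, self-contained sketch. It is not the pandas implementation: `check_version`, `VERSION_ERROR`, and `SUPPORTED_VERSIONS` are hypothetical names, the message text is a placeholder (this description does not quote the final wording), and the sketch assumes the older .dta layout in which the first byte holds the release number.

```python
import io

# Hypothetical wording: the PR description does not quote the final message.
VERSION_ERROR = (
    "Version of given Stata file is {version}. The input is either not a "
    "valid Stata dataset or is a Stata dataset of an unsupported version."
)

# Illustrative subset of release numbers, not pandas' actual supported list.
SUPPORTED_VERSIONS = frozenset({114, 115})


def check_version(buf):
    """Read the leading byte as a release number and validate it."""
    first = buf.read(1)
    if not first:
        raise ValueError("Input is empty.")
    version = first[0]  # indexing bytes yields an int
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(VERSION_ERROR.format(version=version))
    return version


# An HTML page passed by mistake: its leading newline is read as "version 10".
html_page = io.BytesIO(b"\n<!DOCTYPE html><html></html>")
try:
    check_version(html_page)
except ValueError as err:
    print(err)
```

With only the first sentence of the message, a reader sees a plausible-looking version number and assumes the file is a real but outdated dataset. Naming both possibilities points them toward checking the input itself.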

Tests

Updated/added assertions in pandas/tests/io/test_stata.py to match the
new wording (no behavioral expectations changed):

  • test_stata_v117_prefix_with_unsupported_version_raises_version_error
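As a rough illustration of why a wording-only change still touches tests, the sketch below asserts on an error message by pattern. It uses the standard library's `unittest` rather than the pandas test suite's actual helpers, and `fake_reader` and `NEW_MESSAGE` are hypothetical stand-ins, not code from this PR.

```python
import re
import unittest

# Hypothetical wording, standing in for the real updated message.
NEW_MESSAGE = (
    "Version of given Stata file is 10. The input is either not a valid "
    "Stata dataset or is a Stata dataset of an unsupported version."
)


def fake_reader():
    """Stand-in for a reader that rejects its input."""
    raise ValueError(NEW_MESSAGE)


class TestVersionErrorWording(unittest.TestCase):
    def test_message_matches_new_wording(self):
        # re.escape keeps punctuation in the message from being
        # interpreted as regular-expression syntax.
        pattern = re.escape("not a valid Stata dataset")
        with self.assertRaisesRegex(ValueError, pattern):
            fake_reader()
```

Saved as a module, this can be run with `python -m unittest`. The exception type and the conditions that trigger it stay the same; only the matched text changes.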

Backward compatibility

  • No API/behavior changes.
  • Only error text has changed.

Checklist

  • Tests updated/passing locally
  • Pre-commit checks pass
  • User-visible text clarified; no docs/whatsnew needed (happy to add
    a short whatsnew note if maintainers prefer)
