Support TSV as a "dialect" of CSV #4

Merged
merged 3 commits into from
Feb 29, 2020

Conversation

tsibley
Contributor

@tsibley tsibley commented Feb 20, 2020

Nice tool, I just want to use it on TSV too. See commit messages for details. :-)

The csv module documentation says this is necessary in some cases, such
as for parsing embedded newlines in quoted fields.  Refer to
<https://docs.python.org/3/library/csv.html#id3>.
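
The commit title is not shown above, so the following is only a sketch under an assumption: that the change in question is opening files with `newline=""`, which is what the csv module documentation recommends. The helper names `write_rows` and `read_rows` are hypothetical and not part of this project.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    # newline="" stops the text layer from rewriting the line endings
    # that csv.writer emits.
    with open(path, "w", newline="") as fp:
        csv.writer(fp).writerows(rows)


def read_rows(path):
    # newline="" hands line endings to the csv module untranslated, so a
    # "\r\n" embedded in a quoted field survives the round trip.
    with open(path, newline="") as fp:
        return list(csv.reader(fp))


def roundtrip_demo():
    # Hypothetical demo: one field contains an embedded CRLF.
    path = os.path.join(tempfile.mkdtemp(), "demo.csv")
    rows = [["id", "note"], ["1", "line one\r\nline two"]]
    write_rows(path, rows)
    return rows, read_rows(path)
```

If the file were read without `newline=""`, Python's universal newline handling would translate the embedded `\r\n` to `\n` before the csv module saw it.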
For seekable streams, the delimiter is sniffed from the first 1MB of
data.  This should provide enough rows to the sniffer even for datasets
with very long rows without blowing up memory usage much.
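
A minimal sketch of the sniffing approach described above, not the code from this pull request. The function name `sniff_dialect`, the `SNIFF_SIZE` constant, and the restriction of candidate delimiters to comma and tab are assumptions made for illustration.

```python
import csv
import io

# Roughly 1MB; for text streams read() counts characters, not bytes.
SNIFF_SIZE = 1024 * 1024


def sniff_dialect(fp, sample_size=SNIFF_SIZE):
    """Return a sniffed csv dialect, or None if fp cannot be rewound."""
    if not fp.seekable():
        # Reading a sample would consume data we could not get back.
        return None
    start = fp.tell()
    sample = fp.read(sample_size)
    fp.seek(start)  # rewind so the caller still sees the whole stream
    # Assumption: only comma and tab are considered as delimiters.
    return csv.Sniffer().sniff(sample, delimiters=",\t")


def demo():
    tsv = io.StringIO("id\tname\n1\tCleo\n2\tPancakes\n")
    dialect = sniff_dialect(tsv)
    # The stream is back at its start, so it can be parsed in full.
    return dialect.delimiter, list(csv.reader(tsv, dialect=dialect))
```

`csv.Sniffer().sniff()` raises `csv.Error` when it cannot determine a delimiter, so a caller may want to catch that and fall back to a default dialect.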

A csv.Dialect may also be specified directly to load_csv() for
programmatic usage.
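
To illustrate what passing a dialect directly looks like, here is a sketch that uses a hypothetical stand-in, `read_table`, rather than the project's own `load_csv()`, whose exact signature is not shown in this conversation. `csv.excel_tab` is the standard library's built-in tab-delimited dialect.

```python
import csv
import io


def read_table(fp, dialect=None):
    """Hypothetical stand-in for a loader that accepts a csv dialect.

    When no dialect is given, fall back to the csv module's default
    "excel" (comma-delimited) dialect; no sniffing is attempted here.
    """
    reader = csv.DictReader(fp, dialect=dialect or "excel")
    return [dict(row) for row in reader]


def demo():
    # Passing the dialect explicitly means no sniffing is needed,
    # which also works for streams that are not seekable.
    tsv = io.StringIO("id\tname\n1\tCleo\n2\tPancakes\n")
    return read_table(tsv, dialect=csv.excel_tab)
```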
Useful when you want to disable sniffing or when one or both of the
files aren't seekable, so sniffing doesn't work.
@simonw
Owner

simonw commented Feb 29, 2020

I love this improvement - thanks for teaching me about the csv.Sniffer mechanism!

@simonw simonw merged commit 140fe0d into simonw:master Feb 29, 2020
simonw added a commit that referenced this pull request Feb 29, 2020