Plain text input parsing: the flaw that reduces C# and .NET popularity

In competitive programming contests, in interviews, or when solving archived problems for practice, we deal with automatic judging systems. These systems accept code in one of their supported languages, then compile it and run it against many test cases.

Although modern versions of C# are usually supported on these systems, I prefer to use C++, Java, or Kotlin for solving algorithmic problems. Why? Because judging systems provide test cases as plain text to STDIN and C# doesn't have a good way to read and parse it.

The Flaw

Let's look at an example: the "A+B problem", a demo problem used to get familiar with a judging system.

Statement:
Calculate A + B (|A|, |B| <= 10^5)

Input:
The single line contains two integers, A and B, separated by one space

Output:
A+B value

Notes:
Use + operator

It's simple, isn't it?

Let's compare solutions for this problem in different programming languages.

C:

#include <stdio.h>

int main()
{
    int a, b;
    scanf("%d %d", &a, &b);
    printf("%d", a + b);
}

C++:

#include <iostream>
using namespace std;

int main()
{
    int a, b;
    cin >> a >> b;
    cout << a + b;
}

Kotlin (the Java solution is similar, but much longer):

import java.util.*

fun main() {
    val sc = Scanner(System.`in`)
    val a = sc.nextInt()
    val b = sc.nextInt()
    print(a + b)
}

C# with Top-level statements and Global usings features:

var numbers = Console.ReadLine().Split();
var a = int.Parse(numbers[0]);
var b = int.Parse(numbers[1]);
Console.WriteLine(a + b);

Or, as C# one-liner:

Console.WriteLine(Console.ReadLine().Split().Select(int.Parse).Sum());

As we can see, the C, C++, and Kotlin solutions just read two integers from the input. The C# program, by contrast, has to .ReadLine() the single input line, .Split() it on whitespace, and then .Parse() each part into the target type int.

So, while many other languages let us take the input as a sequence of tokens of given types, in C# we are forced to read the input as a string or char and then parse it manually. The .NET standard library has no simple abstraction that hides this parsing complexity from the programmer.

I prefer the C++ solution for this problem because the code in the main function is much shorter than the others, roughly 30 characters without unnecessary whitespace, and also easy to read. The C# one-liner has 70 characters in total. That said, for real, non-demo problems, C++ is often much harder to debug.

Why is manual parsing a disadvantage?

Flexibility to format changes

Let's go back to "A+B problem" and change the input format slightly.

Input:
- The single line contains two integers, A and B, separated by one space
+ Two lines, each containing one integer

The C, C++, and Kotlin solutions remain unchanged. The C# solution, however, has to be rewritten to support the new input format:

Console.WriteLine(int.Parse(Console.ReadLine()) + int.Parse(Console.ReadLine()));

In this simple case, it is possible to write a C# solution that supports both input formats by reading everything with Console.In.ReadToEnd() and splitting on whitespace, but this becomes inconvenient when the input is more complex.
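For comparison, the C++ solution survives the format change because `>>` skips any leading whitespace, whether spaces or newlines, before each token. The sketch below wraps the same logic in a function that reads from any stream so it can be tried on in-memory text. The names `sum_two` and `sum_two_from_text` are hypothetical helpers for this illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Same logic as the C++ solution above, but reading from an arbitrary stream
// so that it can be fed in-memory text instead of STDIN.
int sum_two(std::istream& in) {
    int a = 0, b = 0;
    in >> a >> b;   // leading whitespace (space, '\n', '\r', tab) is skipped
    return a + b;
}

// Hypothetical helper for this sketch: run sum_two on a string.
int sum_two_from_text(const std::string& text) {
    std::istringstream in(text);
    return sum_two(in);
}
```

With this helper, `sum_two_from_text("2 3")`, `sum_two_from_text("2\n3\n")`, and `sum_two_from_text("2\r\n3\r\n")` all return 5. The one-line layout, the two-line layout, and Windows line endings need no special handling.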

Mental complexity

When solving algorithmic problems, you want to focus on the algorithm and the correctness of its implementation, not on input formats, line endings, invisible Unicode whitespace characters, and other quirks. It's much less painful when all of that is handled by prewritten code, ideally from the standard library.

Also, in programming contests, you want to save time, because the speed of problem-solving affects the final score.

Performance

Obviously, if we write int.Parse(Console.ReadLine()), an intermediate string object is allocated. This allocation takes some time and memory. If the garbage collector is then triggered, it also takes some time to process and clean up this object.

If the line contains more than one integer, the .Split() method also allocates an intermediate string[] array plus one string per token, with the same consequences.

There are also other performance problems with plain text IO, which are discussed in the next post.

Underestimation of this flaw

This disadvantage isn't specific to C# and other .NET languages: Python and Rust, for example, read plain text input the same way. Nor is it something we face in day-to-day work as software developers, because in production plain text input is very rare compared with JSON or other structured data formats.

For console applications, System.CommandLine or another argument/input parsing library could be used instead. But judging systems limit us to standard library features and a standardized plain text input format, and it's impossible to just install some fancy NuGet package on these systems.

But why is this important at all?

  • Solving algorithmic problems and participating in programming contests are great ways to learn new things and improve your programming skills. Being able to use a familiar programming language without friction makes this much easier, especially for beginners.
  • A built-in way to read typed tokens could help promote the language among developers as a tool for solving algorithmic problems on automatic judging systems.
  • For newcomers who learn programming by solving algorithmic problems, it would make it easier to choose C# and .NET over another language and platform.

What to do with it?

There is already a .NET API proposal that describes this issue. The proposal is far from well written, but it contains a discussion of the feature's usefulness and of the design decisions. Most likely, if this feature is implemented, it will look like Java's Scanner class.

If you also find this feature useful, please upvote the GitHub issue!

Announcement

In the next posts we will discuss:

  • Performance problems with plain text IO
  • Efficient plain text input reading implementation