extra/dashboard/app.py: DataFrame component sorts numeric columns lexicographically instead of numerically when values are formatted as strings #24

Open
@cyril23

Bug Description

The Gradio DataFrame component sorts numeric columns lexicographically (as strings) instead of numerically when the underlying data contains string representations of numbers. As a result, an ascending sort yields 114.42, 1785.64, 181.58, 1816.62, 182.33 instead of the correct numerical order 114.42, 181.58, 182.33, 1785.64, 1816.62.
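The difference can be reproduced without Gradio. The following is a minimal sketch in plain Python, using the values from the description above; it only illustrates string versus numeric comparison and is not taken from app.py.

```python
values = [1816.62, 114.42, 182.33, 1785.64, 181.58]

# Sorting the formatted strings compares them character by character,
# so "1785.64" lands before "181.58" because "7" < "8".
as_strings = sorted(f"{v:.2f}" for v in values)

# Sorting the floats compares their magnitudes.
as_numbers = sorted(values)

print(as_strings)
print(as_numbers)
```

The first list reproduces the order seen in the dashboard; the second is the order a user would expect.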

Steps to Reproduce

  1. Run a benchmark and display the dashboard. For example, put these result files into a folder (1.json, 2.json, 3.json) and run:
       python3 -m venv venv
       source venv/bin/activate
       pip install click gradio pandas pyarrow
       python extra/dashboard/app.py --from-results-dir /path/to/my/results/
  2. Try to sort any of the numeric columns by clicking its column header
  3. Observe that sorting follows lexicographical order instead of numerical order

Expected Behavior

Numeric columns should be sorted numerically, regardless of their display formatting.

Image

Actual Behavior

Numeric columns are sorted lexicographically when they contain string representations of numbers, leading to incorrect ordering like:

Image

Root Cause

The issue occurs when numeric data is converted to formatted strings to control decimal precision for display. The DataFrame component then sorts on these string representations rather than on the original numbers.

Proposed Solution

Ensure that numeric columns maintain their proper data types (int/float) in the underlying DataFrame before being passed to the gr.DataFrame component. Use pandas' .round() method instead of string formatting to control decimal precision while preserving numeric types.

Before (causes lexicographical sorting):

# This converts numbers to strings, breaking numerical sorting
data[metric] = data[metric].apply(lambda x: f"{x:.2f}")

After (maintains numerical sorting):

# Ensure proper numeric types (errors='coerce' turns unparseable values into NaN)
data[col] = pd.to_numeric(data[col], errors='coerce')
# Use rounding instead of string formatting
data[col] = data[col].round(2)
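As a rough end-to-end check of the proposed fix, the sketch below applies the same two steps to a small DataFrame and sorts it with pandas. The helper name round_numeric_columns and the column names model and tokens_per_s are hypothetical and not taken from app.py; only pd.to_numeric, Series.round and DataFrame.sort_values are real pandas calls.

```python
import pandas as pd


def round_numeric_columns(data, columns, decimals=2):
    """Return a copy of data with the given columns coerced to numbers and rounded.

    Hypothetical helper for illustration; not part of app.py.
    """
    result = data.copy()
    for col in columns:
        # Coerce first so string inputs become floats, then round.
        result[col] = pd.to_numeric(result[col], errors="coerce").round(decimals)
    return result


# Hypothetical sample data: the metric arrives as formatted strings.
raw = pd.DataFrame({
    "model": ["a", "b", "c", "d", "e"],
    "tokens_per_s": ["1816.62", "114.42", "182.33", "1785.64", "181.58"],
})

cleaned = round_numeric_columns(raw, ["tokens_per_s"])
sorted_values = cleaned.sort_values("tokens_per_s")["tokens_per_s"].tolist()

print(cleaned["tokens_per_s"].dtype)
print(sorted_values)
```

Because the column holds floats after the conversion, any sort on it is numeric. Whether gr.DataFrame then sorts numerically depends on it receiving this numeric column unchanged, which is what the proposal above assumes.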

Alternative Solutions

Alternative approaches such as pandas Styler objects may give better formatting control, but in some cases they disable sorting entirely (at least I couldn't get sorting to work with them, though I have limited Python experience). That makes the rounding approach the more practical solution for production use.
