Organizing Existing Customized Functionality

This issue catalogs the customization mechanisms that currently exist in the ExplainaBoard SDK.
Points to discuss:
* Should we disable 2? (The downside is that users with customized datasets could no longer define customized features.)
* Regarding 4, what would be a good way to introduce customized functions?


## 1. Customized Dataset
* Users can analyze their system outputs on their own datasets by specifying the file formats (e.g., tsv, json).
* Example
```
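loader = get_loader_class(TaskType.text_classification)(
    load_file_as_str(self.tsv_dataset),  # dataset contents
    load_file_as_str(self.txt_output),  # system output contents
    Source.in_memory,  # where the dataset comes from
    Source.in_memory,  # where the system output comes from
    FileType.tsv,  # dataset file format
    FileType.text,  # system output file format
)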
```

## 2. Customized features through `system_output` file
* Users can declare customized features under `metadata.custom_features` in their system output file and supply each feature's value in every example.
* This can work for all data loaders
* Example: a [system output file](https://github.com/neulab/ExplainaBoard/blob/main/data/system_outputs/fb15k-237/test-kg-prediction-user-defined.json) with customized features
```
{
  "metadata": {
    "custom_features": {
      "rel_type": {
        "dtype": "string",
        "description": "symmetric or asymmetric",
        "num_buckets": 2
      }
    }
  },
  "examples": [
    {
      "gold_head": "/m/08966",
      "gold_predicate": "/travel/travel_destination/climate./travel/travel_destination_monthly_climate/month",
      "gold_tail": "/m/05lf_",
      "predict": "tail",
      "predictions": [
        "/m/05lf_",
        "/m/02x_y",
        "/m/01nv4h",
        "/m/02l6h",
        "/m/0kz1h"
      ],
      "rel_type": "asymmetric",
      "example_id": "1"
    },
  ...
```
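
The relationship between the `metadata` declaration and the per-example values can be sketched as a small consistency check. This is a minimal illustration only, not part of the SDK: the helper name `check_custom_features` is hypothetical, and the sample is trimmed to the fields relevant here.

```python
import json


def check_custom_features(system_output: dict) -> list:
    """Hypothetical helper: list every example that lacks a declared custom feature."""
    declared = system_output.get("metadata", {}).get("custom_features", {})
    problems = []
    for index, example in enumerate(system_output.get("examples", [])):
        for name in declared:
            if name not in example:
                problems.append(f"example {index} is missing custom feature '{name}'")
    return problems


# Trimmed-down version of the system output file shown above.
sample = json.loads(
    """
    {
      "metadata": {
        "custom_features": {
          "rel_type": {
            "dtype": "string",
            "description": "symmetric or asymmetric",
            "num_buckets": 2
          }
        }
      },
      "examples": [
        {"gold_head": "/m/08966", "rel_type": "asymmetric", "example_id": "1"}
      ]
    }
    """
)

problems = check_custom_features(sample)
print(problems or "all declared custom features are present")
```

A check like this would catch an output file that declares `rel_type` but omits it from some examples.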

## 3. Customized features through an additional config file
* Users can introduce customized features by specifying the feature name in an external global [config file](https://github.com/neulab/ExplainaBoard/blob/main/explainaboard/resources/dataset_custom_features.json).
* Note that this only works when the DataLab loader is used.
* Example
```
{
  "sst2": {
    "custom_features": {
      "example": {
        "label": {
          "dtype": "string",
          "description": "the true label"
        }
      }
    },
...
}
```

## 4. Customized feature function through an additional config file
* Users can introduce customized feature functions by writing the function as a string in an external global [config file](https://github.com/neulab/ExplainaBoard/blob/add_customized_feature_function/explainaboard/resources/dataset_custom_features.json).
* Note that this only works when the DataLab loader is used.
* Example
```
{
  "sst2": {
    "label": {
      "dtype": "string",
      "description": "the true label",
      "num_buckets": 2
    },
    "text_len": {
      "dtype": "float",
      "description": "text length",
      "num_buckets": 4,
      "func": "lambda x:len(x['text'].split())"
    },
    "long_text": {
      "dtype": "string",
      "description": "whether a text is long",
      "num_buckets": 2,
      "func": "lambda x:'Long Text' if len(x['text'].split()) > 20 else 'Short Text'"
    }
...
}
```
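
To make the discussion of option 4 concrete, here is one way such string-style functions could be turned into callables and applied to an example. This is a sketch under assumptions, not a description of how the SDK does it: the helper `compile_feature_funcs` is hypothetical, and it relies on Python's built-in `eval`, which executes arbitrary code and is therefore only appropriate for trusted config files.

```python
def compile_feature_funcs(dataset_config: dict) -> dict:
    """Hypothetical helper: build a callable for each feature that has a "func" string."""
    funcs = {}
    for name, spec in dataset_config.items():
        if "func" in spec:
            # eval executes arbitrary code, so the config must come from a trusted source.
            funcs[name] = eval(spec["func"])
    return funcs


# The "sst2" entry from the config above, reduced to the relevant keys.
sst2_config = {
    "label": {"dtype": "string", "num_buckets": 2},
    "text_len": {
        "dtype": "float",
        "num_buckets": 4,
        "func": "lambda x:len(x['text'].split())",
    },
    "long_text": {
        "dtype": "string",
        "num_buckets": 2,
        "func": "lambda x:'Long Text' if len(x['text'].split()) > 20 else 'Short Text'",
    },
}

funcs = compile_feature_funcs(sst2_config)

# An illustrative example record; only the "text" field is used by the functions.
example = {"text": "a gorgeous , witty , seductive movie", "label": "positive"}
features = {name: func(example) for name, func in funcs.items()}
print(features)
```

Features without a `func` entry, such as `label`, are skipped here because their values are read directly from the dataset.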
