Below is an overview of the existing customization features of the ExplainaBoard SDK.
Some points for discussion:
- Should we disable 2? (The downside is that users with customized datasets could no longer define customized features.)
- Regarding 4, what would be a good way to introduce customized functions?
1. Customized Dataset
- Users can analyze their system outputs on their customized datasets by specifying the file formats (e.g., tsv, json)
- Example
loader = get_loader_class(TaskType.text_classification)(
    load_file_as_str(self.tsv_dataset),
    load_file_as_str(self.txt_output),
    Source.in_memory,
    Source.in_memory,
    FileType.tsv,
    FileType.text,
)
2. Customized features through the system_output file
- Users can define customized features and provide their values by adding some information to their system output files
- This works for all data loaders
- Example: a system output file with customized features
{
  "metadata": {
    "custom_features": {
      "rel_type": {
        "dtype": "string",
        "description": "symmetric or asymmetric",
        "num_buckets": 2
      }
    }
  },
  "examples": [
    {
      "gold_head": "/m/08966",
      "gold_predicate": "/travel/travel_destination/climate./travel/travel_destination_monthly_climate/month",
      "gold_tail": "/m/05lf_",
      "predict": "tail",
      "predictions": [
        "/m/05lf_",
        "/m/02x_y",
        "/m/01nv4h",
        "/m/02l6h",
        "/m/0kz1h"
      ],
      "rel_type": "asymmetric",
      "example_id": "1"
    },
    ...
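For illustration, here is a rough sketch of how such a file could be assembled in Python. The helper `build_system_output` is hypothetical (it is not part of the SDK); it only mirrors the `metadata` / `examples` layout shown above and checks that every example carries a value for each declared custom feature.

```python
import json


def build_system_output(examples, custom_features):
    """Hypothetical helper: wrap examples and custom feature declarations
    in the metadata/examples layout of the system output file above."""
    # Collect (example index, feature name) pairs where a declared feature has no value.
    missing = [
        (i, name)
        for i, example in enumerate(examples)
        for name in custom_features
        if name not in example
    ]
    if missing:
        raise ValueError(f"examples missing declared custom features: {missing}")
    return {"metadata": {"custom_features": custom_features}, "examples": examples}


custom_features = {
    "rel_type": {
        "dtype": "string",
        "description": "symmetric or asymmetric",
        "num_buckets": 2,
    }
}
examples = [
    {
        "gold_head": "/m/08966",
        "gold_tail": "/m/05lf_",
        "predict": "tail",
        "predictions": ["/m/05lf_", "/m/02x_y"],
        "rel_type": "asymmetric",
        "example_id": "1",
    }
]

system_output = build_system_output(examples, custom_features)
serialized = json.dumps(system_output, indent=2)  # text that could be written to a .json file
```

Checking for missing values up front is a design choice of this sketch, not documented SDK behavior.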
3. Customized features through an additional config file
- Users can introduce customized features by specifying the feature name in an external global config file.
- Note that this is only valid when the DataLab loader is used
- Example
{
  "sst2": {
    "custom_features": {
      "example": {
        "label": {
          "dtype": "string",
          "description": "the true label"
        }
      }
    },
    ...
  }
4. Customized feature function through an additional config file
- Users can introduce customized feature functions by specifying the function as a string in an external global config file.
- Note that this is only valid when the DataLab loader is used
- Example
{
  "sst2": {
    "label": {
      "dtype": "string",
      "description": "the true label",
      "num_buckets": 2
    },
    "text_len": {
      "dtype": "float",
      "description": "text length",
      "num_buckets": 4,
      "func": "lambda x:len(x['text'].split())"
    },
    "long_text": {
      "dtype": "string",
      "description": "whether a text is long",
      "num_buckets": 2,
      "func": "lambda x:'Long Text' if len(x['text'].split()) > 20 else 'Short Text'"
    }
    ...
  }
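To make the discussion of 4 more concrete, here is a minimal sketch of how string-style functions like the ones above could be turned into callables and applied to an example. This is an assumption for illustration, not how the SDK actually does it; `compile_feature_funcs` and `apply_feature_funcs` are hypothetical names. Note that `eval` executes arbitrary code, so this approach is only reasonable for config files from a trusted source, which may be one argument for finding a different mechanism.

```python
# Feature specs copied from the config above; entries without "func" are skipped.
dataset_config = {
    "label": {"dtype": "string", "description": "the true label", "num_buckets": 2},
    "text_len": {
        "dtype": "float",
        "description": "text length",
        "num_buckets": 4,
        "func": "lambda x:len(x['text'].split())",
    },
    "long_text": {
        "dtype": "string",
        "description": "whether a text is long",
        "num_buckets": 2,
        "func": "lambda x:'Long Text' if len(x['text'].split()) > 20 else 'Short Text'",
    },
}


def compile_feature_funcs(config):
    """Hypothetical helper: evaluate each "func" string into a Python callable."""
    funcs = {}
    for name, spec in config.items():
        source = spec.get("func")
        if source is None:
            continue
        # WARNING: eval runs arbitrary code; only use with trusted config files.
        func = eval(source, {})
        if not callable(func):
            raise TypeError(f"'func' for feature {name!r} is not callable")
        funcs[name] = func
    return funcs


def apply_feature_funcs(funcs, example):
    """Hypothetical helper: compute every custom feature value for one example."""
    return {name: func(example) for name, func in funcs.items()}


funcs = compile_feature_funcs(dataset_config)
features = apply_feature_funcs(funcs, {"text": "a fine movie", "label": "positive"})
print(features)
```

An alternative worth considering is letting users register real Python functions by name instead of embedding source strings in JSON, which avoids `eval` entirely.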