Organizing Existing Customized Functionality #388

Open
pfliu-nlp opened this issue Aug 18, 2022 · 1 comment
Labels
question Further information is requested

pfliu-nlp (Collaborator):

Below is an overview of the customization mechanisms that the ExplainaBoard SDK currently supports.
Points for discussion:

  • Should we disable 2? (The downside is that users with customized datasets could no longer define customized features.)
  • Regarding 4, what would be a good way to introduce customized functions?

1. Customized Dataset

  • Users can analyze their system outputs on their own datasets by specifying the file formats (e.g., tsv, json).
  • Example
        loader = get_loader_class(TaskType.text_classification)(
            load_file_as_str(self.tsv_dataset),
            load_file_as_str(self.txt_output),
            Source.in_memory,
            Source.in_memory,
            FileType.tsv,
            FileType.text,
        )

2. Customized features through system_output file

  • Users can define customized features and provide their values by adding metadata to their system output files.
  • This works for all data loaders.
  • Example: a system output file with customized features
{
  "metadata": {
    "custom_features": {
      "rel_type": {
        "dtype": "string",
        "description": "symmetric or asymmetric",
        "num_buckets": 2
      }
    }
  },
  "examples": [
    {
      "gold_head": "/m/08966",
      "gold_predicate": "/travel/travel_destination/climate./travel/travel_destination_monthly_climate/month",
      "gold_tail": "/m/05lf_",
      "predict": "tail",
      "predictions": [
        "/m/05lf_",
        "/m/02x_y",
        "/m/01nv4h",
        "/m/02l6h",
        "/m/0kz1h"
      ],
      "rel_type": "asymmetric",
      "example_id": "1"
    },
  ...
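
As a rough illustration of how such a file could be consumed, the sketch below parses a system output, checks that every feature declared under `metadata.custom_features` is present in each example, and counts examples per `rel_type` value. This is not the SDK's own loader code: the helper name `missing_custom_features` and the second example are hypothetical, and the first example is abbreviated from the file above.

```python
import json
from collections import Counter

# The first example is abbreviated from the file above; the second example
# is hypothetical and only added so that both rel_type values appear.
SYSTEM_OUTPUT = """
{
  "metadata": {
    "custom_features": {
      "rel_type": {
        "dtype": "string",
        "description": "symmetric or asymmetric",
        "num_buckets": 2
      }
    }
  },
  "examples": [
    {"gold_head": "/m/08966", "gold_tail": "/m/05lf_", "predict": "tail",
     "predictions": ["/m/05lf_", "/m/02x_y"],
     "rel_type": "asymmetric", "example_id": "1"},
    {"gold_head": "/m/0aaaa", "gold_tail": "/m/0bbbb", "predict": "tail",
     "predictions": ["/m/0cccc", "/m/0bbbb"],
     "rel_type": "symmetric", "example_id": "2"}
  ]
}
"""


def missing_custom_features(data):
    """Return (example_id, feature_name) pairs for declared features an example lacks."""
    declared = data.get("metadata", {}).get("custom_features", {})
    return [
        (example.get("example_id"), name)
        for example in data["examples"]
        for name in declared
        if name not in example
    ]


data = json.loads(SYSTEM_OUTPUT)
missing = missing_custom_features(data)

# Count how many examples fall under each value of the custom feature.
rel_type_counts = Counter(example["rel_type"] for example in data["examples"])
```

Because the feature declaration travels with the file, a check like this needs no external configuration, which is what lets this approach work with any data loader.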

3. Customized features through an additional config file

  • Users can introduce customized features by declaring the feature name in an external global config file.
  • Note that this only works when the DataLab loader is used.
  • Example
{
  "sst2": {
    "custom_features": {
      "example": {
        "label": {
          "dtype": "string",
          "description": "the true label"
        }
      }
    },
...
}
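
A minimal sketch of how the declarations for one dataset might be looked up in such a config. The function name `declared_features` is hypothetical, and treating the `"example"` key as an analysis level is an assumption based only on the layout shown above, not on the SDK's actual code.

```python
import json

# Config fragment taken from the example above, with the elided parts left out.
GLOBAL_CONFIG = """
{
  "sst2": {
    "custom_features": {
      "example": {
        "label": {
          "dtype": "string",
          "description": "the true label"
        }
      }
    }
  }
}
"""


def declared_features(config, dataset, level="example"):
    """Return the feature declarations for a dataset, or {} if none exist.

    `level` is assumed to name the nesting key under "custom_features".
    """
    return config.get(dataset, {}).get("custom_features", {}).get(level, {})


config = json.loads(GLOBAL_CONFIG)
sst2_features = declared_features(config, "sst2")
unknown_features = declared_features(config, "some_other_dataset")
```

Here the lookup is keyed by dataset name, which may be why this mechanism is tied to the DataLab loader: a user-supplied dataset has no such name to look up.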

4. Customized feature function through an additional config file

  • Users can introduce customized feature functions by writing the function as a string in an external global config file.
  • Note that this only works when the DataLab loader is used.
  • Example
{
  "sst2": {
    "label": {
      "dtype": "string",
      "description": "the true label",
      "num_buckets": 2
    },
    "text_len": {
      "dtype": "float",
      "description": "text length",
      "num_buckets": 4,
      "func": "lambda x:len(x['text'].split())"
    },
    "long_text": {
      "dtype": "string",
      "description": "whether a text is long",
      "num_buckets": 2,
      "func": "lambda x:'Long Text' if len(x['text'].split()) > 20 else 'Short Text'"
    }
...
}
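
One way to picture how the string-style functions could be applied is sketched below: each `"func"` string is turned into a callable with `eval` and run on an example. This is an illustration under assumptions, not the SDK's implementation; the helper `compile_feature_funcs` and the sample input are hypothetical. It also shows why question 4 matters, since `eval` executes whatever code the config contains.

```python
import json

# Two of the feature specs from the example above.
GLOBAL_CONFIG = """
{
  "sst2": {
    "text_len": {
      "dtype": "float",
      "description": "text length",
      "num_buckets": 4,
      "func": "lambda x:len(x['text'].split())"
    },
    "long_text": {
      "dtype": "string",
      "description": "whether a text is long",
      "num_buckets": 2,
      "func": "lambda x:'Long Text' if len(x['text'].split()) > 20 else 'Short Text'"
    }
  }
}
"""


def compile_feature_funcs(config, dataset):
    """Turn each "func" string into a callable; specs without "func" are skipped."""
    funcs = {}
    for name, spec in config[dataset].items():
        if "func" in spec:
            # eval runs arbitrary code, so this is only safe for trusted configs.
            funcs[name] = eval(spec["func"])
    return funcs


config = json.loads(GLOBAL_CONFIG)
feature_funcs = compile_feature_funcs(config, "sst2")

# Hypothetical input example with the "text" field the lambdas expect.
example = {"text": "a short and sweet movie", "label": "positive"}
features = {name: func(example) for name, func in feature_funcs.items()}
```

Alternatives that avoid `eval`, such as registering named Python functions and referring to them by name in the config, would trade flexibility for safety.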
pfliu-nlp added the "question" label on Aug 18, 2022
neubig (Contributor) commented Aug 19, 2022

Thanks @pfliu-nlp! I created a design doc outlining some of the user stories we need to consider and some design options. Let's discuss: https://docs.google.com/document/d/1lX74OrTWEjQT3A6RSandRMNKVjvfaqu-ZVl68sj17oE/edit#heading=h.j0odx9syhan
