File Format: Clinical Data

Mark Keller edited this page Feb 1, 2019 · 1 revision

A clinical data file contains rows corresponding to patients and columns corresponding to clinical variables (as specified in the clinical metadata file).

The following columns are required:

  • Patient: The patient ID.

The following columns receive special handling if present in a clinical file:

  • Days to Death, Days to Last Followup: Used to construct Kaplan-Meier plots.
  • ICD-O-3 Site Code, ICD-O-3 Site Description: If a code and description are both present for a patient, the description will be appended to the site code in parentheses (e.g. if the Code value is C67.4 and the Description value is URINARY BLADDER, the resulting Code value will be C67.4 (URINARY BLADDER)).
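  • ICD-O-3 Histology Code, ICD-O-3 Histology Description: Same behavior as for the site columns: if both are present for a patient, the description is appended to the histology code in parentheses.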
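
The code-plus-description behavior described for the ICD-O-3 columns can be sketched roughly as follows. This is only an illustration, not the tool's actual implementation: the function name `combine_code_and_description` is hypothetical, and returning the code unchanged when the description is missing is an assumption.

```python
# Values treated as "unknown" (empty cell or the string nan).
MISSING = {"", "nan"}


def combine_code_and_description(code, description):
    """Return 'CODE (DESCRIPTION)' when both values are present.

    Assumption: if either value is missing, the code is returned unchanged.
    """
    code = (code or "").strip()
    description = (description or "").strip()
    if code in MISSING or description in MISSING:
        return code
    return f"{code} ({description})"


print(combine_code_and_description("C67.4", "URINARY BLADDER"))
# prints "C67.4 (URINARY BLADDER)"
```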

Unknown values should be left as empty cells or as cells containing the string nan.
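
When reading a clinical file in your own scripts, both conventions for unknown values can be mapped to a single representation. The sketch below is one possible approach; the helper name `normalize_unknown` is hypothetical, and matching the string nan case-sensitively is an assumption.

```python
def normalize_unknown(value):
    """Map an empty cell or the string 'nan' to None; return other values stripped.

    Assumption: 'nan' is matched exactly (lowercase), as written in this page.
    """
    if value is None:
        return None
    text = value.strip()
    if text == "" or text == "nan":
        return None
    return text


row = {"Patient": "TCGA-2F-A902", "Age": "14", "Sex": "Male", "Smoking": "nan"}
cleaned = {column: normalize_unknown(cell) for column, cell in row.items()}
print(cleaned["Smoking"])  # prints "None"
```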

An example clinical file containing variables for Age, Sex, and Smoking might look as follows:

Patient       Age  Sex     Smoking
TCGA-2F-A901  89   Female  Non-smoker
TCGA-2F-A902  14   Male    nan
TCGA-2G-B854       Female  Smoker
...           ...  ...     ...
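
A file like the example can be loaded and checked for the required Patient column with a few lines of Python. This is a minimal sketch under stated assumptions: this page does not specify the delimiter, so tab-separated input is assumed here, and `read_clinical` is a hypothetical helper name.

```python
import csv
import io

# The example rows, written out as tab-separated text (the delimiter is an assumption).
EXAMPLE = (
    "Patient\tAge\tSex\tSmoking\n"
    "TCGA-2F-A901\t89\tFemale\tNon-smoker\n"
    "TCGA-2F-A902\t14\tMale\tnan\n"
    "TCGA-2G-B854\t\tFemale\tSmoker\n"
)


def read_clinical(handle, delimiter="\t"):
    """Read a clinical file into a dict keyed by patient ID."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    # Patient is the only required column.
    if "Patient" not in (reader.fieldnames or []):
        raise ValueError("clinical file is missing the required 'Patient' column")
    return {row["Patient"]: row for row in reader}


patients = read_clinical(io.StringIO(EXAMPLE))
print(len(patients))  # prints "3"
```

The unknown Age for TCGA-2G-B854 is read as an empty string, and the unknown Smoking value for TCGA-2F-A902 as the string nan.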