Add HELIOS as a new non-standard vocabulary - without mappings #1089

Open
DrUrwin opened this issue Jan 10, 2025 · 8 comments

@DrUrwin

DrUrwin commented Jan 10, 2025

HELIOS

The HELIOS (Health for Life in Singapore) Study (https://www.ntu.edu.sg/helios; https://www.healthforlife.sg/) is a state-of-the-art prospective cohort study, established and led by LKCMedicine, and involving both National Healthcare Group (NHG) and Imperial College London. The plan is to identify the genetic and environmental factors that underpin development of obesity, diabetes, cardiovascular disease and other complex diseases in Singapore. The ultimate goal is to use the knowledge generated to develop new approaches for prediction, prevention, early detection and better treatment of these chronic diseases.

The study, which targets 10,000 Singaporeans from the three main groups (Chinese, Malay, and Indian), is now well underway. At the baseline visit, we collect comprehensive, high-quality phenotypic information from each participant, comprising health and lifestyle questionnaires, physical measurements, and extensive physiological and imaging data. In addition, biological samples are collected and a panel of biological markers is measured in the blood. Participants will then be followed up long-term to identify changes in health status, including new-onset diseases.

The unique and extensive phenotypic measurements collected, together with biological samples and long-term follow-up, will enable investigation of how environmental, lifestyle and genetic factors interact to influence subsequent disease risk. The HELIOS study provides a powerful resource for medical research across a wide range of disciplines that will be accessible to biomedical researchers worldwide. It now also contributes to a bigger national agenda through SG100K.

HELIOS Non-Standard Vocabulary

We propose to add the HELIOS vocabulary as a non-standard OMOP vocabulary without relationship mappings. The HELIOS dataset contains a large number of data attributes that are very difficult to represent using the existing standard and non-standard OMOP vocabulary concepts. As a result, creating a comprehensive, precise and accurate HELIOS OMOP representation is currently challenging.

The aim of the HELIOS non-standard vocabulary is to provide a formal, controlled non-standard vocabulary to which the HELIOS dataset can be mapped. The intention is to develop a full HELIOS OMOP database that is discoverable, accessible and reusable both within Singapore and internationally.

We propose the following approach to the new HELIOS non-standard vocabulary:
1: Add the HELIOS non-standard vocabulary without mappings;
2: Over time, iteratively map and update the HELIOS vocabulary to standard OMOP concepts.

The effort is led by Nanyang Technological University (Professor John Chambers, Dr Theresia Handayani Mina, Dr Xiaoyan Wang and Joann Tan Shi Ling) in association with the University of Nottingham (Professor Phil Quinlan and Dr Esmond Urwin). We look forward to working with the OHDSI vocabulary team and working group. Thank you in advance for your time and effort; it is appreciated.

@DrUrwin
Author

DrUrwin commented Jan 10, 2025

The Template 4 file for the HELIOS non-standard vocabulary has been uploaded here:

https://docs.google.com/spreadsheets/d/1QiqOWpZ0Zrz5ML8Vv0Na90VrvloosAVV/edit?usp=drive_link&ouid=113956199466318507399&rtpof=true&sd=true

File Name: 1089_template_4_adding_vocabulary_HELIOS_20250110.xlsx

@cgreich
Contributor

cgreich commented Jan 10, 2025

Friends:

Very nice. Couple points/questions. But if this is Christian asking stupid questions in the 11th hour feel free to ignore. :-)

Looks like there are two additions, one is about atypical visits, and the other is a survey with question-answer pairs. Is that correct?

  1. All those new "visits" to me look like they are NHS based, ie. from the UK. What have they got to do with Singapore? Or do you want to adopt those?
  2. I wouldn't put them into the Visit domain, standard or not. For example, "Penal establishment or police station" is not an interaction with the healthcare system, and therefore not a visit. But it is a perfectly good factor influencing health outcomes. Why not create standard Observation concepts? Non-standard concepts really should not get a lot of knowledge engineering attention.
  3. The survey ones: I see you pre-coordinated the answers with the question (e.g. "Cannot get to sleep within 30 minutes: Less than once a week"). Nice. But why do you need the question as a stand-alone concept ("Cannot get to sleep within 30 minutes")?
  4. Reconsider the Flavors of NULL, and if you really need them: "Cannot get to sleep within 30 minutes: None". If folks always sleep within 30 minutes - good for them. No need to have a record. Enjoy your life.

@DrUrwin
Author

DrUrwin commented Jan 13, 2025

Dear @cgreich and Vocab Team,

Thanks for the reply, if I may answer the points raised:

  • Point 1: This is in relation to the 'example' sheets that are part of Template 4. Solution = I have removed the 'example' sheets from the Vocabulary Spreadsheet.
  • Point 2: This is in relation to the 'example' sheets that are part of Template 4. Solution = I have removed the 'example' sheets from the Vocabulary Spreadsheet.
  • Point 3: We would like to have questions as stand-alone concepts so that we can represent data for discoverability purposes. If people query OMOP datasets internationally to find out who holds what and where, we would like results to be returned for the questions those datasets have asked, not only for the answers, i.e., dataset x has results relating to the OMOP Concept ID 'question Y'.
  • Point 4: We would like the vocabulary to contain concepts that represent negative answers. The point of this is to be able to affirm that people have indeed answered no / negative to a question, rather than assume that a null represents a negative answer (even though this might be missing data).

As ever, many thanks in advance.

@TinyRickC137
Contributor

Dear DrUrwin,

Thank you for your contribution - it looks like great work!

We will need to dive deeper into the content. The only thing that we have to mention is that, unfortunately, the doors for the February release are already closed. We stop accepting submissions two months prior to the release date. This time is allocated for integrating the vocabularies and the final QA-QC process.

Maybe you could come to the vocabulary WG and discuss questions with @cgreich there? Tagging WG lead @aostropolets

@cgreich
Contributor

cgreich commented Jan 13, 2025

@DrUrwin:

I have removed the 'example' sheets from the Vocabulary Spreadsheet

Meaning, this is not relevant to what you need, correct?

for discoverability purposes.

Not sure you need that. You have the pre-coordinated question-answer pairs, and you have the vocabulary_id.
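One way to read this point, as a minimal sketch: if every concept name is a pre-coordinated "question: answer" string, the questions a dataset covers could be derived by filtering on vocabulary_id and splitting the name. The rows, concept IDs, the second and third concept names, and the ": " separator convention below are hypothetical and for illustration only; they are not taken from the submitted template.

```python
# Hypothetical concept rows: (concept_id, concept_name, vocabulary_id).
# IDs and most names are invented for illustration; they are not real OMOP concepts.
CONCEPTS = [
    (2000000001, "Cannot get to sleep within 30 minutes: Less than once a week", "HELIOS"),
    (2000000002, "Cannot get to sleep within 30 minutes: Three or more times a week", "HELIOS"),
    (2000000003, "Wake up in the middle of the night: Once or twice a week", "HELIOS"),
    (2000000004, "Some unrelated concept", "OTHER"),
]


def questions_covered(concepts, vocabulary_id):
    """Return the distinct question stems found in one vocabulary.

    Assumes names follow a 'question: answer' convention and that the
    question part itself never contains ': '.
    """
    stems = set()
    for _concept_id, name, vocab in concepts:
        if vocab != vocabulary_id:
            continue
        # Split on the first ': ' only; sep is empty if the name has no answer part.
        question, sep, _answer = name.partition(": ")
        if sep:
            stems.add(question)
    return sorted(stems)


QUESTIONS = questions_covered(CONCEPTS, "HELIOS")
print(QUESTIONS)
```

Whether string splitting is acceptable for discoverability, compared with dedicated question concepts, is exactly the design choice under discussion here.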

rather than assume that a null represents a negative answer

Understood, but that's the nature of RWD. Axiom 1: If something happened you have a record. Axiom 2: If you don't have a record it did not happen. That's no different to myocardial infarction or stage 4 breast cancer: you don't have to assert every day that the patient did not have those.

@DrUrwin
Author

DrUrwin commented Jan 17, 2025

Thanks for the great replies.

@TinyRickC137, yes, I will attend the next Vocab WG meeting on the 21st Jan. My plan this year is to attend a lot of the meetings.

@cgreich, yes, those removed sheets are not relevant (thanks). I understand the point of not mapping to something that has not happened - which is what we currently do when mapping to OMOP for a number of research cohorts. However, when a null can represent either a negative response to a question or no/missing data, then that is potentially ambiguous. Would being able to map to a negative response remove the ambiguity, so that nulls would mean missing data (just a thought)? This could be useful for questionnaire-related cohort data.
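The distinction being raised can be sketched as follows, assuming explicit negative answers are stored. With them, a (person, question) pair can be in one of three states; without them, the last two states collapse into one. The response data, question key, and the "None" answer label are hypothetical.

```python
from enum import Enum


class Answer(Enum):
    POSITIVE = "positive"   # a record exists and affirms something
    NEGATIVE = "negative"   # a record exists and asserts absence
    MISSING = "missing"     # no record at all


# Hypothetical survey responses keyed by (person_id, question).
RESPONSES = {
    (1, "Cannot get to sleep within 30 minutes"): "Less than once a week",
    (2, "Cannot get to sleep within 30 minutes"): "None",
    # person 3 has no record for this question
}


def classify(person_id, question, responses, negative_values=frozenset({"None"})):
    """Classify one (person, question) pair into the three states above."""
    value = responses.get((person_id, question))
    if value is None:
        return Answer.MISSING
    return Answer.NEGATIVE if value in negative_values else Answer.POSITIVE


QUESTION = "Cannot get to sleep within 30 minutes"
for pid in (1, 2, 3):
    print(pid, classify(pid, QUESTION, RESPONSES).value)
```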

As always thank you and happy to discuss 👍

@cgreich
Contributor

cgreich commented Jan 17, 2025

@DrUrwin:

It is potentially ambiguous. But it is the default assumption, with which all methods work. They are based on rates (incidence or prevalence) of things, and their calculation is based on the formula counts_of_records/some_denominator. If we have to start excluding the negative we are screwed. All methods would fail, and if we were to fix them, they would have the performance of a snail.

It's called the Closed World assumption.
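A minimal sketch of why this matters, using made-up data: under the closed-world assumption, prevalence is the number of persons with a record divided by the denominator. Once explicit negative answers are stored as records, the same count overstates prevalence unless every query filters them out. The person IDs, concept labels, cohort size, and the NEGATIVE_CONCEPTS set are all hypothetical.

```python
# Hypothetical observation records: (person_id, concept_label).
RECORDS = [
    (1, "sleep_problem: less than once a week"),
    (2, "sleep_problem: three or more times a week"),
    (3, "sleep_problem: none"),  # explicit negative answer stored as a record
    (4, "sleep_problem: none"),
]
COHORT_SIZE = 10  # denominator: all persons in the cohort

# Labels that assert absence; a standard method has no way of knowing about these.
NEGATIVE_CONCEPTS = frozenset({"sleep_problem: none"})


def prevalence(records, denominator, exclude=frozenset()):
    """counts_of_records / some_denominator, counting each person once."""
    persons = {pid for pid, label in records if label not in exclude}
    return len(persons) / denominator


# Closed-world reading: every record means "it happened".
NAIVE = prevalence(RECORDS, COHORT_SIZE)
# Corrected reading: negatives must be excluded explicitly, in every query.
FILTERED = prevalence(RECORDS, COHORT_SIZE, exclude=NEGATIVE_CONCEPTS)
print(NAIVE, FILTERED)
```

The extra `exclude` step is the per-query cost referred to above: each analysis would need to know which concepts are negatives before it can count anything.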

In surveys, it is routinely violated. Which means you cannot use standard analytics. You have to build your own method. Do you want to go that route?

@DrUrwin
Author

DrUrwin commented Jan 17, 2025

@cgreich, thanks indeed, and understood: the performance of a snail is certainly not desired, nor is having all methods fail.


3 participants