Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify unit test and validation #63

Merged
merged 81 commits into from
Jul 25, 2024

Conversation

khaledk2
Copy link
Collaborator

@khaledk2 khaledk2 commented Nov 7, 2022

This PR adds restoring the database from a file and indexing the database to the GitHub actions.
It will restore an Omero database from backup
Indexing the data
Run the unit tests, I have added some queries to validate the results it will test:

  • equals
  • not_equals
  • contains
  • not_contrains
  • in
  • not_in
  • query to return the result for a specific owner
  • query to return the result for a specific group
  • query to return the result for a specific owner and group
  • search for any key
  • available values_for a key

This PR depended on PR #59.

This PR includes also a method to check key-value pairs for tailing and heading space. It also checks the key-value pair duplication, i.e. the image has more than one identical key with the same value.
It can check all the projects and screens. Also, it can run for a specific project or screen.
The output is CSV files; each check usually generates three files. The main file contains image details (e.g. image id) the screen or project name, the key and the value. File for screens and one for projects. Each file contains the screen name (project name), the key-value which has the issue and the total number of affected images for each row.
The search engine saves the output files at the /data/searchengine/searchengine/ folder .

Examples:
This command will check the database for the projects whose names contains idr0118-keenan-flylightsheet:

sudo docker run --rm -v /data/searchengine/searchengine/:/etc/searchengine/ -v /data/searchengine/searchengine/logs/:/opt/app-root/src/logs/ --network=searchengine-net khaledk2/searchengine:latest data_validator -p idr0140-ho-stressresponse -p idr0118-keenan-flylightsheet

The following command will check screens their name contain idr0093-mueller-perturbation

sudo docker run --rm -v /data/searchengine/searchengine/:/etc/searchengine/ -v /data/searchengine/searchengine/logs/:/opt/app-root/src/logs/ --network=searchengine-net khaledk2/searchengine:latest data_validator -s idr0093-mueller-perturbation

This command is another command to check the database for the projects whose names contains idr0043-uhlen-humanproteinatlas:

sudo docker run --rm -v /data/searchengine/searchengine/:/etc/searchengine/ -v /data/searchengine/searchengine/logs/:/opt/app-root/src/logs/ --network=searchengine-net khaledk2/searchengine:latest data_validator -p idr0043-uhlen-humanproteinatlas

Finally, the following command will check all the projects and screens inside the database:

sudo docker run --rm -v /data/searchengine/searchengine/:/etc/searchengine/ -v /data/searchengine/searchengine/logs/:/opt/app-root/src/logs/ --network=searchengine-net khaledk2/searchengine:latest data_validator

@jburel
Copy link
Member

jburel commented Nov 28, 2022

This will be reviewed after #61
We first need to have the multi-nodes in place to make sure it is working before adding new operators

@khaledk2 khaledk2 force-pushed the modify_unit_test_validation branch from d8e71a6 to 6146d5a Compare February 10, 2023 21:43
@jburel jburel requested review from dominikl and jburel March 20, 2023 10:10
@khaledk2
Copy link
Collaborator Author

@jburel I have deployed this branch on Idr-testing (merged locally with the rocky Linux branch). Could you please have a look? I think this one has been around for a while, it improved the test and added a test for indexing.

@joshmoore
Copy link
Member

A heads up, @khaledk2, that the 1.7 GB of app_data/ in this PR (a) causes GitHub to be unhappy with the diff and (b) will be something that we will need to carry for with us for the foreseeable future.

@joshmoore
Copy link
Member

Thanks, @khaledk2. GH diff now renders! 🎉

.github/workflows/main.yml Outdated Show resolved Hide resolved
.github/workflows/main.yml Outdated Show resolved Hide resolved
wget https://downloads.openmicroscopy.org/images/omero_db_searchengine.zip -P app_data
unzip app_data/omero_db_searchengine.zip -d app_data/
#cat app_data/backup_parts/backupa? > app_data/omero.pgdump
#run restore omero database
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#run restore omero database
# run restore omero database

.github/workflows/main.yml Outdated Show resolved Hide resolved
examples/using_in_operator.py Outdated Show resolved Hide resolved
unit_tests/__init__.py Outdated Show resolved Hide resolved
unit_tests/test_app.py Outdated Show resolved Hide resolved
unit_tests/test_app.py Outdated Show resolved Hide resolved
value = case[1]
validator = Validator(deep_check)
validator.set_simple_query(resource, name, value)
validator.get_results_postgres("equals")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be adjusted after method renaming

unit_tests/test_data.py Outdated Show resolved Hide resolved
examples/using_not_in_operator.py Outdated Show resolved Hide resolved
manage.py Outdated
@@ -148,7 +160,8 @@ def get_index_data_from_database(resource="all"):
test_indexing_search_query(deep_check=False, check_studies=True)

# backup the index data
backup_elasticsearch_data()
if not nobackup:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double negative is a bit strange
should the variable be called backup instead

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is possible to fix this double negative?

manage.py Outdated Show resolved Hide resolved
manage.py Outdated
It also checks the key-value pair duplication.
It can check all the projects and screens.
Also, it can run for a specific project or screen.
The output is a collection of CSV files; each check usually generates three files:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The output is a collection of CSV files; each check usually generates three files:
The output is a collection of CSV files; each check usually generates three files:

df.groupby(["screen_name", "name", "value"])
.size()
.reset_index()
.rename(columns={0: "no of images"})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.rename(columns={0: "no of images"})
.rename(columns={0: "number of images"})

manage.py Outdated
@@ -148,7 +160,8 @@ def get_index_data_from_database(resource="all"):
test_indexing_search_query(deep_check=False, check_studies=True)

# backup the index data
backup_elasticsearch_data()
if not nobackup:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is possible to fix this double negative?


def restore_database():
"""
restote the database from a database dump file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This typo suggestion was missed

)
tail_space_results = conn.execute_query(sql_statment)
if len(tail_space_results) == 0:
search_omero_app.logger.info("No results is availlable for trailing space")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
search_omero_app.logger.info("No results is availlable for trailing space")
search_omero_app.logger.info("No results available for trailing space")

search_omero_app.logger.info("No results is availlable for trailing space")
return
search_omero_app.logger.info("Generate for trailing space ...")
genrate_reports(tail_space_results, "tailing_space", screen_name, project_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
genrate_reports(tail_space_results, "tailing_space", screen_name, project_name)
genrate_reports(tail_space_results, "trailing_space", screen_name, project_name)



def check_duplicated_keyvalue_pairs(screen_name, project_name):
search_omero_app.logger.info("Checking for duplicated key-value pairs ...")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
search_omero_app.logger.info("Checking for duplicated key-value pairs ...")
search_omero_app.logger.info("Checking for duplicated key-value pairs...")

if not os.path.isdir(base_folder):
base_folder = os.path.expanduser("~")
if write_report:
base_folder = "/etc/searchengine/"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The location of the base folder is hard coded in a few places.
I think we need to use a configuration variable

query_in,
images_keys,
images_value_parts,
contains_not_contains_quries,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
contains_not_contains_quries,
contains_not_contains_queries,

validator.searchengine_results.get("total_number_of_buckets"),
)

def test_contains_not_contains_quries(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_contains_not_contains_quries(self):
def test_contains_not_contains_queries(self):

)

def test_contains_not_contains_quries(self):
for resource, cases in contains_not_contains_quries.items():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for resource, cases in contains_not_contains_quries.items():
for resource, cases in contains_not_contains_queries.items():

unit_tests/test_data.py Outdated Show resolved Hide resolved
@jburel
Copy link
Member

jburel commented Jul 23, 2024

Running

sudo docker run --rm  -v /data/searchengine/searchengine/:/etc/searchengine/  -v /data/searchengine/searchengine/logs/:/opt/app-root/src/logs/  --network=searchengine-net khaledk2/searchengine:latest data_validator -p idr0140-ho-stressresponse -p idr0118-keenan-flylightsheet

leads to some syntax error

data_validator -p idr0140-ho-stressresponse -p idr0118-keenan-flylightsheet
run_app.sh: line 12: [: syntax error: `-p' unexpected
data_validator -p idr0140-ho-stressresponse -p idr0118-keenan-flylightsheet
run_app.sh: line 12: [: syntax error: `-p' unexpected
/searchengine/configurations/configuration.py:30: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  cofg = yaml.load(f)
/searchengine/configurations/configuration.py:30: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  cofg = yaml.load(f)
/usr/local/lib/python3.9/site-packages/elasticsearch/connection/http_urllib3.py:209: UserWarning: Connecting to https://10.11.0.2:9200 using SSL with verify_certs=False is insecure.
  warnings.warn(
/usr/local/lib/python3.9/site-packages/elasticsearch/connection/http_urllib3.py:209: UserWarning: Connecting to https://10.11.0.3:9200 using SSL with verify_certs=False is insecure.
  warnings.warn(
/usr/local/lib/python3.9/site-packages/elasticsearch/connection/http_urllib3.py:209: UserWarning: Connecting to https://10.11.0.4:9200 using SSL with verify_certs=False is insecure.
  warnings.warn(
[2024-07-23 13:59:33,936] INFO in __init__: app assistant startup
Injecting config variables from :/etc/searchengine/.app_config.yml
'bool' object has no attribute 'read'
Injecting config variables from :/etc/searchengine/.app_config.yml
'bool' object has no attribute 'read'
INFO:numexpr.utils:Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
usage: manage.py [-?]
                 {show_saved_indices,delete_es_index,delete_all_data_from_es_index,add_resource_data_to_es_index,create_index,get_index_data_from_database,set_database_configuration,set_elasticsearch_configuration,set_elasticsearch_password,set_verify_certs,set_cache_folder,set_elasticsearch_backup_folder,set_idr_test_file,set_cache_rows_number,set_searchengine_secret_key,set_max_page,set_no_processes,cache_key_value_index,test_indexing_search_query,backup_elasticsearch_data,restore_elasticsearch_data,test_container_key_value,get_search_terms_froget_all_indexes_from_elasticsearchm_log,update_container_no_of_images,shell,runserver}
                 ...
manage.py: error: argument {show_saved_indices,delete_es_index,delete_all_data_from_es_index,add_resource_data_to_es_index,create_index,get_index_data_from_database,set_database_configuration,set_elasticsearch_configuration,set_elasticsearch_password,set_verify_certs,set_cache_folder,set_elasticsearch_backup_folder,set_idr_test_file,set_cache_rows_number,set_searchengine_secret_key,set_max_page,set_no_processes,cache_key_value_index,test_indexing_search_query,backup_elasticsearch_data,restore_elasticsearch_data,test_container_key_value,get_search_terms_froget_all_indexes_from_elasticsearchm_log,update_container_no_of_images,shell,runserver}: invalid choice: 'data_validator' (choose from 'show_saved_indices', 'delete_es_index', 'delete_all_data_from_es_index', 'add_resource_data_to_es_index', 'create_index', 'get_index_data_from_database', 'set_database_configuration', 'set_elasticsearch_configuration', 'set_elasticsearch_password', 'set_verify_certs', 'set_cache_folder', 'set_elasticsearch_backup_folder', 'set_idr_test_file', 'set_cache_rows_number', 'set_searchengine_secret_key', 'set_max_page', 'set_no_processes', 'cache_key_value_index', 'test_indexing_search_query', 'backup_elasticsearch_data', 'restore_elasticsearch_data', 'test_container_key_value', 'get_search_terms_froget_all_indexes_from_elasticsearchm_log', 'update_container_no_of_images', 'shell', 'runserver')

@jburel
Copy link
Member

jburel commented Jul 23, 2024

 sudo docker run --rm  -v /data/searchengine/searchengine/:/etc/searchengine/  -v /data/searchengine/searchengine/logs/:/opt/app-root/src/logs/  --network=searchengine-net khaledk2/searchengine:latest data_validator -s idr0093-mueller-perturbation
data_validator -s idr0093-mueller-perturbation
data_validator -s idr0093-mueller-perturbation
run_app.sh: line 12: [: syntax error: `-s' unexpected
run_app.sh: line 12: [: syntax error: `-s' unexpected
/searchengine/configurations/configuration.py:30: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  cofg = yaml.load(f)
/searchengine/configurations/configuration.py:30: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  cofg = yaml.load(f)
/usr/local/lib/python3.9/site-packages/elasticsearch/connection/http_urllib3.py:209: UserWarning: Connecting to https://10.11.0.2:9200 using SSL with verify_certs=False is insecure.
  warnings.warn(
/usr/local/lib/python3.9/site-packages/elasticsearch/connection/http_urllib3.py:209: UserWarning: Connecting to https://10.11.0.3:9200 using SSL with verify_certs=False is insecure.
  warnings.warn(
/usr/local/lib/python3.9/site-packages/elasticsearch/connection/http_urllib3.py:209: UserWarning: Connecting to https://10.11.0.4:9200 using SSL with verify_certs=False is insecure.
  warnings.warn(
[2024-07-23 14:08:50,418] INFO in __init__: app assistant startup
Injecting config variables from :/etc/searchengine/.app_config.yml
'bool' object has no attribute 'read'
Injecting config variables from :/etc/searchengine/.app_config.yml
'bool' object has no attribute 'read'
INFO:numexpr.utils:Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
usage: manage.py [-?]
                 {show_saved_indices,delete_es_index,delete_all_data_from_es_index,add_resource_data_to_es_index,create_index,get_index_data_from_database,set_database_configuration,set_elasticsearch_configuration,set_elasticsearch_password,set_verify_certs,set_cache_folder,set_elasticsearch_backup_folder,set_idr_test_file,set_cache_rows_number,set_searchengine_secret_key,set_max_page,set_no_processes,cache_key_value_index,test_indexing_search_query,backup_elasticsearch_data,restore_elasticsearch_data,test_container_key_value,get_search_terms_froget_all_indexes_from_elasticsearchm_log,update_container_no_of_images,shell,runserver}
                 ...
manage.py: error: argument {show_saved_indices,delete_es_index,delete_all_data_from_es_index,add_resource_data_to_es_index,create_index,get_index_data_from_database,set_database_configuration,set_elasticsearch_configuration,set_elasticsearch_password,set_verify_certs,set_cache_folder,set_elasticsearch_backup_folder,set_idr_test_file,set_cache_rows_number,set_searchengine_secret_key,set_max_page,set_no_processes,cache_key_value_index,test_indexing_search_query,backup_elasticsearch_data,restore_elasticsearch_data,test_container_key_value,get_search_terms_froget_all_indexes_from_elasticsearchm_log,update_container_no_of_images,shell,runserver}: invalid choice: 'data_validator' (choose from 'show_saved_indices', 'delete_es_index', 'delete_all_data_from_es_index', 'add_resource_data_to_es_index', 'create_index', 'get_index_data_from_database', 'set_database_configuration', 'set_elasticsearch_configuration', 'set_elasticsearch_password', 'set_verify_certs', 'set_cache_folder', 'set_elasticsearch_backup_folder', 'set_idr_test_file', 'set_cache_rows_number', 'set_searchengine_secret_key', 'set_max_page', 'set_no_processes', 'cache_key_value_index', 'test_indexing_search_query', 'backup_elasticsearch_data', 'restore_elasticsearch_data', 'test_container_key_value', 'get_search_terms_froget_all_indexes_from_elasticsearchm_log', 'update_container_no_of_images', 'shell', 'runserver')

@jburel
Copy link
Member

jburel commented Jul 23, 2024

Syntax error in the bash script

@khaledk2
Copy link
Collaborator Author

khaledk2 commented Jul 23, 2024

Please use this image which contains the changes for this PR
khaledk2/searchengine:local
It does not support checking more than one container at a time, if more than one is provided the latest one will be checked.

@jburel
Copy link
Member

jburel commented Jul 24, 2024

Using khaledk2/searchengine:local`` instead of khaledk2/searchengine:latest``` leads to similar error

sudo docker run --rm  -v /data/searchengine/searchengine/:/etc/searchengine/  -v /data/searchengine/searchengine/logs/:/opt/app-root/src/logs/  --network=searchengine-net khaledk2/searchengine:local data_validator -p idr0140-ho-stressresponse -p idr0118-keenan-flylightsheet
data_validator -p idr0140-ho-stressresponse -p idr0118-keenan-flylightsheet
data_validator -p idr0140-ho-stressresponse -p idr0118-keenan-flylightsheet
run_app.sh: line 12: [: syntax error: `-p' unexpected
run_app.sh: line 12: [: syntax error: `-p' unexpected
/searchengine/configurations/configuration.py:30: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  cofg = yaml.load(f)
/searchengine/configurations/configuration.py:30: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  cofg = yaml.load(f)
/usr/local/lib/python3.9/site-packages/elasticsearch/connection/http_urllib3.py:209: UserWarning: Connecting to https://10.11.0.2:9200 using SSL with verify_certs=False is insecure.
  warnings.warn(
/usr/local/lib/python3.9/site-packages/elasticsearch/connection/http_urllib3.py:209: UserWarning: Connecting to https://10.11.0.3:9200 using SSL with verify_certs=False is insecure.
  warnings.warn(
/usr/local/lib/python3.9/site-packages/elasticsearch/connection/http_urllib3.py:209: UserWarning: Connecting to https://10.11.0.4:9200 using SSL with verify_certs=False is insecure.
  warnings.warn(
[2024-07-24 12:21:07,517] INFO in __init__: app assistant startup
[2024-07-24 12:21:07,866] INFO in omero_keyvalue_data_validator: Checking for trailing space ...
[2024-07-24 12:21:07,948] INFO in omero_keyvalue_data_validator: Generate for trailing space ...
[2024-07-24 12:21:07,966] INFO in omero_keyvalue_data_validator: project_name        idr0118-keenan-flylightsheet/experimentA
name                                                  Strain
value                                              Oregon R 
number of images                                          20
dtype: object
[2024-07-24 12:21:07,967] INFO in omero_keyvalue_data_validator: Checking for heading space ...
[2024-07-24 12:21:08,027] INFO in omero_keyvalue_data_validator: Generate for head space ...
[2024-07-24 12:21:08,042] INFO in omero_keyvalue_data_validator: project_name        idr0118-keenan-flylightsheet/experimentAidr011...
name                Genetic CrossGenetic CrossGenetic CrossGenetic...
value                +;+;His-GFP/MCP-mCherry (virgin female) x (ma...
number of images                                                   20
dtype: object
[2024-07-24 12:21:08,044] INFO in omero_keyvalue_data_validator: Checking for duplicated key-value pairs...
[2024-07-24 12:21:08,085] INFO in omero_keyvalue_data_validator: No results available for duplicated key-value pairs
Injecting config variables from :/etc/searchengine/.app_config.yml
'bool' object has no attribute 'read'
Injecting config variables from :/etc/searchengine/.app_config.yml
'bool' object has no attribute 'read'
start: 2024-07-24 12:21:07.866303, start1: 2024-07-24 12:21:07.967847, start2: 2024-07-24 12:21:08.044018, end: 2024-07-24 12:21:08.086505

@jburel
Copy link
Member

jburel commented Jul 25, 2024

Discussed with @khaledk2 yesterday, the warnings in the bash script are not introduced by this PR.
The testing commands work so the cleaning can happen in a follow-up PR
Merging to the work can move forward

@jburel jburel merged commit 547bead into ome:main Jul 25, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants