Modify unit test and validation #63

Merged
merged 81 commits into from
Jul 25, 2024
2199410
Support 'in' and 'not in' operators
khaledk2 Oct 3, 2022
bcd66e1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 4, 2022
5afd89f
Support 'in' and 'not in' operators
khaledk2 Oct 3, 2022
8a8532c
Add test indexing and queries to unit tests and add validation to key…
khaledk2 Nov 5, 2022
50d8327
Add port number to restore database
khaledk2 Nov 5, 2022
1ec4d94
Update utils.py
khaledk2 Nov 6, 2022
d24b6f5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 7, 2022
45849be
Add unite test for contains and not contains
khaledk2 Nov 7, 2022
dfc0578
turn on deep check for unit tests
khaledk2 Nov 7, 2022
ef0d7e7
add unit tesst to onwer, group and 'owner and group' filters, clean t…
khaledk2 Nov 9, 2022
fe0c411
Improve documentation
khaledk2 Nov 16, 2022
de129a6
Fix test no of images for container
khaledk2 Feb 7, 2023
c3f345c
Add backup flag to test get_index_data_from_database
khaledk2 Feb 7, 2023
6146d5a
Check no of images inside container using id
khaledk2 Feb 10, 2023
3ef83ee
Fix unit test issues
khaledk2 Feb 21, 2023
61162a6
Merge branch 'main' into modify_unit_test_validation
khaledk2 Jan 21, 2024
750670c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 21, 2024
9e5e4f0
Fix code syntax
khaledk2 Jan 21, 2024
dd5cd00
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 21, 2024
16a8471
Address J-M comments
khaledk2 Jul 10, 2024
c14c3b2
Download the database backup rather than has it on the project repo
khaledk2 Jul 11, 2024
0daaac4
fix comments
khaledk2 Jul 11, 2024
a8ac334
use the download url from openmicroscopy
khaledk2 Jul 11, 2024
f6cef9a
Add test indexing and queries to unit tests and add validation to key…
khaledk2 Nov 5, 2022
8376c29
Add port number to restore database
khaledk2 Nov 5, 2022
f754500
Update utils.py
khaledk2 Nov 6, 2022
7cb2dd2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 7, 2022
a454622
Add unite test for contains and not contains
khaledk2 Nov 7, 2022
c1f392a
turn on deep check for unit tests
khaledk2 Nov 7, 2022
735ec87
add unit tesst to onwer, group and 'owner and group' filters, clean t…
khaledk2 Nov 9, 2022
22c23d3
Improve documentation
khaledk2 Nov 16, 2022
8fd65dd
Fix test no of images for container
khaledk2 Feb 7, 2023
357b90f
Add backup flag to test get_index_data_from_database
khaledk2 Feb 7, 2023
3da42e1
Check no of images inside container using id
khaledk2 Feb 10, 2023
6e7bca6
Fix unit test issues
khaledk2 Feb 21, 2023
95a229f
[pre-commit.ci] pre-commit autoupdate
pre-commit-ci[bot] Feb 7, 2023
e9331fd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 7, 2023
39e7902
add rtd config
jburel May 20, 2023
77af6be
add doc badge
jburel May 20, 2023
84ec411
update developer documents
khaledk2 May 17, 2023
db811f8
Get container keys, and get values for a key in a container
khaledk2 Jan 21, 2023
2ce0f8c
add method to check conainer ket and values
khaledk2 Jan 22, 2023
304e5d8
Fix pre commit format
khaledk2 Jan 22, 2023
4ff76ef
Update test container key values
khaledk2 Jan 27, 2023
ae45c3e
Get container keys, and get values for a key in a container
khaledk2 Jan 21, 2023
363720c
add method to check conainer ket and values
khaledk2 Jan 22, 2023
4fc3d4e
Fix pre commit fix
khaledk2 Jan 27, 2023
76652cf
adding the option to generate a CSV file
khaledk2 Jan 31, 2023
af7186d
add comments
khaledk2 Jan 31, 2023
1b45c17
Fix typo
khaledk2 Feb 6, 2023
0f31cf6
Fix typo in file name
khaledk2 Mar 1, 2023
6e3594d
update changelog
khaledk2 Jun 12, 2023
7f5224b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 12, 2023
0db69de
Mention the PR in the second features
khaledk2 Jun 12, 2023
3b2476f
Secure the connection with the elsticsearch
khaledk2 Jul 25, 2023
e02aaf8
Fix pre commit checks
khaledk2 Jul 26, 2023
6c20d33
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 26, 2023
803bdaa
Update to elasticsearch 8.8.1
khaledk2 Jul 26, 2023
643e5cf
FIx action
khaledk2 Jul 26, 2023
af1dd22
Set elastic search username and password
khaledk2 Jul 26, 2023
0364be1
add instruction to set ELASTIC_PASSWORD
khaledk2 Sep 5, 2023
6f90bf3
add tag 0.5.3 to changelog
khaledk2 Sep 25, 2023
8ca53b8
Update CHANGELOG.md
khaledk2 Sep 25, 2023
86bf2b0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 21, 2024
beb3413
Fix code syntax
khaledk2 Jan 21, 2024
0d80924
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 21, 2024
51e3076
Address J-M comments
khaledk2 Jul 10, 2024
67d47b5
Download the database backup rather than has it on the project repo
khaledk2 Jul 11, 2024
7b05849
fix comments
khaledk2 Jul 11, 2024
69dbaa7
use the download url from openmicroscopy
khaledk2 Jul 11, 2024
4337d9c
fix merge conflict
khaledk2 Jul 11, 2024
bb757ab
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 11, 2024
14e1a89
Update .github/workflows/main.yml
khaledk2 Jul 15, 2024
7ad0371
address J-M review
khaledk2 Jul 15, 2024
e726b4d
Merge branch 'modify_unit_test_validation' of https://github.com/khal…
khaledk2 Jul 15, 2024
3dabf54
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 15, 2024
6c8ca41
address J-M comments
khaledk2 Jul 16, 2024
8c92eeb
Address J-M commit
khaledk2 Jul 18, 2024
e066e02
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 18, 2024
41bcf66
fix workflow issue
khaledk2 Jul 19, 2024
bbbf603
Merge branch 'modify_unit_test_validation' of https://github.com/khal…
khaledk2 Jul 19, 2024
7 changes: 7 additions & 0 deletions .github/workflows/main.yml
Expand Up @@ -48,6 +48,13 @@ jobs:
python manage.py set_database_configuration -u localhost -s ${{ job.services.postgres.ports[5432] }} -n postgress -p passwprd
# configure elasticsearch
python manage.py set_elasticsearch_configuration -e localhost:${{ job.services.elasticsearch.ports[9200] }}
# download and extract the database backup file
wget https://downloads.openmicroscopy.org/images/omero_db_searchengine.zip -P app_data
unzip app_data/omero_db_searchengine.zip -d app_data/
# run restore omero database
python manage.py restore_postgresql_database
# run indexing
python manage.py get_index_data_from_database -b False
# run tests
python -m unittest discover -s unit_tests
upload:
Expand Down
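The download and unzip steps in the workflow above can also be reproduced from Python, for example when preparing a local test environment. This is a minimal sketch, not part of the PR: the helper names `download_backup` and `extract_backup` are hypothetical, and only the URL and the `app_data` folder come from the workflow.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the workflow step above
BACKUP_URL = "https://downloads.openmicroscopy.org/images/omero_db_searchengine.zip"


def download_backup(url, dest_dir):
    """Download the archive into dest_dir (roughly `wget <url> -P dest_dir`)."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, target)
    return target


def extract_backup(archive, dest_dir):
    """Extract the archive into dest_dir (roughly `unzip <archive> -d dest_dir`)."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
        # return the member names so the caller can locate the dump file
        return zf.namelist()
```

A call such as `extract_backup(download_backup(BACKUP_URL, "app_data"), "app_data")` would then be followed by `python manage.py restore_postgresql_database`, as in the workflow.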
24 changes: 23 additions & 1 deletion app_data/test_index_data.json
Expand Up @@ -96,5 +96,27 @@
"validation screen"
]
]
}
},
"query_in": {
"image": [
[
"Gene Symbol",
[
"Duoxa2",
"Bach2",
"Cxcr2",
"Mysm1"
]
],
[
"Organism",
[
"homo sapiens",
"mus musculus",
"mus musculus x mus spretus",
"human adenovirus 2"
]
]
]
}
}
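How the unit tests consume the new "query_in" entry is not shown in this diff. As a hedged sketch, each `[key, [values]]` case could be turned into an `and_filters` query like the ones used in the example scripts; the function name `build_in_queries` is hypothetical.

```python
import json

# Fragment copied from the "query_in" entry added to test_index_data.json
TEST_DATA = json.loads(
    """
{"query_in": {"image": [
  ["Gene Symbol", ["Duoxa2", "Bach2", "Cxcr2", "Mysm1"]],
  ["Organism", ["homo sapiens", "mus musculus",
                "mus musculus x mus spretus", "human adenovirus 2"]]
]}}
"""
)


def build_in_queries(test_data, operator="in"):
    """Yield (resource, query) pairs, one query per [key, values] case."""
    for resource, cases in test_data.get("query_in", {}).items():
        for name, values in cases:
            and_filters = [{"name": name, "value": values, "operator": operator}]
            yield resource, {"and_filters": and_filters}


queries = list(build_in_queries(TEST_DATA))
```

Passing `operator="not_in"` would produce the matching negative queries from the same data.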
1 change: 1 addition & 0 deletions configurations/app_config.yml
Expand Up @@ -14,3 +14,4 @@ MAX_RETUNED_ITEMS : 1700000
ELASTICSEARCH_BACKUP_FOLDER: "path/to/elasticsearch/backup/folder"
verify_certs: False
ELASTIC_PASSWORD: elasticsearch_user_password
BASE_FOLDER: /etc/searchengine/
8 changes: 6 additions & 2 deletions examples/search.py
Expand Up @@ -88,5 +88,9 @@ def call_omero_return_results(url, data=None, method="post"):
% (len(received_results), total_results, page, total_pages, bookmark)
)

# 2000 /11686633, page: 1/11687, bookmark: 109600
# 2000 /12225067, page: 1/12226, bookmark: 109600
# Another example using the 'in' operator: the values are sent as a single string,
# with the list items separated by ','
logging.info("Using in operator")
url = "%s%s?key=Gene Symbol&value=Pdgfc,Rnase10&operator=in" % (base_url, image_search)
bookmark, total_results, total_pages = call_omero_return_results(url, method="get")
logging.info("%s,%s" % (total_results, total_pages))
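The GET URL above contains a space in the key name, so it may be safer to let the standard library encode the query string. The sketch below assumes the same `key`, `value` and `operator` parameters; the host and path are placeholders, and `build_in_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_in_search_url(base_url, endpoint, key, values):
    """Return a GET URL for an 'in' query; the values are joined with ','."""
    params = {"key": key, "value": ",".join(values), "operator": "in"}
    # urlencode escapes the space in the key and the ',' separator
    return "%s%s?%s" % (base_url, endpoint, urlencode(params))


# Placeholder host and path, for illustration only
url = build_in_search_url(
    "https://example.org/", "search/", "Gene Symbol", ["Pdgfc", "Rnase10"]
)
```

Because the items are separated by ',', a value that itself contains a comma cannot be expressed this way; the POST form with a list value avoids that limitation.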
42 changes: 42 additions & 0 deletions examples/using_in_operator.py
@@ -0,0 +1,42 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Copyright (C) 2024 University of Dundee & Open Microscopy Environment.
# All rights reserved.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
# published by the Free Software Foundation, either version 3 of the
# License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.


from utils import query_the_search_ending, logging

# This is similar to using the 'in' operator in an SQL statement:
# rather than combining multiple 'or' conditions,
# the query uses a single condition.

# The following example will search for the images which have any of the 'Gene Symbol'
# values in this list ["Duoxa2", "Bach2", "Cxcr2", "Mysm1"]

# and filters

logging.info("Example of using in operator")


values_in = ["Duoxa2", "Bach2", "Cxcr2", "Mysm1"]
logging.info("Searching for 'Gene Symbol' with values in [%s]" % (",".join(values_in)))
and_filters = [{"name": "Gene Symbol", "value": values_in, "operator": "in"}]

main_attributes = []
query = {"and_filters": and_filters}
#
received_results_data = query_the_search_ending(query, main_attributes)
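To make the comparison with multiple 'or' conditions concrete, the sketch below expands one 'in' filter into the list of 'equals' filters it stands for. It only illustrates the equivalence; `expand_in_filter` is a hypothetical helper, and the exact nesting the API expects for `or_filters` is not shown in this diff.

```python
def expand_in_filter(in_filter):
    """Rewrite one 'in' filter as the 'equals' filters it is equivalent to."""
    if in_filter["operator"] != "in":
        raise ValueError("expected an 'in' filter")
    # one 'equals' condition per value; the conditions are meant to be OR-ed
    return [
        {"name": in_filter["name"], "value": value, "operator": "equals"}
        for value in in_filter["value"]
    ]


in_filter = {"name": "Gene Symbol", "value": ["Duoxa2", "Bach2"], "operator": "in"}
or_conditions = expand_in_filter(in_filter)
```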
48 changes: 48 additions & 0 deletions examples/using_not_in_operator.py
@@ -0,0 +1,48 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Copyright (C) 2024 University of Dundee & Open Microscopy Environment.
# All rights reserved.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
# published by the Free Software Foundation, either version 3 of the
# License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.


from utils import query_the_search_ending, logging

# This is similar to using the 'not in' operator in an SQL statement:
# rather than combining multiple 'and' conditions with not_equals operators,
# the query uses a single condition.

# The following example will search for the images whose 'Organism'
# value is not any of the values in this list
# ["homo sapiens","mus musculus","mus musculus x mus spretus","human adenovirus 2"]

# and filters

logging.info("Example of using not_in operator")


values_not_in = [
    "homo sapiens",
    "mus musculus",
    "mus musculus x mus spretus",
    "human adenovirus 2",
]
logging.info(
    "Searching for 'Organism' with values not in [%s]" % (",".join(values_not_in))
)
and_filters = [{"name": "Organism", "value": values_not_in, "operator": "not_in"}]

main_attributes = []
query = {"and_filters": and_filters}
#
received_results_data = query_the_search_ending(query, main_attributes)
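The intended meaning of the two operators can be illustrated on plain dictionaries. This is an in-memory sketch of the semantics only, not how the search engine evaluates the query; the `matches` function is hypothetical, the comparison is case-insensitive on the assumption that `case_sensitive` defaults to False, and treating a missing key as an empty string is also an assumption.

```python
def matches(record, flt):
    """Evaluate an 'in' / 'not_in' filter against one key-value record."""
    # a missing key is treated as an empty string (assumption)
    value = str(record.get(flt["name"], "")).lower()
    wanted = {str(v).lower() for v in flt["value"]}
    if flt["operator"] == "in":
        return value in wanted
    if flt["operator"] == "not_in":
        return value not in wanted
    raise ValueError("unsupported operator: %s" % flt["operator"])


records = [{"Organism": "Homo sapiens"}, {"Organism": "Danio rerio"}]
not_in_filter = {
    "name": "Organism",
    "value": ["homo sapiens", "mus musculus"],
    "operator": "not_in",
}
kept = [r for r in records if matches(r, not_in_filter)]
```

Here only the record whose organism is outside the list is kept.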
60 changes: 58 additions & 2 deletions manage.py
Expand Up @@ -114,13 +114,25 @@ def sql_results_to_panda():
pass


@manager.command
def restore_postgresql_database():
from omero_search_engine.database.utils import restore_database

restore_database()


@manager.command
@manager.option(
"-r",
"--resource",
help="resource name, creating all the indexes for all the resources is the default", # noqa
)
def get_index_data_from_database(resource="all"):
@manager.option(
"-b",
"--backup",
help="if True, backup will be called ", # noqa
)
def get_index_data_from_database(resource="all", backup="True"):
"""
insert data in Elasticsearch index for each resource
It gets the data from postgres database server
Expand All @@ -132,7 +144,9 @@ def get_index_data_from_database(resource="all"):
get_insert_data_to_index,
save_key_value_buckets,
)
import json

backup = json.loads(backup.lower())
if resource != "all":
sql_st = sqls_resources.get(resource)
if not sql_st:
Expand All @@ -148,7 +162,8 @@ def get_index_data_from_database(resource="all"):
test_indexing_search_query(deep_check=False, check_studies=True)

# backup the index data
backup_elasticsearch_data()
if backup:
backup_elasticsearch_data()
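The `-b` option arrives as a string, and the command converts it with `json.loads(backup.lower())`. A slightly more defensive variant is sketched below; `parse_bool_flag` is a hypothetical helper that rejects anything other than True or False instead of silently accepting other JSON values such as numbers.

```python
import json


def parse_bool_flag(text):
    """Turn 'True'/'False' (any case) into a bool; reject everything else."""
    try:
        # json only knows lowercase true/false, hence the lower()
        value = json.loads(str(text).strip().lower())
    except json.JSONDecodeError:
        raise ValueError("expected True or False, got %r" % (text,)) from None
    if not isinstance(value, bool):
        raise ValueError("expected True or False, got %r" % (text,))
    return value
```

With this helper, `-b False` in the workflow would still disable the backup step, while a mistyped value would fail early with a clear message.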


# set configurations
Expand Down Expand Up @@ -351,6 +366,44 @@ def restore_elasticsearch_data():
restore_indices_data()


@manager.command
@manager.option("-s", "--screen_name", help="Screen name, or part of it")
@manager.option("-p", "--project_name", help="Project name, or part of it")
def data_validator(screen_name=None, project_name=None):
    """
    Check key-value pairs for trailing and leading spaces.
    It also checks for duplicated key-value pairs.
    It can check all the projects and screens,
    or run for a specific project or screen.
    The output is a collection of CSV files; each check usually generates three files:
    the main file contains image details (e.g. image id)
    in addition to the key and the value,
    plus one file for screens and one for projects.
    Each of these two files contains the screen name (or project name),
    the key-value pair which has the issue and the total number of affected
    images for each row.
    """
    from datetime import datetime

    if screen_name and project_name:
        print("Either screen name or project name is allowed, not both")
        return

    from omero_search_engine.validation.omero_keyvalue_data_validator import (
        check_for_heading_space,
        check_for_trailing_space,
        check_duplicated_keyvalue_pairs,
    )

    start = datetime.now()
    check_for_trailing_space(screen_name, project_name)
    start1 = datetime.now()
    check_for_heading_space(screen_name, project_name)
    start2 = datetime.now()
    check_duplicated_keyvalue_pairs(screen_name, project_name)
    end = datetime.now()
    print("start: %s, start1: %s, start2: %s, end: %s" % (start, start1, start2, end))
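The three checks described in the docstring can be pictured on an in-memory list of key-value pairs. The real checks live in `omero_keyvalue_data_validator` and are not shown in this diff, so the sketch below only illustrates the idea; the function names are hypothetical, and checking only the value (not the key) for stray spaces is an assumption.

```python
from collections import Counter


def find_space_issues(pairs):
    """Return (leading, trailing): the pairs whose value has stray spaces."""
    leading = [(k, v) for k, v in pairs if v != v.lstrip()]
    trailing = [(k, v) for k, v in pairs if v != v.rstrip()]
    return leading, trailing


def find_duplicates(pairs):
    """Return {(key, value): count} for pairs that occur more than once."""
    counts = Counter(pairs)
    return {pair: n for pair, n in counts.items() if n > 1}


pairs = [
    ("Organism", " homo sapiens"),
    ("Organism", "mus musculus "),
    ("Gene Symbol", "Bach2"),
    ("Gene Symbol", "Bach2"),
]
leading, trailing = find_space_issues(pairs)
duplicates = find_duplicates(pairs)
```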


@manager.command
def test_container_key_value():
from omero_search_engine.validation.results_validator import (
Expand All @@ -361,4 +414,7 @@ def test_container_key_value():


if __name__ == "__main__":
from flask_script import Command

Command.capture_all_args = False
manager.run()
25 changes: 21 additions & 4 deletions omero_search_engine/api/v1/resources/query_handler.py
Expand Up @@ -35,6 +35,9 @@
"screen": {"name": "name", "description": "description"},
}

res_and_main_attributes = None
res_or_main_attributes = None


def check_get_names(idr_, resource, attribute, return_exact=False):
# check the idr name and return the resource and possible values
Expand Down Expand Up @@ -107,8 +110,10 @@ def adjust_resource(self):
)
if len(ac_value) == 1:
self.value = ac_value[0]
else:
elif len(ac_value) == 0:
self.value = -1
else:
self.value = ac_value
"""
pr_names = get_resource_names(self.resource)
if not self.value in pr_names:
Expand Down Expand Up @@ -337,6 +342,7 @@ def get_image_non_image_query(self):

def run_query(self, query_, resource):
main_attributes = {}

query = {"and_filters": [], "or_filters": []}

if query_.get("and_filters"):
Expand Down Expand Up @@ -398,6 +404,11 @@ def run_query(self, query_, resource):
# res = search_query(query, resource, bookmark,
# self.raw_elasticsearch_query,
# main_attributes,return_containers=self.return_containers)
global res_and_main_attributes, res_or_main_attributes
if res_and_main_attributes:
    # fall back to an empty list when "and_main_attributes" has not been set yet
    main_attributes["and_main_attributes"] = (
        main_attributes.get("and_main_attributes") or []
    ) + res_and_main_attributes
if resource == "image" and self.return_containers:
res = search_query(
query,
Expand Down Expand Up @@ -633,6 +644,12 @@ def determine_search_results_(query_, return_columns=False, return_containers=Fa
and_filters = query_.get("query_details").get("and_filters")
or_filters = query_.get("query_details").get("or_filters")
and_query_groups = []
main_attributes = query_.get("main_attributes")
global res_and_main_attributes, res_or_main_attributes
if main_attributes:
res_and_main_attributes = main_attributes.get("and_main_attributes")
res_or_main_attributes = main_attributes.get("or_main_attributes")

columns_def = query_.get("columns_def")
or_query_groups = []
if and_filters and len(and_filters) > 0:
Expand Down Expand Up @@ -785,9 +802,9 @@ def add_local_schemas_to(resolver, schema_folder, base_uri, schema_ext=".json"):


def query_validator(query):
query_schema_file = (
"omero_search_engine/api/v1/resources/schemas/query_data.json" # noqa
)
# resolve the schema file relative to this module, not the working directory
main_dir = os.path.abspath(os.path.dirname(__file__))
query_schema_file = os.path.join(main_dir, "schemas", "query_data.json")
base_uri = "file:" + abspath("") + "/"
with open(query_schema_file, "r") as schema_f:
query_schema = json.loads(schema_f.read())
Expand Down
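The change to `query_validator` replaces a path relative to the working directory with one relative to the module file, so the schema loads regardless of where the command is started. The sketch below isolates that idea; `load_schema` is a hypothetical helper, and only the `schemas/query_data.json` layout is taken from the diff.

```python
import json
import os


def load_schema(module_file, name="query_data.json"):
    """Load schemas/<name> from the folder that contains module_file."""
    main_dir = os.path.abspath(os.path.dirname(module_file))
    with open(os.path.join(main_dir, "schemas", name), "r") as schema_f:
        return json.load(schema_f)
```

Inside the module itself this would be called as `load_schema(__file__)`.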
Expand Up @@ -13,12 +13,12 @@
},
"value": {
"name":"value",
"type": "string"
"type": ["array", "string"]
},
"operator": {
"name": "operator",
"type": "string",
"enum": ["equals", "not_equals", "contains","not_contains"]
"enum": ["equals", "not_equals", "contains", "not_contains", "in", "not_in"]
}
,"resource": {
"name": "resource",
Expand Down
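The schema change above allows `value` to be either a string or an array and adds `in` and `not_in` to the operator list. The project validates queries against the JSON schema itself; the hand-rolled check below merely mirrors those two constraints in plain Python for illustration. `check_filter` is a hypothetical function, and treating a missing operator as `equals` is an assumption.

```python
OPERATORS = {"equals", "not_equals", "contains", "not_contains", "in", "not_in"}


def check_filter(flt):
    """Return a list of problems found in one filter dict (empty if none)."""
    problems = []
    # a missing operator is assumed to mean 'equals'
    operator = flt.get("operator", "equals")
    if operator not in OPERATORS:
        problems.append("unknown operator: %s" % operator)
    # "type": ["array", "string"] in the schema
    if not isinstance(flt.get("value"), (str, list)):
        problems.append("value must be a string or an array")
    return problems


ok = check_filter({"name": "Organism", "value": ["homo sapiens"], "operator": "not_in"})
bad = check_filter({"name": "Organism", "value": 5, "operator": "like"})
```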
Expand Up @@ -28,7 +28,7 @@ parameters:
description: operator, default equals
in: query
type: string
enum: ['equals', 'not_equals', 'contains', 'not_contains']
enum: ['equals', 'not_equals', 'contains', 'not_contains', 'in', 'not_in']
- name: case_sensitive
description: case sensitive query, default False
in: query
Expand Down