Commit 42e0fb9

Add models and interfaces
1 parent 3399ac2 commit 42e0fb9

9 files changed, with 130 additions and 2 deletions.

.pylintrc

Lines changed: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ ignored-modules=
 
 # Python code to execute, usually for sys.path manipulation such as
 # pygtk.require().
-#init-hook=
+init-hook='import sys; sys.path.append("./similarity"); sys.path.append("./similarityRunner")'
 
 # Use multiple processes to speed up Pylint. Specifying 0 will auto-detect the
 # number of processors available to use, and will cap the count on Windows to
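
For context, Pylint executes the `init-hook` string as Python code before linting starts, so the new value behaves roughly like the snippet below. This is a minimal sketch: the `EXTRA_PATHS` name and the duplicate check are illustrative additions and are not part of the committed hook.

```python
import sys

# Directories the committed init-hook appends to the import search path.
EXTRA_PATHS = ["./similarity", "./similarityRunner"]

for path in EXTRA_PATHS:
    # Illustrative guard only: the hook in .pylintrc appends unconditionally.
    if path not in sys.path:
        sys.path.append(path)
```

With `./similarityRunner` on the path, imports such as `from models.connector_models import ConnectorSettings` can be resolved by Pylint when it is run from the repository root, since the paths are relative to the working directory.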

requirements.txt

Lines changed: 26 additions & 0 deletions
@@ -1,21 +1,34 @@
+altair==5.4.1
+annotated-types==0.7.0
 astroid==3.2.4
+attrs==24.2.0
 black==24.8.0
+blinker==1.8.2
+cachetools==5.5.0
 certifi==2024.8.30
 charset-normalizer==3.3.2
 click==8.1.7
+coverage==7.6.1
 dill==0.3.8
 filelock==3.16.0
 fsspec==2024.9.0
+gitdb==4.0.11
+GitPython==3.1.43
 huggingface-hub==0.24.7
 idna==3.10
 iniconfig==2.0.0
 isort==5.13.2
 Jinja2==3.1.4
 joblib==1.4.2
+jsonschema==4.23.0
+jsonschema-specifications==2023.12.1
+markdown-it-py==3.0.0
 MarkupSafe==2.1.5
 mccabe==0.7.0
+mdurl==0.1.2
 mpmath==1.3.0
 mypy-extensions==1.0.0
+narwhals==1.8.1
 networkx==3.3
 numpy==1.26.4
 packaging==24.1
@@ -25,24 +38,37 @@ pillow==10.4.0
 platformdirs==4.3.3
 plotly==5.24.1
 pluggy==1.5.0
+protobuf==5.28.1
+pyarrow==17.0.0
+pydantic==2.9.2
+pydantic_core==2.23.4
+pydeck==0.9.1
+Pygments==2.18.0
 pylint==3.2.7
 pytest==8.3.3
 python-dateutil==2.9.0.post0
 pytz==2024.2
 PyYAML==6.0.2
+referencing==0.35.1
 regex==2024.9.11
 requests==2.32.3
+rich==13.8.1
+rpds-py==0.20.0
 safetensors==0.4.5
 scikit-learn==1.5.2
 scipy==1.14.1
 sentence-transformers==3.1.0
 six==1.16.0
+smmap==5.0.1
+streamlit==1.38.0
 sympy==1.13.2
 tenacity==9.0.0
 threadpoolctl==3.5.0
 tokenizers==0.19.1
+toml==0.10.2
 tomlkit==0.13.2
 torch==2.4.1
+tornado==6.4.1
 tqdm==4.66.5
 transformers==4.44.2
 typing_extensions==4.12.2

similarity/README.md

Lines changed: 7 additions & 1 deletion
@@ -1,5 +1,7 @@
 # Structure
-- folder [comparing_all_tables](comparing_all_tables)
+- folder [comparing_all_tables](comparing_all_tables)
+- folder [interfaces](interfaces)
+- folder [models](models)
 - file [Comparator](Comparator.py)
 - file [ComparatorByColumns](ComparatorByColumn.py)
 - file [Types](Types.py)
@@ -15,6 +17,10 @@ we do not recommend to use it.
 
 File `categorical.ipynb` shows usage of `comparing.py`.
 
+## folder interfaces
+This folder contains two files: `ConnectorInterface.py` and `UserInterface.py`.
+## folder models
+Contains all models that are used for interfaces.
 ## file Comparator.py
 File contains Comparator class, ComparatorType classes and DistanceFunction
 Comparator is part of the pipeline that is shown below
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+"""
+File contains Connector interface
+"""
+import abc
+
+from models.connector_models import ConnectorSettings, Output, ConnectorOutput
+
+
+class ConnectorInterface(metaclass=abc.ABCMeta):
+    """
+    ConnectorInterface class is an abstract class that defines
+    the methods that must be implemented by the concrete connector classes.
+    """
+
+    @abc.abstractmethod
+    def _connect_and_load_data_source(self, settings: ConnectorSettings) -> ConnectorOutput:
+        """Load in the data set
+        :param settings: ConnectorSettings
+        this is a protected method"""
+        raise NotImplementedError
+
+    @abc.abstractmethod
+    def _format_data(self, data: ConnectorOutput) -> Output:
+        """Format loaded data
+        this is a protected method"""
+        raise NotImplementedError
+
+    def get_data(self, settings: ConnectorSettings) -> Output:
+        """Get formatted data from the loaded data source
+        :return: data"""
+        data = self._connect_and_load_data_source(settings)
+        return self._format_data(data)
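
A concrete connector only has to supply the two protected hooks; `get_data` then chains them. The sketch below illustrates that flow under stated assumptions: `InMemorySettings` and `InMemoryConnector` are hypothetical names, and the dataclasses and the `dict`-based `Output` are standard-library stand-ins for the commit's pydantic models and `pd.DataFrame` alias, used only so the example runs without third-party packages.

```python
import abc
from dataclasses import dataclass, field


@dataclass
class ConnectorSettings:
    """Stand-in for the pydantic ConnectorSettings base model."""


@dataclass
class InMemorySettings(ConnectorSettings):
    """Hypothetical settings: rows are passed in directly."""
    rows: list[dict] = field(default_factory=list)


@dataclass
class ConnectorOutput:
    """Stand-in for the pydantic ConnectorOutput base model."""
    rows: list[dict] = field(default_factory=list)


# Stand-in for `Output = pd.DataFrame`: column name -> list of values.
Output = dict[str, list]


class ConnectorInterface(abc.ABC):
    """Same shape as the interface in the diff, repeated to stay self-contained."""

    @abc.abstractmethod
    def _connect_and_load_data_source(self, settings: ConnectorSettings) -> ConnectorOutput:
        raise NotImplementedError

    @abc.abstractmethod
    def _format_data(self, data: ConnectorOutput) -> Output:
        raise NotImplementedError

    def get_data(self, settings: ConnectorSettings) -> Output:
        data = self._connect_and_load_data_source(settings)
        return self._format_data(data)


class InMemoryConnector(ConnectorInterface):
    """Hypothetical connector that 'loads' rows already held in memory."""

    def _connect_and_load_data_source(self, settings: InMemorySettings) -> ConnectorOutput:
        return ConnectorOutput(rows=list(settings.rows))

    def _format_data(self, data: ConnectorOutput) -> Output:
        # Pivot the row dictionaries into columns.
        columns: Output = {}
        for row in data.rows:
            for name, value in row.items():
                columns.setdefault(name, []).append(value)
        return columns


settings = InMemorySettings(rows=[{"id": 1, "city": "Brno"}, {"id": 2, "city": "Oslo"}])
table = InMemoryConnector().get_data(settings)
```

Callers only touch `get_data`, so loading and formatting can change per connector without affecting them.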
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+"""
+File contains UserInterface interface
+"""
+import abc
+
+from models.user_models import SimilarityOutput
+from models.connector_models import ConnectorSettings
+
+
+class UserInterface(metaclass=abc.ABCMeta):
+    """
+    UserInterface is an abstract class that defines the methods
+    that must be implemented by any class that inherits from it.
+    """
+
+    @abc.abstractmethod
+    def get_user_input(self) -> ConnectorSettings:
+        """
+        Get user input and return it as a ConnectorSettings object
+        """
+        raise NotImplementedError
+
+    @abc.abstractmethod
+    def display_output(self, output: SimilarityOutput) -> None:
+        """
+        Display output to the user
+        """
+        raise NotImplementedError
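
To show how an implementation might plug into this interface, here is a minimal sketch of a non-interactive front end. `ScriptedInterface` is a hypothetical name, and the dataclasses are standard-library stand-ins for the commit's pydantic models, so the example is an assumption-laden illustration and not the project's actual UI.

```python
import abc
from dataclasses import dataclass, field


@dataclass
class ConnectorSettings:
    """Stand-in for the pydantic ConnectorSettings base model."""


@dataclass
class SimilarityOutput:
    """Stand-in for the pydantic SimilarityOutput model."""
    table_names: list[str] = field(default_factory=list)
    distances: dict[tuple[str, str], float] = field(default_factory=dict)


class UserInterface(abc.ABC):
    """Same shape as the interface in the diff, repeated to stay self-contained."""

    @abc.abstractmethod
    def get_user_input(self) -> ConnectorSettings:
        raise NotImplementedError

    @abc.abstractmethod
    def display_output(self, output: SimilarityOutput) -> None:
        raise NotImplementedError


class ScriptedInterface(UserInterface):
    """Hypothetical UI: returns preset settings and collects output lines in memory."""

    def __init__(self, settings: ConnectorSettings) -> None:
        self._settings = settings
        self.lines: list[str] = []

    def get_user_input(self) -> ConnectorSettings:
        return self._settings

    def display_output(self, output: SimilarityOutput) -> None:
        # One line per table pair, ordered by the pair of names.
        for (left, right), distance in sorted(output.distances.items()):
            self.lines.append(f"{left} <-> {right}: {distance:.2f}")


ui = ScriptedInterface(ConnectorSettings())
ui.display_output(
    SimilarityOutput(
        table_names=["customers", "orders", "products"],
        distances={("orders", "products"): 0.5, ("customers", "orders"): 0.25},
    )
)
```

An implementation like this can be handy in tests, because the rendered lines can be inspected without a terminal or a web front end.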

similarityRunner/interfaces/__init__.py

Whitespace-only changes.

similarityRunner/models/__init__.py

Whitespace-only changes.
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+"""
+Connector models module contains:
+- the base class for connector settings and derived classes.
+- the base class for connector output and derived classes.
+"""
+import pandas as pd
+from pydantic import BaseModel
+
+Output = pd.DataFrame
+
+
+class ConnectorSettings(BaseModel):
+    """
+    ConnectorSettings class is a base class for connector settings.
+    """
+    # here will be common fields for all connectors
+
+
+class ConnectorOutput(BaseModel):
+    """
+    ConnectorOutput class is a base class for connector output.
+    """
+    # here will be common fields for all connectors
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+"""
+This module contains the user models
+"""
+from pydantic import BaseModel
+
+
+class SimilarityOutput(BaseModel):
+    """
+    SimilarityOutput class is a class containing similarity output.
+    """
+    # here will be common fields for all similarity models
+    table_names: list[str]
+    distances: dict[tuple[str, str], float]
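
The `distances` field maps a pair of table names to a distance, so its keys are two-element tuples. The sketch below shows one way such a mapping could be built for every unordered pair of tables. `placeholder_distance` is a hypothetical metric invented for illustration and says nothing about how the project actually computes similarity.

```python
from itertools import combinations

table_names = ["customers", "orders", "products"]


def placeholder_distance(left: str, right: str) -> float:
    """Hypothetical metric: relative difference in name length (illustration only)."""
    return abs(len(left) - len(right)) / max(len(left), len(right))


# One entry per unordered pair, keyed by a (left, right) tuple of table names.
distances: dict[tuple[str, str], float] = {
    (left, right): placeholder_distance(left, right)
    for left, right in combinations(table_names, 2)
}
```

Because each pair is stored once, a lookup has to use the same order as the stored key, for example `distances[("orders", "products")]` and not the reversed tuple.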
