Commit 6c8bd34

Merge pull request #170 from WMD-group/0.6.1_updates
Fix `parse_species` to handle non-integer oxidation states
2 parents c6b93a7 + cb93b20 commit 6c8bd34

10 files changed, +213 -48 lines changed

.pre-commit-config.yaml

Lines changed: 4 additions & 0 deletions
@@ -53,3 +53,7 @@ repos:
         args: [--toml, pyproject.toml]
         additional_dependencies:
           - tomli
+  - repo: https://github.com/adamchainz/blacken-docs
+    rev: 1.18.0
+    hooks:
+      - id: blacken-docs

README.md

Lines changed: 35 additions & 17 deletions
@@ -71,21 +71,25 @@ With -e pip will create links to the source folder so that changes to the code w
 
 For simple usage, you can instantiate an Embedding object using one of the embeddings in the [data directory](src/elementembeddings/data/element_representations/README.md). For this example, let's use the magpie elemental representation.
 
-```python
+```pycon
 # Import the class
 >>> from elementembeddings.core import Embedding
 
 # Load the magpie data
->>> magpie = Embedding.load_data('magpie')
+>>> magpie = Embedding.load_data("magpie")
 ```
 
 We can access some of the properties of the `Embedding` class. For example, we can find the dimensions of the elemental representation and the list of elements for which an embedding exists.
 
-```python
+```pycon
 # Print out some of the properties of the ElementEmbeddings class
->>> print(f'The magpie representation has embeddings of dimension {magpie.dim}')
->>> print(f'The magpie representation contains these elements: \n {magpie.element_list}') # prints out all the elements considered for this representation
->>> print(f'The magpie representation contains these features: \n {magpie.feature_labels}') # Prints out the feature labels of the chosen representation
+>>> print(f"The magpie representation has embeddings of dimension {magpie.dim}")
+>>> print(
+...     f"The magpie representation contains these elements: \n {magpie.element_list}"
+... )  # prints out all the elements considered for this representation
+>>> print(
+...     f"The magpie representation contains these features: \n {magpie.feature_labels}"
+... )  # Prints out the feature labels of the chosen representation
 
 The magpie representation has embeddings of dimension 22
 The magpie representation contains these elements:
@@ -102,26 +106,40 @@ We can quickly generate heatmaps of distance/similarity measures between the ele
 from elementembeddings.plotter import heatmap_plotter, dimension_plotter
 import matplotlib.pyplot as plt
 
-magpie.standardise(inplace=True) # Standardises the representation
+magpie.standardise(inplace=True)  # Standardises the representation
 
-fig, ax = plt.subplots(1, 1, figsize=(6,6))
+fig, ax = plt.subplots(1, 1, figsize=(6, 6))
 heatmap_params = {"vmin": -1, "vmax": 1}
-heatmap_plotter(embedding=magpie, metric="cosine_similarity",show_axislabels=False,cmap="Blues_r",ax=ax, **heatmap_params)
+heatmap_plotter(
+    embedding=magpie,
+    metric="cosine_similarity",
+    show_axislabels=False,
+    cmap="Blues_r",
+    ax=ax,
+    **heatmap_params
+)
 ax.set_title("Magpie cosine similarities")
 fig.tight_layout()
 fig.show()
-
 ```
 
 <img src="resources/magpie_cosine_sim_heatmap.png" alt = "Cosine similarity heatmap of the magpie representation" width="50%"/>
 
 ```python
-fig, ax = plt.subplots(1, 1, figsize=(6,6))
-
-reducer_params={"n_neighbors": 30, "random_state":42}
-scatter_params = {"s":100}
-
-dimension_plotter(embedding=magpie, reducer="umap",n_components=2,ax=ax,adjusttext=True,reducer_params=reducer_params, scatter_params=scatter_params)
+fig, ax = plt.subplots(1, 1, figsize=(6, 6))
+
+reducer_params = {"n_neighbors": 30, "random_state": 42}
+scatter_params = {"s": 100}
+
+dimension_plotter(
+    embedding=magpie,
+    reducer="umap",
+    n_components=2,
+    ax=ax,
+    adjusttext=True,
+    reducer_params=reducer_params,
+    scatter_params=scatter_params,
+)
 ax.set_title("Magpie UMAP (n_neighbours=30)")
 ax.legend().remove()
 handles, labels = ax1.get_legend_handles_labels()
@@ -149,7 +167,7 @@ The `composition_featuriser` function can be used to featurise the data. The com
 ```python
 from elementembeddings.composition import composition_featuriser
 
-df_featurised = composition_featuriser(df, embedding="magpie", stats=["mean","sum"])
+df_featurised = composition_featuriser(df, embedding="magpie", stats=["mean", "sum"])
 
 df_featurised
 ```

contributing.md

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,4 @@
-# Contributing
+`# Contributing
 
 This is a quick guide on how to follow best practice and contribute smoothly to `ElementEmbeddings`.
 
@@ -49,3 +49,4 @@ pre-commit run --all-files # optionally run hooks on all files
 ```
 
 Pre-commit hooks will check all files when you commit changes, automatically fixing any files which are not formatted correctly. Those files will need to be staged again before re-attempting the commit.
+`

docs/embeddings/element.md

Lines changed: 2 additions & 2 deletions
@@ -162,8 +162,8 @@ The 118 200-dimensional vectors in `random_200_new` were generated using the fol
 ```python
 import numpy as np
 
-mu , sigma = 0 , 1 # mean and standard deviation s = np.random.normal(mu, sigma, 1000)
-s = np.random.default_rng(seed=42).normal(mu, sigma, (118,200))
+mu, sigma = 0, 1  # mean and standard deviation s = np.random.normal(mu, sigma, 1000)
+s = np.random.default_rng(seed=42).normal(mu, sigma, (118, 200))
 ```
 
 ### skipatom

docs/tutorials.md

Lines changed: 161 additions & 22 deletions
@@ -8,25 +8,150 @@ For simple usage, you can instantiate an Embedding object using one of the embed
 
 ```python
 # Import the class
->>> from elementembeddings.core import Embedding
+from elementembeddings.core import Embedding
 
 # Load the magpie data
->>> magpie = Embedding.load_data('magpie')
+magpie = Embedding.load_data("magpie")
 ```
 
 We can access some of the properties of the `Embedding` class. For example, we can find the dimensions of the elemental representation and the list of elements for which an embedding exists.
 
 ```python
 # Print out some of the properties of the ElementEmbeddings class
->>> print(f'The magpie representation has embeddings of dimension {magpie.dim}')
->>> print(f'The magpie representation contains these elements: \n {magpie.element_list}') # prints out all the elements considered for this representation
->>> print(f'The magpie representation contains these features: \n {magpie.feature_labels}') # Prints out the feature labels of the chosen representation
-
-The magpie representation has embeddings of dimension 22
-The magpie representation contains these elements:
-['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne', 'Na', 'Mg', 'Al', 'Si', 'P', 'S', 'Cl', 'Ar', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Kr', 'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'I', 'Xe', 'Cs', 'Ba', 'La', 'Ce', 'Pr', 'Nd', 'Pm', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu', 'Hf', 'Ta', 'W', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'Po', 'At', 'Rn', 'Fr', 'Ra', 'Ac', 'Th', 'Pa', 'U', 'Np', 'Pu', 'Am', 'Cm', 'Bk']
-The magpie representation contains these features:
-['Number', 'MendeleevNumber', 'AtomicWeight', 'MeltingT', 'Column', 'Row', 'CovalentRadius', 'Electronegativity', 'NsValence', 'NpValence', 'NdValence', 'NfValence', 'NValence', 'NsUnfilled', 'NpUnfilled', 'NdUnfilled', 'NfUnfilled', 'NUnfilled', 'GSvolume_pa', 'GSbandgap', 'GSmagmom', 'SpaceGroupNumber']
+print(f"The magpie representation has embeddings of dimension {magpie.dim}")
+print(
+    f"The magpie representation contains these elements: \n {magpie.element_list}"
+)  # prints out all the elements considered for this representation
+print(
+    f"The magpie representation contains these features: \n {magpie.feature_labels}"
+)  # Prints out the feature labels of the chosen representation
+
+# The magpie representation has embeddings of dimension 22
+# The magpie representation contains these elements:
+[
+    "H",
+    "He",
+    "Li",
+    "Be",
+    "B",
+    "C",
+    "N",
+    "O",
+    "F",
+    "Ne",
+    "Na",
+    "Mg",
+    "Al",
+    "Si",
+    "P",
+    "S",
+    "Cl",
+    "Ar",
+    "K",
+    "Ca",
+    "Sc",
+    "Ti",
+    "V",
+    "Cr",
+    "Mn",
+    "Fe",
+    "Co",
+    "Ni",
+    "Cu",
+    "Zn",
+    "Ga",
+    "Ge",
+    "As",
+    "Se",
+    "Br",
+    "Kr",
+    "Rb",
+    "Sr",
+    "Y",
+    "Zr",
+    "Nb",
+    "Mo",
+    "Tc",
+    "Ru",
+    "Rh",
+    "Pd",
+    "Ag",
+    "Cd",
+    "In",
+    "Sn",
+    "Sb",
+    "Te",
+    "I",
+    "Xe",
+    "Cs",
+    "Ba",
+    "La",
+    "Ce",
+    "Pr",
+    "Nd",
+    "Pm",
+    "Sm",
+    "Eu",
+    "Gd",
+    "Tb",
+    "Dy",
+    "Ho",
+    "Er",
+    "Tm",
+    "Yb",
+    "Lu",
+    "Hf",
+    "Ta",
+    "W",
+    "Re",
+    "Os",
+    "Ir",
+    "Pt",
+    "Au",
+    "Hg",
+    "Tl",
+    "Pb",
+    "Bi",
+    "Po",
+    "At",
+    "Rn",
+    "Fr",
+    "Ra",
+    "Ac",
+    "Th",
+    "Pa",
+    "U",
+    "Np",
+    "Pu",
+    "Am",
+    "Cm",
+    "Bk",
+]
+# The magpie representation contains these features:
+[
+    "Number",
+    "MendeleevNumber",
+    "AtomicWeight",
+    "MeltingT",
+    "Column",
+    "Row",
+    "CovalentRadius",
+    "Electronegativity",
+    "NsValence",
+    "NpValence",
+    "NdValence",
+    "NfValence",
+    "NValence",
+    "NsUnfilled",
+    "NpUnfilled",
+    "NdUnfilled",
+    "NfUnfilled",
+    "NUnfilled",
+    "GSvolume_pa",
+    "GSbandgap",
+    "GSmagmom",
+    "SpaceGroupNumber",
+]
 ```
 
 ### Plotting
@@ -37,26 +162,40 @@ We can quickly generate heatmaps of distance/similarity measures between the ele
 from elementembeddings.plotter import heatmap_plotter, dimension_plotter
 import matplotlib.pyplot as plt
 
-magpie.standardise(inplace=True) # Standardises the representation
+magpie.standardise(inplace=True)  # Standardises the representation
 
-fig, ax = plt.subplots(1, 1, figsize=(6,6))
+fig, ax = plt.subplots(1, 1, figsize=(6, 6))
 heatmap_params = {"vmin": -1, "vmax": 1}
-heatmap_plotter(embedding=magpie, metric="cosine_similarity",show_axislabels=False,cmap="Blues_r",ax=ax, **heatmap_params)
+heatmap_plotter(
+    embedding=magpie,
+    metric="cosine_similarity",
+    show_axislabels=False,
+    cmap="Blues_r",
+    ax=ax,
+    **heatmap_params
+)
 ax.set_title("Magpie cosine similarities")
 fig.tight_layout()
 fig.show()
-
 ```
 
 ![Magpie cosine similarity heatmap](images/magpie_cosine_sim_heatmap.png)
 
 ```python
-fig, ax = plt.subplots(1, 1, figsize=(6,6))
-
-reducer_params={"n_neighbors": 30, "random_state":42}
-scatter_params = {"s":100}
-
-dimension_plotter(embedding=magpie, reducer="umap",n_components=2,ax=ax,adjusttext=True,reducer_params=reducer_params, scatter_params=scatter_params)
+fig, ax = plt.subplots(1, 1, figsize=(6, 6))
+
+reducer_params = {"n_neighbors": 30, "random_state": 42}
+scatter_params = {"s": 100}
+
+dimension_plotter(
+    embedding=magpie,
+    reducer="umap",
+    n_components=2,
+    ax=ax,
+    adjusttext=True,
+    reducer_params=reducer_params,
+    scatter_params=scatter_params,
+)
 ax.set_title("Magpie UMAP (n_neighbours=30)")
 ax.legend().remove()
 handles, labels = ax1.get_legend_handles_labels()
@@ -84,7 +223,7 @@ The `composition_featuriser` function can be used to featurise the data. The com
 ```python
 from elementembeddings.composition import composition_featuriser
 
-df_featurised = composition_featuriser(df, embedding="magpie", stats=["mean","sum"])
+df_featurised = composition_featuriser(df, embedding="magpie", stats=["mean", "sum"])
 
 df_featurised
 ```

setup.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 
 module_dir = os.path.dirname(os.path.abspath(__file__))
 
-VERSION = "0.6"
+VERSION = "0.6.1"
 DESCRIPTION = "Element Embeddings"
 with open(os.path.join(module_dir, "README.md"), encoding="utf-8") as f:
     LONG_DESCRIPTION = f.read()

src/elementembeddings/data/element_representations/README.md

Lines changed: 2 additions & 2 deletions
@@ -162,8 +162,8 @@ The 118 200-dimensional vectors in `random_200_new` were generated using the fol
 ```python
 import numpy as np
 
-mu , sigma = 0 , 1 # mean and standard deviation s = np.random.normal(mu, sigma, 1000)
-s = np.random.default_rng(seed=42).normal(mu, sigma, (118,200))
+mu, sigma = 0, 1  # mean and standard deviation s = np.random.normal(mu, sigma, 1000)
+s = np.random.default_rng(seed=42).normal(mu, sigma, (118, 200))
 ```
 
 ### skipatom

src/elementembeddings/plotter.py

Lines changed: 1 addition & 1 deletion
@@ -175,7 +175,7 @@ def dimension_plotter(
     signs = [get_sign(charge) for _, charge in parsed_species]
 
     species_labels = [
-        rf"$\mathregular{{{element}^{{{abs(charge)}{sign}}}}}}}$"
+        rf"$\mathregular{{{element}^{{{abs(charge)}{sign}}}}}$"
         for (element, charge), sign in zip(parsed_species, signs)
     ]
 
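
The one-line change above removes two `}` characters from the f-string. A minimal sketch of why that matters, assuming the label is meant to be balanced mathtext: inside an f-string, `{{` and `}}` each produce one literal brace, so the old template emitted one closing brace too many. The helper name `species_label` and the sample values are hypothetical and are not part of `plotter.py`.

```python
def species_label(element: str, charge: float, sign: str) -> str:
    """Hypothetical helper that reuses the corrected template from the hunk."""
    # "{{" and "}}" are literal braces; "{element}", "{abs(charge)}" and
    # "{sign}" are replacement fields.
    return rf"$\mathregular{{{element}^{{{abs(charge)}{sign}}}}}$"


def old_species_label(element: str, charge: float, sign: str) -> str:
    """The template as it was before this commit, kept for comparison."""
    return rf"$\mathregular{{{element}^{{{abs(charge)}{sign}}}}}}}$"


new = species_label("Fe", 2, "+")
old = old_species_label("Fe", 2, "+")

print(new)  # $\mathregular{Fe^{2+}}$
print(old)  # $\mathregular{Fe^{2+}}}$

# The corrected label has matching brace counts; the old one does not.
assert new.count("{") == new.count("}")
assert old.count("}") == old.count("{") + 1
```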

src/elementembeddings/tests/test_utils.py

Lines changed: 3 additions & 0 deletions
@@ -57,3 +57,6 @@ def test_parse_species(self):
         assert species.parse_species("Fe1-") == ("Fe", -1)
         assert species.parse_species("Fe+") == ("Fe", 1)
         assert species.parse_species("Fe-") == ("Fe", -1)
+        assert species.parse_species("Fe2.5+") == ("Fe", 2.5)
+        assert species.parse_species("Fe2.5-") == ("Fe", -2.5)
+        assert species.parse_species("Fe2.555+") == ("Fe", 2.555)

src/elementembeddings/utils/species.py

Lines changed: 2 additions & 2 deletions
@@ -34,8 +34,8 @@ def _parse_species_old(species: str) -> tuple[str, int]:
     """
     ele = re.match(r"[A-Za-z]+", species).group(0)
 
-    charge_match = re.search(r"\d+", species)
-    ox_state = int(charge_match.group(0)) if charge_match else 0
+    charge_match = re.search(r"(\d+\.\d+|\d+)", species)
+    ox_state = float(charge_match.group(1)) if charge_match else 0
 
     if "-" in species:
         ox_state *= -1
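
The regex change can be illustrated with a small standalone sketch. It reuses the two patterns from the hunk above, but `parse_species_sketch` is a hypothetical stand-in and not the library's `parse_species`. In particular, treating a bare sign such as `Fe+` as a charge of magnitude 1 is an assumption drawn from the existing tests, since the hunk does not show how the real function handles that case.

```python
from __future__ import annotations

import re


def parse_species_sketch(species: str) -> tuple[str, float]:
    """Hypothetical stand-in: split a species string into (element, charge)."""
    ele = re.match(r"[A-Za-z]+", species).group(0)

    # New pattern from the hunk: try a decimal number first, then an integer.
    charge_match = re.search(r"(\d+\.\d+|\d+)", species)
    if charge_match:
        ox_state = float(charge_match.group(1))
    elif "+" in species or "-" in species:
        ox_state = 1.0  # assumption: a bare sign means magnitude 1
    else:
        ox_state = 0.0

    if "-" in species:
        ox_state *= -1
    return ele, ox_state


# The old pattern stops at the decimal point, so only "2" is captured.
assert re.search(r"\d+", "Fe2.5+").group(0) == "2"

# The new pattern captures the full non-integer oxidation state.
assert parse_species_sketch("Fe2.5+") == ("Fe", 2.5)
assert parse_species_sketch("Fe2.5-") == ("Fe", -2.5)
assert parse_species_sketch("Fe2.555+") == ("Fe", 2.555)
```

The order of the alternatives matters: with `\d+|\d+\.\d+` the integer branch would match first and the fractional part would again be dropped.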
