Commit ead6f35

committed
Misc TODOs
1 parent 182ab4c commit ead6f35

6 files changed

+82
-28
lines changed

README.md

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -136,6 +136,19 @@ As shown in the `doped` tutorials, it is highly recommended to use the [`ShakeNB
136136
- S. R. Kavanagh, D. O. Scanlon, A. Walsh **_Rapid Recombination by Cadmium Vacancies in CdTe_** [_ACS Energy Letters_](https://pubs.acs.org/doi/full/10.1021/acsenergylett.1c00380) 2021
137137
- C. J. Krajewska et al. **_Enhanced visible light absorption in layered Cs<sub>3</sub>Bi<sub>2</sub>Br<sub>9</sub> through mixed-valence Sn(II)/Sn(IV) doping_** [_Chemical Science_](https://doi.org/10.1039/D1SC03775G) 2021
138138

139+
## Open Science and Reproducibility
140+
Robust, open and reproducible science greatly strengthens the impact of research. This is especially true for
141+
computational defect modelling, given the many steps and complexities involved -- see
142+
[_Guidelines for robust and reproducible point defect simulations in crystals_](https://doi.org/10.26434/chemrxiv-2025-3lb5k)
143+
for discussion.
144+
145+
`doped` has been built to aid robustness and reproducibility for computational defect studies.
146+
**We highly recommend** that the `doped`/`ShakeNBreak` class objects, which store key metadata and can be directly
147+
output to lightweight `json(.gz)` files, be shared in open-access repositories upon publication, along with relevant raw
148+
computational data. It is also helpful to use the `doped` summary functions to tabulate key quantities in Supplementary
149+
Information files. See the [Open Science](https://doped.readthedocs.io/en/latest/Tips.html#open-science-and-reproducibility)
150+
section of the docs Tips page for details.
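
As a rough illustration of why `json(.gz)` files are convenient for sharing, the sketch below round-trips a small dictionary through a gzip-compressed JSON file using only the Python standard library. The `record` dictionary, its keys and the file name are hypothetical placeholders, not the actual `doped` serialisation format; in practice the `doped`/`ShakeNBreak` objects would be written with the packages' own utilities.

```python
import gzip
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for the serialised form of a defect analysis object.
# The keys and values here are placeholders, not the real doped schema.
record = {
    "@class": "DefectThermodynamics",
    "entries": [{"name": "v_Cd_0", "charge_state": 0}],
}


def save_json_gz(obj: dict, path: Path) -> None:
    """Write a JSON-serialisable dict to a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump(obj, fh)


def load_json_gz(path: Path) -> dict:
    """Read a gzip-compressed JSON file back into a dict."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


# Write to a temporary location, then reload to confirm nothing was lost.
out_path = Path(tempfile.gettempdir()) / "defect_thermo_example.json.gz"
save_json_gz(record, out_path)
reloaded = load_json_gz(out_path)
```

A file written this way is plain text once decompressed, so it can be deposited in a repository and inspected without the originating software installed.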
151+
139152
## Acknowledgments
140153
`doped` (née `DefectsWithTheBoys`) has benefitted from feedback from many users, in particular
141154
members of the [Scanlon](http://davidscanlon.com/) and [Walsh](https://wmd-group.github.io/) research groups who have used / are using it in their work. Direct contributors are listed in the `Contributors` sidebar above; including [Seán Kavanagh](https://sam-lab.net),

docs/Dev_ToDo.md

Lines changed: 3 additions & 17 deletions
@@ -38,35 +38,22 @@
3838

3939
- Docs:
4040
- Barebones tutorial workflow, as suggested by Alex G.
41-
- Add note about `NUPDOWN` for triplet states (bipolarons or dimers (e.g. C-C in Si apparently has ~0.5 eV energy splitting (10.1038/s41467-023-36090-2), and 0.4 eV for O-O in STO from Kanta, but smaller for VCd bipolaron in CdTe))).
4241
- Add our recommended workflow (gam, NKRED, std, ncl). See https://sites.tufts.edu/andrewrosen/density-functional-theory/vasp/ for some possibly useful general tips.
4342
- Workflow diagram with: https://twitter.com/Andrew_S_Rosen/status/1678115044348039168?s=20
4443
- Show on chemical potentials docs how chempots can be later set as attribute for ``DefectThermodynamics`` (loaded from `json`) (e.g. if user had finished and parsed defect calculations first, and then finished chemical potential calculations after).
4544
- Example on docs (miscellaneous/advanced analysis tutorial page?) for adding entries / combining multiple ``DefectThermodynamics`` objects
46-
- Note that bandfilling corrections are no longer supported, as in most cases they shouldn't be used anyway, and if you have band occupation in your supercell then the energies aren't accurate anyway as it's a resonant/shallow defect, and this is just lowering the energy so it sits near the band edge (leads to false charge state behaviour being a bit more common etc). If the user wants to add bandfilling corrections, they can still do this by calculating it themselves and adding to the `corrections` attribute. (Link our code in old `pymatgen` for doing this)
4745
- Regarding competing phases with many low-energy polymorphs from the Materials Project; will build
4846
in a warning when many entries for the same composition, say which have database IDs, warn the user
4947
and direct to relevant section on the docs -> Give some general foolproof advice for how best to deal
5048
with these cases (i.e. check the ICSD and online for which is actually the groundstate structure,
5149
and/or if it's known from other work for your chosen functional etc.)
52-
- Show our workflow for calculating interstitials (see docs Tips page, i.e. `vasp_gam` relaxations first (can point to defects tutorial for this)) -> Need to mention this in the defects tutorial, and point to discussion in Tips docs page.
5350
- `vasp_ncl` chemical potential calculations for metals, use `ISMEAR = -5`, possibly `NKRED` etc. (make a function to generate `vasp_ncl` calculation files with `ISMEAR = -5`, with option to set different kpoints) - if `ISMEAR = 0` - converged kpoints still prohibitively large, use vasp_converge_files again to check for quicker convergence with ISMEAR = -5.
51+
- Worth noting that for metals it may sometimes be preferable to use a larger cell with reduced kpoints, due to memory limitations.
5452
- Often can't use `NKRED` with `vasp_std`, because we don't know beforehand the kpts in the IBZ (because symmetry on for `vasp_std` chempot calcs)(same goes for `EVENONLY = True`).
55-
- Worth noting that for metals it may sometimes be preferable to use a larger cell with reduced kpoints, due to memory limitations.
5653
- Readily-usable in conjunction with `atomate`, `AiiDA`(-defects), `vise`, `CarrierCapture`, and give some
5754
quick examples? Add as optional dependencies.
5855
- Setting `LREAL = Auto` can sometimes be worth doing if you have a very large supercell for speed up, _but_ it's important to do a final calculation with `LREAL = False` for accurate energies/forces, so only do if you're a power user and have a very large supercell.
5956
- Show usage of `get_conv_cell_site` in notebooks/docs (in an advanced analysis tutorial with other possibly useful functions being showcased?)
60-
- Note in docs that `spglib` convention used for Wyckoff labels and conventional structure definition.
61-
Primitive structure can change, as can supercell / supercell matrix (depending on input structure,
62-
`generate_supercell` etc), but conventional cell should always be the same (`spglib` convention).
63-
- Add examples of extending to
64-
non-radiative carrier capture calcs with `CarrierCapture.jl` and `nonrad`. Show example of using
65-
`sumo` to get the DOS plot of a defect calc, and why this is useful.
66-
-
67-
- Should have recommendation somewhere about open science practices. The doped defect dict and thermo jsons should always be shared in e.g. Zenodo when publishing, as contains all info on the parsed defect data in a lean format. Also using the `get_formation_energies` etc. functions for SI tables is recommended.
68-
-
69-
-
7057
- Add our general rules of thumb/expectations regarding charge corrections:
7158
- Potential alignment terms should rarely ever be massive
7259
- In general, the correction terms should follow somewhat consistent trends (for a given charge state, across defects), so if you see a large outlier in the corrections, it implies something odd is happening there. This can be fairly easily scanned with `get_formation_energies`.
@@ -77,7 +64,7 @@
7764
oxidation states and can fail in weird cases. As always please consider if these charge states are
7865
reasonable for the defects in your system. (i.e. low-symmetry, amphoteric, mixed-valence cases etc!)
7966
- Note cases where we expect default charge states to not be appropriate (e.g. mixed ionic-covalent systems, low-symmetry systems and/or with amphoteric species), often better to test more than necessary to be thorough! (And link Xinwei stuff, Ke F_i +1 (also found with our Se and Alex's Ba2BiO6)) – i.e.
80-
use your f*cking head!
67+
use your head!
8168
- And particularly when you've calculated your initial set of defect results! E.g. with Sb2Se3, all antisites and interstitials amphoteric, so suggests you should re-check amphotericity for all vacancies
8269
- Note about rare cases where `vasp_gam` pre-relaxation can fail (e.g. Wenzhen's case); extremely disperse bands with small bandgaps, where low k-point sampling can induce a phase transition in the bulk structure. In these cases, using a special k-point is advised for the pre-relaxations. You can get the corresponding k-point for your supercell (given the primitive cell special k-point) using the `get_K_from_k` function from `easyunfold`, with the `doped` `supercell_matrix`.
8370
- Show quick example case of the IPR code from `pymatgen-analysis-defects` (or from Adair code? or others?)
@@ -91,9 +78,8 @@
9178
- `doped` repo/docs cleanup `TODO`s above, and check through code TODOs
9279
- Should have a general refactor from `(bulk, defect)` to `(defect, bulk)` in inputs to functions (e.g. site-matching, symmetry functions etc), as this is most intuitive and then keep consistent throughout?
9380
- Configuration coordinate diagram generation tutorial, linked in other tutorials and codes (CarrierCapture.jl). For defect PESs for carrier capture or NEB calculations (don't use `IBRION = 2` for NEB), and tests.
81+
- Tests for configuration coordinate diagram generation code
9482
- Stenciling tutorial and tests.
95-
- Tests for configuration coordinate diagram generation code
9683
- Quick-start tutorial suggested by Alex G
9784
- Add example to chemical potentials / thermodynamics analysis tutorials of varying chemical potentials as a function of temperature/pressure (i.e. gas phases), using the `Spinney` functions detailed here (https://spinney.readthedocs.io/en/latest/tutorial/chemipots.html#including-temperature-and-pressure-effects-through-the-gas-phase-chemical-potentials) or possibly `DefAP` functions otherwise. Xinwei Sb2S3 stuff possibly a decent example for this, see our notebooks.
9885
- Deal with cases where "X-rich"/"X-poor" corresponds to more than one limit (pick one and warn user?)(e.g. Wenzhen Si2Sb2Te6). Can see `get_chempots` in `pmg-analysis-defects` for inspo on this.
99-
- Automatically detect dimers, check the magnetisation from the calcs, and then warn the user that they may want to try NUPDOWN = 2 (if the magnetisation was singlet)? Do in `DefectsParser` at the end (as part of a 'final-checks' function), so can loop through and check if dimer with magnetisation was calculated at some point for that defect state.

docs/Tips.rst

Lines changed: 28 additions & 7 deletions
@@ -595,11 +595,12 @@ bipolaron/multi-polaron state (e.g. for `V`\ :sub:`Cd`\ :sup:`0*` in
595595
`CdTe <https://pubs.acs.org/doi/10.1021/acsenergylett.1c00380>`__, `V`\ :sub:`Se`\ :sup:`0` in
596596
`t-Se <https://pubs.rsc.org/en/content/articlelanding/2025/ee/d4ee04647a>`__, or `V`\ :sub:`P`\ :sup:`-1`
597597
in `NaP <https://journals.aps.org/prxenergy/abstract/10.1103/PRXEnergy.2.043002>`__), a molecular
598-
dimer-like state (such as O\ :sub:`2` species in oxides or
599-
`carbon pairs in silicon <https://www.nature.com/articles/s41467-023-36090-2>`__), defects involving
600-
multiple localised `d`/`f` electrons, or orbital-degenerate/correlated defects where Hund's rule implies
601-
open-shell solutions (such as the highly-studied
602-
`NV centre in diamond <https://journals.aps.org/prb/abstract/10.1103/PhysRevB.104.235301>`__
598+
dimer-like state (such as O\ :sub:`2` species in
599+
`oxides <https://pubs.acs.org/doi/full/10.1021/acsenergylett.4c01307>`__ or
600+
`carbon pairs in silicon <https://www.nature.com/articles/s41467-023-36090-2>`__; ``doped``
601+
issues an automated warning for such dimer states upon parsing), defects involving multiple localised `d`/`f` electrons, or
602+
orbital-degenerate/correlated defects where Hund's rule implies open-shell solutions (such as the
603+
highly-studied `NV centre in diamond <https://journals.aps.org/prb/abstract/10.1103/PhysRevB.104.235301>`__
603604
or `transition metal impurities in silicon <https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.6.L053201>`__).
604605
If you encounter defect states like these and/or suspect that alternative spin configurations may be
605606
possible, you should test the different possibilities by setting ``NUPDOWN`` (and possibly ``MAGMOM``,
@@ -681,8 +682,28 @@ etc.).
681682
``USE_MAGNETIC_SYMMETRY=1`` (i.e. ``os.environ["USE_MAGNETIC_SYMMETRY"] = "1"`` in Python).
682683

683684

684-
Serialization & Data Provenance (``JSON``/``csv``)
685-
--------------------------------------------------
685+
Open Science and Reproducibility
686+
--------------------------------
687+
Robust, open and reproducible science greatly strengthens the impact of research. This is especially true
688+
for computational defect modelling, given the many steps and complexities involved -- see
689+
`Guidelines for robust and reproducible point defect simulations in crystals <https://doi.org/10.26434/chemrxiv-2025-3lb5k>`__
690+
for discussion.
691+
692+
:code:`doped` has been built to aid robustness and reproducibility for computational defect studies.
693+
**We highly recommend** that the :code:`doped`/:code:`ShakeNBreak` class objects, which store key metadata
694+
and can be directly output to lightweight :code:`json(.gz)` files (such as :code:`DefectThermodynamics`,
695+
:code:`DefectsGenerator`, :code:`Distortions`, :code:`CompetingPhasesAnalyzer`), be shared in open-access
696+
repositories (e.g. Zenodo, Materials Cloud, Figshare) upon publication, along with relevant raw
697+
computational data. It is also helpful to use the summary functions such as :code:`DefectThermodynamics.get_formation_energies()`, :code:`DefectThermodynamics.get_symmetries_and_degeneracies()`, :code:`CompetingPhasesAnalyzer.get_formation_energy_df()`, :code:`CompetingPhasesAnalyzer.calculate_chempots()`, :code:`CompetingPhasesAnalyzer.to_LaTeX_table()` etc. -- which output :code:`pandas` :code:`DataFrame`\ s that can be written to csv (with :code:`.to_csv()`, see tutorials) and imported to Microsoft Word / converted to LaTeX (`Tables Generator <https://www.tablesgenerator.com>`__) -- to summarise key quantities in Supplementary Information files.
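
A minimal sketch of the csv export step is given here, assuming only that the summary table is available as a ``pandas`` ``DataFrame``. The table below is built by hand with placeholder column names and placeholder numbers (not calculated results); in a real workflow it would be the output of one of the summary functions named above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for a summary table. Column names and values are
# placeholders for illustration only, not results from any calculation.
formation_energies = pd.DataFrame(
    {
        "Defect": ["v_Cd", "v_Cd", "Te_Cd"],
        "q": [0, -2, 2],
        "Formation energy (eV)": [2.7, 3.1, 1.9],
    }
)

# Write the table to csv without the integer index, ready for an SI file.
csv_path = Path(tempfile.gettempdir()) / "formation_energies_SI.csv"
formation_energies.to_csv(csv_path, index=False)

# Reload to check that the file contains what was intended.
round_trip = pd.read_csv(csv_path)
```

Passing ``index=False`` keeps the default integer row index out of the csv, which usually gives a cleaner table when the file is imported elsewhere.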
698+
699+
Examples of these practices are shown in
700+
`Intrinsic point defect tolerance in selenium for indoor and tandem photovoltaics <https://doi.org/10.1039/D4EE04647A>`__
701+
& `Defect Tolerance via External Passivation in the Photocatalyst SrTiO₃:Al <https://pubs.acs.org/doi/10.1021/jacs.5c07104>`__.
702+
Further details on serialisation and data provenance utilities in :code:`doped` are given below.
703+
704+
705+
Serialisation & Data Provenance (``JSON``/``csv``)
706+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
686707
To aid calculation reproducibility, data provenance and easy sharing/comparison of pre- and post-processing
687708
stages of the defect workflow, ``doped`` objects have been made fully serializable, meaning they can be
688709
easily saved and (re-)loaded from compact, lightweight ``.json`` files. As demonstrated at

docs/index.rst

Lines changed: 17 additions & 1 deletion
@@ -171,6 +171,22 @@ Studies using ``doped``, so far
171171
.. Sykes Magnetic oxide polarons
172172
.. Kat YTOS
173173
174+
Open Science and Reproducibility
175+
================================
176+
Robust, open and reproducible science greatly strengthens the impact of research. This is especially true
177+
for computational defect modelling, given the many steps and complexities involved -- see
178+
`Guidelines for robust and reproducible point defect simulations in crystals <https://doi.org/10.26434/chemrxiv-2025-3lb5k>`__
179+
for discussion.
180+
181+
``doped`` has been built to aid robustness and reproducibility for computational defect studies.
182+
**We highly recommend** that the ``doped``/``ShakeNBreak`` class objects, which store key metadata and can
183+
be directly output to lightweight ``json(.gz)`` files, be shared in open-access repositories upon
184+
publication, along with relevant raw computational data. It is also helpful to use the ``doped`` summary
185+
functions to tabulate key quantities in Supplementary Information files. See the
186+
`Open Science <https://doped.readthedocs.io/en/latest/Tips.html#open-science-and-reproducibility>`__
187+
section of the docs Tips page for details.
188+
189+
174190
Acknowledgements
175191
================
176192

@@ -181,7 +197,7 @@ Direct contributors are listed in the GitHub ``Contributors`` sidebar; including
181197
Alex Squires, Adair Nicolson, Irea Mosquera-Lois, Alex Ganose, Bonan Zhu, Katarina Brlec, Sabrine Hachmioune and Savya
182198
Aggarwal.
183199

184-
`doped` was originally based on the excellent ``PyCDT`` (no longer maintained), but transformed and morphed
200+
``doped`` was originally based on the excellent ``PyCDT`` (no longer maintained), but transformed and morphed
185201
over time as more and more functionality was added. After breaking changes in ``pymatgen``, the package
186202
was entirely refactored and rewritten, to work with the new ``pymatgen-analysis-defects`` package.
187203

doped/utils/legacy_corrections.py

Lines changed: 6 additions & 0 deletions
@@ -4,6 +4,12 @@
44
55
Mostly adapted from the deprecated AIDE package developed by the dynamic duo
66
Adam Jackson and Alex Ganose.
7+
8+
Note that bandfilling corrections are no longer supported, as in most cases
9+
they shouldn't be used (see https://doi.org/10.26434/chemrxiv-2025-3lb5k). If
10+
for some reason bandfilling corrections are desired, they can be manually added
11+
to `corrections` attributes of DefectEntry objects. See
12+
https://github.com/materialsproject/pymatgen/pull/2193
713
"""
814

915
import copy

examples/parsing_tutorial.ipynb

Lines changed: 15 additions & 3 deletions
@@ -2179,17 +2179,29 @@
21792179
]
21802180
},
21812181
{
2182-
"cell_type": "code",
21832182
"metadata": {
21842183
"collapsed": false,
21852184
"ExecuteTime": {
21862185
"end_time": "2025-06-09T00:02:48.645087Z",
21872186
"start_time": "2025-06-09T00:02:48.643157Z"
21882187
}
21892188
},
2190-
"source": [],
2189+
"cell_type": "markdown",
2190+
"source": [
2191+
"\n",
2192+
"## Open Science and Reproducibility\n",
2193+
"Robust, open and reproducible science greatly strengthens the impact of research. This is especially true for computational defect modelling, given the many steps and complexities involved -- see [_Guidelines for robust and reproducible point defect simulations in crystals_](https://doi.org/10.26434/chemrxiv-2025-3lb5k) for discussion.\n",
2194+
"\n",
2195+
"`doped` has been built to aid robustness and reproducibility for computational defect studies. **We highly recommend** that the `doped`/`ShakeNBreak` class objects, which store key metadata and can be directly output to lightweight `json(.gz)` files (such as `DefectThermodynamics`, `DefectsGenerator`, `Distortions`, `CompetingPhasesAnalyzer`), be shared in open-access repositories (e.g. Zenodo, Materials Cloud, Figshare) upon publication, along with relevant raw computational data. It is also helpful to use the summary functions such as `DefectThermodynamics.get_formation_energies()`, `DefectThermodynamics.get_symmetries_and_degeneracies()`, `CompetingPhasesAnalyzer.get_formation_energy_df()`, `CompetingPhasesAnalyzer.calculate_chempots()`, `CompetingPhasesAnalyzer.to_LaTeX_table()` etc. -- which output `pandas` `DataFrame`s that can be written to csv (with `.to_csv()`, see tutorials) and imported to Microsoft Word / converted to LaTeX (https://www.tablesgenerator.com/) -- to summarise key quantities in Supplementary Information files.\n",
2196+
"Examples of these practices are shown in [_Intrinsic point defect tolerance in selenium for indoor and tandem photovoltaics_](https://doi.org/10.1039/D4EE04647A) & [_Defect Tolerance via External Passivation in the Photocatalyst SrTiO₃:Al_](https://pubs.acs.org/doi/10.1021/jacs.5c07104)."
2197+
]
2198+
},
2199+
{
2200+
"metadata": {},
2201+
"cell_type": "code",
21912202
"outputs": [],
2192-
"execution_count": null
2203+
"execution_count": null,
2204+
"source": ""
21932205
}
21942206
],
21952207
"metadata": {
