
Commit 903baae

Merge pull request #38 from Sparks29032/docs_patch
Finalize diffpy.utils documentation
2 parents a83e669 + 1502c67 commit 903baae

24 files changed (+351, -166 lines)

CHANGELOG.md

Lines changed: 0 additions & 49 deletions
This file was deleted.

CHANGELOG.rst

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
=============
Release Notes
=============

.. current developments

v3.2.3
====================

**Added:**

* Compatibility with Python 3.12.0rc3, 3.11.
* CI Coverage.
* New tests for ``loadData`` function.
* ``loadData`` function now toggleable. It can return either (a) data read from data blocks or (b) header information
  stored above the data block.

**Removed:**

* Remove use of ``pkg_resources`` (deprecated).
* No longer use Travis.



v3.1.0
====================

**Added:**

* Compatibility with Python 3.10, 3.9, 3.8.

**Removed:**

* Remove support for Python 3.5, 3.6.



v3.0.0
====================

**Added:**

* Compatibility with Python 3.7, 3.6, 3.5 in addition to 2.7.

**Changed:**

* Switch to platform-independent "noarch" Anaconda package.

**Deprecated:**

* Variable ``__gitsha__`` in the ``version`` module, which was renamed to ``__git_commit__``.

doc/manual/source/api/diffpy.utils.parsers.rst

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,5 @@
+.. _Parsers Documentation:
+
 diffpy.utils.parsers package
 ============================
 
@@ -6,6 +8,8 @@ diffpy.utils.parsers package
    :undoc-members:
    :show-inheritance:
 
+For a sample data extraction workflow, see :ref:`parsers example<Parsers Example>`.
+
 diffpy.utils.parsers.loaddata module
 ------------------------------------
 
Binary file not shown.
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
.. _Examples:

:tocdepth: 2

Examples
########
Landing page for diffpy.utils examples.

.. toctree::
   parsersexample
   resampleexample
Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
.. _Parsers Example:

:tocdepth: 2

Parsers Example
###############

This example demonstrates how diffpy.utils lets us easily process and serialize files.
Using the parsers module, we can load file data into simple, easy-to-work-with Python objects.

1) To begin, unzip :download:`parserdata<./exampledata/parserdata.zip>` and take a look at ``data.txt``.
   Our goal will be to extract and serialize the data table as well as the parameters listed in the header of this file.

2) To get the data table, we will use the ``loadData`` function. The default behavior of this
   function is to find and extract a data table from a file. ::

       from diffpy.utils.parsers import loadData
       data_table = loadData('<PATH to data.txt>')

   While this works with most datasets, on our ``data.txt`` file it raises a ``ValueError``. This is
   because of the comments ``$ Phase Transition Near This Temperature Range`` and ``--> Note Significant Jump in Rw <--``
   embedded within the data table. To fix this, use the ``comments`` parameter. ::

       data_table = loadData('<PATH to data.txt>', comments=['$', '-->'])

   This parameter tells ``loadData`` that any lines beginning with ``$`` or ``-->`` are just comments and
   that more entries in our data table may follow.
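To see why unmarked comment lines cause trouble, consider the stdlib-only sketch below. It parses whitespace-separated rows and converts every token with ``float()``. It is not the ``loadData`` implementation, and the helper ``read_numeric_rows`` and the sample lines are hypothetical. An unmarked ``$ Phase Transition ...`` line makes ``float()`` raise ``ValueError``. Listing ``$`` and ``-->`` as comment markers lets the parser skip those lines and keep collecting rows.

```python
# Hypothetical sketch, NOT diffpy.utils' loadData: parse whitespace-separated
# numeric rows, skipping lines that start with any of the given comment markers.
def read_numeric_rows(lines, comments=("#",)):
    rows = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if any(stripped.startswith(marker) for marker in comments):
            continue  # a marked comment line; more rows may follow
        # float() raises ValueError on non-numeric tokens such as "$" or "Phase"
        rows.append([float(token) for token in stripped.split()])
    return rows


# Made-up lines in the spirit of data.txt (not its real contents).
sample = [
    "300 1.00 0.10 0.20",
    "$ Phase Transition Near This Temperature Range",
    "310 1.01 0.11 0.35",
    "--> Note Significant Jump in Rw <--",
    "320 1.02 0.12 0.21",
]

try:
    read_numeric_rows(sample)
except ValueError as err:
    print("without comment markers:", err)

table = read_numeric_rows(sample, comments=["$", "-->"])
print(table)  # three numeric rows; the two comment lines are skipped
```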

   Here are a few other parameters to test out:

   * ``delimiter=','``: Look for a comma-separated data table. Useful for CSV files.
     However, since ``data.txt`` is whitespace-separated, running ::

         loadData('<PATH to data.txt>', comments=['$', '-->'], delimiter=',')

     returns an empty list.
   * ``minrows=50``: Only look for data tables with at least 50 rows. Since our data table has far fewer
     rows than that, running ::

         loadData('<PATH to data.txt>', comments=['$', '-->'], minrows=50)

     returns an empty list.
   * ``usecols=[0, 3]``: Only return the 0th and 3rd columns (zero-indexed) of the data table. For ``data.txt``, these
     correspond to the temperature and Rw columns. ::

         loadData('<PATH to data.txt>', comments=['$', '-->'], usecols=[0, 3])
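The following sketch illustrates the three options above. The helper ``extract_table`` is hypothetical and only mimics the behavior this list describes: an empty result for the wrong delimiter or too few rows, and column selection for ``usecols``. The real ``loadData`` parses files differently, and its defaults and return types may differ.

```python
# Hypothetical sketch of the delimiter / minrows / usecols semantics described
# above; this is NOT diffpy.utils' implementation of loadData.
def extract_table(lines, delimiter=None, minrows=1, usecols=None):
    rows = []
    for line in lines:
        try:
            # split(None) splits on any whitespace; split(",") splits on commas
            row = [float(tok) for tok in line.split(delimiter) if tok.strip()]
        except ValueError:
            row = []  # line does not parse as numbers with this delimiter
        if row:
            rows.append(row)
    if len(rows) < minrows:
        return []  # table too short: report no table at all
    if usecols is not None:
        rows = [[row[i] for i in usecols] for row in rows]  # zero-indexed columns
    return rows


sample = ["300 1.00 0.10 0.20", "310 1.01 0.11 0.35", "320 1.02 0.12 0.21"]
print(extract_table(sample, delimiter=","))   # [] : no comma-separated rows
print(extract_table(sample, minrows=50))      # [] : fewer than 50 rows
print(extract_table(sample, usecols=[0, 3]))  # [[300.0, 0.2], [310.0, 0.35], [320.0, 0.21]]
```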

3) Next, to get the header information, we can again use ``loadData``,
   but this time with the ``headers`` parameter enabled. ::

       hdata = loadData('<PATH to data.txt>', comments=['$', '-->'], headers=True)
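This text does not say which header syntax ``loadData`` recognizes. The sketch below simply assumes ``name = value`` lines above the table. The helper ``parse_header`` and the header lines are hypothetical. The sketch also assumes the data table begins at the first row whose first token is numeric, and stops reading there.

```python
# Hypothetical sketch: collect "name = value" pairs above the first numeric row.
# The header format is an assumption, not a diffpy.utils specification.
def parse_header(lines):
    header = {}
    for line in lines:
        stripped = line.strip()
        first = stripped.split()[0] if stripped else ""
        try:
            float(first)
            break  # first numeric row: assume the data table starts here
        except ValueError:
            pass
        if "=" in stripped:
            name, value = (part.strip() for part in stripped.split("=", 1))
            try:
                header[name] = float(value)  # store numbers as floats
            except ValueError:
                header[name] = value  # keep anything else as text
    return header


# Made-up header lines, not the contents of data.txt.
sample = [
    "wavelength = 0.1665",
    "composition = Ni",
    "",
    "300 1.00 0.10 0.20",
    "note = ignored because it follows the table",
]
print(parse_header(sample))  # {'wavelength': 0.1665, 'composition': 'Ni'}
```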

4) Rather than working with separate ``data_table`` and ``hdata`` objects, it may be easier to combine them into a single
   dictionary. We can do so using the ``serialize_data`` function. ::

       from diffpy.utils.parsers import serialize_data
       file_data = serialize_data('<PATH to data.txt>', hdata, data_table)
       # file_data is a dictionary with a single key:
       # the file name (in our case, 'data.txt').
       # Its value is a dictionary containing the data from hdata and data_table.
       data_dict = file_data['data.txt']

   The dictionary ``data_dict`` contains all entries in ``hdata`` and an additional entry named
   ``data table`` containing ``data_table``. ::

       here_is_the_data_table = data_dict['data table']
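The resulting layout can be sketched with plain dictionaries. The helper ``combine_entry`` is hypothetical. It assumes the outer key is the base name of the path, as the ``'data.txt'`` key above suggests. ``serialize_data`` itself may build the entry differently.

```python
import os


# Hypothetical sketch of the dictionary layout described above; it is NOT the
# diffpy.utils serialize_data implementation.
def combine_entry(filepath, hdata, data_table):
    entry = dict(hdata)               # copy every header entry
    entry["data table"] = data_table  # plus the whole table under "data table"
    return {os.path.basename(filepath): entry}  # keyed by the file name


file_data = combine_entry("/some/dir/data.txt", {"wavelength": 0.1665}, [[300.0, 0.2]])
print(file_data)
# {'data.txt': {'wavelength': 0.1665, 'data table': [[300.0, 0.2]]}}
```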

   There is also an option to name the columns of the data table and save those columns as entries instead. ::

       data_table_column_names = ['temperature', 'scale', 'stretch', 'rw']  # names of the columns in data.txt
       file_data = serialize_data('<PATH to data.txt>', hdata, data_table, dt_colnames=data_table_column_names)
       data_dict = file_data['data.txt']

   Now we can extract specific data table columns from the dictionary. ::

       data_table_temperature_column = data_dict['temperature']
       data_table_rw_column = data_dict['rw']
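One way to picture the ``dt_colnames`` option is to transpose the table's rows into columns and store each column under its name. The helper ``combine_entry_by_column`` is hypothetical. Following the description above ("instead"), this sketch stores only the named columns and no ``data table`` entry.

```python
# Hypothetical sketch: store each column of the table under its own name.
# Not the diffpy.utils implementation.
def combine_entry_by_column(filename, hdata, data_table, dt_colnames):
    entry = dict(hdata)  # header entries first
    columns = zip(*data_table)  # transpose rows into columns
    for name, column in zip(dt_colnames, columns):
        entry[name] = list(column)
    return {filename: entry}


table = [
    [300.0, 1.00, 0.10, 0.20],
    [310.0, 1.01, 0.11, 0.35],
]
names = ["temperature", "scale", "stretch", "rw"]
data_dict = combine_entry_by_column("data.txt", {}, table, names)["data.txt"]
print(data_dict["temperature"])  # [300.0, 310.0]
print(data_dict["rw"])           # [0.2, 0.35]
```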

5) When we are done working with the data, we can store it on disk for later use. This can also be done using the
   ``serialize_data`` function with an additional ``serial_file`` parameter. ::

       parsed_file_data = serialize_data('<PATH to data.txt>', hdata, data_table, serial_file='<PATH to serialdata.json>')

   The returned value, ``parsed_file_data``, is the dictionary we just added to ``serialdata.json``.
   To extract the data from the serial file, we use ``deserialize_data``. ::

       from diffpy.utils.parsers import deserialize_data
       parsed_file_data = deserialize_data('<PATH to serialdata.json>')
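Assume the serial file is plain JSON. The ``.json`` extension suggests this, but it is not confirmed here. Under that assumption, the store-and-reload round trip can be sketched with the standard ``json`` module. The dictionary and the temporary file are stand-ins, not output from ``serialize_data``.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the dictionary returned by serialize_data.
parsed = {"data.txt": {"wavelength": 0.1665, "data table": [[300.0, 0.2]]}}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "serialdata.json")
    with open(path, "w") as f:
        json.dump(parsed, f)     # write the entries to disk
    with open(path) as f:
        restored = json.load(f)  # read them back

print(restored == parsed)  # True
```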

6) Finally, ``serialize_data`` allows us to store data from multiple text files in a single serial file. For one last bit
   of practice, we will extract and add the data from ``moredata.txt`` into the same ``serialdata.json`` file. ::

       data_table = loadData('<PATH to moredata.txt>')
       hdata = loadData('<PATH to moredata.txt>', headers=True)
       serialize_data('<PATH to moredata.txt>', hdata, data_table, serial_file='<PATH to serialdata.json>')

   The serial file ``serialdata.json`` should now contain two entries: ``data.txt`` and ``moredata.txt``.
   The data from each file can be accessed using ::

       serial_data = deserialize_data('<PATH to serialdata.json>')
       data_txt_data = serial_data['data.txt']  # Access data.txt data
       moredata_txt_data = serial_data['moredata.txt']  # Access moredata.txt data
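Under the same JSON assumption, adding a second file's entry to an existing serial file amounts to a read-update-write cycle. The helper ``add_entry`` is hypothetical. In this sketch, a new file name adds a key and a repeated file name overwrites the old entry. That may not match how ``serialize_data`` handles duplicates.

```python
import json
import os
import tempfile


# Hypothetical sketch: merge a new file entry into an existing JSON serial file.
# Not the diffpy.utils implementation.
def add_entry(serial_path, entry):
    data = {}
    if os.path.exists(serial_path):
        with open(serial_path) as f:
            data = json.load(f)  # existing entries, if any
    data.update(entry)  # new name adds a key; repeated name overwrites (in this sketch)
    with open(serial_path, "w") as f:
        json.dump(data, f)
    return data


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "serialdata.json")
    add_entry(path, {"data.txt": {"data table": [[300.0, 0.2]]}})
    add_entry(path, {"moredata.txt": {"data table": [[400.0, 0.3]]}})
    with open(path) as f:
        serial_data = json.load(f)

print(sorted(serial_data))  # ['data.txt', 'moredata.txt']
```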

For more information, check out the :ref:`documentation<Parsers Documentation>` of the ``parsers`` module.
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
.. _Resample Example:

:tocdepth: 2

Resampling Example
##################

This example demonstrates how we can use diffpy.utils functions to resample a function on a denser grid.
Specifically, we will resample one function onto a grid matching the other's so that we can easily compare the two.
Then we will show how this resampling method lets us perfectly reconstruct certain functions
given enough datapoints.

1) To start, unzip :download:`parserdata<./exampledata/parserdata.zip>`. Then, load the data tables from ``Nickel.gr``
   and ``NiTarget.gr``. These datasets are based on data from `Atomic Pair Distribution Function Analysis: A Primer
   <https://global.oup.com/academic/product/atomic-pair-distribution-function-analysis-9780198885801?cc=us&lang=en&>`_.
   ::

       from diffpy.utils.parsers import loadData
       nickel_datatable = loadData('<PATH to Nickel.gr>')
       nitarget_datatable = loadData('<PATH to NiTarget.gr>')

   Each data table has two columns: the first is the grid and the second is the function value.
   To extract the columns, we can use the ``serialize_data`` function ... ::

       from diffpy.utils.parsers import serialize_data
       nickel_data = serialize_data('Nickel.gr', {}, nickel_datatable, dt_colnames=['grid', 'func'])
       nickel_grid = nickel_data['Nickel.gr']['grid']
       nickel_func = nickel_data['Nickel.gr']['func']
       target_data = serialize_data('NiTarget.gr', {}, nitarget_datatable, dt_colnames=['grid', 'func'])
       target_grid = target_data['NiTarget.gr']['grid']
       target_func = target_data['NiTarget.gr']['func']

   ... or you can use any other column-extraction method you prefer.

2) If we plot the two on top of each other ::

       import matplotlib.pyplot as plt
       plt.plot(target_grid, target_func, linewidth=3)
       plt.plot(nickel_grid, nickel_func, linewidth=1)

   they look very similar, but to see how they actually differ, we should plot the difference between the two.
   We may want to run something like ... ::

       import numpy as np
       difference = np.subtract(target_func, nickel_func)

   ... but this will only produce the right result if ``target_func`` and ``nickel_func`` are on the same grid.
   Checking the lengths of ``target_grid`` and ``nickel_grid`` shows that these grids are clearly distinct.

3) However, we can resample the two functions onto the same grid. Since both functions have grids spanning
   ``[0, 60]``, let us define a new grid ... ::

       grid = np.linspace(0, 60, 6001)

   ... and use the diffpy.utils ``wsinterp`` function to resample on this grid. ::

       from diffpy.utils.parsers import wsinterp
       nickel_resample = wsinterp(grid, nickel_grid, nickel_func)
       target_resample = wsinterp(grid, target_grid, target_func)

   We can now plot the difference to see that these two functions are in fact equal. ::

       plt.plot(grid, target_resample - nickel_resample)

   This is the expected result, since the data in ``Nickel.gr`` are every tenth data point of ``NiTarget.gr``.
   It also shows that ``wsinterp`` can help us reconstruct a function from incomplete data.
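The Whittaker-Shannon formula rebuilds a band-limited function from equally spaced samples as :math:`f(x) = \sum_n f(x_n)\,\mathrm{sinc}((x - x_n)/\Delta x)`, with the normalized :math:`\mathrm{sinc}(t) = \sin(\pi t)/(\pi t)`. The pure-Python sketch below evaluates that sum directly. ``ws_resample`` is a hypothetical name, and ``wsinterp`` in diffpy.utils may differ in details such as how it treats points outside the original grid. The test signal is made up and band-limited well below the Nyquist limit of the 0.1-spaced grid. Because the sum is truncated to a finite grid, agreement is best away from the grid's ends.

```python
import math


def sinc(t):
    """Normalized sinc, sin(pi*t) / (pi*t), with sinc(0) = 1."""
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)


def ws_resample(x, xp, fp):
    """Whittaker-Shannon sum: resample values fp on the equally spaced grid xp at points x."""
    dx = xp[1] - xp[0]  # assumes xp is equally spaced
    return [sum(f * sinc((xi - x0) / dx) for x0, f in zip(xp, fp)) for xi in x]


def signal(r):
    """Made-up band-limited test function (angular frequencies 2.0 and 1.3)."""
    return math.sin(2.0 * r) + 0.5 * math.cos(1.3 * r)


# Coarse grid: 601 points on [0, 60] with spacing 0.1, so content below pi / 0.1 is resolvable.
coarse_grid = [0.1 * k for k in range(601)]
coarse_vals = [signal(r) for r in coarse_grid]

# Points between coarse samples, near the middle where truncation error is smallest.
fine_points = [29.95, 30.03, 30.07]
recon = ws_resample(fine_points, coarse_grid, coarse_vals)
for r, value in zip(fine_points, recon):
    print(f"r = {r:.2f}: exact {signal(r):+.5f}, resampled {value:+.5f}")
```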

4) In order for our function reconstruction to be perfect, we require that (a) the function is a Fourier transform of a
   band-limited dataset and (b) the original grid has enough equally-spaced datapoints based on the Nyquist sampling
   theorem.

   * If our function :math:`F(r)` is of the form :math:`F(r) = \int_0^{q_{max}} f(q)e^{-iqr}dq`, where :math:`q_{max}` is
     the bandlimit, then for a grid spanning :math:`r \in [r_{min}, r_{max}]`, the Nyquist sampling theorem tells us we
     require at least :math:`q_{max} (r_{max} - r_{min}) / \pi` equally-spaced datapoints.
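The bound follows from two facts. First, by the Nyquist criterion as used here, a function whose Fourier content stops at :math:`q_{max}` is fixed by samples spaced no more than :math:`\pi / q_{max}` apart. Second, :math:`N` equally spaced points on :math:`[r_{min}, r_{max}]` are :math:`\Delta r = (r_{max} - r_{min})/(N - 1)` apart. Together these give:

```latex
\Delta r = \frac{r_{max} - r_{min}}{N - 1} \le \frac{\pi}{q_{max}}
\quad\Longleftrightarrow\quad
N \ge 1 + \frac{q_{max}\,(r_{max} - r_{min})}{\pi}
```

Counting both grid endpoints adds one point to the bound (at least 479 rather than 478 points for :math:`q_{max} = 25` on :math:`[0, 60]`). This does not change the conclusion for a 601-point grid.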

   In the case of our dataset, our bandlimit is ``qmax=25.0`` and our function spans :math:`r \in [0.0, 60.0]`.
   Thus, our original grid requires at least :math:`25.0 \cdot 60.0 / \pi \approx 477.5`, i.e., 478 datapoints. Since our
   grid has :math:`601` datapoints, our reconstruction was perfect, as the comparison between ``Nickel.gr`` and ``NiTarget.gr`` shows.
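The arithmetic in this paragraph can be checked with a few lines of Python. ``min_points`` is a hypothetical helper that rounds :math:`q_{max}(r_{max} - r_{min})/\pi` up to a whole number of points. The spacing comparison assumes the 601-point grid is evenly spaced on :math:`[0, 60]`.

```python
import math


def min_points(qmax, rmin, rmax):
    """Smallest integer N with N >= qmax * (rmax - rmin) / pi."""
    return math.ceil(qmax * (rmax - rmin) / math.pi)


qmax, rmin, rmax = 25.0, 0.0, 60.0
needed = min_points(qmax, rmin, rmax)
print(needed)         # 478
print(601 >= needed)  # True: a 601-point grid is dense enough

# Equivalent spacing check: grid step versus the largest step Nyquist allows.
print((rmax - rmin) / 600, math.pi / qmax)
```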

doc/manual/source/index.rst

Lines changed: 12 additions & 5 deletions
@@ -7,11 +7,17 @@ diffpy.utils - general purpose shared utilities for the diffpy libraries.
 | Software version |release|.
 | Last updated |today|.
 
-The diffpy.utils package provides functions for extracting array data from
-variously formatted text files and wx GUI utilities used by the PDFgui
-program. The package also includes interpolation function based on the
-Whittaker-Shannon formula that can be used to resample a PDF or other profile
-function over a new grid.
+The diffpy.utils package provides general functions for extracting data from variously formatted text files as well as
+some PDF-specific functionality. These include wx GUI utilities used by the PDFgui program and an interpolation function
+based on the Whittaker-Shannon formula for resampling a bandlimited PDF or other profile function.
+
+========
+Examples
+========
+Illustrations of when and how one would use various diffpy.utils functions.
+
+* :ref:`File Data Extraction<Parsers Example>`
+* :ref:`Resampling & Data Reconstruction<Resample Example>`
 
 =======
 Authors
@@ -40,6 +46,7 @@ Table of contents
 
    license
    release
+   Examples <examples/examples>
    Package API <api/diffpy.utils>
 
 ======================================

doc/manual/source/license.rst

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+:tocdepth: 2
+
 .. index:: license
 
 License

doc/manual/source/release.rst

Lines changed: 3 additions & 1 deletion
@@ -1,3 +1,5 @@
+:tocdepth: 2
+
 .. index:: release notes
 
-.. mdinclude:: ../../../CHANGELOG.md
+.. include:: ../../../CHANGELOG.rst
