Commit d40b807

Open source commit. Original code history on gitlab NMF_analysis
0 parents  commit d40b807

195 files changed: +296797 -0 lines changed

Diff for: .gitignore

+23
@@ -0,0 +1,23 @@
cache-directory/*
*.pyc
lib/*
.pytest_cache/*

# Rever
rever/

instance/*.json
download/*
.idea/


*.egg-info
*.egg-info/
bin
develop-eggs
dist
lib
lib64
eggs
parts
.installed.cfg

Diff for: .travis.yml

+36
@@ -0,0 +1,36 @@
sudo: False

language: python
python:
  - "3.7"
cache:
  directories:
    - $HOME/.cache/pip

matrix:
  include:
    - python: 3.7


install:
  # Install conda
  - wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
  - bash miniconda.sh -b -p $HOME/miniconda
  - export PATH="$HOME/miniconda/bin:$PATH"
  - conda config --set always_yes yes --set changeps1 no
  - conda config --add channels conda-forge
  - conda config --add channels diffpy
  - conda update conda
  # Install dependencies
  - conda create -n test --file requirements/run.txt python=3.7
  - source activate test
  - python setup.py install

script:
  - set -e
  - conda install --file requirements/test.txt
  - python -m pytest

notifications:
  email: false

Diff for: AUTHORS.txt

+2
@@ -0,0 +1,2 @@
Simon J. L. Billinge <[email protected]>
Zachary A. Thatcher <[email protected]>

Diff for: LICENSE.txt

+33
@@ -0,0 +1,33 @@
This program is part of the DiffPy open-source project at Columbia
University and is available subject to the conditions and terms laid out below.

Copyright © 2009-2019, Trustees of Columbia University in the City of New York,
all rights reserved.

For more information please visit the diffpy web-page at http://diffpy.org or
email Prof. Simon Billinge at [email protected].

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the names of COLUMBIA UNIVERSITY, MICHIGAN STATE UNIVERSITY nor the
  names of their contributors may be used to endorse or promote products
  derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Diff for: README.md

+84
@@ -0,0 +1,84 @@
NMF Mapping for PDF or XRD Files
---------
This package takes a directory containing diffraction files in .gr (or .xy/.xye) format and performs an NMF decomposition of
the components with the goal of determining the number of structural phases present and, if the data come from a
time series, when these phases are present. Files in the directory that are not in .gr (or .xy/.xye) format, including
.dat files, are ignored and skipped in the calculation.

Use
---------
This package is the backend logic for pdfitc.org/NMF. Please consider using pdfitc.org/NMF before this tool, if
possible. If your NMF analysis requires some feature from this CLI that isn't present on the website, please let us know
and we will consider adding the feature to the pdfitc.org interface.

Installation
--------
- Install requirements from run.txt via "conda (or pip) install --file (or -r) 'requirements/run.txt'"
- Install using "pip install -e ." in a python 3 environment

Argparse
--------
Input:
- directory: path to the directory containing the diffraction files that are to be analyzed
  - format: string (filepath)
  - eg: '/Users/zthatcher/Desktop/Data/nmf_mapping/time_data/' or . for cwd

- save-files (optional): whether or not to save the dataframes, plots, and
  components (note: pdf data saves as .cgr and xrd data saves as .xy)
  - format: boolean
  - eg: --save-files False
  - default: True

- threshold (optional, mutually exclusive with the other thresholds): a threshold for the number of structural phases graphed (NMF components returned)
  - format: integer
  - eg: --threshold 2
  - default: 10

- improve-thresh (optional, mutually exclusive with the other thresholds): a threshold (between 0 and 1) for the relative improvement ratio necessary to
  add an additional component. The default is 0.001; 0.1 is recommended for real data.
  - format: float
  - eg: --improve-thresh 0.1
  - default: 0.001

- pca-thresh (optional, mutually exclusive with the other thresholds): explained variance threshold for PCA component counting cutoff
  - format: float
  - eg: --pca-thresh 0.95
  - default: None

- n-iter (optional): total number of iterations to run the NMF algorithm. Defaults to 1000; 10000 is typical for publication.
  - format: int
  - eg: --n-iter 10000
  - default: 1000

- xrange (optional): the active x-range over which to run the NMF analysis (must be between shortest and
  longest range in the set of files)
  - format: one or more pairs of integers giving the lower and upper r bounds, with a comma between
    the lower and upper bound
  - eg: --xrange 5,10 12,15
  - default: entire range

- xrd (optional): set this option if the directory contains xy or xye files rather than gr.
  - format: boolean
  - eg: --xrd True
  - default: False

- x_units (required if xrd): set this as either twotheta or q if working with xrd data.
  - format: enum[str]
  - eg: --x_units twotheta
  - default: None (since --xrd defaults to False)

- show (optional): whether or not you would like to display the images
  - format: boolean
  - eg: --show False
  - default: True

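The `improve-thresh` option is described here only as a relative improvement ratio needed to justify one more component. As a rough illustration, and not the package's actual implementation, the sketch below assumes the rule "stop at the first component whose addition lowers the reconstruction error by less than the threshold fraction". The function name `count_components` and the error values are hypothetical.

```python
def count_components(errors, improve_thresh=0.001):
    """Return how many components to keep.

    errors[i] is the (hypothetical) reconstruction error obtained
    with i + 1 components.
    """
    if not errors:
        raise ValueError("errors must not be empty")
    n_keep = 1
    for prev, curr in zip(errors, errors[1:]):
        # Relative improvement gained by adding one more component.
        improvement = (prev - curr) / prev if prev > 0 else 0.0
        if improvement < improve_thresh:
            break
        n_keep += 1
    return n_keep


# Made-up reconstruction errors for 1 to 5 components.
errors = [10.0, 4.0, 3.9, 3.89, 3.889]
print(count_components(errors, improve_thresh=0.1))    # 2
print(count_components(errors, improve_thresh=0.001))  # 4
```

Under this assumed rule, a larger threshold such as 0.1 stops earlier and returns fewer components.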
Returns:
- Figure One: PDF or XRD pattern of structural phase components contributing to the NMF reconstruction
- Figure Two: Weights of the phase components plotted in Figure One
- Figure Three: Reconstruction error as a function of components
- (Optional) Figure Four: Explained Variance plot as a function of components for PCA thresholding

Example:

nmf_mapping . --threshold 3 --xrange 5,10 --show True
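The example invocation passes `--xrange 5,10`. As a hedged sketch of how such comma-separated pairs can be parsed with the standard `argparse` module, the snippet below uses an illustrative helper named `parse_range` and a demo parser; neither is the package's API, and the `low < high` check is an added assumption.

```python
import argparse


def parse_range(text):
    """Convert 'low,high' into a tuple of two ints."""
    try:
        low, high = map(int, text.split(","))
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"range must be 'low,high', got {text!r}")
    # Assumption for this sketch: an empty or reversed range is an error.
    if low >= high:
        raise argparse.ArgumentTypeError("low must be smaller than high")
    return low, high


parser = argparse.ArgumentParser(prog="xrange_demo")
# nargs="*" collects every pair given after --xrange into a list.
parser.add_argument("--xrange", type=parse_range, nargs="*", default=None)

args = parser.parse_args(["--xrange", "5,10", "12,15"])
print(args.xrange)  # [(5, 10), (12, 15)]
```

Raising `argparse.ArgumentTypeError` lets argparse print the message as a normal usage error instead of a traceback.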

Diff for: diffpy/__init__.py

+23
@@ -0,0 +1,23 @@
#!/usr/bin/env python
##############################################################################
#
# diffpy by DANSE Diffraction group
# Simon J. L. Billinge
# (c) 2008 Trustees of the Columbia University
# in the City of New York. All rights reserved.
#
# File coded by: Pavol Juhas
#
# See AUTHORS.txt for a list of people who contributed.
# See LICENSE.txt for license information.
#
##############################################################################

"""nmf_mapping - tools for performing NMF on PDF and XRD data.
"""


__import__('pkg_resources').declare_namespace(__name__)


# End of file

Diff for: diffpy/nmf_mapping/__init__.py

+26
@@ -0,0 +1,26 @@
#!/usr/bin/env python
##############################################################################
#
# nmf_mapping by DANSE Diffraction group
# Simon J. L. Billinge
# (c) 2006 trustees of the Michigan State University.
# All rights reserved.
#
# File coded by: Chris Farrow
#
# See AUTHORS.txt for a list of people who contributed.
# See LICENSE.txt for license information.
#
##############################################################################

"""Tools for manipulating and comparing PDFs.
"""

__id__ = "$Id$"

# obtain version information
__version__ = '0.0.1'

# top-level import
from diffpy.nmf_mapping.nmf_mapping import nmf_mapping_code as nmf
# End of file

Diff for: diffpy/nmf_mapping/nmf_mapping/main.py

+155
@@ -0,0 +1,155 @@

import os
import sys
from argparse import ArgumentParser, RawTextHelpFormatter, Namespace
import time
from datetime import datetime

from diffpy.nmf_mapping import nmf
import numpy as np


def boolean_string(s):
    # argparse "type" helper: accepts only 'true'/'false' (case-insensitive)
    try:
        if s.lower() not in {'false', 'true'}:
            raise ValueError('Not a valid boolean string')
    except AttributeError:
        raise ValueError('Not a valid boolean string')
    return s.lower() == 'true'


def main(args=None):
    """
    Parses directory argument supplied by user and conducts NMF decomposition
    analysis (computes NMF decomposition and shows the weights over time).
    """

    _BANNER = """
    This is a package which takes a directory of 1D diffraction files
    (xrd or pdf) and returns json files containing the decomposed components,
    the phase fraction of these components from file to file,
    as well as the reconstruction error as a function of component
    """

    parser = ArgumentParser(prog='nmf_mapping',
                            description=_BANNER, formatter_class=RawTextHelpFormatter)

    def tup(s):
        # argparse "type" helper: converts 'low,high' into a tuple of ints
        try:
            low, high = map(int, s.split(','))
            return low, high
        except ValueError:
            raise TypeError('r range must be low, high')

    # args
    parser.add_argument("directory", default=None, type=str,
                        help="a directory of PDFs to calculate NMF decomposition")
    group = parser.add_mutually_exclusive_group()
    parser.add_argument("--save-files", default=True, type=boolean_string,
                        help='whether to save the component, graph, and json files in the execution directory\n'
                             'default: True\n'
                             'e.g. --save-files False')
    group.add_argument("--threshold", default=None, type=int,
                       help="a threshold for the number of structural phases graphed (NMF components returned)\n"
                            "e.g. --threshold 3")
    group.add_argument("--improve-thresh", default=None, type=float,
                       help="a threshold (between 0 and 1) for the relative improvement ratio necessary to add an"
                            " additional component. Default is 0.001. 0.1 Recommended for real data.\n"
                            "e.g. --improve-thresh 0.1")
    group.add_argument("--pca-thresh", default=None, type=float,
                       help="a threshold (between 0 and 1) for the explained variance of PCA to determine the \n"
                            "number of components for NMF. e.g. --pca-thresh 0.95")
    parser.add_argument("--n-iter", default=None, type=int,
                        help="total number of iterations to run NMF algo. Defaults to 1000. 10000 typical to publish.")
    parser.add_argument("--xrd", default=False, type=boolean_string,
                        help="whether to look for .xy files rather than .gr files\n"
                             "default: False\n"
                             "e.g. --xrd True")
    # NOTE: "required" is decided from sys.argv, so it is also triggered by
    # '--xrd False' and is not triggered when args are passed to main() directly.
    parser.add_argument("--x_units", default=None, type=str, choices=["twotheta", "q"],
                        required='--xrd' in sys.argv,
                        help="x axis units for XRD data\n"
                             "default: None\n"
                             "e.g. --x_units twotheta")
    parser.add_argument("--xrange", default=None, type=tup, nargs='*',
                        help="the x-range over which to calculate NMF, can be multiple ranges (e.g. --xrange 5,10 12,15)")
    parser.add_argument("--show", default=True, type=boolean_string,
                        help='whether to show the plot')
    args0 = Namespace()
    args1, _ = parser.parse_known_args(args, namespace=args0)

    input_list, data_list = nmf.load_data(args1.directory, args1.xrd)
    if args1.pca_thresh:
        df_components, df_component_weight_timeseries, df_reconstruction_error, df_explained_var_ratio = \
            nmf.NMF_decomposition(input_list, args1.xrange, args1.threshold, additional_comp=False,
                                  improve_thresh=args1.improve_thresh, n_iter=args1.n_iter,
                                  pca_thresh=args1.pca_thresh)
    else:
        df_components, df_component_weight_timeseries, df_reconstruction_error = \
            nmf.NMF_decomposition(input_list, args1.xrange, args1.threshold, additional_comp=False,
                                  improve_thresh=args1.improve_thresh, n_iter=args1.n_iter)

    print(f'Number of components: {len(df_components.columns)}')

    fig1 = nmf.component_plot(df_components, args1.xrd, args1.x_units, args1.show)
    fig2 = nmf.component_ratio_plot(df_component_weight_timeseries, args1.show)
    fig3 = nmf.reconstruction_error_plot(df_reconstruction_error, args1.show)
    if args1.pca_thresh:
        fig4 = nmf.explained_variance_plot(df_explained_var_ratio, args1.show)

    if args1.save_files:
        if not os.path.exists(os.path.join(os.getcwd(), 'nmf_result')):
            os.mkdir(os.path.join(os.getcwd(), 'nmf_result'))
        # timestamp used as the prefix of the per-run output files
        output_fn = datetime.fromtimestamp(time.time()).strftime(
            '%Y%m%d%H%M%S%f')
        df_components.to_json(os.path.join(os.getcwd(), 'nmf_result', 'x_index_vs_y_col_components.json'))
        df_component_weight_timeseries.to_json(os.path.join(os.getcwd(), 'nmf_result', 'component_index_vs_pratio_col.json'))
        df_component_weight_timeseries.to_csv(os.path.join(os.getcwd(), 'nmf_result', output_fn + 'component_row_pratio_col.txt'), header=None, index=False, sep=' ', mode='a')
        df_reconstruction_error.to_json(os.path.join(os.getcwd(), 'nmf_result', 'component_index_vs_RE_value.json'))
        plot_file1 = os.path.join(os.getcwd(), 'nmf_result', output_fn + "comp_plot.png")
        plot_file2 = os.path.join(os.getcwd(), 'nmf_result', output_fn + "ratio_plot.png")
        plot_file3 = os.path.join(os.getcwd(), 'nmf_result', output_fn + "loss_plot.png")
        if args1.pca_thresh:
            plot_file7 = os.path.join(os.getcwd(), 'nmf_result', output_fn + "pca_var_plot.png")
        plot_file4 = os.path.splitext(plot_file1)[0] + '.pdf'
        plot_file5 = os.path.splitext(plot_file2)[0] + '.pdf'
        plot_file6 = os.path.splitext(plot_file3)[0] + '.pdf'
        if args1.pca_thresh:
            plot_file8 = os.path.splitext(plot_file7)[0] + '.pdf'
        txt_file = os.path.join(os.getcwd(), 'nmf_result', output_fn + '_meta' + '.txt')
        with open(txt_file, 'w+') as fi:
            fi.write('NMF Analysis\n\n')
            fi.write(f'{len(df_component_weight_timeseries.columns)} files uploaded for analysis.\n\n')
            fi.write(f'The selected active r ranges are: {args1.xrange} \n\n')
            fi.write('Thresholding:\n')
            fi.write(f'\tThe input component threshold was: {args1.threshold}\n')
            fi.write(f'\tThe input improvement threshold was: {args1.improve_thresh}\n')
            fi.write(f'\tThe input # of iterations to run was: {args1.n_iter}\n')
            fi.write(f'\tWas PCA thresholding used?: {args1.pca_thresh}\n')
            fi.write(f'{len(df_components.columns)} components were extracted')

        # save each figure as both .png and .pdf
        fig1.savefig(plot_file1)
        fig2.savefig(plot_file2)
        fig3.savefig(plot_file3)
        if args1.pca_thresh:
            fig4.savefig(plot_file7)
        fig1.savefig(plot_file4)
        fig2.savefig(plot_file5)
        fig3.savefig(plot_file6)
        if args1.pca_thresh:
            fig4.savefig(plot_file8)
        # write one two-column text file per extracted component
        columns = df_components.columns
        for i, col in enumerate(columns):
            data = np.column_stack([df_components.index.to_list(), df_components[col].to_list()])

            if args1.xrd:
                np.savetxt(os.path.join(os.getcwd(), 'nmf_result', output_fn + f'_comp{i}' + '.xy'), data,
                           header=f"NMF Generated XRD\nSource = nmfMapping\n"
                                  f"Date = {output_fn}\n{args1.x_units} Intensity\n", fmt='%s',
                           comments="' ")
            else:
                np.savetxt(os.path.join(os.getcwd(), 'nmf_result', output_fn + f'_comp{i}' + '.cgr'), data,
                           header=f"NMF Generated PDF\nSource: nmfMapping\n"
                                  f"Date: {output_fn}\nr g", fmt='%s')


if __name__ == "__main__":
    main()
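
`main()` decides whether `--x_units` is required by testing whether the literal string `--xrd` appears in `sys.argv`. One possible alternative, sketched here under the assumption that the requirement should follow the parsed value of `--xrd`, is to validate after parsing and report the problem through `parser.error`. The names `build_parser` and `parse` are hypothetical, and only a subset of the options is reproduced.

```python
from argparse import ArgumentParser


def boolean_string(s):
    """Accept only 'true' or 'false' (case-insensitive)."""
    if s.lower() not in {"false", "true"}:
        raise ValueError("Not a valid boolean string")
    return s.lower() == "true"


def build_parser():
    parser = ArgumentParser(prog="nmf_mapping_demo")
    parser.add_argument("directory", type=str)
    parser.add_argument("--xrd", default=False, type=boolean_string)
    parser.add_argument("--x_units", default=None, choices=["twotheta", "q"])
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Require units only when XRD mode is actually switched on.
    if args.xrd and args.x_units is None:
        parser.error("--x_units is required when --xrd is True")
    return args


args = parse([".", "--xrd", "True", "--x_units", "q"])
print(args.xrd, args.x_units)  # True q

# '--xrd False' does not force --x_units to be given.
args = parse([".", "--xrd", "False"])
print(args.xrd, args.x_units)  # False None
```

Because the check reads the parsed namespace rather than `sys.argv`, it behaves the same whether the arguments come from the command line or from a list passed in by a caller.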
