Commit 92753e0

janismdhanbad authored and rasbt committed
adding version to iris dataset (#539)
* adding version to iris dataset
* adding version to iris dataset
* adding test cases for iris dataset
* adding test cases for iris dataset
* some minor changes
1 parent e78846f commit 92753e0


5 files changed (+1719 additions, -17 deletions)

Diff for: docs/sources/CHANGELOG.md

+3
@@ -16,8 +16,11 @@ The CHANGELOG for the current development version is available at
 
 ##### New Features
 
+- Added an enhancement to the existing `iris_data()` such that both the UCI Repository version of the Iris dataset as well as the corrected, original
+version of the dataset can be loaded, which has a slight difference in two data points (consistent with Fisher's paper; this is also the same as in R). (via [#539](https://github.com/rasbt/mlxtend/pull/532) via [janismdhanbad](https://github.com/janismdhanbad))
 - Add optional `groups` parameter to `SequentialFeatureSelector` and `ExhaustiveFeatureSelector` `fit()` methods for forwarding to sklearn CV ([#537](https://github.com/rasbt/mlxtend/pull/537) via [arc12](https://github.com/qiaguhttps://github.com/arc12))
 
+
 ##### Changes
 
 - -
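The changelog entry says the two dataset versions differ in only two data points. One way to check which rows differ is to compare the two feature matrices row by row. This is a sketch assuming NumPy; `differing_rows` is a hypothetical helper, not part of mlxtend, and the toy arrays are made-up numbers rather than Iris measurements.

```python
import numpy as np


def differing_rows(X_a, X_b):
    """Return the indices of rows where two equally shaped feature
    matrices differ in at least one column (hypothetical helper)."""
    X_a = np.asarray(X_a)
    X_b = np.asarray(X_b)
    if X_a.shape != X_b.shape:
        raise ValueError("both arrays must have the same shape")
    # Exact comparison is intended: we are looking for edited values,
    # not numerical noise. A row counts if any of its features differ.
    return np.flatnonzero(np.any(X_a != X_b, axis=1))


# Toy example with made-up numbers (not the Iris data)
X_a = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
X_b = X_a.copy()
X_b[1, 0] = 3.5
print(differing_rows(X_a, X_b))  # [1]
```

With mlxtend installed, the two inputs would be the feature matrices returned by `iris_data(version='uci')` and `iris_data(version='corrected')`; going by the note added to the user guide in this commit, the result should then be rows 34 and 37.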

Diff for: docs/sources/user_guide/data/iris_data.ipynb

+21 -8
@@ -131,7 +131,7 @@
 "text": [
 "## iris_data\n",
 "\n",
-"*iris_data()*\n",
+"*iris_data(version='uci')*\n",
 "\n",
 "Iris flower dataset.\n",
 "\n",
@@ -153,6 +153,17 @@
 " - 3) petal length [cm]\n",
 " - 4) petal width [cm]\n",
 "\n",
+"\n",
+"**Parameters**\n",
+"\n",
+"- `version` : string, optional (default: 'uci').\n",
+"\n",
+" Version to use {'uci', 'corrected'}. 'uci' loads the dataset\n",
+" as deposited on the UCI machine learning repository, and\n",
+" 'corrected' provides the version that is consistent with\n",
+" Fisher's original paper. See Note for details.\n",
+"\n",
+"\n",
 "**Returns**\n",
 "\n",
 "- `X, y` : [n_samples, n_features], [n_class_labels]\n",
@@ -167,10 +178,12 @@
 "\n",
 "The Iris dataset (originally collected by Edgar Anderson) and\n",
 " available in UCI's machine learning repository is different from\n",
-" the Iris dataset available in R(and the one in the original paper\n",
-" by R.A. Fisher [1]). Precisely, there are two data points(row number\n",
+" the Iris dataset described in the original paper by R.A. Fisher [1]).\n",
+" Precisely, there are two data points (row number\n",
 " 34 and 37) in UCI's Machine Learning repository are different from the\n",
-" Iris dataset in R (and the one in the original Fisher paper).\n",
+" origianlly published Iris dataset. Also, the original version of the Iris\n",
+" Dataset, which can be loaded via `version='corrected'` is the same\n",
+" as the one in R.\n",
 "\n",
 " [1] . A. Fisher (1936). \"The use of multiple measurements in taxonomic\n",
 " problems\". Annals of Eugenics. 7 (2): 179–188\n",
@@ -193,9 +206,9 @@
 "metadata": {
 "anaconda-cloud": {},
 "kernelspec": {
-"display_name": "mlxtend_dev",
+"display_name": "Python 3",
 "language": "python",
-"name": "mlxtend_dev"
+"name": "python3"
 },
 "language_info": {
 "codemirror_mode": {
@@ -207,7 +220,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.6.8"
+"version": "3.7.1"
 },
 "toc": {
 "nav_menu": {},
@@ -223,5 +236,5 @@
 }
 },
 "nbformat": 4,
-"nbformat_minor": 1
+"nbformat_minor": 2
 }
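The new `version` parameter accepts only 'uci' or 'corrected', and the note identifies rows 34 and 37 without listing the corrected values. UCI's own `iris.names` file states that the 35th sample should be 4.9, 3.1, 1.5, 0.2 (error in the fourth feature) and the 38th sample 4.9, 3.6, 1.4, 0.1 (errors in the second and third features), i.e. 0-based rows 34 and 37. The sketch below combines both points: it validates `version` and, for 'corrected', patches a copy of a (150, 4) feature matrix in the column order listed in the docstring. `select_iris_version` and `FISHER_CORRECTIONS` are hypothetical names, NumPy is assumed, and this is not necessarily how mlxtend's `iris_data()` is implemented.

```python
import numpy as np

# Corrections stated in UCI's iris.names file, keyed by 0-based row index.
# Columns: sepal length, sepal width, petal length, petal width [cm].
FISHER_CORRECTIONS = {
    34: [4.9, 3.1, 1.5, 0.2],  # 35th sample: UCI value differs in petal width
    37: [4.9, 3.6, 1.4, 0.1],  # 38th sample: differs in sepal width and petal length
}


def select_iris_version(X_uci, version="uci"):
    """Return the feature matrix for the requested dataset version
    (hypothetical helper).

    X_uci: (150, 4) array as deposited on the UCI repository.
    version: 'uci' (unchanged copy) or 'corrected' (Fisher's values).
    """
    if version not in ("uci", "corrected"):
        raise ValueError(f"version must be 'uci' or 'corrected', got {version!r}")
    X = np.array(X_uci, dtype=float)  # always work on a copy
    if X.shape != (150, 4):
        raise ValueError("expected a (150, 4) feature matrix")
    if version == "corrected":
        for row, values in FISHER_CORRECTIONS.items():
            X[row] = values
    return X
```

Applied to the UCI matrix with `version='corrected'`, only rows 34 and 37 change; the default leaves the data as deposited, matching the documented default of 'uci'.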
