-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathreadme.txt
49 lines (35 loc) · 1.55 KB
/
readme.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
feature-selection
=================
This implementation to calculate the feature selection score in three different algorithms:
pmi: Pointwise mutual information - reference: http://en.wikipedia.org/wiki/Pointwise_mutual_information
avg: average rule.
chi2: Pearson's chi-squared test - reference: http://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test
works on binary datasets as input see below:
===========
A 1:1 2:1 3:0 4:1
B 1:0 2:1 3:0 4:0
C 1:0 2:1 3:0 4:0
A 1:1 2:1 3:0 4:0
A 1:1 2:1 3:1 4:0
C 1:0 2:1 3:1 4:0
============
The input file in Libsvm format - reference: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
- http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
The first column is the label column, then each column after represents a feature values related to the label
"1:0": the first integer represents the key of the feature, then a seperator ":",
then another binary value [0|1] represent if this feature is: 1: active or 0: not active.
Usage:
>java eu.fbk.phd.featureSelection <arg1> <arg2> <arg3> [<arg4>]
<arg1> Input file name.
<arg2> Choose the feature selection metric[chi|avg|pmi].
<arg3> Output file name.
<arg4> Select the active feature index(s): '0,2,3', otherwise all the features will be considered.
Example: to run avg method: java eu.fbk.phd.featureSelection test/test avg test/oo.txt
The output file:
2 1.0
1 0.5
3 0.3333333432674408
4 0.1666666716337204
This first column has integers represent the feature column index,
in the same row of each index you could see the score for it,
depending on the running method.