
Commit 3601cff
committed Oct 7, 2013
initial version (v7.0)
1 parent f93adc3 commit 3601cff
File tree

6 files changed: +1797 −0 lines
README.txt (+124 lines)
NAME
    CorScorer: Perl package for scoring coreference resolution systems
    using different metrics.

VERSION
    v7.0 -- reference implementations of the MUC, B-cubed and CEAF metrics.

INSTALLATION
    Requirements:
        1. Perl: downloadable from http://perl.org
        2. Algorithm-Munkres: included in this package and downloadable
           from CPAN http://search.cpan.org/~tpederse/Algorithm-Munkres-0.08

USE
    This package is distributed with two scripts that execute the scorer
    from the command line.

    Windows (tm): scorer.bat
    Linux:        scorer.pl

SYNOPSIS
    use CorScorer;

    $metric = 'ceafm';

    # Score the whole dataset
    &CorScorer::Score($metric, $keys_file, $response_file);

    # Score one document
    &CorScorer::Score($metric, $keys_file, $response_file, $name);

INPUT
    metric: the metric to use for scoring the results:
        muc:   MUCScorer (Vilain et al, 1995)
        bcub:  B-Cubed (Bagga and Baldwin, 1998)
        ceafm: CEAF (Luo et al, 2005) using mention-based similarity
        ceafe: CEAF (Luo et al, 2005) using entity-based similarity
        all:   scores with all of the metrics above

    keys_file: file with the expected coreference chains in SemEval format

    response_file: file with the output of the coreference system
        (SemEval format)

    name: [optional] the name of the document to score. If no name is
        given, all the documents in the dataset are scored. If the given
        name is "none", all the documents are scored but only the total
        results are shown.

OUTPUT
    The Score subroutine returns an array with four values, in this order:
    1) Recall numerator
    2) Recall denominator
    3) Precision numerator
    4) Precision denominator

    Recall, precision and F1 are also printed to standard output when the
    variable $VERBOSE is not null.

    Final scores:
    Recall = recall_numerator / recall_denominator
    Precision = precision_numerator / precision_denominator
    F1 = 2 * Recall * Precision / (Recall + Precision)
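The final-score computation above can be sketched in a few lines of Perl. The numerator and denominator values here are made up for illustration; in real use they are the four values returned by &CorScorer::Score.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical counts standing in for the four values
# returned by &CorScorer::Score.
my ($rn, $rd, $pn, $pd) = (8, 10, 8, 12);

# Guard against empty denominators before dividing.
my $recall    = $rd ? $rn / $rd : 0;
my $precision = $pd ? $pn / $pd : 0;
my $f1 = ($recall + $precision)
       ? 2 * $recall * $precision / ($recall + $precision)
       : 0;

# prints: Recall: 0.80 Precision: 0.67 F1: 0.73
printf "Recall: %.2f Precision: %.2f F1: %.2f\n",
       $recall, $precision, $f1;
```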
Identification of mentions
    A scorer for the identification of mentions (recall, precision and F1)
    is also included. Mentions from the system response are compared with
    the key mentions. There are two kinds of positive matches for response
    mentions:

    a) Strictly correct identified mentions: the tokens included in the
       response mention are exactly the tokens included in the key mention.

    b) Partially correct identified mentions: the response mention tokens
       include the head token of the key mention and no new tokens are
       added (i.e. the key mention boundaries are not exceeded).

    The partially correct mentions can be given some credit (for
    example, a weight of 0.5) to obtain a combined score as follows:

    Recall = (a + 0.5 * b) / #key mentions
    Precision = (a + 0.5 * b) / #response mentions
    F1 = 2 * Recall * Precision / (Recall + Precision)

    For the official CoNLL evaluation, however, we will only consider
    mentions with exact boundaries as being correct.
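A minimal sketch of the combined mention score, using hypothetical counts for a (strict matches), b (partial matches) and the key/response mention totals:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical counts: a = strictly correct, b = partially correct.
my ($a, $b) = (40, 10);
my ($key_mentions, $response_mentions) = (60, 55);
my $w = 0.5;   # credit given to partially correct mentions

my $recall    = ($a + $w * $b) / $key_mentions;
my $precision = ($a + $w * $b) / $response_mentions;
my $f1 = 2 * $recall * $precision / ($recall + $precision);

# prints: R=0.750 P=0.818 F1=0.783
printf "R=%.3f P=%.3f F1=%.3f\n", $recall, $precision, $f1;
```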
SEE ALSO

    1. http://stel.ub.edu/semeval2010-coref/

    2. Marta Recasens, Lluís Màrquez, Emili Sapena, M. Antònia Martí,
       Mariona Taulé, Véronique Hoste, Massimo Poesio, and Yannick Versley.
       2010. SemEval-2010 Task 1: Coreference Resolution in Multiple
       Languages. In Proceedings of the ACL International Workshop on
       Semantic Evaluation (SemEval-2010), pp. 1-8, Uppsala, Sweden.

AUTHOR
    Emili Sapena, Universitat Politècnica de Catalunya
    http://www.lsi.upc.edu/~esapena
    esapena <at> lsi.upc.edu

COPYRIGHT AND LICENSE
    Copyright (C) 2009-2011, Emili Sapena esapena <at> lsi.upc.edu

    This program is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the
    Free Software Foundation; either version 2 of the License, or (at your
    option) any later version. This program is distributed in the hope that
    it will be useful, but WITHOUT ANY WARRANTY; without even the implied
    warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License along
    with this program; if not, write to the Free Software Foundation, Inc.,
    59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

    Modified in 2013 for v1.07 by Sebastian Martschat,
    sebastian.martschat <at> h-its.org
lib/Algorithm/Munkres.pm (+596 lines; large diff not rendered)

lib/Algorithm/README (+130 lines)
NAME
    Algorithm-Munkres : Perl extension for Munkres' solution to the
    classical Assignment problem for square and rectangular matrices.
    This module extends the solution of the Assignment problem for square
    matrices to rectangular matrices by padding with zeros. Thus a
    rectangular matrix is converted to a square matrix by padding the
    necessary zeros.

SYNOPSIS
    use Algorithm::Munkres;

    @mat = (
        [2, 4, 7, 9],
        [3, 9, 5, 1],
        [8, 2, 9, 7],
    );

    assign(\@mat, \@out_mat);

    Then the @out_mat array will have the output (0, 3, 1, 2), where:
        the 0th element indicates that the 0th row is assigned the 0th column, i.e. value=2
        the 1st element indicates that the 1st row is assigned the 3rd column, i.e. value=1
        the 2nd element indicates that the 2nd row is assigned the 1st column, i.e. value=2
        the 3rd element indicates that the 3rd (padding) row is assigned the 2nd column, i.e. value=0

DESCRIPTION
    Assignment Problem: given N jobs, N workers and the time taken by
    each worker to complete a job, how should workers be assigned to
    jobs so as to minimize the total time taken?

    Thus, if we have 3 jobs p, q, r and 3 workers x, y, z such that:

            x  y  z
        p   2  4  7
        q   3  9  5
        r   8  2  9

    where each cell of the matrix gives the time required for the worker
    (given by the column name) to complete the job (given by the row
    name), then the possible solutions are:

                     Total
        1. 2, 9, 9    20
        2. 2, 2, 5     9
        3. 3, 4, 9    16
        4. 3, 2, 7    12
        5. 8, 9, 7    24
        6. 8, 4, 5    17

    Thus (2) is the optimal solution for the above problem.
    This kind of brute-force approach to the Assignment problem quickly
    becomes slow and bulky as N grows, because the number of possible
    solutions is N! and the task is to evaluate each one and then pick
    the optimal solution. (If N=10, the number of possible solutions is
    3628800!)
    Munkres' algorithm gives us a polynomial-time solution to this
    problem, and it is implemented in this module.
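The brute-force search described above can be sketched in pure Perl for the 3x3 example. This is only an illustration of why the factorial enumeration does not scale; Algorithm::Munkres exists precisely to avoid it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The 3x3 cost matrix from the example above
# (rows = jobs p, q, r; columns = workers x, y, z).
my @mat = ([2, 4, 7], [3, 9, 5], [8, 2, 9]);

# Brute force: try every permutation of column indices.
my @perms = ([0,1,2], [0,2,1], [1,0,2], [1,2,0], [2,0,1], [2,1,0]);

my ($best_total, @best);
for my $p (@perms) {
    my $total = 0;
    $total += $mat[$_][ $p->[$_] ] for 0 .. $#mat;
    if (!defined $best_total or $total < $best_total) {
        $best_total = $total;
        @best      = @$p;
    }
}

# prints: optimal assignment: 0 2 1  total: 9
print "optimal assignment: @best  total: $best_total\n";
```

The winning permutation (0, 2, 1) is solution (2) from the table above: p-x (2), q-z (5), r-y (2), for a total of 9.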
    This module also solves the Assignment problem for rectangular
    matrices (M x N) by converting them to square matrices and padding
    with zeros. e.g.:
    If the input matrix is:
        [2, 4, 7, 9],
        [3, 9, 5, 1],
        [8, 2, 9, 7]
    i.e. 3 x 4, then we will convert it to 4 x 4 and the modified input
    matrix will be:
        [2, 4, 7, 9],
        [3, 9, 5, 1],
        [8, 2, 9, 7],
        [0, 0, 0, 0]
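The padding step can be illustrated with a small stand-alone helper. `pad_to_square` is a hypothetical name, not part of the Algorithm::Munkres API; the module performs this conversion internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mirroring the padding described above: make a
# rectangular cost matrix square by appending zero entries and rows.
sub pad_to_square {
    my ($mat) = @_;
    my $rows = scalar @$mat;
    my $cols = 0;
    $cols = @$_ > $cols ? scalar @$_ : $cols for @$mat;
    my $n = $rows > $cols ? $rows : $cols;

    # Pad each row to $n columns, then add all-zero rows up to $n.
    my @sq = map { [ @$_, (0) x ($n - @$_) ] } @$mat;
    push @sq, [ (0) x $n ] while @sq < $n;
    return \@sq;
}

my $sq = pad_to_square([ [2,4,7,9], [3,9,5,1], [8,2,9,7] ]);
print scalar(@$sq), " x ", scalar(@{$sq->[0]}), "\n";   # prints: 4 x 4
print "[@$_]\n" for @$sq;
```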
EXPORT
    The "assign" function, by default.

INPUT
    The input matrix should be a two-dimensional array (an array of
    arrays), and the 'assign' subroutine expects a reference to this
    array, not the complete array.
    e.g.: assign(\@inp_mat, \@out_mat);
    The second argument to the assign subroutine is the reference
    to the output array.

OUTPUT
    The assign subroutine expects references to two arrays as its
    input parameters. The second parameter is the reference to the
    output array, which is populated by the assign subroutine. This
    array is a single-dimensional Nx1 matrix.
    For the 3x3 example above, the output array returned will be:
        (0,
         2,
         1)
    where:
        the 0th element indicates that the 0th row is assigned the 0th column, i.e. value=2
        the 1st element indicates that the 1st row is assigned the 2nd column, i.e. value=5
        the 2nd element indicates that the 2nd row is assigned the 1st column, i.e. value=2
SEE ALSO
99+
1. http://216.249.163.93/bob.pilgrim/445/munkres.html
100+
101+
2. Munkres, J. Algorithms for the assignment and transportation
102+
Problems. J. Siam 5 (Mar. 1957), 32-38
103+
104+
3. François Bourgeois and Jean-Claude Lassalle. 1971.
105+
An extension of the Munkres algorithm for the assignment
106+
problem to rectangular matrices.
107+
Communication ACM, 14(12):802-804
108+
109+
AUTHOR
110+
Anagha Kulkarni, University of Minnesota Duluth
111+
kulka020 <at> d.umn.edu
112+
113+
Ted Pedersen, University of Minnesota Duluth
114+
tpederse <at> d.umn.edu
115+
116+
COPYRIGHT AND LICENSE
117+
Copyright (C) 2007-2008, Ted Pedersen and Anagha Kulkarni
118+
119+
This program is free software; you can redistribute it and/or modify it
120+
under the terms of the GNU General Public License as published by the
121+
Free Software Foundation; either version 2 of the License, or (at your
122+
option) any later version. This program is distributed in the hope that
123+
it will be useful, but WITHOUT ANY WARRANTY; without even the implied
124+
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
125+
GNU General Public License for more details.
126+
127+
You should have received a copy of the GNU General Public License along
128+
with this program; if not, write to the Free Software Foundation, Inc.,
129+
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
130+

lib/CorScorer.pm (+827 lines; large diff not rendered)

scorer.bat (+67 lines)
@rem = '--*-Perl-*--
@echo off
if "%OS%" == "Windows_NT" goto WinNT
perl -x -S "%0" %1 %2 %3 %4 %5 %6 %7 %8 %9
goto endofperl
:WinNT
perl -x -S %0 %*
if NOT "%COMSPEC%" == "%SystemRoot%\system32\cmd.exe" goto endofperl
if %errorlevel% == 9009 echo You do not have Perl in your PATH.
if errorlevel 1 goto script_failed_so_exit_with_non_zero_val 2>nul
goto endofperl
@rem ';
#!perl
#line 15

BEGIN {
  # Add the script's lib/ directory to the module search path.
  $d = $0;
  $d =~ s/\/[^\/][^\/]*$//g;
  push(@INC, $d."/lib");
}

use strict;
use CorScorer;

if (@ARGV < 3) {
  print q|
use: scorer.bat <metric> <keys_file> <response_file> [name]

  metric: the metric desired to score the results:
    muc: MUCScorer (Vilain et al, 1995)
    bcub: B-Cubed (Bagga and Baldwin, 1998)
    ceafm: CEAF (Luo et al, 2005) using mention-based similarity
    ceafe: CEAF (Luo et al, 2005) using entity-based similarity
    all: uses all the metrics to score

  keys_file: file with expected coreference chains in SemEval format

  response_file: file with output of coreference system (SemEval format)

  name: [optional] the name of the document to score. If name is not
    given, all the documents in the dataset will be scored. If given
    name is "none" then all the documents are scored but only total
    results are shown.

|;
  exit;
}

my $metric = shift (@ARGV);
if ($metric !~ /^(muc|bcub|ceafm|ceafe|all)/i) {
  print "Invalid metric\n";
  exit;
}

if ($metric eq 'all') {
  foreach my $m ('muc', 'bcub', 'ceafm', 'ceafe') {
    print "\nMETRIC $m:\n";
    &CorScorer::Score( $m, @ARGV );
  }
}
else {
  &CorScorer::Score( $metric, @ARGV );
}

__END__
:endofperl

scorer.pl (+53 lines)
#!/usr/bin/perl

BEGIN {
  # Add the script's lib/ directory to the module search path.
  $d = $0;
  $d =~ s/\/[^\/][^\/]*$//g;
  push(@INC, $d."/lib");
}

use strict;
use CorScorer;

if (@ARGV < 3) {
  print q|
use: scorer.pl <metric> <keys_file> <response_file> [name]

  metric: the metric desired to score the results:
    muc: MUCScorer (Vilain et al, 1995)
    bcub: B-Cubed (Bagga and Baldwin, 1998)
    ceafm: CEAF (Luo et al, 2005) using mention-based similarity
    ceafe: CEAF (Luo et al, 2005) using entity-based similarity
    all: uses all the metrics to score

  keys_file: file with expected coreference chains in SemEval format

  response_file: file with output of coreference system (SemEval format)

  name: [optional] the name of the document to score. If name is not
    given, all the documents in the dataset will be scored. If given
    name is "none" then all the documents are scored but only total
    results are shown.

|;
  exit;
}

my $metric = shift (@ARGV);
if ($metric !~ /^(muc|bcub|ceafm|ceafe|all)/i) {
  print "Invalid metric\n";
  exit;
}

if ($metric eq 'all') {
  foreach my $m ('muc', 'bcub', 'ceafm', 'ceafe') {
    print "\nMETRIC $m:\n";
    &CorScorer::Score( $m, @ARGV );
  }
}
else {
  &CorScorer::Score( $metric, @ARGV );
}
