Commit 5d54114

Author: André Almada
Commit message: 'Pre-processing'
0 parents, commit 5d54114

576 files changed: +44539 -0 lines changed


README.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
"# data-mining-py"

bag-of-words.py

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
import glob
import os
import re

import nltk
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download('punkt')
nltk.download('stopwords')

path = './txt/'
stemmer = PorterStemmer()
# Build the stop-word set once; the original reloaded the list for every token.
stop_words = set(stopwords.words('english'))

def get_class(filename):
    # The class label is the file-name prefix before the first '-',
    # e.g. 'CBR' for './txt/CBR-1010Agr109-120.txt'. Using basename avoids
    # depending on the Windows '\\' path separator.
    return os.path.basename(filename).split('-')[0]

def get_data(filename):
    with open(filename, 'r', encoding='latin1') as f:
        data = f.read()

    # Replace non-alphabetic characters with spaces and lowercase the text.
    data = re.sub('[^A-Za-z]', ' ', data).lower()
    tokens = word_tokenize(data)

    # Filter into a new list: removing items from a list while iterating
    # over it skips the element that follows each removal.
    tokens = [token for token in tokens if token not in stop_words]

    return [stemmer.stem(token) for token in tokens]

names = glob.glob(os.path.join(path, '*.txt'))

dataset = pd.DataFrame({'journal': [get_class(f) for f in names],
                        'data': [get_data(f) for f in names]})

# Strips non-alphabetic characters and lowercases the whole text
#dataset.data = dataset.data.map(lambda x: re.sub('[^A-Za-z]', ' ', x).lower())

#dataset.data = pre_processing (dataset.data)
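The script stops at per-document token lists, although the file name suggests a bag-of-words representation as the eventual goal. The following is only a minimal sketch of that counting step, assuming token lists shaped like the ones `get_data` returns. The function name `build_bag_of_words` and the sample tokens are hypothetical and not part of this commit. It uses only the standard library.

```python
from collections import Counter


def build_bag_of_words(token_lists):
    """Return (vocabulary, matrix) for a list of token lists.

    vocabulary is the sorted list of distinct tokens; matrix holds one row
    per document with the count of each vocabulary term in that document.
    """
    counts = [Counter(tokens) for tokens in token_lists]
    vocabulary = sorted(set().union(*counts))
    # Counter returns 0 for terms a document does not contain.
    matrix = [[doc_counts[term] for term in vocabulary] for doc_counts in counts]
    return vocabulary, matrix


# Hypothetical, already stemmed tokens standing in for two documents.
sample_docs = [
    ["case", "base", "reason", "case"],
    ["rule", "base", "system"],
]
vocabulary, matrix = build_bag_of_words(sample_docs)
```

If a tabular view is wanted, the result could be wrapped as `pd.DataFrame(matrix, columns=vocabulary)` and joined with the `journal` column of `dataset`.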

txt/CBR-1010Agr109-120.txt

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
KBS Maintenance as Learning Two-Tiered
Domain Representation *

Gennady Agre

Institute of Information Technologies - Bulgarian Academy of Sciences
Acad. G. Bonchev St. Block 29A, 1113 Sofia, Bulgaria

Abstract. The paper deals with the problem of improving problem-solving
behavior of traditional KBS in the course of its real operation
which is a part of the maintenance task. The solution of the problem is
searched in integration of the KBS with a specially designed case-based
reasoning module used for correcting solutions produced by the KBS.
Special attention is paid to the methods of case matching and reconciling
conflicts between CBR and RBR. The proposed solution for both
problems is based on treating the maintenance task as a problem for
learning two-tiered domain representation. From this view point rules
form the first domain tier reflecting existing strong patterns in the representation
of domain concepts, while the second tier is formed by the
newly solved cases along with a special domain-dependent procedure for
case matching. The main ideas of the approach are illustrated by the
results of some experiments with the experimental system CoRCase.
References

1. Aamodt, A.: Knowledge-intensive, integrated approach to problem solving and sustained
learning. Ph.D. Dissertation, University of Trondheim (1991).
2. Agre, G.: Improvement of KBS Behavior by Using Problem-Solving Experience.
Ph. Jorrand and V. Sgurev (Eds.), Proc. of the VIth Int. Conference AIMSA '94,
World Scientific Publ., Singapore (1994) 257-266.
3. Agre, G.: An Approach to Integration of Rule-Based and Case-Based Reasoning.
Problems of Engineering Cybernetics and Robotics, 42, Bulgarian Academy of Sciences,
Sofia (1995) 40-49.
4. Barsalou, L., Hale, G.: Components of conceptual representation: from feature lists
to recursive frames. I. Van Mechelen, J. Hampton, R.S. Michalski and P. Theuns
(Eds.) Categories and Concepts - Theoretical Views and Inductive Data Analysis,
Academic Press (1993) 97-144.
5. Bergadano, F., Matwin, S., Michalski R.S., Zhang, J.: Learning two-tiered description
of flexible concepts: the POSEIDON system. Machine Learning 8 (1992) 5-43.
6. Biberman, Y.: A Context Similarity Measure. F. Bergadano and L. De Raedt (Eds.)
Machine Learning: ECML-94. LNAI 784, Springer-Verlag (1994) 48-63
7. Coenen, F., Bench-Capon, T.: Maintenance of Knowledge-Based Systems: Theory,
Techniques and Tools, Academic Press (1993).
8. Golding A.R., Rosenbloom, P.S.: Improving rule-based systems through case-based
reasoning. Proceedings of the National Conference on Artificial Intelligence, Anaheim,
MIT Press (1991) 22-27.
9. Hammond, K.: Case-Based Planning: Viewing Planning as a Memory Task. Academic Press (1989).
10. Iordanova I., Giacometti, A., Vila, A., Amy, B., Reymond, F., Abaoub, L., Dahou
M., Rialle, V.: Shade - A Hybrid System for Diagnosis in Electromyography. Proc.
of IXth Int. Congress on Electromyography, Jerusalem (1992).
11. Kolodner, J.L.: Extending problem solver capabilities through case-based inference.
Proc. of 4th Workshop on Machine Learning, UC-Irvine, June 22-25 (1987) 167-178.
12. Markov, Z.: Private communication (1995).
13. Matwin, S., Plante, B.: A Deductive-Inductive Method For Theory Revision. R.S.
Michalski and Gh. Tecuci (Eds.) Proc. of the First Int. Workshop on Multistrategy
Learning (1991) Harpers Ferry, 160-174.
14. Michalski R.S.: Learning flexible concepts: fundamental ideas and a method based
on two-tiered representation. Y. Kodratoff and R.S. Michalski (Eds.) Machine
Learning: an Artificial Intelligence Approach 3, San Mateo, CA: Morgan Kaufmann (1990) 63-111.
15. Mitchell, T.M., R. Keller, Kedar-Cabelli, S.: Explanation-Based Generalization: A
unifying view. Machine Learning 1 (1986) 47-80.
16. Cost, S., Salzberg, S.: A weighted nearest neighbor algorithm for learning with
symbolic features. Machine Learning 10(1) (1991) 56-78.
17. Stanfill, C., Waltz, D.: Toward memory-based reasoning. Communication of ACM
29(12) (1986) 1213-1229.
18. Zhang, J.: Integrating Symbolic and Subsymbolic Approaches in Learning Flexible
Concepts. R.S. Michalski and Gh. Tecuci (Eds.) Proc. of the First Int. Workshop
on Multistrategy Learning, Harpers Ferry (1991) 289-304.
19. Zhang, J.: Selecting Typical Instances in Instance-Based Learning. D. Sleeman and
P. Edwards (Eds.) Machine Learning - Proc. of the Ninth Int. Workshop (ML92),
San Mateo, CA: Morgan Kaufmann (1992) 470-479

txt/CBR-1010All1-10.txt

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
Integration of Case Based Retrieval with a Relational
Database System in Aircraft Technical Support

Jonathan RC Allen1, David WR Patterson, Maurice D Mulvenna and John G Hughes

Northern Ireland Knowledge Engineering Laboratory
Faculty of Informatics, University of Ulster at Jordanstown Newtownabbey
Northern Ireland, UK, BT37 0QB

Abstract. Case-Based Reasoning (CBR) is suited to problem solving in
domains where there are recurring problems. This paper describes the
development of a CBR system for use in such a domain, the Technical
Support department of an aircraft manufacturing company. The system uses
three types of indexing: knowledge-guided induction, inductive indexing
and nearest neighbour matching. The resultant system integrates case based
retrieval with a relational database system to provide a rich environment to
help manage the life cycle of a technical support query. In early tests with
the system, staff can discern if a new query is a recurring problem and has
been solved before or if it is a completely new unsolved technical query.
References

1. Barletta R. An Introduction to Case-Based Reasoning, AI Expert 8, 1991.
2. Harmon, P. Case-Based Reasoning 2, Intelligent Software Strategies, 7(11), 1991.
3. Kriegsman, M., & Barletta, R. Building a Case-Based Help Application, IEEE
Expert, 8(6), 1993.
4. Harmon, P. Case-Based Reasoning 3, Intelligent Software Strategies, 8(1), 1992.
5. Forsyth, R. Expert Systems: Principles and Case Studies, ed. Forsyth R, Chapman
and Hall Ltd, (1989).
6. Michalski, R., Carbonell, J., & Mitchell, T. Machine Learning: An Artificial
Intelligence Approach, Tioga Publishing Corp, Palo Alto, CA, 1983.
7. Magaldi, R.V., Maintaining Aeroplanes in Time-Constrained Operational Situations
Using Case-Based Reasoning, EWCBR 1994 Chantilly, France.

txt/CBR-1010Alu121-132.txt

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
A Case-Based Approach for Developing Writing Tools
Aimed at Non-native English Users

Sandra M. Aluísio1 and Osvaldo N. Oliveira Jr.2

1 Universidade de São Paulo
Departamento de Ciências de Computação e Estatística
CP 668, 13560-970, São Carlos, SP, BRAZIL

2 Universidade de São Paulo
Instituto de Física de São Carlos
CP 369, 13560-970 São Carlos, SP, BRAZIL

Abstract

A writing tool has been developed for helping non-native English users to produce
a first draft of Introductory Sections of scientific papers. A corpus analysis was carried
out in 54 papers of Experimental Physics which allowed one to identify the schematic
structure of Introductions and 30 rhetorical strategies generally employed. Each one of
the Introductions analysed constituted a case. The user chooses from menus features
related to the rhetorical strategies for each component and gives the intended order for
his/her Introduction, thus forming the requisition. Using three types of metric, the tool
recovers the best-match cases that can be later modified in a revision process.
Preliminary experiments showed that high precision and recall will only be obtained if
the number of cases in the case base is considerably increased. In the revision process,
four operations are suggested which consist in modifying/adding/deleting the different
rhetorical messages that constitute the strategies of the chosen case.
References

[Born-92] Born, G. A Hypertext-Based Support Aid for Writing Software
Documentation. In Computers and Writing - State of the Art, P. O'Brian-Holt &
N.Williams (eds), Kluwer Academic Publishers, Dordrecht, pp. 266-277, 1992.
[Buchanan-92] Buchanan, R.A. Textbase Technology: Writing with Reusable Text. In
Computers and Writing - State of the Art, P. O'Brian-Holt & N.Williams (eds),
Kluwer Academic Publishers, Dordrecht, pp. 254-265, 1992.
[Caldeira-92] (Caldeira), Aluísio S.M.; De Oliveira, M.C.F.; Fontana, N.; Nacamatsu,
C.O. and Oliveira Jr., O.N. Writing tools for non-native users of English.
Proceedings of the XVIII Latinamerican Informatics Conference, Spain, p. 224-231, 1992.
[Crookes-86] Crookes, G. Towards a Validate Analysis of Scientific Text Structure.
Applied Linguistics, Vol. 7 No. 1, 1986, pp.57-70.
[Fontana-93] Fontana, N.; (Caldeira), Aluísio S.M.; De Oliveira, M.C.F. and Oliveira
Jr., O.N. Computer Assisted Writing--Applications to English as a Foreign
Language. CALL, Volume 6 (2), p. 145-161, 1993.
[Hovy-90] Hovy, E. Pragmatics and Natural Language Generation. Artificial
Intelligence 43, p. 153-197, 1990.
[Huckin-91] Huckin, T.N. and Olsen, L.A. Technical Writing and Professional
Communication for Nonnative Speakers of English. McGraw-Hill, Inc. 1991.
[Jacobs-85] Jacobs, P.S. PHRED: A Generator for Natural Language Interfaces.
Computational Linguistics 11 (4):219-242.
[Kettler-94] Kettler, B.P.; Hendler, J.A. Andersen, W.A; and Evett M.P. Massively
Parallel Support for Case-Based Planning. IEEE Expert, pp. 8-14, February 1994.
[Kitano-90] Kitano, H. Parallel Incremental Sentence Production for a Model of
Simultaneous Interpretation. In Current Research in Natural Language Generation,
Dale, R., Mellish, C. and Zock, M. (eds), Academic Press, Boston, 1990, pp. 321-351.
[Kukich-83] Kukich, K. Knowledge-Based Report Generation: A Knowledge
Engineering Approach to Natural Language Report Generation. PhD Thesis,
University of Pittsburgh, 1983.
[Maybury-91] Maybury, M.T. Planning Multisentential English Text Using
Communicative Acts. (PhD Thesis) Tech. R. 239, University of Cambridge, 1991.
[Oliveira-91] Oliveira, Jr. O.N.; (Caldeira), Aluísio S.M. and Fontana, N. Chusaurus:
A Writing Tool Resource for Non-Native Users of English, In Proceedings of the
XI International Conference of The Chilean Computer Science Society, pp. 59-70.
Also In Computer Science: Research and Applications, R. Baeza-Yates and U.
Manber (eds), Plenum Press, N.Y. pp. 63-72, 1992.
[Paice-90] Paice, C.D. Constructing Literature Abstracts by Computer: Techniques
and Prospects. Information Processing & Management, Vol. 26, No. 1, pp. 171-186, 1990.
[Pautler-94] Pautler, D. Planning and learning in domains providing little feedback.
AAAI Fall Symposium on Planning & Learning Notes '94.
[Smadja-91] Smadja, F. Retrieving Collocational Knowledge from Textual Corpora.
An application: Language Generation. PhD Thesis, Computer Science
Department, Columbia University, 1991.
[Swales-90] Swales, J. Genre Analysis - English in academic and research settings.
Cambridge University Press, 1990.
[Taylor-91] Taylor, G. and Tingguang, C. Linguistic, Cultural and Subcultural Issues
in Contrastive Discourse Analysis: Anglo-American and Chinese Scientific Texts.
Applied Linguistics, Vol. 12, No. 3, 1991, pp. 319-336.
[Trimble-85] Trimble, L. English for science and technology: a discourse approach.
Cambridge University Press, 1985.
[Weissberg-90] Weissberg R. and Buker, S. Writing up Research - Experimental
Research Report Writing for Students of English. Prentice Hall Regents, 1990.

txt/CBR-1010Ash133-144.txt

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
Reasoning with Reasons in
Case-Based Comparisons

Kevin D. Ashley and Bruce M. McLaren
University of Pittsburgh, Pittsburgh, Pennsylvania 15260

Abstract. In this work, we are interested in how rational decision makers reason
with and about reasons in a domain, practical ethics, where they appear to reason
about reasons symbolically in terms of both abstract moral principles and case
comparisons. The challenge for reasoners, human and artificial, is to use abstract
knowledge of reasons and principles to inform decisions about the salience of
similarities and differences among cases while still accounting for a case's or
problem's specific contextual circumstances. TRUTH-TELLER is a program we
have developed and tested that compares pairs of cases presenting ethical
dilemmas about whether to tell the truth. The program's methods for reasoning
about reasons help it to make context sensitive assessments of the salience of
similarities and differences.

References
Aleven, V. and Ashley, K.D. (1994). An Instructional Environment for Practicing
Argumentation Skills; In the Proceedings of AAAI-94, pages 485-492.
Ashley, K.D. (1990). Modeling Legal Argument: Reasoning with Cases and
Hypotheticals. MIT Press, Cambridge. Based on PhD. Dissertation.
Ashley, K.D. and McLaren, B.M. (1994). A CBR Knowledge Representation for
Practical Ethics; In Proceedings, 2d EWCBR.
Bareiss, E. R. (1989). Exemplar-Based Knowledge Acquisition - A Unified Approach to
Concept Representation, Classification, and Learning. Academic Press, San Diego,
CA, 1989. Based on PhD dissertation, 1988.
Bok, S. (1989). Lying: Moral Choice in Public and Private Life. Random House, Inc.
Vintage Books, New York.
Branting, K. L. (1991). Building Explanations From Rules and Structured Cases. In the
Journal of Man-Machine Studies. 34, 797-837.
Edelson, D. C. (1992). When Should A Cheetah Remind You of a Bat? Reminding in
Case-Based Teaching. In Proceedings of AAAI-92, 667-672. San Jose, CA.
Gilligan, C. (1982). In a Different Voice. Harvard University Press.
Hammond, K. (1989) Case-Based Planning -- Viewing Planning as a Memory Task. San
Diego, CA: Academic Press.
Jonsen A. R. and Toulmin S. (1988). The Abuse of Casuistry: A History of Moral
Reasoning. University of CA Press, Berkeley.
Kass, A. M., Leake, D., and Owens, C. C. (1986). Swale: A Program that Explains. In
Schank, R. C. (ed.), Explanation Patterns: Understanding Mechanically and
Creatively. Lawrence Erlbaum Associates, Hillsdale, NJ.
Kolodner, J. (1993) Case-Based Reasoning Morgan Kaufmann Publishers, Inc., San
Mateo, CA.
Koton, P. (1988) Using Experience in Learning and Problem Solving. PhD thesis, MIT.
MacGregor R. and Burstein, M. H. (1991). Using a Description Classifier to Enhance
Knowledge Representation. In IEEE Expert 6(3), pages 41-46.
Rissland, E. L. and Skalak, D. B. (1991). CABARET: Rule Interpretation in a Hybrid
Architecture. In the Journal of Man-Machine Studies. 34, pages 839-887.
Rissland, E. L., Skalak, D. B., and Friedman, M. T. (1993). BankXX: A Program to
Generate Argument through Case-Based Search. In Fourth International Conference
on AI and Law, Vrije Universiteit, Amsterdam.
Strong, C. (1988). Justification in Ethics. In Baruch A. Brody, ed., Moral Theory and
Moral Judgments in Medical Ethics, pp. 193-211. Kluwer, Dordrecht.
Sycara, E. P. (1987). Resolving Adversarial Conflicts: An Approach Integrating Case-Based
and Analytic Methods Georgia Inst.Tech., Tech. Rep.GIT-ICS-87/26. Atlanta.
Veloso, M. V. (1992). Learning by Analogical Reasoning in General Problem Solving.
PhD thesis, Carnegie Mellon University. Technical Report No. CMU-CS-92-174.
