
Commit 334004e

Acat 3.12 updates (#62)

* Refactor LoggingUtility to implement singleton pattern and enhance logging format
* Update pyproject.toml and ConvAssistUI.spec for dependency management and application name change
* Rename ConvAssistUI to ConvAssist and update related build scripts
* Enhance logging in ConvAssist and related components for better traceability
* Add setter for sbertmodel, enhance logging utility, and update database generator arguments
* Refactor LoggingUtility for improved log formatting and adjust ACAT connection retry logic
* Add human conversation training data and dataset download utility
* Fix cron syntax in scorecard workflow configuration
* Add README files for Personachat dataset and ConvAssist 3rd party resources
* Update .gitignore to exclude human conversation and Persona Chat data files
* Add kagglehub and pandas dependencies to pyproject.toml
* Refactor ConvAssist.py to improve logging and file locking mechanisms
* Refactor imports and remove unused files in ConvAssist module; merge shachi_changes branch
* Restore old nlp functionality
* Remove unused __init__.py files from various modules to clean up the codebase
* Update ConvAssist.spec to enable stripping and disable console window
* Update poetry.lock file
* Refactor import statements to use absolute paths for consistency across modules
* Refactor imports and update paths for ACAT simulator; add Preferences class for managing application settings
* Enhance NGramUtil and SmoothedNgramPredictor for better error handling and logging; update configuration paths and remove unused files
* Refactor SentenceCompletionPredictor to disable learning functionality; update logging messages for clarity
* Refactor imports for consistency; add unit tests for predictors and enhance NGramUtil setup
* Update .gitignore to exclude ConvAssist.zip from version control
* Fix sentence predictor retrieval; add missing parameter to call to _generate (#61)
  Co-authored-by: Beale, Michael <[email protected]>
* WIP: Refactor logging utility and preferences; update database connector interface with transaction methods
* Add transaction methods to SQLiteDatabaseConnector; import numpy in sentence_completion_predictor
* Enable learning mode for SentenceCompletionPredictor in configuration
* Add concurrent-log-handler and portalocker dependencies to project
* Refactor logging configuration in ConvAssist and LoggingUtility; replace log_location with log_file boolean
* Refactor error logging in predictors; update spellchecker import and add new dependencies
* Update ConvAssist.spec to include spellchecker data files and refine exclusions
* Limit sentence processing to max_partial_prediction_size in SentenceCompletionPredictor
* Update ContinuousPredict to disable log file and add CannedPhrasesPredictor configuration
* Remove spellchecker dependency and update logging utility to default log_file to False
* Update logging utility to change log file format and enhance build script with error handling and versioning
* Refactor spell checker import, enhance logging utility, and update dependencies
* Update run_unittests.yaml

Co-authored-by: Shachi Kumar <[email protected]>
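The first bullet describes refactoring LoggingUtility into a singleton. A minimal sketch of that pattern applied to a logging helper, assuming illustrative names only (this is not ConvAssist's actual class or format string):

```python
# Hypothetical singleton logging utility: every call to LoggingUtility()
# returns the same configured instance, so all modules share one logger.
import logging


class LoggingUtility:
    _instance = None

    def __new__(cls):
        # Create and configure the shared instance on first use only.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._configure()
        return cls._instance

    def _configure(self):
        self.logger = logging.getLogger("ConvAssist")
        if not self.logger.handlers:
            handler = logging.StreamHandler()
            handler.setFormatter(logging.Formatter(
                "%(asctime)s %(levelname)s %(name)s: %(message)s"))
            self.logger.addHandler(handler)
        self.logger.setLevel(logging.DEBUG)
```

Because `__new__` caches the instance, repeated construction is cheap and handlers are attached exactly once, avoiding duplicate log lines.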
1 parent e548389 commit 334004e

75 files changed: +5558 −3165 lines changed


.github/workflows/scorecard.yaml

+1-1
@@ -10,7 +10,7 @@ on:
   # To guarantee Maintained check is occasionally updated. See
   # https://github.com/ossf/scorecard/blob/main/docs/checks.md#maintained
   schedule:
-    - cron: '44 15 * * 0'
+    - cron: "44 15 * * 0"
   # push:
   #   branches: [ "main", "covassist-cleanup" ]
   # pull_request:
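For reference, the schedule value uses the standard five cron fields (minute, hour, day-of-month, month, day-of-week), so this workflow fires at 15:44 UTC every Sunday. A hedged reading of the fixed trigger:

```yaml
# GitHub Actions schedule trigger.
# Fields: minute hour day-of-month month day-of-week
# "44 15 * * 0" → 15:44 UTC every Sunday (0 = Sunday)
on:
  schedule:
    - cron: "44 15 * * 0"
```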

.gitignore

+3
@@ -27,3 +27,6 @@ interfaces/Demos/prompter/
 coverage.xml
 convassist/.coverage.*
 convassist/tests/test_data/*
+3rd_party_resources/human-conversation/Human_Conversation_Data.txt
+3rd_party_resources/personachat/Persona_Chat_Data.txt
+interfaces/ACAT/acatconvassist/ConvAssist.zip

.vscode/launch.json

+2-2
@@ -25,10 +25,10 @@
         }
       },
       {
-        "name": "ConvAssistUI",
+        "name": "ConvAssist",
         "type": "debugpy",
         "request": "launch",
-        "program": "ConvAssistUI.py",
+        "program": "ConvAssist.py",
         "console": "integratedTerminal",
         "cwd": "${workspaceFolder}/interfaces/ACAT/acatconvassist",
         "justMyCode": false,

3rd_party_resources/README.md

+34
@@ -0,0 +1,34 @@ (new file)

# ConvAssist 3rd Party Resources

## Datasets

TODO: Explanation of why we need conversation datasets for the predictors

Two datasets are provided by ConvAssist:

* [Human Conversation Training Data](human-conversation/README.md)
* [Personachat](personachat/README.md)

### Licensing

> [!NOTE]
> Datasets included in the ConvAssist project are subject to the license provided. We make no
> claim of copyright to the data provided. See specific license information for each
> dataset.

### "Bring your own conversation dataset"

If you would like to extend ConvAssist with your own conversation dataset:

* Conversation data **MUST** be in plain text format.
* Each sentence should be on a separate line.
* ConvAssist expects the conversation data to be in a single file. You may combine multiple datasets to provide a more robust library to the predictors.

The following predictors support custom conversation dataset(s):

| Predictor | .ini Setting |
| --- | --- |
| GeneralWordPredictor | aac_dataset |
| SmoothedNGramPredictor* | N/A |
| SentencePredictor | retrieve_database |

> \* NOTE: The SmoothedNGramPredictor uses an AAC-like dataset to populate the NGram database used for predictions. ConvAssist uses the data from [Human Conversation Training Data](human-conversation/README.md) in the [ACATConvAssist](../interfaces/ACAT/acatconvassist/) utility as well as the [Continuous Predictor](../interfaces/Demos/continuous_predict/) Demo application.
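The README above requires conversation data as a single plain-text file with one sentence per line, and suggests combining multiple datasets. A minimal sketch of such a merge step, assuming a hypothetical helper name (this is not part of ConvAssist's tooling):

```python
# Combine several plain-text conversation datasets into one file,
# one non-empty sentence per line, as the predictors expect.
from pathlib import Path


def combine_datasets(sources, destination):
    lines = []
    for src in sources:
        for line in Path(src).read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line:  # drop blank lines so every output line is a sentence
                lines.append(line)
    Path(destination).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

The resulting file path would then be pointed to by the relevant `.ini` setting (for example `aac_dataset` for GeneralWordPredictor).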
@@ -0,0 +1,119 @@ (new file)

CC0 1.0 Universal

CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER.

Statement of Purpose

The laws of most jurisdictions throughout the world automatically confer exclusive Copyright and Related Rights (defined below) upon the creator and subsequent owner(s) (each and all, an "owner") of an original work of authorship and/or a database (each, a "Work").

Certain owners wish to permanently relinquish those rights to a Work for the purpose of contributing to a commons of creative, cultural and scientific works ("Commons") that the public can reliably and without fear of later claims of infringement build upon, modify, incorporate in other works, reuse and redistribute as freely as possible in any form whatsoever and for any purposes, including without limitation commercial purposes. These owners may contribute to the Commons to promote the ideal of a free culture and the further production of creative, cultural and scientific works, or to gain reputation or greater distribution for their Work in part through the use and efforts of others.

For these and/or other purposes and motivations, and without any expectation of additional consideration or compensation, the person associating CC0 with a Work (the "Affirmer"), to the extent that he or she is an owner of Copyright and Related Rights in the Work, voluntarily elects to apply CC0 to the Work and publicly distribute the Work under its terms, with knowledge of his or her Copyright and Related Rights in the Work and the meaning and intended legal effect of CC0 on those rights.

1. Copyright and Related Rights. A Work made available under CC0 may be protected by copyright and related or neighboring rights ("Copyright and Related Rights"). Copyright and Related Rights include, but are not limited to, the following:

   i. the right to reproduce, adapt, distribute, perform, display, communicate, and translate a Work;
   ii. moral rights retained by the original author(s) and/or performer(s);
   iii. publicity and privacy rights pertaining to a person's image or likeness depicted in a Work;
   iv. rights protecting against unfair competition in regards to a Work, subject to the limitations in paragraph 4(a), below;
   v. rights protecting the extraction, dissemination, use and reuse of data in a Work;
   vi. database rights (such as those arising under Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, and under any national implementation thereof, including any amended or successor version of such directive); and
   vii. other similar, equivalent or corresponding rights throughout the world based on applicable law or treaty, and any national implementations thereof.

2. Waiver. To the greatest extent permitted by, but not in contravention of, applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and unconditionally waives, abandons, and surrenders all of Affirmer's Copyright and Related Rights and associated claims and causes of action, whether now known or unknown (including existing as well as future claims and causes of action), in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each member of the public at large and to the detriment of Affirmer's heirs and successors, fully intending that such Waiver shall not be subject to revocation, rescission, cancellation, termination, or any other legal or equitable action to disrupt the quiet enjoyment of the Work by the public as contemplated by Affirmer's express Statement of Purpose.

3. Public License Fallback. Should any part of the Waiver for any reason be judged legally invalid or ineffective under applicable law, then the Waiver shall be preserved to the maximum extent permitted taking into account Affirmer's express Statement of Purpose. In addition, to the extent the Waiver is so judged Affirmer hereby grants to each affected person a royalty-free, non transferable, non sublicensable, non exclusive, irrevocable and unconditional license to exercise Affirmer's Copyright and Related Rights in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "License"). The License shall be deemed effective as of the date CC0 was applied by Affirmer to the Work. Should any part of the License for any reason be judged legally invalid or ineffective under applicable law, such partial invalidity or ineffectiveness shall not invalidate the remainder of the License, and in such case Affirmer hereby affirms that he or she will not (i) exercise any of his or her remaining Copyright and Related Rights in the Work or (ii) assert any associated claims and causes of action with respect to the Work, in either case contrary to Affirmer's express Statement of Purpose.

4. Limitations and Disclaimers.

   a. No trademark or patent rights held by Affirmer are waived, abandoned, surrendered, licensed or otherwise affected by this document.
   b. Affirmer offers the Work as-is and makes no representations or warranties of any kind concerning the Work, express, implied, statutory or otherwise, including without limitation warranties of title, merchantability, fitness for a particular purpose, non infringement, or the absence of latent or other defects, accuracy, or the present or absence of errors, whether or not discoverable, all to the greatest extent permissible under applicable law.
   c. Affirmer disclaims responsibility for clearing rights of other persons that may apply to the Work or any use thereof, including without limitation any person's Copyright and Related Rights in the Work. Further, Affirmer disclaims responsibility for obtaining any necessary consents, permissions or other rights required for any use of the Work.
   d. Affirmer understands and acknowledges that Creative Commons is not a party to this document and has no duty or obligation with respect to this CC0 or use of the Work.
@@ -0,0 +1,28 @@ (new file)

# Human Conversation Training Data

Training data aggregated from various sources for training a chatbot with NLP.

## License

[CC0 1.0 Universal](LICENSE)

## About Dataset

Original Dataset Card from [projjal1/Human-Conversation-Training-Data](https://www.kaggle.com/datasets/projjal1/human-conversation-training-data/data)

> **Dataset Card**
>
> ### Context
>
> I was working with RNN models in Tensorflow and was searching about conversation bots. Then an idea struck me to create a bot myself. I looked for chat data but was not able to find something useful. Then I came across Meena chatbot and Mitsuku chatbot data and so compiled them with some data from human chats corpus.
>
> ### Content
>
> The data corpus contains labelled chat data with Human 1 and Human 2 in ask-response manner.
> Each odd row with the Human 1 label is the initiator of the chat and each even row with the Human 2 label is the response.
> The text after "Human x:" is the chat data, which can be preprocessed to remove the label part.
>
> ### Acknowledgements
>
> We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.
>
> ### Inspiration
>
> I would love others to explore this data and frame ideas related to the creation of a chatbot system.
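The dataset card notes that the text after the "Human x:" label is the chat data and can be preprocessed to remove the label part. A minimal sketch of that preprocessing, assuming a hypothetical helper name (not part of ConvAssist):

```python
# Strip the "Human 1:" / "Human 2:" speaker labels from each line,
# leaving only the utterance text.
import re

LABEL = re.compile(r"^Human\s*\d+:\s*")


def strip_labels(lines):
    return [LABEL.sub("", line).strip() for line in lines]
```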

0 commit comments
