Merge pull request #111 from UAL-RE/110-release-v100
Release: v1.0.0 (Log messages description included)
HafeezOJ authored Nov 25, 2024
2 parents ce74245 + 27aaa8d commit f2ccbd7
Showing 3 changed files with 98 additions and 1 deletion.
61 changes: 61 additions & 0 deletions CHANGELOG.md
@@ -3,3 +3,64 @@

## [v1.0.0](https://github.com/UAL-RE/ReQUIAM/tree/v1.0.0)
Initial version

## What's Changed

### Features
* Add validation for preservation directory structure in https://github.com/UAL-RE/ReBACH/pull/3
* Check curation_folder_access for readonly vs readwrite in https://github.com/UAL-RE/ReBACH/pull/26
* Chunk downloads and add more log messages in https://github.com/UAL-RE/ReBACH/pull/29
* Chunk the hash check in Article.py in https://github.com/UAL-RE/ReBACH/pull/32
* Add warning message for articles without a curation folder in https://github.com/UAL-RE/ReBACH/pull/39
* Add validations for ‘post-processing script’ parameter in configuration file in https://github.com/UAL-RE/ReBACH/pull/40
* Allow overriding of Wasabi credentials in DART workflow (Issue #47) in https://github.com/UAL-RE/ReBACH/pull/50
* Case insensitive filename comparisons (Issue #57) in https://github.com/UAL-RE/ReBACH/pull/58
* Add color to messages in the console (Issue #62) in https://github.com/UAL-RE/ReBACH/pull/65
* Keep track of items successfully processed (Issue #66) in https://github.com/UAL-RE/ReBACH/pull/75
* Update BagIt profiles and associated workflows (Issue #73) in https://github.com/UAL-RE/ReBACH/pull/76
* Enhance log messages in certain cases (Issue #80) in https://github.com/UAL-RE/ReBACH/pull/82
* Add option to continue item processing on error (Issue #86) in https://github.com/UAL-RE/ReBACH/pull/87
* Improve BagIt profiles for APTrust (Issue #94) in https://github.com/UAL-RE/ReBACH/pull/95
* Improve summary messages (Issue #97) in https://github.com/UAL-RE/ReBACH/pull/98
* Fix free disk space needs computation (Issue #96) in https://github.com/UAL-RE/ReBACH/pull/99
* Check if item version is already preserved before bagging (Issue #102) in https://github.com/UAL-RE/ReBACH/pull/103
* Implement `--dry-run` flag in https://github.com/UAL-RE/ReBACH/pull/106

### Bug fixes
* Change the regexes to be more flexible on article_id and version in https://github.com/UAL-RE/ReBACH/pull/7
* Make curation validation less strict in https://github.com/UAL-RE/ReBACH/pull/10
* Incorrect metadata directory and filename in https://github.com/UAL-RE/ReBACH/pull/37
* Missing first_depositor_full_name in preservation storage folder creation in https://github.com/UAL-RE/ReBACH/pull/38
* Bagging log message consistency and location in main app (Issue #51) in https://github.com/UAL-RE/ReBACH/pull/52
* Bags uploaded despite error (Issue #60) in https://github.com/UAL-RE/ReBACH/pull/63
* Add missing option to bagger config (Issue #69) in https://github.com/UAL-RE/ReBACH/pull/70
* Crash when not uploading bags (Issue #71) in https://github.com/UAL-RE/ReBACH/pull/72
* Add retries to file downloading (Issue #81) in https://github.com/UAL-RE/ReBACH/pull/83
* Properly count articles that are published vs unpublished (Issue #79) in https://github.com/UAL-RE/ReBACH/pull/84
* Incorrect bag is processed (Issue #89) in https://github.com/UAL-RE/ReBACH/pull/90
* Avoid various error conditions with collections (Issue #91) in https://github.com/UAL-RE/ReBACH/pull/92
* Fix: Write Internal-Sender-Identifier to bags (Issue #100) in https://github.com/UAL-RE/ReBACH/pull/101
* Fix: Preprocessing of articles stops if curation folder does not exist for an article (Issue #105) in https://github.com/UAL-RE/ReBACH/pull/107
* Fix: Preservation package check fails when multiple copies of an item are already preserved (Issue #108) in https://github.com/UAL-RE/ReBACH/pull/109

### Others
* Org rename in https://github.com/UAL-RE/ReBACH/pull/2
* Change process in https://github.com/UAL-RE/ReBACH/pull/9
* Client feedback 1 in https://github.com/UAL-RE/ReBACH/pull/11
* Merge ReBACH-Bagger in https://github.com/UAL-RE/ReBACH/pull/13
* Logging changes in app.py in https://github.com/UAL-RE/ReBACH/pull/14
* README updates in https://github.com/UAL-RE/ReBACH/pull/15
* Address Issue #17 in https://github.com/UAL-RE/ReBACH/pull/21
* Address Issue #22 - Implement method 'post_proces_script_function' with parameters in https://github.com/UAL-RE/ReBACH/pull/24
* Address Issue #43 - Enhance ReBACH to accept specific article and collection IDs for selective processing in https://github.com/UAL-RE/ReBACH/pull/44
* Address Issue #27 - Selective processing and uploading of articles and collections specified in the command-line arguments in https://github.com/UAL-RE/ReBACH/pull/45
* Update setup.py with various fixes in https://github.com/UAL-RE/ReBACH/pull/53

## Contributors
* @astrochun
* @zoidy
* @davidagud
* @jonathannoah
* @rubab
* @HafeezOJ

4 changes: 3 additions & 1 deletion README.md
@@ -15,6 +15,8 @@ ReBACH is run via the command line as outlined in the 'How to Run' section of th
## Requirements:
- Figshare organization number
- Figshare API token for respective organization
- Preservation final remote storage (APTrust) user email
- Preservation final remote storage (APTrust) user secret
- Read privileges to Curation storage location
- Write privileges to Preservation storage location
- Write privileges to logs location
@@ -44,7 +46,7 @@ ReBACH is run via the command line as outlined in the 'How to Run' section of th
- curation_storage_location - required: The file system location where the Curation files reside
- Ensure the aforementioned Dependencies and Requirements are met
- Navigate to the root directory of ReBACH via the terminal and start the script by entering the command `python3 app.py --xfg /path/of/.env.ini` or `python app.py --xfg /path/of/.env.ini` depending on your system configuration (note: the script must be run using Python 3.9 or greater)
- Informational and error output will occur in the terminal. The same output will be appended to a file in the logs location with today's date with some additional information and error logging occurring in the file
- Informational and error output is printed to the terminal. The same output is appended to a file with today's date in the logs location, and that file also records some additional information and error details. The log messages are described in [Description of ReBACH Log Messages](ReBACH_Logs_Summary_Description.md).
- Final preservation package output is written to the preservation location you specified in the .env.ini file
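
As a rough illustration of the run step above, the following sketch builds the command line quoted in the README and checks the stated minimum Python version before launching. The helper names `build_rebach_command` and `python_is_supported` are hypothetical and are not part of ReBACH; only `app.py`, the `--xfg` option, and the Python 3.9 requirement come from the text.

```python
import sys

# Minimum interpreter version stated in the README.
MIN_PYTHON = (3, 9)


def build_rebach_command(config_path, python_exe=sys.executable):
    """Return the argument list for launching ReBACH with a config file.

    Hypothetical helper: it only assembles the command shown in the README.
    """
    return [python_exe, "app.py", "--xfg", str(config_path)]


def python_is_supported(version_info=sys.version_info):
    """Return True if the interpreter meets the README's minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON
```

The returned list could be passed to `subprocess.run(..., cwd=<ReBACH root directory>)` once `python_is_supported()` returns `True`; the config path itself remains whatever `.env.ini` location applies on your system.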

## Command line
34 changes: 34 additions & 0 deletions ReBACH_Logs_Summary_Description.md
@@ -0,0 +1,34 @@
## Description of ReBACH Log Messages

### Articles

| Log Message | Description |
|------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Total matched unique articles | Total number of articles whose article_ids are found both in ReDATA and curation storage. This excludes already preserved articles. <br/>**Note**: A preserved article can be in either of the remote preservation storages. |
| Total unmatched unique articles | Total number of articles whose article_ids are found in ReDATA but **not** in curation storage. This excludes already preserved articles. |
| Total matched article versions | Total number of article versions whose article_ids are found in ReDATA and curation storage contains a folder of the version. This excludes already preserved versions. |
| Total unmatched article versions | Total number of article versions whose article_ids are found in ReDATA but curation storage **does not** contain a folder of the version. This excludes already preserved versions. This number should always be zero. If non-zero, check the logs. |
| Total skipped unique articles                                                            | Total number of articles that are already preserved and are not processed. These articles have at least one version in preservation staging remote storage or preservation final remote storage. |
| Total skipped article versions | Total number of article versions that are already preserved and are not processed. |
| Total articles | Total number of articles in ReDATA, both published and unpublished. |
| Total published articles/article versions | Total number of published articles against total number of published versions in ReDATA. |
| Total count of already preserved (skipped) articles / article versions | Total count of already preserved articles against total count of already preserved article versions. <br/>**Note:** These figures include all articles and article versions already preserved in either of the remote preservation storages and are skipped during processing. |
| Total count of articles with fetch error / articles                                      | Total number of articles that encountered an error while fetching items from either ReDATA or curation storage against total number of published articles. <br/>**Note**: Articles with a fetch error are skipped during processing. |
| Total count of article versions with fetch error / article versions                      | Total number of article versions that encountered an error while fetching items from either ReDATA or curation storage against total number of published article versions. <br/>**Note**: Article versions with a fetch error are skipped during processing. |
| Total articles versions matched/published (unskipped)                                    | Total number of article versions with folders in curation storage against total number of published article versions in ReDATA. <br/>**Note**: The two numbers should be equal if nothing went wrong. If the first number is less than the second, at least one article version published in ReDATA could not be matched, and the logs should be examined. |
| Total articles versions processed/matched | Total number of article versions successfully processed against total number of article versions in ReDATA with a folder in curation storage. <br/>**Note**: The first number will always be less than or equal to the second. If the numbers are not equal, the log should be checked. |
| Total count of already preserved article versions in preservation final remote storage | Total number of already preserved article versions in preservation final remote storage. <br/>**Note:** These article versions are skipped during processing. |
| Total count of already preserved article versions in preservation staging remote storage | Total number of already preserved article versions in preservation staging remote storage. <br/>**Note:** These article versions are skipped during processing. |
| Total articles versions unmatched (published-matched): | Total number of article versions in ReDATA but no folder in curation storage. <br/>**Note:** This should always be zero. |
| Total processed articles bags successfully preserved                                     | Total number of article versions successfully uploaded to preservation staging remote storage. This excludes already preserved article versions and article versions that were fetched but encountered an error before the upload stage. In other words, this is the total number of bags that were successfully uploaded to preservation staging remote storage. |
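
The relationships described in the table above can be sketched with plain set operations and a few consistency checks. This is only an illustration of the definitions, not ReBACH's actual implementation: the function names, argument names, and warning texts are hypothetical.

```python
def categorize_articles(redata_ids, curation_ids, preserved_ids):
    """Split ReDATA article IDs into matched, unmatched, and skipped sets.

    Hypothetical helper mirroring the table's definitions:
    - skipped: already preserved, so not processed
    - matched: in ReDATA and in curation storage, not yet preserved
    - unmatched: in ReDATA but not in curation storage, not yet preserved
    """
    redata = set(redata_ids)
    curation = set(curation_ids)
    skipped = redata & set(preserved_ids)
    candidates = redata - skipped
    return {
        "matched": candidates & curation,
        "unmatched": candidates - curation,
        "skipped": skipped,
    }


def summary_warnings(published, matched, processed):
    """Return warnings for version counts that deviate from the expected state."""
    warnings = []
    # "unmatched (published-matched)" should always be zero.
    if published - matched != 0:
        warnings.append("unmatched article versions found; check the logs")
    # "processed/matched": processed should equal matched when all went well.
    if processed != matched:
        warnings.append("not all matched article versions were processed; check the logs")
    return warnings
```

For example, with ReDATA IDs `[1, 2, 3, 4]`, curation IDs `[1, 2, 9]`, and preserved IDs `[2, 4]`, article 1 is matched, article 3 is unmatched, and articles 2 and 4 are skipped.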

### Collections

| Log Message | Description |
|---------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Total collections | Total number of published collections in ReDATA. |
| Total published collections / collection versions                                           | Total number of published collections against total number of published collection versions. |
| Total count of already preserved (skipped) collections / collection versions | Total number of already preserved collections against total count of already preserved collection versions. <br/>**Note:** These figures include all collections and collection versions already preserved in either of the remote preservation storages and are skipped during processing. |
| Total collections versions processed/published | Total number of collection versions successfully processed against total number of collection versions in ReDATA. |
| Total count of already preserved collection versions in preservation final remote storage | Total number of already preserved collection versions in preservation final remote storage. <br/>**Note:** These collection versions are skipped during processing. |
| Total count of already preserved collection versions in preservation staging remote storage | Total number of already preserved collection versions in preservation staging remote storage. <br/>**Note:** These collection versions are skipped during processing. |
