Commit 555b3cc

Merge remote-tracking branch 'origin/1.0.7'
2 parents ee0d4e0 + f11823c commit 555b3cc

File tree

12 files changed

+187
-84
lines changed

.dockerignore

Lines changed: 7 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,13 +1,11 @@
11
.git/
2+
.github/
23
.gitignore
4+
.dockerignore
5+
Dockerfile
6+
Containerfile
7+
bin/
8+
build/
39
*~
410
**/*.pyc
5-
**/__pycache__
6-
*.md
7-
out*
8-
backup.sh*
9-
venv
10-
dbpedia-spotlight-1.0.0.jar
11-
entity-linking
12-
en.tar.gz
13-
en
11+
**/__pycache__

.github/for-clams-team.md

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
2+
This directory contains GitHub-related files that help project management and the release process of CLAMS apps.
3+
To use these workflows, your app must be part of `clamsproject` organization.
4+
When creating a new repository under the `clamsproject` organization, follow these naming conventions.
5+
6+
* App repositories in the `clamsproject` organization should be prefixed with `app-` (e.g., `app-myapp`).
7+
* An app that wraps an extant tool or application should be suffixed with `-wrapper` (e.g., `app-their-app-wrapper`).
8+
* `LICENSE` file should always contain licensing information of the terminal code. If the app is a wrapper, an additional file containing licensing information of the underlying tool must be placed next to the `LICENSE` file when the original license requires so.
9+
10+
(Your "app name" that you used in `clams develop` to create this scaffolding doesn't have to match the repository name.)
11+
12+
In the `workflows` directory, you'll find:
13+
14+
* `issue-apps-project.yml`: this workflow will add all new issues and PRs to our [`apps` project board](https://github.com/orgs/clamsproject/projects/12).
15+
* `issue-assign.yml`: this workflow will assign an issue to the person who created a branch for the issue. A branch is for an issue when its name starts with the `issueNum-` prefix. (e.g., `3-fix` branch is for issue number 3)
16+
* `issue-close.yml`: this workflow will remove all assignees from closed/merged issues and PRs.
17+
* `publish.yml`: this workflow is the main driver for the app release process. A release process is set to be triggered by **any** tag push. To change the trigger, edit the `on.push.tags` part of the file ([reference](https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#running-your-workflow-only-when-a-push-of-specific-tags-occurs)).
18+
* The workflow will
19+
1. build a container image for the app and push it to [the `clamsproject` ghcr](https://github.com/orgs/clamsproject/packages).
20+
2. generate app directory entry files and create a PR to [the app directory repository](https://github.com/clamsproject/apps) for registration.
21+
* **NOTE**: Throughout the entire release process, the git tag that triggered the workflow will be used as the version of the app.
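
The `issueNum-` branch convention used by `issue-assign.yml` can be illustrated with a small sketch. This is not the workflow's actual implementation; `issue_number_from_branch` is a hypothetical helper that only shows how a branch name such as `3-fix` maps to issue number 3.

```python
import re
from typing import Optional

# Hypothetical helper, not part of the CLAMS workflows: it only illustrates
# the `issueNum-` branch naming convention described in the list above.
_ISSUE_BRANCH = re.compile(r"^(\d+)-")


def issue_number_from_branch(branch: str) -> Optional[int]:
    """Return the issue number a branch name points at, or None when the
    name does not start with digits followed by a hyphen."""
    match = _ISSUE_BRANCH.match(branch)
    return int(match.group(1)) if match else None


print(issue_number_from_branch("3-fix"))  # 3
print(issue_number_from_branch("main"))   # None
```

A branch whose name does not begin with a number and a hyphen is simply not associated with any issue under this convention.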

.github/workflows/issue-apps-project.yml

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
name: "add to `apps` GH project when a new issue is submitted"
1+
name: "🗂 Add new issue to `apps` GHP"
22

33
on:
44
issues:
@@ -11,8 +11,9 @@ on:
1111

1212
jobs:
1313
call-assign:
14+
name: "🤙 Call GHP workflow"
1415
uses: clamsproject/.github/.github/workflows/repo-issue-project.yml@main
1516
secrets: inherit
1617
with:
17-
projectnum: '12'
18+
projectnum: 12
1819

.github/workflows/issue-assign.yml

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,11 @@
1-
name: "assign an issue when issue branch is created"
1+
name: "🙆 Assign issue"
22

33
on:
44
create:
55

66
jobs:
77
call-assign:
88
if: github.ref_type == 'branch'
9+
name: "🤙 Call assignment workflow"
910
uses: clamsproject/.github/.github/workflows/repo-issue-assign.yml@main
1011
secrets: inherit

.github/workflows/issue-close.yml

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
name: "unassign all when an issue is closed"
1+
name: "🙅 Unassign assignees"
22

33
on:
44
issues:
@@ -10,5 +10,6 @@ on:
1010

1111
jobs:
1212
call-unassign:
13+
name: "🤙 Call unassignment workflow"
1314
uses: clamsproject/.github/.github/workflows/repo-issue-close.yml@main
1415
secrets: inherit

.github/workflows/publish.yml

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
name: "App Publisher: image>ghcr, metadata>appdir"
1+
name: "📦 Publish image>ghcr, metadata>appdir"
22

33
on:
44
workflow_dispatch:
@@ -14,33 +14,33 @@ on:
1414
jobs:
1515
set-version:
1616
runs-on: ubuntu-latest
17-
name: 📌 Set VERSION value
17+
name: "🏷 Set version value"
1818
outputs:
1919
version: ${{ steps.output_version.outputs.version }}
2020
steps:
21-
- name: set VERSION value from dispatch inputs
21+
- name: "📌 Set VERSION value from dispatch inputs"
2222
id: get_version_dispatch
2323
if: ${{ github.event_name == 'workflow_dispatch' }}
2424
run: echo "VERSION=${{ github.event.inputs.tag }}" >> $GITHUB_ENV
25-
- name: set VERSION value from pushed tag
25+
- name: "📌 Set VERSION value from pushed tag"
2626
id: get_version_tag
2727
if: ${{ github.event_name == 'push' }}
2828
run: echo "VERSION=$(echo "${{ github.ref }}" | cut -d/ -f3)" >> $GITHUB_ENV
29-
- name: output result into an env-var
29+
- name: "🏷 Record VERSION for follow-up jobs"
3030
id: output_version
3131
run: |
3232
echo "version=${{ env.VERSION }}" >> $GITHUB_OUTPUT
3333
publish-image:
3434
needs: ['set-version']
35-
name: 🐳 Build and deploy to a container repository
35+
name: "🤙 Call app container workflow"
3636
uses: clamsproject/.github/.github/workflows/app-container.yml@main
3737
secrets: inherit
3838
with:
3939
version: ${{ needs.set-version.outputs.version }}
4040
arm64: false
4141
register-appdir:
4242
needs: ['set-version', 'publish-image']
43-
name: 📝 Register to CLASM app directory
43+
name: "🤙 Call app registration workflow"
4444
uses: clamsproject/apps/.github/workflows/register.yml@main
4545
secrets: inherit
4646
with:
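
The tag-based step in `set-version` derives the version with `cut -d/ -f3`, which keeps the third `/`-separated field of `github.ref`. The sketch below mirrors that behavior in Python for illustration only; `version_from_ref` is a hypothetical name, and the example refs are assumptions rather than values taken from this repository.

```python
def version_from_ref(ref: str) -> str:
    """Mimic `echo "$ref" | cut -d/ -f3` for a single-line git ref."""
    if "/" not in ref:
        # cut prints lines without the delimiter unchanged
        return ref
    fields = ref.split("/")
    # cut prints an empty field when fewer than three fields exist
    return fields[2] if len(fields) > 2 else ""


print(version_from_ref("refs/tags/v1.0.7"))        # v1.0.7
print(version_from_ref("refs/tags/release/1.0"))   # release
```

The second call shows a consequence of taking only the third field: a tag name that itself contains a slash would be truncated, so plain tag names are the safer choice for triggering a release.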

.gitignore

Lines changed: 0 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -162,10 +162,3 @@ dmypy.json
162162
# ctag generated file
163163
tags
164164
.tags
165-
166-
/en.tar.gz
167-
/en
168-
/dbpedia-spotlight-1.0.0.jar
169-
/spotlight-model_lang=en.tar.gz
170-
/venv
171-

Containerfile

Lines changed: 23 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,32 @@
1-
FROM ghcr.io/clamsproject/clams-python:0.6.0
1+
# Use the same base image version as the clams-python python library version
2+
FROM ghcr.io/clamsproject/clams-python:1.0.7
3+
# See https://github.com/orgs/clamsproject/packages?tab=packages&q=clams-python for more base images
4+
# IF you want to automatically publish this image to the clamsproject organization,
5+
# 1. you should have generated this template without --no-github-actions flag
6+
# 1. to add arm64 support, change relevant line in .github/workflows/container.yml
7+
# * NOTE that a lot of software doesn't install/compile or run on arm64 architecture out of the box
8+
# * make sure you locally test the compatibility of all software dependencies before using arm64 support
9+
# 1. use a git tag to trigger the github action. You need to use git tag to properly set app version anyway
210

11+
################################################################################
12+
# DO NOT EDIT THIS SECTION
313
ARG CLAMS_APP_VERSION
414
ENV CLAMS_APP_VERSION ${CLAMS_APP_VERSION}
15+
################################################################################
516

17+
################################################################################
18+
# clams-python base images are based on debian distro
19+
# install more system packages as needed using the apt manager
20+
################################################################################
21+
22+
################################################################################
23+
# main app installation
624
COPY ./ /app
725
WORKDIR /app
826
RUN pip3 install -r requirements.txt
927

28+
RUN python3 -m spacy download en_core_web_sm
29+
30+
# default command to run the CLAMS app in a production server
1031
CMD ["python3", "app.py", "--production"]
32+
################################################################################

README.md

Lines changed: 46 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,48 @@
1-
## User instruction
1+
# Spacy NLP Service
22

3-
General user instructions for CLAMS apps is available at [CLAMS Apps documentation](https://apps.clams.ai/clamsapp).
3+
The spaCy NLP tool wrapped as a CLAMS service. spaCy is distributed under the [MIT license](https://github.com/explosion/spaCy/blob/master/LICENSE).
44

5+
This requires Python 3.8 or higher. For local install of required Python modules see [requirements.txt](requirements.txt).
6+
7+
## Using this service
8+
9+
Use `python app.py -t example-mmif.json out.json` just to test the wrapping code without using a server. To test this using a server you run the app as a service in one terminal:
10+
11+
```bash
12+
$ python app.py
13+
```
14+
15+
And poke at it from another:
16+
17+
```bash
18+
$ curl http://0.0.0.0:5000/
19+
$ curl -H "Accept: application/json" -X POST -d@example-mmif.json http://0.0.0.0:5000/
20+
```
21+
22+
In CLAMS you usually run this in a container. To create an image:
23+
24+
```bash
25+
$ docker build -f Containerfile -t clams-spacy-wrapper .
26+
```
27+
28+
And to run it as a container:
29+
30+
```bash
31+
$ docker run --rm -d -p 5000:5000 clams-spacy-wrapper
32+
$ curl -H "Accept: application/json" -X POST -d@example-mmif.json http://0.0.0.0:5000/
33+
```
34+
35+
The spaCy code will run on each text document in the input MMIF file. The file `example-mmif.json` has one text document in the top level `documents` property and two text documents in one of the views. The text documents all look as follows:
36+
37+
```json
38+
{
39+
"@type": "http://mmif.clams.ai/0.4.0/vocabulary/TextDocument",
40+
"properties": {
41+
"id": "m2",
42+
"text": {
43+
"@value": "Hello, this is Jim Lehrer with the NewsHour on PBS...."
44+
}
45+
}
46+
}
47+
```
48+
Instead of a `text:@value` property the text could be in an external file, which would be given as a URI in the `location` property. See the readme file in [https://github.com/clamsproject/app-nlp-example](https://github.com/clamsproject/app-nlp-example) on how to do this.
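
The two ways a text document can carry its text can be sketched with plain dictionaries. This is an illustration only, not the `mmif-python` API: `text_source` is a hypothetical helper, the `m3` document and its `file://` URI are made up, and placing `location` inside `properties` is an assumption based on the description above.

```python
from typing import Tuple


def text_source(document: dict) -> Tuple[str, str]:
    """Report where a TextDocument keeps its text: inline or external."""
    props = document["properties"]
    if "text" in props and "@value" in props["text"]:
        return ("inline", props["text"]["@value"])
    if "location" in props:
        return ("location", props["location"])
    raise ValueError(f"document {props.get('id')} has neither text nor location")


inline_doc = {
    "@type": "http://mmif.clams.ai/0.4.0/vocabulary/TextDocument",
    "properties": {
        "id": "m2",
        "text": {"@value": "Hello, this is Jim Lehrer with the NewsHour on PBS...."},
    },
}

# Hypothetical document: the id and the file URI are illustrative assumptions.
external_doc = {
    "@type": "http://mmif.clams.ai/0.4.0/vocabulary/TextDocument",
    "properties": {"id": "m3", "location": "file:///data/transcript.txt"},
}

print(text_source(inline_doc)[0])    # inline
print(text_source(external_doc)[0])  # location
```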

app.py

Lines changed: 41 additions & 40 deletions
Original file line numberDiff line numberDiff line change
@@ -1,56 +1,58 @@
1-
"""app.py
2-
3-
Wrapping Spacy NLP to extract tokens, tags, lemmas, sentences, chunks and named
4-
entities.
5-
6-
Usage:
7-
8-
$ python app.py -t example-mmif.json out.json
9-
$ python app.py [--develop]
10-
11-
The first invocation is to just test the app without running a server. The
12-
second is to start a server, which you can ping with
1+
"""
2+
DELETE THIS MODULE STRING AND REPLACE IT WITH A DESCRIPTION OF YOUR APP.
133
14-
$ curl -H "Accept: application/json" -X POST -d@example-mmif.json http://0.0.0.0:5000/
4+
app.py Template
155
16-
With the --develop option you get a FLask server running in development mode,
17-
without it Gunicorn will be used for a more stable server.
6+
The app.py script does several things:
7+
- import the necessary code
8+
- create a subclass of ClamsApp that defines the metadata and provides a method to run the wrapped NLP tool
9+
- provide a way to run the code as a RESTful Flask service
1810
19-
Normally you would run this in a Docker container, see README.md.
2011
2112
"""
2213

2314
import argparse
2415
from typing import Union
2516

26-
import spacy
27-
from clams.app import ClamsApp
28-
from clams.restify import Restifier
17+
# Imports needed for Clams and MMIF.
18+
# Non-NLP Clams applications will require AnnotationTypes
19+
20+
from clams import ClamsApp, Restifier
21+
from mmif import Mmif, View, Annotation, Document, AnnotationTypes, DocumentTypes
22+
23+
# For an NLP tool we need to import the LAPPS vocabulary items
2924
from lapps.discriminators import Uri
30-
from mmif.serialize import Mmif
31-
from mmif.vocabulary import DocumentTypes
32-
from spacy.tokens import Doc
3325

26+
# Spacy imports
27+
import spacy
28+
from spacy.tokens import Doc
3429

3530
class SpacyWrapper(ClamsApp):
3631

3732
def __init__(self):
3833
super().__init__()
39-
# Load small English core model
34+
# load small English core model
4035
self.nlp = spacy.load("en_core_web_sm")
4136

4237
def _appmetadata(self):
38+
# see metadata.py
4339
pass
4440

4541
def _annotate(self, mmif: Union[str, dict, Mmif], **parameters) -> Mmif:
46-
for doc in mmif.get_documents_by_type(DocumentTypes.TextDocument):
42+
if type(mmif) == Mmif:
43+
44+
mmif_obj = mmif
45+
else:
46+
mmif_obj = Mmif(mmif)
47+
48+
for doc in mmif_obj.get_documents_by_type(DocumentTypes.TextDocument):
4749
in_doc = None
4850
tok_idx = {}
49-
if 'pretokenized' in parameters and parameters['pretokenized']:
50-
for view in mmif.get_views_for_document(doc.id):
51+
if 'pretokenized' in parameters and parameters['pretokenized']:
52+
for view in mmif_obj.get_views_for_document(doc.id):
5153
if Uri.TOKEN in view.metadata.contains:
52-
tokens = [token.properties['text'] for token in view.get_annotations(Uri.TOKEN)]
53-
tok_idx = {i: f'{view.id}:{token.id}'
54+
tokens = [token.get_property('text') for token in view.get_annotations(Uri.TOKEN)]
55+
tok_idx = {i : f'{view.id}:{token.id}'
5456
for i, token in enumerate(view.get_annotations(Uri.TOKEN))}
5557
in_doc = Doc(self.nlp.vocab, tokens)
5658
self.nlp.add_pipe("sentencizer")
@@ -59,19 +61,18 @@ def _annotate(self, mmif: Union[str, dict, Mmif], **parameters) -> Mmif:
5961
if in_doc is None:
6062
in_doc = doc.text_value if not doc.location else open(doc.location_path()).read()
6163
in_doc = self.nlp(in_doc)
62-
64+
6365
did = f'{doc.parent}:{doc.id}' if doc.parent else doc.id
6466
view = mmif.new_view()
65-
self.sign_view(view)
66-
for attype in (Uri.TOKEN, Uri.POS, Uri.LEMMA,
67-
Uri.NCHUNK, Uri.SENTENCE, Uri.NE):
67+
self.sign_view(view, parameters)
68+
for attype in (Uri.TOKEN, Uri.POS, Uri.LEMMA, Uri.NCHUNK, Uri.SENTENCE, Uri.NE):
6869
view.new_contain(attype, document=did)
69-
70+
7071
for n, tok in enumerate(in_doc):
7172
a = view.new_annotation(Uri.TOKEN)
7273
if n not in tok_idx:
73-
a.add_property('start', tok.idx)
74-
a.add_property('end', tok.idx + len(tok.text))
74+
a.add_property("start", tok.idx)
75+
a.add_property("end", tok.idx + len(tok.text))
7576
tok_idx[n] = a.id
7677
else:
7778
a.add_property('targets', [tok_idx[n]])
@@ -86,11 +87,9 @@ def _annotate(self, mmif: Union[str, dict, Mmif], **parameters) -> Mmif:
8687
a.add_property('text', segment.text)
8788
if segment.label_:
8889
a.add_property('category', segment.label_)
90+
return mmif_obj
8991

90-
return mmif
91-
92-
93-
def test(infile, outfile):
92+
def _test(infile, outfile):
9493
"""Run spacy on an input MMIF file. This bypasses the server and just pings
9594
the annotate() method on the SpacyWrapper class. Prints a summary of the views
9695
in the end result."""
@@ -118,14 +117,16 @@ def test(infile, outfile):
118117
parsed_args = parser.parse_args()
119118

120119
if parsed_args.test:
121-
test(parsed_args.infile, parsed_args.outfile)
120+
_test(parsed_args.infile, parsed_args.outfile)
122121
else:
123122
# create the app instance
124123
app = SpacyWrapper()
125124

126125
http_app = Restifier(app, port=int(parsed_args.port)
127126
)
127+
# for running the application in production mode
128128
if parsed_args.production:
129129
http_app.serve_production()
130+
# development mode
130131
else:
131132
http_app.run()
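
The `start` and `end` properties written for each token in `_annotate` are character offsets: the token's starting index and that index plus the length of the token text. The sketch below shows the same arithmetic without spaCy. The whitespace tokenizer is a stand-in assumption, not how spaCy tokenizes, and `token_offsets` is a hypothetical helper.

```python
import re
from typing import List, Tuple


def token_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Return (token, start, end) triples using a plain whitespace split.

    The end offset is start + len(token text), the same rule used for the
    `end` property in the app; only the tokenizer differs.
    """
    offsets = []
    for match in re.finditer(r"\S+", text):
        start = match.start()
        end = start + len(match.group())
        offsets.append((match.group(), start, end))
    return offsets


for token, start, end in token_offsets("Hello, this is Jim"):
    print(token, start, end)
```

With these offsets, `text[start:end]` always recovers the token text, which is a quick way to check that the `end` value is computed from the token itself and not from some other collection.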
