Commit 18449a2

Use the AutoML Model and the universal model to generate predictions. (kubeflow#134)

Create an AutoML Model class to generate predictions using AutoML.

* Create a new class/module automl_model to generate predictions using AutoML models.
* Change the function signature of predict_labels to include org and repo, because org and repo can be highly informative features.
* Change predict_issue_labels to take a list of strings for the body text, because follow-on PRs will start taking additional comments into account, not just the first one.
* Define github_util.build_issue_doc to construct a text document out of the various features.

Testing

* A dev instance successfully used the AutoML model: kubeflow#131 (comment)
* Check in hydrated configs for prod.
* Prod has also been updated and looks to be using the new model correctly.

Related issues:

* Hopefully this model is an improvement.

Miscellaneous changes

* Add logging and monitoring instructions.
* Update the automl notebook to use the new code to build an issue document.
1 parent 09bc395

20 files changed: +544 −37 lines

.gitignore

Lines changed: 2 additions & 0 deletions

@@ -6,6 +6,8 @@
 !.gitignore
 !.dockerignore
 **/flask_session
+**/.cache
+**/.data
 build/**
 fairing/__pycache__/**
 **/__pycache__/**
Lines changed: 59 additions & 0 deletions

@@ -0,0 +1,59 @@
+apiVersion: extensions/v1beta1
+kind: Deployment
+metadata:
+  labels:
+    app: label-bot
+    environment: prod
+    service: label-bot
+  name: label-bot-worker
+  namespace: label-bot-prod
+spec:
+  replicas: 5
+  selector:
+    matchLabels:
+      app: label-bot
+      environment: prod
+      service: label-bot
+  template:
+    metadata:
+      labels:
+        app: label-bot
+        environment: prod
+        service: label-bot
+    spec:
+      containers:
+      - command:
+        - python3
+        - -m
+        - label_microservice.worker
+        - subscribe_from_env
+        env:
+        - name: PORT
+          value: "80"
+        - name: ISSUE_EMBEDDING_SERVICE
+          value: http://issue-embedding-server
+        - name: PROJECT
+          value: issue-label-bot-dev
+        - name: ISSUE_EVENT_TOPIC
+          value: event_queue
+        - name: ISSUE_EVENT_SUBSCRIPTION
+          value: label_bot_prod
+        - name: GITHUB_APP_ID
+          value: "27079"
+        - name: GITHUB_APP_PEM_KEY
+          value: /var/secrets/github/issue-label-bot-github-app.private-key.pem
+        image: gcr.io/issue-label-bot-dev/bot-worker:011a589
+        name: app
+        resources:
+          requests:
+            cpu: "4"
+            memory: 4Gi
+        volumeMounts:
+        - mountPath: /var/secrets/github
+          name: github-app
+      restartPolicy: Always
+      serviceAccountName: default-editor
+      volumes:
+      - name: github-app
+        secret:
+          secretName: github-app
Lines changed: 20 additions & 0 deletions

@@ -0,0 +1,20 @@
+apiVersion: v1
+kind: Service
+metadata:
+  labels:
+    app: label-bot
+    environment: prod
+    service: label-bot
+  name: label-bot-worker
+  namespace: label-bot-prod
+spec:
+  ports:
+  - name: http
+    port: 80
+    protocol: TCP
+    targetPort: 80
+  selector:
+    app: label-bot
+    environment: prod
+    service: label-bot
+  type: ClusterIP

Label_Microservice/Makefile

Lines changed: 10 additions & 0 deletions

@@ -0,0 +1,10 @@
+
+CONTEXT=issue-label-bot
+
+hydrate-prod:
+	rm -rf .build/prod
+	mkdir -p .build/prod
+	kustomize build -o .build/prod deployment/overlays/prod
+
+apply-prod: hydrate-prod
+	kubectl --context=$(CONTEXT) apply -f .build/prod
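
The hydrate-prod target regenerates the hydrated manifests under .build/prod with kustomize, and apply-prod applies them to the cluster named by CONTEXT; this is the workflow behind the "Check in hydrated configs for prod" item in the commit message.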

Label_Microservice/deployment/overlays/dev/kustomization.yaml

Lines changed: 2 additions & 2 deletions

@@ -3,9 +3,9 @@ kind: Kustomization
 bases:
 - ../../base
 images:
-- digest: sha256:cb2b2e604d4056b78ecd51d7113de04ebfa60e542310265b3871e7873417e34a
+- #digest: sha256:cb2b2e604d4056b78ecd51d7113de04ebfa60e542310265b3871e7873417e34a
   name: gcr.io/issue-label-bot-dev/bot-worker
-  newName: gcr.io/issue-label-bot-dev/bot-worker:3a82547
+  #newName: gcr.io/issue-label-bot-dev/bot-worker:3a82547
 commonLabels:
   environment: dev
 namespace: label-bot-dev

Label_Microservice/deployment/overlays/prod/kustomization.yaml

Lines changed: 1 addition & 1 deletion

@@ -10,4 +10,4 @@ resources:
 images:
 - name: gcr.io/issue-label-bot-dev/bot-worker
   newName: gcr.io/issue-label-bot-dev/bot-worker
-  newTag: 79cd85a-dirty
+  newTag: 011a589
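
This pins prod to bot-worker image tag 011a589, the same tag referenced by the hydrated deployment above, replacing the earlier 79cd85a-dirty tag (built from an unclean workspace, judging by the -dirty suffix).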

Label_Microservice/deployment/requirements.worker.txt

Lines changed: 1 addition & 0 deletions

@@ -11,6 +11,7 @@ google-api-core==1.14.2
 google-api-python-client==1.7.10
 google-auth==1.6.3
 google-auth-httplib2==0.0.3
+google-cloud-automl==0.10.0
 #google-cloud-bigquery==1.17.0
 google-cloud-core==1.0.3
 google-cloud-pubsub==0.45.0

Label_Microservice/developer_guide.md

Lines changed: 11 additions & 1 deletion

@@ -68,19 +68,29 @@ Setup a namespace for your development
 1. Send a prediction request using pubsub

    ```
-   python -m label_microservice.py --issue=kubeflow/kubeflow#4602
+   python -m label_microservice.cli label-issue --issue=kubeflow/kubeflow#4602 --topic=projects/issue-label-bot-dev/topics/TEST_event_queue
    ```

    * Look at the logs of the pod to see the prediction
    * Ensure that you don't have other pods using the same pubsub subscription; otherwise your item might not get handled by the pod you are looking at


+1. Get pod logs
+
+   ```
+   python -m label_microservice.cli pod-logs --pod=<pod name>
+   ```
+
+   * This pretty prints the json logs, which is easier to read.
+
 1. Ensure your kubeconfig context sets the namespace to the namespace skaffold is deploying in; otherwise file sync and log streaming don't seem to work.

 ## Unresolved Issues

 * skaffold continuous mode (`skaffold dev`) doesn't appear to detect changes in the python files and retrigger the build and deployment

+* skaffold doesn't appear to substitute the newly built image into the kustomize package
+

 ### Kaniko Image Caching
Lines changed: 19 additions & 0 deletions

@@ -0,0 +1,19 @@
+# Logging and Monitoring
+
+
+## Stackdriver logs
+
+* Label bot workers use structured json logs
+* You can search the logs in stackdriver; some examples are below
+* There is also a BigQuery sink for the stackdriver logs to facilitate analysis and querying
+
+
+Use a filter like the following to see messages for
+a specific issue
+
+```
+jsonPayload.repo_owner = "kubeflow"
+jsonPayload.repo_name = "code-intelligence"
+jsonPayload.issue_num = "132"
+resource.labels.namespace_name = "label-bot-prod"
+```
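
As a sketch of how the same filter could be used programmatically, the snippet below uses the google-cloud-logging client. This client is not part of this commit, and the project id is an assumption, so treat it as an illustration rather than the project's tooling:

```python
# Hypothetical sketch: fetch recent structured log entries for one issue
# using the google-cloud-logging client (not part of this commit).
from google.cloud import logging as gcp_logging

client = gcp_logging.Client(project="issue-label-bot-dev")  # assumed project

LOG_FILTER = (
    'jsonPayload.repo_owner = "kubeflow" '
    'AND jsonPayload.repo_name = "code-intelligence" '
    'AND jsonPayload.issue_num = "132" '
    'AND resource.labels.namespace_name = "label-bot-prod"'
)

for entry in client.list_entries(filter_=LOG_FILTER, page_size=10):
    # entry.payload is the structured json emitted by the worker.
    print(entry.timestamp, entry.payload)
```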

Label_Microservice/notebooks/automl.ipynb

Lines changed: 133 additions & 1 deletion

@@ -8351,7 +8351,8 @@
     " blob = bucket.blob(obj_path)\n",
     " \n",
     " # Include the owner and repo in the text body because it is predictive\n",
-    " blob.upload_from_string(issue[\"title\"] + \"\\n\" + owner_repo + \"\\n\" + issue[\"body\"])\n",
+    " doc = github_util.build_issue_doc(owner, repo, issue[\"title\"], [issue[\"body\"]])\n",
+    " blob.upload_from_string(doc)\n",
     " logging.info(f\"Created {target}\")\n",
     "\n",
     " info.iloc[i][\"url\"] = target \n",
@@ -8674,6 +8675,26 @@
     "model_name = result.name"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": 55,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'projects/976279526634/locations/us-central1/models/TCN654213816573231104'"
+      ]
+     },
+     "execution_count": 55,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "model_name"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 39,
@@ -8790,6 +8811,117 @@
     " )\n",
     " )"
    ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 57,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "google.protobuf.pyext._message.RepeatedCompositeContainer"
+      ]
+     },
+     "execution_count": 57,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "response.payload.__class__"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "automl.types"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 61,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from google.cloud.automl import types as automl_types"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 74,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "predict_response = automl_types.PredictResponse()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 77,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "predict_response.payload.append(annotation)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 78,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[classification {\n",
+       "  score: 0.8999999761581421\n",
+       "}\n",
+       "display_name: \"area-jupyter\"\n",
+       "]"
+      ]
+     },
+     "execution_count": 78,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "predict_response.payload"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 67,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "google.cloud.automl_v1.types.AnnotationPayload"
+      ]
+     },
+     "execution_count": 67,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "annotation_payload.__class__"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 70,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "annotation = automl_types.AnnotationPayload()\n",
+    "annotation.display_name = \"area-jupyter\"\n",
+    "annotation.classification.score = .9"
+   ]
   }
  ],
 "metadata": {

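For context, here is a minimal sketch of the prediction call these notebook cells are exploring, using google-cloud-automl==0.10.0 as pinned in requirements.worker.txt. The call shape comes from that library's documented API rather than from this commit, the issue text is made up, and the document string mimics what github_util.build_issue_doc produces:

```python
# Minimal sketch (assumes the google-cloud-automl==0.10.0 API; not code
# from this commit) of generating label predictions with the AutoML model.
from google.cloud import automl

client = automl.PredictionServiceClient()

# Model resource name as printed by the notebook cell above.
model_name = ("projects/976279526634/locations/us-central1/"
              "models/TCN654213816573231104")

# Same shape as github_util.build_issue_doc: title, org_repo, then comments.
doc = "\n".join([
    "Worker crashes on startup",                 # issue title (made up)
    "kubeflow_code-intelligence",                # org_repo feature
    "The worker pod enters CrashLoopBackOff.",   # first comment body
])

payload = {"text_snippet": {"content": doc, "mime_type": "text/plain"}}
response = client.predict(model_name, payload)

# response.payload is a list of AnnotationPayload protos, each carrying a
# label (display_name) and a classification score, as inspected above.
for annotation in response.payload:
    print(annotation.display_name, annotation.classification.score)
```
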
py/code_intelligence/github_util.py

Lines changed: 24 additions & 1 deletion

@@ -1,6 +1,7 @@
 import os
 import logging
 from code_intelligence import github_app
+import typing
 import yaml

 def get_issue_handle(installation_id, username, repository, number):
@@ -28,7 +29,29 @@ def get_yaml(owner, repo, ghapp=None):
     # get the repo handle, which allows you to get the file contents
     repo = inst.repository(owner=owner, repository=repo)
     results = repo.file_contents('.github/issue_label_bot.yaml').decoded
-  except:
+  # TODO(jlewi): We should probably catch narrower exceptions rather than
+  # swallowing all of them. The exceptions we should swallow are the ones
+  # related to the configuration file not existing.
+  except Exception as e:
+    logging.info(f"Exception occurred getting .github/issue_label_bot.yaml: {e}")
     return None

   return yaml.safe_load(results)
+
+def build_issue_doc(org: str, repo: str, title: str, text: typing.List[str]):
+  """Build a document string out of various github features.
+
+  Args:
+    org: The organization the issue belongs in
+    repo: The repository.
+    title: Issue title
+    text: List of the contents of the comments on the issue
+
+  Returns:
+    content: The document to classify
+  """
+  pieces = [title]
+  pieces.append(f"{org.lower()}_{repo.lower()}")
+  pieces.extend(text)
+  content = "\n".join(pieces)
+  return content
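
For illustration, here is what build_issue_doc produces. The inputs below are made up, but the output shape follows directly from the function above:

```python
from code_intelligence import github_util

# Hypothetical issue: title first, then the lowercased org_repo token,
# then each comment body, all joined by newlines.
doc = github_util.build_issue_doc(
    "kubeflow", "code-intelligence", "Worker crashes on startup",
    ["The label-bot worker pod enters CrashLoopBackOff."])

print(doc)
# Worker crashes on startup
# kubeflow_code-intelligence
# The label-bot worker pod enters CrashLoopBackOff.
```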
