Commit 17c8608

LabelBot should take into account all comments on an issue. (kubeflow#138)
* As described in kubeflow#133, as people comment on an issue, the label bot should take these additional comments into account when predicting labels.
* Hopefully these additional comments will lead to better predictions, since they contain valuable information.

To support this:

* get_issue should get all comments (not just the body).
* We also need to get any labels that have been explicitly removed, as well as any labels already on the issue. We need this because we want to take into account multiple comments, not just the first one, when predicting labels.
  * Since we are going to add additional labels based on additional comments, we want to be sure not to add back labels which were explicitly removed.
* issue_label_predictor should filter out labels which have already been applied and any labels which have been explicitly removed. This is necessary to ensure we don't spam the issue now that the bot can respond to additional comments and not just the first one.
* Likewise, we only want to post the comment about not being able to label an issue once, so we need to check whether the label bot has already commented on the issue.
* Update the readme to account for the new staging and prod environments for the front end, as described in machine-learning-apps/Issue-Label-Bot#57.
* Fix log messages.
* Update prod to use a newly built image.
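The filtering rules described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in `issue_label_predictor`: the function names `filter_predictions` and `bot_already_commented`, the prediction dictionary, and the bot login `example-label-bot` are all hypothetical. Only the field names `labels`, `removed_labels`, and `comment_authors` come from the `get_issue` result in this commit.

```python
from typing import Dict, Iterable, List


def filter_predictions(predictions: Dict[str, float],
                       current_labels: Iterable[str],
                       removed_labels: Iterable[str]) -> Dict[str, float]:
    """Drop predicted labels that are already applied or were explicitly removed."""
    skip = set(current_labels) | set(removed_labels)
    return {label: prob for label, prob in predictions.items() if label not in skip}


def bot_already_commented(comment_authors: List[str], bot_login: str) -> bool:
    """Return True if the bot (hypothetical login) authored any comment on the issue."""
    return bot_login in comment_authors


# Hypothetical model output and a hypothetical get_issue-style result.
predictions = {"kind/bug": 0.91, "area/docs": 0.72, "priority/p1": 0.65}
issue = {
    "labels": ["kind/bug"],            # already on the issue
    "removed_labels": ["area/docs"],   # a human removed this one
    "comment_authors": ["alice", "example-label-bot"],
}

new_labels = filter_predictions(predictions, issue["labels"], issue["removed_labels"])
already_commented = bot_already_commented(issue["comment_authors"], "example-label-bot")
print(new_labels, already_commented)
```

With these inputs only `priority/p1` survives the filter, and the "could not label" comment would be skipped because the bot has already commented.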
1 parent 18449a2 commit 17c8608

11 files changed: 392 additions & 146 deletions

Label_Microservice/.build/prod/extensions_v1beta1_deployment_label-bot-worker.yaml

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ spec:
           value: "27079"
         - name: GITHUB_APP_PEM_KEY
           value: /var/secrets/github/issue-label-bot-github-app.private-key.pem
-        image: gcr.io/issue-label-bot-dev/bot-worker:011a589
+        image: gcr.io/issue-label-bot-dev/bot-worker:6848ad6
         name: app
         resources:
           requests:

Label_Microservice/README.md

Lines changed: 8 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -62,7 +62,7 @@ The following describes the GCP projects and clusters where the two services are
6262
- **repository**: [machine-learning-apps/Issue-Label-Bot](https://github.com/machine-learning-apps/Issue-Label-Bot)
6363
- **GCP project**: github-probots
6464
- **cluster**: kf-ci-ml
65-
- **namespace**: mlapp
65+
- **namespace**: label-bot-prod
6666
- **yaml files**: [deployment](https://github.com/machine-learning-apps/Issue-Label-Bot/tree/master/deployment)
6767

6868
1. Repo-specific label microservice
@@ -76,9 +76,9 @@ The following describes the GCP projects and clusters where the two services are
7676

7777
1. The flask app
7878
- **repository**: [machine-learning-apps/Issue-Label-Bot](https://github.com/machine-learning-apps/Issue-Label-Bot)
79-
- **GCP project**: issue-label-bot-dev
80-
- **cluster**: github-mlapp-test
81-
- **namespace**: mlapp
79+
- **GCP project**: github-probots
80+
- **cluster**: kf-ci-ml
81+
- **namespace**: label-bot-dev
8282
- **yaml files**: [deployment](https://github.com/machine-learning-apps/Issue-Label-Bot/tree/master/deployment)
8383

8484
1. Repo-specific label microservice
@@ -88,6 +88,10 @@ The following describes the GCP projects and clusters where the two services are
8888
- **namespace**: default
8989
- **yaml files**: [Label\_Microservice/deployment](https://github.com/kubeflow/code-intelligence/tree/master/Label_Microservice/deployment)
9090

91+
1, GitHub bot - **kf-label-bot-dev**
92+
93+
- see [kubeflow/code-intelligence#84](https://github.com/kubeflow/code-intelligence/issues/84) for information on the setup
94+
- see [machine-learning-apps/Issue-Label-Bot#57](https://github.com/machine-learning-apps/Issue-Label-Bot/issues/57)
9195

9296
## Instructions
9397

Label_Microservice/deployment/overlays/prod/kustomization.yaml

Lines changed: 1 addition & 1 deletion
@@ -10,4 +10,4 @@ resources:
 images:
 - name: gcr.io/issue-label-bot-dev/bot-worker
   newName: gcr.io/issue-label-bot-dev/bot-worker
-  newTag: 011a589
+  newTag: 6848ad6

py/code_intelligence/embeddings.py

Lines changed: 3 additions & 56 deletions
@@ -42,6 +42,7 @@ def get_issue_text(num, idx, owner, repo, skip_issue=True):
     dict
         {'title':str, 'body':str}
     """
+    logging.warning("get_issue_text is deprecated; use github_util.get_issue")
     url = f'https://github.com/{owner}/{repo}/issues/{num}'
     status_code = requests.head(url).status_code
     if status_code != 200:
@@ -73,60 +74,6 @@ def get_issue_text(num, idx, owner, repo, skip_issue=True):
             'labels': labels,
             'num': num}

-# TODO(https://github.com/kubeflow/code-intelligence/issues/126): This function should replace
-# get_issue_text
-def get_issue(url, gh_client):
-    """Fetch the issue data using GraphQL
-
-    Args:
-      url: Url of the GitHub isue to fetch
-      gh_client: GitHub GraphQl client.
-
-    Returns
-    ------
-    dict
-        {'title':str, 'body':str}
-    """
-    issue_query = """query getIssue($url: URI!) {
-      resource(url: $url) {
-        __typename
-        ... on Issue {
-          author {
-            __typename
-            ... on User {
-              login
-            }
-            ... on Bot {
-              login
-            }
-          }
-          id
-          title
-          body
-          url
-          state
-          labels(first: 30) {
-            totalCount
-            edges {
-              node {
-                name
-              }
-            }
-          }
-        }
-      }
-    }"""
-
-    variables = {
-        "url": url,
-    }
-    issue_results = gh_client.run_query(issue_query, variables)
-
-    if "errors" in issue_results:
-        logging.error(f"There was a problem running the github query; {issue_results['errors']}")
-        raise ValueError(f"There was a problem running the github query: {issue_results['errors']}")
-    return issue_results["data"]["resource"]
-
 def get_all_issue_text(owner, repo, inf_wrapper, workers=64):
     """
     Prepare embedding features of all issues in a given repository.
@@ -191,9 +138,9 @@ def load_model_artifact(model_url, local_dir=None):
     if not local_dir:
         home = str(Path.home())
         local_dir = os.path.join(home, "model_files")
-
+
     full_path = os.path.join(local_dir, 'model.pkl')
-
+
     if not full_path.exists():
         logging.info('Loading model.')
         path.mkdir(exist_ok=True)

py/code_intelligence/github_util.py

Lines changed: 201 additions & 46 deletions
@@ -1,57 +1,212 @@
+import fire
 import os
 import logging
 from code_intelligence import github_app
 import typing
 import yaml

 def get_issue_handle(installation_id, username, repository, number):
-    "get an issue object."
-    ghapp = github_app.GitHubApp.create_from_env()
-    install = ghapp.get_installation(installation_id)
-    return install.issue(username, repository, number)
+  "get an issue object."
+  ghapp = github_app.GitHubApp.create_from_env()
+  install = ghapp.get_installation(installation_id)
+  return install.issue(username, repository, number)

 def get_yaml(owner, repo, ghapp=None):
-    """
-    Looks for the yaml file in a /.github directory.
-
-    yaml file must be named issue_label_bot.yaml
-    """
-
-    if not ghapp:
-        # TODO(jlewi): Should we deprecate this code path and always pass
-        # in the github app?
-        ghapp = github_app.GitHubApp.create_from_env()
-
-    try:
-        # get the app installation handle
-        inst_id = ghapp.get_installation_id(owner=owner, repo=repo)
-        inst = ghapp.get_installation(installation_id=inst_id)
-        # get the repo handle, which allows you got get the file contents
-        repo = inst.repository(owner=owner, repository=repo)
-        results = repo.file_contents('.github/issue_label_bot.yaml').decoded
-        # TODO(jlewi): We should probably catching more narrow exceptions and
-        # not swallowing all exceptions. The exceptions we should swallow are
-        # the ones related to the configuration file not existing.
-    except Exception as e:
-        logging.info(f"Exception occured getting .github/issue_label_bot.yaml: {e}")
-        return None
-
-    return yaml.safe_load(results)
+  """
+  Looks for the yaml file in a /.github directory.
+
+  yaml file must be named issue_label_bot.yaml
+  """
+
+  if not ghapp:
+    # TODO(jlewi): Should we deprecate this code path and always pass
+    # in the github app?
+    ghapp = github_app.GitHubApp.create_from_env()
+
+  try:
+    # get the app installation handle
+    inst_id = ghapp.get_installation_id(owner=owner, repo=repo)
+    inst = ghapp.get_installation(installation_id=inst_id)
+    # get the repo handle, which allows you got get the file contents
+    repo = inst.repository(owner=owner, repository=repo)
+    results = repo.file_contents('.github/issue_label_bot.yaml').decoded
+    # TODO(jlewi): We should probably catching more narrow exceptions and
+    # not swallowing all exceptions. The exceptions we should swallow are
+    # the ones related to the configuration file not existing.
+  except Exception as e:
+    logging.info(f"Exception occured getting .github/issue_label_bot.yaml: {e}")
+    return None
+
+  return yaml.safe_load(results)

 def build_issue_doc(org:str, repo:str, title:str, text:typing.List[str]):
-    """Build a document string out of various github features.
-
-    Args:
-     org: The organization the issue belongs in
-     repo: The repository.
-     title: Issue title
-     text: List of contents of the comments on the issue
-
-    Returns:
-     content: The document to classify
-    """
-    pieces = [title]
-    pieces.append(f"{org.lower()}_{repo.lower()}")
-    pieces.extend(text)
-    content = "\n".join(pieces)
-    return content
+  """Build a document string out of various github features.
+
+  Args:
+   org: The organization the issue belongs in
+   repo: The repository.
+   title: Issue title
+   text: List of contents of the comments on the issue
+
+  Returns:
+   content: The document to classify
+  """
+  pieces = [title]
+  pieces.append(f"{org.lower()}_{repo.lower()}")
+  pieces.extend(text)
+  content = "\n".join(pieces)
+  return content
+
+# TODO(https://github.com/kubeflow/code-intelligence/issues/126): This function should replace
+# get_issue_text
+def get_issue(url, gh_client):
+  """Fetch the issue data using GraphQL.
+
+  Args:
+    url: Url of the GitHub issue to fetch
+    gh_client: GitHub GraphQL client.
+
+  Returns
+  ------
+  dict
+    {'title':str,
+     'comments':List[str],
+     'labels': List[str],
+     'removed_labels': List[str]}
+
+    comments is a list of comments. The first one will be the body of the issue.
+
+    labels: Labels currently on the issue
+    removed_labels: Labels that have been removed
+  """
+
+  # The "!" means the variable can't be null. We allow the cursors
+  # to be null so that on the first call we fetch the first couple items.
+  issue_query = """query getIssue($url: URI!, $labelCursor: String, $timelineCursor: String, $commentsCursor: String) {
+    resource(url: $url) {
+      __typename
+      ... on Issue {
+        author {
+          __typename
+          ... on User {
+            login
+          }
+          ... on Bot {
+            login
+          }
+        }
+        id
+        title
+        body
+        url
+        state
+        comments(first: 100, after: $commentsCursor) {
+          totalCount
+          edges {
+            node {
+              author {
+                login
+              }
+              body
+            }
+          }
+          pageInfo {
+            hasNextPage
+            endCursor
+          }
+        }
+        timelineItems(first: 100, itemTypes: [UNLABELED_EVENT], after: $timelineCursor) {
+          totalCount
+          edges {
+            node {
+              __typename
+              ... on UnlabeledEvent {
+                createdAt
+                label {
+                  name
+                }
+              }
+            }
+          }
+          pageInfo {
+            hasNextPage
+            endCursor
+          }
+        }
+        labels(first: 100, after: $labelCursor) {
+          totalCount
+          pageInfo {
+            hasNextPage
+            endCursor
+          }
+          edges {
+            node {
+              name
+            }
+          }
+        }
+      }
+    }
+  }"""
+
+  variables = {
+    "url": url,
+    "labelCursor": None,
+    "commentsCursor": None,
+    "timelineCursor": None,
+  }
+
+  has_more = True
+
+  result = {
+    "title": None,
+    "comments": [],
+    "comment_authors": [],
+    "labels": set(),
+    "removed_labels": set(),
+  }
+  while has_more:
+    issue_results = gh_client.run_query(issue_query, variables)
+
+    if "errors" in issue_results:
+      logging.error(f"There was a problem running the github query; {issue_results['errors']}")
+      raise ValueError(f"There was a problem running the github query: {issue_results['errors']}")
+
+    issue = issue_results["data"]["resource"]
+
+    # Only set the title once on the first call
+    if not result["title"]:
+      result["title"] = issue["title"]
+
+    if not result["comments"]:
+      result["comments"].append(issue["body"])
+      result["comment_authors"].append(issue["author"]["login"])
+
+    for e in issue["comments"]["edges"]:
+      node = e["node"]
+      result["comments"].append(node["body"])
+      result["comment_authors"].append(node["author"]["login"])
+
+    for e in issue["labels"]["edges"]:
+      node = e["node"]
+      result["labels"].add(node["name"])
+
+    for e in issue["timelineItems"]["edges"]:
+      node = e["node"]
+      result["removed_labels"].add(node["label"]["name"])
+
+    has_more = False
+
+    for f in ["comments", "labels", "timelineItems"]:
+      has_more = has_more or issue[f].get("pageInfo").get("hasNextPage")
+
+    variables["labelCursor"] = issue["labels"]["pageInfo"]["endCursor"]
+    variables["commentsCursor"] = issue["comments"]["pageInfo"]["endCursor"]
+    variables["timelineCursor"] = issue["timelineItems"]["pageInfo"]["endCursor"]
+
+  # For removed_labels we only want labels that were permanently removed
+  result["removed_labels"] = result["removed_labels"] - result["labels"]
+
+  result["labels"] = list(result["labels"])
+  result["removed_labels"] = list(result["removed_labels"])
+  return result
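
The cursor handling in `get_issue` follows the usual `pageInfo` pattern: pass the last `endCursor` as `after` until `hasNextPage` is false. The sketch below isolates that pattern for a single connection, and shows the set difference used for "permanently removed" labels. It is an illustration only: `collect_all` and `make_fake_connection` are hypothetical helpers, and the in-memory fake stands in for the real GraphQL client so the example runs without network access.

```python
from typing import Callable, Dict, List, Optional


def collect_all(fetch_page: Callable[[Optional[str]], Dict]) -> List[Dict]:
    """Follow pageInfo.endCursor until hasNextPage is false; return all nodes."""
    nodes: List[Dict] = []
    cursor: Optional[str] = None  # None asks for the first page
    while True:
        page = fetch_page(cursor)
        nodes.extend(edge["node"] for edge in page["edges"])
        info = page["pageInfo"]
        if not info["hasNextPage"]:
            return nodes
        cursor = info["endCursor"]


def make_fake_connection(items: List[str], page_size: int) -> Callable[[Optional[str]], Dict]:
    """Hypothetical stand-in for one paginated connection (e.g. comments)."""
    def fetch_page(cursor: Optional[str]) -> Dict:
        start = 0 if cursor is None else int(cursor)
        end = start + page_size
        chunk = items[start:end]
        return {
            "edges": [{"node": {"body": item}} for item in chunk],
            "pageInfo": {
                "hasNextPage": end < len(items),
                "endCursor": str(end) if chunk else None,
            },
        }
    return fetch_page


comment_nodes = collect_all(make_fake_connection(["a", "b", "c", "d", "e"], page_size=2))
bodies = [node["body"] for node in comment_nodes]

# A label counts as permanently removed only if it is not currently on the issue.
ever_removed = {"area/docs", "kind/bug"}
current = {"kind/bug"}
permanently_removed = sorted(ever_removed - current)
print(bodies, permanently_removed)
```

Paging each connection separately like this is one way to avoid re-requesting connections that are already exhausted. The commit's version instead advances all three cursors in a single query, which saves round trips when several connections still have pages left.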
