Collect stdlib benchmarks results #7599
Conversation
If we need to, it is rather straightforward to add a step to the benchmark job that uploads the results not only as a GH artifact but also as a file to S3 - we already have/had such functionality. On S3 we can store the results indefinitely, if we want to.
Uploading to S3 would be very nice; that would solve a lot of problems. Do you know how to do that? Or can you refer me to the point where we used to do that?
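For illustration, an upload step could be as small as the sketch below, using boto3. The bucket name and key layout are assumptions for the example, not the project's actual configuration, and credentials are expected to come from the CI environment:

```python
# Minimal sketch of uploading a benchmark report to S3 with boto3.
# The bucket name and key layout are illustrative only; credentials are
# assumed to be provided by the environment (as they would be in a CI job).
import boto3

def upload_bench_report(report_path: str, commit_sha: str) -> None:
    s3 = boto3.client("s3")
    key = f"benchmarks/stdlib/{commit_sha}/bench-report.xml"
    s3.upload_file(report_path, "enso-benchmark-results", key)

if __name__ == "__main__":
    upload_bench_report("bench-report.xml", "b8edf6a")
```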
Co-authored-by: GregoryTravis <[email protected]>
I wrote this using GitHub Actions a long time ago. I think we are still doing this, but I'm less familiar with the new Rust CI structure - see Lines 88 to 123 in b8edf6a.
Thanks for the comments. I have created #7669 to track the results uploading. That would make the result processing much simpler.
When I run this script on my Ubuntu 22.04, I get:

```
$ python --version
Python 3.10.12
$ tools/performance/engine-benchmarks$ ./bench_download.py -s stdlib -b develop
Traceback (most recent call last):
  File "tools/performance/engine-benchmarks/./bench_download.py", line 908, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "tools/performance/engine-benchmarks/./bench_download.py", line 814, in main
    await get_bench_runs(since, until, branch, workflow_id)
  File "tools/performance/engine-benchmarks/./bench_download.py", line 429, in get_bench_runs
    async with asyncio.TaskGroup() as task_group:
AttributeError: module 'asyncio' has no attribute 'TaskGroup'
```

I assume this is caused by the attempt to download the results in parallel, which uses "too advanced" features of Python for my Ubuntu 22.04. I'd prefer a slower download but greater portability.
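(For what it's worth, a guard like the sketch below would turn this into a clear error message on older interpreters; the message wording is illustrative, not from the script.)

```python
import sys

# asyncio.TaskGroup only exists on Python 3.11+, so failing fast gives a
# clearer error than the AttributeError deep inside the script.
if sys.version_info < (3, 11):
    sys.exit("This script needs Python 3.11+ (asyncio.TaskGroup); "
             f"found {sys.version.split()[0]}")
```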
The following patch makes the script work on my Python 3.10:

```diff
tools/performance/engine-benchmarks$ git diff
diff --git tools/performance/engine-benchmarks/bench_download.py tools/performance/engine-benchmarks/bench_download.py
index 55aa752df8..8ec3fb4d82 100755
--- tools/performance/engine-benchmarks/bench_download.py
+++ tools/performance/engine-benchmarks/bench_download.py
@@ -45,7 +45,6 @@ Dependencies for the script:
- Used as a template engine for the HTML.
"""
-import asyncio
import json
import logging
import logging.config
@@ -81,7 +80,7 @@ except ModuleNotFoundError as err:
DATE_FORMAT = "%Y-%m-%d"
ENGINE_BENCH_WORKFLOW_ID = 29450898
"""
-Workflow ID of engine benchmarks, got via `gh api
+Workflow ID of engine benchmarks, got via `gh api
'/repos/enso-org/enso/actions/workflows'`.
The name of the workflow is 'Benchmark Engine'
"""
@@ -92,7 +91,7 @@ since 2023-08-22.
"""
STDLIBS_BENCH_WORKFLOW_ID = 66661001
"""
-Workflod ID of stdlibs benchmarks, got via `gh api
+Workflod ID of stdlibs benchmarks, got via `gh api
'/repos/enso-org/enso/actions/workflows'`.
The name is 'Benchmark Standard Libraries'
"""
@@ -297,7 +296,7 @@ def _read_json(json_file: str) -> Dict[Any, Any]:
return json.load(f)
-async def _invoke_gh_api(endpoint: str,
+def _invoke_gh_api(endpoint: str,
query_params: Dict[str, str] = {},
result_as_text: bool = True) -> Union[Dict[str, Any], bytes]:
query_str_list = [key + "=" + value for key, value in query_params.items()]
@@ -307,21 +306,18 @@ async def _invoke_gh_api(endpoint: str,
"api",
f"/repos/enso-org/enso{endpoint}" + ("" if len(query_str) == 0 else "?" + query_str)
]
- logging.info(f"Starting subprocess `{' '.join(cmd)}`")
- proc = await asyncio.create_subprocess_exec("gh", *cmd[1:],
- stdout=subprocess.PIPE,
- stderr=subprocess.PIPE)
- out, err = await proc.communicate()
- logging.info(f"Finished subprocess `{' '.join(cmd)}`")
- if proc.returncode != 0:
- print("Command `" + " ".join(cmd) + "` FAILED with errcode " + str(
- proc.returncode))
- print(err.decode())
- exit(proc.returncode)
- if result_as_text:
- return json.loads(out.decode())
- else:
- return out
+ logging.info(f"Running subprocess `{' '.join(cmd)}`")
+ try:
+ ret = subprocess.run(cmd, check=True, text=result_as_text, capture_output=True)
+ if result_as_text:
+ return json.loads(ret.stdout)
+ else:
+ return ret.stdout
+ except subprocess.CalledProcessError as err:
+ print("Command `" + " ".join(cmd) + "` FAILED with errcode " + str(err.returncode))
+ print(err.stdout)
+ print(err.stderr)
+ exit(err.returncode)
class Cache:
@@ -393,7 +389,7 @@ class FakeCache:
return 0
-async def get_bench_runs(since: datetime, until: datetime, branch: str, workflow_id: int) -> List[JobRun]:
+def get_bench_runs(since: datetime, until: datetime, branch: str, workflow_id: int) -> List[JobRun]:
"""
Fetches the list of all the job runs from the GH API for the specified `branch`.
"""
@@ -407,16 +403,16 @@ async def get_bench_runs(since: datetime, until: datetime, branch: str, workflow
# Start with 1, just to determine the total count
"per_page": "1"
}
- res = await _invoke_gh_api(f"/actions/workflows/{workflow_id}/runs", query_fields)
+ res = _invoke_gh_api(f"/actions/workflows/{workflow_id}/runs", query_fields)
total_count = int(res["total_count"])
per_page = 3
logging.debug(f"Total count of all runs: {total_count} for workflow ID "
f"{workflow_id}. Will process {per_page} runs per page")
- async def get_and_parse_run(page: int, parsed_bench_runs) -> None:
+ def get_and_parse_run(page: int, parsed_bench_runs) -> None:
_query_fields = query_fields.copy()
_query_fields["page"] = str(page)
- res = await _invoke_gh_api(f"/actions/workflows/{workflow_id}/runs", _query_fields)
+ res = _invoke_gh_api(f"/actions/workflows/{workflow_id}/runs", _query_fields)
bench_runs_json = res["workflow_runs"]
_parsed_bench_runs = [_parse_bench_run_from_json(bench_run_json)
for bench_run_json in bench_runs_json]
@@ -426,15 +422,13 @@ async def get_bench_runs(since: datetime, until: datetime, branch: str, workflow
query_fields["per_page"] = str(per_page)
num_queries = math.ceil(total_count / per_page)
parsed_bench_runs = []
- async with asyncio.TaskGroup() as task_group:
- # Page is indexed from 1
- for page in range(1, num_queries + 1):
- task_group.create_task(get_and_parse_run(page, parsed_bench_runs))
+ for page in range(1, num_queries + 1):
+ get_and_parse_run(page, parsed_bench_runs)
return parsed_bench_runs
-async def get_bench_report(bench_run: JobRun, cache: Cache, temp_dir: str) -> Optional[JobReport]:
+def get_bench_report(bench_run: JobRun, cache: Cache, temp_dir: str) -> Optional[JobReport]:
"""
Extracts some data from the given bench_run, which was fetched via the GH API,
optionally getting it from the cache.
@@ -451,7 +445,7 @@ async def get_bench_report(bench_run: JobRun, cache: Cache, temp_dir: str) -> Op
# There might be multiple artifacts in the artifact list for a benchmark run
# We are looking for the one named 'Runtime Benchmark Report', which will
# be downloaded as a ZIP file.
- obj: Dict[str, Any] = await _invoke_gh_api(f"/actions/runs/{bench_run.id}/artifacts")
+ obj: Dict[str, Any] = _invoke_gh_api(f"/actions/runs/{bench_run.id}/artifacts")
artifacts = obj["artifacts"]
assert len(artifacts) == 1, "There should be exactly one artifact for a benchmark run"
bench_report_artifact = artifacts[0]
@@ -466,7 +460,7 @@ async def get_bench_report(bench_run: JobRun, cache: Cache, temp_dir: str) -> Op
return None
# Get contents of the ZIP artifact file
- artifact_ret = await _invoke_gh_api(f"/actions/artifacts/{artifact_id}/zip", result_as_text=False)
+ artifact_ret = _invoke_gh_api(f"/actions/artifacts/{artifact_id}/zip", result_as_text=False)
zip_file_name = os.path.join(temp_dir, artifact_id + ".zip")
logging.debug(f"Writing artifact ZIP content into {zip_file_name}")
with open(zip_file_name, "wb") as zip_file:
@@ -683,7 +677,7 @@ def ensure_gh_installed() -> None:
exit(1)
-async def main():
+if __name__ == '__main__':
default_since: datetime = (datetime.now() - timedelta(days=14))
default_until: datetime = datetime.now()
default_cache_dir = path.expanduser("~/.cache/enso_bench_download")
@@ -808,27 +802,20 @@ async def main():
""" Set of all gathered benchmark labels from all the job reports """
job_reports_per_branch: Dict[str, List[JobReport]] = {}
for branch in branches:
- bench_runs: List[JobRun] = []
for workflow_id in bench_source.workflow_ids():
- bench_runs.extend(
- await get_bench_runs(since, until, branch, workflow_id)
- )
- if len(bench_runs) == 0:
- print(
- f"No successful benchmarks found within period since {since}"
- f" until {until} for branch {branch}")
- exit(1)
+ bench_runs = get_bench_runs(since, until, branch, workflow_id)
+ if len(bench_runs) == 0:
+ print(
+ f"No successful benchmarks found within period since {since}"
+ f" until {until} for branch {branch}")
+ exit(1)
job_reports: List[JobReport] = []
- async def _process_report(_bench_run):
- _job_report = await get_bench_report(_bench_run, cache, temp_dir)
- if _job_report:
- job_reports.append(_job_report)
-
- async with asyncio.TaskGroup() as task_group:
- for bench_run in bench_runs:
- task_group.create_task(_process_report(bench_run))
+ for bench_run in bench_runs:
+ job_report = get_bench_report(bench_run, cache, temp_dir)
+ if job_report:
+ job_reports.append(job_report)
logging.debug(f"Got {len(job_reports)} job reports for branch {branch}")
if len(job_reports) == 0:
@@ -839,13 +826,15 @@ async def main():
logging.debug("Sorting job_reports by commit date")
- def _get_timestamp(job_report: JobReport) -> datetime:
+
+ def get_timestamp(job_report: JobReport) -> datetime:
return datetime.strptime(
job_report.bench_run.head_commit.timestamp,
GH_DATE_FORMAT
)
- job_reports.sort(key=lambda report: _get_timestamp(report))
+
+ job_reports.sort(key=lambda report: get_timestamp(report))
if create_csv:
write_bench_reports_to_csv(job_reports, csv_output)
@@ -902,7 +891,3 @@ async def main():
index_html_abs_path = path.abspath(site_path)
print(f"The generated HTML is in {index_html_abs_path}")
print(f"Open file://{index_html_abs_path} in the browser")
-
-
-if __name__ == "__main__":
-    asyncio.run(main())
```
Please make the script portable, so it continues to run on Ubuntu 22.04 and/or GraalPy 17.
asyncio.TaskGroup requires Python 3.11+.
@JaroslavTulach Fixed with 8484e6e - I don't want to give up on asynchronous fetching, since I have changed the behavior of
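One way to keep the fetching asynchronous while staying compatible with Python 3.10 is `asyncio.gather`, available since Python 3.5. A minimal sketch of the fan-out, with the real GH API call replaced by a placeholder:

```python
import asyncio

async def get_and_parse_run(page: int) -> list:
    # Placeholder for the real GH API call; simplified for illustration.
    await asyncio.sleep(0)
    return [f"run-from-page-{page}"]

async def get_bench_runs(num_queries: int) -> list:
    # asyncio.gather works on Python 3.5+, unlike asyncio.TaskGroup (3.11+),
    # and still runs all the page fetches concurrently.
    pages = await asyncio.gather(
        *(get_and_parse_run(page) for page in range(1, num_queries + 1))
    )
    return [run for page_runs in pages for run in page_runs]

print(asyncio.run(get_bench_runs(3)))
```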
This reverts commit 340e9bc.
Co-authored-by: Radosław Waśko <[email protected]>
Closes #7598
Notable changes to the `bench_download.py` script:
- Added the `-s|--source` parameter that can be either `engine` or `stdlib`.
- The `xml` std library is used to parse the bench reports instead of regular expression matching (see the sketch at the end of this description).

Changes to the generated website:
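Regarding the switch to the `xml` std library mentioned above, the parsing presumably ends up along these lines - a minimal sketch where the `<case>`/`<label>`/`<score>` element names are assumptions for illustration, not the actual report schema:

```python
# Sketch of parsing a benchmark report with the xml std library instead of
# regexes. The <case>/<label>/<score> structure is assumed for illustration;
# the real report schema may differ.
import xml.etree.ElementTree as ET

REPORT = """
<report>
  <case><label>Vector.sort</label><score>12.5</score></case>
  <case><label>Table.join</label><score>48.1</score></case>
</report>
"""

def parse_scores(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    return {
        case.findtext("label"): float(case.findtext("score"))
        for case in root.findall("case")
    }

print(parse_scores(REPORT))
```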