[METRICS] get times on queries using UAST functions #606
Because of a bblfshd issue it's impossible to gather data from a big dataset. So I had to resort to getting this data from a much much smaller dataset. It has to be executed with
Query 1
SELECT f.file_path, uast(f.blob_content, language(f.file_path))
FROM ref_commits r
INNER JOIN commit_files c ON r.commit_hash = c.commit_hash
AND r.repository_id = c.repository_id
INNER JOIN files f ON c.file_path = f.file_path AND c.tree_hash = f.tree_hash
AND f.blob_hash = c.blob_hash
AND f.repository_id = c.repository_id
WHERE r.ref_name = 'HEAD' AND language(f.file_path) = 'Go' AND NOT is_binary(f.blob_content)
LIMIT 200
Results
Total sent to client: 13590150 bytes
Query 2
SELECT f.file_path, uast_xpath(uast(f.blob_content, language(f.file_path)), "//uast:Identifier")
FROM ref_commits r
INNER JOIN commit_files c ON r.commit_hash = c.commit_hash
AND r.repository_id = c.repository_id
INNER JOIN files f ON c.file_path = f.file_path AND c.tree_hash = f.tree_hash
AND f.blob_hash = c.blob_hash
AND f.repository_id = c.repository_id
WHERE r.ref_name = 'HEAD' AND language(f.file_path) = 'Go' AND NOT is_binary(f.blob_content)
LIMIT 200
Results
Total sent to client: 5875504 bytes
Query 3
SELECT f.file_path, uast_extract(uast(f.blob_content, language(f.file_path), "//uast:Block"), "@pos")
FROM ref_commits r
INNER JOIN commit_files c ON r.commit_hash = c.commit_hash
AND r.repository_id = c.repository_id
INNER JOIN files f ON c.file_path = f.file_path AND c.tree_hash = f.tree_hash
AND f.blob_hash = c.blob_hash
AND f.repository_id = c.repository_id
WHERE r.ref_name = 'HEAD' AND language(f.file_path) = 'Go' AND NOT is_binary(f.blob_content)
LIMIT 1000
Results
Total sent to client: 6679540 bytes |
@erizocosmico nice! On
if you could provide a link to the issue I'll be happy to take a look from the bblfshd side. |
@bzz this issue bblfsh/bblfshd#209 |
Moving to TODO to give a second chance with new bblfsh versions. |
Query 1
SELECT f.file_path, uast(f.blob_content, language(f.file_path))
FROM ref_commits r
INNER JOIN commit_files c ON r.commit_hash = c.commit_hash
AND r.repository_id = c.repository_id
INNER JOIN files f ON c.file_path = f.file_path AND c.tree_hash = f.tree_hash
AND f.blob_hash = c.blob_hash
AND f.repository_id = c.repository_id
WHERE r.ref_name = 'HEAD' AND language(f.file_path) = 'Go' AND NOT is_binary(f.blob_content)
LIMIT 200
Results
Bytes sent to client: 13581842
Query 2
SELECT f.file_path, uast(f.blob_content, language(f.file_path))
FROM ref_commits r
INNER JOIN commit_files c ON r.commit_hash = c.commit_hash
AND r.repository_id = c.repository_id
INNER JOIN files f ON c.file_path = f.file_path AND c.tree_hash = f.tree_hash
AND f.blob_hash = c.blob_hash
AND f.repository_id = c.repository_id
WHERE r.ref_name = 'HEAD' AND language(f.file_path) = 'Go' AND NOT is_binary(f.blob_content)
LIMIT 1000
Results
Can't make it work with bblfshd.
Query 3
SELECT f.file_path, uast_xpath(uast(f.blob_content, language(f.file_path)), "//uast:Identifier")
FROM ref_commits r
INNER JOIN commit_files c ON r.commit_hash = c.commit_hash
AND r.repository_id = c.repository_id
INNER JOIN files f ON c.file_path = f.file_path AND c.tree_hash = f.tree_hash
AND f.blob_hash = c.blob_hash
AND f.repository_id = c.repository_id
WHERE r.ref_name = 'HEAD' AND language(f.file_path) = 'Go' AND NOT is_binary(f.blob_content)
LIMIT 200
Results
Bytes sent to client: 13581842
Query 4
SELECT f.file_path, uast_xpath(uast(f.blob_content, language(f.file_path)), "//uast:Identifier")
FROM ref_commits r
INNER JOIN commit_files c ON r.commit_hash = c.commit_hash
AND r.repository_id = c.repository_id
INNER JOIN files f ON c.file_path = f.file_path AND c.tree_hash = f.tree_hash
AND f.blob_hash = c.blob_hash
AND f.repository_id = c.repository_id
WHERE r.ref_name = 'HEAD' AND language(f.file_path) = 'Go' AND NOT is_binary(f.blob_content)
LIMIT 1000
Results
Can't be done due to bblfsh issues.
Query 5
SELECT f.file_path, uast_extract(uast(f.blob_content, language(f.file_path), "//uast:Block"), "@pos")
FROM ref_commits r
INNER JOIN commit_files c ON r.commit_hash = c.commit_hash
AND r.repository_id = c.repository_id
INNER JOIN files f ON c.file_path = f.file_path AND c.tree_hash = f.tree_hash
AND f.blob_hash = c.blob_hash
AND f.repository_id = c.repository_id
WHERE r.ref_name = 'HEAD' AND language(f.file_path) = 'Go' AND NOT is_binary(f.blob_content)
LIMIT 1000
Results
Bytes sent to client: 14865664
Slightly faster than before, but the issue with large datasets still persists: queries last hours, nothing gets processed, there are lots of failures, etc. So we can only take metrics for small datasets like this one. Whenever I run it with limit 1000 it starts failing. |
Thanks a lot @erizocosmico |
I pushed the metrics code to feature/metrics-bblfsh on my fork in case we need it again, because I deleted it the first time and had to write it again 🙃 |
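For anyone who wants to reproduce numbers like the ones above without that branch, here is a minimal sketch of a timing harness. It is not the code from feature/metrics-bblfsh, which isn't shown in this thread. `measure_query` and `demo` are hypothetical names, the byte count is a rough client-side estimate (summed size of the returned values) rather than whatever the server reports as sent to the client, and the demo uses an in-memory sqlite3 table as a stand-in so it runs without gitbase or bblfshd.

```python
import sqlite3
import time


def measure_query(cursor, sql):
    """Run `sql` on a DB-API cursor and return (seconds, rows, payload_bytes).

    payload_bytes is only an estimate: the summed size of every returned
    value. It is NOT the wire-level byte count a server would report.
    """
    start = time.perf_counter()
    cursor.execute(sql)
    rows = 0
    payload_bytes = 0
    # Drain the whole result set so the timing covers fetching, not just execute().
    for row in cursor:
        rows += 1
        for value in row:
            if value is None:
                continue
            if isinstance(value, (bytes, bytearray)):
                payload_bytes += len(value)
            else:
                payload_bytes += len(str(value).encode("utf-8"))
    elapsed = time.perf_counter() - start
    return elapsed, rows, payload_bytes


def demo():
    # Stand-in data: a tiny "files" table in sqlite3 (assumption for the demo;
    # the real queries in this thread run against gitbase tables).
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE files (file_path TEXT, blob_content BLOB)")
    cur.executemany(
        "INSERT INTO files VALUES (?, ?)",
        [("a.go", b"package a"), ("b.go", b"package b\n")],
    )
    return measure_query(cur, "SELECT file_path, blob_content FROM files LIMIT 200")


if __name__ == "__main__":
    seconds, rows, payload_bytes = demo()
    print(f"{rows} rows, ~{payload_bytes} bytes, {seconds:.6f}s")
```

To point this at a real server, one would pass a cursor from any DB-API compatible MySQL client instead of the sqlite3 one; which client to use is left open here.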