Investigate Performance Issues With TRSS / MongoDB #3877
Comments
Some initial investigation on our TRSS DB server. Hi Lan, I'm going to apologize in advance for the length of this message; let me know if you'd prefer it in an email (and an email address!). I've written some scripts and done some digging into the TRSS MongoDB, and have identified a few potential issues, but having some points of comparison would be very useful.
This script extracts a bunch of information from a Linux system about its hardware configuration. It needs no amendments and can be run as is. Results should look something like this: Extracting CPU Performance Information...
If you are running your MongoDB inside a Docker container, then some performance stats for the container would be useful; this script extracts them. Results should look something like this: MongoDB Container Name: aqa-test-tools-mongo-1
docker inspect --format='{{.HostConfig.CpuQuota}}, {{.HostConfig.CpuPeriod}}, {{.HostConfig.CpuShares}}' <container_id>
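To make the three numbers from that `docker inspect` line easier to compare between machines, here is a minimal sketch that turns them into an effective CPU limit. The function name `parseCpuLimits` is hypothetical. It assumes Docker's usual conventions: a `CpuQuota` of 0 means no CFS quota is set, a `CpuPeriod` of 0 means the default period of 100000 microseconds, and a `CpuShares` of 0 means the default weight of 1024.

```javascript
// Sketch only: interprets the "CpuQuota, CpuPeriod, CpuShares" line printed by
// the docker inspect command above. parseCpuLimits is a hypothetical helper.
function parseCpuLimits(inspectLine) {
  const [quota, period, shares] = inspectLine
    .split(",")
    .map((value) => Number(value.trim()));

  // Assumption: 0 means "not set", so Docker's defaults apply.
  const effectivePeriod = period > 0 ? period : 100000; // microseconds
  return {
    // null means the container has no CPU quota (it may use all host CPUs).
    cpuLimit: quota > 0 ? quota / effectivePeriod : null,
    cpuShares: shares > 0 ? shares : 1024,
  };
}
```

For example, an output of `200000, 100000, 0` would mean the container is capped at two CPUs' worth of time, while `0, 0, 0` would mean no cap is configured at the container level.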
docker logs <container_id> | grep "Slow query" | grep timeReadingMicros
This may return a lot of data. It does for our instance, which I think is actually the problem being reported. For example:
docker logs 2601dd8ff38c | grep "Slow query" | grep timeReadingMicros
{"t":{"$date":"2025-01-23T15:21:29.747+00:00"},"s":"I", "c":"WRITE", "id":51803, "ctx":"conn28","msg":"Slow query","attr":{"type":"remove","ns":"exampleDb.auditLogs","command":{"q":{"timestamp":{"$lt":{"$date":"2024-12-24T15:21:12.478Z"}}},"limit":0},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":27814188,"ndeleted":0,"numYields":27814,"queryHash":"97131801","planCacheKey":"97131801","locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":27815}},"FeatureCompatibilityVersion":{"acquireCount":{"w":27815}},"ReplicationStateTransition":{"acquireCount":{"w":27815}},"Global":{"acquireCount":{"w":27815}},"Database":{"acquireCount":{"w":27815}},"Collection":{"acquireCount":{"w":27815}},"Mutex":{"acquireCount":{"r":1}}},"flowControl":{"acquireCount":27815,"timeAcquiringMicros":9973},"storage":{"data":{"bytesRead":5983072617,"timeReadingMicros":207010}},"remote":"172.24.0.5:58150","durationMillis":17268}}
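Because the grep can return a very large number of entries, it may help to aggregate them rather than read them one by one. The following is a rough sketch, not part of the scripts mentioned above: `summariseSlowQueries` is a hypothetical helper that groups "Slow query" entries by namespace, operation type and plan summary, using only the fields visible in the sample entry (`attr.ns`, `attr.type`, `attr.planSummary`, `attr.docsExamined`, `attr.durationMillis`).

```javascript
// Sketch only: summarises MongoDB structured (JSON) log lines.
// summariseSlowQueries is a hypothetical helper name.
function summariseSlowQueries(logText) {
  const summary = new Map();
  for (const line of logText.split("\n")) {
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip blank or non-JSON lines
    }
    if (!entry || entry.msg !== "Slow query") continue;

    const attr = entry.attr ?? {};
    const key = `${attr.ns} ${attr.type ?? "?"} ${attr.planSummary}`;
    const row = summary.get(key) ?? { count: 0, docsExamined: 0, durationMillis: 0 };
    row.count += 1;
    row.docsExamined += attr.docsExamined ?? 0;
    row.durationMillis += attr.durationMillis ?? 0;
    summary.set(key, row);
  }
  return summary;
}
```

One way to use it would be to save the `docker logs` output to a file, read that file under Node, pass its contents to the function, and sort the resulting rows by `durationMillis` to see which namespace and plan combination dominates.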
Log into mongosh and switch to the TRSS DB (ours is called exampleDb), then check which indexes are on the testResults collection.
Now get an execution plan for this query. You may need to update the URL and BuildName to match your instance. db.testResults.find({ This should give me some useful points of comparison. |
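As a sketch of the index check, assuming the standard mongosh helpers `db.getCollection()` and `getIndexes()`: the hypothetical function `hasIndexOn` below reports whether any index on a collection leads with a given field, which is what a filter on that field alone would need in order to avoid a collection scan.

```javascript
// Sketch only, intended for mongosh after `use exampleDb`.
// hasIndexOn is a hypothetical helper; `database` is the mongosh `db` object.
function hasIndexOn(database, collectionName, field) {
  const indexes = database.getCollection(collectionName).getIndexes();
  // Each index document has a `key` object such as { timestamp: 1 }.
  // Only the first field of the key can serve a filter on that field alone.
  return indexes.some((index) => Object.keys(index.key)[0] === field);
}

// In mongosh: hasIndexOn(db, "auditLogs", "timestamp")
```

Running `db.testResults.getIndexes()` directly shows the full index definitions; the helper just makes the comparison between two instances a yes/no answer per field.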
Could we start by identifying the symptoms of TRSS performance issues to investigate them effectively? It is hard to determine the root cause based solely on |
@llxia yes, this is what I'm trying to do. I believe the poor performance occurs when data is being inserted into TRSS. I think I've identified a couple of possible culprits, hence querying the DB for slow queries and checking the index states. The above commands would allow me to compare machine specifications (I believe our MongoDB is being limited by the underlying server and Docker container configuration), which is causing inserts, updates, etc. to take significantly longer than they should. The indexing on the database could also be improved, but I'd just like to compare your indexing with ours to see if you have the same issues (although less pronounced). @smlambert is there an "easy" way to replicate this issue on our TRSS? I could definitely do some more interactive performance monitoring whilst the issue is occurring, rather than relying on the container/MongoDB logs. |
Anecdotally, it takes up to several days to see a full (completed) pipeline in a Grid view in TRSS, which is not good enough for use during releases. This was not the case previously, but several changes have occurred: (1) TRSS is now running in containers, and (2) the Azure machine that this server runs on has changed (with reduced capacity). @steelhead31 - depending on how involved you want to be, 2 options: |
Thanks, I'll try those. I do believe the reduction to a 2 CPU machine is a large part of the issue, as the MongoDB Docker container is using most of the machine's resources even when idle. There is sufficient memory, and the disk is fast, but analysing the slow query logs suggests CPU is the likely culprit. Some additional indexing (particularly on the auditLogs and testResults collections) may also be beneficial. |
Ok, so after some extensive performance analysis...
Most of the slow query log entries are related to the DB performing a COLLSCAN instead of using indexes. This is a major performance issue, especially when dealing with large collections. The 2 biggest culprits are auditLogs and testResults. The query logs show that each operation scans millions of documents without using indexes.
This means MongoDB is scanning the entire collection instead of using an efficient index.
Queries are reading gigabytes of data (bytesRead: 5,350,000,000+), causing high I/O.
Many slow queries are DELETE operations on exampleDb.auditLogs, filtering by timestamp:
This means that MongoDB is scanning all documents to find ones that match.
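For the auditLogs deletes described above, one possible remedy is an ascending index on `timestamp`, so that the `$lt` filter can use an index range scan instead of a COLLSCAN. This is a sketch under that assumption, not a tested change to TRSS. The function name `ensureAuditLogTimestampIndex` is hypothetical, while `getCollection()` and `createIndex()` are the standard mongosh methods.

```javascript
// Sketch only, intended for mongosh after `use exampleDb`.
// ensureAuditLogTimestampIndex is a hypothetical helper; `database` is `db`.
function ensureAuditLogTimestampIndex(database) {
  // createIndex does nothing if an identical index already exists.
  // Building it on a collection with tens of millions of documents
  // will itself take time and CPU.
  return database.getCollection("auditLogs").createIndex({ timestamp: 1 });
}

// In mongosh: ensureAuditLogTimestampIndex(db)
```

After the index is built, re-running the delete filter with `explain()` should show whether the plan has changed from COLLSCAN to an index scan, which is the comparison that matters here.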
The query filtering for status: {$ne: "Done"} also uses a collection scan:
|
@steelhead31 What is that based on? From what I could see it was only ever chewing up 100% of one CPU which would not indicate an issue with core count specifically. |
I've had several top commands in the container, averaging 135% of the CPU... |
For the record it's showing this right now, which would typically indicate one full core in use. Even if that had 135% as opposed to 102% (was that purely on the The amount of swap it's using may be a bit of a concern though. |
I'm inclined to try adding some indexing before anything else, and investigating what's in the auditLogs collection, as that's by far the biggest volume of data it attempts to read for each operation. |
During the last release cycle, the performance of the TRSS server / MongoDB was quite poor.
This needs to be investigated.