List strings without dedupe. #233
Comments
Hey, thanks for your interest in hprof-slurp 👋 You are correct, the strings are deduplicated. When parsing the test HPROF dumps I own, strings are defined only once. TBH I am not sure how to provide an instance count per string.
The tool is fantastic, by the way. It has been a great help on large hprof files that usually kill my local IntelliJ. I guess I'm wondering whether you dedupe in your tool (I'm not a Rust dev, but looking at your code it seems you're parsing them into a vec and not a set). Does that mean they are already deduped in the hprof file itself? If they are duplicated in the hprof file, then even if they were just printed out multiple times in the slurp output, I could pass that to uniq to see the counts of duplicate strings. But if they are not even duplicated in the hprof file, then that clearly wouldn't be possible. I'm not at all familiar with the hprof format, however.
Thanks for the kind words, happy you are able to analyze large hprof files 👍 AFAIK the strings are not duplicated in the hprof format. I am accumulating them into a hashmap string_id -> string_value. It may be possible to compute the number of instances per string based on the full instance graph, but it would require two passes over the dump file.
Got it. How would the second pass help if they are using the same object ids? I wonder what a heap dump would look like with, say, 10 duplicate strings (same value, but different String instances in code) with "-XX:+UseStringDeduplication" vs "-XX:-UseStringDeduplication". I would have thought the ids would be shared when string dedupe is enabled, but not when it is disabled?
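One way to try the experiment described above is a small program that keeps 10 equal but distinct String instances alive while a heap dump is taken. This is only a sketch: the class name `DuplicateStringsDemo`, the helper `makeDuplicates`, and the sample value are made up for illustration, and how the dump is captured (for example with `jmap`) is left to the reader. As far as I know, JVM string deduplication targets the backing arrays of equal strings rather than the String objects themselves, and only kicks in during garbage collection, so the dump may still show 10 separate String instances either way.

```java
import java.util.ArrayList;
import java.util.List;

class DuplicateStringsDemo {
    // Hypothetical helper: builds `count` distinct String objects with equal contents.
    static List<String> makeDuplicates(String value, int count) {
        List<String> copies = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            // Constructing from a char[] forces a fresh String instance each time.
            copies.add(new String(value.toCharArray()));
        }
        return copies;
    }

    public static void main(String[] args) throws Exception {
        List<String> copies = makeDuplicates("duplicate-me", 10);
        System.out.println("Holding " + copies.size()
                + " equal strings. Take a heap dump now, then press Enter.");
        System.in.read();
        // Use the list after the pause so the strings stay reachable during the dump.
        System.out.println("Done, first value: " + copies.get(0));
    }
}
```

Running it once with "-XX:+UseStringDeduplication" and once with "-XX:-UseStringDeduplication", then comparing the two dumps, should show whether the string ids differ between the two modes.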
In that case it would not help either 👍
This sounds like a logical explanation. Sound good to you?
I ended up cutting a new release, 0.5.4, with a basic String duplication detector. It will find Strings with the same value but registered under different ids.
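For readers who want a feel for what "same value but registered under different ids" means, here is a minimal sketch in Java. It is not hprof-slurp's actual implementation (the tool is written in Rust); the class name `DuplicateStringSketch`, the method `findDuplicates`, and the sample ids are all hypothetical. It assumes the String instances have already been decoded into a map of object id -> value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DuplicateStringSketch {
    // Hypothetical input: object id -> decoded String value.
    // Returns only the values that appear under two or more different ids.
    static Map<String, List<Long>> findDuplicates(Map<Long, String> valueById) {
        Map<String, List<Long>> idsByValue = new TreeMap<>();
        for (Map.Entry<Long, String> entry : valueById.entrySet()) {
            // Invert the map: group every id under the value it holds.
            idsByValue.computeIfAbsent(entry.getValue(), k -> new ArrayList<>())
                      .add(entry.getKey());
        }
        // Drop values that are registered under a single id.
        idsByValue.values().removeIf(ids -> ids.size() < 2);
        return idsByValue;
    }

    public static void main(String[] args) {
        // Made-up ids for illustration only.
        Map<Long, String> valueById = Map.of(
                1L, "Liam", 2L, "Sophia", 3L, "Noah",
                4L, "Liam", 5L, "Sophia", 6L, "Emma");
        findDuplicates(valueById).forEach((value, ids) ->
                System.out.println(value + " x" + ids.size()));
    }
}
```

The sketch groups by value in a single pass over the decoded strings, which only works once every instance has been resolved to its value.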
Oh amazing, sorry I was not able to get you the heap dumps before then. Let me give it a try on some real heaps though. Many thanks for this! EDIT:
List<String> names = List.of("Liam", "Sophia", "Noah", "Liam", "Sophia", "Emma");
Map<String, Long> nameCount = names.stream()
    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
Similar to this in Java.
No worries, I figured it would be really inconvenient to send large heap dump files over the Internet anyway.
Yes, the specifics of the report can be changed easily.
Hi there, I was trying to use the tool to determine whether our application would benefit from string dedupe. Looking at a dump with -L shows a lot of strings, but it seems to be deduping them when printing. Can you confirm this is the case? If so, perhaps it would be useful to have a way to run without deduplication, to determine whether running the JVM with string dedupe enabled would be beneficial. Many thanks.