-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sped up Tokenizer::dump()
#5009
Conversation
c58378b
to
e4f46ac
Compare
It looks like the pointer address is just used as a unique ID for the objects in the dump. @danmar? If that is the case (and as the pointer string representation is different on Windows and Linux) we could use something more light-weight, simple and portable instead. Or simply use just one representation. |
fb4bbd7
to
c3058d2
Compare
Scanning with
|
d95f72c
to
e184436
Compare
I know that the new code is horrible but that's what we have to pay for having overengineered, unusable garbage in the standard (also looking at you I am aware that we might also be able use the |
66172f1
to
63da8d1
Compare
Yes it is a unique ID. In python it can be any arbitrary string, doesn't have to be numeric. But the premium addon expects that it's a hexadecimal value. I am not against that you change it if you think there is faster approach. |
As you see the value is different on several platforms and might not even be a hexadecimal value ( So I would keep the current format for now and prepare another PR which generates a consistent output across all platform. You could then test that with premium as well. Also if we run into issues we have a separate commit to revert which only contains the changes in the identifier. |
Personally I don't think it's horrible. The ostream never made me feel excited anyway. |
Yes it does. If there is a "0x" or not does not matter. It always uses base 16 when converting the string. Even if the string says |
As far as I see there are many opportunities to "fold" newlines.. if you do that I don't have any negative opinion about this. looks OK to merge then. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ptr_to_string
lib/utils.h
Outdated
// a-f / A-F | ||
c = 55 + temp; | ||
#if !defined(_WIN32) || defined(__MINGW32__) | ||
c += 32; // add 32 for lowercase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no technical reason to use lower case as far as I know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is necessary to match the result of std::ostringstream
. As mentioned before I will do the portable approach in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we really want to lowercase, I think that either c = 'a' + temp;
or c += 'a' - 'A';
would be more elegant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why it's important to match std::ostringstream though. the only possible usage I can think of for this function is when generating the dump file or debug output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
People might depend on the format. Also I want that change separate so it can easily be reverted if necessary. I will also put it behind a define for a release or two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we really want to lowercase, I think that either
c = 'a' + temp;
orc += 'a' - 'A';
would be more elegant.
I think the - 10
is better as it highlights that we are dealing with a "base 16"/hex value as input.
line = tok->linenr(); | ||
if (!xml) { | ||
ValueFlow::Value::ValueKind valueKind = values->front().valueKind; | ||
const bool same = std::all_of(values->begin(), values->end(), [&](const ValueFlow::Value& value) { | ||
return value.valueKind == valueKind; | ||
}); | ||
out << " " << tok->str() << " "; | ||
outs += " "; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we can use ' '
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably - if it were a single character... but I guess that is just a typo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the code using a char
is much slower similar to the stream insertion case I mentioned in another comment. Will investigate but that is out-of-scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it should have been a single character that was a typo 👍
Regarding single character char vs. string literal in stream insertion I encountered something weird - a char is slower: llvm/llvm-project#65040 |
…al` to `std::unordered_map`
ok |
Scanning the
cli
folder withDISABLE_VALUEFLOW=1
Tokenizer::dump()
will consume almost 25% of the total Ir count when an addon is specified. This is mainly caused by the usage ofstd::ostream
.Encountered while profiling #4958.