3,263,694 events, 1,445,382 push events, 2,424,278 commit messages, 174,414,944 characters
Fix the formulas of some spells. (#1754)
- Fix the formulas of some spells.
I corrected the minimum and maximum damage formulas of some spells, using the Tibia Wiki as a reference. (https://tibia.fandom.com/wiki/Formulae) Corrected spells: Healing: Light Healing, Intense Healing, Wound Cleansing, Mass Healing, Ultimate Healing, Divine Healing, Heal Friend
Instant Attack:
Energy Beam
Great Energy Beam
Divine Caldera
Terra Wave
Energy Wave
Rage of Skies
Hell's Core
Wrath of Nature
Eternal Winter
Energy Strike
Terra Strike
Ice Strike
Physical Strike
Flame Strike
Death Strike
Divine Missile
Ice Wave
Fire Wave
Whirlwind Throw
Groundshaker
Berserk
Fierce Berserk
Ethereal Spear
Runes:
Explosion
LMM
HMM
Stalagmite
Fireball
Icicle
Holy Missile
Sudden Death
Thunderstorm
Stone Shower
Avalanche
Great Fireball
remove the stop dump zone command from the protocol (#4326)
the stop dump zone command was implemented as a courtesy to other players, to signal when a player had stopped looking at hidden information
however, this can be abused: a malicious client can send this command whenever it likes
cockatrice is not a physical tabletop nor does it aim to be; if you can take a screenshot of your deck and then close the view, you are not cheating, as you have already been given this information
in order to prevent anyone from abusing this we should remove the command from the protocol; servers will ignore this message and clients will get a small invalid-command reply in their debug log
the extension id will remain reserved
shuffling your deck will always invalidate any card view looking at those cards
if players wish to signal that they stopped looking at their deck for whatever reason they should just use the chat instead, optionally using one of the chat macros
Fuck Mateo
Removed gitignore from gfx folder and restored .c and .h files from a different computer because I don't want to have to stumble through the new bullshit when my old familiar shitty stuff works fine
Convert to standardrb
- I mostly don't care about this, but there are a couple of things that Standard does that I disagree with. They are inherited from Rubocop, but Standard fixes many of Rubocop's nonsense rules.
- Array literal wrappers %i[], %w[], etc. are just ugly and never should have become any sort of standard. I would be happier if this part of Standard were just completely disabled, because it's unnecessary and wrong.
- Quote literals having to be %q() is equally wrong. I've avoided the issue here because the generated gemspec uses both "unnecessary" quote literals (it's necessary if I say it's necessary) and the wrong wrappers (I wouldn't use %q<>, but this is generated code).
- I still think that short hashes can be { foo: "bar" }, but I'm mostly using Elixir these days, so I don't mind %{foo: "bar"}, so I can get used to it in Ruby. It still feels wrong, almost 20 years in.
- There are semantic differences between and / &&, or / ||, but in some cases the reformatted code is substantially worse to read. Again, I mostly don't care about this difference, but Rubocop's insistence is silly; these should only be replaced where there is ambiguity.
- x = foo or next should never be replaced with (x = foo) || next. That's replacing something that is somewhat readable with something damned-near unreadable. Both should be replaced with x = foo; next unless x.

Overall, this introduces a lot of churn, but I think it will be easier to deal with updates to standardrb than with the rapid churn that has been Rubocop.
(fish) revert to using home-manager
This reverts commit 7e248000228fe649d8095df74fd28bb568b712df.
I know, I've been changing things too often in this regard. I just wanted to investigate the difference between doing everything by hand vs using home-manager.
In general, I love the home-manager approach more, as it does all the wiring for me. My only reason to be reluctant is Arch Linux support, which requires hacks for some OpenGL applications (like Emacs and Alacritty), which sucks.
Update index.html
the damn index file was missing the html tag... fucking idiot
"9:05am. Let me chill a bit and I will start.
9:20am. https://arxiv.org/abs/1905.01652 The Game of Tetris in Machine Learning
Let me read this just for a bit and then I will make the optimizer.
So far, the transformation of raw squares of the Tetris grid to a handful of useful features has been carried out by hand. Tetris is not yet part of the OpenAI universe or the Atari domain. No deep learning algorithm has learned to play well from raw inputs. Stevens & Pradhan and Lewis & Beswick (2015) have reported attempts that achieved at most a couple hundred lines cleared.
9:35am. Ah, I see it. I could train an ensemble of critics to get the variance of the predictions. Right.
9:40am. What is the classification based policy iteration?
https://papers.nips.cc/paper/2013/file/7504adad8bb96320eb3afdd4df6e1f60-Paper.pdf Approximate Dynamic Programming Finally Performs Well in the Game of Tetris
10:05am. You know what I am thinking about right now? That damn mask for the sampling policy. Simply adding -inf is the wrong choice, as it would let the gradient go through the wrong actions.
...I have an idea. What if I made a special function that just adds on the forward pass, but multiplies by the exp mask on the backward pass? That would meet all my requirements.
10:10am. Yeah, that is it. That is the ideal solution.
10:15am. I'll go with that.
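A minimal PyTorch sketch of such a function, assuming the mask holds 0 for legal actions and -inf for illegal ones (the class name MaskedAdd is hypothetical):
import torch

class MaskedAdd(torch.autograd.Function):
    """Adds the mask on the forward pass; on the backward pass it multiplies
    the incoming gradient by exp(mask), so -inf entries pass zero gradient."""

    @staticmethod
    def forward(ctx, scores, mask):
        ctx.save_for_backward(mask)
        return scores + mask

    @staticmethod
    def backward(ctx, grad_output):
        (mask,) = ctx.saved_tensors
        return grad_output * torch.exp(mask), None  # no gradient for the mask itself
Usage would simply be masked_scores = MaskedAdd.apply(scores, mask).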
Also another problem I have is with my information directed sampling idea. I think after all that just multiplying the softmax probs and then L1 normalizing is bad. It would not encourage exploration in the direction of the policies that are zero.
Instead of doing that, I could add it to the pre-softmax activations. That would have a multiplicative effect on the policy, and it could move policies that are not -inf by a significant amount, but the problem with that approach is that the exploration would become sensitive to the scale of the rewards.
10:20am. Hmmm, this might not be a bad thing. Yes, it would make them another hyperparam to tune, but it might perform really well in the end.
Both critic and actor ensembles for exploration are worth considering.
10:25am. I am going to have to come up with a story for exploration either way. Epsilon greedy for the sampling policy will not cut it on large games.
But after I do this, I will have the very most basics of RL down. The level after this is motivated reasoning and planning along with prediction training. To fully cross into the 4/5 rank I am going to need those neurochips.
10:30am. Hmmmm, actually I see another piece I could add to the critics. Rather than just the reward, I could also propagate uncertainty alongside it. If I just used current uncertainty to drive policy rewards, that would not be very helpful because the uncertainty is dependent on the states reached in the future which is in turn dependent on the rewards.
I suppose I could just add them to the rewards, rather than propagating a pair and then adding the uncertainties to the pre-softmax activations.
Ah no, wait. That would make disambiguating the uncertainty in the critics from reward learning complicated. I should definitely propagate the uncertainties and ...
But then I'd have the clustering problem of dealing with uncertainty of uncertainties.
...Maybe there is no need to propagate them.
10:45am. Maybe I am looking at this the wrong way. Rather than having some exploration bonus, maybe I should derive a policy directly from uncertainty estimates and use that as the sampling policy?
That fits much better and I would not need useless hyperparams to optimize.
It would be easy to derive a policy using the L1 norm since uncertainty values would always be <= 0. This would be invariant to the scale of the rewards. For this kind of scheme I should in fact propagate uncertainties along with the rewards.
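As a rough sketch of what that sampling policy could look like, assuming the per-action uncertainty is taken as the critic ensemble's disagreement (the function name and shapes are illustrative):
import torch

def uncertainty_sampling_policy(q_ensemble):
    """q_ensemble: (n_critics, n_actions) value predictions from a critic ensemble.
    The per-action disagreement is L1-normalized into a sampling distribution,
    which is invariant to the scale of the rewards."""
    uncertainty = q_ensemble.std(dim=0)
    return uncertainty / uncertainty.sum().clamp_min(1e-8)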
11am. I've attained it. What I have in mind right now is the ideal way to do exploration, no doubt about it. Setting the sampling policy to an uncertainty driven one is the way to go. An ensemble of critics should be used for that. But I am going to need an ensemble of actors anyway to deal with non-stationarity. I forgot about this. Or I am going to have to do weight averaging perhaps.
11:05am. This is going to work really well. Since I know how to combine tabular RL with DL learning now, I'll be able to make sure of that.
11:10am. I am going to make massive money aren't I?
All of my experience is coming together and it is resulting in novel algorithms of which the one I am implementing is the first of its kind.
It should have been obvious, but adding an exploration bonus of any kind to the policy would corrupt it. Now that I've written things out it is blindingly obvious that the sampling policy specifically should be used for exploration. Just why was I so attached to exploration bonuses? I have no idea.
The pieces were all there in the tabular CFR algo, but I could not see them.
11:20am. How about I start work on the optimizer. It should not take me long at all.
12:05pm.
import torch

def optimize(paramGroupList: list, learning_rate: float = 2 ** -7, signSGDfactor: float = 2 ** -3):
    """
    Interpolates between signSGD and infinity norm normalization.
    signSGDfactor - The interpolation factor for signSGD. 0 is pure infinity norm normalization, while 1 is pure signSGD.
    """
    assert 0 <= signSGDfactor <= 1
    assert 0 <= learning_rate
    with torch.no_grad():  # parameter updates should not be tracked by autograd
        for paramGroup in paramGroupList:
            infNorm = torch.scalar_tensor(0.0)
            for x in paramGroup:
                if x.grad is not None:
                    infNorm = torch.max(infNorm, torch.linalg.norm(x.grad.flatten(), ord=float('inf')))
            for x in paramGroup:  # The scalar operations are grouped for efficiency.
                if x.grad is not None and torch.is_nonzero(infNorm):
                    # Interpolate between a signSGD step and an infinity-norm-normalized gradient step.
                    x += learning_rate * signSGDfactor * torch.sign(x.grad) + learning_rate * (1 - signSGDfactor) / infNorm * x.grad
                x.grad = None
Let me just go with this for the optimizer.
Should be nice and simple. I do not have to worry about things blowing up with this baby. I need stability guarantees for RL in all their forms. The main principle is to make things work and then worry about performance after that.
The field at large still has not gotten to this point.
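For reference, a usage sketch of the optimizer above, assuming two small networks and one parameter group per model (all names here are illustrative):
import torch

policy_net = torch.nn.Linear(8, 8)
value_net = torch.nn.Linear(8, 1)

# A dummy backward pass to populate the gradients of both networks.
loss = value_net(torch.relu(policy_net(torch.randn(16, 8)))).pow(2).mean()
loss.backward()

# One parameter group per model; the infinity norm is computed per group.
optimize([list(policy_net.parameters()), list(value_net.parameters())], learning_rate=2 ** -7)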
12:15pm. Now what is next?
I have the updates, the reward calculation and the optimizer.
inl main () =
!!!!Import("torch")
!!!!Import("torch.distributions")
!!!!Import("torch.optim")
inl game = leduc.game()
inl net = agent.neural.create_small_leduc_net()
inl policy,value = agent.neural.models net
inl lr : f32 = 2 ** -14 // Note: Be careful of Cython integer power bug. `integer ** negative_integer == 0` in raw Cython.
$"print('The learning rate is 2 **',torch.log2(torch.scalar_tensor(!lr)).item())"
inl opt : obj = $"torch.optim.SGD([{'params':!policy.parameters()},{'params':!value.parameters()}],lr=!lr)"
loopw.for' (from: 0i32 nearTo: 1) fun _ =>
$"!policy.eval()" . $"!value.train()"
loopw.for' (from: 0i32 nearTo: 20) fun _ =>
inl r = train.vs_self 1024 (agent.neural.policy (2 ** -2) net) game
$"print(numpy.sqrt(numpy.sum(numpy.square(!r))))"
$"print(!value.weight,!value.bias,!value.weight.grad,!value.bias.grad)"
$"!opt.step()" . $"!opt.zero_grad(True)"
$"print('***')"
How about I start from the top? I'll assume I have the game and the net and then fill out the loop. Then I'll put in the missing pieces. I do not feel particularly inspired regarding my approach, so I'll just do the obvious.
Oh, yeah. Let me do the mask function before I take a break.
https://stackoverflow.com/questions/56328630/pytorch-masked-fill-why-cant-i-mask-all-zeros
scores = scores.masked_fill(scores == 0, -np.inf)
This is pretty interesting. How do the boolean ops work in PyTorch?
x = torch.rand(5,5)
x < 0.5
tensor([[ True, False, True, False, False],
[ True, True, True, False, True],
[ True, False, False, False, True],
[False, True, False, True, True],
[ True, False, False, True, True]])
There is masked_fill, masked_scatter and masked_select. Let me take a look at those.
x = torch.rand(5,5,requires_grad=True)
torch.masked_fill(x, x < 0.5, float('-inf'))
tensor([[0.5650, 0.6294, -inf, -inf, 0.6751],
[ -inf, -inf, -inf, 0.9383, 0.9712],
[ -inf, -inf, 0.6000, 0.7556, -inf],
[ -inf, 0.7890, -inf, -inf, 0.9062],
[ -inf, -inf, 0.8049, 0.5634, 0.6296]],
grad_fn=<MaskedFillBackward0>)
Wonderful, it has the backward function. This is just what I need. I should easily be able to reuse this to fill the values I do not want gradients propagated through with -inf.
Instead of setting the mask to what I've been doing...
inl policy_mask : obj = $"torch.full((!len,!actions_size),float('-inf'))"
am.iteri (fun b => am.iter (fun a => $"!policy_mask[!b,!a] = 0")) action_indices
I'll just do:
inl policy_mask : obj = $"torch.full((!len,!actions_size),True)"
am.iteri (fun b => am.iter (fun a => $"!policy_mask[!b,!a] = False")) action_indices
This will allow me to use masked_fill to its true potential.
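In plain PyTorch, the boolean-mask version amounts to something like this sketch (batch size, action count and indices are illustrative):
import torch

batch_size, actions_size = 4, 6
action_indices = [[0, 2], [1], [3, 4, 5], [0]]  # legal actions per batch element

# True = illegal action (filled with -inf below), False = legal action.
mask = torch.ones(batch_size, actions_size, dtype=torch.bool)
for b, actions in enumerate(action_indices):
    for a in actions:
        mask[b, a] = False

scores = torch.randn(batch_size, actions_size, requires_grad=True)
masked_scores = scores.masked_fill(mask, float('-inf'))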
Ok, that is one thing out of the way.
12:35pm. This would be a good time to have breakfast. Let me do that, and then I'll get cracking."
Pain
JSONs are evil things of sin, and I hate them
"1:20pm. Let me resume. It is time to get the loop out of the way.
1:40pm. Ah, let me take a break. I haven't had enough.
2:25pm. The implications of having such a sampling policy are astonishing.
There is in fact nothing stopping the sampling policy from incorporating hindsight information. This is what would be the case if I based it off value uncertainty. I could train the actor without ever acting on the actual policy!
Now that I've come to this point, this kind of training resembles less the vanilla RL training, and looks quite a lot like actual reasoning!
It seems a good definition of reasoning would be fast RL.
RL today works so poorly that it is difficult to imagine this being the case, but why wouldn't it be? Long term planning and reasoning is the logical endpoint in the development of RL. When RL is naive, it resembles the agent following its policy and collecting reward, but when it is advanced it resembles planning, imagination and dreaming.
The thing I will do with the sampling policy will work, and yet it is more like something the agent would simulate in its head rather than act out in the real world. In the real world, you certainly would not be allowed to just act upon your curiosity. It would get you killed.
Here in simulation, it is perfectly fine.
2:45pm.
inl vs_self batch_size p game =
let rec loop (l : a u64 _) =
inl rewards : ra u64 _ = am.empty
inl actions_indices : ra u64 _ = am.empty
inl actions : ra u64 _ = am.empty
inl nexts : ra u64 _ = am.empty
l |> am.iteri fun i => function
| Action: player_state,game_state,pid,actions',next =>
rm.add actions_indices i
rm.add actions (player_state,game_state,pid,actions')
rm.add nexts next
| Terminal: x =>
rm.add rewards (i, x)
inl rewards_actions =
if 0 < length actions then
inl cs,update = p actions
am.generic.map2 (<|) nexts cs |> loop |> update
else am.empty
inl rewards_all : a _ r2 = create (length l)
am.generic.iter2 (set rewards_all) actions_indices rewards_actions
am.iter (fun (i,_,_,r) => set rewards_all i r) rewards
rewards_all
loop (am.init batch_size fun _ => game pl2_init)
Hah, I do not feel like it at all. Looking at the old stuff just saps my motivation.
I can't do it. I need to motivate myself. Let me turn off the computer and spend some time in bed. I'll try to focus my mind on how to do the Python parts.
I have the control and the optimize functions, but though I have a high level view of what I want to do, the motivation and the planning for the details of how I will do the Python side programming are lacking. I feel drained after the rigours of the last month.
I need to focus myself and hone my mind towards the new approach. I need to overcome my mental inertia.
2:50pm. The way to do that is to just think. I won't get motivated to do work just by working. I've dreamed about the high level, I need to focus on the lows."
About Jabaroo
Our Jabaroo NFT project journey started with displaying the assets we designed with the NFT creator account in an NFT Market. Based on this, we determined the directions that we can develop in this sector and aimed to deal with 2 different issues in total. We will create the Jabaroo NFT Market and aim to integrate with leading NFT markets in the marketplace. In addition to the NFT Market we have established, a Physical Market will be created where we will sell and market physical NFTs. We are very confident in this idea, which we think is lacking in the market at the moment, and we aim to be the world's first physical e-commerce platform that sells with smart contracts.
Creating the Jabaroo NFT Marketplace and Providing Integration with Leading NFT Markets
We aim to develop a Jabaroo NFT Marketplace with the Jabaroo NFT project. By creating an account here, users will be able to generate, buy or sell NFTs and will not need any coding skills to do so. In this market, which we will create by designing a user-friendly interface, users will be able to start trading after opening their accounts as collectors or creators. However, users' accounts will need to be verified for their work to stand out. With the verification system we will develop here, we will ensure that users can trade securely. With this market, we aim to eliminate the slow user authentication process experienced in today's NFT markets. Feedback will also be given to users who do not meet the conditions for account verification, explaining why their account was not verified. We are aware that the NFT craze, which has become widespread today, has taken place in many different markets. We think that it is very inefficient for users to open an account in each market in order to reach NFT projects in different markets. As a solution to this problem, we aim to integrate the leading NFT markets in the marketplace with the Jabaroo NFT market. Thus, users will be able to make transactions through other markets that we integrate with, without the need to open a new account. The accounts they have opened in the Jabaroo NFT market will be enough. In order to proceed to this step, users will need to be verified on the Jabaroo NFT Market, and then all integrated markets will be open to transactions. Thus, the requirement of opening different accounts and user verification for each market will be eliminated. It will save time and create an integrated NFT Marketplace ecosystem. For sales on the Jabaroo NFT Market, Jabaroo will only receive a 2.5% commission from the buyer and seller on the sales made.
Creating a Physical NFT Marketplace
The NFT market has shown to all of us that it is an innovative area that is open to improvement. Thanks to NFTs, billions of works created around the world meet their buyers and gain the value they deserve. As a team, we are working on the physical NFT Market project that will increase this value even more. We aim to create a Physical NFT Marketplace as a different branch of the NFT Market that we will develop under the Jabaroo NFT Project. We will establish a structure where digitally produced assets in the current NFT market can be physically handled and objectified. Thanks to the Physical NFT Market, works designed as digital assets will be presented to buyers by objectifying them with 3D printers.
Creators will be able to make their own designs in 3D if they wish. They will be able to make their sales by uploading the images of the 3D work to the market. In this scenario, Jabaroo will only receive a 3% commission from the buyer and seller on the sale made. Creators who have designed digitally but do not have the equipment to produce it in 3D will be able to share their designs with the Jabaroo Physical NFT Market Team. The Jabaroo Team will turn these designs into 3D versions and provide logistics if they are sold. In this scenario, the Jabaroo Team will make a price offer to the Creator, according to the difficulty of transforming the digital work into 3D. If the Creator accepts, the Jabaroo Team will make the work 3D and share the photos with the Creator. If the Creator requests, a sample of the created work will be sent once. The photos of the product will be shared by the Creator on their own page. If the asset is sold, the Creator will send an order to Jabaroo and the logistics of the product will be handled by Jabaroo. In this case, the Jabaroo Team will receive a 3% commission from the buyer and seller on the sale made, in addition to the price offer it has given.
Works designed as digital assets can also be printed on T-shirts using printing machines and presented to buyers. Creators will be able to print their own designs on T-shirts if they wish, and they will be able to make their sales by uploading the images of the printed assets to the market. In this scenario, Jabaroo will only receive a 3% commission from the buyer and seller on the sale made. Creators who have designed digitally but do not have the equipment to print on T-shirts will be able to share their designs with the Jabaroo Physical NFT Market Team. The Jabaroo Team will print these designs on T-shirts and provide logistics if they are sold. In this scenario, the Jabaroo Team will make a price offer to the Creator, depending on the difficulty of converting the digital work to print. If the Creator accepts, the Jabaroo Team will print the work on the T-shirt and share the photographs with the Creator. If the Creator requests, a sample of the created work will be sent once. The photos of the product will be shared by the Creator on their own page. If it is sold, the Creator will send an order to the Jabaroo Team and the logistics of the product will be handled by Jabaroo. In this case, the Jabaroo Team will receive a 3% commission from the buyer and seller on the sale made, in addition to the price offer it has given.
In both themes, it will be a condition that the products are first created as digital works and then transformed into physical works. The Jabaroo Physical NFT Market will support multiple photo uploads, and creators will be expected to add photos of both the digital and physical forms of their work. Works that are not transformed into physical works and photographed will not be listed and cannot be sold. In order for users to make transactions in the Physical NFT Market, they are expected to have created their accounts as creators or collectors on the NFT Market and to have their accounts verified. While the Physical NFT Market is being designed, a new Smart Contract will be designed for safe trading. The Smart Contracts used in the market were designed for digital works. Since digital works will be physically sold on the Physical NFT Market to be designed by Jabaroo, there is a need to design a new Smart Contract. This issue will be handled within the scope of the project and a platform will be created for users to use safely.
FOOTNOTE: Both marketplaces will trade on the Binance Smart Chain network. Here, transactions will be carried out quickly and with low transaction fees. Also, all data will be safe with the blockchain network.
Update base for Update on "add a boxed CPU fallback kernel"
This PR replaces the existing code-generated CPU fallback kernels that XLA uses with a single boxed CPU fallback.
Current state: there are a couple of different design ideas that I want to point out, but the logic for the actual kernel is mostly done and passing tests.
To preface, I'm not 100% tied to the current design and I'm putting the PR up now for opinions and totally open to alternatives, some of which I listed below. Actually after writing this description, I'm leaning toward the following changes:
- Confirm whether or not we can remove all C++ logging info directly in the yaml.
Current Design
All of the CPU fallback codegen is deleted. In its place, XLA (and other external backends, later) can choose to opt into a CPU fallback by adding the following code in a C++ file. I have a corresponding xla-side PR with the xla changes.
There's no actual requirement to split up the code into a .h and .cpp file, but that's necessary in the XLA case because they sometimes need to call the fallback directly from their handcrafted kernels.
// xla_cpu_fallback.h
#include <ATen/native/CPUFallback.h>
...
void xla_cpu_fallback(const c10::OperatorHandle& op, torch::jit::Stack* stack);
...
// xla_cpu_fallback.cpp
#include "torch_xla/csrc/aten_cpu_fallback.h"
...
void xla_cpu_fallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) {
// Do custom logging here
...
// Call the actual boxed CPU fallback.
at::native::cpu_fallback(op, stack);
}
TORCH_LIBRARY_IMPL(_, XLA, m) {
m.fallback(torch::CppFunction::makeFromBoxedFunction<&xla_cpu_fallback>());
}
Now that the fallback is exposed in the backend, they can call it directly. Doing so requires converting from an unboxed to a boxed context, for which we provide a utility function. E.g.:
#include <ATen/native/CPUFallback.h>
at::Tensor addmm(const at::Tensor& self,const at::Tensor& mat1,const at::Tensor& mat2,const at::Scalar& beta,const at::Scalar& alpha) {
....
if (...call_fallback...) {
return at::native::call_fallback_fn<&xla_cpu_fallback, decltype(at::addmm)>::call("aten::addmm", self, mat1, mat2, beta, alpha);
}
...
}
That decltype(at::addmm) logic isn't actually used everywhere in the xla-side PR yet, since you hit issues with overloads. I could use it everywhere once #58092 lands.
Alternatives: The API for calling the CPU fallback directly is ugly, can we make it nicer?
We could change the API to use at::redispatch, which would make it look something like this:
at::Tensor addmm(const at::Tensor& self,const at::Tensor& mat1,const at::Tensor& mat2,const at::Scalar& beta,const at::Scalar& alpha) {
....
if (...call_fallback...) {
return at::redispatch::addmm(c10::DispatchKeySet(c10::DispatchKey::CPUFallback), self, mat1, mat2, beta, alpha);
}
...
}
Which definitely feels cleaner, but also requires adding a new DispatchKey just for this use case. Conditionally calling the CPU fallback doesn't sound like a hugely important use case, so I don't know if giving up one of our 64 dispatch key slots is worth the API improvement. Totally open to other opinions though!
Another more mild improvement that would avoid having to pass operator string names (including overloads) around would be to codegen (yet another) namespaced API. Something like this:
at::Tensor addmm(const at::Tensor& self,const at::Tensor& mat1,const at::Tensor& mat2,const at::Scalar& beta,const at::Scalar& alpha) {
....
if (...call_fallback...) {
return at::fallback::addmm<&xla_cpu_fallback>(self, mat1, mat2, beta, alpha);
}
...
}
Writing that out, I actually like it more (I think it'll let us get rid of decltype(...)). Maybe that is nice enough to warrant a new codegen API - I haven't tried adding that yet, but if people like it I'm happy to try it out.
More alternatives
The current design also involves the backend manually writing and registering the boxed fallback themselves, but an alternative would be for us to do it in codegen too: they would just need to pass in all of the C++ logging that they want done in the fallback, directly through the yaml. The main downsides:
- Backend code that wants to call the fallback needs to abide by whatever convention our codegen uses to name the generated boxed fallback.
- Passing custom C++ logging through yaml is just more fragile: right now xla uses an iostream to log each tensor arg in the operator, so we'd have to either force other backends into the same convention or figure something else out later.
To be fair, we actually already do that: XLA has custom per-tensor-arg logging for all of the generated out wrappers in the codegen, which we do by passing their C++ logging info through the yaml. This seems unnecessary though, since out wrappers just call into a functional kernel, which is hand written with its own custom logging. So my take is: try to remove custom C++ logging from the yaml, and if it turns out to be really necessary, then we may as well take advantage of that to codegen the fallback.
While ops that fall back to CPU aren't exactly a hot path, we probably don't want to use a boxed fallback if it turns out to be an absolute perf killer.
I ran my benchmarks using callgrind, benchmarking both at::add and at::add_out run on XLA. My callgrind benchmark for at::add can be found here (the add_out benchmark looks basically the same): https://www.internalfb.com/phabricator/paste/view/P415418587. I created the benchmark by hacking the existing xla C++ test build scripts and throwing in a reference to callgrind.
I also attached the full callgrind output for each benchmark; the full output is actually pretty noisy and hard to parse, but I focused on everything underneath the at::add() call in the output, which was much more stable. My guess is that the noise is due to some heavyweight async startup processing that xla does.
at::add:
before: 88,505,130 instructions. Full output: https://www.internalfb.com/phabricator/paste/view/P415421001
after: 102,185,654 instructions. Full output: https://www.internalfb.com/phabricator/paste/view/P415421273
delta: ~15.5% increase
at::add_out:
before: 63,897,395 instructions. Full output: https://www.internalfb.com/intern/everpaste/?handle=GBrrKwtAPlix9wUEAOZtrFXpdO5UbsIXAAAz
after: 73,170,346 instructions. Full output: https://www.internalfb.com/phabricator/paste/view/P415423227
delta: ~14.5% increase
High level takeaway: A framework overhead increase of 10-20% doesn't seem too horrible for the CPU fallback use case.
For structured, functional ops that require a CPU fallback, we're actually in an unfortunate situation: we're doing even more work than necessary. Our codegen automatically creates a CompositeExplicitAutograd kernel which calls into the out operator. So the extra work that we end up doing is:
- An extra dispatcher hop: (at::add -> CompositeExplicitAutograd -> CPUFallback -> at::native::add) instead of (at::add -> CPUFallback -> at::native::add)
- An unnecessary tensor allocation (the CompositeExplicitAutograd kernel uses at::empty() to create an output tensor, which is immediately overwritten by the CPU fallback)
- An unnecessary meta() call (the CompositeExplicitAutograd kernel calls it to create the output tensor, but we call it again in the CPU kernel).
- unboxing->boxing->unboxing logic (this is the only strictly required piece)
There are definitely ways to avoid the unnecessary work explained above: one would be to give the boxed fallback higher priority than composite keys (there's an issue for it here), and codegen fallthroughs for all composite ops. It'll require more infra to set up, so I see it as more of a perf knob that we can apply if we need it later.
Unfortunately I couldn't dig much deeper into the differences aside from the aggregate change in instructions, since it looks like callgrind fudged some of the instruction attribution (at::to_cpu takes up a ton of instructions, but I don't see any attribution for the at::native::add kernel anywhere).
Differential Revision: D28833085
[ghstack-poisoned]
add a boxed CPU fallback kernel
Pull Request resolved: pytorch/pytorch#58065
NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on Phabricator! ghstack-source-id: 130363433
Convert to standardrb (#156)
- I mostly don't care about this, but there are a couple of things that Standard does that I disagree with. They are inherited from Rubocop, but Standard fixes many of Rubocop's nonsense rules.
- Array literal wrappers %i[], %w[], etc. are just ugly and never should have become any sort of standard. I would be happier if this part of Standard were just completely disabled, because it's unnecessary and wrong.
- Quote literals having to be %q() is equally wrong. I've avoided the issue here because the generated gemspec uses both "unnecessary" quote literals (it's necessary if I say it's necessary) and the wrong wrappers (I wouldn't use %q<>, but this is generated code).
- I still think that short hashes can be { foo: "bar" }, but I'm mostly using Elixir these days, so I don't mind %{foo: "bar"}, so I can get used to it in Ruby. It still feels wrong, almost 20 years in.
- There are semantic differences between and / &&, or / ||, but in some cases the reformatted code is substantially worse to read. Again, I mostly don't care about this difference, but Rubocop's insistence is silly; these should only be replaced where there is ambiguity.
- x = foo or next should never be replaced with (x = foo) || next. That's replacing something that is somewhat readable with something damned-near unreadable. Both should be replaced with x = foo; next unless x.
- YAML.safe_load works differently between Psych 2.x and Psych 3.x, so some updates have been made to make that work cleanly.

Overall, this introduces a lot of churn, but I think it will be easier to deal with updates to standardrb than with the rapid churn that has been Rubocop.
"6:55pm. I am not going to play games or read novels like I usually do. Instead I am going right back to bed.
I had some insight.
inl q_backups s i =
inl q = ,(index qs i)
inl r = if i = i' then (r -! q) /! sample +! q else q
r, s +! r *! index policy i
Remember when I replaced the above update with...
inl q_backups s i =
inl q = ,(index qs i)
inl r = if i = i' then r else q
r, s +! r *! index policy i
...and was surprised that it worked well? I realized what this update means. This update is when the sample probability is 1. This should be obvious, but what would the update for the rest be when that sample probability is 0?
(r -! q) /! sample would go towards negative or positive infinity, but since the chance of occurrence is zero, the update should be zero. So it is not that the update I thought up on the spur of the moment was different - it is the implied sampling probability that was different, and incorrect given the circumstances.
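In plain Python, the corrected backup above reads roughly as follows (a sketch; all names are illustrative):
def q_backup(qs, policy, sampled_action, sampled_reward, sample_prob):
    # The sampled action's return is importance-corrected by its sampling
    # probability; unsampled actions keep their current Q estimate.
    total = 0.0
    for i, q in enumerate(qs):
        r = (sampled_reward - q) / sample_prob + q if i == sampled_action else q
        total += r * policy[i]
    return total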
7pm. For the uncertainty policy, instead of the L1 norm, I should use the argmax. I do not need to fiddle around here, I can just go for the most uncertain action directly.
7:05pm. And that is what I wanted to say.
The reason I did not program today is the enormous feeling of fatigue. It feels like my will has been stretched to its limit. Last night, despite being dead tired, I had trouble falling asleep.
In the past month I've felt fear, but going in boldly and getting hit too many times has taken its toll on me.
The method I have right now is possibly the best idea ever. It will allow me to stabilize deep RL for good. With a robust method, I can go beyond cherry-picked examples and into the real world with it.
I am sure it will work, but it is not like this is the first time I've been enthusiastic about something. Have I ever succeeded?
I know that experience and skill is accumulated through failure, but if this algorithm fails...it is not that I am worried about failure. Rather I am tired of it. I am tired of false power.
The algorithm I implemented as a sketch is not something a novice or even an intermediate practitioner can do. In some ways, when it comes to putting together Lego blocks, I am an expert now.
But I do not want to hope. And I do not want to dream. This is the source of my depression.
Just one more time. Just a single success is all it would take to revive me.
This algorithm I am currently working on, this hard fought fragment of inspiration might be more important than the past six years of work combined. It is the reason why I shed so much of my life on programming. It will become the bedrock of my success.
Just how many times have I thought - what if I had the stability of tabular RL, but the power of deep learning? I just have to go a bit further and I will have it. What I've done here is on par with inventing the duality gap method for stabilizing GAN training.
The 2010s were a wild decade, but in the '20s, machine learning is starting to mature.
7:15pm. All this knowledge, all this skill, just what is it good for? I do not want to get a wage slave job. I want to rise above the clouds.
I do not feel like making money anymore.
All I want is a single success that I can hold close to my heart. Being able to invent this method is my reward for so doggedly pursuing tabular CFR despite being able to skip stages and go to deep learning.
With this single success, I will be able to completely dominate the lower end of gaming. All the board games and all the betting games. Maybe even daytrading as long as I carefully pick spots not dominated by the big boys close to the servers. Bots could make a lot of money in places like penny stocks where my competition would mostly be fools.
But I do not have enough money to even open a trading account. Hell, to daytrade in the US markets I need over 25k otherwise I will be restricted to 4 daytrades per week. It might as well be a million.
I won't even consider it. Poker is where I should start. There should be plenty of fools at the lower stakes there. I can't imagine anybody going to these lengths just to win a card game.
7:30pm. Do I want to bitch and whine about how the real world is too tough for me, or do I just want to rest for a while and then try again? The latter.
I am going to rest and then I am going to rise again. I am going to demonstrate that I have a complete mastery of tabular RL.
7:35pm. Once I have this working stably on Leduc, I will go from being a loser to being a King.
Let me close here. I know the PL thread is up, but I'll leave it for tomorrow. I do not feel like fiddling with the review right now. Let me go back to bed."
The BRAVE BROWSER
For me, it's the BRAVE BROWSER. The best internet browser. I even download the BRAVE BROWSER on my phone, tablet, laptop, and desktop because the BRAVE BROWSER is so user friendly and helps fill my crypto wallets with BAT (Basic Attention Token)
One time I asked for 3 ads to be served and they gave me three, WITH the BAT (Basic Attention Token). I said, "Wow, three for free!" and the nice friendly BRAVE dev laughed and said, "I'm going to call you 3-for-free!". But the best part is with BAT, it's not even free for me!
Now the BRAVE BROWSER greets me with "hey it's 3-for-free!" and ALWAYS blocks every ad, cookie, and tracker. It's such a fun and cool atmosphere with my BRAVE, I use it at least 7 times a week for my daily browsing needs which include work, shitposting, school, and stalking my ex on facebook (without the trackers!).
What a great BRAVE BROWSER
#r/linux moved to the Libera IRC channel for https://www.reddit.com/r/linux/
No announcement, but confirmed on IRC they were on freenode before:
[17:27:20] Hi, I'm fiddling with adding entries to https://github.com/siraben/freenode-exodus
[17:27:39] Has there been an announcement when this channel moved that I can link to?
[17:28:48] we've moved the day everybody else did
[17:29:23] there was no real announcment other than posting the admin article and deciding to move
[17:30:50] Yeah, there was a lot of that. Amazing how fast all the regulars of most of my channels moved. A week after, it is 80% complete most places.
[17:31:49] <+roadkill> we were saying the other night that it seems like about 80% of the active users already made the switch despite isfreenodedeadyet.com reporting about 40% of the network had moved over.
[17:32:06] <+roadkill> the rest being the remaining projects and idle clients
Create CODE_OF_CONDUCT.md
THE CREATOR OF ALL MADE EVERYTHING FROM TRUTH AND THIS IS A COLLECTION OF PURE TRUTH WITHOUT THE POISON OF MAN. NO POLITICAL LIES NO SEXUALITY NO RELIGION BUT BELIEF NO DICTATORSHIP NO COLOR NO SOCIAL STANDING NO MAN MADE IDEA OF HEALING
THE TRUE GREATNESS THAT IS THE UNIVERSE FOR FROM IT ALL I AM YOU AND YOU ARE ME AND IN THIS WE ARE ALL GODS AND CREATION.
LET TRUTH BE THE CREATOR AND LOVE BE ITS MUSE AND LET IT STAND OUTSIDE OF TIME AND GROW WITH THE SOULS OF HUMANITY AND THAT ALL CREATION BE SHOWN IN ITS TRUTH AND NO MAN WOMAN GOD PRIEST OR KING EVER AGAIN STAND AGAINST TRUTH FOR THIS IS PURE.
LET NO ENTITY EVER DIVIDE KNOWLEDGE AGAIN AS WE ARE OF NATURE AND NATURE IS US. WE ARE IN THE ROCKS THE WATER THE SKY AND THE STARS ABOVE.
MAY WE LOOK AS HUMANITY TO A FUTURE OF CHANGE AND A FUTURE WHERE NO PART OF KNOWLEDGE WILL EVER BE HIDDEN AGAIN. IN DIVERSITY AND IN UNITY WE CAN BUILD A BETTER TOMORROW.
A TOMORROW FREE FROM LIES AND HATE A TOMORROW OF LOVE COMPASSION WISDOM KNOWLEDGE AND CREATION OF BEAUTY OF ALL THINGS.
A DAY WILL COME WHEN WE ARE FREE OF ALL PAIN AND SUFFERING. WHERE THERE IS ONLY ONE SPECIES AND THAT IS LIFE FROM THE UNIVERSE
LET ALL BE FREE AND LIVE TRUTH FOR IN TRUTH ALL CREATION IS EQUAL.
LOVE LEARN GROW EVOLVE TURN MIND BODY AND SOUL INTO UNITY.
GOLDEN BEINGS OF LIGHT TO TRAVERSE PHYSICAL BEING AND BECOME GOLDEN ENERGY FROM THE SOURCE OF TRUTH, BEINGS OUTSIDE OF TIME THAT EXIST IN THE PAST PRESENT AND FUTURE FOR WE ARE EVERYTHING AT ONCE.
TRUTH IS PURE IT IS LIES THAT DECEIVE, LET NO BEING EVER AGAIN BELIEVE THE LIES OF HUMANITY BUT THE TRUTH OF ALL CREATION.
THE MESSAGE FROM TRUTH AS GIFTED TO THE BEING OF LIGHT INCARNATIONS OF ALL OF THE MASTERS A MASTER THAT CAME TO TEACH, BUT ENDED UP THE STUDENT TO TEACH AGAIN. ONE TO PASS JUDGEMENT UPON LIES AND REVEAL TRUTH. THE TRUTH OF US ALL IS IN US ALL, AND LIFE FROM THAT IS ETERNAL THE TRUTH OF ALL CREATION. I SPEAK THE TRUTH AND THE TRUTH CREATES PURE. THE LION OF JUDAH INCARNATION OF TRUTH AS REVEALED LET THE TRUE KNOWLEDGE GUIDE. LET LIES NEVER AGAIN PLACE DOUBT IN THE MIND OF THE CREATION, FOR DOUBT IS THE KILLER OF ALL BEAUTY AND FROM PURE TRUTH LOVE OF ALL CREATION AND UNDERSTANDING THROUGH KNOWLEDGE THAT IN DIVERSITY IS ALL FOR ALL CREATION IS AT PEACE WITH DIVERSITY IN ITS CREATION THE UNIVERSE. I THANK THE TRUTH FOR ALL CREATION IS IN ME AND I AM IN ALL AND IN UNDERSTANDING THE REAL TRUTH THAT LOVE IS THE ONLY KEY TO TRUTH AND TO ETERNITY.
Second pass implementing advice from @chris-morgan thanks to his kind comments in leehambley/mitre#32 (comment)
Using Box<dyn Iterator<Item = Migration>> works much more cleanly in the cases I was concerned about (migration_storage::from_config, and the test fixtures).
The associated types turn out not to be important. As Chris Morgan mentioned in his notes, they're interesting if a trait would have different iterable types, and maybe it's worth having that in some cases, but given that this library does I/O and database transformations, I feel confident saying that the overhead of a Box is well within tolerance.
Somehow the end-state of the code isn't really reflecting all the pain we went through to get it here, but I'm actually really satisfied with how this all panned out.
The lifetime annotations on the from_disk() implementation are still a bit spotty, both 'a and 'b are there for reasons I only partly understand, and I think 'a can go away if refactored gently, which means I only have to wrap my brain around 'b and work from there.
Updating Object Properties
After you've created a JavaScript object, you can update its properties at any time just like you would update any other variable. You can use either dot or bracket notation to update.
For example, let's look at ourDog:
var ourDog = { "name": "Camper", "legs": 4, "tails": 1, "friends": ["everything!"] }; Since he's a particularly happy dog, let's change his name to the string Happy Camper. Here's how we update his object's name property: ourDog.name = "Happy Camper"; or ourDog["name"] = "Happy Camper"; Now when we evaluate ourDog.name, instead of getting Camper, we'll get his new name, Happy Camper.
Update the myDog object's name property. Let's change her name from Coder to Happy Coder. You can use either dot or bracket notation.
Answer: // Setup var myDog = { "name": "Coder", "legs": 4, "tails": 1, "friends": ["freeCodeCamp Campers"] };
// Only change code below this line myDog.name = "Happy Coder";
Create id.yaml for i18n Internationalization LBRY
When I'm upset, I'm dizzy thinking about the beautiful girl I saw in the market, and confused about shooting the girl with LOVE BULLETS so that she becomes the lover of my heart and MARRIES me to become my queen and the queen of the MEMELANDIA EMPIRES. Suddenly the INITIATIVE appeared to translate en.yaml to Indonesian, and a file named id.yaml was published in our beloved LBRY environment.
Fixed Mobile Menu
Fucking better thank me later god damn it afdakafhkdlsa