-
Notifications
You must be signed in to change notification settings - Fork 834
Add a new DAE pass with a fixed point analysis #8217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
This replaces #8085 because so much has changed in the meantime. |
DAE can be slow because it performs several rounds of interleaved analysis and optimization. On top of this, the analysis it performs is not as precise as it could be because it never removes parameters from referenced functions and it cannot optimize unused parameters or results that are forwarded through recursive cycles. Start improving both the performance and the power of DAE by creating a new pass, called DAE2 for now. DAE2 performs a single parallel walk of the module to collect information with which it performs a fixed point analysis to find unused parameters, then does a single parallel walk of the module to optimize based on this analysis.
|
There is a lot of test code here - is it copied from the existing tests? If so, can we use another lit run line for it? If that isn't practical yet because of differences, I still wonder if we can reuse the existing tests somehow, even if the two passes have differing output in some cases (can use a different check prefix for those tests, maybe move them out to another file, etc.). On another topic, what is our overall plan here? Do you want to land this even before we know if we will be replacing the old pass? |
|
The tests here are all new. We could additionally run the existing DAE tests, but we wouldn't be able to remove many of the new tests that way. |
|
As far as the overall plan goes, I'd like to have this landed so we can consider making it more powerful iteratively. My hope is that if it does more work DAE currently does, then we'll start seeing better speedups. We could also use it immediately to replace the unprofitable bailout from DAE, but that's probably not worth it. |
| // forwarded on to other function calls. These forwarded parameters can still be | ||
| // optimized out as long as they are unused in the callees they are forwarded | ||
| // to. Since we perform a fixed point analysis, cycles of forwarded parameters | ||
| // can still be removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth noting that "forwarded" is not just "literally the value is forwarded" but also transformations of the value.
| using Params = std::vector<Used::Element>; | ||
|
|
||
| // Function index and parameter index. | ||
| using ParamLoc = std::pair<Index, Index>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about FuncParamLoc, to match TypeParamLoc below?
|
|
||
| // A set of (source, destination) index pairs for parameters of a caller | ||
| // function being forwarded as arguments to a callee function. | ||
| using ForwardedParamSet = std::unordered_set<std::pair<Index, Index>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| using ForwardedParamSet = std::unordered_set<std::pair<Index, Index>>; | |
| using ForwardedParamSet = std::unordered_set<ParamLoc>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be slightly deceptive because both Indexes here are parameter indices. In particular the first one is not a function index.
| #define TIME(...) | ||
| #endif // TIME_DAE | ||
|
|
||
| // Find the root of the subtyping hierarchy for a given HeapType. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Find the root of the subtyping hierarchy for a given HeapType. | |
| // Find the non-basic root of the subtyping hierarchy for a given HeapType. |
| // Unreferenced functions can be optimized separately from referenced | ||
| // functions with the same type. For unreferenced functions in that situation, | ||
| // this is the new type that should be applied before global type rewriting to | ||
| // prevent the function from getting the wrong optimizations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow this. If a function is unreferenced, we track that in referenced, and in that case, we can just apply all the stuff we find, without caring about syncing it up with functions with a similar type. Why do we need to set a particular "replacement type" here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use a GlobalTypeRewriter to rewrite optimized function types for indirect calls globally. GlobalTypeRewriter does not provide any way to skip updating the types of non-referenced functions, so we have to arrange for the GlobalTypeRewriter to rewrite the types of non-referenced functions to the desired signatures as well. When different functions with the same original type should be optimized to have different types, we need to ensure those functions have different types before global type rewriting, otherwise it would not be possible for them to have different types after global type rewriting.
DAE can be slow because it performs several rounds of interleaved
analysis and optimization. On top of this, the analysis it performs is
not as precise as it could be because it never removes parameters from
referenced functions and it cannot optimize unused parameters or results
that are forwarded through recursive cycles.
Start improving both the performance and the power of DAE by creating a
new pass, called DAE2 for now. DAE2 performs a single parallel walk of
the module to collect information with which it performs a fixed point
analysis to find unused parameters, then does a single parallel walk of
the module to optimize based on this analysis.