quant bench: update observer configs #42956
Summary:
In preparation for observer perf improvement, cleans up the micro benchmarks:
* disable CUDA for histogram observers (it's too slow)
* add larger shapes for better representation of real workloads

Test Plan:
```
cd benchmarks/operator_benchmark
python -m pt.qobserver_test
```
ghstack-source-id: 6047570
Pull Request resolved: #42956
def forward(self):
    return self.op_func(self.f_input)
    self.op_func(self.f_input)
Previously we had separate forward and qparams benchmarks, which might be more useful in practice: we call forward for multiple iterations and calculate_qparams once, at convert. With the separate benchmarks, we can also synthesize the time taken for a combined forward + calculate_qparams call. Is there a reason to prefer this way of profiling?
This makes the benchmark represent what happens inside the observer during QAT. I'm not keeping the old code around because I'm not aware of a need for it in the near future. We have separate benchmarks for histogram observers, and I'm not aware of any requests to optimize observers outside of QAT and histogram observers.
calculate_qparams is called on every pass through the observer during QAT, when observers are enabled.
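For readers following the thread, here is a rough sketch of the two profiling styles being compared: timing forward and calculate_qparams separately, versus timing the combined call that a QAT pass makes. `ToyMinMaxObserver` and `qat_style_step` are hypothetical stand-ins written for this illustration, not the PyTorch observer classes or the operator_benchmark harness, and the qparams math is a simplified affine min/max scheme assumed for the example.

```python
import timeit


class ToyMinMaxObserver:
    """Hypothetical stand-in for a min/max observer; not the PyTorch class."""

    def __init__(self, quant_min=0, quant_max=255):
        self.quant_min = quant_min
        self.quant_max = quant_max
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def forward(self, values):
        # Update the running min/max statistics from one batch.
        self.min_val = min(self.min_val, min(values))
        self.max_val = max(self.max_val, max(values))
        return values

    def calculate_qparams(self):
        # Simplified affine qparams; the range is widened to include zero.
        lo = min(self.min_val, 0.0)
        hi = max(self.max_val, 0.0)
        scale = max((hi - lo) / (self.quant_max - self.quant_min), 1e-8)
        zero_point = self.quant_min - round(lo / scale)
        zero_point = max(self.quant_min, min(self.quant_max, zero_point))
        return scale, zero_point


def qat_style_step(observer, values):
    # What the thread describes for QAT: observe, then compute qparams,
    # on every pass through the observer.
    observer.forward(values)
    return observer.calculate_qparams()


data = [i / 100.0 - 1.0 for i in range(512)]
obs = ToyMinMaxObserver()

# Separate timings (the older benchmark style) and the combined timing.
t_forward = timeit.timeit(lambda: obs.forward(data), number=200)
t_qparams = timeit.timeit(obs.calculate_qparams, number=200)
t_combined = timeit.timeit(lambda: qat_style_step(obs, data), number=200)

print(f"forward only:      {t_forward:.6f}s")
print(f"calculate_qparams: {t_qparams:.6f}s")
print(f"combined (QAT):    {t_combined:.6f}s")
```

In this toy setup the combined time should land close to the sum of the two separate times, which is the synthesis the reviewer mentions; the combined form simply measures the QAT path directly.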
This pull request has been merged in 5aa61af.
Differential Revision: D23093996