@vkuzo vkuzo commented Aug 13, 2020

Stack from ghstack:

Summary:

In preparation for observer perf improvement, cleans up the
micro benchmarks:

  • disable CUDA for histogram observers (it's too slow)
  • add larger shapes for better representation of real workloads

Test Plan:

cd benchmarks/operator_benchmark
python -m pt.qobserver_test

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: D23093996

dr-ci bot commented Aug 13, 2020

💊 CI failures summary and remediations

As of commit d96e549 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



vkuzo added a commit that referenced this pull request Aug 16, 2020
ghstack-source-id: 6047570
Pull Request resolved: #42956

     def forward(self):
-        return self.op_func(self.f_input)
+        self.op_func(self.f_input)
Contributor

@raghuramank100 raghuramank100 Aug 17, 2020

Previously we had separate forward and qparam benchmarks, which might be more useful in practice. We call forward for multiple iterations and calculate_qparams once at convert. With the separate benchmarks, we can also synthesize the time taken for the combined forward + calculate_qparams call. Is there a reason to prefer this way of profiling?

Contributor Author

This makes the benchmark represent what happens inside the observer during QAT. I'm not keeping the old code around because I'm not aware of a need for it in the near future. We have separate benchmarks for histogram observers, and I'm not aware of any requests to optimize observers outside of QAT + histogram observers.

Contributor Author

calculate_qparams is called on every pass through the observer during QAT, whenever observers are enabled.
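
For readers following the thread, here is a minimal sketch of the two profiling styles being compared. It is not the code from this PR or from `pt.qobserver_test`: `ToyMinMaxObserver`, `qat_step`, and `synthesized_time` are hypothetical names, and the observer is a pure-Python stand-in rather than a PyTorch class. It times `forward` and `calculate_qparams` separately, times the combined per-pass call, and shows how separate measurements could be combined into an estimate for either usage pattern.

```python
import timeit


class ToyMinMaxObserver:
    """Hypothetical stand-in for a min/max observer; not the PyTorch class."""

    def __init__(self, quant_min=0, quant_max=255):
        self.min_val = float("inf")
        self.max_val = float("-inf")
        self.quant_min = quant_min
        self.quant_max = quant_max

    def forward(self, values):
        # Track the running min and max of everything observed so far.
        self.min_val = min(self.min_val, min(values))
        self.max_val = max(self.max_val, max(values))
        return values

    def calculate_qparams(self):
        # Affine quantization: extend the range to include zero, then
        # derive scale and zero point from the observed range.
        lo = min(self.min_val, 0.0)
        hi = max(self.max_val, 0.0)
        scale = (hi - lo) / (self.quant_max - self.quant_min)
        if scale == 0.0:
            scale = 1.0  # degenerate range; avoid division by zero
        zero_point = self.quant_min - round(lo / scale)
        zero_point = max(self.quant_min, min(self.quant_max, zero_point))
        return scale, zero_point


def qat_step(observer, values):
    # QAT-style pass: observe, then recompute qparams on the same pass.
    observer.forward(values)
    return observer.calculate_qparams()


def synthesized_time(t_forward, t_qparams, n_forward, n_qparams):
    # Estimate total cost from separately measured per-call times.
    return n_forward * t_forward + n_qparams * t_qparams


if __name__ == "__main__":
    data = [i / 100.0 - 5.0 for i in range(1000)]
    obs = ToyMinMaxObserver()
    n = 200
    t_fwd = timeit.timeit(lambda: obs.forward(data), number=n) / n
    t_qp = timeit.timeit(obs.calculate_qparams, number=n) / n
    t_qat = timeit.timeit(lambda: qat_step(obs, data), number=n) / n
    print(f"forward only:      {t_fwd:.2e} s/call")
    print(f"calculate_qparams: {t_qp:.2e} s/call")
    print(f"combined QAT step: {t_qat:.2e} s/call")
    # Calibration-style usage: many forwards, one calculate_qparams.
    print(f"synthesized PTQ total: {synthesized_time(t_fwd, t_qp, n, 1):.2e} s")
    # QAT-style usage: one calculate_qparams per forward.
    print(f"synthesized QAT total: {synthesized_time(t_fwd, t_qp, n, n):.2e} s")
```

The timings from a toy like this say nothing about the real observers; the point is only that a combined benchmark measures the QAT pattern directly, while separate benchmarks allow either pattern to be estimated by weighting the per-call times.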

@facebook-github-bot

This pull request has been merged in 5aa61af.

@facebook-github-bot facebook-github-bot deleted the gh/vkuzo/122/head branch August 21, 2020 14:16
6 participants