
Conversation

@supriyar (Contributor) commented Jul 17, 2020

Stack from ghstack:

Summary:
This change adds preliminary support for quantizing the EmbeddingBag operators. We currently support 4-bit and 8-bit quantization and packing of the weights.

To quantize these operators, specify the operator name in the `custom_op_name` field of the `NoopObserver`. Based on the op name (4-bit or 8-bit) we call the corresponding quantization function.
Refer to the test plan for how to invoke the qconfig for the embedding_bag ops.

Future versions will support 4-bit and 2-bit qtensors with native support for observing and quantizing them.

NB: this version assumes that the weights in the EmbeddingBag module reside on the same device.
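The 4-bit weight packing mentioned above can be illustrated with a hand-rolled sketch. This is not the kernel the real prepack op uses (which also stores per-row scale and zero point), and the nibble order here is an assumption; it only shows the core idea of fitting two 4-bit codes into each byte.

```python
def pack_4bit(codes):
    """Pack 4-bit integer codes (0..15), two per byte (low nibble first)."""
    assert all(0 <= c < 16 for c in codes)
    if len(codes) % 2:
        codes = codes + [0]  # pad to an even count so bytes line up
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))

def unpack_4bit(packed, n):
    """Inverse of pack_4bit; n is the original number of codes."""
    out = []
    for b in packed:
        out.extend((b & 0xF, b >> 4))
    return out[:n]
```

Round-tripping a row of codes through these two helpers recovers the original values, at half the storage of one byte per code.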

Test Plan:
python test/test_quantization.py TestQuantizeDynamicJitOps.test_embedding_bag
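The `custom_op_name`-based dispatch described in the summary can be sketched in pure Python. The mapping below is illustrative only: the real selection happens inside the JIT quantization pass, and the `"embedding_bag_byte"` key and the `quantized::` op names are assumptions made for the sketch.

```python
# Illustrative dispatch: pick a prepack op from the observer's custom_op_name.
# The op names in this table are assumptions, not the authoritative registry.
PREPACK_OPS = {
    "embedding_bag_4bit": "quantized::embedding_bag_4bit_prepack",
    "embedding_bag_byte": "quantized::embedding_bag_byte_prepack",
}

def prepack_op_for(custom_op_name):
    """Return the prepack op name for a supported embedding_bag variant."""
    try:
        return PREPACK_OPS[custom_op_name]
    except KeyError:
        raise ValueError("unsupported embedding_bag op: %s" % custom_op_name)
```

Any observer whose `custom_op_name` is not in the table is rejected, mirroring the fact that only the 4-bit and 8-bit variants are supported in this change.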


Differential Revision: D22609342

@supriyar requested a review from @apaszke as a code owner July 17, 2020 22:34
@facebook-github-bot added the `oncall: jit` label Jul 17, 2020

dr-ci bot commented Jul 17, 2020

💊 CI failures summary and remediations

As of commit 3210bd9 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

Extra GitHub checks: 1 failed



supriyar added a commit that referenced this pull request Jul 18, 2020
ghstack-source-id: 5d8a834
Pull Request resolved: #41612
@vkuzo (Contributor) left a comment:

lgtm, accepting to unblock. Feel free to wait for @jerryzh168 if a deeper review on the JIT pass is needed.

offsets = torch.tensor([0, 19, 20, 28, 28, 32])

from torch.quantization import QConfigDynamic, NoopObserver
int4_dynamic_qconfig = QConfigDynamic(activation=NoopObserver.with_args(custom_op_name="embedding_bag_4bit"),
Contributor:

makes sense that custom_op_name is a temporary solution. Do we have thoughts on what to replace it with eventually / why not now? Would it be EmbeddingBag{8|4|2}BitObserver / something else?

Contributor:

what's the longer term solution? are we planning to add torch.qint4, torch.qint2 etc.?

@supriyar (author):

Yes, long term this will be replaced with an observer that can support torch.qint4 and torch.qint2.

auto observer_module = module.attr(findObserverName(v).value()).toModule();
if (observer_module.hasattr("custom_op")) {
auto op_name = observer_module.attr("custom_op").toStringRef();
return isNoopObserver(observer) ? op_name : "";
Contributor:

follow up PR: since NoopObserver is special, probably better to add a "_" prefix to reserve this for internal use.

}
// Insert prepack op
Node* prepack = g->create(Symbol::fromQualString(prepack_fn), prepack_inputs);
g->insertNode(prepack);
Contributor:

could you also add a WithInsertPoint? we want to insert before the use node I think.

Contributor:

should just be: WithInsertPoint ins(embedding_bag_float_op);

@supriyar (author):

We've already added the insert point in insertQuantizationOps at the output of the observer node.

Comment on lines 856 to 860
for (const Use& use : uses) {
if (matchCallFuncToUse(use, "embedding_bag", 2)) {
embedding_bag_float_op = use.user;
}
}
@jerryzh168 (Contributor) commented Jul 22, 2020:

what are the possible cases? is the observer_out always going to be used by an embedding_bag op here?

supriyar added a commit that referenced this pull request Jul 22, 2020
ghstack-source-id: 845f2a0
Pull Request resolved: #41612
supriyar added a commit that referenced this pull request Jul 23, 2020
ghstack-source-id: 76a8c2a
Pull Request resolved: #41612
inputs.push_back(g->insertGetAttr(self, qparam_name));
// Temporary solution to quantize embedding_bag operators.
auto embedding_bag_name = getEmbeddingBagObsName(module, observer);
if (quant_type == QuantType::DYNAMIC && embedding_bag_name &&
Contributor:

can you merge this branch with the one on line 399 now?

@supriyar (author):

I prefer keeping it separate since this is a special case and this code will be removed in the future. Wanted to make that a little more obvious :)

Contributor:

might also be good to have a isEmbeddingBagOp function to be consistent with isFP16NoopObserver

Contributor:

I see, sure. Do you mean we plan to remove the swapping of input and weight in the embedding bag module in the future? I think this will be needed in graph mode if that does not change.

Comment on lines +389 to +390
observer_out->replaceAllUsesWith(original_val);
original_val->replaceAllUsesAfterNodeWith(dequant, dequant->output());
Contributor:

these two lines are the same as 417-419

@jerryzh168 (Contributor) left a comment:

Looks good, thanks! Had a few more inline comments.

}

// find the observer for Value `v` and return the name of the observer
c10::optional<std::string> findObserverName(Value* v) {
Contributor:

btw we can check for types now I think

supriyar added a commit that referenced this pull request Jul 23, 2020
ghstack-source-id: 7928558
Pull Request resolved: #41612
@facebook-github-bot

This pull request has been merged in 36fb14b.

@facebook-github-bot deleted the gh/supriyar/150/head branch July 27, 2020 14:18

Labels: Merged, `oncall: jit`

6 participants