-
-
Notifications
You must be signed in to change notification settings - Fork 12.1k
[Frontend] improve error logging of chat completion #22957
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Frontend] improve error logging of chat completion #22957
Conversation
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request improves error logging for chat completions by catching exceptions and returning them in the HTTP response, which is a good improvement for client-side debugging. I've suggested a refinement to handle NotImplementedError specifically, which will provide more accurate HTTP status codes to the client and improve consistency with other API endpoints in the file.
| try: | ||
| generator = await handler.create_chat_completion(request, raw_request) | ||
| except Exception as e: | ||
| raise HTTPException(status_code=HTTPStatus.INTERNAL_SERVER_ERROR.value, | ||
| detail=str(e)) from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While adding a general exception handler is a good step to improve error reporting, it would be more precise to handle specific exceptions like NotImplementedError with a more appropriate HTTP status code. This would align with REST best practices and improve consistency with other endpoints in this file, such as /tokenize, which returns a 501 Not Implemented status for this error. This provides clearer feedback to the client about why the request failed.
try:
generator = await handler.create_chat_completion(request, raw_request)
except NotImplementedError as e:
raise HTTPException(status_code=HTTPStatus.NOT_IMPLEMENTED.value,
detail=str(e)) from e
except Exception as e:
raise HTTPException(status_code=HTTPStatus.INTERNAL_SERVER_ERROR.value,
detail=str(e)) from e|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
| try: | ||
| generator = await handler.create_chat_completion(request, raw_request) | ||
| except Exception as e: | ||
| raise HTTPException(status_code=HTTPStatus.INTERNAL_SERVER_ERROR.value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have a breakdown of status codes instead of mapping everything to HTTP code 500.
The 5xx error codes also indicate that the error is server-blame (i.e. bug in server code) which isn't always the case (e.g. the request itself may have other issues).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if request has any issue, we expect the engine to return an error response without raising an exception. But happy to add if you can list some simple ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. So generator is always expected to return the next message which may include the error response, instead of raising any exception? If that's the case, then yeah, code 500 would be appropriate here.
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
My only concern is that this may log prompts/responses, so in sensitive deployments with |
|
What does "sensitive deployment" mean? Sensitive to speed or not give sensitive output to users? I thought |
|
merge with main branch to include the entrypoint CI failure fix from main branch |
|
I meant where prompts contain sensitive data that should not be logged. We administrate such deployments at work where shouldn't be able to know what our customers are prompting |
|
This PR doesn't add new logs to server side. The error information is at client side. |
Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Xiao Yu <xiao.yu@amd.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Purpose
Let the user to see more errors from chat completion API
Test Plan
Run a request that will raise error
Test Result
without this pr, client will see
with this pr, client will see
(Optional) Documentation Update
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.