Commit 1c4a0ee

Update windows related documentation (#59)
1 parent 26f797c commit 1c4a0ee

2 files changed (+37, -30 lines)

windows/README.md

Lines changed: 36 additions & 30 deletions
@@ -10,40 +10,42 @@
  - [Installation](#installation)
  - [Extra Steps for C++ Runtime Usage](#extra-steps-for-c-runtime-usage)
  - [Next Steps](#next-steps)
+ - [Limitations](#limitations)

## Overview

- TensorRT-LLM is supported on bare-metal Windows for single-GPU inference. We provide a release wheel for Windows which can be downloaded from https://developer.nvidia.com/. Alternatively, you may build TensorRT-LLM for Windows from source. Building from source is an advanced option and is not necessary for building or running LLM engines. It is, however, required if you plan to use the C++ runtime directly or run C++ benchmarks.
+ TensorRT-LLM is supported on bare-metal Windows for single-GPU inference. The release supports GeForce 40-series GPUs.
+
+ The release wheel for Windows can be installed with `pip`. Alternatively, you may build TensorRT-LLM for Windows from source. Building from source is an advanced option and is not necessary for building or running LLM engines. It is, however, required if you plan to use the C++ runtime directly or run C++ benchmarks.

## Quick Start

If you encounter difficulties with any prerequisites, check the [Detailed Setup](#detailed-setup) instructions below.

Prerequisites:
- - [Python3 >= 3.9](https://www.python.org/downloads/windows/)
- - [CUDA 12.2 Toolkit](https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64)
+ - [Python 3.10](https://www.python.org/downloads/windows/)
+ - [CUDA 12.2 Toolkit](https://developer.nvidia.com/cuda-12-2-2-download-archive?target_os=Windows&target_arch=x86_64)
  - [Microsoft MPI](https://www.microsoft.com/en-us/download/details.aspx?id=57467)
  - [cuDNN](https://developer.nvidia.com/cudnn)
- - [TensorRT 9.1.0.4](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#installing-zip)
+ - [TensorRT 9.1.0.4 for TensorRT-LLM](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/secure/9.1.0/tars/tensorrt-9.1.0.4.windows10.x86_64.cuda-12.2.llm.beta.zip)

```
- pip install -r .\requirements-windows.txt
- pip install tensorrt_llm-<version>-py3-none-any.whl
+ pip install tensorrt_llm --extra-index-url https://pypi.nvidia.com --extra-index-url https://download.pytorch.org/whl/nightly/cu121
```

## Detailed Setup

### Python
- Install [Python3 >= 3.9](https://www.python.org/downloads/windows/). When installing, add to the system `Path` and click "Disable path length limit." The installation may only add the `python` command, but not the `python3` command. Navigate to the installation path, `C:\Users\<username>\AppData\Local\Programs\Python\Python39` (note `AppData` is a hidden folder), and copy `python.exe` to `python3.exe`.
+ Install [Python 3.10](https://www.python.org/downloads/windows/). Select "Add python.exe to PATH" at the start of the installation. The installation may only add the `python` command, but not the `python3` command. Navigate to the installation path, `%USERPROFILE%\AppData\Local\Programs\Python\Python310` (note `AppData` is a hidden folder), and copy `python.exe` to `python3.exe`.

### CUDA
- Install the [CUDA 12.2 Toolkit](https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64). You may use the Express Installation option. Installation may require a restart.
+ Install the [CUDA 12.2 Toolkit](https://developer.nvidia.com/cuda-12-2-2-download-archive?target_os=Windows&target_arch=x86_64). You may use the Express Installation option. Installation may require a restart.

### Microsoft MPI
Download and install [Microsoft MPI](https://www.microsoft.com/en-us/download/details.aspx?id=57467). You will be prompted to choose between an `exe`, which installs the MPI executable, and an `msi`, which installs the MPI SDK. Download and install both.

### TensorRT-LLM Repo
- It may be useful to create a single folder for holding TensorRT-LLM and its dependencies, such as `C:\Users\<username>\inference\`. We will assume this directory structure in further steps.
+ It may be useful to create a single folder for holding TensorRT-LLM and its dependencies, such as `%USERPROFILE%\inference\`. We will assume this directory structure in further steps.

Install [Git for Windows](https://git-scm.com/download/win).
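Note: the updated Python step still asks you to copy `python.exe` to `python3.exe` by hand. A minimal Powershell sketch of that step, assuming the default per-user install location named in the diff:

```
# Assumes the per-user Python 3.10 location from the README; adjust if yours differs.
$py = "$env:USERPROFILE\AppData\Local\Programs\Python\Python310"
Copy-Item "$py\python.exe" "$py\python3.exe"
```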

@@ -56,24 +58,23 @@ git submodule update --init --recursive
### cuDNN and TensorRT

- Download and unzip [TensorRT 9.1.0.4](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#installing-zip). Move the folder to a location you can reference later, such as `C:\Users\<username>\inference\TensorRT`.
+ Download and unzip [TensorRT 9.1.0.4 for TensorRT-LLM](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/secure/9.1.0/tars/tensorrt-9.1.0.4.windows10.x86_64.cuda-12.2.llm.beta.zip). Move the folder to a location you can reference later, such as `%USERPROFILE%\inference\TensorRT`.

- Download and unzip [cuDNN](https://developer.nvidia.com/cudnn). Move the folder to a location you can reference later, such as `C:\Users\<username>\inference\cuDNN`.
+ Download and unzip [cuDNN](https://developer.nvidia.com/cudnn). Move the folder to a location you can reference later, such as `%USERPROFILE%\inference\cuDNN`.

- You'll need to add libraries and binaries for TensorRT and cuDNN to your system's `Path` environment variable. To do so, click the Windows button and search for "environment variables." Select "Edit the system environment variables." A "System Properties" window will open. Select the "Environment Variables" button at the bottom right, then in the new window under "System variables" click "Path" then the "Edit" button. Add "New" lines for the `bin` and `lib` dirs of both TensorRT and cuDNN. Your `Path` should include lines like this:
+ You'll need to add libraries and binaries for TensorRT and cuDNN to your system's `Path` environment variable. To do so, click the Windows button and search for "environment variables." Select "Edit the system environment variables." A "System Properties" window will open. Select the "Environment Variables" button at the bottom right, then in the new window under "System variables" click "Path" then the "Edit" button. Add "New" lines for the `lib` dir of TensorRT and for the `bin` and `lib` dirs of cuDNN. Your `Path` should include lines like this:

```
- C:\Users\<username>\inference\TensorRT\bin
- C:\Users\<username>\inference\TensorRT\lib
- C:\Users\<username>\inference\cuDNN\bin
- C:\Users\<username>\inference\cuDNN\lib
+ %USERPROFILE%\inference\TensorRT\lib
+ %USERPROFILE%\inference\cuDNN\bin
+ %USERPROFILE%\inference\cuDNN\lib
```

Click "OK" on all the open dialogue windows. Be sure to close and re-open any existing Powershell or Git Bash windows so they pick up the new `Path`.

Now, to install the TensorRT core libraries, run Powershell and use `pip` to install the Python wheel:
```
- pip install C:\Users\<username>\inference\TensorRT\python\tensorrt-9.1.0.post12.dev4-cp39-none-win_amd64.whl
+ pip install %USERPROFILE%\inference\TensorRT\python\tensorrt-9.1.0.post12.dev4-cp310-none-win_amd64.whl
```

You may run the following command to verify that your TensorRT installation is working properly:
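Note: the verification command referenced on the last context line sits outside this hunk. One plausible check (an assumption, not necessarily the README's own command) after a session-level `Path` update:

```
# Session-only Path update; the "Environment Variables" GUI steps above make it permanent.
$env:Path += ";$env:USERPROFILE\inference\TensorRT\lib;$env:USERPROFILE\inference\cuDNN\bin;$env:USERPROFILE\inference\cuDNN\lib"
python -c "import tensorrt; print(tensorrt.__version__)"
```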
@@ -91,21 +92,22 @@ Install [CMake](https://cmake.org/download/) and select the option to add it to
Download and install [Visual Studio 2022](https://visualstudio.microsoft.com/). When prompted to select more Workloads, check "Desktop development with C++."

- TensorRT-LLM on Windows currently depends on NVTX assets that do not come packaged with the CUDA 12.2 installer. To install these assets, download the [CUDA 11.8 Toolkit](https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Windows&target_arch=x86_64). During installation, select "Advanced installation." Nsight NVTX is located in the CUDA drop down. Deselect *all* packages, and select Nsight NVTX.
+ TensorRT-LLM on Windows currently depends on NVTX assets that do not come packaged with the CUDA 12.2 installer. To install these assets, download the [CUDA 11.8 Toolkit](https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Windows&target_arch=x86_64). During installation, select "Advanced installation." Nsight NVTX is located in the CUDA drop down. Deselect all packages, and then select Nsight NVTX.

## Building from Source

*Advanced. Skip this section if you plan to use the pre-built TensorRT-LLM release wheel.*

In Powershell, from the `TensorRT-LLM` root folder, run:
```
- python .\scripts\build_wheel.py -a <architecture> --trt_root <path_to_trt_root> --build_type Release -D "ENABLE_MULTI_DEVICE=0"
+ python .\scripts\build_wheel.py -a "89-real" --trt_root <path_to_trt_root> --build_type Release -D "ENABLE_MULTI_DEVICE=0"
```
- `<architecture>` should correspond to the architecture or list of architectures you wish to support, e.g. `"86-real;89-real"` to support GeForce 30-series and 40-series cards.

The `-D "ENABLE_MULTI_DEVICE=0"` is required on Windows. Multi-device inference is supported on Linux, but not on Windows.

- The above command will generate `build\tensorrt_llm-<version>-py3-none-any.whl`. Other generated files include:
+ The `-a` flag specifies the device architecture. `"89-real"` supports GeForce 40-series cards.
+
+ The above command will generate `build\tensorrt_llm-0.5.0-py3-none-any.whl`. Other generated files include:

  - `build\` - Contains the wheel and other built artifacts
  - `cpp\build\` - Contains cpp-related build files
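Note: the diff drops the general `<architecture>` placeholder, but the removed line confirms `-a` still accepts a semicolon-separated list. A sketch of a multi-architecture build, assuming the `%USERPROFILE%\inference\TensorRT` location from Detailed Setup:

```
# "86-real;89-real" covers GeForce 30-series and 40-series cards (per the removed line).
python .\scripts\build_wheel.py -a "86-real;89-real" --trt_root "$env:USERPROFILE\inference\TensorRT" --build_type Release -D "ENABLE_MULTI_DEVICE=0"
```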
@@ -114,10 +116,14 @@ The above command will generate `build\tensorrt_llm-<version>-py3-none-any.whl`.
## Installation

- In Powershell, from the root of this repo, run:
+ To download and install the wheel, in Powershell, run:
+ ```
+ pip install tensorrt_llm --extra-index-url https://pypi.nvidia.com --extra-index-url https://download.pytorch.org/whl/nightly/cu121
+ ```
+
+ Alternatively, if you built the wheel from source, run:
```
- pip install -r .\requirements-windows.txt
- pip install .\build\tensorrt_llm-<version>-py3-none-any.whl
+ pip install .\build\tensorrt_llm-0.5.0-py3-none-any.whl
```

You may run the following command to verify that your TensorRT-LLM installation is working properly:
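Note: as with TensorRT above, the actual verification command lies outside the hunk. A plausible import check, offered as an assumption rather than as the README's command:

```
# Fails fast if the wheel or its DLL dependencies are missing.
python -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```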
@@ -147,17 +153,17 @@ Building from source will produce the following library files:
  - `th_common.exp`
  - `th_common.lib`

- The locations of the DLLs, in addition to some `torch` DLLs, must be added to the Windows `Path` in order to use the TensorRT-LLM C++ runtime. As in [Setup](#setup), append the locations of these libraries to your `Path`. When complete, your `Path` should include lines similar to these:
+ The locations of the DLLs, in addition to some `torch` DLLs, must be added to the Windows `Path` in order to use the TensorRT-LLM C++ runtime. As in [Detailed Setup](#detailed-setup), append the locations of these libraries to your `Path`. When complete, your `Path` should include lines similar to these:

```
- C:\Users\<username>\inference\TensorRT-LLM\cpp\build\tensorrt_llm\Release
- C:\Users\<username>\AppData\Local\Programs\Python\Python39\Lib\site-packages\tensorrt_llm\libs
- C:\Users\<username>\AppData\Local\Programs\Python\Python39\Lib\site-packages\torch\lib
+ %USERPROFILE%\inference\TensorRT-LLM\cpp\build\tensorrt_llm\Release
+ %USERPROFILE%\AppData\Local\Programs\Python\Python310\Lib\site-packages\tensorrt_llm\libs
+ %USERPROFILE%\AppData\Local\Programs\Python\Python310\Lib\site-packages\torch\lib
```

For examples of how to use the C++ runtime, see the unit tests in
- [gptSessionTest.cpp](cpp/tests/runtime/gptSessionTest.cpp) and the related
- [CMakeLists.txt](cpp/tests/CMakeLists.txt) file.
+ [gptSessionTest.cpp](../cpp/tests/runtime/gptSessionTest.cpp) and the related
+ [CMakeLists.txt](../cpp/tests/CMakeLists.txt) file.

## Next Steps
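Note: the `site-packages` paths above are version-specific. A hedged sketch that resolves them from the interpreter instead of hard-coding `Python310`:

```
# Prints the torch and tensorrt_llm library dirs that belong on Path.
python -c "import torch, os; print(os.path.join(os.path.dirname(torch.__file__), 'lib'))"
python -c "import tensorrt_llm, os; print(os.path.join(os.path.dirname(tensorrt_llm.__file__), 'libs'))"
```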

windows/examples/llama/README.md

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ The TensorRT-LLM LLaMA example code is located in [`examples/llama`](../../../ex
Rather, here we showcase how to run a quick benchmark using the provided `benchmark.py` script. This script builds, runs, and benchmarks an INT4-GPTQ quantized LLaMA model using TensorRT.

```bash
+ pip install pydantic pynvml
python benchmark.py --model_dir .\tmp\llama\7B\ --quant_ckpt_path .\llama-7b-4bit-gs128.safetensors --engine_dir .\engines
```
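Note: `pydantic` and `pynvml` are ordinary PyPI packages. A quick GPU-visibility check with `pynvml` (illustrative, not part of the example README):

```bash
python -c "import pynvml; pynvml.nvmlInit(); h = pynvml.nvmlDeviceGetHandleByIndex(0); print(pynvml.nvmlDeviceGetName(h))"
```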
