[pytorch][torchelastic] Duplicate stdout and stderr and apply custom filter in torchrun#160712
cnphil wants to merge 1 commit into pytorch:main
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/160712
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 280aefc with merge base 05b2e02. This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D80188995
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
ff7ebc4 to 591b883
@fduwjj Exported new changes from Phabricator, PTAL :)
…filter in torchrun (pytorch#160712)

Summary:
Part of an effort to extract some important error logs (e.g. [pytorch#157996](pytorch#157996)) that were `tee`'d to `stdout` and `stderr`.

The general idea is to:
- Duplicate the `tee`s on `stdout` and `stderr` to separate files, `filtered_stdout.log` and `filtered_stderr.log`, respectively.
- In these files, as their names suggest, log only the lines matching a customizable filter.
- Later on, in another PR, append the contents of these files to the reply file.

Outline of changes in this PR:
- Enhance `TailLog` so that it can 1) stream to a file, and 2) write a line only when it matches the passed filter.
- Add `filtered_stdout` and `filtered_stderr` to `LogsDest` and have `LogsSpecs` `reify` them.
- In `start_processes()` and `PContext`, add params `duplicate_stdout_filters` and `duplicate_stderr_filters` to filter and write the duplicated stream to the files above. When no filters are passed in, no duplicated streams are created.

Test Plan:
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/elastic/multiprocessing:api_test
```
```
Buck UI: https://www.internalfb.com/buck2/f5c6b7da-217d-4a0b-872a-c7cd3d05587f
Test UI: https://www.internalfb.com/intern/testinfra/testrun/4222124951617688
Network: Up: 398B Down: 44MiB (reSessionID-a489a961-b602-45be-b851-3490ebb7a26a)
Analyzing targets. Remaining 0/200
Executing actions. Remaining 0/12856 0.1s exec time total
Command: test. Finished 1 local
Time elapsed: 17:37.9s
Tests finished: Pass 52. Fail 0. Fatal 0. Skip 0. Build failure 0
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/elastic/multiprocessing:tail_log_test
```
```
Buck UI: https://www.internalfb.com/buck2/d6d5c1c1-db98-4d9c-b608-7ba6fbb5e3ee
Test UI: https://www.internalfb.com/intern/testinfra/testrun/13510798985149262
Network: Up: 94KiB Down: 417MiB (reSessionID-27b46fba-d31c-4c04-8ede-a506454e6922)
Analyzing targets. Remaining 0/3 536 actions, 555 artifacts declared
Executing actions. Remaining 0/186 1:05.5s exec time total
Command: test. Finished 7 local, 1 remote, 115 cache (93% hit) 37.0s exec time cached (56%)
Time elapsed: 1:11.5s
Tests finished: Pass 7. Fail 0. Fatal 0. Skip 0. Build failure 0
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/elastic/agent/server/test:api_test
```
```
Buck UI: https://www.internalfb.com/buck2/34f426fd-25a0-4cf5-8da3-2f3d84767d1e
Test UI: https://www.internalfb.com/intern/testinfra/testrun/14918173871977118
Network: Up: 1.0MiB Down: 2.9GiB (reSessionID-048daa50-9ad4-4826-886f-08cec54c7d72)
Analyzing targets. Remaining 0/5 533 actions, 552 artifacts declared
Executing actions. Remaining 0/176 1:22.7s exec time total
Command: test. Finished 51 local, 13 remote, 50 cache (44% hit) 19.8s exec time cached (23%)
Time elapsed: 1:45.2s
Tests finished: Pass 31. Fail 0. Fatal 0. Skip 0. Build failure 0
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/elastic/agent/server/test:local_agent_test [DISABLED]
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/elastic/agent/server/test/fb:local_agent_fb_internal_test [DISABLED]
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher:api_test
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher:launch_test
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher:test_run
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher/fb:api_test
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher/fb:local_launch_mast_test
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher/fb:fb_run_test
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher/fb:launch_test
```

Reviewed By: mradmila

Differential Revision: D80188995
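The filtering idea in the summary can be illustrated with a minimal sketch. This is not the PR's `TailLog` code: the function name `duplicate_filtered` is hypothetical, and plain substring matching is an assumption, since the summary only says the filter is customizable. The sketch copies matching lines to a second destination and writes nothing when no filters are given, mirroring the "no filters, no duplicated stream" behavior described above.

```python
import io
from typing import Iterable, Optional, Sequence, TextIO


def duplicate_filtered(
    lines: Iterable[str],
    dst: TextIO,
    filters: Optional[Sequence[str]],
) -> int:
    """Copy to dst only the lines containing at least one filter string.

    Hypothetical helper for illustration; substring matching is an assumption.
    Returns the number of lines written. With no filters, nothing is written.
    """
    if not filters:
        # No filters passed in: do not produce a duplicated stream at all.
        return 0
    written = 0
    for line in lines:
        if any(f in line for f in filters):
            # Normalize the line ending so each match occupies one line.
            dst.write(line if line.endswith("\n") else line + "\n")
            written += 1
    return written


if __name__ == "__main__":
    log = ["step 1 ok\n", "RuntimeError: worker failed\n", "step 2 ok\n"]
    out = io.StringIO()
    duplicate_filtered(log, out, ["RuntimeError"])
    print(out.getvalue(), end="")
```

In a real launcher the destination would presumably be a file such as `filtered_stderr.log` rather than an in-memory buffer; the `StringIO` here only keeps the example self-contained.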
591b883 to 1cada2c
4440f65 to 9d89360
99841b8 to f814794
9fa3b52 to b997417
b997417 to 5cc9908
5cc9908 to 1ff2dc8
…filter in torchrun (pytorch#160712)

Summary:
Part of an effort to extract some important error logs (e.g. [pytorch#157996](pytorch#157996)) that were `tee`'ed to `stdout` and `stderr`.

The general idea is to:
- Duplicate the `tee`s on `stdout` and `stderr` to separate files, `filtered_stdout.log` and `filtered_stderr.log`, respectively.
- In these files, as their names suggest, log only the lines matching a customizable filter.
- Later on, in another PR, append the contents of these files to the reply file.

Outline of changes in this PR:
- Enhance `TailLog` to be able to 1) stream to a file, and 2) only write when the line matches the passed filter.
- Add `filtered_stdout` and `filtered_stderr` to `LogsDest` and have `LogsSpecs` `reify` them.
- In `start_processes()` and `PContext`, add params `duplicate_stdout_filters` and `duplicate_stderr_filters` to filter and write the duplicated stream to the files above. When no filters are passed in, no duplicated streams are created.

Test Plan:
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/elastic/multiprocessing:api_test
```
```
Buck UI: https://www.internalfb.com/buck2/f5c6b7da-217d-4a0b-872a-c7cd3d05587f
Test UI: https://www.internalfb.com/intern/testinfra/testrun/4222124951617688
Network: Up: 398B Down: 44MiB (reSessionID-a489a961-b602-45be-b851-3490ebb7a26a)
Analyzing targets. Remaining 0/200
Executing actions. Remaining 0/12856 0.1s exec time total
Command: test. Finished 1 local
Time elapsed: 17:37.9s
Tests finished: Pass 52. Fail 0. Fatal 0. Skip 0. Build failure 0
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/elastic/multiprocessing:tail_log_test
```
```
Buck UI: https://www.internalfb.com/buck2/d6d5c1c1-db98-4d9c-b608-7ba6fbb5e3ee
Test UI: https://www.internalfb.com/intern/testinfra/testrun/13510798985149262
Network: Up: 94KiB Down: 417MiB (reSessionID-27b46fba-d31c-4c04-8ede-a506454e6922)
Analyzing targets. Remaining 0/3 536 actions, 555 artifacts declared
Executing actions. Remaining 0/186 1:05.5s exec time total
Command: test. Finished 7 local, 1 remote, 115 cache (93% hit) 37.0s exec time cached (56%)
Time elapsed: 1:11.5s
Tests finished: Pass 7. Fail 0. Fatal 0. Skip 0. Build failure 0
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/elastic/agent/server/test:api_test
```
```
Buck UI: https://www.internalfb.com/buck2/34f426fd-25a0-4cf5-8da3-2f3d84767d1e
Test UI: https://www.internalfb.com/intern/testinfra/testrun/14918173871977118
Network: Up: 1.0MiB Down: 2.9GiB (reSessionID-048daa50-9ad4-4826-886f-08cec54c7d72)
Analyzing targets. Remaining 0/5 533 actions, 552 artifacts declared
Executing actions. Remaining 0/176 1:22.7s exec time total
Command: test. Finished 51 local, 13 remote, 50 cache (44% hit) 19.8s exec time cached (23%)
Time elapsed: 1:45.2s
Tests finished: Pass 31. Fail 0. Fatal 0. Skip 0. Build failure 0
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/elastic/agent/server/test:local_agent_test [DISABLED]
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/elastic/agent/server/test/fb:local_agent_fb_internal_test [DISABLED]
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher:api_test
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher:launch_test
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher:test_run
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher/fb:api_test
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher/fb:local_launch_mast_test
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher/fb:fb_run_test
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher/fb:launch_test
```

Reviewed By: mradmila

Differential Revision: D80188995
1ff2dc8 to 07e9f9f
07e9f9f to 57781d9
57781d9 to 5a6f7c7
5a6f7c7 to ee434e6
ee434e6 to 29e037e
29e037e to 90f21f9
…filter in torchrun (pytorch#160712)

Summary:
Part of an effort to extract some important error logs (e.g. [pytorch#157996](pytorch#157996)) that were `tee`'ed to `stdout` and `stderr`.

The general idea is to:
- Duplicate the `tee`s on `stdout` and `stderr` to separate files, `filtered_stdout.log` and `filtered_stderr.log`, respectively.
- In these files, as their names suggest, log only the lines matching a customizable filter.
- Later on, in another PR, append the contents of these files to the reply file.

Outline of changes in this PR:
- Enhance `TailLog` to be able to 1) stream to a file, and 2) only write when the line matches the passed filter.
- Add `filtered_stdout` and `filtered_stderr` to `LogsDest` and have `LogsSpecs` `reify` them.
- In `start_processes()` and `PContext`, add params `duplicate_stdout_filters` and `duplicate_stderr_filters` to filter and write the duplicated stream to the files above. When no filters are passed in, no duplicated streams are created.

Test Plan:
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/elastic/multiprocessing:api_test
```
```
Buck UI: https://www.internalfb.com/buck2/f5c6b7da-217d-4a0b-872a-c7cd3d05587f
Test UI: https://www.internalfb.com/intern/testinfra/testrun/4222124951617688
Network: Up: 398B Down: 44MiB (reSessionID-a489a961-b602-45be-b851-3490ebb7a26a)
Analyzing targets. Remaining 0/200
Executing actions. Remaining 0/12856 0.1s exec time total
Command: test. Finished 1 local
Time elapsed: 17:37.9s
Tests finished: Pass 52. Fail 0. Fatal 0. Skip 0. Build failure 0
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/elastic/multiprocessing:tail_log_test
```
```
Buck UI: https://www.internalfb.com/buck2/d6d5c1c1-db98-4d9c-b608-7ba6fbb5e3ee
Test UI: https://www.internalfb.com/intern/testinfra/testrun/13510798985149262
Network: Up: 94KiB Down: 417MiB (reSessionID-27b46fba-d31c-4c04-8ede-a506454e6922)
Analyzing targets. Remaining 0/3 536 actions, 555 artifacts declared
Executing actions. Remaining 0/186 1:05.5s exec time total
Command: test. Finished 7 local, 1 remote, 115 cache (93% hit) 37.0s exec time cached (56%)
Time elapsed: 1:11.5s
Tests finished: Pass 7. Fail 0. Fatal 0. Skip 0. Build failure 0
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/elastic/agent/server/test:api_test
```
```
Buck UI: https://www.internalfb.com/buck2/34f426fd-25a0-4cf5-8da3-2f3d84767d1e
Test UI: https://www.internalfb.com/intern/testinfra/testrun/14918173871977118
Network: Up: 1.0MiB Down: 2.9GiB (reSessionID-048daa50-9ad4-4826-886f-08cec54c7d72)
Analyzing targets. Remaining 0/5 533 actions, 552 artifacts declared
Executing actions. Remaining 0/176 1:22.7s exec time total
Command: test. Finished 51 local, 13 remote, 50 cache (44% hit) 19.8s exec time cached (23%)
Time elapsed: 1:45.2s
Tests finished: Pass 31. Fail 0. Fatal 0. Skip 0. Build failure 0
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/elastic/agent/server/test:local_agent_test [DISABLED]
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/elastic/agent/server/test/fb:local_agent_fb_internal_test [DISABLED]
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher:api_test
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher:launch_test
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher:test_run
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher/fb:api_test
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher/fb:local_launch_mast_test
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher/fb:fb_run_test
```
```
$ buck test 'fbcode//mode/opt' caffe2/test/distributed/launcher/fb:launch_test
```

Reviewed By: fduwjj, mradmila

Differential Revision: D80188995
90f21f9 to 2c7c858
2c7c858 to 280aefc
|
@pytorchbot merge |
|
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged) |
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Summary:
Part of an effort to extract some important error logs (e.g. #157996) that were `tee`'ed to `stdout` and `stderr`.

The general idea is to:
- Duplicate the `tee`s on `stdout` and `stderr` to separate files, `filtered_stdout.log` and `filtered_stderr.log`, respectively.
- In these files, as their names suggest, log only the lines matching a customizable filter.
- Later on, in another PR, append the contents of these files to the reply file.

Outline of changes in this PR:
- Enhance `TailLog` to be able to 1) stream to a file, and 2) only write when the line matches the passed filter.
- Add `filtered_stdout` and `filtered_stderr` to `LogsDest` and have `LogsSpecs` `reify` them.
- In `start_processes()` and `PContext`, add params `duplicate_stdout_filters` and `duplicate_stderr_filters` to filter and write the duplicated stream to the files above. When no filters are passed in, no duplicated streams are created.

Test Plan:

Rollback Plan:

Differential Revision: D80188995
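For readers unfamiliar with the idea, the duplicate-and-filter behaviour described in the summary can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `duplicate_filtered` and its parameters are hypothetical names, and treating each filter as a plain substring match is an assumption (the description only says "customizable filter"). It does mirror the stated rule that no duplicated stream is written when no filters are passed.

```python
import io
from typing import Iterable, Optional, Sequence, TextIO


def duplicate_filtered(
    lines: Iterable[str],
    primary: TextIO,
    filtered: Optional[TextIO] = None,
    filters: Optional[Sequence[str]] = None,
) -> int:
    """Copy every line to `primary`; copy only matching lines to `filtered`.

    Hypothetical helper for illustration; returns the number of lines
    written to `filtered`.
    """
    matched = 0
    # Assumption: an empty or missing filter list disables the duplicate stream.
    duplicate = filtered is not None and bool(filters)
    for line in lines:
        primary.write(line)
        # Assumption: a filter "matches" when it occurs as a substring.
        if duplicate and any(f in line for f in filters):
            filtered.write(line)
            matched += 1
    return matched


# In-memory streams stand in for the tee target and filtered_stdout.log.
source = ["step 1 ok\n", "RuntimeError: NCCL timeout\n", "step 2 ok\n"]
tee_out = io.StringIO()
filtered_out = io.StringIO()
count = duplicate_filtered(source, tee_out, filtered_out, filters=["RuntimeError"])
```

In this sketch the unfiltered stream always receives every line, so enabling the filtered copy never changes what the existing `tee` output contains.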
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci