
Commit 4dfbf98

Authored by Daniel Lemire (lemire), with John Keiser (jkeiser)
Using a worker instead of a thread per batch (simdjson#920)
In the parse_many function, we have one thread doing stage 1 while the main thread does stage 2. So if stage 1 and stage 2 each took half the time, parse_many could run at twice the speed; it is unlikely to do so in practice, but we still see benefits of about 40% from threading. To achieve this interleaving, we load the data in batches (blocks) of some size.

In the current code (master), we create a new thread for each batch. Thread creation is expensive, so this approach only pays off over sizeable batches. This PR improves things and makes parse_many faster when using small batches. It also fixes our parse_stream benchmark, which was simply broken.

This replaces the one-thread-per-batch routine with a worker object that reuses the same thread. In benchmarks, this allows us to reach the same maximal speed, but with smaller processing blocks. It does not help much with larger blocks, where the cost of thread creation is amortized efficiently.

This PR makes parse_many beneficial over small datasets and makes us less dependent on thread-creation time. Unfortunately, it is difficult to say anything definitive in general: the cost of creating a thread varies widely depending on the OS. On some systems it might be cheap, on others very expensive. The new code should depend less drastically on the performance of the underlying system, since we create just one thread.

Co-authored-by: John Keiser <john@johnkeiser.com>
Co-authored-by: Daniel Lemire <lemire@gmai.com>
1 parent 1febf2e commit 4dfbf98

File tree

8 files changed: +336 −146 lines changed


benchmark/parse_stream.cpp

(file mode changed from 100755 to 100644)
Lines changed: 159 additions & 125 deletions
@@ -1,147 +1,181 @@
-#include <iostream>
 #include <algorithm>
 #include <chrono>
-#include <vector>
+#include <iostream>
 #include <map>
+#include <vector>
 
 #include "simdjson.h"
 
-#define NB_ITERATION 5
-#define MIN_BATCH_SIZE 200000
+#define NB_ITERATION 20
+#define MIN_BATCH_SIZE 10000
 #define MAX_BATCH_SIZE 10000000
 
 bool test_baseline = false;
 bool test_per_batch = true;
-bool test_best_batch = true;
+bool test_best_batch = false;
 
-bool compare(std::pair<size_t, double> i, std::pair<size_t, double> j){
-  return i.second > j.second;
+bool compare(std::pair<size_t, double> i, std::pair<size_t, double> j) {
+  return i.second > j.second;
 }
 
-int main (int argc, char *argv[]){
-
-  if (argc <= 1) {
-    std::cerr << "Usage: " << argv[0] << " <jsonfile>" << std::endl;
-    exit(1);
-  }
-  const char *filename = argv[1];
-  auto [p, err] = simdjson::padded_string::load(filename);
-  if (err) {
-    std::cerr << "Could not load the file " << filename << std::endl;
+int main(int argc, char *argv[]) {
+
+  if (argc <= 1) {
+    std::cerr << "Usage: " << argv[0] << " <jsonfile>" << std::endl;
+    exit(1);
+  }
+  const char *filename = argv[1];
+  auto[p, err] = simdjson::padded_string::load(filename);
+  if (err) {
+    std::cerr << "Could not load the file " << filename << std::endl;
+    return EXIT_FAILURE;
+  }
+  if (test_baseline) {
+    std::wclog << "Baseline: Getline + normal parse... " << std::endl;
+    std::cout << "Gigabytes/second\t"
+              << "Nb of documents parsed" << std::endl;
+    for (auto i = 0; i < 3; i++) {
+      // Actual test
+      simdjson::dom::parser parser;
+      simdjson::error_code alloc_error = parser.allocate(p.size());
+      if (alloc_error) {
+        std::cerr << alloc_error << std::endl;
         return EXIT_FAILURE;
+      }
+      std::istringstream ss(std::string(p.data(), p.size()));
+
+      auto start = std::chrono::steady_clock::now();
+      int count = 0;
+      std::string line;
+      int parse_res = simdjson::SUCCESS;
+      while (getline(ss, line)) {
+        // TODO we're likely triggering simdjson's padding reallocation here. Is
+        // that intentional?
+        parser.parse(line);
+        count++;
+      }
+
+      auto end = std::chrono::steady_clock::now();
+
+      std::chrono::duration<double> secs = end - start;
+      double speedinGBs = static_cast<double>(p.size()) /
+                          (static_cast<double>(secs.count()) * 1000000000.0);
+      std::cout << speedinGBs << "\t\t\t\t" << count << std::endl;
+
+      if (parse_res != simdjson::SUCCESS) {
+        std::cerr << "Parsing failed" << std::endl;
+        exit(1);
+      }
     }
-  if (test_baseline) {
-    std::wclog << "Baseline: Getline + normal parse... " << std::endl;
-    std::cout << "Gigabytes/second\t" << "Nb of documents parsed" << std::endl;
-    for (auto i = 0; i < 3; i++) {
-      //Actual test
-      simdjson::dom::parser parser;
-      simdjson::error_code alloc_error = parser.allocate(p.size());
-      if (alloc_error) {
-        std::cerr << alloc_error << std::endl;
-        return EXIT_FAILURE;
-      }
-      std::istringstream ss(std::string(p.data(), p.size()));
-
-      auto start = std::chrono::steady_clock::now();
-      int count = 0;
-      std::string line;
-      int parse_res = simdjson::SUCCESS;
-      while (getline(ss, line)) {
-        // TODO we're likely triggering simdjson's padding reallocation here. Is that intentional?
-        parser.parse(line);
-        count++;
-      }
-
-      auto end = std::chrono::steady_clock::now();
-
-      std::chrono::duration<double> secs = end - start;
-      double speedinGBs = static_cast<double>(p.size()) / (static_cast<double>(secs.count()) * 1000000000.0);
-      std::cout << speedinGBs << "\t\t\t\t" << count << std::endl;
-
-      if (parse_res != simdjson::SUCCESS) {
-        std::cerr << "Parsing failed" << std::endl;
-        exit(1);
-      }
-    }
-  }
-
-  std::map<size_t, double> batch_size_res;
-  if(test_per_batch) {
-    std::wclog << "parse_many: Speed per batch_size... from " << MIN_BATCH_SIZE
-               << " bytes to " << MAX_BATCH_SIZE << " bytes..." << std::endl;
-    std::cout << "Batch Size\t" << "Gigabytes/second\t" << "Nb of documents parsed" << std::endl;
-    for (size_t i = MIN_BATCH_SIZE; i <= MAX_BATCH_SIZE; i += (MAX_BATCH_SIZE - MIN_BATCH_SIZE) / 50) {
-      batch_size_res.insert(std::pair<size_t, double>(i, 0));
-      int count;
-      for (size_t j = 0; j < 5; j++) {
-        //Actual test
-        simdjson::dom::parser parser;
-        simdjson::error_code error;
-
-        auto start = std::chrono::steady_clock::now();
-        count = 0;
-        for (auto result : parser.parse_many(p, 4000000)) {
-          error = result.error();
-          count++;
-        }
-        auto end = std::chrono::steady_clock::now();
-
-        std::chrono::duration<double> secs = end - start;
-        double speedinGBs = static_cast<double>(p.size()) / (static_cast<double>(secs.count()) * 1000000000.0);
-        if (speedinGBs > batch_size_res.at(i))
-          batch_size_res[i] = speedinGBs;
-
-        if (error != simdjson::SUCCESS) {
-          std::wcerr << "Parsing failed with: " << error << std::endl;
-          exit(1);
-        }
-      }
-      std::cout << i << "\t\t" << std::fixed << std::setprecision(3) << batch_size_res.at(i) << "\t\t\t\t" << count << std::endl;
-
+  }
+
+  std::map<size_t, double> batch_size_res;
+  if (test_per_batch) {
+    std::wclog << "parse_many: Speed per batch_size... from " << MIN_BATCH_SIZE
+               << " bytes to " << MAX_BATCH_SIZE << " bytes..." << std::endl;
+    std::cout << "Batch Size\t"
+              << "Gigabytes/second\t"
+              << "Nb of documents parsed" << std::endl;
+    for (size_t i = MIN_BATCH_SIZE; i <= MAX_BATCH_SIZE;
+         i += (MAX_BATCH_SIZE - MIN_BATCH_SIZE) / 100) {
+      batch_size_res.insert(std::pair<size_t, double>(i, 0));
+      int count;
+      for (size_t j = 0; j < 5; j++) {
+        // Actual test
+        simdjson::dom::parser parser;
+        simdjson::error_code error;
+
+        auto start = std::chrono::steady_clock::now();
+        count = 0;
+        for (auto result : parser.parse_many(p, i)) {
+          error = result.error();
+          if (error != simdjson::SUCCESS) {
+            std::wcerr << "Parsing failed with: " << error_message(error) << std::endl;
+            exit(1);
+          }
+          count++;
         }
+        auto end = std::chrono::steady_clock::now();
+
+        std::chrono::duration<double> secs = end - start;
+        double speedinGBs = static_cast<double>(p.size()) /
+                            (static_cast<double>(secs.count()) * 1000000000.0);
+        if (speedinGBs > batch_size_res.at(i))
+          batch_size_res[i] = speedinGBs;
+      }
+      std::cout << i << "\t\t" << std::fixed << std::setprecision(3)
+                << batch_size_res.at(i) << "\t\t\t\t" << count << std::endl;
     }
-
-  if (test_best_batch) {
-    size_t optimal_batch_size;
-    if (test_per_batch) {
-      optimal_batch_size = (*min_element(batch_size_res.begin(), batch_size_res.end(), compare)).first;
-    } else {
-      optimal_batch_size = MIN_BATCH_SIZE;
+  }
+  size_t optimal_batch_size{};
+  double best_speed{};
+  if (test_per_batch) {
+    std::pair<size_t, double> best_results;
+    best_results =
+        (*min_element(batch_size_res.begin(), batch_size_res.end(), compare));
+    optimal_batch_size = best_results.first;
+    best_speed = best_results.second;
+  } else {
+    optimal_batch_size = MIN_BATCH_SIZE;
+  }
+  std::wclog << "Seemingly optimal batch_size: " << optimal_batch_size << "..."
+             << std::endl;
+  std::wclog << "Best speed: " << best_speed << "..." << std::endl;
+
+  if (test_best_batch) {
+    std::wclog << "Starting speed test... Best of " << NB_ITERATION
+               << " iterations..." << std::endl;
+    std::vector<double> res;
+    for (int i = 0; i < NB_ITERATION; i++) {
+
+      // Actual test
+      simdjson::dom::parser parser;
+      simdjson::error_code error;
+
+      auto start = std::chrono::steady_clock::now();
+      // This includes allocation of the parser
+      for (auto result : parser.parse_many(p, optimal_batch_size)) {
+        error = result.error();
+        if (error != simdjson::SUCCESS) {
+          std::wcerr << "Parsing failed with: " << error_message(error) << std::endl;
+          exit(1);
         }
-    std::wclog << "Starting speed test... Best of " << NB_ITERATION << " iterations..." << std::endl;
-    std::wclog << "Seemingly optimal batch_size: " << optimal_batch_size << "..." << std::endl;
-    std::vector<double> res;
-    for (int i = 0; i < NB_ITERATION; i++) {
-
-      // Actual test
-      simdjson::dom::parser parser;
-      simdjson::error_code error;
-
-      auto start = std::chrono::steady_clock::now();
-      // TODO this includes allocation of the parser; is that intentional?
-      for (auto result : parser.parse_many(p, 4000000)) {
-        error = result.error();
-      }
-      auto end = std::chrono::steady_clock::now();
-
-      std::chrono::duration<double> secs = end - start;
-      res.push_back(secs.count());
-
-      if (error != simdjson::SUCCESS) {
-        std::wcerr << "Parsing failed with: " << error << std::endl;
-        exit(1);
-      }
-
-    }
-
-    double min_result = *min_element(res.begin(), res.end());
-    double speedinGBs = static_cast<double>(p.size()) / (min_result * 1000000000.0);
-
+      }
+      auto end = std::chrono::steady_clock::now();
 
-    std::cout << "Min: " << min_result << " bytes read: " << p.size()
-              << " Gigabytes/second: " << speedinGBs << std::endl;
+      std::chrono::duration<double> secs = end - start;
+      res.push_back(secs.count());
     }
 
-  return 0;
+    double min_result = *min_element(res.begin(), res.end());
+    double speedinGBs =
+        static_cast<double>(p.size()) / (min_result * 1000000000.0);
+
+    std::cout << "Min: " << min_result << " bytes read: " << p.size()
+              << " Gigabytes/second: " << speedinGBs << std::endl;
+  }
+#ifdef SIMDJSON_THREADS_ENABLED
+  // Multithreading probably does not help matters for small files (less than 10
+  // MB).
+  if (p.size() < 10000000) {
+    std::cout << std::endl;
+
+    std::cout << "Warning: your file is small and the performance results are "
+                 "probably meaningless"
+              << std::endl;
+    std::cout << "as far as multithreaded performance goes." << std::endl;
+
+    std::cout << std::endl;
+
+    std::cout
+        << "Try to concatenate the file with itself to generate a large one."
+        << std::endl;
+    std::cout << "In bash: " << std::endl;
+    std::cout << "for i in {1..1000}; do cat '" << filename
+              << "' >> bar.ndjson; done" << std::endl;
+    std::cout << argv[0] << " bar.ndjson" << std::endl;
+  }
+#endif
+
+  return 0;
 }

doc/basics.md

Lines changed: 3 additions & 1 deletion
@@ -452,7 +452,7 @@ The simdjson library also support multithreaded JSON streaming through a large f
 smaller JSON documents in either [ndjson](http://ndjson.org) or [JSON lines](http://jsonlines.org)
 format. If your JSON documents all contain arrays or objects, we even support direct file
 concatenation without whitespace. The concatenated file has no size restrictions (including larger
-than 4GB), though each individual document must be less than 4GB.
+than 4GB), though each individual document must be no larger than 4 GB.
 
 Here is a simple example, given "x.json" with this content:
 
@@ -472,6 +472,8 @@ for (dom::element doc : parser.load_many(filename)) {
 
 In-memory ndjson strings can be parsed as well, with `parser.parse_many(string)`.
 
+Both `load_many` and `parse_many` take an optional parameter `size_t batch_size` which defines the window processing size. It is set by default to a large value (`1000000` corresponding to 1 MB). None of your JSON documents should exceed this window size, or else you will get the error `simdjson::CAPACITY`. You cannot set this window size larger than 4 GB: you will get the error `simdjson::CAPACITY`. The smaller the window size is, the less memory the function will use. Setting the window size too small (e.g., less than 100 kB) may also impact performance negatively. Leaving it to 1 MB is expected to be a good choice, unless you have some larger documents.
+
 See [parse_many.md](parse_many.md) for detailed information and design.
 
 Thread Safety

include/simdjson/dom/document_stream.h

Lines changed: 54 additions & 2 deletions
@@ -6,11 +6,63 @@
 #include "simdjson/error.h"
 #ifdef SIMDJSON_THREADS_ENABLED
 #include <thread>
+#include <mutex>
+#include <condition_variable>
 #endif
 
 namespace simdjson {
 namespace dom {
 
+
+#ifdef SIMDJSON_THREADS_ENABLED
+struct stage1_worker {
+  stage1_worker() noexcept = default;
+  stage1_worker(const stage1_worker&) = delete;
+  stage1_worker(stage1_worker&&) = delete;
+  stage1_worker operator=(const stage1_worker&) = delete;
+  ~stage1_worker();
+  /**
+   * We only start the thread when it is needed, not at object construction, this may throw.
+   * You should only call this once.
+   **/
+  void start_thread();
+  /**
+   * Start a stage 1 job. You should first call 'run', then 'finish'.
+   * You must call start_thread once before.
+   */
+  void run(document_stream * ds, dom::parser * stage1, size_t next_batch_start);
+  /** Wait for the run to finish (blocking). You should first call 'run', then 'finish'. **/
+  void finish();
+
+private:
+
+  /**
+   * Normally, we would never stop the thread. But we do in the destructor.
+   * This function is only safe assuming that you are not waiting for results. You
+   * should have called run, then finish, and be done.
+   **/
+  void stop_thread();
+
+  std::thread thread{};
+  /** These three variables define the work done by the thread. **/
+  dom::parser * stage1_thread_parser{};
+  size_t _next_batch_start{};
+  document_stream * owner{};
+  /**
+   * We have two state variables. This could be streamlined to one variable in the future but
+   * we use two for clarity.
+   */
+  bool has_work{false};
+  bool can_work{true};
+
+  /**
+   * We lock using a mutex.
+   */
+  std::mutex locking_mutex{};
+  std::condition_variable cond_var{};
+};
+#endif
+
 /**
  * A forward-only stream of documents.
  *
@@ -142,8 +194,8 @@ class document_stream {
   /** The error returned from the stage 1 thread. */
   error_code stage1_thread_error{UNINITIALIZED};
   /** The thread used to run stage 1 against the next batch in the background. */
-  std::thread stage1_thread{};
-
+  friend struct stage1_worker;
+  std::unique_ptr<stage1_worker> worker{new(std::nothrow) stage1_worker()};
   /**
    * The parser used to run stage 1 in the background. Will be swapped
    * with the regular parser when finished.
