Commit a94d2da

Fix memory leak in promise loop by breaking the recursion
It turns out that neither bluebird nor the native Promise implementation in V8 actually performs TCO when a promise is returned. This built up a long chain of promise references in memory while dumping millions of Wikipedia articles. The dump would succeed, but for enwiki at least, the dumper would be using >1G of heap by the end. Since nextTick and its siblings setInterval and setTimeout are all pretty expensive, we only break the chain occasionally. In local tests this construct runs 70000000 iterations in constant memory, while the original loop would exhaust the default 2G heap limit.
1 parent d0d074a commit a94d2da

1 file changed: +9 -1 lines changed


htmlspider.js

Lines changed: 9 additions & 1 deletion
@@ -146,9 +146,17 @@ function makeDump (apiURL, prefix, ns, host) {
     var dumpStream = new PromiseStream(dumper.next.bind(dumper),
             undefined, 1, maxConcurrency);
 
+    var i = 0;
     function loop () {
         return dumpStream.next()
-        .then(loop)
+        .then(function () {
+            if (i++ === 10000) {
+                i = 0;
+                process.nextTick(loop);
+            } else {
+                return loop();
+            }
+        })
         .catch(function(e) {
             console.log(e);
         });
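
The pattern in this change can be tried in isolation. The following is a minimal sketch, not the code from htmlspider.js: `makeSource`, `drain` and `breakEvery` are hypothetical names introduced for illustration, and `makeSource` is only a stand-in for `dumpStream.next()` that is assumed to resolve with `undefined` once it is exhausted. As in the commit, the loop returns the next promise most of the time and only occasionally re-enters through `process.nextTick` without returning, which lets the chain built up so far settle and be collected.

```javascript
'use strict';

// Hypothetical stand-in for dumpStream.next(): resolves with the next item,
// or with undefined once `total` items have been produced.
function makeSource(total) {
    let produced = 0;
    return {
        next() {
            return Promise.resolve(produced < total ? produced++ : undefined);
        }
    };
}

// Call source.next() until it yields undefined and resolve with the number
// of items seen. Every `breakEvery` iterations the next step is scheduled
// with process.nextTick instead of being returned, so the chain of pending
// promises never grows beyond `breakEvery` links.
function drain(source, breakEvery) {
    return new Promise(function (resolve, reject) {
        let i = 0;
        let count = 0;
        function loop() {
            return source.next()
            .then(function (item) {
                if (item === undefined) {
                    resolve(count);
                    return;
                }
                count++;
                if (++i === breakEvery) {
                    i = 0;
                    // Not returned: the current chain ends here.
                    process.nextTick(loop);
                } else {
                    return loop();
                }
            })
            .catch(reject);
        }
        loop();
    });
}

// Usage: process 100000 items, breaking the chain every 10000 iterations.
drain(makeSource(100000), 10000).then(function (n) {
    console.log('processed', n, 'items');
});
```

The outer promise exists only so that a caller can observe completion, since the promise returned by `loop` itself settles at the first break. The commit's version has no such wrapper and simply logs errors in each `.catch`.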
