I'm a little confused about multithreading and asynchronous programming in JS. What is the difference between a cluster, a stream, a child process, and a worker thread?
2 Answers
The first thing to remember about multithreading in Node.js is that your JavaScript runs on a single thread by default. Unless you explicitly create worker threads with the worker_threads module, any Node program executes its JavaScript on one main thread. (Node itself uses additional threads internally, for example the libuv thread pool, but you cannot run your own code on those.)
Since a Node program runs its JavaScript on a single thread in a single process, it uses only one CPU core for that code. Most modern processors have multiple cores, and in order to make use of all of them and get better throughput, you can start the same Node program as a cluster.
The cluster module lets you start a Node program so that the first instance launched becomes the master instance (called the primary in newer Node versions). The master spawns workers as separate processes (not threads) using the cluster.fork() method. The actual work of the program is done by the workers. The example in the Node docs demonstrates this perfectly.
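As a rough illustration of the master/worker split, here is a minimal sketch (not the example from the Node docs). It assumes Node 16 or later, where the check is spelled cluster.isPrimary (older versions use cluster.isMaster), and the names runPrimary, runWorker and started are made up for this example. It has to be run as a file, because cluster.fork() starts the same file again as a worker.

```javascript
const cluster = require('node:cluster');

// Primary: fork `count` worker processes and collect one message from each.
function runPrimary(count) {
  return new Promise((resolve) => {
    const replies = [];
    for (let i = 0; i < count; i++) {
      const worker = cluster.fork(); // starts this same file as a new process
      worker.once('message', (msg) => {
        replies.push(msg);
        worker.disconnect(); // close the IPC channel so the worker can exit
        if (replies.length === count) resolve(replies);
      });
    }
  });
}

// Worker: this is where the real work of the program would go.
function runWorker() {
  process.send({ workerId: cluster.worker.id, pid: process.pid });
}

// cluster.isPrimary needs Node 16+; older versions call it cluster.isMaster.
const started = cluster.isPrimary ? runPrimary(2) : runWorker();
if (cluster.isPrimary) {
  started.then((replies) => console.log('replies from workers:', replies));
}
```

Each reply carries a different pid, which shows that the workers are separate processes and not threads. In a real server you would typically fork one worker per core (os.cpus().length) instead of a fixed 2.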
A child process is a process spawned from the current process. When it is created with child_process.fork(), an IPC channel is established between parent and child so they can communicate with each other. The master and workers I described in cluster are an example of child processes. The child_process module in Node allows you to spawn custom child processes as you require.
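To make the IPC channel concrete, here is a small sketch using child_process.fork(). The child script, the doubleInChild helper and the message shape are all hypothetical; the child is written to a temp file only so the example fits in a single file.

```javascript
const { fork } = require('node:child_process');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical child script. It is written to a temp file only so that this
// sketch fits in one file; normally it would be a separate module.
const childDir = fs.mkdtempSync(path.join(os.tmpdir(), 'ipc-demo-'));
const childFile = path.join(childDir, 'child.js');
fs.writeFileSync(childFile, `
  process.on('message', (msg) => {
    process.send({ doubled: msg.value * 2 }); // reply over the IPC channel
  });
`);

// Send one number to a forked child and resolve with its reply.
function doubleInChild(value) {
  return new Promise((resolve, reject) => {
    const child = fork(childFile); // fork() sets up the IPC channel
    child.once('error', reject);
    child.once('message', (reply) => {
      child.disconnect(); // close the channel so the child can exit
      resolve(reply);
    });
    child.send({ value });
  });
}

doubleInChild(21).then((reply) => console.log('reply from child:', reply));
```

Messages sent this way are serialized and copied between the two processes, so nothing is shared in memory.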
Streams are not related to multithreading or multiple processes at all. They are just a way to handle large amounts of data without loading all of it into working memory at the same time. Example: suppose you want to read a 10 GB log file and your server only has 4 GB of memory. Trying to load the whole file with fs.readFile will fail. Instead, you use fs.createReadStream to process the file in smaller chunks that fit into memory.
Hope this explains it. For further details you really should read the Node docs.
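A minimal sketch of the chunked approach, assuming all you need is a line count. The countNewlines helper and the small demo file are made up for illustration; with a real 10 GB log you would pass its path instead.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Count newline characters without ever holding the whole file in memory.
function countNewlines(filePath) {
  return new Promise((resolve, reject) => {
    let count = 0;
    fs.createReadStream(filePath) // delivers Buffer chunks (64 KiB by default)
      .on('data', (chunk) => {
        for (const byte of chunk) {
          if (byte === 10) count++; // 10 is the byte value of '\n'
        }
      })
      .on('error', reject)
      .on('end', () => resolve(count));
  });
}

// Small stand-in for a large log file, so the sketch can be run as-is.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'stream-demo-'));
const demoFile = path.join(demoDir, 'app.log');
fs.writeFileSync(demoFile, 'first line\nsecond line\nthird line\n');

countNewlines(demoFile).then((n) => console.log('lines:', n));
```

Memory use stays around the size of one chunk no matter how large the file is, because each chunk can be garbage collected once it has been processed.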
This is a little vague, so I'm just going to give an overview.
Streams are really just data streams like in any other language, similar to iostreams in C++, which you use to read user input or other kinds of data. They're usually wrapped by another class, so you often don't know you're using a stream. You usually won't deal with them directly unless you're building a new stream type.
Child processes, worker threads, and clusters are all ways of utilizing multi-core processing in Node applications.
Worker threads are basic multithreading the Node way: each thread can exchange messages with its parent, and memory can be shared between threads (for example through a SharedArrayBuffer). You give the worker a script file to run plus some initial data, and listen for its message and exit events to find out when it has produced a result or finished.
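Here is a small sketch of that pattern. The worker source is passed inline with eval: true only to keep the example in one file (normally you would pass the path of a separate script), and sumInWorker is a made-up helper name.

```javascript
const { Worker } = require('node:worker_threads');

// Worker source kept inline (eval: true) so the sketch fits in one file;
// in a real project this would normally live in its own .js file.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  // CPU-bound work done off the main thread: sum the integers 1..n.
  let sum = 0;
  for (let i = 1; i <= workerData.n; i++) sum += i;
  parentPort.postMessage(sum);
`;

// Run the summation in a worker thread and resolve with its result.
function sumInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve); // result posted by the worker
    worker.once('error', reject);    // uncaught exception inside the worker
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

sumInWorker(1000).then((sum) => console.log('sum from worker:', sum));
```

The main thread stays free to handle other events while the loop runs in the worker. workerData is copied into the worker; to actually share memory you would pass a SharedArrayBuffer instead.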
Clusters are more for sharing network load. A master process listens for connections on a port and, by default on most platforms, hands them out in a round-robin manner to the worker processes. This lets several processes share the same server port(s) and evens out the load across CPU cores.
Child processes are a way to create a new process, similar to popen. They can be created asynchronously or synchronously (non-blocking or blocking the Node event loop). A child sends data to the parent via stdout/stderr and receives data from it via stdin. The parent can register listeners on each child process for updates. You can run a shell command, an executable file, or (with fork) a Node module as a child process. Child processes generally do not share memory with the parent.
I'd suggest reading the documentation yourself and coming back with any specific questions you have. You won't get much from vague questions like this; they make it seem like you didn't do your own part of the work beforehand.
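A short sketch of both flavours, using the current Node binary (process.execPath) as the child program so it runs anywhere Node does. The inline child code and the upperCaseInChild helper are made up for illustration.

```javascript
const { spawn, execFileSync } = require('node:child_process');

// Synchronous variant: blocks the event loop until the child has exited.
const childVersion = execFileSync(process.execPath, ['--version'], {
  encoding: 'utf8',
}).trim();
console.log('child reports Node', childVersion);

// Tiny child program (passed with -e) that upper-cases whatever it reads on stdin.
const childCode = `
  let input = '';
  process.stdin.setEncoding('utf8');
  process.stdin.on('data', (chunk) => { input += chunk; });
  process.stdin.on('end', () => process.stdout.write(input.toUpperCase()));
`;

// Asynchronous variant: the event loop keeps running while the child works.
function upperCaseInChild(text) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', childCode]);
    let output = '';
    child.stdout.setEncoding('utf8');
    child.stdout.on('data', (chunk) => { output += chunk; }); // child's stdout
    child.once('error', reject);
    child.once('close', (code) => {
      if (code === 0) resolve(output);
      else reject(new Error(`child exited with code ${code}`));
    });
    child.stdin.end(text); // write to the child's stdin, then close it
  });
}

upperCaseInChild('hello from parent').then((out) => console.log(out));
```

The synchronous functions (execFileSync, execSync, spawnSync) are handy in scripts, but in a server they stall every other request until the child finishes, so the asynchronous variant is usually the safer default there.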
Documentation: