The proposal of using Multi Threads #4355

Open
SukkaW opened this issue Jun 11, 2020 · 15 comments

@SukkaW
Member

SukkaW commented Jun 11, 2020

Since #550, the original creator of Hexo, @tommy351, has wanted to speed up Hexo with multi-core rendering. However, #550 was never continued because of the difficulty of managing multiple Hexo instances.

Recently I introduced Node.js worker_threads in a project (OI-wiki/OI-wiki#2288) and learned something about them. Now that Node.js supports worker_threads, it is possible to bring up multi-core rendering for Hexo again.

Limit

Worker threads are designed to run CPU-intensive tasks that follow a simple pattern:

Independent Input => Workers Calculating => Independent Output

Thus many of the more complicated functions cannot run inside workers.
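The pattern above can be illustrated with a minimal sketch. It assumes a one-shot worker whose source is inlined with `eval: true` so the example fits in a single file; the word-counting job and the name `countWordsInWorker` are placeholders for illustration, not part of Hexo.

```javascript
const { Worker } = require('worker_threads');

// Worker source kept inline (eval: true) so the sketch is a single file.
// A real project would put this in its own file, e.g. some_worker.js.
const wordCountWorker = `
  const { parentPort, workerData } = require('worker_threads');
  // Placeholder CPU-bound job: count the words in the input string.
  const words = workerData.text.split(/\\s+/).filter(Boolean).length;
  parentPort.postMessage({ words });
`;

// Independent input => worker calculating => independent output.
function countWordsInWorker(text) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(wordCountWorker, {
      eval: true,
      workerData: { text } // copied into the worker, not shared
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

countWordsInWorker('hello from the main thread').then(output => {
  console.log(output.words); // 5
});
```

The worker only sees the copied `workerData`; nothing from the main thread's scope is reachable from inside it.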

Design

Creating and destroying workers is still expensive (each worker thread has to communicate with the main thread), so we should create only a limited number of worker threads (in OI-wiki/OI-wiki#2288 I use the number of CPU threads). Thus, a WorkerPool util should be made.

The WorkerPool is designed to queue tasks, manage them, and make sure the next task runs in an idle worker, so it should have these methods:

  • init(): initializes the worker pool and its task queue (the queue could be a plain array). It is called from the constructor.
  • run(input): adds a task to the queue, with input passed to a worker. It returns a Promise, so the result can be retrieved with const output = await workerPool.run(input).
  • destroy(): after all tasks have finished, terminates all the worker_threads that were created.
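One way the three methods above could be implemented is sketched below. This is an illustration under assumptions, not the actual hexo-util code: the third constructor argument (`workerOptions`) and the private `_next()` helper are additions made here so the sketch can run from a single file with an inline worker.

```javascript
const { Worker } = require('worker_threads');

class WorkerPool {
  constructor(workerSource, size, workerOptions = {}) {
    this.workerSource = workerSource;
    this.size = size;
    this.workerOptions = workerOptions; // assumption: forwarded to new Worker()
    this.init();
  }

  // Create the queue and a fixed number of workers.
  init() {
    this.queue = [];   // tasks waiting for an idle worker
    this.workers = []; // every worker created by this pool
    this.idle = [];    // workers that currently have no task
    for (let i = 0; i < this.size; i++) {
      const worker = new Worker(this.workerSource, this.workerOptions);
      this.workers.push(worker);
      this.idle.push(worker);
    }
  }

  // Queue a task; the Promise resolves with the worker's reply.
  run(input) {
    return new Promise((resolve, reject) => {
      this.queue.push({ input, resolve, reject });
      this._next();
    });
  }

  // Hand the oldest queued task to an idle worker, if both exist.
  _next() {
    if (this.queue.length === 0 || this.idle.length === 0) return;
    const worker = this.idle.pop();
    const { input, resolve, reject } = this.queue.shift();
    const onMessage = output => {
      worker.off('error', onError);
      this.idle.push(worker);
      resolve(output);
      this._next();
    };
    const onError = err => {
      worker.off('message', onMessage);
      reject(err); // a worker that threw has exited, so it is not reused
    };
    worker.once('message', onMessage);
    worker.once('error', onError);
    worker.postMessage(input);
  }

  // Terminate every worker created by the pool.
  destroy() {
    return Promise.all(this.workers.map(worker => worker.terminate()));
  }
}

// Usage with an inline placeholder worker that doubles its input.
const doublingWorker = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', input => parentPort.postMessage(input * 2));
`;
const demoPool = new WorkerPool(doublingWorker, 2, { eval: true });
Promise.all([1, 2, 3, 4].map(n => demoPool.run(n))).then(outputs => {
  console.log(outputs); // [ 2, 4, 6, 8 ]
  return demoPool.destroy();
});
```

Each worker handles one task at a time, so a reply can always be matched to the task that was sent to that worker without tagging messages with IDs.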

And here is an example of how to use WorkerPool:

// index.js
const { join } = require('path');
// WorkerPool is the util proposed above.
const { WorkerPool } = require('hexo-util');

const workerPath = join(__dirname, 'some_worker.js');
const cpuNums = require('os').cpus().length;

const pool = new WorkerPool(workerPath, cpuNums);

const tasksList = [/* some stuff goes here ... */];
const result = {};

Promise.all(tasksList.map(async (task, taskId) => {
  const output = await pool.run(task);

  // do something with output, maybe writeFile or push to a resultArray.
  result[taskId] = output;
})).then(() => {
  pool.destroy();

  // do something with result object.
});

// some_worker.js
const { isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  throw new Error('It is not a worker, it seems like a Main Thread');
}

async function job(input) {
  // some stuff that computes output from input...
  const output = input; // placeholder so the example runs
  return output;
}

parentPort.on('message', async input => {
  const output = await job(input);
  parentPort.postMessage(output);
});

As you can see, the example I gave is suitable for some filters (like meta_generator and backtick_code_filter), where we pass input to the filter and get output from it. But for more complicated jobs (like post rendering and template rendering), worker_threads still can't help.

cc @hexojs/core @tommy351

@SukkaW
Member Author

SukkaW commented Jun 12, 2020

cc @hexojs/core @curbengh @stevenjoezhang @jiangtj @segayuu @yoshinorin @JLHwung

Should we update the minimum required Node.js version to 12? Hexo 5.0.0 itself might not need such a recent Node.js version, but the bump would let us bring in more features during Hexo 5.x development.

@curbengh
Contributor

Should we update the minimum required Node.js version to 12?
As you can see, the example I gave is suitable for some filters

I'm ok with bumping to Node 12, as long as only filters are affected, to minimize the delay to 5.0.0. Perhaps only change 1-2 filters for now; other filters can then be updated during 5.x.

@SukkaW
Member Author

SukkaW commented Jun 20, 2020

@curbengh We could even release 5.0.0 first, then add multi core support from 5.1.0.

@curbengh
Contributor

We could even release 5.0.0 first

It would be better to have at least one filter that utilizes this API in 5.0.0, to justify bumping to Node 12 (and demonstrate the benefit of that bump).

@SukkaW
Member Author

SukkaW commented Jun 20, 2020

@curbengh

We could start with the backtick_code filter.

Take a look at the flamegraph: https://29e28e2d8f6f8fdb247ad2c47788857d003fd894-12-hexo.surge.sh/flamegraph.html

It seems to be a long task.

@tuananh

tuananh commented Jun 25, 2020

This is nice. I have had a very good experience with piscina. It's a nice wrapper (and more) around worker_threads.

https://github.com/piscinajs/piscina

@SukkaW
Member Author

SukkaW commented Jun 25, 2020

@tuananh LGTM! It definitely seems better than my WorkerPool: hexojs/hexo-util#212

@tuananh

tuananh commented Jun 25, 2020

I gave it a try to optimize backtick_code but got a DataCloneError.

I haven't gotten around to fixing it yet. Not sure whether it has anything to do with the way Hexo calls all the filters:

return Promise.each(filters, filter => Reflect.apply(Promise.method(filter), ctx, args).then(result => {
  args[0] = result == null ? args[0] : result;
  return args[0];
})).then(() => args[0]);

@SukkaW
Member Author

SukkaW commented Jun 27, 2020

I haven't gotten around to fixing it yet. Not sure whether it has anything to do with the way Hexo calls all the filters

@tuananh The entire Hexo context simply cannot be passed to a worker. Only values that can be cloned (like strings, numbers, and plain objects) can be passed to a worker.
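The difference can be reproduced without starting a worker, since a `MessageChannel` port clones messages the same way. The sketch below is an illustration only: `tryPost` and the `contextLike` object are hypothetical stand-ins, with `contextLike` mimicking a Hexo context only in that it carries a method.

```javascript
const { MessageChannel } = require('worker_threads');

// Returns null if the value can be posted, otherwise the error's name.
function tryPost(value) {
  const { port1, port2 } = new MessageChannel();
  try {
    port1.postMessage(value); // cloning happens synchronously here
    return null;
  } catch (err) {
    return err.name;
  } finally {
    port1.close();
    port2.close();
  }
}

// Plain data: strings inside a plain object.
const plainInput = { path: 'hello-world.md', content: '# Hello' };

// Hypothetical stand-in for a context object: it carries a function.
const contextLike = { config: {}, render() { return ''; } };

console.log(tryPost(plainInput));  // null
console.log(tryPost(contextLike)); // DataCloneError
```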

@SukkaW
Member Author

SukkaW commented Jun 27, 2020

Here's what we can learn from #4368:

According to the worker_threads documentation:

value will be transferred in a way which is compatible with the HTML structured clone algorithm.

Which means:

Function objects cannot be duplicated by the structured clone algorithm; attempting to throws a DATA_CLONE_ERR exception.

The structured clone algorithm also means that communicating with threads is expensive, just like creating and destroying them.
We should keep the input and output pure and simple (containing only the required information) to make the structured clone faster.
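As a sketch of what "pure and simple" could look like in practice, the helpers below copy only the fields a worker needs and write the reply back on the main thread. The names `toWorkerInput` and `applyWorkerOutput` and the fields `id`, `text`, and `html` are hypothetical; they are not Hexo's actual post model.

```javascript
// Hypothetical helper: keep only the plain fields the worker needs.
// Methods, caches, and references back to the context stay behind.
function toWorkerInput(post) {
  return { id: post.id, text: post.text };
}

// Hypothetical helper: merge the worker's plain reply back into the
// full object, which never left the main thread.
function applyWorkerOutput(post, output) {
  post.content = output.html;
  return post;
}

// Example: a post-like object with members that cannot be cloned.
const post = {
  id: 'hello-world',
  text: '# Hello',
  render() { return ''; }, // functions cannot be cloned
  site: { posts: [] }      // large data the worker does not need
};

console.log(toWorkerInput(post)); // { id: 'hello-world', text: '# Hello' }
```

The smaller the object that crosses the thread boundary, the less work the clone has to do for each message.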

@tuananh

tuananh commented Jun 28, 2020

@SukkaW That's probably it. To change that, would we need to change the way we pass the Hexo instance around?

SukkaW changed the title from "The proposal of using Worker Threads" to "The proposal of using Multi Threads" on Aug 2, 2020

@SukkaW
Member Author

SukkaW commented Aug 2, 2020

Instead of worker_threads, I am considering using the cluster API.

The cluster API is much simpler, and has been stable since Node.js 4.0. It does not involve the "structured clone algorithm" either.

The only problem is that cluster is designed to handle multiple HTTP requests. We have to find a way to adapt it to Hexo.
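A rough sketch of using cluster for plain task processing rather than HTTP is shown below. It assumes the file is run directly with `node`, because every forked process re-runs the same file; `runInCluster` and the upper-casing job are placeholders. Note that messages sent with `worker.send()` are still serialized (JSON by default), so functions and a full Hexo context cannot cross the process boundary here either.

```javascript
const cluster = require('cluster');

// cluster.isPrimary exists on newer Node.js releases; older ones only
// provide the isMaster alias.
const isPrimary = cluster.isPrimary === undefined ? cluster.isMaster : cluster.isPrimary;

// Primary side: fork up to workerCount processes and hand out tasks one
// at a time, keeping outputs in input order.
function runInCluster(inputs, workerCount) {
  return new Promise((resolve, reject) => {
    const outputs = new Array(inputs.length);
    const workers = [];
    let nextIndex = 0;
    let finished = 0;
    if (inputs.length === 0) return resolve(outputs);

    const dispatch = worker => {
      if (nextIndex < inputs.length) {
        worker.send({ index: nextIndex, input: inputs[nextIndex] });
        nextIndex++;
      }
    };

    for (let i = 0; i < Math.min(workerCount, inputs.length); i++) {
      const worker = cluster.fork();
      workers.push(worker);
      worker.on('error', reject);
      worker.on('message', ({ index, output }) => {
        outputs[index] = output;
        finished++;
        if (finished === inputs.length) {
          // Disconnect only the workers this call created.
          workers.forEach(w => w.disconnect());
          resolve(outputs);
        } else {
          dispatch(worker);
        }
      });
      dispatch(worker);
    }
  });
}

if (isPrimary) {
  runInCluster(['a', 'b', 'c'], 2).then(outputs => {
    console.log(outputs); // [ 'A', 'B', 'C' ]
  });
} else {
  // Worker side: placeholder job that upper-cases its input.
  process.on('message', ({ index, input }) => {
    process.send({ index, output: input.toUpperCase() });
  });
}
```

Each forked process is a full Node.js instance with its own memory, so this trades the thread-level cloning cost for process startup and IPC cost.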

@curbengh @tuananh

@stevenjoezhang
Member

From the perspective of 2024, the support for multithreading in Node.js has not improved. The rendering process of posts heavily relies on Hexo's ctx, but without the ability to use shared memory, worker threads cannot directly access the global variables in Hexo.

@SukkaW
Member Author

SukkaW commented Apr 14, 2024

From the perspective of 2024, the support for multithreading in Node.js has not improved. The rendering process of posts heavily relies on Hexo's ctx, but without the ability to use shared memory, worker threads cannot directly access the global variables in Hexo.

So this basically leaves us with 2 options:

  • Creating multiple Hexo instances in different worker threads. In every thread, we will read the config and posts.
  • Offloading limited heavy tasks to the worker threads (markdown rendering? nunjucks rendering?) while retaining one main Hexo instance.

@tuananh

tuananh commented Apr 16, 2024

From the perspective of 2024, the support for multithreading in Node.js has not improved. The rendering process of posts heavily relies on Hexo's ctx, but without the ability to use shared memory, worker threads cannot directly access the global variables in Hexo.

So this basically leaves us with 2 options:

* Creating multiple Hexo instances in different worker threads. In every thread, we will read the config and posts.

* Offloading limited heavy tasks to the worker threads (markdown rendering? nunjucks rendering?) while retaining one main Hexo instance.

Option 2 sounds better to me.

5 participants
@tuananh @stevenjoezhang @SukkaW @curbengh and others