Reading the source to understand Node.js startup, require, and module #24

Open
abbshr opened this issue Aug 28, 2014 · 3 comments


abbshr commented Aug 28, 2014

Exploring the Node.js startup process

Source files involved

src/node_main.cc
src/node.h (src/node.cc)
src/node.js
src/env.h

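This post came about by accident. I was looking on NPM for a low-level networking module that could handle ARP, and after a long search found nothing suitable. The closest match was the raw_socket module, but it only handles protocols above IP plus ICMP datagrams. Searching Google and Baidu turned up nothing either, so I decided to add the low-level functionality myself and went looking for the C/C++ addon documentation. That is where I "went astray": learning about addons led to studying module loading, which in the end turned into reading the source.

That is no bad thing. Revisiting Node now, from a design and implementation angle, is rewarding in its own way.

Open the Node source tree and anyone familiar with building from source will spot the src and lib directories right away. The layout is clear; these are the main parts: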

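  • deps/ Dependencies of Node's core, including the V8 engine source, the libuv source, openssl, npm, and so on
  • lib/ JavaScript core modules (*.js), such as http.js and net.js
  • src/ The core source of Node's architecture plus the C++ core (built-in) modules (*.cc | *.h)
  • tools/ Build tooling such as gyp and js2c.py, used for preprocessing and for compiling the source into binaries
  • node.gyp The main build configuration file
  • common.gyp Another build configuration file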

To see how Node works, start in the src directory and open node_main.cc. The last few lines of the file contain the familiar int main() function, where the process begins:

  // UNIX
  int main(int argc, char *argv[]) {
    return node::Start(argc, argv);
  }
  #endif

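I walk through the code in the order the Node process actually executes it, so some of the nesting shown below does not match the real code structure. My comments explain what is going on in each case.

Next comes src/node.cc, which holds the main execution logic. The Start(argc, argv) function called from node_main.cc is implemented here: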

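  // Source line 3581: the Start function. It handles basic setup: initializing the program's environment, configuring V8, preparing the libuv event loop, and so on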
  int Start(int argc, char** argv) {

    // ...
    // ...

    // Hack around with the argv pointer. Used for process.title = "blah".
    argv = uv_setup_args(argc, argv);

    // This needs to run *before* V8::Initialize().  The const_cast is not
    // optional, in case you're wondering.
    int exec_argc;
    const char** exec_argv;
    // Source line 3601: call Init. As the comment above says, this must happen before V8::Initialize().
    Init(&argc, const_cast<const char**>(argv), &exec_argc, &exec_argv);

    // Source line 3360: definition of Init. It receives the original argument count, argument pointers, and so on, and does the actual initialization
    void Init(int* argc,
          const char** argv,
          int* exec_argc,
          const char*** exec_argv) {

      // Some libuv initialization happens here.

      // Initialize prog_start_time to get relative uptime.
      prog_start_time = uv_now(uv_default_loop());

      // Make inherited handles noninheritable.
      uv_disable_stdio_inheritance();

      // init async debug messages dispatching
      // FIXME(bnoordhuis) Should be per-isolate or per-context, not global.
      uv_async_init(uv_default_loop(),
                    &dispatch_debug_messages_async,
                    DispatchDebugMessagesAsyncCallback);
      uv_unref(reinterpret_cast<uv_handle_t*>(&dispatch_debug_messages_async));

      // A few more functions follow that set up V8 and process the command-line arguments
      // ...
      // ...

      // Source line 3610: Init has returned; V8::Initialize() runs and startup enters its final phase
      V8::Initialize();
      {
        Locker locker(node_isolate);
        HandleScope handle_scope(node_isolate);
        Local<Context> context = Context::New(node_isolate);
        // The important env variable; it is used all over the code.
        // The env object is created by the CreateEnvironment function
        Environment* env = CreateEnvironment(
            node_isolate, context, argc, argv, exec_argc, exec_argv);

        // Source line 3534: definition of CreateEnvironment
        Environment* CreateEnvironment(Isolate* isolate,
                               Handle<Context> context,
                               int argc,
                               const char* const* argv,
                               int exec_argc,
                               const char* const* exec_argv) {
          HandleScope handle_scope(isolate);

          Context::Scope context_scope(context);
          // This is where the env object is actually created
          Environment* env = Environment::New(context);

          uv_check_init(env->event_loop(), env->immediate_check_handle());
          uv_unref(
              reinterpret_cast<uv_handle_t*>(env->immediate_check_handle()));
          uv_idle_init(env->event_loop(), env->immediate_idle_handle());

          // Inform V8's CPU profiler when we're idle.  The profiler is sampling-based
          // but not all samples are created equal; mark the wall clock time spent in
          // epoll_wait() and friends so profiling tools can filter it out.  The samples
          // still end up in v8.log but with state=IDLE rather than state=EXTERNAL.
          // TODO(bnoordhuis) Depends on a libuv implementation detail that we should
          // probably fortify in the API contract, namely that the last started prepare
          // or check watcher runs first.  It's not 100% foolproof; if an add-on starts
          // a prepare or check watcher after us, any samples attributed to its callback
          // will be recorded with state=IDLE.
          uv_prepare_init(env->event_loop(), env->idle_prepare_handle());
          uv_check_init(env->event_loop(), env->idle_check_handle());
          uv_unref(reinterpret_cast<uv_handle_t*>(env->idle_prepare_handle()));
          uv_unref(reinterpret_cast<uv_handle_t*>(env->idle_check_handle())); 

          if (v8_is_profiling) {
            StartProfilerIdleNotifier(env);
          }

          Local<FunctionTemplate> process_template = FunctionTemplate::New(isolate);
          // The class name "process" is set on the template here
          process_template->SetClassName(FIXED_ONE_BYTE_STRING(isolate, "process"));

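          // Pay attention here: this process object is what gets passed to the main JS file (src/node.js) later
          Local<Object> process_object = process_template->GetFunction()->NewInstance();
          // Also important: from now on the process object is reached through env
          env->set_process_object(process_object);
          // Right after that, the process object is configured in detail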
          SetupProcessObject(env, argc, argv, exec_argc, exec_argv);

          // ...

          // Source line 2586: definition of SetupProcessObject. You will recognize what it sets up: the properties and methods of the process object in the Node environment
          void SetupProcessObject(Environment* env,
                        int argc,
                        const char* const* argv,
                        int exec_argc,
                        const char* const* exec_argv) {
            HandleScope scope(env->isolate());

            // Fetch the process object created in CreateEnvironment
            Local<Object> process = env->process_object();

            process->SetAccessor(env->title_string(),
                                 ProcessTitleGetter,
                                 ProcessTitleSetter);
            // The rest should be self-explanatory

            // The READONLY_PROPERTY macro defines a read-only property

            // process.version
            READONLY_PROPERTY(process,
                              "version",
                              FIXED_ONE_BYTE_STRING(env->isolate(), NODE_VERSION));

            // process.moduleLoadList
            READONLY_PROPERTY(process,
                              "moduleLoadList",
                              env->module_load_list_array());

            // process.versions
            Local<Object> versions = Object::New(env->isolate());
            READONLY_PROPERTY(process, "versions", versions);

            const char http_parser_version[] = NODE_STRINGIFY(HTTP_PARSER_VERSION_MAJOR)
                                               "."
                                               NODE_STRINGIFY(HTTP_PARSER_VERSION_MINOR);
            READONLY_PROPERTY(versions,
                              "http_parser",
                              FIXED_ONE_BYTE_STRING(env->isolate(), http_parser_version));

            // +1 to get rid of the leading 'v'
            READONLY_PROPERTY(versions,
                              "node",
                              OneByteString(env->isolate(), NODE_VERSION + 1));
            READONLY_PROPERTY(versions,
                              "v8",
                              OneByteString(env->isolate(), V8::GetVersion()));
            READONLY_PROPERTY(versions,
                              "uv",
                              OneByteString(env->isolate(), uv_version_string()));
            READONLY_PROPERTY(versions,
                              "zlib",
                              FIXED_ONE_BYTE_STRING(env->isolate(), ZLIB_VERSION));

            const char node_modules_version[] = NODE_STRINGIFY(NODE_MODULE_VERSION);
            READONLY_PROPERTY(
                versions,
                "modules",
                FIXED_ONE_BYTE_STRING(env->isolate(), node_modules_version));

          #if HAVE_OPENSSL
            // Stupid code to slice out the version string.
            {  // NOLINT(whitespace/braces)
              size_t i, j, k;
              int c;
              for (i = j = 0, k = sizeof(OPENSSL_VERSION_TEXT) - 1; i < k; ++i) {
                c = OPENSSL_VERSION_TEXT[i];
                if ('0' <= c && c <= '9') {
                  for (j = i + 1; j < k; ++j) {
                    c = OPENSSL_VERSION_TEXT[j];
                    if (c == ' ')
                      break;
                  }
                  break;
                }
              }
              READONLY_PROPERTY(
                  versions,
                  "openssl",
                  OneByteString(env->isolate(), &OPENSSL_VERSION_TEXT[i], j - i));
            }
          #endif

            // process.arch
            READONLY_PROPERTY(process, "arch", OneByteString(env->isolate(), ARCH));

            // process.platform
            READONLY_PROPERTY(process,
                              "platform",
                              OneByteString(env->isolate(), PLATFORM));

            // Build process.argv from the argc and argv passed in when the process started

            // process.argv
            Local<Array> arguments = Array::New(env->isolate(), argc);
            for (int i = 0; i < argc; ++i) {
              arguments->Set(i, String::NewFromUtf8(env->isolate(), argv[i]));
            }
            process->Set(env->argv_string(), arguments);

            // process.execArgv
            Local<Array> exec_arguments = Array::New(env->isolate(), exec_argc);
            for (int i = 0; i < exec_argc; ++i) {
              exec_arguments->Set(i, String::NewFromUtf8(env->isolate(), exec_argv[i]));
            }
            process->Set(env->exec_argv_string(), exec_arguments);

            // create process.env
            Local<ObjectTemplate> process_env_template =
                ObjectTemplate::New(env->isolate());
            process_env_template->SetNamedPropertyHandler(EnvGetter,
                                                          EnvSetter,
                                                          EnvQuery,
                                                          EnvDeleter,
                                                          EnvEnumerator,
                                                          Object::New(env->isolate()));
            Local<Object> process_env = process_env_template->NewInstance();
            process->Set(env->env_string(), process_env);

            READONLY_PROPERTY(process, "pid", Integer::New(env->isolate(), getpid()));
            READONLY_PROPERTY(process, "features", GetFeatures(env));
            process->SetAccessor(env->need_imm_cb_string(),
                NeedImmediateCallbackGetter,
                NeedImmediateCallbackSetter);

            // Configure process according to the command-line options

            // -e, --eval
            if (eval_string) {
              READONLY_PROPERTY(process,
                                "_eval",
                                String::NewFromUtf8(env->isolate(), eval_string));
            }

            // -p, --print
            if (print_eval) {
              READONLY_PROPERTY(process, "_print_eval", True(env->isolate()));
            }

            // -i, --interactive
            if (force_repl) {
              READONLY_PROPERTY(process, "_forceRepl", True(env->isolate()));
            }

            // --no-deprecation
            if (no_deprecation) {
              READONLY_PROPERTY(process, "noDeprecation", True(env->isolate()));
            }

            // --throw-deprecation
            if (throw_deprecation) {
              READONLY_PROPERTY(process, "throwDeprecation", True(env->isolate()));
            }

            // --trace-deprecation
            if (trace_deprecation) {
              READONLY_PROPERTY(process, "traceDeprecation", True(env->isolate()));
            }

            size_t exec_path_len = 2 * PATH_MAX;
            char* exec_path = new char[exec_path_len];
            Local<String> exec_path_value;
            if (uv_exepath(exec_path, &exec_path_len) == 0) {
              exec_path_value = String::NewFromUtf8(env->isolate(),
                                                    exec_path,
                                                    String::kNormalString,
                                                    exec_path_len);
            } else {
              exec_path_value = String::NewFromUtf8(env->isolate(), argv[0]);
            }
            process->Set(env->exec_path_string(), exec_path_value);
            delete[] exec_path;

            process->SetAccessor(env->debug_port_string(),
                                 DebugPortGetter,
                                 DebugPortSetter);

            // Define a series of methods on process
            // define various internal methods
            NODE_SET_METHOD(process,
                            "_startProfilerIdleNotifier",
                            StartProfilerIdleNotifier);
            NODE_SET_METHOD(process,
                            "_stopProfilerIdleNotifier",
                            StopProfilerIdleNotifier);
            NODE_SET_METHOD(process, "_getActiveRequests", GetActiveRequests);
            NODE_SET_METHOD(process, "_getActiveHandles", GetActiveHandles);
            NODE_SET_METHOD(process, "reallyExit", Exit);
            NODE_SET_METHOD(process, "abort", Abort);
            NODE_SET_METHOD(process, "chdir", Chdir);
            NODE_SET_METHOD(process, "cwd", Cwd);

            NODE_SET_METHOD(process, "umask", Umask);

          #if defined(__POSIX__) && !defined(__ANDROID__)
            NODE_SET_METHOD(process, "getuid", GetUid);
            NODE_SET_METHOD(process, "setuid", SetUid);

            NODE_SET_METHOD(process, "setgid", SetGid);
            NODE_SET_METHOD(process, "getgid", GetGid);

            NODE_SET_METHOD(process, "getgroups", GetGroups);
            NODE_SET_METHOD(process, "setgroups", SetGroups);
            NODE_SET_METHOD(process, "initgroups", InitGroups);
          #endif  // __POSIX__ && !defined(__ANDROID__)

            NODE_SET_METHOD(process, "_kill", Kill);

            NODE_SET_METHOD(process, "_debugProcess", DebugProcess);
            NODE_SET_METHOD(process, "_debugPause", DebugPause);
            NODE_SET_METHOD(process, "_debugEnd", DebugEnd);

            NODE_SET_METHOD(process, "hrtime", Hrtime);

            // process.dlopen is bound here; it loads compiled C++ addon modules (shared libraries)
            NODE_SET_METHOD(process, "dlopen", DLOpen);

            NODE_SET_METHOD(process, "uptime", Uptime);
            NODE_SET_METHOD(process, "memoryUsage", MemoryUsage);

            // process.binding, used to load the C++ core modules
            NODE_SET_METHOD(process, "binding", Binding);

            NODE_SET_METHOD(process, "_setupAsyncListener", SetupAsyncListener);
            NODE_SET_METHOD(process, "_setupNextTick", SetupNextTick);
            NODE_SET_METHOD(process, "_setupDomainUse", SetupDomainUse);

            // pre-set _events object for faster emit checks
            process->Set(env->events_string(), Object::New(env->isolate()));
          }

          // ...

          // After SetupProcessObject, back in CreateEnvironment, Load is called
          Load(env);

          // Source line 2836: definition of Load. It acts as the bridge between the C++ and JavaScript worlds:
          // it loads and runs the src/node.js file
          void Load(Environment* env) {
            HandleScope handle_scope(env->isolate());

            // Compile, execute the src/node.js file. (Which was included as static C
            // string in node_natives.h. 'natve_node' is the string containing that
            // source code.)

            // The node.js file returns a function 'f'
            atexit(AtExit);

            TryCatch try_catch;

            // Disable verbose mode to stop FatalException() handler from trying
            // to handle the exception. Errors this early in the start-up phase
            // are not safe to ignore.
            try_catch.SetVerbose(false);

            // Get ready to hand over to src/node.js
            Local<String> script_name = FIXED_ONE_BYTE_STRING(env->isolate(), "node.js");
            // MainSource(env) supplies the node.js source text; ExecuteString compiles and runs it
            Local<Value> f_value = ExecuteString(env, MainSource(env), script_name);
            if (try_catch.HasCaught())  {
              ReportException(env, try_catch);
              exit(10);
            }
            assert(f_value->IsFunction());
            // f_value is the JavaScript function that node.js evaluates to; the cast only changes the handle type to Local<Function>
            Local<Function> f = Local<Function>::Cast(f_value);

            // Now we call 'f' with the 'process' variable that we've built up with
            // all our bindings. Inside node.js we'll take care of assigning things to
            // their places.

            // We start the process this way in order to be more modular. Developers
            // who do not like how 'src/node.js' setups the module system but do like
            // Node's I/O bindings may want to replace 'f' with their own function.

            // Add a reference to the global object
            Local<Object> global = env->context()->Global();
            // ...
            // ...
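            // As the comment above spells out, call that JavaScript function with the process object built earlier as its argument
            Local<Value> arg = env->process_object();
            // f is invoked with global as its receiver, the equivalent of .call() in JavaScript
            // From this point on, the process is running JavaScript.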
            f->Call(global, 1, &arg);
          }

          // ...
          // ...

          // Finally CreateEnvironment returns the newly created env object
          return env;
        }


        // ...
        // Source line 3626: back in the body of Start; the block below runs the event loop
        // This Context::Scope is here so EnableDebug() can look up the current
        // environment with Environment::GetCurrentChecked().
        // TODO(bnoordhuis) Reorder the debugger initialization logic so it can
        // be removed.
        {
          Context::Scope context_scope(env->context());
          bool more;
          do {
            more = uv_run(env->event_loop(), UV_RUN_ONCE);
            if (more == false) {
              EmitBeforeExit(env);

              // Emit `beforeExit` if the loop became alive either after emitting
              // event, or after running some callbacks.
              more = uv_loop_alive(env->event_loop());
              if (uv_run(env->event_loop(), UV_RUN_NOWAIT) != 0)
                more = true;
            }
          } while (more == true);
          code = EmitExit(env);
          RunAtExit(env);
        }
        env->Dispose();
        env = NULL;
      }

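      // Once the event loop's reference count drops to 0, that is, no active watchers remain, the loop exits and cleanup begins:
      // event handlers are unregistered, objects and variables are destroyed, system resources are released, and so on.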
      CHECK_NE(node_isolate, NULL);
      node_isolate->Dispose();
      node_isolate = NULL;
      V8::Dispose();

      delete[] exec_argv;
      exec_argv = NULL;

      // Finally, return the exit code and end the process.
      return code;
    }
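The shutdown logic at the end of Start can be modelled in plain JavaScript. The sketch below is a toy, not libuv: `ToyLoop`, `runOnce`, and `alive` are hypothetical names standing in for `uv_run(..., UV_RUN_ONCE)` and `uv_loop_alive`, and a simple task queue replaces real handles. It only illustrates why a `beforeExit` listener that schedules new work keeps the process running.

```javascript
// Toy model of the do/while loop at the end of Start().
// Assumption: a plain task queue stands in for libuv handles and requests.
class ToyLoop {
  constructor() { this.queue = []; }
  schedule(task) { this.queue.push(task); }
  // Stand-in for uv_loop_alive(): is there still pending work?
  alive() { return this.queue.length > 0; }
  // Stand-in for uv_run(loop, UV_RUN_ONCE): run one task, report liveness.
  runOnce() {
    const task = this.queue.shift();
    if (task) task();
    return this.alive();
  }
}

function runToCompletion(loop, emitBeforeExit, emitExit) {
  let more;
  do {
    more = loop.runOnce();
    if (!more) {
      emitBeforeExit();
      // A beforeExit listener may have scheduled new work; if so, keep going.
      more = loop.alive();
    }
  } while (more);
  return emitExit();
}

const log = [];
const loop = new ToyLoop();
loop.schedule(() => log.push('main work'));

let revived = false;
const exitCode = runToCompletion(
  loop,
  () => {
    log.push('beforeExit');
    if (!revived) {
      revived = true;
      loop.schedule(() => log.push('late work'));
    }
  },
  () => { log.push('exit'); return 0; }
);
console.log(log.join(' -> '), '| exit code', exitCode);
```

The first `beforeExit` revives the loop by scheduling one more task; the second finds nothing new, so the loop ends and the exit step runs.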

OK, remember the spot in the flow above where the JavaScript file src/node.js gets called?

  f->Call(global, 1, &arg);
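In JavaScript terms, that C++ call behaves roughly like `f.call(global, process)`. A minimal sketch, in which `fakeGlobal` and `fakeProcess` are made-up stand-ins for the real objects that the C++ side provides:

```javascript
// Sketch: what `f->Call(global, 1, &arg)` amounts to on the JavaScript side.
// Assumption: fakeGlobal and fakeProcess are illustrative stand-ins only.
const fakeGlobal = {};
const fakeProcess = { version: 'v0.0.0-sketch' };

// src/node.js evaluates to a function expression shaped like this one.
const f = function (process) {
  this.global = this;                  // `this` is the receiver given to call()
  this.seenVersion = process.version;  // `process` arrives as the only argument
};

f.call(fakeGlobal, fakeProcess);

console.log(fakeGlobal.global === fakeGlobal); // true
console.log(fakeGlobal.seenVersion);           // v0.0.0-sketch
```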

As noted, f is invoked as a method of the global object. Now let's look at the src/node.js source itself:

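// Why split execution into two phases? Put differently, why pull this part out?
// Modular thinking shows up everywhere in Node, in keeping with the Unix design philosophy.
// One reason for the split is modular design. The other is to let JavaScript developers customize this part:
// the default module setup can be rewritten in JavaScript, which has a low barrier to entry.
// In the original words: "Developers who do not like how 'src/node.js' setups the module system but do like
//         Node's I/O bindings may want to replace 'f' with their own function."
// In other words, this separated JavaScript code contains no low-level I/O design; it only exposes the design of the module loading system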

(function(process) {
  // The global object passed from C++ becomes this function's `this`
  // The assignment makes global self-referential: global.global.global...
  this.global = this;

  // The core logic of this file: setting up the JavaScript execution environment
  function startup() {
    var EventEmitter = NativeModule.require('events').EventEmitter;

    process.__proto__ = Object.create(EventEmitter.prototype, {
      constructor: {
        value: process.constructor
      }
    });
    EventEmitter.call(process);

    process.EventEmitter = EventEmitter; // process.EventEmitter is deprecated

    // Setup the tracing module
    NativeModule.require('tracing')._nodeInitialization(process);

    // do this good and early, since it handles errors.
    startup.processFatal();

    startup.globalVariables();
    startup.globalTimeouts();
    startup.globalConsole();

    startup.processAssert();
    startup.processConfig();
    startup.processNextTick();
    startup.processStdio();
    startup.processKillAndExit();
    startup.processSignalHandlers();

    startup.processChannel();

    startup.processRawDebug();

    startup.resolveArgv0();

    // There are various modes that Node can run in. The most common two
    // are running from a script and running the REPL - but there are a few
    // others like the debugger or running --eval arguments. Here we decide
    // which mode we run in.
    if (NativeModule.exists('_third_party_main')) {
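      // Note: if you only want to extend Node, avoid adding your extension module here.
      // This branch contains nothing but a single nextTick, so once that code has run the process is done, unless you rewrite this part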

      // To allow people to extend Node in different ways, this hook allows
      // one to drop a file lib/_third_party_main.js into the build
      // directory which will be executed instead of Node's normal loading.
      process.nextTick(function() {
        NativeModule.require('_third_party_main');
      });

    } else if (process.argv[1] == 'debug') {
      // Start the debugger agent
      var d = NativeModule.require('_debugger');
      d.start();

    } else if (process._eval != null) {
      // User passed '-e' or '--eval' arguments to Node.
      evalScript('[eval]');
    } else if (process.argv[1]) {
      // The normal startup mode: run your js file
      // make process.argv[1] into a full path
      var path = NativeModule.require('path');
      process.argv[1] = path.resolve(process.argv[1]);

      // If this is a worker in cluster mode, start up the communication
      // channel.
      if (process.env.NODE_UNIQUE_ID) {
        var cluster = NativeModule.require('cluster');
        cluster._setupWorker();

        // Make sure it's not accidentally inherited by child processes.
        delete process.env.NODE_UNIQUE_ID;
      }

      // To make the standard module loading system (require) available,
      // the core module lib/module.js is preloaded here through the core loader NativeModule.require
      var Module = NativeModule.require('module');

      if (global.v8debug &&
          process.execArgv.some(function(arg) {
            return arg.match(/^--debug-brk(=[0-9]*)?$/);
          })) {

        // XXX Fix this terrible hack!
        //
        // Give the client program a few ticks to connect.
        // Otherwise, there's a race condition where `node debug foo.js`
        // will not be able to connect in time to catch the first
        // breakpoint message on line 1.
        //
        // A better fix would be to somehow get a message from the
        // global.v8debug object about a connection, and runMain when
        // that occurs.  --isaacs

        var debugTimeout = +process.env.NODE_DEBUG_TIMEOUT || 50;
        setTimeout(Module.runMain, debugTimeout);

      } else {
        // Main entry point into most programs:
        Module.runMain();
      }

    } else {
      // The last option: the interactive REPL, used when no script argument is given
      var Module = NativeModule.require('module');

      // If -i or --interactive were passed, or stdin is a TTY.
      if (process._forceRepl || NativeModule.require('tty').isatty(0)) {
        // REPL
        var opts = {
          useGlobal: true,
          ignoreUndefined: false
        };
        if (parseInt(process.env['NODE_NO_READLINE'], 10)) {
          opts.terminal = false;
        }
        if (parseInt(process.env['NODE_DISABLE_COLORS'], 10)) {
          opts.useColors = false;
        }
        var repl = Module.requireRepl().start(opts);
        repl.on('exit', function() {
          process.exit();
        });

      } else {
        // Read all of stdin - execute it.
        process.stdin.setEncoding('utf8');

        var code = '';
        process.stdin.on('data', function(d) {
          code += d;
        });

        process.stdin.on('end', function() {
          process._eval = code;
          evalScript('[stdin]');
        });
      }
    }
  }

  startup.globalVariables = function() {
    // The familiar global variables are defined here
    global.process = process;
    global.global = global;
    global.GLOBAL = global;
    global.root = global;
    global.Buffer = NativeModule.require('buffer').Buffer;
    process.domain = null;
    process._exiting = false;
  };

  startup.globalTimeouts = function() {
    global.setTimeout = function() {
      var t = NativeModule.require('timers');
      return t.setTimeout.apply(this, arguments);
    };

    global.setInterval = function() {
      var t = NativeModule.require('timers');
      return t.setInterval.apply(this, arguments);
    };

    global.clearTimeout = function() {
      var t = NativeModule.require('timers');
      return t.clearTimeout.apply(this, arguments);
    };

    global.clearInterval = function() {
      var t = NativeModule.require('timers');
      return t.clearInterval.apply(this, arguments);
    };

    global.setImmediate = function() {
      var t = NativeModule.require('timers');
      return t.setImmediate.apply(this, arguments);
    };

    global.clearImmediate = function() {
      var t = NativeModule.require('timers');
      return t.clearImmediate.apply(this, arguments);
    };
  };

  startup.globalConsole = function() {
    global.__defineGetter__('console', function() {
      return NativeModule.require('console');
    });
  };


  startup._lazyConstants = null;

  startup.lazyConstants = function() {
    if (!startup._lazyConstants) {
      startup._lazyConstants = process.binding('constants');
    }
    return startup._lazyConstants;
  };

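  // Some source is omitted here, including the initialization functions for process.nextTick, stream handling, signal handling, and so on,
  // as well as the core module loading system discussed later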
  // ...
  // ...

  // Finally, call startup to carry out the core task of this file
  startup();
});
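`startup.globalTimeouts` and `startup.globalConsole` share one idea: the global is installed immediately, but the module behind it is only loaded on first use. A minimal sketch of that pattern follows. `fakeRequire`, `loadCount`, and `sandbox` are hypothetical names used for illustration, and `Object.defineProperty` replaces the legacy `__defineGetter__` used in the source.

```javascript
// Sketch of the lazy-loading pattern used for global.console and the timers.
// Assumption: fakeRequire stands in for NativeModule.require and counts loads.
let loadCount = 0;
const moduleCache = {};
function fakeRequire(id) {
  if (!moduleCache[id]) {
    loadCount += 1;
    moduleCache[id] = { log: (msg) => 'logged: ' + msg };
  }
  return moduleCache[id];
}

// Assumption: sandbox plays the role of the global object.
const sandbox = {};

// Install a getter instead of the module itself; nothing is loaded yet.
Object.defineProperty(sandbox, 'console', {
  configurable: true,
  enumerable: true,
  get() { return fakeRequire('console'); }
});

console.log(loadCount);                      // 0: defined but not loaded
const first = sandbox.console.log('hi');     // first access triggers the load
const second = sandbox.console.log('again'); // served from the cache
console.log(first, second, loadCount);       // loadCount is now 1
```

The benefit is startup cost: a program that never touches a given global never pays for loading the module behind it.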

That is the startup flow of a Node process. The next topic is how modules are loaded in Node.

The modular design philosophy

Source files involved

lib/module.js
src/node.js
src/node_extensions.h
src/node_extensions.cc

Requiring modules is part of (practically) every Node program. Let's start with a piece of the high-level module loading system, lib/module.js:

  // The native_module module is loaded first,
  // yet there is no native_module file under lib.
  // And where does the require function come from?
  var NativeModule = require('native_module');

Here is a detailed look at the module-loading part of src/node.js:

  // From here on, this is Node's loader for the JavaScript core modules
  function NativeModule(id) {
    this.filename = id + '.js';
    this.id = id;
    this.exports = {};
    this.loaded = false;
  }

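  // Load the source of every JavaScript core module under lib into an object that maps module id to source string
  // Note that none of these modules has been compiled at this point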
  NativeModule._source = process.binding('natives');
  NativeModule._cache = {};

  // The low-level implementation behind require
  NativeModule.require = function(id) {
    // This answers our first question:
    // NativeModule.require('native_module') returns NativeModule itself
    if (id == 'native_module') {
      return NativeModule;
    }

    var cached = NativeModule.getCached(id);
    if (cached) {
      // A cached copy of the module exists, so use it directly
      return cached.exports;
    }

    if (!NativeModule.exists(id)) {
      throw new Error('No such native module ' + id);
    }

    // Add it to the list of loaded modules
    process.moduleLoadList.push('NativeModule ' + id);

    // Create a new NativeModule object for this module
    var nativeModule = new NativeModule(id);

    // Cache the module
    nativeModule.cache();
    // and compile its source
    nativeModule.compile();

    // Finally return the module's exports object
    return nativeModule.exports;
  };

  NativeModule.getCached = function(id) {
    return NativeModule._cache[id];
  }

  NativeModule.exists = function(id) {
    return NativeModule._source.hasOwnProperty(id);
  }

  NativeModule.getSource = function(id) {
    return NativeModule._source[id];
  }

  NativeModule.wrap = function(script) {
    return NativeModule.wrapper[0] + script + NativeModule.wrapper[1];
  };

  NativeModule.wrapper = [
    '(function (exports, require, module, __filename, __dirname) { ',
    '\n});'
  ];

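  // Module compilation
  NativeModule.prototype.compile = function() {
    // First fetch the module's source string
    var source = NativeModule.getSource(this.id);
    // This step matters. The source string is wrapped:
    // prepended with
    // "(function (exports, require, module, __filename, __dirname) { "
    // and appended with
    // "\n});"
    // so the source ends up inside a function
    // whose parameters include exports, require, and module.
    // That answers our second question:
    // the require that looks like a global inside a module is really a parameter of the wrapper function
    source = NativeModule.wrap(source);

    // Next, runInThisContext evaluates the wrapped source string and returns a real JavaScript function
    var fn = runInThisContext(source, { filename: this.filename });
    // Finally that function is called
    // Note the first three arguments:
    // exports: nativeModule.exports
    // require: NativeModule.require
    // module: nativeModule
    // So each module gets its own module object, while the require function is always the same one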
    fn(this.exports, NativeModule.require, this, this.filename);

    this.loaded = true;
  };

  NativeModule.prototype.cache = function() {
    NativeModule._cache[this.id] = this;
  };
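Putting the pieces above together, here is a condensed, runnable sketch of the same wrap, compile, and cache cycle. It uses the public `vm.runInThisContext` in place of the internal helper, and the `MiniModule` name and the two in-memory sources are invented for this example; it is not Node's actual loader.

```javascript
const vm = require('vm');

// Assumption: these in-memory sources stand in for process.binding('natives').
const sources = {
  greet: "exports.hello = function (name) { return 'hello ' + name; };",
  app: "var greet = require('greet'); module.exports = greet.hello('node');"
};

function MiniModule(id) {
  this.id = id;
  this.filename = id + '.js';
  this.exports = {};
  this.loaded = false;
}

MiniModule._cache = {};
MiniModule.wrapper = [
  '(function (exports, require, module, __filename, __dirname) { ',
  '\n});'
];

MiniModule.require = function (id) {
  var cached = MiniModule._cache[id];
  if (cached) return cached.exports;
  if (!Object.prototype.hasOwnProperty.call(sources, id)) {
    throw new Error('No such native module ' + id);
  }
  var mod = new MiniModule(id);
  MiniModule._cache[id] = mod;   // cache before compiling, as the original does
  mod.compile();
  return mod.exports;
};

MiniModule.prototype.compile = function () {
  var wrapped = MiniModule.wrapper[0] + sources[this.id] + MiniModule.wrapper[1];
  // Evaluating the wrapped text yields a function; `require` arrives as an argument.
  var fn = vm.runInThisContext(wrapped, { filename: this.filename });
  fn(this.exports, MiniModule.require, this, this.filename);
  this.loaded = true;
};

var result = MiniModule.require('app');
console.log(result); // hello node
```

Inside `app`, `require` is not a global at all: it is the second argument of the wrapper function, which is exactly the point made in the comments above.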

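After reading this source, the origin of the members of the module object in Node should be clear. The following, also from src/node.js, is the definition of the runInThisContext function called during module compilation: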

  var ContextifyScript = process.binding('contextify').ContextifyScript;
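  // This is the key function called while NativeModule.require compiles a module
  // Through it we cross back into C++ territory
  // Because it uses process.binding('contextify'), the relevant file is to be found under src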
  function runInThisContext(code, options) {
    var script = new ContextifyScript(code, options);
    return script.runInThisContext();
  }
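`ContextifyScript` is an internal binding, but as far as I can tell the public `vm.Script` class exposes the same compile-then-run split, so the helper can be approximated in user code. In this sketch the function name `runInThisContextSketch` and the file name `sketch.js` are made up:

```javascript
const vm = require('vm');

// Approximation of the internal helper using the public vm API.
// Assumption: vm.Script plays the role of the internal ContextifyScript.
function runInThisContextSketch(code, options) {
  const script = new vm.Script(code, options);  // compile step
  return script.runInThisContext();             // run step
}

// The wrapped module source is just a function expression in text form.
const add = runInThisContextSketch(
  '(function (a, b) { return a + b; });',
  { filename: 'sketch.js' }
);
console.log(typeof add, add(2, 3)); // function 5
```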

We are back in C++. The contextify built-in module lives in src/node_contextify.cc:

  // ...
  // Here, at source line 433, is the definition we were looking for
  class ContextifyScript : public BaseObject {
    // ...
    // ...
    NODE_SET_PROTOTYPE_METHOD(script_tmpl, "runInContext", RunInContext);
    // Here the runInThisContext method is attached to the ContextifyScript prototype
    // So the next thing to find is the definition of RunInThisContext
    NODE_SET_PROTOTYPE_METHOD(script_tmpl,
                              "runInThisContext",
                              RunInThisContext);
    // ...
    // ...
    // Further down, at source line 501, is the RunInThisContext function
    // Let's take a look at it
    static void RunInThisContext(const FunctionCallbackInfo<Value>& args) {
      Isolate* isolate = args.GetIsolate();
      HandleScope handle_scope(isolate);

      // Assemble arguments
      TryCatch try_catch;
      uint64_t timeout = GetTimeoutArg(args, 0);
      bool display_errors = GetDisplayErrorsArg(args, 0);
      if (try_catch.HasCaught()) {
        try_catch.ReThrow();
        return;
      }

      // Do the eval within this context
      Environment* env = Environment::GetCurrent(isolate);
      // This is the call that actually evaluates the script for the module being loaded
      EvalMachine(env, timeout, display_errors, args, try_catch);
    }
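The `timeout` and `displayErrors` values that `RunInThisContext` reads from its first argument correspond to options of the same names in the public `vm` API. A small sketch, assuming a 50 ms budget chosen arbitrarily for the demonstration:

```javascript
const vm = require('vm');

// A script that never finishes on its own.
const runaway = new vm.Script('while (true) {}', { filename: 'runaway.js' });

let timedOut = false;
try {
  // Assumption: 50 ms is an arbitrary budget picked for this demo.
  runaway.runInThisContext({ timeout: 50, displayErrors: false });
} catch (err) {
  // The run is aborted with an error whose message mentions the timeout.
  timedOut = /timed out/.test(err.message);
}
console.log('timed out:', timedOut);
```

Core modules are loaded without a timeout, so this path only matters for user code that calls the `vm` API directly.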

Reading the earlier part of src/node.js, it is easy to see that environment initialization (the final if/else branches of the startup function) calls:

var Module = NativeModule.require('module');

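In other words, the compiled module module already exists before any of your JavaScript code runs.

That covers a lot, but how is our "main module" loaded, that is, the app.js run with node app.js?

Notice that the normal-mode branch in src/node.js ends by calling: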

Module.runMain();
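By the time `Module.runMain()` runs, the normal-mode branch has already turned `process.argv[1]` into an absolute path with `path.resolve`. The effect is easy to check in isolation; the script name `app.js` below is only an example:

```javascript
const path = require('path');

// Assumption: 'app.js' is an example script name, as in `node app.js`.
const requested = 'app.js';
const resolved = path.resolve(requested);

console.log(path.isAbsolute(requested));  // false
console.log(path.isAbsolute(resolved));   // true
console.log(path.basename(resolved));     // app.js
```

Resolving once up front means the loader always sees the same absolute path for the main script, no matter which directory the command was typed in.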

Let's continue with the rest of the lib/module.js source:

  // module.js exports the Module object
  module.exports = Module;

  // Now the definition of Module, at source line 38:
  function Module(id, parent) {
    this.id = id;
    this.exports = {};
    this.parent = parent;
    if (parent && parent.children) {
      parent.children.push(this);
    }

    this.filename = null;
    this.loaded = false;
    this.children = [];
  }

  // Set the environ variable NODE_MODULE_CONTEXTS=1 to make node load all
  // modules in their own context.
  Module._contextLoad = (+process.env['NODE_MODULE_CONTEXTS'] > 0);

  // Reuse NativeModule's `wrapper` and `wrap` for Module
  Module.wrapper = NativeModule.wrapper;
  Module.wrap = NativeModule.wrap;

  // Module.runMain is defined at source line 499
  Module.runMain = function() {
    // Load the main module--the command line argument.
    // Load the module named by process.argv[1], that is, your main module
    // On entering the normal run mode, this line was executed:
    // process.argv[1] = path.resolve(process.argv[1]);
    // so the argument here is already a resolved path
    Module._load(process.argv[1], null, true);
    // Handle any nextTicks added in the first tick of the program
    process._tickCallback();
  };

  // Module._load is defined at source line 273
  // The parameters confirm that Module.runMain loads the main module:
  // the isMain parameter marks it, and runMain passes true for it,
  // while parent is null
  Module._load = function(request, parent, isMain) {
    if (parent) {
      debug('Module._load REQUEST  ' + (request) + ' parent: ' + parent.id);
    }

    // Resolve the module's filename
    var filename = Module._resolveFilename(request, parent);

    var cachedModule = Module._cache[filename];
    if (cachedModule) {
      return cachedModule.exports;
    }

    if (NativeModule.exists(filename)) {
      // REPL is a special case, because it needs the real require.
      if (filename == 'repl') {
        var replModule = new Module('repl');
        replModule._compile(NativeModule.getSource('repl'), 'repl.js');
        NativeModule._cache.repl = replModule;
        return replModule.exports;
      }

      debug('load native module ' + request);
      return NativeModule.require(filename);
    }

    // Create a new module object for this module
    var module = new Module(filename, parent);

    // If the module being loaded is the main module
    if (isMain) {
      // set the mainModule property on the process object
      process.mainModule = module;
      // and reset the main module's id to "."
      module.id = '.';
    }

    Module._cache[filename] = module;

    var hadException = true;

    try {
      // Start loading the module
      module.load(filename);
      hadException = false;
    } finally {
      if (hadException) {
        delete Module._cache[filename];
      }
    }

    // Finally, return the module's exports object
    return module.exports;
  };

  // Below is the definition of Module.prototype.load, at line 345 of the source.
  // Given a filename, it passes it to the handler registered for its extension.
  Module.prototype.load = function(filename) {
    debug('load ' + JSON.stringify(filename) +
          ' for module ' + JSON.stringify(this.id));

    assert(!this.loaded);
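    // Set the module's filename
    this.filename = filename;
    // Compute the node_modules lookup paths, starting from the file's directory
    this.paths = Module._nodeModulePaths(path.dirname(filename));

    // Get the file's extension, falling back to .js if it has none
    var extension = path.extname(filename) || '.js';
    // If no handler is registered for the extension, also fall back to .js
    if (!Module._extensions[extension]) extension = '.js';
    // Call the handler for that extension to load/compile the module
    Module._extensions[extension](this, filename);
    // Finally, mark the module as loaded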
    this.loaded = true;
  };

  // Module._extensions is defined at line 475 of the source.
  // Each kind of module has its own loading method.
  // Native extension for .js
  Module._extensions['.js'] = function(module, filename) {
    // .js file:
    // read it first, then compile it
    var content = fs.readFileSync(filename, 'utf8');
    // Compilation works essentially the same way as for NativeModule
    module._compile(stripBOM(content), filename);
  };


  // Native extension for .json
  Module._extensions['.json'] = function(module, filename) {
    var content = fs.readFileSync(filename, 'utf8');
    try {
      // .json file:
      // parse it with JSON.parse
      module.exports = JSON.parse(stripBOM(content));
    } catch (err) {
      err.message = filename + ': ' + err.message;
      throw err;
    }
  };


  // Native extension for .node
  // C++ addon modules, loaded by process.dlopen
  Module._extensions['.node'] = process.dlopen;

  // Finally, look at Module.prototype._compile, defined at line 378 of the source:
  Module.prototype._compile = function(content, filename) {
    var self = this;
    // remove shebang
    content = content.replace(/^\#\!.*/, '');

    // Note that, inside module.js,
    // a new require function is defined here,
    // so every require call made by user modules refers to this function
    // rather than to NativeModule.require!
    function require(path) {
      // this is Module.prototype.require
      return self.require(path);
    }

  // Line 364 of the source.
  // This defines the require method used by ordinary modules.
  // Loads a module at the given file path. Returns that module's
  // `exports` property.
  Module.prototype.require = function(path) {
    assert(path, 'missing path');
    assert(util.isString(path), 'path must be a string');
    return Module._load(path, this);
  };
...
...
    // Back in the scope of Module.prototype._compile
    require.resolve = function(request) {
      return Module._resolveFilename(request, self);
    };

    Object.defineProperty(require, 'paths', { get: function() {
      throw new Error('require.paths is removed. Use ' +
                      'node_modules folders, or the NODE_PATH ' +
                      'environment variable instead.');
    }});

    require.main = process.mainModule;

    // Enable support to add extra extension types
    require.extensions = Module._extensions;
    require.registerExtension = function() {
      throw new Error('require.registerExtension() removed. Use ' +
                      'require.extensions instead.');
    };

    require.cache = Module._cache;

    var dirname = path.dirname(filename);

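    // If the environment variable NODE_MODULE_CONTEXTS=1 is set, each module is loaded in its own context.
    if (Module._contextLoad) {
      // If the module being loaded is not the main module (remember, the main module's id is "."),
      // run its code in a sandbox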
      if (self.id !== '.') {
        debug('load submodule');
        // not root module
        var sandbox = {};
        for (var k in global) {
          sandbox[k] = global[k];
        }
        sandbox.require = require;
        sandbox.exports = self.exports;
        sandbox.__filename = filename;
        sandbox.__dirname = dirname;
        sandbox.module = self;
        sandbox.global = sandbox;
        sandbox.root = root;

        return runInNewContext(content, sandbox, { filename: filename });
      }

      // Otherwise it is the main module
      debug('load root module');
      // root module
      global.require = require;
      global.exports = self.exports;
      global.__filename = filename;
      global.__dirname = dirname;
      global.module = self;

      return runInThisContext(content, { filename: filename });
    }

    // In a normal start, ordinary modules are wrapped and compiled here, the same way NativeModule wraps its modules.
    // create wrapper function
    var wrapper = Module.wrap(content);

    var compiledWrapper = runInThisContext(wrapper, { filename: filename });
    if (global.v8debug) {
      if (!resolvedArgv) {
        // we enter the repl if we're not given a filename argument.
        if (process.argv[1]) {
          resolvedArgv = Module._resolveFilename(process.argv[1], null);
        } else {
          resolvedArgv = 'repl';
        }
      }

      // Set breakpoint on module start
      if (filename === resolvedArgv) {
        global.v8debug.Debug.setBreakPoint(compiledWrapper, 0, 0);
      }
    }
    // Build the arguments for the module wrapper function
    var args = [self.exports, require, self, filename, dirname];
    // Call the wrapper function.
    return compiledWrapper.apply(self.exports, args);
  };
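
The wrap, compile and apply steps at the end of _compile can be reproduced in a few lines of user code. The sketch below is only an approximation under stated assumptions: wrap and compileSketch are hypothetical helpers, the wrapper strings imitate NativeModule.wrapper, the plain object stands in for a real Module instance, and '/tmp/demo.js' is a made-up filename that is never read from disk.

```javascript
const vm = require('vm');
const path = require('path');

// Hypothetical stand-in for Module.wrap: surround the source with a function
// whose parameters become the module-local "globals".
const wrapper = [
  '(function (exports, require, module, __filename, __dirname) { ',
  '\n});'
];
function wrap(source) {
  return wrapper[0] + source + wrapper[1];
}

// Hypothetical stand-in for Module.prototype._compile.
function compileSketch(source, filename) {
  const mod = { id: filename, exports: {}, filename: filename, loaded: false };
  // Evaluating the wrapped source yields a function; nothing has run yet.
  const compiledWrapper = vm.runInThisContext(wrap(source), { filename: filename });
  const args = [mod.exports, require, mod, filename, path.dirname(filename)];
  // Calling the wrapper is what actually executes the module body.
  compiledWrapper.apply(mod.exports, args);
  mod.loaded = true;
  return mod.exports;
}

const result = compileSketch(
  'module.exports = { sum: 1 + 2, dir: __dirname };',
  '/tmp/demo.js'
);
console.log(result.sum, result.dir);
```

This is why exports, require, module, __filename and __dirname look like globals inside a module even though they are really function parameters.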

That completes the analysis of the whole flow: Node has set everything up and run the program.
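
One observable consequence of the Module._cache lookup in Module._load is that requiring the same file twice returns the same exports object, and that deleting the entry from require.cache forces a fresh load. The sketch below demonstrates this; the temporary directory prefix and the file name counter.js are made up for illustration, and fs.rmSync assumes a reasonably recent Node.js.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a throwaway module into a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cache-demo-'));
const file = path.join(dir, 'counter.js');
fs.writeFileSync(file, 'module.exports = { loadedAt: process.hrtime() };');

const first = require(file);
const second = require(file);         // served from the cache, file is not re-run
const sameObject = first === second;

// Dropping the cache entry forces a fresh load on the next require().
delete require.cache[require.resolve(file)];
const third = require(file);
const reloaded = third !== first;

console.log(sameObject, reloaded);

// Clean up the temporary files.
fs.rmSync(dir, { recursive: true, force: true });
```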


abbshr commented Mar 16, 2015

  • Fixed several errors
  • Added Module.prototype.require; without it, module.require could not be explained.
  • Added several new comments

@terry-fei

👍


abbshr commented Mar 17, 2015

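Starting Node with the NODE_MODULE_CONTEXTS environment variable set triggers a bug. Yesterday I submitted a pull request to io.js, nodejs/node#1160, and in the end they decided to drop the NODE_MODULE_CONTEXTS=1 option altogether.
nodejs/node#1162

So the following code should be removed.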

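    // If the environment variable NODE_MODULE_CONTEXTS=1 is set, each module is loaded in its own context.
    if (Module._contextLoad) {
      // If the module being loaded is not the main module (remember, the main module's id is "."),
      // run its code in a sandbox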
      if (self.id !== '.') {
        debug('load submodule');
        // not root module
        var sandbox = {};
        for (var k in global) {
          sandbox[k] = global[k];
        }
        sandbox.require = require;
        sandbox.exports = self.exports;
        sandbox.__filename = filename;
        sandbox.__dirname = dirname;
        sandbox.module = self;
        sandbox.global = sandbox;
        sandbox.root = root;

        return runInNewContext(content, sandbox, { filename: filename });
      }

      // Otherwise it is the main module
      debug('load root module');
      // root module
      global.require = require;
      global.exports = self.exports;
      global.__filename = filename;
      global.__dirname = dirname;
      global.module = self;

      return runInThisContext(content, { filename: filename });
    }
