Baffled by a Node out-of-memory error - CNode tech community

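I've been looking into out-of-memory problems lately and ran into the following behavior. Help appreciated. Figure 1 is below, and its result, which is what I expected, is as follows: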

What baffles me is Figure 2, shown below, with the following result:

Question: how does the setTimeout in Figure 2 change the way the code in Figure 1 runs?
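The two figures did not survive in this copy of the thread. Going by the replies below, the comparison was presumably something like the following sketch. The function name `run`, its parameters, and the exact loop body are my assumptions, not the original poster's code. Only `setTimeout(noop, 100)`, `clearTimeout`, and the 10e7 iteration count are mentioned in the thread.

```javascript
function noop() {}

// Hypothetical reconstruction of the two cases discussed in this thread.
// withLeadingTimer = true  -> a setTimeout(noop, 100) exists before the loop
// withLeadingTimer = false -> no timer exists before the loop
function run(iterations, withLeadingTimer) {
  let leading = null;
  if (withLeadingTimer) {
    leading = setTimeout(noop, 100);
  }

  // Create and immediately cancel a timer on every iteration.
  for (let i = 0; i < iterations; i++) {
    clearTimeout(setTimeout(noop, 100));
  }

  const heapUsedMB = process.memoryUsage().heapUsed / 1024 / 1024;

  // Cleanup added for this sketch only, so the process can exit promptly.
  if (leading !== null) {
    clearTimeout(leading);
  }
  return heapUsedMB;
}

// A small iteration count keeps the sketch safe to run; the thread used 10e7.
console.log(run(1000, true).toFixed(2));
console.log(run(1000, false).toFixed(2));
```

Whether the second call actually runs out of memory at 10e7 iterations depends on the Node version, since the timers implementation has changed since the commit linked below.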

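For reference, see part of the implementation at https://github.com/nodejs/node/blob/1d2fd8b65bacaf4401450edc8ed529106cbcfc67/lib/timers.js. The setTimeout(noop, 100) at the top and the setTimeout(noop, 100) further down both use 100 as the expiry time, and Timeout objects with the same expiry time are placed in the same circular doubly linked list.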

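If the first call is present, the list is created when that first setTimeout(noop, 100) runs. When the later setTimeout(noop, 100) calls run, the list already exists, so every later Timeout object goes into it. In other words, there is only ever one list.
If the first call is absent, each setTimeout(noop, 100) creates a list and puts the current Timeout object into it, and clearTimeout then removes that Timeout object from the list. The list is now empty, so it gets deleted too, and the next iteration has to create a new one. That is, every iteration creates a list and then deletes it.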
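The difference between the two cases can be shown with a toy model. This is a simplified sketch, not the real lib/timers.js: `simulate`, `setTimeoutModel`, and `clearTimeoutModel` are hypothetical names, and a `Map` of `Set`s stands in for the linked lists keyed by duration.

```javascript
// Toy model of "one list per duration". Not the real lib/timers.js.
function simulate(iterations, withLeadingTimer) {
  const lists = new Map(); // msecs -> Set of pending timers
  let listsCreated = 0;

  function setTimeoutModel(msecs) {
    let list = lists.get(msecs);
    if (list === undefined) {
      // No list for this duration yet: create one.
      list = new Set();
      lists.set(msecs, list);
      listsCreated++;
    }
    const timer = { msecs };
    list.add(timer);
    return timer;
  }

  function clearTimeoutModel(timer) {
    const list = lists.get(timer.msecs);
    if (list === undefined) return;
    list.delete(timer);
    // An empty list is dropped, so the next insert must create a new one.
    if (list.size === 0) lists.delete(timer.msecs);
  }

  if (withLeadingTimer) setTimeoutModel(100);
  for (let i = 0; i < iterations; i++) {
    clearTimeoutModel(setTimeoutModel(100));
  }
  return listsCreated;
}

console.log(simulate(1000, true));  // 1
console.log(simulate(1000, false)); // 1000
```

With the leading timer, the list never becomes empty, so it is created once. Without it, the list is created and dropped on every iteration.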

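After a list is deleted, even though you call gc, my personal guess is that an incremental GC is in use here, so the memory held by the deleted lists is not reclaimed right away.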

You can add the following at the end and watch memory usage over time. On my machine it drops after about 10 seconds.

setInterval(function() {
  console.log((process.memoryUsage().heapUsed/1024/1024).toFixed(2))
}, 1000)

liangtongzhuo #2 • 7 years ago

@William17 👏 Applause

a631807682 #3 • 7 years ago

@William17 Applause, applause

AnzerWall #4 • 7 years ago

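@William17 666666666666666. I stared at the source for a while and only saw the list insertion and removal. I had no idea there was also the issue of GC not collecting everything at once... The result baffled me too, but now that you put it that way, it sounds right...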

AviorAlong #5 • 7 years ago (author)

@William17 Thanks a lot

hyj1991 #6 • 7 years ago

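The first reply is fairly complete, but it doesn't cover why the memory isn't released. If no setTimeout(fn, 100) is created before the for loop, a new list is created on every iteration: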

// lib/timers.js
function insert(item, unrefed, start) {
  //...
  const lists = unrefed === true ? unrefedLists : refedLists;
  var list = lists[msecs];
  if (list === undefined) {
    debug('no %d list was found in insert, creating a new one', msecs);
    // A new TimersList linked list is created here every time
    lists[msecs] = list = new TimersList(msecs, unrefed);
  }
  //...
}

Each TimersList in turn maps to a real Timer in the underlying libuv:

function TimersList(msecs, unrefed) {
  this._idleNext = this; // Create the list with the linkedlist properties to
  this._idlePrev = this; // prevent any unnecessary hidden class changes.
  this._unrefed = unrefed;
  this.msecs = msecs;
  // TimerWrap here comes from process.binding('timer_wrap'), i.e. src/timer_wrap.cc
  const timer = this._timer = new TimerWrap();
  timer._list = this;

  if (unrefed === true)
    timer.unref();
  timer.start(msecs);
}

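In fact, by now both Scavenge and Mark-Sweep/Mark-Compact run GC in a multi-threaded, parallel fashion, so a big for loop like this does not block GC and cause an OOM by itself. You can see this by adding the --trace_gc flag: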

[64569:0x102801c00]       30 ms: Scavenge 2.7 (3.8) -> 2.4 (4.8) MB, 1.1 / 0.0 ms  allocation failure 
[64569:0x102801c00]       38 ms: Scavenge 2.9 (4.8) -> 2.8 (5.8) MB, 1.3 / 0.0 ms  allocation failure 
[64569:0x102801c00]       57 ms: Scavenge 4.1 (6.3) -> 4.3 (8.8) MB, 0.7 / 0.0 ms  allocation failure 
4.872184753417969M
[64569:0x102801c00]       91 ms: Scavenge 6.2 (10.3) -> 5.3 (11.3) MB, 1.2 / 0.0 ms  allocation failure 
[64569:0x102801c00]      103 ms: Scavenge 6.8 (11.3) -> 5.9 (11.3) MB, 1.1 / 0.1 ms  allocation failure 
[64569:0x102801c00]      117 ms: Scavenge 7.3 (11.3) -> 6.7 (17.3) MB, 3.4 / 0.1 ms  allocation failure 
[64569:0x102801c00]      134 ms: Scavenge 9.8 (17.3) -> 8.4 (17.3) MB, 3.0 / 0.2 ms  allocation failure 
[64569:0x102801c00]      150 ms: Scavenge 10.6 (17.3) -> 9.5 (19.3) MB, 5.6 / 0.2 ms  allocation failure 
[64569:0x102801c00]      165 ms: Scavenge 12.2 (19.3) -> 11.0 (27.8) MB, 3.1 / 0.2 ms  allocation failure 
[64569:0x102801c00]      200 ms: Scavenge 17.3 (27.8) -> 14.5 (29.8) MB, 7.0 / 0.3 ms  allocation failure 
[64569:0x102801c00]      226 ms: Scavenge 18.7 (29.8) -> 16.6 (33.8) MB, 8.6 / 0.4 ms  allocation failure 
[64569:0x102801c00]      254 ms: Scavenge 22.1 (33.8) -> 19.6 (50.8) MB, 7.3 / 0.3 ms  allocation failure 
[64569:0x102801c00]      344 ms: Scavenge 32.3 (50.8) -> 26.5 (54.8) MB, 13.7 / 0.6 ms  allocation failure 
[64569:0x102801c00]      415 ms: Scavenge 35.2 (54.8) -> 31.0 (61.8) MB, 16.7 / 0.9 ms  allocation failure 
[64569:0x102801c00]      499 ms: Scavenge 41.9 (61.8) -> 36.9 (65.8) MB, 18.2 / 0.7 ms  allocation failure 
[64569:0x102801c00]      568 ms: Mark-sweep 45.5 (65.8) -> 44.6 (71.8) MB, 11.1 / 2.1 ms  (+ 14.6 ms in 26 steps since start of marking, biggest step 4.3 ms, walltime since start of marking 314 ms) finalize incremental marking via stack guard GC in old space requested
50.10138702392578M
[64569:0x102801c00]      606 ms: Scavenge 51.7 (71.8) -> 44.5 (76.3) MB, 10.5 / 0.6 ms  allocation failure 
[64569:0x102801c00]      666 ms: Scavenge 56.3 (76.3) -> 50.9 (80.3) MB, 13.7 / 0.7 ms  allocation failure 
[64569:0x102801c00]      718 ms: Scavenge 60.0 (80.3) -> 55.7 (86.8) MB, 14.3 / 0.7 ms  allocation failure 
[64569:0x102801c00]      773 ms: Scavenge 66.3 (86.8) -> 61.4 (91.8) MB, 13.9 / 0.7 ms  allocation failure 
[64569:0x102801c00]      824 ms: Scavenge 71.1 (91.8) -> 66.5 (97.3) MB, 13.9 / 0.7 ms  allocation failure 
[64569:0x102801c00]      879 ms: Scavenge 76.8 (97.3) -> 72.0 (102.8) MB, 14.5 / 0.7 ms  allocation failure 
[64569:0x102801c00]      935 ms: Scavenge 82.0 (102.8) -> 77.3 (108.3) MB, 12.9 / 0.8 ms  allocation failure 
87.4111099243164M
[64569:0x102801c00]     1017 ms: Scavenge 87.4 (108.3) -> 82.7 (113.8) MB, 37.5 / 1.0 ms  allocation failure 
[64569:0x102801c00]     1101 ms: Scavenge 92.7 (113.8) -> 88.0 (119.3) MB, 17.0 / 0.7 ms  allocation failure 
[64569:0x102801c00]     1211 ms: Scavenge 98.1 (119.3) -> 93.4 (125.3) MB, 19.9 / 0.7 ms  allocation failure 
[64569:0x102801c00]     1300 ms: Mark-sweep 98.6 (125.3) -> 98.3 (129.8) MB, 21.8 / 5.0 ms  (+ 45.9 ms in 134 steps since start of marking, biggest step 4.7 ms, walltime since start of marking 199 ms) finalize incremental marking via stack guard GC in old space requested
[64569:0x102801c00]     1349 ms: Scavenge 108.8 (129.8) -> 101.7 (132.8) MB, 11.8 / 0.5 ms  allocation failure 
[64569:0x102801c00]     1400 ms: Scavenge 111.6 (132.8) -> 107.0 (139.3) MB, 13.6 / 0.7 ms  allocation failure 
[64569:0x102801c00]     1451 ms: Scavenge 117.2 (139.3) -> 112.4 (145.3) MB, 13.5 / 0.7 ms  allocation failure 
[64569:0x102801c00]     1505 ms: Scavenge 122.4 (145.3) -> 117.7 (151.8) MB, 13.2 / 1.1 ms  allocation failure 
124.96233367919922M
[64569:0x102801c00]     1561 ms: Scavenge 127.8 (151.8) -> 123.1 (154.3) MB, 14.2 / 0.7 ms  allocation failure 
[64569:0x102801c00]     1612 ms: Scavenge 133.1 (154.3) -> 128.4 (160.8) MB, 12.8 / 0.7 ms  allocation failure 
[64569:0x102801c00]     1666 ms: Scavenge 138.5 (160.8) -> 133.8 (165.8) MB, 15.2 / 0.7 ms  allocation failure

So the real reason memory can't be released when no setTimeout(fn, 100) is created before the for loop is that clearTimeout ends up calling the reuse function:

function reuse(item) {
  // Detach the timer from the list it was placed in
  L.remove(item);
  const list = refedLists[item._idleTimeout];
  // if empty - reuse the watcher
  if (list !== undefined && L.isEmpty(list)) {
    debug('reuse hit');
    // Stop the timer registered on the underlying libuv
    list._timer.stop();
    delete refedLists[item._idleTimeout];
    return list._timer;
  }
  return null;
}

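The list._timer.stop() call here invokes uv_timer_stop in timer_wrap.cc, and a timer stopped by uv_timer_stop only gets a chance to actually be released on the next iteration of the event loop. In other words, the Timer instances registered on libuv each time a list is created can only be released after the for loop you wrote has finished. With a big loop of 10e7 iterations, that point is clearly never reached, and the process OOMs from having registered too many uv_timer handles.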

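To sum up your example: GC does run interleaved with the big 10e7-iteration loop, so in both versions the Timeout instances created by each setTimeout/clearTimeout pair are collected along the way and do not affect the size of on-heap memory. Without the setTimeout(fn, 100), however, every iteration creates a list, and the timer that list registers on libuv can only be released after the whole 10e7-iteration loop has finished, which is what produces the out-of-memory behavior. With the setTimeout(fn, 100) added before the big loop, only one libuv timer is ever registered, so there is no overflow.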
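One way to check this explanation is to log process-wide memory next to heap memory. A minimal sketch follows; the helper name `memorySnapshot` and the MB rounding are mine. It assumes the libuv timer handles are allocated outside the V8 heap, in which case `rss` would be expected to grow much faster than `heapUsed` in the failing case.

```javascript
// Report V8 heap usage alongside process-wide memory, in MB.
// process.memoryUsage() returns values in bytes.
function memorySnapshot() {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  const toMB = (bytes) => Number((bytes / 1024 / 1024).toFixed(2));
  return {
    rssMB: toMB(rss),             // resident set size of the whole process
    heapTotalMB: toMB(heapTotal), // memory reserved for the V8 heap
    heapUsedMB: toMB(heapUsed),   // memory used by live JS objects
    externalMB: toMB(external),   // C++ objects bound to JS objects
  };
}

console.log(memorySnapshot());
```

Calling this from a setInterval, like the heapUsed logger in the first reply, shows both numbers over time.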