Is there a limit on how many egg projects can be started with egg-scripts start? Some projects suddenly shut down on their own. Why?

On an Alibaba Cloud test server (1 CPU core, 2 GB RAM, CentOS 7.4 64-bit), I used egg-scripts start to run 13 egg.js projects. Since then, some of the projects keep shutting down on their own.

$netstat -tunlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 127.0.0.1:40954         0.0.0.0:*               LISTEN      14117/node          
tcp        0      0 127.0.0.1:38938         0.0.0.0:*               LISTEN      14133/node          
tcp        0      0 127.0.0.1:7002          0.0.0.0:*               LISTEN      1686/node           
tcp        0      0 127.0.0.1:7101          0.0.0.0:*               LISTEN      21578/node          
tcp        0      0 127.0.0.1:3101          0.0.0.0:*               LISTEN      4616/node           
tcp        0      0 127.0.0.1:3102          0.0.0.0:*               LISTEN      24280/node          
tcp        0      0 127.0.0.1:7102          0.0.0.0:*               LISTEN      30380/node          
tcp        0      0 127.0.0.1:32000         0.0.0.0:*               LISTEN      16685/java          
tcp        0      0 127.0.0.1:43392         0.0.0.0:*               LISTEN      15767/node          
tcp        0      0 127.0.0.1:7008          0.0.0.0:*               LISTEN      5790/node           
tcp        0      0 127.0.0.1:33761         0.0.0.0:*               LISTEN      20628/node          
tcp        0      0 127.0.0.1:7009          0.0.0.0:*               LISTEN      8403/node           
tcp        0      0 127.0.0.1:35042         0.0.0.0:*               LISTEN      8595/node           
tcp        0      0 127.0.0.1:7011          0.0.0.0:*               LISTEN      6817/node           
tcp        0      0 127.0.0.1:43237         0.0.0.0:*               LISTEN      14823/node          
tcp        0      0 127.0.0.1:37381         0.0.0.0:*               LISTEN      13771/node          
tcp        0      0 127.0.0.1:7013          0.0.0.0:*               LISTEN      8537/node           
tcp        0      0 127.0.0.1:45445         0.0.0.0:*               LISTEN      24286/node          
tcp        0      0 127.0.0.1:44006         0.0.0.0:*               LISTEN      10339/node          
tcp        0      0 127.0.0.1:44294         0.0.0.0:*               LISTEN      13705/node          
tcp        0      0 127.0.0.1:7015          0.0.0.0:*               LISTEN      5008/node           
tcp        0      0 127.0.0.1:6379          0.0.0.0:*               LISTEN      1578/redis-server 1 
tcp        0      0 127.0.0.1:39084         0.0.0.0:*               LISTEN      32417/node          
tcp        0      0 127.0.0.1:32846         0.0.0.0:*               LISTEN      27467/node          
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      3152/nginx: master  
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1571/sshd           
tcp        0      0 127.0.0.1:3000          0.0.0.0:*               LISTEN      14126/node          
tcp        0      0 127.0.0.1:7000          0.0.0.0:*               LISTEN      18806/node          
udp        0      0 172.18.163.223:123      0.0.0.0:*                           456/ntpd            
udp        0      0 127.0.0.1:123           0.0.0.0:*                           456/ntpd            
udp        0      0 0.0.0.0:123             0.0.0.0:*                           456/ntpd            
udp6       0      0 :::123                  :::*                                456/ntpd

Checking memory:

$free
              total        used        free      shared  buff/cache   available
Mem:        1882736     1639432      110384         584      132920       91400
Swap:             0           0           0

Checking processes:

$top
 PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                                                    
16685 root      20   0 2057540  58664   1748 S  1.3  3.1   1:10.59 java                                                                                                                                       
 1225 root      20   0  129760   3152    944 S  0.3  0.2  42:45.29 AliYunDun                                                                                                                                  
 8420 root      20   0 1106180  44192   2088 S  0.3  2.3   0:04.17 node                                                                                                                                       
 8595 root      20   0 1222192  41316   4748 S  0.3  2.2   0:19.50 node                                                                                                                                       
17687 root      20   0 1229388  55084   4516 S  0.3  2.9   0:03.79 node                                                                                                                                       
    1 root      20   0   43388   2312   1128 S  0.0  0.1   0:13.30 systemd                                                                                                                                    
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.85 kthreadd                                                                                                                                   
    3 root      20   0       0      0      0 S  0.0  0.0   0:16.02 ksoftirqd/0                                                                                                                                
    5 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kworker/0:0H

Resource usage does not look particularly high either. Could anyone tell me where the problem is?

atian25 #1 • 7 years ago

By default, egg-scripts stop shuts down all applications. Check whether that is the cause; each app needs to be started with the title parameter.
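As a rough illustration of the title advice, the sketch below gives every app its own title so that stopping one cannot match the others. `--daemon` and `--title` are egg-scripts options; the helper names `app_title`, `start_app` and `stop_app` and the `egg-server-<dir>` naming scheme are assumptions made for this example, and the path is the one that appears in the stack trace posted in this thread.

```shell
# Hypothetical helper: build a unique process title from the app directory name.
app_title() {
  printf 'egg-server-%s\n' "$(basename "$1")"
}

# Start one app in the background under its own title.
start_app() {
  (cd "$1" && npx egg-scripts start --daemon --title="$(app_title "$1")")
}

# Stop only the app whose title matches; a stop without a title may hit every app.
stop_app() {
  (cd "$1" && npx egg-scripts stop --title="$(app_title "$1")")
}

# Example (not run here): start_app /home/egg/u_system
```

Deriving the title from the directory name keeps the start and stop commands in sync without having to maintain a separate list of titles.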

egg itself imposes no limit. If the above is not the cause, then it is a system memory or CPU issue; check the error logs.
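One way to act on the memory half of this advice is to turn the `free` output into a single number. In the figures posted in the question, `available` is 91400 kB out of 1882736 kB and there is no swap, so under 5% of memory is left. The sketch below computes that percentage; the function name `mem_available_pct` is made up for this example, and the column position assumes the `free` layout shown in the question. Whether the kernel OOM killer is actually what closes the apps is only a guess that the kernel log would have to confirm.

```shell
# Print available memory as a whole percentage of total (truncated).
# Reads `free` output on stdin; assumes the columns
# total, used, free, shared, buff/cache, available.
mem_available_pct() {
  awk '/^Mem:/ { print int($7 * 100 / $2) }'
}

# Sample taken from the output posted in the question.
sample='              total        used        free      shared  buff/cache   available
Mem:        1882736     1639432      110384         584      132920       91400
Swap:             0           0           0'

printf '%s\n' "$sample" | mem_available_pct

# On the server itself (not run here):
#   free | mem_available_pct
#   dmesg | grep -i 'killed process'   # OOM-killer victims, if any, appear in the kernel log
```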

thomas0836 #2 • 7 years ago • Author

@atian25 Both start and stop are already run with a title.

thomas0836 #3 • 7 years ago • Author

@atian25

2018-12-05 11:41:01,892 ERROR 22667 nodejs.ClusterClientNoResponseError: client no response in 61140ms exceeding maxIdleTime 60000ms, maybe the connection is close on other side.
    at Timeout.Leader._heartbeatTimer.setInterval [as _onTimeout] (/home/egg/u_system/node_modules/cluster-client/lib/leader.js:74:23)
    at ontimeout (timers.js:466:11)
    at tryOnTimeout (timers.js:304:5)
    at Timer.listOnTimeout (timers.js:267:5)
name: "ClusterClientNoResponseError"
pid: 22667
hostname: xxxxxxxxxxxxx

Does this happen because of insufficient resources? This is the error I see in common-error.log, and it is basically the only one there.

thomas0836 #4 • 7 years ago • Author

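@atian25 Also, when I check with top and free, resource usage does not look especially heavy. If it really is a system memory or CPU issue, should I add memory or CPU in this situation?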

atian25 #5 • 7 years ago

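This error looks like the CPU is saturated, to the point that even heartbeats cannot be sent. Are you doing anything especially CPU-intensive? Hook up alinode and analyze it.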
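A snapshot from `top` can miss short bursts of contention, so one extra thing to look at is the load average, which on Linux counts processes that are running, waiting for the CPU, or in uninterruptible sleep. The sketch below flags a host whose 1-minute load exceeds its core count; `is_overloaded` is a hypothetical helper, and treating "load greater than cores" as the threshold is a rule of thumb rather than anything egg or alinode prescribes.

```shell
# Succeeds (exit 0) when the load average is greater than the number of cores.
# awk does the comparison because the load average is a decimal number.
is_overloaded() {
  awk -v avg="$1" -v n="$2" 'BEGIN { exit !(avg > n) }'
}

# Read the 1-minute load average where /proc is available (Linux).
if [ -r /proc/loadavg ]; then
  read -r load1 _ < /proc/loadavg
  cores=$(nproc 2>/dev/null || echo 1)
  if is_overloaded "$load1" "$cores"; then
    echo "load $load1 exceeds $cores core(s): processes are queuing for CPU"
  else
    echo "load $load1 is within $cores core(s)"
  fi
fi
```

On a single-core machine, a load that stays well above 1 would be consistent with heartbeat timers firing late even when the `%CPU` column of each individual process looks small.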

thomas0836 #6 • 7 years ago • Author

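@atian25 All of them are already connected to alinode. I am just running 10+ egg projects. But judging from top and the server's CPU monitoring in the Alibaba Cloud console, the CPU does not seem heavily used.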

atian25 #7 • 7 years ago

Then I don't know. We rarely run this many applications on a single core; they will compete for it.