new_server

Differences

This shows you the differences between two versions of the page.

new_server [2017/07/16 20:41] joshnew_server [2018/01/16 10:35] (current) josh
Line 1: Line 1:
 ====== Old Hardware ====== ====== Old Hardware ======
 +
 +  * AMD Athlon 64 X2 6000+ 3.0GHz 125W
 +  * 8GB DDR3
  
 ====== New Hardware ====== ====== New Hardware ======
 +
 +  * AMD Ryzen 1600 3.2GHz 65W
 +  * 32GB DDR4 2400
 +  * APC BN700MC Back-UPS 700 VA (420W)
 +  * ATI Radeon X600 ($4.99 cheap PCIe graphics card)
  
 ===== Configurations ===== ===== Configurations =====
Line 17: Line 25:
 ====== Benchmarks ====== ====== Benchmarks ======
  
-^ Test ^ Old ^ New (US) ^ New (FS) | +^ Test ^ Old ^ New (US) ^ New (FS) ^
-| Idle Power | 65W | | | +| Idle Power | 65W | | 56W (w/ video card) |
-| Full Load Power | 187W | | |+| Full Load Power | 187W | | 132W |
 | dd if=/dev/urandom of=outfile bs=1000000 count=1000 | 1m1.045s (16.4 MB/s) | 14.425s (69.3 MB/s) | 14.543s (68.8 MB/s) | | dd if=/dev/urandom of=outfile bs=1000000 count=1000 | 1m1.045s (16.4 MB/s) | 14.425s (69.3 MB/s) | 14.543s (68.8 MB/s) |
 | dd if=outfile of=/dev/null bs=1000000 \\ (after reboot to clear disk cache) | 6.556s (153 MB/s) | 1.058s (947 MB/s) | 0.944s (1.1 GB/s) | | dd if=outfile of=/dev/null bs=1000000 \\ (after reboot to clear disk cache) | 6.556s (153 MB/s) | 1.058s (947 MB/s) | 0.944s (1.1 GB/s) |
Line 81: Line 89:
   Latency              2731us     267us     257us     122us      25us     122us   Latency              2731us     267us     257us     122us      25us     122us
   1.97,1.97,hathor,1,1500205275,3G,,1591,99,195864,9,192590,10,+++++,+++,2287940,56,+++++,+++,16,,,,,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,13792us,3537us,3623us,3168us,1356us,1846us,2731us,267us,257us,122us,25us,122us   1.97,1.97,hathor,1,1500205275,3G,,1591,99,195864,9,192590,10,+++++,+++,2287940,56,+++++,+++,16,,,,,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,13792us,3537us,3623us,3168us,1356us,1846us,2731us,267us,257us,122us,25us,122us
- 
- 
-====== Issues ====== 
- 
-===== APC UPS not killing power ===== 
- 
-I think this is a Fedora / systemd problem. apcupsd will detect the AC power failure and initiate a shutdown (shutdown -h -H now via apccontrol). But apccontrol killpower does not seem to be called. Also, during shutdown, systemd reports "A stop job is running for APC UPS Power Control Daemon for Linux". Not sure if this is because the apcupsd service was what kicked off the shutdown command, so it wasn't able to be stopped. 
  
 ====== Installation ====== ====== Installation ======
Line 99: Line 100:
   qemu-kvm \   qemu-kvm \
   apcupsd \   apcupsd \
-  nfs-utils +  nfs-utils \
 +  libvirt-daemon-qemu \ 
 +  libvirt-client \ 
 +  ssmtp
 </code> </code>
   * Edit ''/etc/default/grub'':   * Edit ''/etc/default/grub'':
Line 119: Line 123:
   * ''chkconfig network on''   * ''chkconfig network on''
   * edit ''/etc/apcupsd/apcupsd.conf'' and ''/etc/apcupsd/apccontrol''   * edit ''/etc/apcupsd/apcupsd.conf'' and ''/etc/apcupsd/apccontrol''
-  * systemctl enable apcupsd.service +  * ''systemctl enable apcupsd.service'' 
-  * systemctl enable nfs-server.service +  * ''systemctl enable libvirt-guests'' 
-  * firewall-cmd --add-service=nfs+  * ''systemctl enable nfs-server.service'' 
 +  * ''systemctl enable rpcbind'' 
 +  * ''firewall-cmd --add-service=nfs --permanent'' 
 +  * Apply the fixes in [[https://bugzilla.redhat.com/show_bug.cgi?id=1472062]] (disable SELinux and reduce the stop timeout of the apcupsd service to 10s) to allow apcupsd to work properly
 +  * Configure ssmtp in ''/etc/ssmtp/ssmtp.conf'' 
 + 
 +====== Services ====== 
 + 
 +^ Server ^ Services ^ OS ^ Hardware ^ 
 +| anubis (host) | <WRAP> 
 +  * apcupsd 
 +  * NFS 
 +  * KVM (libvirt-guests) 
 +  * gmail backups 
 +  * NFS backups 
 +</WRAP> | Fedora Server 26 | 32GB RAM; 12 CPU threads | 
 +| oneill (VM) | <WRAP> 
 +  * HTTP 
 +    * wiki 
 +    * tt-rss 
 +    * mythweb (proxy) 
 +    * cameras (proxy) 
 +    * cgit (proxy) 
 +</WRAP> | Ubuntu Server 16.04 | 2GB RAM; 1 VCPU | 
 +| carter (VM) | <WRAP> 
 +  * git-daemon 
 +  * gitolite git hosting over ssh 
 +  * cgit 
 +</WRAP> | Ubuntu Server 16.04 | 2GB RAM; 1 VCPU | 
 +| baal (VM) | <WRAP> 
 +  * openvpn 
 +  * squid proxy 
 +</WRAP> | Fedora Server 26 | 1GB RAM; 1 VCPU | 
 +| hathor (VM) | <WRAP> 
 +  * minetest 
 +</WRAP> | Ubuntu Server 16.04 | 2GB RAM; 1 VCPU | 
 +| ra (VM) | <WRAP> 
 +  * mythtv backend 
 +  * mythweb 
 +</WRAP> | Mythbuntu 16.04 | 4GB RAM; 2 VCPUs | 
 + 
 +====== Issues ====== 
 + 
 +===== APC UPS not killing power ===== 
 + 
 +I think this is a Fedora / systemd problem. apcupsd detects the AC power failure and initiates a shutdown (''shutdown -h -H now'' via apccontrol), but ''apccontrol killpower'' does not seem to be called. During shutdown, systemd also reports "A stop job is running for APC UPS Power Control Daemon for Linux". I am not sure whether this is because the apcupsd service itself kicked off the shutdown command and therefore could not be stopped.
 + 
 +Bug report filed: [[https://bugzilla.redhat.com/show_bug.cgi?id=1472062]]. The UPS power not being cut seems to be due to the ''/etc/apcupsd/powerfail'' file not getting created, which might be because of SELinux. The other part of the problem (hanging while stopping the apcupsd service) still remains but is worked around by changing the service stop timeout to 10s.
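
The stop-timeout workaround mentioned above could be applied with a systemd drop-in. This is only a sketch: the exact steps from the bug report are not reproduced here, and the drop-in file name ''override.conf'' and the helper ''write_stop_timeout_dropin'' are my own assumptions. ''TimeoutStopSec'' is the standard systemd setting for how long to wait for a service to stop.

```shell
#!/bin/sh
# write_stop_timeout_dropin UNIT SECONDS [BASE_DIR]   (hypothetical helper)
# Writes a systemd drop-in that caps how long systemd waits for UNIT to stop.
# BASE_DIR defaults to /etc/systemd/system; pass a scratch directory to preview.
write_stop_timeout_dropin() {
    unit=$1
    seconds=$2
    base=${3:-/etc/systemd/system}
    dir="$base/$unit.d"
    mkdir -p "$dir" || return 1
    # override.conf is an assumed name; any *.conf file in the .d directory is read.
    printf '[Service]\nTimeoutStopSec=%s\n' "$seconds" > "$dir/override.conf"
}

# Preview the result in a temporary directory instead of touching the live system.
preview=$(mktemp -d)
write_stop_timeout_dropin apcupsd.service 10 "$preview"
cat "$preview/apcupsd.service.d/override.conf"
```

On the real host this would be run as root without the third argument, followed by ''systemctl daemon-reload'' so systemd picks up the change.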
 + 
 +===== M.2 PCIe NVMe drive disappearing ===== 
 + 
 +Three times now my server has stopped responding and shown a blank console. After resetting and going into the UEFI setup, the Western Digital M.2 drive no longer shows up. After a poweroff and cold boot, the drive reappears in UEFI setup and I have to reset my boot settings to select it as the default boot entry. The system appears to run properly after this until the next time.
 + 
 +The problem I'm observing seems very similar to this: [[https://superuser.com/questions/1194478/ssd-suddenly-becomes-unreadable-how-to-diagnose]] 
 + 
 +After the third time this happened, on 2017-12-13, I updated the UEFI firmware on the ASRock motherboard to version 3.30. The upgrade was successful, and after resetting my options (particularly re-enabling SVM and power on after AC loss), the system appears to be running properly again now. 
 + 
 +On 2017-12-15 the system froze again. I disabled "C6 Mode" in the UEFI setup and started up again. 
 + 
 +2017-12-17: I have not observed the problem since disabling "C6 Mode", but I have a feeling it will still come back and could be related to NVMe APST modes. A similar problem is described here: [[https://bbs.archlinux.org/viewtopic.php?id=232692]]. I added the kernel parameter:
 + 
 +<code> 
 +nvme_core.default_ps_max_latency_us=0 
 +</code> 
 + 
 +Now ''nvme get-feature -f 0x0c -H /dev/nvme0n1'' shows APST is disabled. 
 + 
 +Turned "C6 Mode" back on but have not observed the M.2 drive disappearing problem again with APST disabled. 
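
To confirm that a parameter such as the one above actually made it onto the running kernel's command line, a small check like the following can help. It is a sketch, not part of the original setup: ''has_kernel_param'' is a hypothetical helper name, and it only does an exact whole-word match against ''/proc/cmdline''.

```shell
#!/bin/sh
# has_kernel_param PARAM [CMDLINE]   (hypothetical helper)
# Succeeds if PARAM appears as a whole space-separated word in CMDLINE.
# CMDLINE defaults to the running kernel's /proc/cmdline.
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline 2>/dev/null)}
    # Pad with spaces so the match cannot hit a prefix or suffix of another word.
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_kernel_param nvme_core.default_ps_max_latency_us=0; then
    echo "nvme_core.default_ps_max_latency_us=0 is on the kernel command line"
else
    echo "nvme_core.default_ps_max_latency_us=0 is NOT on the kernel command line"
fi
```

The same function works for any other parameter by passing a different name, and a command line string can be given as the second argument to test against something other than the running kernel.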
 + 
 +===== BUG: soft lockup ===== 
 + 
 +A few times after fixing the M.2 drive disappearing issue, my server has frozen. On one occasion I caught the console output, which listed several messages like "watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [worker:14788]", and one "INFO: rcu_sched detected stalls on CPUs/tasks:" ... "rcu_sched kthread starved for 19150377 jiffies!". I'm not sure what is causing this.
 + 
 +On 2018-01-08, I upgraded from kernel 4.14.4 to 4.14.11 and added "consoleblank=0" to the kernel command line so if this happens again hopefully I will not lose console output. 
 + 
 +2018-01-09: Igor seems to be experiencing the same bug and pointed me to [[https://bugzilla.kernel.org/show_bug.cgi?id=196683]]. I will probably disable C6 in UEFI setup again.
 + 
 +2018-01-15: Got the freeze again after disabling C6 mode in setup. I looked deeper in setup options and found the buried "Global C-State Control" option that Igor and Mike had disabled so I disabled that as well. No freezes since then.