Wednesday 18 June 2014

CentOS 6.4 kernel: swapper: page allocation failure. order:5, mode:0x20

This morning the production DB server logged the following error:


Jun 18 09:56:43 pdb06 kernel: swapper: page allocation failure. order:5, mode:0x20
Jun 18 09:56:43 pdb06 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-358.11.1.el6.x86_64 #1
Jun 18 09:56:43 pdb06 kernel: Call Trace:
Jun 18 09:56:43 pdb06 kernel: <IRQ>  [<ffffffff8112c157>] ? __alloc_pages_nodemask+0x757/0x8d0
Jun 18 09:56:43 pdb06 kernel: [<ffffffff810a23d9>] ? ktime_get+0x69/0xf0
Jun 18 09:56:43 pdb06 kernel: [<ffffffff81166b02>] ? kmem_getpages+0x62/0x170
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8116771a>] ? fallback_alloc+0x1ba/0x270
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8116716f>] ? cache_grow+0x2cf/0x320
Jun 18 09:56:43 pdb06 kernel: [<ffffffff81167499>] ? ____cache_alloc_node+0x99/0x160
Jun 18 09:56:43 pdb06 kernel: [<ffffffff81168660>] ? kmem_cache_alloc_node_trace+0x90/0x200
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8116887d>] ? __kmalloc_node+0x4d/0x60
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8143da8d>] ? __alloc_skb+0x6d/0x190
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8143ebb0>] ? skb_copy+0x40/0xb0
Jun 18 09:56:43 pdb06 kernel: [<ffffffffa01d527c>] ? tg3_start_xmit+0xa8c/0xd50 [tg3]
Jun 18 09:56:43 pdb06 kernel: [<ffffffff81449098>] ? dev_hard_start_xmit+0x308/0x530
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8146742a>] ? sch_direct_xmit+0x15a/0x1c0
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8144cda0>] ? dev_queue_xmit+0x3b0/0x550
Jun 18 09:56:43 pdb06 kernel: [<ffffffffa02ffed7>] ? bond_dev_queue_xmit+0x67/0x200 [bonding]
Jun 18 09:56:43 pdb06 kernel: [<ffffffffa0300361>] ? bond_start_xmit+0x2f1/0x5d0 [bonding]
Jun 18 09:56:43 pdb06 kernel: [<ffffffff81449098>] ? dev_hard_start_xmit+0x308/0x530
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8122b949>] ? selinux_ipv4_postroute+0x19/0x20
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8144cbf5>] ? dev_queue_xmit+0x205/0x550
Jun 18 09:56:43 pdb06 kernel: [<ffffffff814854e8>] ? ip_finish_output+0x148/0x310
Jun 18 09:56:43 pdb06 kernel: [<ffffffff81485768>] ? ip_output+0xb8/0xc0
Jun 18 09:56:43 pdb06 kernel: [<ffffffff81484a2f>] ? __ip_local_out+0x9f/0xb0
Jun 18 09:56:43 pdb06 kernel: [<ffffffff81484a65>] ? ip_local_out+0x25/0x30
Jun 18 09:56:43 pdb06 kernel: [<ffffffff81484f40>] ? ip_queue_xmit+0x190/0x420
Jun 18 09:56:43 pdb06 kernel: [<ffffffff81499c2e>] ? tcp_transmit_skb+0x40e/0x7b0
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8149c03b>] ? tcp_write_xmit+0x1fb/0xa20
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8149c9f0>] ? __tcp_push_pending_frames+0x30/0xe0
Jun 18 09:56:43 pdb06 kernel: [<ffffffff81494483>] ? tcp_data_snd_check+0x33/0x100
Jun 18 09:56:43 pdb06 kernel: [<ffffffff814980cd>] ? tcp_rcv_established+0x3ed/0x800
Jun 18 09:56:43 pdb06 kernel: [<ffffffff814a00c3>] ? tcp_v4_do_rcv+0x2e3/0x430
Jun 18 09:56:43 pdb06 kernel: [<ffffffff814a00c3>] ? tcp_v4_do_rcv+0x2e3/0x430
Jun 18 09:56:43 pdb06 kernel: [<ffffffff81497f74>] ? tcp_rcv_established+0x294/0x800
Jun 18 09:56:43 pdb06 kernel: [<ffffffff814a194e>] ? tcp_v4_rcv+0x4fe/0x8d0
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8147f6ed>] ? ip_local_deliver_finish+0xdd/0x2d0
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8147f978>] ? ip_local_deliver+0x98/0xa0
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8147ee3d>] ? ip_rcv_finish+0x12d/0x440
Jun 18 09:56:43 pdb06 kernel: [<ffffffff81167380>] ? cache_alloc_refill+0x1c0/0x240
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8147f3c5>] ? ip_rcv+0x275/0x350
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8144858b>] ? __netif_receive_skb+0x4ab/0x750
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8149ec6a>] ? tcp4_gro_receive+0x5a/0xd0
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8144a968>] ? netif_receive_skb+0x58/0x60
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8144aa70>] ? napi_skb_finish+0x50/0x70
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8144d019>] ? napi_gro_receive+0x39/0x50
Jun 18 09:56:43 pdb06 kernel: [<ffffffffa01d1c18>] ? tg3_poll_work+0x788/0xe50 [tg3]
Jun 18 09:56:43 pdb06 kernel: [<ffffffffa01d232c>] ? tg3_poll_msix+0x4c/0x150 [tg3]
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8144d133>] ? net_rx_action+0x103/0x2f0
Jun 18 09:56:43 pdb06 kernel: [<ffffffff81076fb1>] ? __do_softirq+0xc1/0x1e0
Jun 18 09:56:43 pdb06 kernel: [<ffffffff810e1670>] ? handle_IRQ_event+0x60/0x170
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8100de05>] ? do_softirq+0x65/0xa0
Jun 18 09:56:43 pdb06 kernel: [<ffffffff81076d95>] ? irq_exit+0x85/0x90
Jun 18 09:56:43 pdb06 kernel: [<ffffffff815171c5>] ? do_IRQ+0x75/0xf0
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun 18 09:56:43 pdb06 kernel: <EOI>  [<ffffffff812d39fe>] ? intel_idle+0xde/0x170
Jun 18 09:56:43 pdb06 kernel: [<ffffffff812d39e1>] ? intel_idle+0xc1/0x170
Jun 18 09:56:43 pdb06 kernel: [<ffffffff814152d7>] ? cpuidle_idle_call+0xa7/0x140
Jun 18 09:56:43 pdb06 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Jun 18 09:56:43 pdb06 kernel: [<ffffffff8150704c>] ? start_secondary+0x2ac/0x2ef
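
A rough reading of the first line of the trace: order:5 asks the buddy allocator for 2^5 = 32 physically contiguous pages, which is 128 KiB with 4 KiB pages. The request comes from the network transmit path inside a softirq (tg3_start_xmit calling skb_copy), where the kernel cannot sleep and wait for memory. As far as I can tell, mode:0x20 on this 2.6.32 kernel is the atomic allocation flag, which fits that picture. The sketch below does the order arithmetic and sums the free blocks of at least a given order from /proc/buddyinfo. The function names are my own, and the 4 KiB page size is an assumption.

```shell
#!/bin/sh
# Size in KiB of one block of the given buddy-allocator order.
# Assumes 4 KiB pages unless a page size (in KiB) is passed as $2.
order_to_kib() {
    order=$1
    page_kib=${2:-4}
    echo $(( (1 << order) * page_kib ))
}

# Read /proc/buddyinfo-style lines on stdin and print, per zone, how many
# free blocks exist at order >= $1. Columns 5..NF hold the counts for
# order 0, 1, 2, ... so order k lives in column 5 + k.
free_blocks_at_or_above() {
    awk -v min="$1" '{
        n = 0
        for (i = 5 + min; i <= NF; i++) n += $i
        print $4, n
    }'
}

order_to_kib 5    # prints 128

# On the affected host:
#   free_blocks_at_or_above 5 < /proc/buddyinfo
```

If the counts at order 5 and above are at or near zero for the Normal zone, that would point to fragmentation of free memory more than to the server being truly out of memory.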


[root@pdb06 log]# free -m
             total       used       free     shared    buffers     cached
Mem:         64386      64062        324          0        238      21831
-/+ buffers/cache:      41992      22394
Swap:       127999        991     127008
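
The free column alone understates what the kernel can hand out, because buffers and page cache can usually be reclaimed. Adding free + buffers + cached from the Mem: row should land close to the free figure in the -/+ buffers/cache row. A small sketch, assuming the older procps column layout shown above (newer versions of free print different columns); the function name is made up for illustration.

```shell
#!/bin/sh
# Sum free + buffers + cached (columns 4, 6 and 7 of the "Mem:" row)
# from old-style `free -m` output read on stdin.
reclaimable_free_mb() {
    awk '$1 == "Mem:" { print $4 + $6 + $7 }'
}

# Sample input copied from the output above.
reclaimable_free_mb <<'EOF'
             total       used       free     shared    buffers     cached
Mem:         64386      64062        324          0        238      21831
-/+ buffers/cache:      41992      22394
Swap:       127999        991     127008
EOF
# prints 22393, within rounding of the 22394 reported by free

# Live usage on a host with the same column layout:
#   free -m | reclaimable_free_mb
```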


Only 324 MB of memory is free. Check the kernel setting zone_reclaim_mode:
[root@pdb06 ~]# cat /proc/sys/vm/zone_reclaim_mode
0


A value of 0 means zone reclaim is disabled.


Try enabling zone reclaim by setting zone_reclaim_mode to 1:
sysctl -w vm.zone_reclaim_mode=1


Output of the command, followed by a check of the new value:
[root@pdb06 ~]# sysctl -w vm.zone_reclaim_mode=1
vm.zone_reclaim_mode = 1
[root@pdb06 ~]# cat /proc/sys/vm/zone_reclaim_mode
1


Then edit /etc/sysctl.conf with vi and add vm.zone_reclaim_mode=1 so the setting persists across reboots.


#Zone_reclaim_mode allows someone to set more or less aggressive approaches to
#reclaim memory when a zone runs out of memory. If it is set to zero then no
#zone reclaim occurs. Allocations will be satisfied from other zones / nodes
#in the system.

#The value is the bitwise OR of:

#1 = Zone reclaim on
#2 = Zone reclaim writes dirty pages out
#4 = Zone reclaim swaps pages
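
Since the value is a bitmask, settings such as 3 or 5 combine the options listed above. A minimal sketch of decoding a value, with a function name and labels of my own choosing:

```shell
#!/bin/sh
# Print which zone_reclaim_mode bits are set in the given value.
decode_zone_reclaim_mode() {
    mode=$1
    if [ "$mode" -eq 0 ]; then
        echo "off"
        return
    fi
    out=""
    [ $(( mode & 1 )) -ne 0 ] && out="$out reclaim"
    [ $(( mode & 2 )) -ne 0 ] && out="$out write-dirty"
    [ $(( mode & 4 )) -ne 0 ] && out="$out swap"
    # Strip the leading space left by the first append.
    echo "${out# }"
}

decode_zone_reclaim_mode 1    # prints: reclaim

# Against the live setting:
#   decode_zone_reclaim_mode "$(cat /proc/sys/vm/zone_reclaim_mode)"
```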




Save and exit.
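
The same edit can be scripted instead of done by hand in vi. A sketch, assuming GNU sed (for -i) as shipped with CentOS; persist_sysctl is a hypothetical helper, not a standard tool. It updates the key if it is already present and appends it otherwise, so running it twice does not leave duplicate lines.

```shell
#!/bin/sh
# Set "key = value" in a sysctl config file, replacing an existing
# entry for the key or appending a new one.
# $3 defaults to /etc/sysctl.conf.
persist_sysctl() {
    key=$1
    value=$2
    file=${3:-/etc/sysctl.conf}
    # Escape dots so "vm.zone_reclaim_mode" is matched literally.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*${pattern}[[:space:]]*=" "$file"; then
        sed -i "s|^[[:space:]]*${pattern}[[:space:]]*=.*|$key = $value|" "$file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Usage (as root):
#   persist_sysctl vm.zone_reclaim_mode 1
#   sysctl -p    # reload /etc/sysctl.conf
```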


Check the DB server's free memory again:


[root@pdb06 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         64386      51486      12900          0        223       9348
-/+ buffers/cache:      41914      22472
Swap:       127999       1000     126999


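Free memory is now 12900 MB, which looks good. Most of the gain comes from the page cache shrinking (cached dropped from 21831 MB to 9348 MB); the -/+ buffers/cache row is essentially unchanged.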











Tuesday 17 June 2014

mytop Error in option spec: "long|!"

[root@localhost ~]# mytop
Error in option spec: "long|!"
[root@localhost ~]# vi /usr/local/bin/mytop
[root@localhost ~]# find / -name mytop
/usr/bin/mytop
[root@localhost ~]# vi /usr/bin/mytop


Find the line:
"long|!"              => \$config{long_nums},


and comment it out with # (this removes the --long option, which mytop runs fine without):




GetOptions(
    "color!"              => \$config{color},
    "user|u=s"            => \$config{user},
    "pass|password|p=s"   => \$config{pass},
    "database|db|d=s"     => \$config{db},
    "host|h=s"            => \$config{host},
    "port|P=i"            => \$config{port},
    "socket|S=s"          => \$config{socket},
    "delay|s=i"           => \$config{delay},
    "batch|batchmode|b"   => \$config{batchmode},
    "header!"             => \$config{header},
    "idle|i"              => \$config{idle},
    "resolve|r"           => \$config{resolve},
    "prompt!"             => \$config{prompt},
    #"long|!"              => \$config{long_nums},
);


Save and exit. mytop now starts without the error.
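
Commenting the line out removes the --long option altogether. An alternative that should keep it is to drop the stray | so the spec reads "long!", which is the usual Getopt::Long form for a negatable flag. This assumes the parser is objecting to the empty alias after the |, which is my reading of the error and not something verified here. A sketch using GNU sed on the original, uncommented line; the function name is hypothetical, and it leaves a .bak copy first:

```shell
#!/bin/sh
# Rewrite the option spec "long|!" to "long!" in a mytop script.
# $1 defaults to /usr/bin/mytop; a backup is written to <script>.bak.
fix_mytop_long_option() {
    script=${1:-/usr/bin/mytop}
    cp -p "$script" "$script.bak"
    # In sed's basic regex syntax a bare | is a literal character.
    sed -i 's/"long|!"/"long!"/' "$script"
}

# Usage (as root, on an unmodified mytop):
#   fix_mytop_long_option /usr/bin/mytop
```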