Percona-Tookit工具包之pt-stalk

 

Preface

 

    We have a lot of methods to diagnose
problems in our system such as strace,pstack,gstack,gdb,pt-pmp,etc.But
sometimes there will be some fitful performance issues which are not so
easy to trace.Thus,pt-stalk may help us in diagnosing these kind of
problems.

 

Introduce

 

    pt-stalk is a tool to collect detail
diagnostic data base on triggers you specified such as
gdb,oprofile,strace,tcpdump.The trigger is not the conception of trigger
in database.They’re different at all.pt-stalk provides various options
to collect comprehensive data you need.It’s really useful and
helpful in performance diagnosing.Let’s see some details of it.

 

Procedure

 

Usage

1 pt-stalk [OPTIONS]

 

**Common Parameters**

 1 --collect //Collect diagostic data based on spedified triggers.
 2 --collect-gdb //Collect gdb data.It prints stack traces from all threads.
 3 --collect-oprofile //Collect oprofile data.
 4 --collect-strace //Collect strace data.Do not specify it when "--collect-gdb" has been used.
 5 --collect-tcpdump //Collect tcpdump data.
 6 --cycles //Limite the times when triggering condition occurs(default "5").
 7 --dest //Specify the position to store diagnostic data(default "/var/lib/pt-stalk").
 8 --disk-bytes-free //Prevent the lack of least disk space.It does not collect data until it has the necessary disk space(default "100M",valid surffixes of unit is k,M,G and T).
 9 --disk-pct-free //It's similar with "--disk-bytes-free" but specify the percentage of disk space.
10 --function //Specify the contents to watch for the triggers(default "status",other value is "processlist" or <yourfilename>).
11 --iterations //Limit the times to collect.It will run forever if not give a specific value.
12 --log //Specify the postion to record logs(default "/var/log/pt-stalk.log").It only generates when deamonized.
13 --match //The pattern to specify when using "--function processlist".
14 --mysql-only //Merely collect the MySQL relevant diagnostic data but disk space is the exception.
15 --retention-time //Specify the purge days of diagnostic data(default "30").
16 --run-time //Specify how many seconds to collect diagnostic data(default "30").It should not be longer than the value of "--sleep".
17 --sleep //Specify how many seconds to sleep after last collection.It used to prevent too many operations of collection(default "300").
18 --stalk //Watch and wait for trigger occur(default "yes").
19 --no_stalk //The option can be specify if you want to collect diagnostic data immediately without waiting for trigger to occur.
20 --threshold //Specify the maximum value for collecting(default "25").
21 --variable //Specify the variables to compared against(default "Threads_running").

 

Examples

 

Generate benchmark with
sysbench.

 1 [root@zlm2 08:07:54 ~/sysbench-1.0/src/lua]
 2 #sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port=3306 --mysql-user=zlm --mysql-password=aaron8219 --mysql-db=sysbench --tables=10 --table-size=100000 --mysql-storage-engine=innodb cleanup
 3 sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2)
 4 
 5 Dropping table 'sbtest1'...
 6 Dropping table 'sbtest2'...
 7 Dropping table 'sbtest3'...
 8 Dropping table 'sbtest4'...
 9 Dropping table 'sbtest5'...
10 Dropping table 'sbtest6'...
11 Dropping table 'sbtest7'...
12 Dropping table 'sbtest8'...
13 Dropping table 'sbtest9'...
14 Dropping table 'sbtest10'...
15 
16 [root@zlm2 08:08:08 ~/sysbench-1.0/src/lua]
17 #sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port=3306 --mysql-user=zlm --mysql-password=aaron8219 --mysql-db=sysbench --tables=10 --table-size=100000 --mysql-storage-engine=innodb prepare
18 sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2)
19 
20 Creating table 'sbtest1'...
21 Inserting 100000 records into 'sbtest1'
22 Creating a secondary index on 'sbtest1'...
23 Creating table 'sbtest2'...
24 Inserting 100000 records into 'sbtest2'
25 Creating a secondary index on 'sbtest2'...
26 Creating table 'sbtest3'...
27 Inserting 100000 records into 'sbtest3'
28 Creating a secondary index on 'sbtest3'...
29 Creating table 'sbtest4'...
30 Inserting 100000 records into 'sbtest4'
31 Creating a secondary index on 'sbtest4'...
32 Creating table 'sbtest5'...
33 Inserting 100000 records into 'sbtest5'
34 Creating a secondary index on 'sbtest5'...
35 Creating table 'sbtest6'...
36 Inserting 100000 records into 'sbtest6'
37 Creating a secondary index on 'sbtest6'...
38 Creating table 'sbtest7'...
39 Inserting 100000 records into 'sbtest7'
40 Creating a secondary index on 'sbtest7'...
41 Creating table 'sbtest8'...
42 Inserting 100000 records into 'sbtest8'
43 Creating a secondary index on 'sbtest8'...
44 Creating table 'sbtest9'...
45 Inserting 100000 records into 'sbtest9'
46 Creating a secondary index on 'sbtest9'...
47 Creating table 'sbtest10'...
48 Inserting 100000 records into 'sbtest10'
49 Creating a secondary index on 'sbtest10'...
50 
51 [root@zlm2 08:09:23 ~/sysbench-1.0/src/lua]
52 #sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port=3306 --mysql-user=zlm --mysql-password=aaron8219 --mysql-db=sysbench --threads=3 --time=3600 --report-interval=10 --rand-type=uniform run
53 sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2)
54 
55 Running the test with following options:
56 Number of threads: 3
57 Report intermediate results every 10 second(s)
58 Initializing random number generator from current time
59 
60 
61 Initializing worker threads...
62 
63 Threads started!
64 
65 [ 10s ] thds: 3 tps: 258.68 qps: 5176.49 (r/w/o: 3624.11/1034.82/517.56) lat (ms,95%): 15.83 err/s: 0.00 reconn/s: 0.00
66 [ 20s ] thds: 3 tps: 286.25 qps: 5726.15 (r/w/o: 4008.67/1144.89/572.60) lat (ms,95%): 15.00 err/s: 0.00 reconn/s: 0.00
67 [ 30s ] thds: 3 tps: 270.82 qps: 5416.13 (r/w/o: 3790.80/1083.69/541.64) lat (ms,95%): 16.12 err/s: 0.00 reconn/s: 0.00
68 [ 40s ] thds: 3 tps: 280.98 qps: 5619.75 (r/w/o: 3934.26/1123.53/561.97) lat (ms,95%): 16.12 err/s: 0.00 reconn/s: 0.00
69 [ 50s ] thds: 3 tps: 298.42 qps: 5968.38 (r/w/o: 4177.83/1193.70/596.85) lat (ms,95%): 14.46 err/s: 0.00 reconn/s: 0.00
70 [ 60s ] thds: 3 tps: 278.88 qps: 5578.20 (r/w/o: 3904.92/1115.52/557.76) lat (ms,95%): 15.83 err/s: 0.00 reconn/s: 0.00
71 [ 70s ] thds: 3 tps: 280.91 qps: 5617.66 (r/w/o: 3932.21/1123.63/561.82) lat (ms,95%): 15.83 err/s: 0.00 reconn/s: 0.00
72 [ 80s ] thds: 3 tps: 281.68 qps: 5632.83 (r/w/o: 3942.77/1126.71/563.35) lat (ms,95%): 16.12 err/s: 0.00 reconn/s: 0.00
73 [ 90s ] thds: 3 tps: 281.60 qps: 5631.55 (r/w/o: 3942.07/1126.39/563.10) lat (ms,95%): 16.12 err/s: 0.00 reconn/s: 0.00
74 [ 100s ] thds: 3 tps: 287.62 qps: 5753.86 (r/w/o: 4028.02/1150.49/575.35) lat (ms,95%): 15.55 err/s: 0.00 reconn/s: 0.00
75 [ 110s ] thds: 3 tps: 308.99 qps: 6180.45 (r/w/o: 4326.12/1236.35/617.97) lat (ms,95%): 13.95 err/s: 0.00 reconn/s: 0.00
76 ... //Omitted.

 

**Collect diagnostic data using
pt-stalk.**

 1 [root@zlm2 08:14:36 /data/mysql/mysql3306]
 2 #pt-stalk --host localhost --port 3306 --user root --password Passw0rd --collect-gdb --cycles 1 --variable Threads_connect --threshold 2
 3 mysql: [Warning] Using a password on the command line interface can be insecure.
 4 2018_07_09_08_15_46 Starting /usr/bin/pt-stalk --function=status --variable=Threads_connect --threshold=2 --match= --cycles=1 --interval=1 --iterations= --run-time=30 --sleep=300 --dest=/var/lib/pt-stalk --prefix= --notify-by-email= --log=/var/log/pt-stalk.log --pid=/var/run/pt-stalk.pid --plugin=
 5 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
 6 2018_07_09_08_15_46 Detected value is empty; something failed?  Trigger exit status: 0
 7 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
 8 2018_07_09_08_15_47 Detected value is empty; something failed?  Trigger exit status: 0
 9 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
10 2018_07_09_08_15_48 Detected value is empty; something failed?  Trigger exit status: 0
11 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
12 2018_07_09_08_15_50 Detected value is empty; something failed?  Trigger exit status: 0
13 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
14 2018_07_09_08_15_51 Detected value is empty; something failed?  Trigger exit status: 0
15 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
16 2018_07_09_08_15_52 Detected value is empty; something failed?  Trigger exit status: 0
17 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
18 2018_07_09_08_15_53 Detected value is empty; something failed?  Trigger exit status: 0
19 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
20 2018_07_09_08_15_54 Detected value is empty; something failed?  Trigger exit status: 0
21 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
22 2018_07_09_08_15_55 Detected value is empty; something failed?  Trigger exit status: 0
23 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
24 2018_07_09_08_15_56 Detected value is empty; something failed?  Trigger exit status: 0
25 ^C2018_07_09_08_15_57 Caught signal, exiting
26 2018_07_09_08_15_57 All subprocesses have finished
27 2018_07_09_08_15_57 Exiting because OKTORUN is false
28 2018_07_09_08_15_57 /usr/bin/pt-stalk exit status 1
29 
30 //The value of "--variable" should be "Threads_connected".
31 //Modify the correct value and run it again.
32 
33 [root@zlm2 08:15:57 /data/mysql/mysql3306]
34 #pt-stalk --host localhost --port 3306 --user root --password Passw0rd --collect-gdb --cycles 1 --variable Threads_connected --threshold 2
35 mysql: [Warning] Using a password on the command line interface can be insecure.
36 2018_07_09_08_19_39 Starting /usr/bin/pt-stalk --function=status --variable=Threads_connected --threshold=2 --match= --cycles=1 --interval=1 --iterations= --run-time=30 --sleep=300 --dest=/var/lib/pt-stalk --prefix= --notify-by-email= --log=/var/log/pt-stalk.log --pid=/var/run/pt-stalk.pid --plugin=
37 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
38 2018_07_09_08_19_39 Check results: status(Threads_connected)=5, matched=yes, cycles_true=1
39 2018_07_09_08_19_39 Collect 1 triggered
40 2018_07_09_08_19_39 Collect 1 PID 5921
41 2018_07_09_08_19_39 Collect 1 done
42 2018_07_09_08_19_39 Sleeping 300 seconds after collect
43 mysqladmin: [Warning] Using a password on the command line interface can be insecure.
44 2018_07_09_08_24_39 Check results: status(Threads_connected)=6, matched=yes, cycles_true=1
45 2018_07_09_08_24_39 Collect 2 triggered
46 2018_07_09_08_24_39 Collect 2 PID 7398
47 2018_07_09_08_24_39 Collect 2 done
48 2018_07_09_08_24_40 Sleeping 300 seconds after collect
49 ^C2018_07_09_08_25_56 Caught signal, exiting //Terminate to collect diagnostic data by "Ctrl+C"
50 2018_07_09_08_25_56 Waiting up to 90 seconds for subprocesses to finish...
51 2018_07_09_08_25_56 Exiting because OKTORUN is false
52 2018_07_09_08_25_56 /usr/bin/pt-stalk exit status 1

 

**Check the value of “show global status
… “.**

 1 (root@localhost mysql3306.sock)[sysbench]>show global status like '%Threads%';
 2 +------------------------+-------+
 3 | Variable_name          | Value |
 4 +------------------------+-------+
 5 | Delayed_insert_threads | 0     |
 6 | Slow_launch_threads    | 0     |
 7 | Threads_cached         | 1     |
 8 | Threads_connected      | 5     | //Threads_connected has been up to 5.
 9 | Threads_created        | 5     |
10 | Threads_running        | 1     |
11 +------------------------+-------+
12 6 rows in set (0.04 sec)
13 
14 
15 (root@localhost mysql3306.sock)[sysbench]>show global status like '%Threads%';
16 +------------------------+-------+
17 | Variable_name          | Value |
18 +------------------------+-------+
19 | Delayed_insert_threads | 0     |
20 | Slow_launch_threads    | 0     |
21 | Threads_cached         | 3     |
22 | Threads_connected      | 6     | //Second time,Threads_connected has been up to 6.
23 | Threads_created        | 9     |
24 | Threads_running        | 1     |
25 +------------------------+-------+
26 6 rows in set (0.01 sec)

 

**Check the output files in default
directory(“/var/lib/pt-stalk”).**

 1 [root@zlm2 08:31:41 /var/lib/pt-stalk]
 2 #ls -lrt
 3 total 3928
 4 -rw-r--r-- 1 root root     383 Jul  9 08:19 2018_07_09_08_19_39-trigger
 5 -rw-r--r-- 1 root root    9845 Jul  9 08:19 2018_07_09_08_19_39-pmap
 6 -rw-r--r-- 1 root root   16554 Jul  9 08:19 2018_07_09_08_19_39-variables
 7 -rw-r--r-- 1 root root    4352 Jul  9 08:19 2018_07_09_08_19_39-innodbstatus1
 8 -rw-r--r-- 1 root root     384 Jul  9 08:19 2018_07_09_08_19_39-mutex-status1
 9 -rw-r--r-- 1 root root    8507 Jul  9 08:19 2018_07_09_08_19_39-ps
10 -rw-r--r-- 1 root root   10356 Jul  9 08:19 2018_07_09_08_19_39-lsof
11 -rw-r--r-- 1 root root    5280 Jul  9 08:19 2018_07_09_08_19_39-opentables1
12 -rw-r--r-- 1 root root    8204 Jul  9 08:19 2018_07_09_08_19_39-top
13 -rw-r--r-- 1 root root   28343 Jul  9 08:19 2018_07_09_08_19_39-sysctl
14 -rw-r--r-- 1 root root     139 Jul  9 08:20 2018_07_09_08_19_39-disk-space
15 -rw-r--r-- 1 root root 1084769 Jul  9 08:20 2018_07_09_08_19_39-mysqladmin
16 -rw-r--r-- 1 root root    2782 Jul  9 08:20 2018_07_09_08_19_39-vmstat
17 -rw-r--r-- 1 root root   24750 Jul  9 08:20 2018_07_09_08_19_39-procstat
18 -rw-r--r-- 1 root root   14403 Jul  9 08:20 2018_07_09_08_19_39-diskstats
19 -rw-r--r-- 1 root root   72469 Jul  9 08:20 2018_07_09_08_19_39-procvmstat
20 -rw-r--r-- 1 root root   51960 Jul  9 08:20 2018_07_09_08_19_39-netstat_s
21 -rw-r--r-- 1 root root  296850 Jul  9 08:20 2018_07_09_08_19_39-slabinfo
22 -rw-r--r-- 1 root root   33210 Jul  9 08:20 2018_07_09_08_19_39-interrupts
23 -rw-r--r-- 1 root root   48390 Jul  9 08:20 2018_07_09_08_19_39-netstat
24 -rw-r--r-- 1 root root   37260 Jul  9 08:20 2018_07_09_08_19_39-meminfo
25 -rw-r--r-- 1 root root   17100 Jul  9 08:20 2018_07_09_08_19_39-df
26 -rw-r--r-- 1 root root   40753 Jul  9 08:20 2018_07_09_08_19_39-processlist
27 -rw-r--r-- 1 root root    4555 Jul  9 08:20 2018_07_09_08_19_39-innodbstatus2
28 -rw-r--r-- 1 root root   74476 Jul  9 08:20 2018_07_09_08_19_39-transactions
29 -rw-r--r-- 1 root root       5 Jul  9 08:20 2018_07_09_08_19_39-hostname
30 -rw-r--r-- 1 root root     385 Jul  9 08:20 2018_07_09_08_19_39-mutex-status2
31 -rw-r--r-- 1 root root    5280 Jul  9 08:20 2018_07_09_08_19_39-opentables2
32 -rw-r--r-- 1 root root     325 Jul  9 08:20 2018_07_09_08_19_39-vmstat-overall
33 -rw-r--r-- 1 root root   24763 Jul  9 08:20 2018_07_09_08_19_39-output
34 
35 //There're a series of files(starts with "tigger" and ends with "output") each time we collect the diagnostic data.
36 
37 -rw-r--r-- 1 root root     383 Jul  9 08:24 2018_07_09_08_24_39-trigger
38 -rw-r--r-- 1 root root   10251 Jul  9 08:24 2018_07_09_08_24_39-pmap
39 -rw-r--r-- 1 root root   16554 Jul  9 08:24 2018_07_09_08_24_39-variables
40 -rw-r--r-- 1 root root    4353 Jul  9 08:24 2018_07_09_08_24_39-innodbstatus1
41 -rw-r--r-- 1 root root    8583 Jul  9 08:24 2018_07_09_08_24_39-ps
42 -rw-r--r-- 1 root root     386 Jul  9 08:24 2018_07_09_08_24_39-mutex-status1
43 -rw-r--r-- 1 root root   28343 Jul  9 08:24 2018_07_09_08_24_39-sysctl
44 -rw-r--r-- 1 root root   10534 Jul  9 08:24 2018_07_09_08_24_39-lsof
45 -rw-r--r-- 1 root root    5280 Jul  9 08:24 2018_07_09_08_24_39-opentables1
46 -rw-r--r-- 1 root root    8053 Jul  9 08:24 2018_07_09_08_24_39-top
47 -rw-r--r-- 1 root root     139 Jul  9 08:25 2018_07_09_08_24_39-disk-space
48 -rw-r--r-- 1 root root    2783 Jul  9 08:25 2018_07_09_08_24_39-vmstat
49 -rw-r--r-- 1 root root 1084769 Jul  9 08:25 2018_07_09_08_24_39-mysqladmin
50 -rw-r--r-- 1 root root   24899 Jul  9 08:25 2018_07_09_08_24_39-procstat
51 -rw-r--r-- 1 root root   48390 Jul  9 08:25 2018_07_09_08_24_39-netstat
52 -rw-r--r-- 1 root root  296850 Jul  9 08:25 2018_07_09_08_24_39-slabinfo
53 -rw-r--r-- 1 root root   52230 Jul  9 08:25 2018_07_09_08_24_39-netstat_s
54 -rw-r--r-- 1 root root   33210 Jul  9 08:25 2018_07_09_08_24_39-interrupts
55 -rw-r--r-- 1 root root   37260 Jul  9 08:25 2018_07_09_08_24_39-meminfo
56 -rw-r--r-- 1 root root   14460 Jul  9 08:25 2018_07_09_08_24_39-diskstats
57 -rw-r--r-- 1 root root   17100 Jul  9 08:25 2018_07_09_08_24_39-df
58 -rw-r--r-- 1 root root   72914 Jul  9 08:25 2018_07_09_08_24_39-procvmstat
59 -rw-r--r-- 1 root root   44370 Jul  9 08:25 2018_07_09_08_24_39-processlist
60 -rw-r--r-- 1 root root    4445 Jul  9 08:25 2018_07_09_08_24_39-innodbstatus2
61 -rw-r--r-- 1 root root   76282 Jul  9 08:25 2018_07_09_08_24_39-transactions
62 -rw-r--r-- 1 root root       5 Jul  9 08:25 2018_07_09_08_24_39-hostname
63 -rw-r--r-- 1 root root     387 Jul  9 08:25 2018_07_09_08_24_39-mutex-status2
64 -rw-r--r-- 1 root root    5280 Jul  9 08:25 2018_07_09_08_24_39-opentables2
65 -rw-r--r-- 1 root root     325 Jul  9 08:25 2018_07_09_08_24_39-vmstat-overall
66 -rw-r--r-- 1 root root   24763 Jul  9 08:25 2018_07_09_08_24_39-output
67 
68 [root@zlm2 08:33:37 /var/lib/pt-stalk]
69 #cat 2018_07_09_08_24_39-trigger
70 2018_07_09_08_24_39 Check results: status(Threads_connected)=6, matched=yes, cycles_true=1
71 2018_07_09_08_24_39 pt-stalk ran with --function=status --variable=Threads_connected --threshold=2 --match= --cycles=1 --interval=1 --iterations= --run-time=30 --sleep=300 --dest=/var/lib/pt-stalk --prefix= --notify-by-email= --log=/var/log/pt-stalk.log --pid=/var/run/pt-stalk.pid --plugin=
72 
73 //The trigger file records the options we have used.
74 
75 [root@zlm2 08:34:17 /var/lib/pt-stalk]
76 #cat 2018_07_09_08_24_39-vmstat-overall
77 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
78  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
79 18  2   9692  60716      0 583316    0    0    11   139   92  597  4  1 96  0  0
80  4  0  10440  78828      0 569672    0   25    89  1965 1455 12046 80 20  0  0  0
81  
82  //The vmstat-overall records the vmstat output.
83  //each file contains the relevant diagnostic data about what their name called.I'm not going to demonstrate all of them.

 

**Take care of you remain disk
space,pt-stalk won’t run anymore if you’re out of
space.**

 1 [root@zlm2 08:25:56 /data/mysql/mysql3306]
 2 #pt-stalk --host localhost --port 3306 --user root --password Passw0rd --collect --cycles 1 --variable Threads_connected --threshold 2 --sleep 30 --demonize
 3 Cannot open /tmp/pt-stalk.8940.FSboRq/po/daemonize: No space left on device at -e line 13, <$fh> chunk 274.
 4 No long attribute in option spec /tmp/pt-stalk.8940.FSboRq/po/ask-pass
 5 
 6 [root@zlm2 08:51:27 /data/mysql/mysql3306]
 7 #df -h
 8 ]Filesystem               Size  Used Avail Use% Mounted on
 9 /dev/mapper/centos-root  8.4G  8.4G   20K 100% / //The root of linux has been out of space.
10 devtmpfs                 488M     0  488M   0% /dev
11 tmpfs                    497M     0  497M   0% /dev/shm
12 tmpfs                    497M  6.6M  491M   2% /run
13 tmpfs                    497M     0  497M   0% /sys/fs/cgroup
14 /dev/sda1                497M  118M  379M  24% /boot
15 none                      87G   81G  5.9G  94% /vagrant

 

Summary

  • pt-stalk
    is another tool in diagnosing system performance and similar with
    pt-pmp but not the same.
  • pt-stalk
    will generate many statistic files which can really help you
    collecting information in almost every aspect.
  • pt-stalk
    has four main triggers:gdb,oprofile,strace,tcpdump.It’s flexible to
    collect data with diffrent dimensionality.