A trimmed-down shell loop gets only two arguments, yet two ways of writing it differ 10x in run time. Why? Details inside

2022-07-12 22:25:45 +08:00
 mylovesaber

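Today I was writing a passwordless-ssh setup script and found it ran extremely slowly. I rewrote it another way and it finished in under a second. After adding some progress messages I ran into something I can't explain. I've boiled it down to the case below and would appreciate any pointers.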

Here is the trimmed-down code:

#!/bin/bash
declare -a EXTRAARG=("$@")

MySQLKeywords=(accessible account action add after against aggregate algorithm all alter always analyze and any as asc ascii asensitive at autoextend_size auto_increment avg avg_row_length backup before begin between bigint binary binlog bit blob block bool boolean both btree by byte cache call cascade cascaded case catalog_name chain change changed channel char character charset check checksum cipher class_origin client close coalesce code collate collation column columns column_format column_name comment commit committed compact completion component compressed compression concurrent condition connection consistent constraint constraint_catalog constraint_name constraint_schema contains context continue convert cpu create cross current current_date current_time current_timestamp current_user cursor cursor_name data database databases datafile date datetime day day_hour day_microsecond day_minute day_second deallocate dec decimal declare default default_auth definer delayed delay_key_write delete desc describe deterministic diagnostics directory disable discard disk distinct distinctrow div do double drop dual dumpfile duplicate dynamic each else elseif enable enclosed encryption end ends engine engines enum error errors escape escaped event events every except exchange execute exists exit expansion expire explain export extended extent_size false fast faults fetch fields file file_block_size filter first fixed float float4 float8 flush follows for force foreign format found from full fulltext general generated geometry geometrycollection get get_format global grant grants group group_replication handler hash having help high_priority host hosts hour hour_microsecond hour_minute hour_second identified if ignore ignore_server_ids import in index indexes infile initial_size inner inout insensitive insert insert_method install instance int int1 int2 int3 int4 int8 integer interval into invisible invoker io io_after_gtids io_before_gtids io_thread ipc is isolation 
issuer iterate join json key keys key_block_size kill language last leading leave leaves left less level like limit linear lines linestring list load local localtime localtimestamp lock locks logfile logs long longblob longtext loop low_priority master master_auto_position master_bind master_connect_retry master_delay master_heartbeat_period master_host master_log_file master_log_pos master_password master_port master_retry_count master_ssl master_ssl_ca master_ssl_capath master_ssl_cert master_ssl_cipher master_ssl_crl master_ssl_crlpath master_ssl_key master_ssl_verify_server_cert master_tls_version master_user match maxvalue max_connections_per_hour max_queries_per_hour max_rows max_size max_updates_per_hour max_user_connections medium mediumblob mediumint mediumtext memory merge message_text microsecond middleint migrate minute minute_microsecond minute_second min_rows mod mode modifies modify month multilinestring multipoint multipolygon mutex mysql_errno name names national natural nchar ndb ndbcluster never new next no nodegroup none not no_wait no_write_to_binlog null number numeric nvarchar offset on one only open optimize optimizer_costs option optionally options or order out outer outfile owner pack_keys page parser partial partition partitioning partitions password phase plugin plugins plugin_dir point polygon port precedes precision prepare preserve prev primary privileges procedure processlist profile profiles proxy purge quarter query quick range read reads read_only read_write real rebuild recover redo_buffer_size redundant references regexp relay relaylog relay_log_file relay_log_pos relay_thread release reload remove rename reorganize repair repeat repeatable replace replicate_do_db replicate_do_table replicate_ignore_db replicate_ignore_table replicate_rewrite_db replicate_wild_do_table replicate_wild_ignore_table replication require reset resignal restore restrict resume return returned_sqlstate returns reverse revoke right rlike rollback rollup 
rotate routine row_count row_format rtree savepoint schedule schema schemas schema_name second second_microsecond security select sensitive separator serial serializable server session set share show shutdown signal signed simple slave slow smallint snapshot socket some soname sounds source spatial specific sql sqlexception sqlstate sqlwarning sql_after_gtids sql_after_mts_gaps sql_before_gtids sql_big_result sql_buffer_result sql_calc_found_rows sql_no_cache sql_small_result sql_thread sql_tsi_day sql_tsi_hour sql_tsi_minute sql_tsi_month sql_tsi_quarter sql_tsi_second sql_tsi_week sql_tsi_year ssl stacked start starting starts stats_auto_recalc stats_persistent stats_sample_pages status stop storage stored straight_join string subclass_origin subject subpartition subpartitions super suspend swaps switches table tables tablespace table_checksum table_name temporary temptable terminated text than then time timestamp timestampadd timestampdiff tinyblob tinyint tinytext to trailing transaction trigger triggers true truncate type types uncommitted undefined undo undofile undo_buffer_size unicode uninstall union unique unknown unlock unsigned until update upgrade usage use user user_resources use_frm using utc_date utc_time utc_timestamp validation value values varbinary varchar varcharacter variables varying view virtual visible wait warnings week weight_string when where while with without work wrapper write x509 xa xid xml xor year year_month zerofill zone)


# Note: _error is not defined in this trimmed excerpt.
for GROUP_NAME in "${EXTRAARG[@]}";do
    GROUP_NAME=$(echo "${GROUP_NAME}"|tr "[:upper:]" "[:lower:]")
    # echo "array element after tr: ${GROUP_NAME}"

    # 0.7s (two arguments: one letters only, one containing a single colon)
    if [[ "${GROUP_NAME}" =~ ":" ]]; then
        GROUP_NAME1=$(echo "${GROUP_NAME}"|cut -d':' -f 1)
        GROUP_NAME2=$(echo "${GROUP_NAME}"|cut -d':' -f 2)
        if [ -n "$(echo "${GROUP_NAME}"|cut -d':' -f 3)" ];then
            _error "Illegal character: $(echo "${GROUP_NAME}"|cut -d':' -f 3)" && exit 1
        else
            for j in "${MySQLKeywords[@]}";do
                if [ "${GROUP_NAME1}" = "$j" ] || [ "${GROUP_NAME2}" = "$j" ];then
                    _error "Argument must not be a MySQL keyword or reserved word! Exiting" && exit 1
                fi
            done
        fi
    elif [[ ! "${GROUP_NAME}" =~ ":" ]]; then
        for j in "${MySQLKeywords[@]}";do
            if [ "${GROUP_NAME}" = "$j" ];then
                _error "Argument must not be a MySQL keyword or reserved word! Exiting" && exit 1
            fi
        done
    fi


    # 7s (same two arguments: one letters only, one containing a single colon)
    for j in "${MySQLKeywords[@]}";do
        if [[ "${GROUP_NAME}" =~ ":" ]]; then
            GROUP_NAME1=$(echo "${GROUP_NAME}"|cut -d':' -f 1)
            GROUP_NAME2=$(echo "${GROUP_NAME}"|cut -d':' -f 2)
            if [ -n "$(echo "${GROUP_NAME}"|cut -d':' -f 3)" ];then
                _error "Illegal character: $(echo "${GROUP_NAME}"|cut -d':' -f 3)" && exit 1
            elif [ "${GROUP_NAME1}" = "$j" ] || [ "${GROUP_NAME2}" = "$j" ];then
                _error "Argument must not be a MySQL keyword or reserved word! Exiting" && exit 1
            fi
        elif [[ ! "${GROUP_NAME}" =~ ":" ]]; then
            if [ "${GROUP_NAME}" = "$j" ];then
                _error "Argument must not be a MySQL keyword or reserved word! Exiting" && exit 1
            fi
        fi
    done
done

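What the script does

  1. The script declares two array variables: one holds the arguments supplied by the user, the other (the very long one) holds MySQL's default keywords and reserved words.
  2. The script iterates over the user's arguments and compares each one against every entry in the MySQL keyword array. If an argument is a MySQL reserved word, it reports an error and exits.
  3. The script contains two versions of the loop whose run times differ enormously. To compare them, comment out one version before each test run.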

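Script flow

The user passes arguments in one of two formats: letters only (upper or lower case), or letters with a single ASCII colon as a separator. Only one colon is allowed, so splitting on the colon later yields at most two strings.

I passed in two arguments: asdf and asd:f

In the original post, bold text marked the steps that are identical in both versions. In short, both versions sit inside the same outer loop; the difference is: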

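The slow 7-second loop

Branches with if inside the inner loop

  1. Take the array of two arguments and iterate over it; this is the outer loop.
  2. For each argument from the outer loop, immediately enter the inner loop over the SQL keyword array.
  3. Check whether the current argument contains a colon.
  4. If it does, split it on the colon into two strings.
  5. Check whether splitting on the colon yields a non-empty third string; if it does, report an error and exit.
  6. Compare each of the two strings with the keyword of the current inner iteration; if either one equals that keyword, report an error and exit. This ends the colon branch.
  7. If there is no colon, compare the argument directly with the keyword of the current inner iteration; if the two strings are equal, report an error and exit.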

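The fast 0.7-second loop

Branches with if first, then enters a separate loop in each branch

  1. Take the array of two arguments and iterate over it; this is the outer loop.
  2. For each argument from the outer loop, check whether it contains a colon.
  3. If it does, split it on the colon into two strings.
  4. Check whether splitting on the colon yields a non-empty third string; if it does, report an error and exit.
  5. Otherwise enter the inner loop over the SQL keyword array.
  6. Compare each of the two strings with the keyword of the current inner iteration; if either one equals that keyword, report an error and exit. This ends the colon branch.
  7. If there is no colon, loop over the keyword array with the argument as is; if the array contains the same string, report an error and exit.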
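
Looking at the code, the slow version runs the `echo | cut` substitutions once per keyword, while the fast version runs them once per argument. One way to make the ordering irrelevant is to avoid external commands altogether. The sketch below is not the original script: it assumes bash 4 or newer (for `${var,,}` and associative arrays), uses a shortened keyword list for illustration, and `check_name` and `KEYWORD_SET` are hypothetical names.

```shell
#!/bin/bash
# Sketch only: a variant of the keyword check without external commands.
# Assumes bash >= 4. MySQLKeywords is shortened here for illustration.
MySQLKeywords=(add all alter select table where)

# Build a lookup table once so each check is a single hash lookup.
declare -A KEYWORD_SET=()
for kw in "${MySQLKeywords[@]}"; do
    KEYWORD_SET["$kw"]=1
done

# check_name is a hypothetical helper: prints a verdict, returns 0 or 1.
check_name() {
    local name=${1,,}          # lowercase without spawning tr
    local part1 part2 extra part
    # Split on ':' with the read builtin instead of several cut processes.
    IFS=: read -r part1 part2 extra <<< "$name"
    if [[ -n $extra ]]; then
        echo "illegal extra field: $extra"
        return 1
    fi
    for part in "$part1" "$part2"; do
        if [[ -n $part && -n ${KEYWORD_SET[$part]:-} ]]; then
            echo "reserved word: $part"
            return 1
        fi
    done
    echo "ok"
}

check_name "asdf"
check_name "asd:f"
check_name "Select:foo" || true   # reports the reserved word
```

With the two arguments from the post, both calls print `ok`; the third call shows the rejection path.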
3205 views
Node: Linux
30 replies
mylovesaber
2022-07-13 16:28:30 +08:00
@wxf666 Thanks a lot for the tutorial, let me dig into it first
wxf666
2022-07-13 16:46:38 +08:00
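@mylovesaber Not an expert, just someone who was tormented by bash for a while

The original article has some mistakes, which other people corrected further down, so read it critically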

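1. In the 7s version of the code, every argument containing a : that you test spawns 2 + 614 * 6 = 3686 child processes...

2. For the awk initialization part, the parameter should be passed in from outside, e.g. -v keywords="$(IFS=:; echo "${MySQLKeywords[*]}")"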
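
A minimal sketch of point 2, under some assumptions: the keyword list is shortened, the awk program is my own illustration rather than the one discussed earlier, and `check_with_awk` is a hypothetical name. The array is joined with `:` and handed to a single awk process through `-v`, and that one process checks every name.

```shell
#!/bin/bash
# Sketch only: one awk process checks all names; keywords arrive via -v.
MySQLKeywords=(add all alter select table where)   # shortened for illustration

# check_with_awk is a hypothetical helper; it reads one name per line on stdin.
check_with_awk() {
    awk -v keywords="$(IFS=:; echo "${MySQLKeywords[*]}")" '
        BEGIN {
            # Turn the colon-joined string into a lookup table.
            n = split(keywords, list, ":")
            for (i = 1; i <= n; i++) is_keyword[list[i]] = 1
        }
        {
            m = split(tolower($0), parts, ":")
            # Unlike the original script, any third field is rejected here,
            # even an empty one.
            if (m > 2) { print $0 ": too many fields"; next }
            for (i = 1; i <= m; i++) {
                if (parts[i] in is_keyword) {
                    print $0 ": reserved word " parts[i]
                    next
                }
            }
            print $0 ": ok"
        }'
}

printf '%s\n' "asdf" "asd:f" "Select:foo" | check_with_awk
```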

Is the reason you can't run xxx.sh directly that it lacks execute permission? I'm curious what happens when you run bash xxx.sh
haoliang
2022-07-13 17:06:15 +08:00
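@wxf666 I ran strace on both forms under bash, and both do call pipe2. The difference is that when the input is shorter than 64k, `|` creates one extra process to write the data into the pipe. For inputs over 64k I didn't look at the strace log for `<<<str`, since data of that size is unlikely to sit in a script as a literal string. As for `<file`, `<(cat file)` and so on: having to go down to the syscall level just to write a script doesn't seem worth it to me, and I'm not familiar with this area either. Let's wait for someone more thorough.

I also wrote a benchmark for this case and ran it under sh, bash and zsh. Under sh the pipeline finished in about 64% of the redirect's elapsed time (0.14s vs 0.22s), though with about 31% higher CPU usage (136% vs 104%); in the other two shells the redirect was more efficient than the pipeline.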

```
# sh pipeline*100
0.10user 0.09system 0:00.14elapsed 136%CPU (0avgtext+0avgdata 4208maxresident)k
0inputs+0outputs (0major+22025minor)pagefaults 0swaps
# sh redirect*100
0.13user 0.10system 0:00.22elapsed 104%CPU (0avgtext+0avgdata 4152maxresident)k
0inputs+0outputs (0major+13876minor)pagefaults 0swaps
# bash pipeline*100
0.19user 0.16system 0:00.27elapsed 132%CPU (0avgtext+0avgdata 4236maxresident)k
0inputs+0outputs (0major+21421minor)pagefaults 0swaps
# bash redirect*100
0.11user 0.12system 0:00.22elapsed 104%CPU (0avgtext+0avgdata 4164maxresident)k
0inputs+0outputs (0major+13683minor)pagefaults 0swaps
# zsh pipeline*100
0.17user 0.17system 0:00.29elapsed 120%CPU (0avgtext+0avgdata 3992maxresident)k
0inputs+0outputs (0major+20326minor)pagefaults 0swaps
# zsh redirect*100
0.11user 0.14system 0:00.24elapsed 105%CPU (0avgtext+0avgdata 4020maxresident)k
0inputs+0outputs (0major+14989minor)pagefaults 0swaps
```

The makefile I used:

```

strace:
	# bash
	@ strace -f bash -c 'echo "abc:def" | /usr/bin/cut -d: -f 1 >/dev/null' 2> bash.pipeline.strace
	@ strace -f bash -c '/usr/bin/cut -d: -f 1 <<<"abc:def" >/dev/null' 2> bash.redirect.strace
	# sh
	@ strace -f sh -c 'echo "abc:def" | /usr/bin/cut -d: -f 1 >/dev/null' 2> sh.pipeline.strace
	@ strace -f sh -c '/usr/bin/cut -d: -f 1 <<<"abc:def" >/dev/null' 2> sh.redirect.strace

benchmark:
	# sh pipeline*100
	@ time sh -c 'for _ in {0..100}; do echo "abc:def" | /usr/bin/cut -d: -f 1 >/dev/null; done'
	# sh redirect*100
	@ time sh -c 'for _ in {0..100}; do /usr/bin/cut -d: -f 1 <<<"abc:def" >/dev/null; done'
	# bash pipeline*100
	@ time bash -c 'for _ in {0..100}; do echo "abc:def" | /usr/bin/cut -d: -f 1 >/dev/null; done'
	# bash redirect*100
	@ time bash -c 'for _ in {0..100}; do /usr/bin/cut -d: -f 1 <<<"abc:def" >/dev/null; done'
	# zsh pipeline*100
	@ time zsh -c 'for _ in {0..100}; do echo "abc:def" | /usr/bin/cut -d: -f 1 >/dev/null; done'
	# zsh redirect*100
	@ time zsh -c 'for _ in {0..100}; do /usr/bin/cut -d: -f 1 <<<"abc:def" >/dev/null; done'
```
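
For a short string like the one in this benchmark there is also a third option that starts no external process at all. A rough sketch, assuming bash; the three function names are made up for illustration and no timings are claimed here:

```shell
#!/bin/bash
# Sketch only: three ways to take the first ':'-separated field.
s="abc:def"

first_pipeline()  { echo "$s" | cut -d: -f1; }   # pipeline: echo side plus cut
first_herestr()   { cut -d: -f1 <<< "$s"; }      # here-string: only cut is started
first_expansion() { echo "${s%%:*}"; }           # parameter expansion: no cut

first_pipeline
first_herestr
first_expansion
```

All three print the same field; only the number of processes involved differs.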
novolunt
2022-07-13 17:13:59 +08:00
FiberHome, QiAnXin, or Sangfor?
lolizeppelin
2022-07-13 17:18:10 +08:00
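The core idea is to reduce forks and data copying.
Creating a pipe has a cost of its own, and the data at the two ends of the pipe is passed by copying.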
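
To make the fork cost concrete, here is a small sketch (my own, assuming bash) showing that a command substitution runs in a separate process, followed by the arithmetic behind the count quoted earlier in the thread. The keyword count of 614 and the factor of 6 are taken from that reply as given; `runs_in_child` and `forks_per_argument` are hypothetical names.

```shell
#!/bin/bash
# Sketch only: show that $(...) forks, and redo the thread's arithmetic.

# runs_in_child prints "yes" when a command substitution has a different PID.
runs_in_child() {
    local parent=$BASHPID
    local child
    child=$(echo "$BASHPID")     # BASHPID inside $(...) is the subshell's PID
    if [[ $parent != "$child" ]]; then echo yes; else echo no; fi
}

# forks_per_argument KEYWORD_COUNT: 2 for the tr pipeline, then three
# "echo | cut" pipelines per keyword, counted as 2 processes each
# (the counting used in the earlier reply).
forks_per_argument() {
    echo $(( 2 + $1 * 6 ))
}

echo "command substitution forks: $(runs_in_child)"
echo "processes per colon argument: $(forks_per_argument 614)"
```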

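As for Python without installing packages, couldn't you just upload all the code.... A package is nothing more than a folder; it doesn't have to be installed.

If you can fiddle with messy shell code like this on your server, is uploading a few files really not possible?

Also, even if dnf isn't allowed, being able to upload files means you can rpm -ivh...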
mylovesaber
2022-07-13 18:24:53 +08:00
@lolizeppelin rpm seems to be blocked. I watched a colleague try it before, and even rsync couldn't be installed.
mylovesaber
2022-07-13 18:25:56 +08:00
@novolunt Neither. It's related to anti-corruption work, aimed at discipline inspection commissions.
novolunt
2022-07-14 13:50:56 +08:00
@mylovesaber Then only one company is left: mybk. That one does do work for discipline inspection commissions.
mylovesaber
2022-07-14 16:40:15 +08:00
@novolunt Not that one either, no need to guess. I just came to ask a question; I hadn't run into this before and it felt a bit odd orz
mylovesaber
2022-07-23 22:46:13 +08:00
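@wxf666 Running bash xx.sh directly gives "permission denied". I run the script through process substitution: bash < <(cat xx.sh), or cat xx.sh|bash works too. The production machines have lots of odd restrictions. For example, if you change the password by booting a live system instead of logging in, the password reverts once you boot back into the normal system.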
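
A side note for anyone feeding a script to bash on standard input like this: positional arguments can still be passed with `-s`. A small sketch, using a throwaway script created with mktemp rather than the real xx.sh; `run_via_stdin` is a hypothetical wrapper name.

```shell
#!/bin/bash
# Sketch only: run a script through stdin and still pass arguments to it.
script=$(mktemp)
trap 'rm -f "$script"' EXIT
printf '%s\n' 'for arg in "$@"; do echo "got: $arg"; done' > "$script"

# run_via_stdin is a hypothetical wrapper; -s makes bash read commands from
# stdin and treat the remaining words as positional parameters.
run_via_stdin() {
    bash -s -- "$@" < "$script"
}

run_via_stdin asdf asd:f
```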


https://www.v2ex.com/t/865773
