Two ways to generate a random string in Python

ltux

2017-09-30 20:16:50 +08:00

Is there any performance difference between the two?
Please test it yourself instead of asking others to test it for you.

jingniao

2017-09-30 20:21:43 +08:00

1. The first approach is preferable; it is slightly faster.
2. Your first snippet is missing a parenthesis.

jingniao

2017-09-30 20:22:16 +08:00

Addendum: in theory, that is...

crab

2017-09-30 20:41:22 +08:00

timeit, 1,000,000 runs:
First: 18 s
Second: 17.3 s

Daniel65536

2017-09-30 22:13:13 +08:00

Starting with Python 3.6 you can use:
''.join(random.choices(string.ascii_letters + string.digits, k=15)
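
For reference, here is a runnable form of the one-liner above with the closing parenthesis of join() restored. The function name random_string and its default length are illustrative and not from the thread; random.choices requires Python 3.6 or later.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits

def random_string(length=15):
    """Return `length` characters drawn uniformly, with replacement, from ALPHABET."""
    return ''.join(random.choices(ALPHABET, k=length))

print(random_string())
```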

chace

2017-09-30 22:16:18 +08:00

@jingniao I hadn't noticed, thanks for pointing it out.

chace

2017-09-30 22:18:15 +08:00

@Daniel65536 Thanks. Looks like you dropped a parenthesis too, same as me ^_^

chace

2017-09-30 22:34:02 +08:00

@Daniel65536 This standard-library function is the fastest: 1,000,000 characters in 3.15765118598938 s

workwonder

2017-09-30 22:54:04 +08:00

https://gist.github.com/wonderbeyond/1806c7b43d3e642e5ad0aee7052b8e8f

These are my notes, copied from Django's implementation. Why is it written in such a complicated way? Feel free to share your thoughts.

Kilerd

2017-09-30 23:22:17 +08:00

@workwonder Didn't your IDE tell you that the i in "for i" is never used?

ryd994

2017-10-01 02:39:57 +08:00

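You can compare the source code of choice and choices:
https://hg.python.org/cpython/file/tip/Lib/random.py#l252
https://hg.python.org/cpython/file/tip/Lib/random.py#l340
choice generates a random integer index.
choices maps the weights (equal by default) onto a number line from 0 to 1, then uses random() to generate a float between 0 and 1 and looks up where it falls on that line.
Both use random() under the hood, and choices is more complex, so in theory it should be slower.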
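
The two strategies described above can be sketched roughly as follows. This is a simplified paraphrase, not the actual CPython source; the helper names are made up for illustration, and weights are ignored (only the default equal-weight case is shown).

```python
import random

def pick_like_choice(population):
    # One Python-level call per character: draw an unbiased random
    # integer index in [0, len(population)) and look it up.
    return population[random.randrange(len(population))]

def pick_like_choices(population, k):
    # One Python-level call for all k characters: scale a float in
    # [0.0, 1.0) up to the population size and truncate it to an index.
    n = len(population)
    return [population[int(random.random() * n)] for _ in range(k)]

print(pick_like_choice("abc"), pick_like_choices("abc", 5))
```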

Testing with cProfile:
>>> cProfile.run('"".join(random.choice(string.ascii_letters + string.digits) for _ in range(10**7))')
60321941 function calls in 21.869 seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
10000001 5.516 0.000 20.772 0.000 <string>:1(<genexpr>)
1 0.000 0.000 21.869 21.869 <string>:1(<module>)
10000000 6.283 0.000 8.918 0.000 random.py:222(_randbelow)
10000000 5.381 0.000 15.256 0.000 random.py:252(choice)
1 0.000 0.000 21.869 21.869 {built-in method builtins.exec}
10000000 0.956 0.000 0.956 0.000 {built-in method builtins.len}
10000000 0.785 0.000 0.785 0.000 {method 'bit_length' of 'int' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
10321936 1.851 0.000 1.851 0.000 {method 'getrandbits' of '_random.Random' objects}
1 1.097 1.097 21.869 21.869 {method 'join' of 'str' objects}

>>> cProfile.run('"".join(random.choices(string.ascii_letters + string.digits, k=10**7))')
10000007 function calls in 3.463 seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.014 0.014 3.463 3.463 <string>:1(<module>)
1 0.000 0.000 3.374 3.374 random.py:340(choices)
1 2.780 2.780 3.374 3.374 random.py:352(<listcomp>)
1 0.000 0.000 3.463 3.463 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {built-in method builtins.len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.075 0.075 0.075 0.075 {method 'join' of 'str' objects}
10000000 0.594 0.000 0.594 0.000 {method 'random' of '_random.Random' objects}

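Observations:
1. At the lowest level, the choice approach uses getrandbits:
# Only call self.getrandbits if the original random() builtin method
# has not been overridden or if a new getrandbits() was supplied.
This suggests getrandbits should be faster than random; otherwise the official implementation would not be written this way.

2. The choice approach makes 6 times as many function calls as the choices approach, and its run time is also close to 6 times as long, so the two are quite possibly related.

3. Looking at tottime, choices spends most of its time at random.py:352
return [population[_int(random() * total)] for i in range(k)]
Building the list here is expensive, which is understandable.

choice spends most of its time in <string>:1, random.py:222 and random.py:252.
choice is a 5-line function, so it is hard to see why it eats this much time.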
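
A rough way to cross-check these numbers is timeit, since cProfile adds overhead to every Python-level call and so tends to inflate the cost of code that makes many small calls. The sketch below is only illustrative: the string size and repeat counts are arbitrary, the function names are made up, and absolute timings will vary by machine and Python version.

```python
import random
import string
import timeit

ALPHABET = string.ascii_letters + string.digits
N = 10**4  # characters per generated string (arbitrary)

def with_choice():
    # One random.choice call per character.
    return ''.join(random.choice(ALPHABET) for _ in range(N))

def with_choices():
    # A single random.choices call for all N characters (Python 3.6+).
    return ''.join(random.choices(ALPHABET, k=N))

for func in (with_choice, with_choices):
    best = min(timeit.repeat(func, number=5, repeat=3))
    print(f"{func.__name__}: {best:.4f} s for 5 runs")
```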

happlebao

2017-10-01 03:42:50 +08:00

``` python
λ python -m timeit -n 1000 -r 10 -s "import random, string" "''.join(random.choices(string.ascii_letters + string.digits, k=10000))"
1000 loops, best of 10: 1.53 msec per loop

λ python -m timeit -n 1000 -r 10 -s "import os" "os.urandom(10000)"
1000 loops, best of 10: 2.91 usec per loop
```

Note: 1 msec (millisecond) = 1000 usec (microseconds)

chace

2017-10-01 11:22:43 +08:00

@happlebao
Nice, I didn't know about this approach.
>>> cProfile.run('binascii.hexlify(os.urandom(10**7)).decode()')
6 function calls in 0.789 seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.005 0.005 0.789 0.789 <string>:1(<module>)
1 0.032 0.032 0.032 0.032 {built-in method binascii.hexlify}
1 0.000 0.000 0.789 0.789 {built-in method builtins.exec}
1 0.747 0.747 0.747 0.747 {built-in method posix.urandom}
1 0.005 0.005 0.005 0.005 {method 'decode' of 'bytes' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
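
Note that hexlify emits two hex digits per byte, so the call above yields 2 * 10**7 characters, and the alphabet is limited to 0-9 and a-f. A small wrapper that returns an exact length might look like this; the name random_hex is hypothetical.

```python
import binascii
import os

def random_hex(length):
    """Return `length` random hex characters (0-9, a-f) from the OS entropy source."""
    nbytes = (length + 1) // 2  # each byte becomes two hex digits
    return binascii.hexlify(os.urandom(nbytes)).decode('ascii')[:length]

print(random_hex(15))
```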

chace

2017-10-01 11:23:56 +08:00

@workwonder The middle part of the code is all about building the value passed to random.seed(). Probably to get better randomness.

chace

2017-10-01 11:46:26 +08:00

@ryd994
It is probably because <string>:1, random.py:222 and random.py:252 are all implemented in Python and also have the highest call counts, which is why they take so long.
choice:
10000001 5.516 0.000 20.772 0.000 <string>:1(<genexpr>)
10000000 6.283 0.000 8.918 0.000 random.py:222(_randbelow)
10000000 5.381 0.000 15.256 0.000 random.py:252(choice)

In choices, by contrast, the most frequently called function is implemented in C, so it costs very little time.
choices:
10000000 0.594 0.000 0.594 0.000 {method 'random' of '_random.Random' objects}

A fellow poster above mentioned os.urandom(), which makes a syscall (such as /dev/urandom on Unix or CryptGenRandom on Windows) to generate a run of random bytes and is even faster.

mayne95

2017-10-01 14:32:44 +08:00

Python 3.6 has a secrets module, so there is no need to write this yourself 😂
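
A minimal sketch of what that could look like, assuming Python 3.6 or later. secrets.choice, secrets.token_hex and secrets.token_urlsafe are standard-library functions; the helper name secure_random_string is made up here.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits

def secure_random_string(length=15):
    """Build a string from a cryptographically strong random source."""
    return ''.join(secrets.choice(ALPHABET) for _ in range(length))

print(secure_random_string())      # letters and digits
print(secrets.token_hex(8))        # 16 hex characters (8 random bytes)
print(secrets.token_urlsafe(16))   # URL-safe text built from 16 random bytes
```

secrets is aimed at tokens and passwords rather than raw speed, so it is likely to be slower than random.choices for very long strings.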

nannanziyu

2017-10-02 00:48:53 +08:00

Tried it in C#:
var r = new Random(Environment.TickCount);
var randomstring = new string(Enumerable.Range(0,1000000).Select(i=>(char)(r.Next(33,127))).ToArray());
00:00:00.0247641

shn7798

2017-10-07 14:06:25 +08:00

In [42]: %time s = binascii.b2a_hex(os.urandom(10**7/2));
CPU times: user 34.8 ms, sys: 318 ms, total: 353 ms
Wall time: 353 ms