re: Unicode character range raises an error

2017-06-01 14:27:02 +08:00
 binjjam

On https://repl.it/languages/python, this re call runs without problems under both python and python3:

import re;re.findall(u'[\U00010000-\U0001FFFFF]', u'\U0001f61b',re.U)

But running it with python and python3.4 on Ubuntu 14.04 LTS gives:

Python 3.4.0 (default, Jun 19 2015, 14:20:21) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re;re.findall(u'[\U00010000-\U0001FFFFF]', u'\U0001f61b',re.U)
['']
>>> 



Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re;re.findall(u'[\U00010000-\U0001FFFFF]', u'\U0001f61b',re.U)
[u'\U0001f61b']
>>> 

On CentOS:

Python 2.7.10 (default, Oct 21 2015, 19:55:03) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re 
>>> re.findall(u'[\U00010000-\U0001FFFFF]', u'\U0001f61b',re.U)  
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/python2.7/lib/python2.7/re.py", line 181, in findall
    return _compile(pattern, flags).findall(string)
  File "/usr/local/python2.7/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range
>>> 

Python 2.6.6 (r266:84292, Jul 23 2015, 15:22:56) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re;re.findall(u'[\U00010000-\U0001FFFFF]', u'\U0001f61b',re.U)  
[u'\U0001f61b']
>>> 

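I'd like to ask what results everyone else gets. I compared the 2.7 re source and it is identical; the GCC versions clearly differ, but Python 2.6 on the same CentOS machine works fine.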
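One possible explanation, offered as an assumption rather than something the post confirms: the Python 2.7.10 under /usr/local may be a "narrow" (UCS-2) build, while the other interpreters are "wide" (UCS-4) builds or Python 3.3+. On a narrow build every code point above U+FFFF is stored as a surrogate pair, so the `-` in the character class ends up between a low surrogate and a high surrogate, which is a reversed range. Note also that `\U` takes exactly eight hex digits, so `\U0001FFFFF` is U+1FFFF followed by a literal `F`. The sketch below checks the build type and prints the UTF-16 code units of the pattern; `is_narrow_build` and `utf16_units` are hypothetical helper names.

```python
import struct
import sys

# Pattern copied from the post. \U consumes exactly eight hex digits,
# so the class ends with U+1FFFF plus a literal 'F'.
PATTERN = u'[\U00010000-\U0001FFFFF]'


def is_narrow_build():
    """Return True on a UCS-2 ("narrow") interpreter build."""
    return sys.maxunicode == 0xFFFF


def utf16_units(text):
    """Return the UTF-16 code units of text, which is what a narrow build stores."""
    data = text.encode('utf-16-le')
    return list(struct.unpack('<%dH' % (len(data) // 2), data))


units = utf16_units(PATTERN)
print('narrow build: %s' % is_narrow_build())
print([hex(u) for u in units])
# In the unit list, '-' (0x2d) sits between 0xdc00 and 0xd83f.
# A parser working on code units sees the range 0xdc00-0xd83f, whose
# start is greater than its end, hence "bad character range".
```

If `is_narrow_build()` returns True on the failing interpreter and False on the others, that would be consistent with this explanation.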

1796 views
Node: Python
1 reply
wwqgtxx
2017-06-01 15:51:40 +08:00
wwq@ubuntu:~$ python3.5
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re;re.findall(u'[\U00010000-\U0001FFFFF]', u'\U0001f61b',re.U)
['😛']
>>>
wwq@ubuntu:~$ python3.6
Python 3.6.1 (default, Apr 22 2017, 20:17:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re;re.findall(u'[\U00010000-\U0001FFFFF]', u'\U0001f61b',re.U)
['😛']
>>>
wwq@ubuntu:~$ python2.7
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re;re.findall(u'[\U00010000-\U0001FFFFF]', u'\U0001f61b',re.U)
[u'\U0001f61b']
>>>
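If the difference does come down to narrow versus wide builds, a pattern chosen at import time should behave the same on both. This is only a sketch under two assumptions: that the extra `F` in the original pattern was a typo and the intended range is U+10000 to U+1FFFF, and that the narrow interpreter stores astral characters as surrogate pairs. `ASTRAL` is a hypothetical name.

```python
import re
import sys

if sys.maxunicode > 0xFFFF:
    # Wide build or Python 3.3+: one string element per code point.
    ASTRAL = re.compile(u'[\U00010000-\U0001FFFF]')
else:
    # Narrow build: U+10000..U+1FFFF is a high surrogate in
    # D800..D83F followed by any low surrogate in DC00..DFFF.
    ASTRAL = re.compile(u'[\ud800-\ud83f][\udc00-\udfff]')

matches = ASTRAL.findall(u'a\U0001f61bb')
print(matches == [u'\U0001f61b'])
```

The comparison is printed instead of the match itself so the output does not depend on how the terminal renders the emoji.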
