A question about splitting fields in Python

2020-02-11 23:02:34 +08:00
 huyinjie

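I want to split the record below on commas, but some of the quoted strings contain commas themselves, and I don't want those commas to act as separators.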

6898,"RAAF Williams, Laverton Base","Laverton","Australia",\N,"YLVT",-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"

I'd like it to be split like this:

6898

RAAF Williams, Laverton Base

Laverton

Australia

Calling Python's split method directly separates "RAAF Williams" from "Laverton Base". Is there a way to avoid that?

3068 views
Node: Python
27 replies
qwjhb
2020-02-11 23:15:20 +08:00
exec('a=[6898,"RAAF Williams, Laverton Base","Laverton","Australia","YLVT",-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"]')
retanoj
2020-02-11 23:23:41 +08:00
Nice approach. This is exactly how serious command-execution and code-execution vulnerabilities get written.
retanoj
2020-02-11 23:25:21 +08:00
wuwukai007
2020-02-11 23:27:34 +08:00
Use a regular expression, as long as the words are quoted.
huyinjie
2020-02-11 23:35:59 +08:00
@wuwukai007 #4 Looks like I'll have to split it with something like ^(\d+),(.+),(.+),(.+),(.+),(.+),(.+),(.+),(.+),(.+),(.+),(.+),(.+)$
retanoj
2020-02-11 23:39:20 +08:00
retanoj
2020-02-11 23:41:26 +08:00
Sorry, the paste got messed up.
Try this:
>>> import csv
>>> list(csv.reader([your_string]))
yuanhego
2020-02-11 23:52:02 +08:00
noreply69
2020-02-11 23:56:00 +08:00
import csv
s = '6898,"RAAF Williams, Laverton Base","Laverton","Australia",\\N,"YLVT",-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"'
splitted = list(csv.reader([s], delimiter=',', quotechar='"'))[0]
print(splitted)
noreply69
2020-02-11 23:56:32 +08:00
```
import csv

s = '6898,"RAAF Williams, Laverton Base","Laverton","Australia",\\N,"YLVT",-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"'

splitted = list(csv.reader([s], delimiter=',', quotechar='"'))[0]

print(splitted)
```
qwjhb
2020-02-12 00:02:24 +08:00
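@retanoj Does it work or not? This is a script processing ready-made formatted text; are you worried someone will inject commands into it? Fine, just drop the exec then.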
Akkuman
2020-02-12 00:13:53 +08:00
import ast
ast.literal_eval('[6898,"RAAF Williams, Laverton Base","Laverton","Australia","YLVT",-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"]')
huyinjie
2020-02-12 00:20:54 +08:00
@noreply69 #9
@retanoj #7

Thanks, both of these snippets solve the problem.
huyinjie
2020-02-12 00:24:56 +08:00
@Akkuman #12
@qwjhb #11
I'm reading a text file line by line, so the part inside your brackets has in effect already been split ==
Akkuman
2020-02-12 00:38:16 +08:00
@huyinjie It hasn't already been split; you just add a square bracket before and after the line.
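A minimal sketch of the bracket-wrapping idea applied to a line as read from the file. Two assumptions are made here that the thread does not confirm: that `\N` is a null marker that can be mapped to `None`, and that no quoted field contains a backslash (Python would treat it as an escape). The name `python_like` is only illustrative.

```python
import ast

# Raw string, so \N stays as a backslash followed by N, as it would when read from a file.
line = r'6898,"RAAF Williams, Laverton Base","Laverton","Australia",\N,"YLVT",-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"'

# A bare \N is not a valid Python literal, so swap it for None first (assumed meaning).
# Caveat: this naive replace would also change a \N that sits inside a quoted field.
python_like = '[' + line.replace('\\N', 'None') + ']'

# literal_eval only accepts literals, so it cannot run arbitrary code the way exec can.
values = ast.literal_eval(python_like)
print(values[1])
print(type(values[0]), type(values[6]))
```

Unlike the csv approach, this returns typed values (int, float, str), but it only works while every field happens to be a valid Python literal.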
retanoj
2020-02-12 08:18:02 +08:00
@qwjhb
Yes, it works, it works.
You know, a single "does it work or not?" already says a lot.
levelworm
2020-02-12 08:43:21 +08:00
Can the \N in the middle be removed? It seems to raise an error if it's left in?
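On the `\N` question, a hedged sketch: the error most likely comes from pasting the line into an ordinary Python string literal, where `\N` begins a named Unicode escape (`\N{...}`) and a bare `\N` is a syntax error. Writing `\\N`, using a raw string, or reading the line from a file avoids it. The helper `parse_line` is hypothetical, and treating `\N` as a null marker is an assumption about this data.

```python
import csv

# Raw string: \N is kept as two characters instead of being parsed as an escape.
line = r'6898,"RAAF Williams, Laverton Base","Laverton","Australia",\N,"YLVT",-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"'

def parse_line(text, null_marker=r'\N'):
    """Split one CSV line and map the null marker to None (assumed meaning)."""
    fields = next(csv.reader([text]))
    return [None if field == null_marker else field for field in fields]

fields = parse_line(line)
print(fields[1])
print(fields[4])
```

With the default dialect, `csv.reader` does not treat the backslash specially, so the field arrives as the literal text `\N` and can be filtered afterwards.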
noqwerty
2020-02-12 10:17:17 +08:00
You can also just read the whole csv file in directly.
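A small sketch of reading the whole file with `csv.reader` instead of splitting line by line. To keep it self-contained, an `io.StringIO` stands in for the real file; the second row is made up for illustration, and a file name such as `airports.dat` would be an assumption.

```python
import csv
import io

# Stand-in for: open('airports.dat', newline='', encoding='utf-8')  (hypothetical file name)
sample = io.StringIO(
    '6898,"RAAF Williams, Laverton Base","Laverton","Australia",\\N,"YLVT",'
    '-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"\n'
    # Made-up second row, only to show that several lines are handled.
    '1,"Example Field, North","Exampletown","Australia",\\N,"YXXX",'
    '-30.0,140.0,5,10,"O","Australia/Sydney","airport","OurAirports"\n'
)

# csv.reader iterates over the file object and yields one list of fields per row.
rows = list(csv.reader(sample))
for row in rows:
    print(row[0], row[1])
```

When opening a real file, the csv module documentation recommends passing `newline=''` to `open`.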
smallpython
2020-02-12 10:25:05 +08:00
s = your_str
shuangyinhao_count = 0  # number of double quotes seen in the current field
result = []
temp = ''

for i in s:
    if i == '"':
        shuangyinhao_count += 1
    elif i == ',':
        if shuangyinhao_count == 1:  # one quote seen: we are inside a quoted field, keep the comma
            temp += i
        else:
            result.append(temp)
            temp = ''
    else:
        temp += i

    # a closing quote ends the quoted field, so reset the counter
    if shuangyinhao_count == 2:
        shuangyinhao_count = 0

result.append(temp)

print(result)
araraloren
2020-02-12 11:05:17 +08:00
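Split with a regex
import re

s = '6898,"RAAF Williams, Laverton Base","Laverton","Australia",\\N,"YLVT",-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"'
# Match either a quoted field or a run of characters without quotes or commas.
# The trailing comma is no longer required, so the last field is matched too.
# Quoted fields keep their quotes, and empty fields are skipped.
pattern = re.compile(r'"[^"]*"|[^",]+')
print(pattern.findall(s))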


https://www.v2ex.com/t/643884
