What is the right way to process file contents in Python?

2017-03-21 10:37:29 +08:00
 chaos0x0000007b
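Hi experts,

I want to take everything in an htm file from the first <link up to the second <link and save it as a separate htm file. What is a concise way to write this?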



<meta http-equiv="X-UA-Compatible" content="IE=edge">

<link rel="prefetch" href="https://ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js">

<meta name="application-name" content="Python.org">
<meta name="msapplication-tooltip" content="The official home of the Python Programming Language">
<meta name="apple-mobile-web-app-title" content="Python.org">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">

<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="HandheldFriendly" content="True">
<meta name="format-detection" content="telephone=no">
<meta http-equiv="cleartype" content="on">
<meta http-equiv="imagetoolbar" content="false">

<script type="text/javascript" async="" src="https://ssl.google-analytics.com/ga.js"></script><script src="./Welcome to Python.org_files/modernizr.js.下载"></script><style type="text/css" adt="123"></style>

<link href="./Welcome to Python.org_files/style.css" rel="stylesheet" type="text/css" title="default">
<link href="./Welcome to Python.org_files/mq.css" rel="stylesheet" type="text/css" media="not print, braille, embossed, speech, tty">






The extracted content should be:



<link rel="prefetch" href="https://ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js">

<meta name="application-name" content="Python.org">
<meta name="msapplication-tooltip" content="The official home of the Python Programming Language">
<meta name="apple-mobile-web-app-title" content="Python.org">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">

<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="HandheldFriendly" content="True">
<meta name="format-detection" content="telephone=no">
<meta http-equiv="cleartype" content="on">
<meta http-equiv="imagetoolbar" content="false">

<script type="text/javascript" async="" src="https://ssl.google-analytics.com/ga.js"></script><script src="./Welcome to Python.org_files/modernizr.js.下载"></script><style type="text/css" adt="123"></style>

<link
1403 views
Node: Python
2 replies
Kilerd
2017-03-21 10:44:03 +08:00
xpath or regex
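
A minimal sketch of the regex route, assuming the whole file has already been read into a string. The function name `extract_between_links` and the shortened sample string are made up for illustration. The pattern is a non-greedy match, so it stops at the second `<link`, and `re.DOTALL` lets `.` cross line breaks.

```python
import re


def extract_between_links(html):
    """Return the text from the first '<link' through the second '<link'.

    Hypothetical helper for illustration; returns None when the input
    contains fewer than two '<link' occurrences.
    """
    # Non-greedy: stop at the earliest following '<link'.
    # re.DOTALL: let '.' match newlines, since the span covers many lines.
    match = re.search(r"<link.*?<link", html, flags=re.DOTALL)
    return match.group(0) if match else None


# Shortened stand-in for the page in the question (assumed sample data).
sample = (
    '<meta a="1">\n'
    '<link rel="prefetch" href="x.js">\n'
    '<meta b="2">\n'
    '<link href="style.css">\n'
    '<link href="mq.css">'
)

print(extract_between_links(sample))
```

This treats the HTML as plain text, which fits the question as asked. For anything that depends on the document structure, a real parser with XPath would probably be the safer choice.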
cdwyd
2017-03-21 10:50:23 +08:00
'<link' + html.split('<link')[1]
Typed on my phone, untested.
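
The one-liner above could be fleshed out roughly as follows, including the "save as another htm" step. This is a sketch under assumptions: the helper names, the file names `page.htm` and `section.htm`, and the UTF-8 encoding are all hypothetical. Since the expected output in the question ends with a trailing `<link`, the sketch appends it; drop that part if it is not wanted.

```python
from pathlib import Path


def first_link_section(html):
    """Return the text from the first '<link' through the second '<link'."""
    # maxsplit=2 yields at most three pieces: the text before the first
    # '<link', the text between the first and second, and the remainder.
    parts = html.split("<link", 2)
    if len(parts) < 3:
        raise ValueError("expected at least two '<link' occurrences")
    # split() removes the separator, so put it back on both ends.
    return "<link" + parts[1] + "<link"


def save_section(src, dst, encoding="utf-8"):
    """Read src, extract the section, and write it to dst.

    The encoding is an assumption; use whatever the source file really uses.
    """
    html = Path(src).read_text(encoding=encoding)
    Path(dst).write_text(first_link_section(html), encoding=encoding)


# Hypothetical usage (file names are placeholders):
# save_section("page.htm", "section.htm")
```

Passing a split limit avoids cutting the rest of the page into pieces that are never used, and the length check turns a file with fewer than two `<link` occurrences into an explicit error instead of an IndexError or a silently wrong result.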
