PHP: problem scraping page data with curl together with simple_html_dom

2015-09-30 09:58:49 +08:00

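There is a page that can only be read after logging in. I've already used curl to simulate the login and successfully fetched the page content.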

The data on the page is in a table, and I want to use simple_html_dom to parse it and extract what I need.

But curl gives me the page through curl_exec,
while simple_html_dom loads pages with file_get_html,

so the data returned by curl_exec can't be used with simple_html_dom directly.

For example, with $content = file_get_html($url);

foreach ($content->find('a') as $row)
echo $row;

it works fine.
-----------------------------------------------------------
But with $content = curl_exec($ch);

foreach ($content->find('a') as $row)
echo $row;

I get nothing.

What should I do?
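The root of the problem is what curl_exec returns: with CURLOPT_RETURNTRANSFER set, it returns the page body as a plain string (without that option it prints the body and returns true), and a string has no find() method. A minimal sketch of the fetch step, assuming the login cookies were saved to a cookie file by an earlier request; the function name fetch_html and its $cookieJar parameter are hypothetical, not part of curl or simple_html_dom:

```php
<?php
// Hypothetical helper: fetch a page with curl and return its HTML as a string.
// CURLOPT_RETURNTRANSFER makes curl_exec() return the body instead of printing it.
function fetch_html(string $url, ?string $cookieJar = null): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    if ($cookieJar !== null) {
        // Assumption: the login request saved its session cookies to this file.
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieJar);
    }
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('curl error: ' . curl_error($ch));
    }
    // The handle is released when $ch goes out of scope.
    return $body;
}

// Usage (hypothetical URL and cookie file):
//   $html = fetch_html('https://example.com/list', '/tmp/cookies.txt');
// $html is just a string, so it still has to be parsed before calling find().
```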


13 replies

Paranoid

2015-09-30 10:05:44 +08:00

str_get_html

haiyang416

2015-09-30 10:05:56 +08:00

str_get_html

towser

2015-09-30 10:07:18 +08:00

$content = file_get_html(curl_exec($ch));

If this helps, remember to send me some money.

towser

2015-09-30 10:08:17 +08:00

$content = str_get_html(curl_exec($ch));

Typed too fast, but don't forget about the money.

eoo

2015-09-30 10:10:32 +08:00

$content = curl_exec($ch);

Have you tried $html->load($content); ?

wjfz

2015-09-30 10:21:42 +08:00

https://github.com/samacs/simple_html_dom/blob/master/simple_html_dom.php

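The source shows that file_get_html returns a simple_html_dom object, and that object has a find method.
What curl_exec returns, however, is not an object (it is a string, or a boolean), so it has no find method and you can't use it that way.

Inside file_get_html the content is loaded with $contents = file_get_contents($url, $use_include_path, $context, $offset); so it must be given a URL.
str_get_html, on the other hand, works directly on a string.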
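To make the string-versus-URL point concrete for the table in the question: once curl has returned the page as a string, any string-based parser can take over. simple_html_dom is a separate download, so this sketch swaps in PHP's built-in DOMDocument and DOMXPath to stay self-contained; with simple_html_dom the analogous entry point is str_get_html($html), followed by find('tr') and find('td'). The function name table_rows is hypothetical.

```php
<?php
// Hypothetical helper: parse an HTML string and return the trimmed text of
// every table cell, grouped by row. Uses PHP's DOM extension, not simple_html_dom.
function table_rows(string $html): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect parser warnings instead of printing them.
    $prev = libxml_use_internal_errors(true);
    // Note: loadHTML() falls back to ISO-8859-1 if the page declares no charset.
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//table//tr') as $tr) {
        $cells = [];
        // Header and data cells of this row, in document order.
        foreach ($xpath->query('./td | ./th', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}
```

With curl this would be table_rows(curl_exec($ch)), provided CURLOPT_RETURNTRANSFER is set on $ch.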

Remember to pay #4.

2015-09-30 10:29:51 +08:00

str_get_html is the right answer!
Thanks, everyone.

$html->load($content);
didn't work when I tried it...

ruchee

2015-09-30 10:33:12 +08:00

If the OP is a girl, she doesn't have to pay, right, #4?

LeoQ

2015-09-30 10:34:50 +08:00

So, did you get paid? Hahahaha

towser

2015-09-30 11:18:16 +08:00

@ruchee Little sister or big sister, she has to pay something. It's an economic crisis; even the landlord's family has no grain to spare.

mactaew

2015-09-30 13:18:42 +08:00

My skills aren't up to it, so I've given up on handling HTML DOM in PHP. Node's cheerio is pretty good!

zonghua

2015-09-30 23:20:28 +08:00

@towser So PHP really is object-oriented, with this many functions? I'd always just pulled things out with regex matching.

towser

2015-10-01 00:54:29 +08:00

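@zonghua Yes, since PHP 5 its object-oriented support has been steadily improving. DOM parsing combined with regex extraction works great for writing crawlers.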


https://www.v2ex.com/t/224692
