I'm bad at coding — could someone write a scraper for me? Thanks

2019-05-21 16:34:24 +08:00
 onecode

Latest posts: http://adr.meizitu.net/wp-json/wp/v2/posts?page=1&per_page=20 — image details: http://adr.meizitu.net/wp-json/wp/v2/i?id=152201
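The two endpoints above follow the standard WordPress REST API layout. A minimal sketch of how they fit together (helper names are my own, and the site may no longer be online):

```python
# Endpoint layout of the WordPress REST API the OP mentions.
BASE = "http://adr.meizitu.net/wp-json/wp/v2"

def list_url(page, per_page=20):
    """One page of the post list, e.g. page=1&per_page=20."""
    return "{}/posts?page={}&per_page={}".format(BASE, page, per_page)

def detail_url(post_id):
    """Image details for a single post, e.g. id=152201."""
    return "{}/i?id={}".format(BASE, post_id)

# With requests, usage would look like:
#   posts = requests.get(list_url(1)).json()                  # list of post dicts
#   detail = requests.get(detail_url(posts[0]["id"])).json()  # one post's images
```

The post list returns a JSON array of `{id, title, thumb_src, ...}` objects (see the sample response quoted later in the thread), and the detail endpoint resolves one `id` to the full image set.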

16903 clicks
Node: Python
137 replies
Constellation39
2019-05-21 19:42:12 +08:00
Suspected NSFW ride, with evidence
0x4F5DA2
2019-05-21 19:47:42 +08:00
I suspect you're up to something NSFW, and I seem to have evidence too
iwishing
2019-05-21 20:03:55 +08:00
@yearliny
Tweaked your version a bit:
$json = Invoke-WebRequest "http://adr.meizitu.net/wp-json/wp/v2/posts?page=1&per_page=20" -Method Get -UseBasicParsing | ConvertFrom-Json
$wc = New-Object System.Net.WebClient
foreach ($i in $json) {
    $output = Split-Path -Leaf $i.thumb_src
    $wc.DownloadFile($i.thumb_src, $output)
}
keith1126
2019-05-21 20:08:34 +08:00
Yours is a fake ride; I've got a real one here: https://paste.ubuntu.com/p/2nxbtRtqFX/

Figure out the usage yourself (runs away)
Atukey
2019-05-21 20:11:44 +08:00
Express ride
Shiyq
2019-05-21 20:19:20 +08:00
Boring (I'm done)
claysec
2019-05-21 20:27:45 +08:00
Getting something for nothing? (smirk)
zzh1224
2019-05-21 20:31:24 +08:00
Your driving skills are truly first-rate
bld2018
2019-05-21 21:08:14 +08:00
Wasn't a finished version already posted?
canwex
2019-05-21 21:41:57 +08:00
import requests
import json

url = 'https://adr.meizitu.net/wp-json/wp/v2/posts?page={}&per_page={}'
per_page = 100
page = 52

print('##### spider start #####')
while True:
    page += 1
    json_data = requests.get(url.format(page, per_page))
    data = json.loads(json_data.text)

    # Past the last page the API returns an error object, not a list.
    if not isinstance(data, list):
        print('##### spider end #####')
        break

    for item in data:
        thumb_src = item['thumb_src']
        title = item['title'] + '.jpg'
        print('[+] downloading {} ...'.format(title))
        meizi = requests.get(thumb_src)
        if meizi.status_code == 200:
            with open(title, 'wb') as f:
                f.write(meizi.content)
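One caveat with the snippet above: the post title is used directly as a filename, and titles can contain characters like `/` or `?` that are invalid in paths. A small sanitizer (a hypothetical helper, not part of the original reply) avoids that:

```python
import re

def safe_filename(title, ext=".jpg"):
    """Replace path-hostile characters so a post title is a valid filename."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    return cleaned + ext
```

Then `open(safe_filename(item['title']), 'wb')` works regardless of what the title contains.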
fuchunliu
2019-05-21 21:58:17 +08:00
@keith1126 Ready to drive as-is, no need to fill up the tank? 😏
zzzzzzzzzp
2019-05-21 22:21:52 +08:00
[{"id":181372,"title":"辣妹温心怡情趣内衣 SM 诱惑 丰胸美臀身材热辣销魂","img_num":46,"thumb_src":"https:\/\/i2.meizitu.net\/2019\/04\/23a25.jpg","thumb_src_min":"https:\/\/i2.meizitu.net\/thumbs\/2019\/05\/181372_23a25_236.jpg"},{"id":180743,"title":"大胸女神恩一雪白玉兔诱人呈现 手捧巨乳再掀性感狂潮".....
wpzero
2019-05-21 22:34:04 +08:00
😄
calebx
2019-05-21 22:34:46 +08:00
You should've said so earlier!
tt0411
2019-05-21 22:47:56 +08:00
Who needs a scraper? It's a one-line command:

curl -s 'http://adr.meizitu.net/wp-json/wp/v2/posts?page=1&per_page=20' | jq -r '.[] | .thumb_src' | xargs -I X curl -s -O X
azh7138m
2019-05-21 22:52:49 +08:00
@niknik Bless your kind soul? (
harvies
2019-05-21 23:34:03 +08:00
import json
import os

import requests

if __name__ == '__main__':
    flag = True
    page = 1
    while flag:
        print("page:" + str(page))
        list_html = requests.get("http://adr.meizitu.net/wp-json/wp/v2/posts?page=" + str(page) + "&per_page=20")
        content = list_html.content
        json_loads = json.loads(content)
        if isinstance(json_loads, list):
            print(json_loads)
            for list_item in json_loads:
                id_ = list_item['id']
                title = list_item['title']
                print(title)
                detail_html = requests.get("http://adr.meizitu.net/wp-json/wp/v2/i?id=" + str(id_))
                detail_json = json.loads(detail_html.content)
                print(detail_json)
                # 'content' is a comma-separated list of image URLs.
                str_content_ = detail_json['content']
                content__split = str_content_.split(',')
                print('downloading ' + str(content__split))
                for detail_item in content__split:
                    print(detail_item)
                    rfind = detail_item.rfind('/')
                    file_name = detail_item[rfind + 1:len(detail_item)]
                    folder_path = "./images/" + title + '/'
                    if not os.path.exists(folder_path):
                        os.makedirs(folder_path)
                    requests_get = requests.get(detail_item)
                    with open(folder_path + file_name, "wb") as f:
                        f.write(requests_get.content)
        else:
            code_ = json_loads['code']
            if code_ != 'rest_post_invalid_page_number':
                print(code_)
            else:
                print(code_ + " exit")
                flag = False
        page += 1
lrigi
2019-05-21 23:37:20 +08:00
@tt0411 iOS Shortcuts can do it too
xiaobai987
2019-05-21 23:40:01 +08:00
The image URLs themselves are easy to scrape; the hard part is collecting all of them quickly — the site's anti-scraping measures are a real pain.
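On the anti-scraping point: a common mitigation is to pause between requests and back off exponentially on failures, rather than hammering the server. A rough sketch (the delay values and retry count are arbitrary, not anything the site documents):

```python
import time

import requests


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff: 1s, 2s, 4s, ... capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))


def polite_get(url, retries=3, pause=0.5):
    """GET with a fixed pause after each success and backoff on failure."""
    for attempt in range(retries):
        try:
            resp = requests.get(url, timeout=10)
            if resp.status_code == 200:
                time.sleep(pause)  # be gentle between successive requests
                return resp
        except requests.RequestException:
            pass  # connection error or timeout; retry after backing off
        time.sleep(backoff_delay(attempt))
    return None
```

Swapping `requests.get` for `polite_get` in any of the scripts above would slow the crawl down but make it far less likely to trip rate limits.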
A1321A
2019-05-22 00:00:25 +08:00
You call these rides? Don't make me laugh... https://github.com/94se/94se---/wiki

https://www.v2ex.com/t/566261
