🔥 Juejin Booklet Crawler

2019-01-26 23:50:06 +08:00
 billyangg

GitHub repository; stars are welcome.

Uses the Node.js https module to fetch the HTML of booklets you have already purchased, converts that HTML to Markdown, and saves the resulting files locally.
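The fetch-then-convert flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `fetchHtml` and `htmlToMarkdown` are hypothetical names, the regex-based converter is a simplified stand-in that handles only a few tags, and the real tool also needs an authenticated session to read purchased content.

```javascript
const https = require('https');

// Hypothetical helper: fetch a page body over HTTPS.
// The real tool would also have to send login cookies or tokens in `headers`.
function fetchHtml(url, headers = {}) {
  return new Promise((resolve, reject) => {
    https.get(url, { headers }, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        reject(new Error(`Unexpected status ${res.statusCode}`));
        return;
      }
      res.setEncoding('utf8');
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });
}

// Simplified stand-in converter: headings, bold, inline code, paragraphs.
// Anything else is stripped; a real converter needs a proper HTML parser.
function htmlToMarkdown(html) {
  return html
    .replace(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi,
      (_, level, text) => `${'#'.repeat(Number(level))} ${text}\n\n`)
    .replace(/<(strong|b)>([\s\S]*?)<\/\1>/gi, '**$2**')
    .replace(/<code>([\s\S]*?)<\/code>/gi, '`$1`')
    .replace(/<p[^>]*>([\s\S]*?)<\/p>/gi, '$1\n\n')
    .replace(/<[^>]+>/g, '') // drop any remaining tags
    .trim();
}
```

Keeping the conversion step a pure function makes it easy to check without touching the network, for example by calling `htmlToMarkdown('<h2>Title</h2><p>Hello</p>')`.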

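Note: the project currently has two versions. v2 does not need Chromium as a headless browser; v1 uses Chromium as a headless browser to simulate a user logging in to the site.

Choose the version that suits your needs.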

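Usage

⚠️ Note: Juejin does not support access from overseas networks, so do not use a proxy.

Method 1: run directly with npx

In any local directory, run npx @oliyg/juejinxiaoce and follow the prompts to enter your username, password, and booklet ID. The run is finished when all done is printed.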

➜  Desktop npx @oliyg/juejinxiaoce
npx: installed 98 in 10.748s
email: enter your username
password: enter your password
bookId: the booklet ID
===navagating to main page
===login...
===getting book section list
===getting book HTML content
面试常用技巧
===writing html...
===getting book HTML content
===write html file success
===writing markdown...
===write markdown file success
前方的路,让我们结伴同行
===writing html...
===write html file success
===writing markdown...
===write markdown file success

======
All Done...Enjoy.
======

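In the directory where you ran the command you will find a folder named md xxx that contains the Markdown documents. In the example above the command was run from the Desktop directory, so the folder is created on the desktop: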

➜  md 1548483715543 ls -al
total 40
drwxr-xr-x  4 oli  staff   128  1 26 14:22 .
drwx------+ 9 oli  staff   288  1 26 14:21 ..
-rw-r--r--  1 oli  staff  4915  1 26 14:21 面试常用技巧.md
-rw-r--r--  1 oli  staff  8465  1 26 14:22 前方的路,让我们结伴同行.md
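The folder name in the listing looks like md followed by a millisecond timestamp, with one .md file per section title. Assuming that is the naming scheme, the save step might look something like the sketch below. `writeSections` and the `{ title, markdown }` shape are hypothetical, chosen only for illustration.

```javascript
const fs = require('fs');
const path = require('path');

// Create "md <timestamp>" under baseDir and write one .md file per section.
// `sections` is a hypothetical array of { title, markdown } objects.
function writeSections(baseDir, sections, now = Date.now()) {
  const outDir = path.join(baseDir, `md ${now}`);
  fs.mkdirSync(outDir, { recursive: true });
  for (const { title, markdown } of sections) {
    // Replace characters that are unsafe in file names on common platforms.
    const safeTitle = title.replace(/[\\/:*?"<>|]/g, '_');
    fs.writeFileSync(path.join(outDir, `${safeTitle}.md`), markdown, 'utf8');
  }
  return outDir;
}
```

Calling `writeSections(process.cwd(), sections)` would place the folder in whatever directory the command is run from, which matches the behaviour described above.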

Method 2: install with npm i

Install globally with npm i -g, then run the juejinxiaoce command:

➜  Desktop npm i -g @oliyg/juejinxiaoce
/Users/oli/.nvm/versions/node/v8.12.0/bin/juejinxiaoce -> /Users/oli/.nvm/versions/node/v8.12.0/lib/node_modules/@oliyg/juejinxiaoce/bin/juejinxiaoce
+ @oliyg/juejinxiaoce@2.2.1
added 98 packages from 201 contributors in 5.89s
➜  Desktop juejinxiaoce
email:
password:
bookId:
===navagating to main page
===login...
...
...

The booklet ID appears in the booklet's URL.
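As a small illustration, the ID could be pulled out of a URL along these lines. The URL shape (a `/book/<id>` path segment, optionally followed by a section path) is an assumption rather than something stated here, and `extractBookId` is a hypothetical helper, not part of the tool.

```javascript
// Return the path segment that follows "/book/", or null if there is none.
// Assumes booklet URLs look like https://<host>/book/<id>[/section/...].
function extractBookId(bookUrl) {
  const { pathname } = new URL(bookUrl);
  const match = pathname.match(/\/book\/([^/]+)/);
  return match ? match[1] : null;
}
```

The returned string is what you would type at the bookId prompt.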

After running the command, wait for the all done. enjoy. message, which means the conversion has finished. The result looks like this:

Changelog

FAQ

Disclaimer

Privacy

License

The MIT License (MIT) Copyright (c) 2019 OliverYoung

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

4230 views
Node: Node.js
0 replies

https://www.v2ex.com/t/530928