Gitbook2pdf: a tool that crawls GitBook-generated websites and turns them into PDF files

2019-03-07 10:21:36 +08:00
 fuergaosi

Introduction

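I often come across high-quality books built with GitBook
and want to read them offline.
But the PDFs GitBook generates don't let you copy text and are very large,
and some sites don't even offer a download option.
So a friend and I built a tool
that crawls GitBook-generated sites,
parses the pages, and then uses weasyprint to generate the PDF.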
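To make the crawl-and-parse step concrete, here is a minimal sketch of pulling chapter URLs out of a book's index page using only the standard library. It assumes the legacy GitBook markup, where each sidebar entry is an `<li class="chapter">` wrapping an `<a href>`; `ChapterLinkParser` and `chapter_urls` are illustrative names, not gitbook2pdf's actual code.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class ChapterLinkParser(HTMLParser):
    """Collect chapter links, in sidebar order, from a GitBook-style index page.

    Assumption: every chapter entry is <li class="chapter"> whose first <a>
    carries the chapter's (possibly relative) URL.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.expect_link = False  # True right after an <li class="chapter">
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            classes = (attrs.get("class") or "").split()
            self.expect_link = "chapter" in classes
        elif tag == "a" and self.expect_link:
            href = attrs.get("href")
            if href and not href.startswith("#"):
                # Resolve relative links and drop "#fragment" parts.
                url, _fragment = urldefrag(urljoin(self.base_url, href))
                if url not in self.links:  # keep the first occurrence only
                    self.links.append(url)
            self.expect_link = False


def chapter_urls(index_html, base_url):
    """Return absolute chapter URLs found in index_html."""
    parser = ChapterLinkParser(base_url)
    parser.feed(index_html)
    parser.close()
    return parser.links
```

The list preserves sidebar order, which is the order the chapters would then be fetched and handed to weasyprint.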

Features

fuergaosi
2019-03-07 10:25:20 +08:00
Stars appreciated!
magicZ
2019-03-07 10:28:11 +08:00
Post the link!
fuergaosi
2019-03-07 10:31:40 +08:00
Forgot to include the link:
gitbook2pdf: https://github.com/fuergaosi233/gitbook2pdf
22k
2019-03-07 10:32:00 +08:00
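Just yesterday I was wondering whether there was a way to download GitBook books. Bookmarking this. OP, if you can, please add the link to the original post. Thanks!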
fuergaosi
2019-03-07 10:50:09 +08:00
@22k See reply #3.
changjiangzzZ
2019-03-07 11:22:48 +08:00
Starred :)
newmind
2019-03-07 11:27:17 +08:00
Works really well, upvoted.
newmind
2019-03-07 11:28:13 +08:00
An online version would be even better.
jasonslyvia
2019-03-07 11:55:25 +08:00
Nice, I've always wanted a tool like this. Hope you keep polishing it!
FakeLeung
2019-03-07 11:59:18 +08:00
Is there no usage section?
From the code, it looks like you just edit the URL passed to `run` in `main`?

PS: you can append the GitHub link to the original post.
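Instead of editing the URL inside `main`, the entry point could read it from the command line. A minimal sketch with `argparse`; the `gitbook2pdf` program name, the `-o/--output` flag and the filename rule in `default_output_name` are all hypothetical, not the project's actual interface.

```python
import argparse
from urllib.parse import urlparse


def default_output_name(url):
    """Derive a PDF filename from the last path segment, or the host name."""
    parsed = urlparse(url)
    path = parsed.path.strip("/")
    stem = path.split("/")[-1] if path else parsed.netloc
    return f"{stem}.pdf"


def parse_args(argv=None):
    """Parse CLI arguments (argv=None means sys.argv[1:])."""
    parser = argparse.ArgumentParser(
        prog="gitbook2pdf",  # hypothetical program name
        description="Crawl a GitBook-generated site and render it to PDF.",
    )
    parser.add_argument("url", help="root URL of the GitBook site")
    parser.add_argument(
        "-o", "--output",
        help="output PDF path (default: derived from the URL)",
    )
    args = parser.parse_args(argv)
    if args.output is None:
        args.output = default_output_name(args.url)
    return args
```

Wiring it up would then be `args = parse_args()` followed by a call into the crawler with `args.url` and `args.output`.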
fffflyfish
2019-03-07 12:19:46 +08:00
Upvoted! Finally someone built this.
mseasons
2019-03-07 14:31:23 +08:00
aiohttp.client_exceptions.ClientConnectorCertificateError: Cannot connect to host wizardforcel.gitbooks.io:443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)')]
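One common cause of `CERTIFICATE_VERIFY_FAILED ... unable to get local issuer certificate` is that Python cannot find a CA bundle (python.org builds on macOS, for instance, have none until `Install Certificates.command` is run). A minimal sketch of pointing aiohttp at an explicit bundle while keeping verification on; `make_ssl_context` and `fetch` are hypothetical helpers, and the bundle path (for example `certifi.where()` from the third-party `certifi` package) is an assumption about your setup.

```python
import ssl


def make_ssl_context(cafile=None):
    """Build a certificate-verifying SSL context.

    cafile: optional path to a PEM CA bundle, e.g. certifi.where().
    With None, the system's default trust store is used.
    """
    # create_default_context turns on CERT_REQUIRED and hostname checking.
    return ssl.create_default_context(cafile=cafile)


async def fetch(url, cafile=None):
    """Fetch one page with aiohttp using the context above (hypothetical helper)."""
    import aiohttp  # third-party; imported lazily so make_ssl_context works without it

    connector = aiohttp.TCPConnector(ssl=make_ssl_context(cafile))
    async with aiohttp.ClientSession(connector=connector) as session:
        async with session.get(url) as resp:
            resp.raise_for_status()
            return await resp.text()
```

Usage would be along the lines of `asyncio.run(fetch(url, cafile=certifi.where()))`. Passing `ssl=False` would also silence the error, but it disables certificate checks entirely.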
d5
2019-03-07 14:34:09 +08:00
OP, you could consider an online version, with the backend hosted on an overseas server~
privil
2019-03-07 16:32:32 +08:00
...It seems to be quite memory-hungry; the process got killed.
tongdongdong
2019-03-07 18:59:15 +08:00
C:\Users\TDD\Desktop>python -m weasyprint https://ts.xcatliu.com ts.pdf
WARNING: Ignored `text-rendering:auto` at 4:620, unknown property.
WARNING: Ignored `filter:none` at 4:2882, unknown property.
WARNING: Expected a media type, got (max-width:600px)
WARNING: Invalid media type " (max-width:600px)" the whole @media rule was ignored at 9:83.
WARNING: Expected a media type, got (max-width:600px)
WARNING: Invalid media type " (max-width:600px)" the whole @media rule was ignored at 9:669.
WARNING: Ignored `box-shadow:none` at 9:1092, unknown property.
WARNING: Ignored `text-overflow:ellipsis` at 9:1686, unknown property.
WARNING: Expected a media type, got (max-width:1000px)
WARNING: Invalid media type " (max-width:1000px)" the whole @media rule was ignored at 9:1805.
WARNING: Ignored `box-shadow:0 6px 12px rgba(0,0,0,.175)` at 9:2336, unknown property.
WARNING: Ignored `overflow-y:auto` at 9:3908, unknown property.
WARNING: Ignored `text-overflow:ellipsis` at 9:4934, unknown property.
WARNING: Expected a media type, got (max-width:600px)
WARNING: Invalid media type " (max-width:600px)" the whole @media rule was ignored at 9:5254.
WARNING: Expected a media type, got (min-width:600px)
WARNING: Invalid media type " (min-width:600px)" the whole @media rule was ignored at 9:5583.
WARNING: Expected a media type, got (max-width:600px)
WARNING: Invalid media type " (max-width:600px)" the whole @media rule was ignored at 9:5650.
WARNING: Ignored `overflow-y:auto` at 9:6180, unknown property.
WARNING: Ignored `overflow-y:auto` at 9:6418, unknown property.
WARNING: Expected a media type, got (max-width:1240px)
WARNING: Invalid media type " (max-width:1240px)" the whole @media rule was ignored at 9:6434.
WARNING: Ignored `text-size-adjust:100%` at 9:7377, unknown property.
WARNING: Expected a media type, got (max-width:1240px)
WARNING: Invalid media type " (max-width:1240px)" the whole @media rule was ignored at 9:11595.
WARNING: Ignored `box-shadow:none` at 9:12111, unknown property.
WARNING: Ignored `text-size-adjust:100%` at 9:12512, unknown property.
WARNING: Ignored `text-rendering:optimizeLegibility` at 9:20972, unknown property.
WARNING: Ignored `font-smoothing:antialiased` at 9:21006, unknown property.
WARNING: Ignored `text-size-adjust:100%` at 9:21124, unknown property.
WARNING: Ignored `box-shadow: none` at 235:3, unknown property.
WARNING: Ignored `box-shadow: none` at 272:3, unknown property.
And then only the home page was converted!!!
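The `weasyprint` CLI renders only the single document it is given and does not follow links, so pointing it at a GitBook site yields just the landing page; the warnings only report CSS it ignores. A crawler has to fetch every chapter and combine them before rendering. A minimal sketch of the combining step, assuming each chapter has already been reduced to a `(title, body_html)` pair; `combine_chapters` is a hypothetical helper, not gitbook2pdf's code.

```python
import html


def combine_chapters(chapters):
    """Join (title, body_html) pairs into one HTML document.

    Each chapter becomes a <section> that starts on a new page.
    body_html is inserted as-is (it is the page's own markup);
    titles are escaped because they are treated as plain text.
    """
    parts = [
        "<!DOCTYPE html>",
        "<html><head><meta charset='utf-8'>",
        "<style>section.chapter { break-before: page; }"
        " section.chapter:first-child { break-before: auto; }</style>",
        "</head><body>",
    ]
    for title, body in chapters:
        parts.append(
            f"<section class='chapter'><h1>{html.escape(title)}</h1>{body}</section>"
        )
    parts.append("</body></html>")
    return "\n".join(parts)
```

The result could then be rendered with `weasyprint.HTML(string=doc, base_url=site_url).write_pdf("book.pdf")`. Since chapters live in different directories, relative image and link URLs should be made absolute per chapter first; a single `base_url` cannot resolve them all.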
changjiangzzZ
2019-03-07 19:02:54 +08:00
@tongdongdong Buddy, please read the docs first~
changjiangzzZ
2019-03-07 19:04:38 +08:00
@mseasons The network in China isn't great, so the connection timed out. Try adding a proxy.
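A minimal sketch of picking a proxy for the crawler, assuming it uses aiohttp (as the traceback above suggests). `proxy_for` is a hypothetical helper; by default it takes its mapping from the standard library's `urllib.request.getproxies()`, which reads variables such as `HTTPS_PROXY`, `HTTP_PROXY` and `ALL_PROXY`. `NO_PROXY` exclusions are not handled here.

```python
from urllib.parse import urlsplit
from urllib.request import getproxies


def proxy_for(url, proxies=None):
    """Return the proxy URL to use for `url`, or None if none is configured.

    proxies: mapping of scheme -> proxy URL; defaults to getproxies(),
    whose keys are lower-cased names such as "http", "https", "all".
    """
    if proxies is None:
        proxies = getproxies()
    scheme = urlsplit(url).scheme.lower()
    # Prefer a scheme-specific proxy, then fall back to ALL_PROXY.
    return proxies.get(scheme) or proxies.get("all")
```

With aiohttp the result goes into the request, e.g. `session.get(url, proxy=proxy_for(url))`; alternatively `aiohttp.ClientSession(trust_env=True)` picks up the same environment variables itself. aiohttp's built-in proxy support covers HTTP proxies, not SOCKS.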
fuergaosi
2019-03-07 19:13:55 +08:00
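@privil The high memory usage comes from `weasyprint`; I'm trying to write the output in chunks.
@tongdongdong That's a `weasyprint` matter; please take it to their issues page.
@mseasons I can't access this URL and don't know how you reached it. Please post the problem and the URL you were crawling in the `issues` section.
@FakeLeung Thanks for the reminder. I couldn't find the append button before ╮(╯_╰)╭ Also, right now you use it by editing the URL; I'll change how it's used shortly. I'd been testing it that way all along, so I overlooked this.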
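A minimal sketch of the chunked-output idea, under the assumption that memory grows with the size of the document being laid out: render a fixed number of chapters per PDF, so only one batch is held in memory at a time. `render` is a caller-supplied function (for example one that calls `weasyprint.HTML(string=...).write_pdf(path)`); `chunked` and `render_in_parts` are hypothetical names, and merging the part files afterwards is left to a separate PDF tool.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def render_in_parts(chapters, size, stem, render):
    """Render chapters batch by batch; return the part file paths in order.

    render(batch, path) is supplied by the caller and is expected to write
    one PDF for the given batch of chapters.
    """
    paths = []
    for index, batch in enumerate(chunked(chapters, size), start=1):
        path = f"{stem}-part{index:02d}.pdf"
        render(batch, path)
        paths.append(path)
    return paths
```

The batch size trades peak memory against the number of part files; the right value depends on how heavy each chapter is.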
Ahs
2019-03-07 19:14:26 +08:00
Starred.
fuergaosi
2019-03-07 19:21:27 +08:00
@d5 @newmind This thing is a bit memory-hungry; once that's solved I'll consider making an online version.


https://www.v2ex.com/t/541999
