Thoughts on URL deduplication

Here is my rough algorithm:
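```
from hashlib import md5  # module-level import required by the md5() call below


def hash(self):
    """
    URL deduplication: two requests with the same structural feature
    string are treated as duplicates.

    :return: hash
    :rtype: int
    """
    # method|scheme|host|first-level directory|path depth|suffix|QUERY|DATA
    # http://a.com/p1/p2/p3/f.php?a=1&b=2&c
    # GET|http|a.com|p1|4|php|abc|

    # First path segment, e.g. "p1" for "/p1/p2/p3/f.php"
    first_dir = self.__parsed.path.split("/")[1] \
        if len(self.__parsed.path.split("/")) > 1 else ""
    # Number of "/" in the path; an empty path counts as depth 1
    depth = str(self.__parsed.path.count("/")) \
        if self.__parsed.path.count("/") else "1"
    # File extension of the last path segment, if it has one
    suffix = self.__parsed.path.split("/")[-1].split(".")[-1] \
        if "." in self.__parsed.path.split("/")[-1] else ""
    # Only the sorted parameter names matter, not their values
    query = "".join(sorted(self.query.keys()))
    data = "".join(sorted(self.__data.keys()))

    feature = "|".join((
        self.__method,
        self.__parsed.scheme,
        self.__parsed.netloc,
        first_dir,
        depth,
        suffix,
        query,
        data
    ))

    hash_ = int(md5(feature.encode()).hexdigest(), 16)

    return hash_
```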
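
The method above depends on a surrounding class that is not shown in the post. To try the idea in isolation, here is a minimal standalone sketch of the same feature string. The names `url_feature`, `url_hash` and `dedupe` are hypothetical, and parsing with `urllib.parse` is an assumption about how `self.__parsed` and `self.query` were built.

```python
from hashlib import md5
from urllib.parse import parse_qs, urlsplit


def url_feature(method, url, data=None):
    """Build method|scheme|host|first_dir|depth|suffix|QUERY|DATA for one request."""
    parsed = urlsplit(url)
    segments = parsed.path.split("/")
    first_dir = segments[1] if len(segments) > 1 else ""
    # Count of "/" in the path; an empty path is treated as depth 1.
    depth = str(parsed.path.count("/") or 1)
    last = segments[-1]
    suffix = last.rsplit(".", 1)[-1] if "." in last else ""
    # keep_blank_values=True keeps value-less parameters such as "c" in "a=1&b=2&c".
    query = "".join(sorted(parse_qs(parsed.query, keep_blank_values=True)))
    body = "".join(sorted(data or {}))
    return "|".join((method, parsed.scheme, parsed.netloc,
                     first_dir, depth, suffix, query, body))


def url_hash(method, url, data=None):
    """Hash the feature string to an int, as in the method above."""
    return int(md5(url_feature(method, url, data).encode()).hexdigest(), 16)


def dedupe(urls, method="GET"):
    """Keep the first URL seen for each distinct hash."""
    seen = set()
    unique = []
    for url in urls:
        h = url_hash(method, url)
        if h not in seen:
            seen.add(h)
            unique.append(url)
    return unique


urls = [
    "http://a.com/p1/p2/p3/f.php?a=1&b=2&c",
    "http://a.com/p1/x/y/g.php?b=9&a=7&c=3",  # same structure, different values
    "http://a.com/p1/p2/f.php?a=1",           # different depth and parameters
]
unique = dedupe(urls)
```

With these inputs the second URL collapses into the first, because only the first-level directory, depth, suffix and parameter names enter the feature string. That is the intended trade-off: fewer near-duplicate pages get crawled, at the cost of occasionally merging pages that really differ.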

https://www.v2ex.com/t/173756