
k-means with cosine similarity instead of Euclidean distance: a Python implementation

  •   allenwuli · May 6, 2019 · 7894 views

    As the title says: has anyone here implemented this? The k-means in sklearn uses Euclidean distance, and the metric cannot be changed.

    4 replies    2019-05-07 09:50:19 +08:00
    #2 · allenwuli (OP) · May 6, 2019
    @SleipniR Thanks in advance. Why can't I open the link, though?
    #3 · SleipniR · May 6, 2019
    It looks like you need a VPN to open it. Here is the code:

    from sklearn.cluster import k_means_
    from sklearn.metrics.pairwise import cosine_similarity, pairwise_distances
    from sklearn.preprocessing import StandardScaler


    def create_cluster(sparse_data, nclust=10):
        # Manually override the Euclidean distance function used inside k_means_.
        # Note: cosine_similarity grows as vectors get closer, unlike a distance;
        # the commented-out line is the cosine-distance form.
        def euc_dist(X, Y=None, Y_norm_squared=None, squared=False):
            # return pairwise_distances(X, Y, metric='cosine', n_jobs=10)
            return cosine_similarity(X, Y)

        k_means_.euclidean_distances = euc_dist

        # Scale each feature to unit variance; with_mean=False keeps sparse input sparse.
        scaler = StandardScaler(with_mean=False)
        sparse_data = scaler.fit_transform(sparse_data)
        kmeans = k_means_.KMeans(n_clusters=nclust, n_jobs=20, random_state=3425)
        _ = kmeans.fit(sparse_data)
        return kmeans.labels_
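
The snippet above relies on the private module `sklearn.cluster.k_means_`, which newer scikit-learn releases no longer expose. A version-independent route, sketched here as an alternative and not as the method from the reply, is to L2-normalize every sample and then run the stock `KMeans`. For unit vectors, ‖a − b‖² = 2 − 2·cos(a, b), so the nearest center by Euclidean distance is also the most similar one by cosine. The function name `cosine_kmeans_labels`, the `n_init=10` setting, and the toy data are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize


def cosine_kmeans_labels(data, n_clusters=10, random_state=3425):
    """Cluster rows by direction: normalize to unit length, then run k-means."""
    # After L2 normalization, squared Euclidean distance equals 2 - 2 * cosine.
    unit_rows = normalize(data, norm="l2")
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    return kmeans.fit_predict(unit_rows)


# Toy data (assumed for illustration): two directions, very different magnitudes.
points = np.array([[1.0, 0.0], [10.0, 0.1], [0.0, 1.0], [0.1, 10.0]])
print(cosine_kmeans_labels(points, n_clusters=2))
```

The centers found this way are plain means of unit vectors and are not rescaled to unit length, so this approximates a cosine-based k-means and is not an exact one. `normalize` accepts both dense arrays and SciPy sparse matrices, so sparse input like the `sparse_data` above should also work.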
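    #4 · allenwuli (OP) · May 7, 2019
    @SleipniR I took a look at this algorithm.
    def euc_dist(x, y=None):
        return cosine_similarity(x, y)
    k_means_.euclidean_distances = euc_dist
    This step switches the distance computation to cosine similarity. I opened the source, though, and still could not work out how the cluster centers are computed. Could you explain how it computes them? Thanks.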
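
On the cluster centers: in standard k-means (Lloyd's algorithm), the center of a cluster is the arithmetic mean of the samples currently assigned to it. The patch quoted here replaces only the distance function, so the update step is presumably still that mean. The cosine-based variant usually called spherical k-means adds one step, rescaling each mean back to unit length. The NumPy sketch below is written from scratch to make both steps visible. It is not scikit-learn's code, `spherical_kmeans` and its parameters are hypothetical names, and it assumes the input has no all-zero rows.

```python
import numpy as np


def spherical_kmeans(data, n_clusters, n_iter=50, seed=0):
    """Minimal spherical k-means: assign by cosine similarity, update by mean."""
    rng = np.random.default_rng(seed)
    # Unit-length rows, so a dot product equals the cosine similarity.
    unit = data / np.linalg.norm(data, axis=1, keepdims=True)
    # Initialize centers from randomly chosen distinct samples (a copy).
    centers = unit[rng.choice(len(unit), size=n_clusters, replace=False)]
    labels = np.full(len(unit), -1)
    for _ in range(n_iter):
        # Assignment step: each sample goes to its most similar center.
        new_labels = np.argmax(unit @ centers.T, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update step: mean of the members, rescaled to unit length.
        for k in range(n_clusters):
            members = unit[labels == k]
            if len(members) == 0:
                continue  # keep the previous center for an empty cluster
            mean = members.mean(axis=0)
            norm = np.linalg.norm(mean)
            if norm > 0:
                centers[k] = mean / norm
    return labels, centers


# Toy data (assumed for illustration): two directions, very different magnitudes.
points = np.array([[1.0, 0.0], [10.0, 0.1], [0.0, 1.0], [0.1, 10.0]])
labels, centers = spherical_kmeans(points, n_clusters=2)
print(labels)
print(centers)
```

Note that the assignment step uses `argmax`, because a similarity is largest for the closest center, whereas a distance-based k-means picks the smallest value.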