Python code to filter the closest distance pairs - python黑洞网
Source: www.pythonheidong.com/blog/art  Published: 2021-03-25

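What I want is another table that keeps, for each neighborhood, only the closest city. For example, for "Neigh1", City4 is the nearest (smallest distance). So I want the following table: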

   city_A  neigh_B  Dist(km)
0  City4   Neigh1   5.56
1  City3   Neigh2   4.32
2  City1   Neigh3   7.93
3  City2   Neigh4   3.21
4  City4   Neigh5   4.56
5  City5   Neigh6   6.67
6  City3   Neigh7   6.16
..... and so on

It does not matter whether city names repeat; I just want to save the closest pairs to another CSV. How can this be done? Experts, please help!


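Solution

If you only want the closest city for each neighborhood, you do not need to compute the full distance matrix.

Here is a working code example, although the output I get differs from yours. The latitude/longitude values may be wrong.

I used your data: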

    import pandas as pd
    import numpy as np
    import sklearn.neighbors

    # Cities/stores with their coordinates in degrees
    locations_stores = pd.DataFrame({
        'city_A': ['City1', 'City2', 'City3', 'City4'],
        'latitude_A': [56.361176, 56.34061, 56.374749, 56.356624],
        'longitude_A': [4.899779, 4.871195, 4.893847, 4.912281],
    })

    # Neighborhoods with their coordinates in degrees
    locations_neigh = pd.DataFrame({
        'neigh_B': ['Neigh1', 'Neigh2', 'Neigh3', 'Neigh4', 'Neigh5'],
        'latitude_B': [53.314, 53.318, 53.381, 53.338, 53.7364],
        'longitude_B': [4.955, 4.975, 4.855, 4.873, 4.425],
    })

Create a BallTree that we can query:

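    from sklearn.neighbors import BallTree
    import numpy as np

    # [latitude, longitude] pairs as plain arrays
    stores_gps = locations_stores[['latitude_A', 'longitude_A']].values
    neigh_gps = locations_neigh[['latitude_B', 'longitude_B']].values

    # Note: the haversine metric expects [latitude, longitude] in radians;
    # the values are passed in degrees here, as in the original answer.
    tree = BallTree(stores_gps, leaf_size=15, metric='haversine')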

For each neighborhood we want the closest (k=1) city/store:

    distance, index = tree.query(neigh_gps, k=1)
    earth_radius = 6371  # km
    distance_in_km = distance * earth_radius
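Multiplying by the Earth's radius works because the haversine metric returns the great-circle distance on a unit sphere, that is, the central angle in radians. The following is a minimal pure-Python sketch of that formula, not scikit-learn's implementation; `haversine_km` is a hypothetical helper name, and 6371 km is the same mean Earth radius used in the answer.

```python
import math

EARTH_RADIUS_KM = 6371  # mean Earth radius, same value as in the answer


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, lam1, phi2, lam2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    central_angle = 2 * math.asin(math.sqrt(a))  # radians on the unit sphere
    return central_angle * EARTH_RADIUS_KM


# Sanity check: one degree of latitude spans 6371 * pi / 180 km
one_degree = haversine_km(0.0, 0.0, 1.0, 0.0)
print(one_degree)
```

Because the formula works on angles in radians, any library routine that implements it needs radian inputs as well.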

We can build a DataFrame of the results with:

    pd.DataFrame({
        'Neighborhood': locations_neigh.neigh_B,
        # index has shape (n_neighborhoods, 1); take the single nearest city
        'Closest_city': locations_stores.city_A[np.array(index)[:, 0]].values,
        'Distance_to_city': distance_in_km[:, 0],
    })

This gives me:

  Neighborhood Closest_city  Distance_to_city
0       Neigh1        City2      19112.334106
1       Neigh2        City2      19014.154744
2       Neigh3        City2      18851.168702
3       Neigh4        City2      19129.555188
4       Neigh5        City4      15498.181486

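Since my output differs from yours, something still needs correcting. I am only guessing, but the latitude and longitude may be swapped. Note also that scikit-learn's haversine metric expects [latitude, longitude] in radians, whereas the BallTree code above passes degrees; converting the coordinates with np.radians, as the full-matrix snippet below does, should bring the distances into a realistic range. Either way, this is the approach you want, especially for the amount of data you have.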
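As a hedged sketch of that correction, the BallTree query can be repeated with the coordinates converted to radians first, and the result written to a CSV as the question asks. The data is the sample from the answer; the output file name `closest_pairs.csv` and its location in the temporary directory are assumptions for illustration only.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

locations_stores = pd.DataFrame({
    'city_A': ['City1', 'City2', 'City3', 'City4'],
    'latitude_A': [56.361176, 56.34061, 56.374749, 56.356624],
    'longitude_A': [4.899779, 4.871195, 4.893847, 4.912281],
})
locations_neigh = pd.DataFrame({
    'neigh_B': ['Neigh1', 'Neigh2', 'Neigh3', 'Neigh4', 'Neigh5'],
    'latitude_B': [53.314, 53.318, 53.381, 53.338, 53.7364],
    'longitude_B': [4.955, 4.975, 4.855, 4.873, 4.425],
})

# The haversine metric expects [latitude, longitude] in radians
stores_rad = np.radians(locations_stores[['latitude_A', 'longitude_A']].to_numpy())
neigh_rad = np.radians(locations_neigh[['latitude_B', 'longitude_B']].to_numpy())

tree = BallTree(stores_rad, leaf_size=15, metric='haversine')
distance, index = tree.query(neigh_rad, k=1)  # both have shape (n_neigh, 1)

earth_radius = 6371  # km
closest = pd.DataFrame({
    'Neighborhood': locations_neigh['neigh_B'],
    'Closest_city': locations_stores['city_A'].to_numpy()[index[:, 0]],
    'Distance_to_city': distance[:, 0] * earth_radius,
})
print(closest)

# Hypothetical output location; adjust the path as needed
out_path = os.path.join(tempfile.gettempdir(), 'closest_pairs.csv')
closest.to_csv(out_path, index=False)
```

With this sample data the cities lie near 56.3°N and the neighborhoods near 53.3 to 53.7°N, roughly three degrees of latitude apart, so the distances come out at a few hundred kilometres. The single-digit distances in the question's table therefore cannot come from these sample coordinates, whichever way the radians issue is resolved.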

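Edit: for the full matrix, use

    # Newer scikit-learn versions expose this as sklearn.metrics.DistanceMetric
    from sklearn.neighbors import DistanceMetric

    dist = DistanceMetric.get_metric('haversine')
    earth_radius = 6371  # km
    # Rows correspond to cities/stores, columns to neighborhoods
    haversine_distances = dist.pairwise(np.radians(stores_gps), np.radians(neigh_gps))
    haversine_distances *= earth_radius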

This gives the full matrix, but be aware that for larger inputs it will take a long time, and you can expect to hit memory limits.

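You could use NumPy's np.argmin on haversine_distances to get results similar to the BallTree. Because the matrix above has cities as rows and neighborhoods as columns, np.argmin(haversine_distances, axis=0) gives the index of the closest city for each neighborhood, while axis=1 gives the closest neighborhood for each city. The resulting indices can be used just like in the BallTree example.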
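The argmin step can be sketched with NumPy alone. Note that this swaps in a hand-written, broadcast haversine in place of `DistanceMetric`, so it is an illustration under that assumption rather than the answer's exact code; `haversine_matrix_km` is a hypothetical helper name, and the coordinates are the sample data from above.

```python
import numpy as np


def haversine_matrix_km(a_deg, b_deg, radius_km=6371):
    """Pairwise great-circle distances in km.

    a_deg and b_deg are arrays of [latitude, longitude] rows in degrees;
    the result has shape (len(a_deg), len(b_deg)).
    """
    a = np.radians(np.asarray(a_deg, dtype=float))[:, None, :]  # (n, 1, 2)
    b = np.radians(np.asarray(b_deg, dtype=float))[None, :, :]  # (1, m, 2)
    dlat = b[..., 0] - a[..., 0]
    dlon = b[..., 1] - a[..., 1]
    h = (np.sin(dlat / 2) ** 2
         + np.cos(a[..., 0]) * np.cos(b[..., 0]) * np.sin(dlon / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(h))


stores_gps = np.array([[56.361176, 4.899779], [56.34061, 4.871195],
                       [56.374749, 4.893847], [56.356624, 4.912281]])
neigh_gps = np.array([[53.314, 4.955], [53.318, 4.975], [53.381, 4.855],
                      [53.338, 4.873], [53.7364, 4.425]])

# Rows are cities, columns are neighborhoods
dist_km = haversine_matrix_km(stores_gps, neigh_gps)

# axis=0 scans down each column: one closest-city index per neighborhood
closest_city_idx = np.argmin(dist_km, axis=0)
closest_dist_km = dist_km[closest_city_idx, np.arange(dist_km.shape[1])]
print(closest_city_idx, closest_dist_km)
```

The full matrix needs memory proportional to the number of cities times the number of neighborhoods, which is why the tree-based query is preferable for large inputs.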
