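What I want is another table that keeps, for each neighborhood, only its nearest city. For example, for "Neigh1", City4 is the closest (smallest distance). So I want the following table: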

       city_A neigh_B  Dist(km)
    0   City4  Neigh1      5.56
    1   City3  Neigh2      4.32
    2   City1  Neigh3      7.93
    3   City2  Neigh4      3.21
    4   City4  Neigh5      4.56
    5   City5  Neigh6      6.67
    6   City3  Neigh7      6.16
    ..... and so on

It doesn't matter whether city names repeat; I just want to save the closest pairs to another CSV. How can I do this? Experts, please help!
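Solution
If you only want the nearest city for each neighborhood, you don't need to compute the full distance matrix.
Here is a working code example, although the output I get differs from yours. It may be a latitude/longitude mistake.
I used your data: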

    import numpy as np
    import pandas as pd

    # Cities (stores) with their coordinates in degrees
    locations_stores = pd.DataFrame({
        'city_A': ['City1', 'City2', 'City3', 'City4'],
        'latitude_A': [56.361176, 56.34061, 56.374749, 56.356624],
        'longitude_A': [4.899779, 4.871195, 4.893847, 4.912281],
    })

    # Neighborhoods with their coordinates in degrees
    locations_neigh = pd.DataFrame({
        'neigh_B': ['Neigh1', 'Neigh2', 'Neigh3', 'Neigh4', 'Neigh5'],
        'latitude_B': [53.314, 53.318, 53.381, 53.338, 53.7364],
        'longitude_B': [4.955, 4.975, 4.855, 4.873, 4.425],
    })

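Create a BallTree that we can query:

    from sklearn.neighbors import BallTree
    import numpy as np

    stores_gps = locations_stores[['latitude_A', 'longitude_A']].values
    neigh_gps = locations_neigh[['latitude_B', 'longitude_B']].values

    # The haversine metric expects [latitude, longitude] in radians.
    # The original answer passed degrees here, which is what produced
    # the distances quoted further down.
    tree = BallTree(np.radians(stores_gps), leaf_size=15, metric='haversine')

For each neighborhood we want the closest (k=1) city/store:

    distance, index = tree.query(np.radians(neigh_gps), k=1)

    # haversine returns an angle in radians; scale by the Earth's radius for km
    earth_radius = 6371
    distance_in_km = distance * earth_radius
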
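For reference, the haversine metric measures a central angle on a unit sphere, which is why the result is multiplied by the Earth's radius. A minimal NumPy sketch of the formula could look like this; the function name `haversine_km` is my own, and 6371 km is the same mean radius used above.

```python
import numpy as np

EARTH_RADIUS_KM = 6371  # mean Earth radius, same value as in the answer


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    # Trigonometric functions work in radians, so convert first
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    central_angle = 2 * np.arcsin(np.sqrt(a))
    return EARTH_RADIUS_KM * central_angle


# City2 to Neigh1, using the sample coordinates
d = haversine_km(56.34061, 4.871195, 53.314, 4.955)
```

The function also accepts NumPy arrays, so it can be applied to whole columns at once.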
We can build a DataFrame of the results with:

    pd.DataFrame({
        'Neighborhood': locations_neigh.neigh_B,
        # index has shape (n_neighborhoods, 1); take its only column
        'Closest_city': locations_stores.city_A.iloc[index[:, 0]].values,
        'Distance_to_city': distance_in_km[:, 0],
    })

This gave me:

      Neighborhood Closest_city  Distance_to_city
    0       Neigh1        City2      19112.334106
    1       Neigh2        City2      19014.154744
    2       Neigh3        City2      18851.168702
    3       Neigh4        City2      19129.555188
    4       Neigh5        City4      15498.181486

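Since this output differs from yours, there is still a mistake to correct somewhere. Maybe latitude and longitude are swapped; I am only guessing here. Note also that the distances shown came from coordinates passed in degrees, whereas scikit-learn's haversine metric expects radians, which would explain values of this size. Either way, this is the approach you want, especially for the amount of data you have.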
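Putting the pieces together, here is a self-contained sketch that converts the coordinates to radians before building and querying the tree, then serializes the closest pairs as CSV. The helper name `nearest_city` and the file name in the comment are my own choices, not part of the original answer; the output columns follow the table requested in the question.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371

locations_stores = pd.DataFrame({
    'city_A': ['City1', 'City2', 'City3', 'City4'],
    'latitude_A': [56.361176, 56.34061, 56.374749, 56.356624],
    'longitude_A': [4.899779, 4.871195, 4.893847, 4.912281],
})
locations_neigh = pd.DataFrame({
    'neigh_B': ['Neigh1', 'Neigh2', 'Neigh3', 'Neigh4', 'Neigh5'],
    'latitude_B': [53.314, 53.318, 53.381, 53.338, 53.7364],
    'longitude_B': [4.955, 4.975, 4.855, 4.873, 4.425],
})


def nearest_city(stores, neigh):
    """Return one row per neighborhood with its closest city and the distance in km."""
    # haversine needs [latitude, longitude] in radians
    stores_rad = np.radians(stores[['latitude_A', 'longitude_A']].to_numpy())
    neigh_rad = np.radians(neigh[['latitude_B', 'longitude_B']].to_numpy())

    tree = BallTree(stores_rad, leaf_size=15, metric='haversine')
    distance, index = tree.query(neigh_rad, k=1)

    return pd.DataFrame({
        'city_A': stores['city_A'].to_numpy()[index[:, 0]],
        'neigh_B': neigh['neigh_B'].to_numpy(),
        'Dist(km)': distance[:, 0] * EARTH_RADIUS_KM,
    })


closest = nearest_city(locations_stores, locations_neigh)

# Without a path, to_csv returns the CSV text; pass a path such as
# 'closest_pairs.csv' (hypothetical name) to write a file instead.
csv_text = closest.to_csv(index=False)
```

With the sample coordinates the cities sit about three degrees of latitude north of the neighborhoods, so the distances come out in the hundreds of kilometers. The single-digit distances expected in the question therefore suggest that the posted sample differs from the real data.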
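Edit: For the full matrix, use

    # Recent scikit-learn versions expose DistanceMetric under sklearn.metrics;
    # adjust the import if this one fails.
    from sklearn.neighbors import DistanceMetric

    dist = DistanceMetric.get_metric('haversine')
    earth_radius = 6371

    # Rows are cities, columns are neighborhoods; inputs must be in radians
    haversine_distances = dist.pairwise(np.radians(stores_gps), np.radians(neigh_gps))
    haversine_distances *= earth_radius
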
This gives the full matrix, but be aware that for larger inputs it will take a long time, and you can expect to hit memory limits.
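To get a feel for that memory cost: a dense float64 matrix needs 8 bytes per entry, so its footprint grows with the product of the two table sizes. A rough estimate, using made-up table sizes purely for illustration:

```python
def full_matrix_gib(n_cities, n_neighborhoods, bytes_per_value=8):
    """Approximate size in GiB of a dense float64 distance matrix."""
    return n_cities * n_neighborhoods * bytes_per_value / 2**30


# Hypothetical sizes, not taken from the question
small = full_matrix_gib(4, 5)
large = full_matrix_gib(50_000, 200_000)
```

The second case works out to roughly 75 GiB, whereas a k=1 tree query returns only one distance and one index per neighborhood.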
You could use NumPy's np.argmin(haversine_distances, axis=0) to get results similar to the BallTree's. Because rows are cities and columns are neighborhoods, axis=0 yields, for each neighborhood, the index of the closest city, which can be used just like the index in the BallTree example.
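As a quick sanity check on small inputs, the brute-force argmin and the BallTree query should agree. The sketch below uses `sklearn.metrics.pairwise.haversine_distances` in place of `DistanceMetric`, which is a substitution on my part; the coordinates are the sample data from the question.

```python
import numpy as np
from sklearn.metrics.pairwise import haversine_distances
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371

# [latitude, longitude] in degrees, converted to radians
stores_rad = np.radians([
    [56.361176, 4.899779],
    [56.34061, 4.871195],
    [56.374749, 4.893847],
    [56.356624, 4.912281],
])
neigh_rad = np.radians([
    [53.314, 4.955],
    [53.318, 4.975],
    [53.381, 4.855],
    [53.338, 4.873],
    [53.7364, 4.425],
])

# Brute force: rows are cities, columns are neighborhoods
full_km = haversine_distances(stores_rad, neigh_rad) * EARTH_RADIUS_KM
# axis=0 picks, for each neighborhood (column), the row of the closest city
brute_index = np.argmin(full_km, axis=0)
brute_km = full_km[brute_index, np.arange(full_km.shape[1])]

# Tree-based query for comparison
tree = BallTree(stores_rad, metric='haversine')
tree_dist, tree_index = tree.query(neigh_rad, k=1)
tree_km = tree_dist[:, 0] * EARTH_RADIUS_KM
```

If `brute_index` and `tree_index[:, 0]` ever disagree on real data, an exact tie between two cities is the first thing to check.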