Building recommendation logic in PHP

Aug 26, 2020 PHP Laravel

A memo written while building a recommendation feature for an application made with Laravel.

I was able to implement something like the k-nearest neighbor method (a rough imitation of it) without Python, but Python, where pandas, NumPy, and similar libraries are available, is probably easier to work with. There may also be floating-point precision issues.

# Premise

・Recommend items based on the answers to multiple questions
・Each question is Yes/No (i.e., 0 or 1)
・Initial data can be entered
・Learn from the answers to actual questions so that recommendations become more practical
・The number of recommendable items is not very large (a few hundred)

# Design

Enter an answer pattern for each recommendable item as its initial value.
Treat each answer pattern as a vector whose components are the 0/1 answers ([1,0,1,1,~,0]).


$itemA = [1,0,1,1,1,0,~,1,0]
  ~
$itemN = [0,0,1,0,1,0,~,1,0]
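For reference, here is a minimal sketch of how raw Yes/No answers might be turned into the 0/1 vectors shown above. It assumes, hypothetically, that answers arrive as the strings "yes"/"no" in a fixed question order; `toAnswerVector` is an illustrative name, not a Laravel API.

```php
<?php

/**
 * Hypothetical helper: convert Yes/No answers (in question order) into a 0/1 vector.
 * Assumes answers are the strings "yes" or "no" (case-insensitive, surrounding spaces ignored).
 *
 * @param string[] $answers
 * @return int[]
 */
function toAnswerVector(array $answers): array
{
    return array_map(
        static function (string $answer): int {
            $normalized = strtolower(trim($answer));
            if ($normalized !== 'yes' && $normalized !== 'no') {
                throw new InvalidArgumentException("Unexpected answer: {$answer}");
            }
            return $normalized === 'yes' ? 1 : 0;
        },
        array_values($answers)
    );
}

// Initial data: one answer pattern per recommendable item (illustrative values)
$items = [
    'itemA' => toAnswerVector(['yes', 'no', 'yes', 'yes']),
    'itemB' => toAnswerVector(['no', 'no', 'yes', 'no']),
];
// $items === ['itemA' => [1, 0, 1, 1], 'itemB' => [0, 0, 1, 0]]
```

Rejecting anything other than yes/no keeps a malformed answer from silently becoming a 0 and skewing the distance.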

Calculate the Euclidean distance between the vector of actual answers and each candidate item's vector, and recommend the items in ascending order of distance (closest first).

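$euc_distances = [];
foreach ($items as $item_name => $coordinate_points) { // $items = ["itemA" => [0,1,0,0....], "itemB" => [0,1,1,..] .. .]
    $total = 0;
    foreach ($coordinate_points as $i => $coordinate_point) {
        // $actual_answer_points: the user's answers, in the same question order
        $total += ($coordinate_point - $actual_answer_points[$i]) ** 2;
    }
    $euc_distances[$item_name] = sqrt($total); // `$total ** 1/2` would evaluate as ($total ** 1) / 2
}
// $euc_distances = ["itemA" => 80, "itemB" => 14, ....]
asort($euc_distances); // ascending order: the closest items come first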
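The loop above can be wrapped into a small, reusable ranking step. This is a minimal sketch, assuming every item pattern and the answer vector have the same length and question order; `euclideanDistance`, `recommend`, and the top-`$k` cutoff are hypothetical names and parameters, and the square root is taken with `sqrt()`.

```php
<?php

/**
 * Euclidean distance between two vectors of the same length (same question order).
 *
 * @param array<int, int|float> $a
 * @param array<int, int|float> $b
 */
function euclideanDistance(array $a, array $b): float
{
    $a = array_values($a);
    $b = array_values($b);
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same length.');
    }
    $total = 0.0;
    foreach ($a as $i => $value) {
        $total += ($value - $b[$i]) ** 2; // squared difference per question
    }
    return sqrt($total);
}

/**
 * Hypothetical helper: rank items by distance to the user's answers (closest first)
 * and keep only the $k nearest ones.
 *
 * @param array<string, array<int, int|float>> $items item name => answer pattern
 * @param array<int, int|float> $answers the user's 0/1 answers
 * @return array<string, float> item name => distance, ascending
 */
function recommend(array $items, array $answers, int $k): array
{
    $distances = [];
    foreach ($items as $name => $pattern) {
        $distances[$name] = euclideanDistance($pattern, $answers);
    }
    asort($distances); // sort by value, keeping item names as keys
    return array_slice($distances, 0, $k, true);
}

// Usage with illustrative data
$items = [
    'itemA' => [1, 0, 1, 1],
    'itemB' => [0, 0, 1, 0],
    'itemC' => [1, 1, 1, 1],
];
$top = recommend($items, [1, 0, 1, 1], 2);
// $top: itemA (distance 0), then itemC (distance 1)
```

With a few hundred items, as assumed in the premise, a full scan like this stays cheap; it only becomes a concern when the item count grows far beyond that.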

Record the item the user actually selected, together with their answers, as a recommendation result.
Once the number of actual selections for a candidate item exceeds a certain threshold, compute the centroid of the answer vectors of the users who selected it, and use that as the item's new answer pattern.

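// $merged_scores = ["q1" => [1,0,1,1,1], "q2" => [0,0,0,1,0] ...]
// (for one item: per question, the answers of every user who selected that item)
foreach ($merged_scores as $question_id => $coordinate_points) {
    if (count($coordinate_points) === 0) {
        continue; // 0/0 would give NAN (PHP 7) or throw DivisionByZeroError (PHP 8)
    }
    // mean of the answers = this question's component of the centroid
    $new_coordinate_point = array_sum($coordinate_points) / count($coordinate_points);
    // update vector components
}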
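The centroid update can also be sketched as one function that takes the answer vectors of users who actually selected an item and returns a new pattern once there are enough of them. This is a minimal sketch: `updatePattern` and the `$threshold` parameter are hypothetical (the text only says "a certain number"), and the result holds fractions between 0 and 1 rather than strict 0/1 values, which still works with the Euclidean distance.

```php
<?php

/**
 * Hypothetical helper: new answer pattern for an item = centroid (component-wise mean)
 * of the answer vectors of users who actually selected it.
 * Returns null while there are fewer than $threshold selections, so the caller
 * keeps the current (initial) pattern.
 *
 * @param array<int, array<int, int>> $selectedAnswers one 0/1 answer vector per selection
 * @return array<int, float>|null
 */
function updatePattern(array $selectedAnswers, int $threshold): ?array
{
    $count = count($selectedAnswers);
    if ($count === 0 || $count < $threshold) {
        return null; // not enough data yet (and avoids dividing by zero)
    }
    $dimensions = count(reset($selectedAnswers));
    $sums = array_fill(0, $dimensions, 0.0);
    foreach ($selectedAnswers as $answers) {
        $answers = array_values($answers);
        if (count($answers) !== $dimensions) {
            throw new InvalidArgumentException('All answer vectors must have the same length.');
        }
        foreach ($answers as $i => $value) {
            $sums[$i] += $value;
        }
    }
    return array_map(
        static function (float $sum) use ($count): float {
            return $sum / $count; // mean of the answers to this question
        },
        $sums
    );
}

// Usage with illustrative data: four users selected the same item
$selected = [
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 1],
];
$newPattern = updatePattern($selected, 3); // [0.75, 0.25, 1.0, 0.75]
```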

# Summary

I tried an initial implementation in the simplest possible form, but it has many problems: it does not take outliers into account, and because it performs a full search, it will not be suitable once the number of recommendable items becomes very large. The plan is to run it experimentally in this form, collect real data, check the correlations, and then change it into a more realistic form. As for the calculation itself, if the data can be modeled as relations, it may actually be faster to have SQL compute the Euclidean distance.