php - How can I efficiently identify a natural division point between two sets of numbers?


I have 2 datasets (a & b). Each has 1000 numbers.

99% of the time: a < 5 <= b

However, 1% of the time b < 5 < a.

If the division point is unknown (call it x), how can one determine x given the dataset?

Obviously max(a) and min(b) are misleading, and I'd prefer not to loop through the entire range (or between min(b) and max(a)) guessing and identifying the greatest probable division point.

Sample dataset:

    a 1
    a 1
    a 1
    a 2
    b 2  <--anomaly
    a 3
    a 3
    a 3
    a 4
    a 5  <--anomaly
    b 5  <--division, or `x`
    b 5
    b 5
    b 5
    a 6  <--anomaly
    b 7
    b 8
    b 8
    b 8
    b 9
    b 9
    b 10
    b 10

Assume a pair of datasets exists (c & d). How can I find the point where c becomes d after allowing for a threshold of anomalies?

What do you recommend?

Here's a rough "guessing" strategy. I'd like to do the same without the "guessing" loop.

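    // Assumed helpers: count the items of $a below $i, and the items of $b
    // above $i (taken here as "at or above", to match a < 5 <= b).
    function below($i, array $a) {
        return count(array_filter($a, function ($v) use ($i) { return $v < $i; }));
    }
    function above($i, array $b) {
        return count(array_filter($b, function ($v) use ($i) { return $v >= $i; }));
    }

    $maxprobable = 0;
    $pointofdivision = 0;
    for ($i = min($b); $i <= max($a); $i++) {
        // Score $i by the fraction of $a below it plus the fraction of $b above it.
        $countbelow = below($i, $a);
        $countabove = above($i, $b);
        $probbelow = $countbelow / count($a);
        $probabove = $countabove / count($b);
        if (($probbelow + $probabove) > $maxprobable) {
            $maxprobable = $probbelow + $probabove;
            $pointofdivision = $i;
        }
    }
    echo $pointofdivision;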

This is a well-known problem in statistics and machine learning: given a number of labeled datapoints, determine the likeliest label for a new datapoint. In the 1D case it boils down to determining a threshold value x and saying "anything below x has label a" and "anything above x has label b."
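For the 1D threshold described above, one way to avoid stepping through the whole range is to sort the labeled values once and sweep over them, keeping running counts. The following is only a sketch under assumptions: the function name `bestThreshold` is made up for illustration, both arrays are assumed non-empty, the score is the same "fraction of a below plus fraction of b at or above" used in the question's loop, and the sample values are the question's sample dataset with each unlabeled value read as belonging to a.

```php
<?php
/**
 * Hypothetical helper: find the value x that maximizes
 *   (share of $a strictly below x) + (share of $b at or above x).
 * Only values that actually occur in the data are tried as candidates.
 */
function bestThreshold(array $a, array $b): array
{
    // Tag every value with its label and sort once by value.
    $points = [];
    foreach ($a as $v) { $points[] = [$v, 'a']; }
    foreach ($b as $v) { $points[] = [$v, 'b']; }
    usort($points, function ($p, $q) { return $p[0] <=> $q[0]; });

    $nA = count($a);
    $nB = count($b);
    $aBelow = 0; // number of a-values strictly below the current candidate
    $bBelow = 0; // number of b-values strictly below the current candidate
    $best = ['threshold' => null, 'score' => -1.0];

    $n = count($points);
    for ($i = 0; $i < $n; ) {
        $v = $points[$i][0];
        $score = $aBelow / $nA + ($nB - $bBelow) / $nB;
        if ($score > $best['score']) {
            $best = ['threshold' => $v, 'score' => $score];
        }
        // Move past every point equal to $v, updating the running counts.
        while ($i < $n && $points[$i][0] == $v) {
            if ($points[$i][1] === 'a') { $aBelow++; } else { $bBelow++; }
            $i++;
        }
    }
    return $best;
}

// Sample data from the question (unlabeled values assumed to be a).
$a = [1, 1, 1, 2, 3, 3, 3, 4, 5, 6];
$b = [2, 5, 5, 5, 5, 7, 8, 8, 8, 9, 9, 10, 10];
$result = bestThreshold($a, $b);
echo $result['threshold'], "\n"; // prints 5
```

The sort costs O(n log n) and the sweep itself is a single pass, so the work no longer depends on how wide the range between min(b) and max(a) is.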

There are many algorithms for this: you could use, for example, logistic regression, neural networks, or support vector machines. The choice of algorithm depends on what can be assumed about the data and on the tools and libraries you have available; for example, an SVM is apparently tricky to implement yourself.
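As one concrete instance of the options listed above, a one-feature logistic regression can be fitted with plain gradient descent and its 50% point read off as the threshold. This is a rough sketch, not a tuned implementation: the function name `logisticThreshold`, the iteration count and the learning rate are all arbitrary choices made for illustration, and the toy data is invented (it is not the question's dataset).

```php
<?php
/**
 * Hypothetical helper: fit p(label = b | x) = sigmoid(w * z + c) on
 * standardized values z, then return the x at which p = 0.5.
 */
function logisticThreshold(array $a, array $b, int $iterations = 5000, float $rate = 0.1): float
{
    $xs = array_merge($a, $b);
    // Label a as 0 and b as 1.
    $ys = array_merge(array_fill(0, count($a), 0), array_fill(0, count($b), 1));
    $n = count($xs);

    // Standardize the inputs so a fixed learning rate behaves reasonably.
    $mean = array_sum($xs) / $n;
    $var = 0.0;
    foreach ($xs as $x) { $var += ($x - $mean) ** 2; }
    $std = sqrt($var / $n);
    if ($std == 0.0) {
        throw new RuntimeException('All values are identical; no threshold exists.');
    }

    $w = 0.0; // slope
    $c = 0.0; // intercept
    for ($it = 0; $it < $iterations; $it++) {
        $gw = 0.0;
        $gc = 0.0;
        foreach ($xs as $k => $x) {
            $z = ($x - $mean) / $std;
            $p = 1.0 / (1.0 + exp(-($w * $z + $c)));
            // Gradient of the log loss with respect to w and c.
            $gw += ($p - $ys[$k]) * $z;
            $gc += ($p - $ys[$k]);
        }
        $w -= $rate * $gw / $n;
        $c -= $rate * $gc / $n;
    }
    if (abs($w) < 1e-12) {
        throw new RuntimeException('The two sets are not distinguishable by value.');
    }

    // p = 0.5 where w * z + c = 0, i.e. z = -c / w; map back to the x scale.
    return $mean - $std * $c / $w;
}

// Invented toy data, mirror-symmetric around 5 with one anomaly on each side,
// so the fitted boundary should land close to 5.
$a = [1, 2, 3, 4, 6];
$b = [4, 6, 7, 8, 9];
echo round(logisticThreshold($a, $b), 2), "\n";
```

Unlike a hard count of misclassified points, this fit weighs every point by how far it sits from the boundary, so a few anomalies shift the threshold slightly rather than deciding it.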

If you told us how the data is generated, or if it comes from a known statistical distribution, there might be a shortcut solution that's less complex but still adequate.
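To illustrate what such a shortcut could look like: if both sets could be assumed to be roughly normally distributed with the same spread and to be equally likely, the threshold that minimizes misclassification is simply the midpoint between the two means. Nothing in the question confirms that assumption, so treat this as a sketch only; the function name `midpointThreshold` is hypothetical.

```php
<?php
/**
 * Hypothetical shortcut, valid only under the assumption that $a and $b are
 * roughly normal with equal variance and equal prior probability:
 * the decision threshold is the midpoint of the two sample means.
 */
function midpointThreshold(array $a, array $b): float
{
    $meanA = array_sum($a) / count($a);
    $meanB = array_sum($b) / count($b);
    return ($meanA + $meanB) / 2;
}

echo midpointThreshold([1, 2, 3], [7, 8, 9]), "\n"; // prints 5
```

With a noticeable share of anomalies, the means get pulled toward the other set; swapping in medians would be less sensitive to that, at the cost of leaving the normal-distribution argument behind.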

