Well, as mentioned above, similar_text runs in O(N^3). I've written a longest common subsequence (LCS) version that runs in O(m.n), where m and n are the lengths of str1 and str2. The result is a percentage, and in my tests it seems to match the similar_text percentage, but with better performance. Here are the 3 functions I'm using:



<?php

function LCS_Length($s1, $s2)
{
    $m = strlen($s1);
    $n = strlen($s2);

    // (m+1) x (n+1) table of zeros; row 0 and column 0 stand for the
    // empty prefix and stay 0.
    $LCS_Length_Table = array_fill(0, $m + 1, array_fill(0, $n + 1, 0));

    for ($i = 1; $i <= $m; $i++) {
        for ($j = 1; $j <= $n; $j++) {
            if ($s1[$i - 1] == $s2[$j - 1]) {
                // Characters match: extend the LCS of both shorter prefixes.
                $LCS_Length_Table[$i][$j] = $LCS_Length_Table[$i - 1][$j - 1] + 1;
            } else if ($LCS_Length_Table[$i - 1][$j] >= $LCS_Length_Table[$i][$j - 1]) {
                $LCS_Length_Table[$i][$j] = $LCS_Length_Table[$i - 1][$j];
            } else {
                $LCS_Length_Table[$i][$j] = $LCS_Length_Table[$i][$j - 1];
            }
        }
    }

    return $LCS_Length_Table[$m][$n];
}



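function str_lcsfix($s)
{
    // Strip spaces, then fold accented letters to plain ASCII.
    // preg_replace is used because ereg_replace was removed in PHP 7.
    // The character classes assume the usual accented variants and
    // UTF-8 input (/u flag); adjust them to your data.
    $s = str_replace(" ", "", $s);
    $s = preg_replace("/[èéêëÈÉÊË]/u", "e", $s);
    $s = preg_replace("/[àáâãäåÀÁÂÃÄÅ]/u", "a", $s);
    $s = preg_replace("/[ìíîïÌÍÎÏ]/u", "i", $s);
    $s = preg_replace("/[òóôõöÒÓÔÕÖ]/u", "o", $s);
    $s = preg_replace("/[ùúûüÙÚÛÜ]/u", "u", $s);
    $s = preg_replace("/[ç]/u", "c", $s);
    return $s;
}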



function get_lcs($s1, $s2)
{
    $s1 = strtolower(str_lcsfix($s1));
    $s2 = strtolower(str_lcsfix($s2));

    $lcs = LCS_Length($s1, $s2);
    $ms = (strlen($s1) + strlen($s2)) / 2; // mean length of the two strings

    // Both strings empty: avoid dividing by zero.
    if ($ms == 0) {
        return 0;
    }

    return ($lcs * 100) / $ms;
}

?>
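To make the percentage concrete, here is a small worked check. It is only a sketch: `lcs_length_two_rows` is a hypothetical name (not one of the functions above) for a variant that uses the same recurrence but keeps just two rows of the table, so memory is O(n) instead of O(m.n). It skips the accent and case folding and compares the result against the built-in similar_text for a single pair of words.

```php
<?php
// Hypothetical helper: same recurrence as LCS_Length, but only the
// previous and current rows are kept in memory.
function lcs_length_two_rows($s1, $s2)
{
    $m = strlen($s1);
    $n = strlen($s2);
    $prev = array_fill(0, $n + 1, 0);   // row i-1, starts as the all-zero row 0
    for ($i = 1; $i <= $m; $i++) {
        $curr = array(0);               // column 0 is always 0
        for ($j = 1; $j <= $n; $j++) {
            if ($s1[$i - 1] == $s2[$j - 1]) {
                $curr[$j] = $prev[$j - 1] + 1;
            } else {
                $curr[$j] = max($prev[$j], $curr[$j - 1]);
            }
        }
        $prev = $curr;
    }
    return $prev[$n];
}

$a = "World";
$b = "Word";
$lcs = lcs_length_two_rows($a, $b);                     // "Word" is a subsequence of "World"
$pct = ($lcs * 100) / ((strlen($a) + strlen($b)) / 2);  // 4 * 100 / 4.5
similar_text($a, $b, $builtin);                         // built-in percentage, for comparison
printf("%d %.2f %.2f\n", $lcs, $pct, $builtin);         // prints "4 88.89 88.89"
?>
```

For this pair the two percentages agree. I have not shown that they agree for every input, so treat it as a spot check rather than a proof.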



You can skip calling str_lcsfix if you aren't concerned about accented characters and the like, or you can extend it or rewrite it for faster performance. I suspect regex replacement is not the fastest way to do this.
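As one possible rewrite, the folding can be done in a single strtr() pass with a lookup table instead of several regex passes. This is a sketch under assumptions: `str_lcsfix_strtr` is a hypothetical name, the input is assumed to be UTF-8, and the table is illustrative rather than complete. I have not benchmarked it.

```php
<?php
// Hypothetical alternative to str_lcsfix(): one strtr() pass.
// Assumes UTF-8 input; extend the table with whatever characters you need.
function str_lcsfix_strtr($s)
{
    static $map = array(
        ' ' => '',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'È' => 'e', 'É' => 'e', 'Ê' => 'e', 'Ë' => 'e',
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a',
        'À' => 'a', 'Á' => 'a', 'Â' => 'a', 'Ä' => 'a',
        'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i',
        'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'ö' => 'o',
        'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u',
        'ç' => 'c',
    );
    // With an array argument, strtr replaces each key by its value in one pass.
    return strtr($s, $map);
}

echo strtolower(str_lcsfix_strtr("Café crème")), "\n"; // prints "cafecreme"
?>
```

The result can be passed to strtolower and LCS_Length exactly as the output of str_lcsfix is.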

Hope this helps.

Georges