Calculating similarity of words

int similar_text ( string first, string second [, float &percent])

You can calculate how similar two words are by using the function similar_text(). It takes a minimum of two parameters (the words to check) and returns an integer reporting the number of characters matched between the two. Note that this is actually quite a smart function: rather than comparing the words position by position, it looks for runs of characters the two words share, as demonstrated by the following script:

<?php
    $word1 = "Connotation";
    $word2 = "Annotation";
    $match = similar_text($word1, $word2);
    echo "$match letters are the same between '$word1' and '$word2'\n";
?>

That outputs "9 letters are the same between 'Connotation' and 'Annotation'" - it sees that the two words are identical apart from the opening letter or two.
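To see why the answer is 9, it can help to look at a rough re-implementation of the idea. This is only a sketch, not PHP's actual source: sketch_similar() is a made-up name, and the code assumes plain single-byte strings. It finds the first longest run of characters the two words share, counts it, then repeats the search on whatever is left to either side of that run.

```php
<?php
// Hypothetical helper for illustration only; use similar_text() in real code.
function sketch_similar(string $a, string $b): int
{
    $lenA = strlen($a);
    $lenB = strlen($b);

    if ($lenA === 0 || $lenB === 0) {
        return 0;
    }

    // Find the first longest run of characters common to both strings.
    $best = 0;
    $posA = 0;
    $posB = 0;

    for ($i = 0; $i < $lenA; $i++) {
        for ($j = 0; $j < $lenB; $j++) {
            $k = 0;
            while ($i + $k < $lenA && $j + $k < $lenB && $a[$i + $k] === $b[$j + $k]) {
                $k++;
            }

            if ($k > $best) {
                $best = $k;
                $posA = $i;
                $posB = $j;
            }
        }
    }

    if ($best === 0) {
        return 0;
    }

    // Count that run, then repeat on the text to its left and to its right.
    return $best
        + sketch_similar(substr($a, 0, $posA), substr($b, 0, $posB))
        + sketch_similar(substr($a, $posA + $best), substr($b, $posB + $best));
}

echo sketch_similar("Connotation", "Annotation"), "\n";
```

For our two words the longest shared run is "nnotation", which is nine characters; nothing matches between the leftover "Co" and "A", so the total stays at 9. Because the first longest run wins, swapping the two arguments can sometimes give a different count.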

For additional power you can pass a third parameter where PHP will store a percentage score of the match. This makes our script look like this:

<?php
    $word1 = "Connotation";
    $word2 = "Annotation";
    $match = similar_text($word1, $word2, $percent);
    $percent = round($percent, 2);
    echo "$match letters are the same between '$word1' and '$word2': a $percent% match.\n";
?>

This time you should get "9 letters are the same between 'Connotation' and 'Annotation': a 85.71% match." If you don't put the call to round() in there, the percentage is printed with a long string of decimal places (something like 85.714285714286 with PHP's default settings).
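If you are wondering where 85.71 comes from, the percentage is the number of matched characters doubled, divided by the combined length of both words, times 100. Here is a quick sketch that works the figure out by hand and prints it next to the one similar_text() reports; $manual is just a variable name picked for this example.

```php
<?php
$word1 = "Connotation";
$word2 = "Annotation";

$match = similar_text($word1, $word2, $percent);

// 9 matched characters, counted once per word, over 11 + 10 characters in total
$manual = $match * 2 * 100 / (strlen($word1) + strlen($word2));

printf("%.2f%% from similar_text(), %.2f%% by hand\n", $percent, $manual);
```

Using printf() with %.2f is another way to keep the output short without changing the value stored in $percent.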

So, with this new way to compare similarities of words, we can rewrite our sentence suggestion script so that it reports how alike each suggestion is to the original word: now we're finally getting to the stage of making cool scripts with this stuff!

<?php
    $pspell = pspell_new("en");
    $sentence = "The quik brown fox jumpd over the lazyyy dog";
    $words = explode(" ", $sentence);

    foreach($words as $word) {
        if (pspell_check($pspell, $word)) {
            // this word is fine; print as-is
            echo $word, " ";
        } else {
            // this word is bad; look for suggestions
            $suggestions = pspell_suggest($pspell, $word);

            if (count($suggestions)) {
                // we have suggestions for this word; print them out
                echo " <SELECT>";

                foreach($suggestions as $suggestion) {
                    $match = similar_text($word, $suggestion, $percent);
                    $percent = round($percent, 2);

                    echo "<OPTION>$suggestion ($percent%)</OPTION>";
                }

                echo "</SELECT> ";
            } else {
                // no suggestions; just print the word, followed by a space
                // so it doesn't run into the next one
                echo $word, " ";
            }
        }
    }
?>

That new script is just the combination of the previous two, so there should be no surprises in there. Having said that, looking at the screenshot you should notice that the suggestions for "jumpd" aren't sorted by how similar they are to the original word. That's a relatively easy fix to make, so I'll leave it as a challenge to you!
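If you get stuck on the challenge, here is one possible approach, offered as a sketch rather than the definitive answer. To keep it runnable without the pspell extension, the suggestion list is hard-coded and entirely made up; in the real script it would come from pspell_suggest(). The idea is to store each score in an array keyed by the suggestion, then use arsort() to put the highest percentage first.

```php
<?php
$word = "jumpd";

// Hypothetical suggestions standing in for the result of pspell_suggest()
$suggestions = ["dumped", "jump", "jumps", "jumped", "jumpy"];

$scores = [];

foreach ($suggestions as $suggestion) {
    similar_text($word, $suggestion, $percent);
    $scores[$suggestion] = round($percent, 2);
}

// sort by score, highest first, keeping each suggestion attached to its score
arsort($scores);

foreach ($scores as $suggestion => $percent) {
    echo "$suggestion ($percent%)\n";
}
```

With this made-up list, "jumped" ends up at the top and "dumped" at the bottom. The same loop could just as easily print OPTION elements instead of plain lines.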

 
