开发者

Searching a text for predefined words

开发者 https://www.devze.com 2023-04-05 00:49 出处:网络
Hi I have a database table that looks like this word_id int(10) word varchar(30) And I have a text, I wanna see which one of the words in this text are defined in that table, what\'s the most elega

Hi I have a database table that looks like this

word_id int(10)
word varchar(30)

And I have a text, I wanna see which one of the words in this text are defined in that table, what's the most elegant way of doing this?

Currently I query the database for all the words, t开发者_JAVA技巧hen using PHP I search for each word in the whole text, so it takes a long time for PHP to download all the words from database, and then it checks each and everyone of them against my text.


You can try to extract the words in the text and put them in a SELECT query like this:

$words = array_unique(get_words_in_text(...));
$sql = "SELECT * FROM words WHERE word IN (".implode(", ", $words)).")";

Might be that your SQL engine optimizes this statement. In any case, the database connection is utilized less than it is in your current approach.

You can also try to temporarily create a separate word table and add all words in the text to that table. Then you can perform a JOIN with the main word table. If both tables are indexed properly, this might be quite fast.

EDIT: This question/answer suggests that creating a temporary table is indeed faster (see comments): mysql select .. where .. in -> optimizing. However, it certainly depends on the concrete database you're using, the size of your word table, the size of the texts and the configuration of your index(es). Thus, I recommend evaluating both approaches for your specific scenario. Please report your results. :-)


An idea:

// get words in file into array
$file = file_get_contents('file.txt', FILE_IGNORE_NEW_LINES);
$file_words = explode(" ", $file);

// remove duplicate words, count elements in array after de-duplication
$file_words = array_unique($file_words);
$file_count = count($file_words);

// create empty array in which to store hits
$words_with_definition = array();

// check to see if each word exists in database
for ($i=0; $i < $file_count; $i++)
{
    // intentionally leaving out db connection, this is just a concept
    // word should be at least three characters, change as needed
    if (strlen($file_words[$i]) >= 3)
    {
        $sql = "SELECT word FROM your_table WHERE word='".$file_words[$i]."'";

        if (mysql_num_rows($sql) > 0)
        {
            // this is a hit, add it to $words_with_definition
            array_push($words_with_definition, $file_words[$i]);
        }
    }
}

Whatever is in the $words_with_definition array will be the words that hit off the database.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号