pattern recognition and string matching

Developer https://www.devze.com 2023-04-09 22:56 Source: web
I have two files taken from two different servers. Both files list matches between football teams. As you know, the same football team can be referred to by different names. I would like to write code that recognises the same football match in both files, so that I can take some variables from one file and others from the other. For example, in one file I have a match called

Derry City - Bray Wanderers

and in the other file I have the same match, called

Derry City - Bray 

How can I do this? I have no idea where to start.


A very simple script that replaces team aliases with full names. You'll need to fill in the alias table yourself; I made some up. Each match is stored as a hash key, so once every alias has been exchanged for its full name, a match that appears several times collapses into a single entry.

#!/usr/bin/perl
use strict;
use warnings;

my %games;
while (<DATA>) {
    chomp;
    my ($home, $guest) = split /\s*-\s*/, $_, 2;
    $home  = get_name($home);
    $guest = get_name($guest);
    $games{"$home - $guest"} = 1;
}

sub get_name {
# Return the full name for the team, if it exists, otherwise return the original
    my %alias = (
        'Derry'     => 'Derry City',
        'Brawlers'  => 'Beijing',
        'Dolphins'  => 'Miami',
        'Bray'      => 'Bray Wanderers',
    );
    return $alias{$_[0]} // $_[0];
}

use Data::Dumper;
print Dumper \%games;

__DATA__
Derry City - Bray Wanderers
Derry City - Bray
Brawlers - Dolphins
Beijing - Miami
Miami - Beijing


In C++: have a look at Boost.Regex and Boost.Tokenizer, as they will do what you need. All you need is a pattern to match.

boost::regex("Bray\\s*(Wanderers)?", boost::regex::icase);

Or something like that -- easy to set up as a set of unit tests.
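To make that idea a bit more concrete, here is a rough sketch of normalising both spellings of a match to one canonical string, so that lines from the two files can be compared directly. It uses std::regex from the standard library as a stand-in for Boost.Regex (the pattern syntax used here is the same). The names TeamPattern, team_patterns, canonical_team and canonical_match are made up for illustration, the pattern table is only an example, and the code assumes team names themselves contain no hyphen.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical alias entry: one pattern covering the spelling variants of a
// team, plus the canonical name to use for it.
struct TeamPattern {
    std::regex pattern;
    std::string canonical;
};

// Example table only; fill it with the teams that occur in your files.
const std::vector<TeamPattern>& team_patterns() {
    static const std::vector<TeamPattern> patterns = {
        {std::regex(R"(\s*Bray(\s+Wanderers)?\s*)", std::regex::icase), "Bray Wanderers"},
        {std::regex(R"(\s*Derry(\s+City)?\s*)", std::regex::icase), "Derry City"},
    };
    return patterns;
}

// Strip leading and trailing spaces and tabs.
std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Return the canonical name if a pattern matches the whole input,
// otherwise the trimmed input unchanged.
std::string canonical_team(const std::string& name) {
    for (const auto& entry : team_patterns()) {
        if (std::regex_match(name, entry.pattern)) return entry.canonical;
    }
    return trim(name);
}

// Split "home - guest" at the first hyphen and normalise both sides.
// Assumption: team names do not contain a hyphen themselves.
std::string canonical_match(const std::string& line) {
    const auto pos = line.find('-');
    if (pos == std::string::npos) return trim(line);
    return canonical_team(line.substr(0, pos)) + " - " +
           canonical_team(line.substr(pos + 1));
}
```

With this, canonical_match("Derry City - Bray") and canonical_match("Derry City - Bray Wanderers") give the same string, which can then serve as the key for joining the records from the two files. Each pattern is a natural unit test: feed it the known variants and check they all map to one name.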
