Comparing keywords in a page or CSV file: PHP? Bash?

开发者 (Developer) https://www.devze.com 2023-03-16 15:54 Source: the web
I have a series of keywords in an HTML web page. They are comma separated, so I could get them into CSV, and I would like to know which ones are NOT in another CSV file displayed as an HTML web page. How would you do that comparison? I have ideas for MySQL and tables, but these are CSV or HTML sources. Thanks!


In Python, given two CSV files, a.csv and b.csv, this script creates (or overwrites, if it already exists) a new file out.csv containing every keyword in a.csv that is not found in b.csv.

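import urllib.request  # Python 3; in Python 2 this was urllib.urlretrieve

# Download the remote keyword list to b.csv (placeholder URL)
url = 'http://www.website.com/x.csv'
urllib.request.urlretrieve(url, 'b.csv')

# Read both files; each holds comma-separated keywords
with open('a.csv', 'r') as file_a:
    list_a = [x.strip() for x in file_a.read().split(',')]
with open('b.csv', 'r') as file_b:
    list_b = [x.strip() for x in file_b.read().split(',')]

# Keywords in a.csv that are not in b.csv (swap the operands for the reverse)
list_out = list(set(list_a) - set(list_b))

# The with-blocks close every file, including the two input files
with open('out.csv', 'w') as file_out:
    file_out.write(','.join(list_out))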
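One thing to be aware of: a set difference does not keep the keywords in their original order. If the order matters, something along these lines might work. This is only a sketch; the function name keywords_not_in is made up for illustration, and it assumes the keywords are plain comma-separated text with no quoted commas.

```python
def keywords_not_in(a_text, b_text):
    """Return keywords from a_text that are missing in b_text, in original order."""
    # Keywords to exclude, with surrounding whitespace removed
    seen = {kw.strip() for kw in b_text.split(',')}
    result = []
    for kw in a_text.split(','):
        kw = kw.strip()
        # Skip empty entries (e.g. from a trailing comma) and known keywords
        if kw and kw not in seen:
            result.append(kw)
            seen.add(kw)  # also suppresses duplicates within a_text
    return result


print(','.join(keywords_not_in('a1, a2, a3, a4', 'a1, a4')))  # prints a2,a3
```

The strings passed in could just as well be the contents read from a.csv and b.csv.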


If it is just a list of keywords, do a search and replace (you can use sed) to replace all the commas with newlines, so that you end up with a file containing one keyword on each line. Do that to both versions of the list. Then use the "join" command:

join -v 1 leftfile rightfile

This will report all the entries in leftfile that are not in rightfile. Don't forget to sort both files first, or join won't work: it expects sorted input. There is a standard command-line tool for sorting too (it's called, not surprisingly, "sort").
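To see why the sorting matters, here is a rough Python sketch of the kind of single-pass merge that a tool like join relies on. It is an illustration only, not join's actual implementation, and the function name unpaired_left is hypothetical. Both lists are walked once, which only gives correct results if both are sorted the same way.

```python
def unpaired_left(left, right):
    """Return entries of sorted list `left` with no match in sorted list `right`."""
    result = []
    j = 0  # cursor into `right`; it only ever moves forward
    for item in left:
        # Advance past every right-hand entry smaller than the current item
        while j < len(right) and right[j] < item:
            j += 1
        # No equal entry at the cursor means the item is unpaired
        if j == len(right) or right[j] != item:
            result.append(item)
    return result


left = sorted(["php", "bash", "python", "sed"])
right = sorted(["bash", "sed"])
print(unpaired_left(left, right))  # prints ['php', 'python']
```

If either list were unsorted, the cursor could move past an entry that a later item needs, and matches would be missed.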


PHP solution: get the keywords as strings, convert them to arrays, and use the array_diff function:

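<?php
$csv1 = 'a1, a2, a3, a4';
$csv2 = 'a1, a4';

// Split on commas and trim the whitespace around each keyword,
// so that ' a4' and 'a4' compare as equal
$csv1_arr = array_map('trim', explode(',', $csv1));
$csv2_arr = array_map('trim', explode(',', $csv2));

// Keywords present in $csv1 but missing from $csv2
$diff = array_diff($csv1_arr, $csv2_arr);
print_r($diff);

?>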
