I have a very big .txt file with our clients' orders and I need to move it into a MySQL database. However, I don't know what kind of regex to use, as the pieces of information do not look very different from one another.
----------------------- 4046904 KKKKKKKKKKK Laura Meyer MassMutual Life Insurance 153 Vadnais Street Chicopee, MA 01020 US 413-744-5452 lmeyer@massmutual.co... KKKKKKKKKKK 373074210772222 02/12 6213 NA ----------------------- 4046907 KKKKKKKKKKK Venkat Talladivedula 6105 West 68th Street Tulsa, OK 74131 US 9184472611 venkat.talladivedula... KKKKKKKKKKK 373022121440000 06/11 9344 NA -----------------------
I tried something, but I couldn't even extract the name. Here is a sample of my unsuccessful attempt:
$htmlContent = file_get_contents("orders.txt");
//print_r($htmlContent);
$pattern = "/KKKKKKKKKKK(.*)\n/s";
preg_match_all($pattern, $htmlContent, $matches);
print_r($matches);
$name = $matches[1][0];
echo $name;
You may want to avoid regexes for something like this. Since the data is clearly organized by line, you could repeatedly read lines with fgets() and parse the data that way.
You could read this file with a regex, but it may be quite complicated to create one that reads all the fields.
I recommend that you read this file line by line, and parse each one, detecting which kind of data it contains.
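As a rough illustration of "detecting which kind of data it contains", here is a minimal sketch. The function name classifyLine, the category names, and the patterns are all assumptions made up from the sample in the question; a real file would probably need more cases.

```php
<?php
// Hypothetical helper: guess what kind of field a single line holds.
// Categories and patterns are assumptions based on the sample data only.
function classifyLine(string $line): string
{
    $line = trim($line);
    if ($line === '') {
        return 'blank';
    }
    if (preg_match('/^-{5,}$/', $line)) {
        return 'separator';      // the dashed line between clients
    }
    if (preg_match('/^K{5,}$/', $line)) {
        return 'marker';         // the KKKKKKKKKKK line
    }
    if (strpos($line, '@') !== false) {
        return 'email';
    }
    if (preg_match('/^\d{2}\/\d{2}$/', $line)) {
        return 'expiry';         // e.g. 02/12
    }
    if (preg_match('/^[A-Za-z .]+, [A-Z]{2} \d{5}$/', $line)) {
        return 'city_state_zip'; // e.g. Chicopee, MA 01020
    }
    if (preg_match('/^\d+$/', $line)) {
        return 'number';         // order id, card number, etc.
    }
    return 'text';               // name, company, street, ...
}
```

Several fields (name, company, street) all fall into the generic 'text' bucket here, so in practice the classification would likely be combined with the line's position inside the record.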
As you know exactly where your data is (i.e. which line it's on), why not just get it that way?
For example, something like:
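$htmlContent = file_get_contents("orders.txt");
// Split the file into one chunk per client on the dashed separator line.
$arrayofclients = explode("-----------------------", $htmlContent);
$newlinesep = "\r\n"; // adjust to "\n" if the file uses Unix line endings
for ($i = 0; $i < count($arrayofclients); $i++)
{
    // Split the chunk into its individual lines.
    $temp = explode($newlinesep, $arrayofclients[$i]);
    $idnum = $temp[0];
    $name = $temp[4];
    $houseandstreet = $temp[6];
    //etc
}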
or simply read the file line by line using fgets() - something like:
$i = 0; // line counter within the current client
$j = 0; // index of the current client
$file = fopen("orders.txt", "r");
$clients = [];
while (($line = fgets($file)) !== false)
{
    $line = rtrim($line, "\r\n"); // drop the trailing line break
    $i++;
    switch ($i)
    {
        case 2:
            $clients[$j]["idnum"] = $line;
            break;
        case 6:
            $clients[$j]["name"] = $line;
            break;
        //add more cases here for each line up to:
        case 18:
            //there are 18 lines per client if I counted right, so increment $j and reset $i.
            $j++;
            $i = 0;
            break;
    }
}
fclose($file);
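Both snippets above depend on exact line positions and on the line-ending style, which cannot be confirmed from the flattened sample. As a hedged alternative, the sketch below splits the content into records and keeps only the non-blank lines of each, so the field indexes no longer shift with blank lines or with "\r\n" versus "\n". The function name splitRecords is hypothetical, and it assumes that every record is delimited by a line made only of dashes.

```php
<?php
// Hypothetical helper: split raw file content into records, each a list of
// its non-blank, trimmed lines. Assumes separators are lines of 5+ dashes.
function splitRecords(string $content): array
{
    $records = [];
    // Split on any line that consists only of dashes.
    foreach (preg_split('/^-{5,}\s*$/m', $content) as $chunk) {
        // \R matches \r\n, \r or \n; trim each line and drop empty ones.
        $lines = array_values(array_filter(
            array_map('trim', preg_split('/\R/', $chunk)),
            'strlen'
        ));
        if ($lines !== []) {
            $records[] = $lines;
        }
    }
    return $records;
}
```

With something like $records = splitRecords(file_get_contents("orders.txt")), each $records[$n] would hold one client's lines in order. Which index corresponds to which field still has to be checked against the real file.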
You could use regexes, but they are a bit awkward for this situation.
Nico
For the record, here is a regex that will capture the names for you. (Granted, speed may very well be an issue.)
(?<=K{10}\s{2})\K[^\r\n]++(?!\s{2}-)
Explanation:
(?<=K{10}\s{2}) #Positive lookbehind for ten K characters followed by 2 whitespace characters (e.g. return/newline)
\K[^\r\n]++ #Restart the match here, then possessively match 1 or more non-return/newline characters
(?!\s{2}-) #Negative lookahead for 2 whitespace characters (e.g. return/newline) then a dash
Here is a Regex Demo.
You will notice that my regex pattern changes slightly between the Regex Demo and my PHP Demo. Slight tweaking depending on environment may be required to match the return / newline characters.
Here is the php implementation (Demo):
if(preg_match_all("/(?<=K{10}\s{2})\K[^\r\n]++(?!\s{2}-)/",$htmlContent,$matches)){
var_export($matches[0]);
}else{
echo "no matches";
}
By using \K in my pattern I avoid actually having to capture with parentheses. This cuts down array size by 50% and is a useful trick for many projects. The \K basically says "start the fullstring match from this point", so the matches go in the first subarray (fullstrings, key=0) of $matches instead of generating a fullstring match in 0 and the capture in 1.
Output:
array (
0 => 'Laura Meyer',
1 => 'Venkat Talladivedula',
)
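To make the difference between \K and a capture group concrete, here is a small self-contained comparison. The two-record sample string is made up for illustration, and its layout (marker line directly followed by the name, "\n" line endings) is an assumption rather than the real file format. The patterns are simplified accordingly and are not the exact pattern from this answer.

```php
<?php
// Made-up sample; the layout and "\n" line endings are assumptions.
$sample = "KKKKKKKKKKK\nLaura Meyer\nKKKKKKKKKKK\nVenkat Talladivedula\n";

// Capture-group form: full matches go to key 0, the names to key 1.
preg_match_all('/K{10}\n([^\r\n]+)/', $sample, $withGroup);

// \K form: the reported match restarts after the marker, so the names
// land directly in key 0 and no second subarray is created.
preg_match_all('/K{10}\n\K[^\r\n]+/', $sample, $withK);

var_export(count($withGroup)); // number of subarrays with a capture group
echo "\n";
var_export(count($withK));     // number of subarrays with \K
echo "\n";
```

Under these assumptions, $withK[0] and $withGroup[1] hold the same list of names, while $withGroup additionally carries the marker-plus-name full matches in key 0.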