Say I have strings like these:
bunch of other html<a href="http://domain.com/133742/The_Token_I_Want.zip" more html and stuff
bunch of other html<a href="http://domain.com/12345/another_token.zip" more html and stuff
bunch of other html<a href="http://domain.com/0981723/YET_ANOTHER_TOKEN.zip" more html and stuff
What is the regular expression to match The_Token_I_Want, another_token, YET_ANOTHER_TOKEN?
Appendix B of RFC 2396 gives a doozy of a regular expression for splitting a URI into its components, and we can adapt it for your case:
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*/([^.]+)[^?#]*)(\?([^#]*))?(#(.*))?
                                     #######
This leaves The_Token_I_Want in $6, which is the “hashderlined” subexpression above. (Note that the hashes are not part of the pattern.) See it live:
#! /usr/bin/perl
$_ = "http://domain.com/133742/The_Token_I_Want.zip";    
if (m!^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*/([^.]+)[^?#]*)(\?([^#]*))?(#(.*))?!) {
  print "$6\n";
}
else {
  print "no match\n";
}
Output:
$ ./prog.pl
The_Token_I_Want
UPDATE: I see in a comment that you're using boost::regex, so remember to escape the backslash in your C++ program.
#include <boost/foreach.hpp>
#include <boost/regex.hpp>
#include <iostream>
#include <string>
int main()
{
  boost::regex token("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*"
                     "/([^.]+)"
                   //  ####### I CAN HAZ HASHDERLINE PLZ
                     "[^?#]*)(\\?([^#]*))?(#(.*))?");
  const char * const urls[] = {
    "http://domain.com/133742/The_Token_I_Want.zip",
    "http://domain.com/12345/another_token.zip",
    "http://domain.com/0981723/YET_ANOTHER_TOKEN.zip",
  };
  BOOST_FOREACH(const char *url, urls) {
    std::cout << url << ":\n";
    std::string t;
    boost::cmatch m;
    if (boost::regex_match(url, m, token))
      t = m[6];
    else
      t = "<no match>";
    std::cout << "  - " << t << '\n';
  }
  return 0;
}
Output:
http://domain.com/133742/The_Token_I_Want.zip:
  - The_Token_I_Want
http://domain.com/12345/another_token.zip:
  - another_token
http://domain.com/0981723/YET_ANOTHER_TOKEN.zip:
  - YET_ANOTHER_TOKEN
/a href="http:\/\/domain\.com\/[0-9]+\/([a-zA-Z_]+)\.zip"/
You might want to add more characters to [a-zA-Z_]+ (digits or hyphens, for example) if your tokens can contain them.
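Since the asker mentioned C++, here is a minimal sketch of that idea using std::regex rather than boost::regex. It pulls every token out of one big string by escaping the dots and iterating over all matches. The function name find_tokens is made up for illustration, and the pattern assumes the host is always domain.com and the extension is always .zip, as in the sample strings.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect each token that sits between the numeric
// directory and ".zip". Host name and extension are taken from the
// question's samples and are assumptions about the real input.
std::vector<std::string> find_tokens(const std::string& html)
{
  static const std::regex link(
      "a href=\"http://domain\\.com/[0-9]+/([a-zA-Z_]+)\\.zip\"");
  std::vector<std::string> tokens;
  // sregex_iterator walks over every non-overlapping match in the input.
  for (std::sregex_iterator it(html.begin(), html.end(), link), end;
       it != end; ++it)
    tokens.push_back((*it)[1].str());  // group 1 is the token
  return tokens;
}
```

Calling find_tokens on the three sample lines joined into one string should give back the three tokens in order.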
You can use:
(http|ftp)://[[:alnum:]./_]+/([[:alnum:]._-]+)\.[[:alnum:]_-]+
([[:alnum:]._-]+) is the group that captures the token; in your example its value will be The_Token_I_Want. To access this group, use \2 or $2, because (http|ftp) is the first group and ([[:alnum:]._-]+) is the second. Note that the dot before the extension must be escaped, otherwise it matches any character and the group swallows part of the extension.
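As a rough sketch of reading the second group from C++, the snippet below uses std::regex (not boost::regex) with the dot before the extension escaped as \. so it only matches a literal dot. The function name second_group is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: return capture group 2 (the file name without its
// extension), or an empty string when nothing in text matches.
std::string second_group(const std::string& text)
{
  static const std::regex url(
      "(http|ftp)://[[:alnum:]./_]+/([[:alnum:]._-]+)\\.[[:alnum:]_-]+");
  std::smatch m;
  return std::regex_search(text, m, url) ? m[2].str() : std::string();
}
```

Because regex_search is used instead of regex_match, the URL may be surrounded by other text such as the HTML in the question.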
Try this:
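/(?:f|ht)tps?:\/{2}(?:www\.)?domain[^\/]+\/([^\/]+)\/([^\/]+)/i
or
/\w{3,5}:\/{2}(?:w{3}\.)?domain[^\/]+\/([^\/]+)\/([^\/]+)/i
When matched against a bare URL, the first group captures the numeric directory and the second the file name, extension included.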
First, use an HTML parser and get a DOM. Then get the anchor elements and loop over them looking for the hrefs. Don't try to grab the token straight out of a string.
Then:
The glib answer would be:
/(The_Token_I_Want\.zip)/
You might want to be a little more precise than a single example.
I'm guessing you are actually looking for:
/([^\/]+)$/
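For what it's worth, here is a small sketch of that last pattern in C++ with std::regex, assuming the href value has already been pulled out of the DOM by whatever HTML parser you use. The function name last_segment is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: return everything after the last '/' in url,
// or an empty string if url ends with a slash or is empty.
std::string last_segment(const std::string& url)
{
  static const std::regex tail("([^/]+)$");
  std::smatch m;
  return std::regex_search(url, m, tail) ? m[1].str() : std::string();
}
```

Note that this returns the file name with its extension (The_Token_I_Want.zip for the first sample), so the .zip still has to be stripped if only the token is wanted.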
m/The_Token_I_Want/
You'll have to be more specific about what kind of token it is. A number? A string? Does it repeat? Does it have a form or pattern to it?
It's probably best to use something smarter than a RegEx. For example, if you're using C# you could use the System.Uri class to parse it for you.
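C++ has no System.Uri in its standard library, but the same "skip the regex" idea can be sketched with plain string searches. This is only an illustration under the assumption that the URL has no query string or fragment, as in the question's samples; token_from_url is a hypothetical name.

```cpp
#include <string>

// Hypothetical helper: return the last path segment of url without its
// extension. Assumes there is no "?query" or "#fragment" part.
std::string token_from_url(const std::string& url)
{
  const std::string::size_type slash = url.rfind('/');
  const std::string::size_type start =
      (slash == std::string::npos) ? 0 : slash + 1;
  const std::string::size_type dot = url.rfind('.');
  // Ignore a dot that belongs to the host name rather than the file name.
  const std::string::size_type stop =
      (dot == std::string::npos || dot < start) ? url.size() : dot;
  return url.substr(start, stop - start);
}
```

For http://domain.com/133742/The_Token_I_Want.zip this yields The_Token_I_Want, with no regular expression involved.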