Using Twitter through its own web interface or a recent client has accustomed users to seeing linked @usernames and #hashtags in every status update they read.
When you use the Twitter API on your own website, service or app, you have to deal with plain-text tweets: no tags, so no links. With regular expressions you can create those links quickly and easily in almost any programming language.
Since what really matters is the regular expression itself and not the programming language, I will show just two examples, one in PHP and one in Perl.
A quick introduction
Let’s imagine we queried the Twitter API and got back a tweet like the following one:
just saw @johndoe talking at #someforum about his product http://bit.ly/foo
To do a good job we need to create three links:
- @johndoe will be linked to his Twitter profile
- #someforum to the related search on Twitter
- http://bit.ly/foo to its own link
What we want to achieve is to replace each of these strings with an HTML link that points to the correct URL. We also need to handle tweets that contain more than one keyword of each type.
Replacing functions in PHP and Perl
As we said before, we need to replace some text with a link. Let’s see which functions we’re going to use.
PHP
PHP lets developers perform regex substitutions with the native function preg_replace. The syntax is shown below:
preg_replace('/search/','replace',$source);
In our case search will be the regular expression that we are going to build, replace the link and $source our initial tweet. preg_replace replaces every match it finds, so tweets with several keywords need no extra work.
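As a minimal sketch of how preg_replace behaves, the snippet below uses a made-up sentence (not a tweet) to show that every match is replaced and that $1 in the replacement stands for the text captured by the first pair of parentheses. The optional fourth argument, the limit, is included only to contrast with the default.

```php
<?php
// Hypothetical sample text, used only for this illustration.
$source = 'red apple, green apple';

// By default every match is replaced; $1 is the text captured by (red|green).
$result = preg_replace('/(red|green) apple/', '$1 pear', $source);

// The optional fourth argument limits the number of replacements.
$firstOnly = preg_replace('/(red|green) apple/', '$1 pear', $source, 1);

echo $result, PHP_EOL;    // red pear, green pear
echo $firstOnly, PHP_EOL; // red pear, green apple
```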
Perl
Perl uses a different syntax, based on its native =~ binding operator and the s/// substitution operator:
$tweet =~ s/search/replace/g;
where $tweet is both our source and the output, search is our regex and replace our link. The trailing g makes Perl replace every match instead of only the first one.
Look for @usernames
First of all let’s create a regular expression able to recognize usernames mentioned in a tweet:
/@([a-zA-Z0-9_]*)/
As Twitter usernames allow only alphanumeric characters plus the underscore, the regular expression was built to match just those characters in the username. Note that this alone does not exclude email addresses: in john@example.com the pattern still matches @example and simply stops at the dot.
REGULAR EXPRESSION DETAILS
- the initial @ is needed to start finding the username. It’s outside the parentheses, so it’s not part of the captured value and we need to remember to write it again in the output.
- everything inside the parentheses () is captured and handed back to us as $1.
- inside the square brackets [] are the characters we allow in the username: a-z is any lowercase Latin letter, A-Z any uppercase letter, 0-9 any digit and _ , well, the underscore.
- the asterisk * means that the preceding character class may repeat any number of times, including zero. This just means we do not know in advance how long the username will be. Because zero is allowed, a lone @ matches as well; use + instead of * to require at least one character.
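To see what the pattern described above actually captures, here is a small sketch using preg_match_all. The tweet is a hypothetical variation of the sample, extended with a second mention and a lone @ so that the difference between * and + becomes visible.

```php
<?php
// Hypothetical tweet, extended with a second mention and a lone @ for illustration.
$tweet = 'just saw @johndoe and @jane_doe42 talking at #someforum @ noon';

// Pattern from the text: * allows zero characters after the @.
preg_match_all('/@([a-zA-Z0-9_]*)/', $tweet, $star);

// Variant with +: at least one character is required after the @.
preg_match_all('/@([a-zA-Z0-9_]+)/', $tweet, $plus);

// Index 0 holds the full matches, index 1 the content of the parentheses.
print_r($star[0]); // '@johndoe', '@jane_doe42', '@'
print_r($star[1]); // 'johndoe', 'jane_doe42', ''
print_r($plus[1]); // 'johndoe', 'jane_doe42'
```

The empty third capture in the first result is the lone @, which the * version would turn into a link to an empty profile name.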
CHOOSE THE RIGHT OUTPUT
The link needed for any @mention is the Twitter profile of that user. As our regular expression returns just one value, we’ll use that value to build the link. Remember that it returns only the username, without the @!
<a href="http://twitter.com/$1" title="$1 profile on Twitter" rel="nofollow">@$1</a>
Note that $1 is the username returned by the regular expression, and that rel="nofollow" is used so that search engines do not pass page rank through the link.
Finally, having both the regular expression to search for and the replacement link, we can proceed to write the code:
PHP
$tweet = 'just saw @johndoe talking at #someforum about his product http://bit.ly/foo';
$regex = '/@([a-zA-Z0-9_]*)/';
$link_pattern = '<a href="http://twitter.com/$1" title="$1 profile on Twitter" rel="nofollow">@$1</a>';
$tweet = preg_replace($regex, $link_pattern, $tweet);
Perl
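my $tweet = 'just saw @johndoe talking at #someforum about his product http://bit.ly/foo';
# Braces serve as delimiters because the replacement contains slashes;
# \@ keeps Perl from treating @ as the start of an array.
$tweet =~ s{\@([a-zA-Z0-9_]*)}{<a href="http://twitter.com/$1" title="$1 profile on Twitter" rel="nofollow">\@$1</a>}g;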
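Echoing or printing the result gives the tweet with the mention turned into a link:
just saw <a href="http://twitter.com/johndoe" title="johndoe profile on Twitter" rel="nofollow">@johndoe</a> talking at #someforum about his product http://bit.ly/foo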
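One caveat with the username pattern: it also matches the @ inside an email address. The sketch below contrasts it with one possible guard, a negative lookbehind that rejects an @ directly preceded by a character typical of an email local part. The guarded pattern and the sample tweet are assumptions made for illustration, not part of the original example.

```php
<?php
// Hypothetical tweet containing both an email address and a mention.
$tweet = 'mail john@example.com or ping @johndoe';
$link_pattern = '<a href="http://twitter.com/$1" title="$1 profile on Twitter" rel="nofollow">@$1</a>';

// Pattern from the text: the @ inside the email address matches too.
$naive = preg_replace('/@([a-zA-Z0-9_]*)/', $link_pattern, $tweet);

// Assumed variant: the negative lookbehind rejects an @ that directly follows
// a letter, digit, underscore, dot, plus or hyphen.
$guarded = preg_replace('/(?<![a-zA-Z0-9_.+-])@([a-zA-Z0-9_]+)/', $link_pattern, $tweet);

echo $naive, PHP_EOL;   // links both @example and @johndoe
echo $guarded, PHP_EOL; // links only @johndoe
```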
Look for #hashtags
The regular expression needed for hashtags is the same as the one we saw above for @usernames, with the @ replaced by a hash #. The output is similar too, but points to the Twitter search domain instead of the main one, like this:
<a href="http://search.twitter.com/search?q=%23$1" title="search for $1 on twitter" rel="nofollow">#$1</a>
where %23 is the URL-encoded hash symbol; all the other parts have already been explained.
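If you are wondering where %23 comes from, this short sketch derives it with PHP's rawurlencode function. The $query and $url variables are illustrative names only.

```php
<?php
// A raw # in a URL would start the fragment part, so it has to be percent-encoded.
echo rawurlencode('#'), PHP_EOL; // %23

// Building the search URL for the sample hashtag by hand.
$query = '#someforum';
$url = 'http://search.twitter.com/search?q=' . rawurlencode($query);

echo $url, PHP_EOL; // http://search.twitter.com/search?q=%23someforum
```

Since the hashtag pattern only allows letters, digits and underscores, the captured part never needs encoding itself, which is why the link template can hard-code %23 and append $1 directly.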
Since the regex works exactly like the previous one, I will just list the complete code.
PHP
$tweet = 'just saw @johndoe talking at #someforum about his product http://bit.ly/foo';
$regex = '/#([a-zA-Z0-9_]*)/';
$link_pattern = '<a href="http://search.twitter.com/search?q=%23$1" title="search for $1 on Twitter" rel="nofollow">#$1</a>';
$tweet = preg_replace($regex, $link_pattern, $tweet);
Perl
my $tweet = 'just saw @johndoe talking at #someforum about his product http://bit.ly/foo';
# Braces serve as delimiters because the replacement contains slashes.
$tweet =~ s{\#([a-zA-Z0-9_]*)}{<a href="http://search.twitter.com/search?q=%23$1" title="search for $1 on Twitter" rel="nofollow">#$1</a>}g;
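With the link template shown above, the result is the tweet with the hashtag turned into a link:
just saw @johndoe talking at <a href="http://search.twitter.com/search?q=%23someforum" title="search for someforum on Twitter" rel="nofollow">#someforum</a> about his product http://bit.ly/foo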
Look for links
The last substitution we’re going to perform is about links. To match them we look for any run of characters that starts with http and ends at a space or a closing parenthesis. Here’s the regex:
/http([s]?):\/\/([^\ \)$]*)/
REGULAR EXPRESSION DETAILS
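- http is outside our capturing groups; it’s just needed to find URLs in the tweet
- ([s]?) is a simple trick to match both http and https URLs. The ? makes the ‘s’ optional, so the group captures it only if it is there
- :// is just the next part of the URL; the slashes are written as \/ because / is also the regex delimiter
- ([^\ \)$]*) captures any number ( * ) of characters that are not ( ^ ) a space “\ ”, a closing parenthesis “\)” or a dollar sign “$”. Inside square brackets $ is a literal dollar, not the end-of-string anchor; the match stops at the end of the string anyway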
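The pieces listed above can be checked with a short sketch that prints what each group captures. The tweet below is hypothetical and was chosen to include an https URL, a URL inside parentheses and a URL followed by a full stop.

```php
<?php
// Hypothetical tweet with three URLs in different surroundings.
$tweet = 'docs at https://example.com/a?b=1 (mirror: http://example.org/x), see http://bit.ly/foo.';
$regex = '/http([s]?):\/\/([^\ \)$]*)/';

// PREG_SET_ORDER groups the results per match: [full match, $1, $2].
preg_match_all($regex, $tweet, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    printf("%s -> s: '%s', rest: '%s'\n", $m[0], $m[1], $m[2]);
}
// https://example.com/a?b=1 -> s: 's', rest: 'example.com/a?b=1'
// http://example.org/x -> s: '', rest: 'example.org/x'
// http://bit.ly/foo. -> s: '', rest: 'bit.ly/foo.'
```

The last line shows a limitation worth knowing: punctuation that directly follows a URL, such as the final full stop, ends up inside the link.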
CHOOSE THE RIGHT OUTPUT
Obviously for a link we just need to put the <a> tag before the URL and close it immediately after. However, because we match the protocol precisely (http or https), we need two variables in our output: $1 will be the s (if present) and $2 the rest of the URL. Below is the complete output:
<a href="http$1://$2" rel="nofollow" title="$2">http$1://$2</a>
So here are the two examples
PHP
$tweet = 'just saw @johndoe talking at #someforum about his product http://bit.ly/foo';
$regex = '/http([s]?):\/\/([^\ \)$]*)/';
$link_pattern = '<a href="http$1://$2" rel="nofollow" title="$2">http$1://$2</a>';
$tweet = preg_replace($regex, $link_pattern, $tweet);
Perl
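my $tweet = 'just saw @johndoe talking at #someforum about his product http://bit.ly/foo';
# Braces serve as delimiters so the slashes need no escaping;
# \$ keeps Perl from reading $] as a variable.
$tweet =~ s{http([s]?)://([^\ \)\$]*)}{<a href="http$1://$2" rel="nofollow" title="$2">http$1://$2</a>}g;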
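Echoing or printing the result gives the tweet with the URL turned into a link:
just saw @johndoe talking at #someforum about his product <a href="http://bit.ly/foo" rel="nofollow" title="bit.ly/foo">http://bit.ly/foo</a>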
The complete code
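Just for reference, here is the complete code of this example in each language. The URL substitution runs first on purpose: the links generated for usernames and hashtags contain http:// themselves and would otherwise be matched again.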
PHP
$tweet = 'just saw @johndoe talking at #someforum about his product http://bit.ly/foo';
// 1. plain URLs
$regex = '/http([s]?):\/\/([^\ \)$]*)/';
$link_pattern = '<a href="http$1://$2" rel="nofollow" title="$2">http$1://$2</a>';
$tweet = preg_replace($regex, $link_pattern, $tweet);
// 2. @usernames
$regex = '/@([a-zA-Z0-9_]*)/';
$link_pattern = '<a href="http://twitter.com/$1" title="$1 profile on Twitter" rel="nofollow">@$1</a>';
$tweet = preg_replace($regex, $link_pattern, $tweet);
// 3. #hashtags (no backslash before # in the replacement, or it would be printed literally)
$regex = '/#([a-zA-Z0-9_]*)/';
$link_pattern = '<a href="http://search.twitter.com/search?q=%23$1" title="search for $1 on Twitter" rel="nofollow">#$1</a>';
$tweet = preg_replace($regex, $link_pattern, $tweet);
Perl
my $tweet = 'just saw @johndoe talking at #someforum about his product http://bit.ly/foo';
# Braces serve as delimiters because the replacements contain slashes.
# 1. plain URLs
$tweet =~ s{http([s]?)://([^\ \)\$]*)}{<a href="http$1://$2" rel="nofollow" title="$2">http$1://$2</a>}g;
# 2. @usernames (\@ keeps Perl from treating @ as the start of an array)
$tweet =~ s{\@([a-zA-Z0-9_]*)}{<a href="http://twitter.com/$1" title="$1 profile on Twitter" rel="nofollow">\@$1</a>}g;
# 3. #hashtags
$tweet =~ s{\#([a-zA-Z0-9_]*)}{<a href="http://search.twitter.com/search?q=%23$1" title="search for $1 on Twitter" rel="nofollow">#$1</a>}g;
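For reuse, the three substitutions can be wrapped in a single function. The following is only a sketch: the name linkify_tweet is made up for this example, and the order (URLs, then usernames, then hashtags) is kept so that the URL pattern never sees the http:// addresses inside the generated links.

```php
<?php
// Hypothetical helper name; applies the three substitutions from this post in order.
function linkify_tweet(string $tweet): string
{
    // 1. Plain URLs first, so links generated in steps 2 and 3 are not matched again.
    $tweet = preg_replace(
        '/http([s]?):\/\/([^\ \)$]*)/',
        '<a href="http$1://$2" rel="nofollow" title="$2">http$1://$2</a>',
        $tweet
    );
    // 2. @usernames, linked to the profile page.
    $tweet = preg_replace(
        '/@([a-zA-Z0-9_]*)/',
        '<a href="http://twitter.com/$1" title="$1 profile on Twitter" rel="nofollow">@$1</a>',
        $tweet
    );
    // 3. #hashtags, linked to the search page.
    $tweet = preg_replace(
        '/#([a-zA-Z0-9_]*)/',
        '<a href="http://search.twitter.com/search?q=%23$1" title="search for $1 on Twitter" rel="nofollow">#$1</a>',
        $tweet
    );
    return $tweet;
}

$html = linkify_tweet('just saw @johndoe talking at #someforum about his product http://bit.ly/foo');
echo $html, PHP_EOL;
```

Note that the sketch inherits the limits of the patterns themselves: a URL containing # or @ would be touched again by steps 2 and 3.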