Friday, May 02, 2014
Beginner’s C# Part 4: Regular Expressions
Last time we left off with a GUI for our keyword checker program. What would be the next logical building block to add now that we have our checker with a nice preview pane?
Well, the preview pane is not very handy if it does not show the part that we are looking for, so let us improve that today. We will use regular expressions to achieve this. In short, a regular expression is a text search that, instead of looking for one specific keyword, describes a pattern and returns every piece of text that matches it. So our first piece of code is a using directive for System.Text.RegularExpressions at the top of our form code: using System.Text.RegularExpressions;
But first, let’s digress a bit to describe how we are going to scroll the preview pane to our desired location. Since the web page is in HTML format, we can pick an HTML element on the page and scroll to it. We just need the id of that element. Thankfully, with regular expressions we can get a list of all the ids inside the page, and then scroll to the one that is closest to our content. Simple.
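The regex (a shorter term for regular expression) that we are going to use to find all matches of element ids is this: @"id=""\s*?\S*?""". Inside a verbatim string (the @ prefix), a doubled quote stands for one literal quote, so the pattern reads id="\s*?\S*?": the text id=", optional whitespace, and then as few non-whitespace characters as possible up to the closing quote.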
And we will use it as follows:
var pageIds = Regex.Matches(pageContent, @"id=""\s*?\S*?""");
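To see what this call returns, here is a small console sketch that runs the same pattern over a made-up snippet of HTML. The class name IdMatchDemo, the helper FindIds and the sample markup are assumptions for illustration only; they are not part of the tutorial's form code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IdMatchDemo
{
    // Same pattern as in the article: id=" followed by the shortest run of
    // non-whitespace characters up to the next double quote.
    public const string IdPattern = @"id=""\s*?\S*?""";

    public static MatchCollection FindIds(string pageContent)
    {
        return Regex.Matches(pageContent, IdPattern);
    }

    public static void Main()
    {
        // Hypothetical sample markup, not taken from a real page.
        string pageContent = "<div id=\"header\"><p id=\"intro\">Hello</p></div>";

        foreach (Match m in FindIds(pageContent))
        {
            // Value is the matched text, Index is where it starts in pageContent.
            Console.WriteLine(m.Index + ": " + m.Value);
        }
    }
}
```

Each Match carries both the matched text (Value) and its position in the page (Index), and the position is what lets us measure how far an id is from our keyword.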
This will give us the page ids we wanted and conveniently store them as a collection of matches (a MatchCollection) in our pageIds variable. Now we also need a private function that will give us the closest element to our content. A function is a piece of code that does a specific task, and we usually create a function for each simple task we need, so that we can use it in different parts of our program without having to rewrite the same code over and over. It could also be used by other programs, if it weren’t for the private keyword I’ve used (in technical terms, an access modifier). The private access modifier means that the function can only be used within the same class, in our case the program’s form. We are happy with that, so let’s move on.
Here’s our function:
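private string closestId(int keywordLocation, MatchCollection matchingIds)
{
    // Smallest distance found so far; null until the first match is seen.
    int? closestDistance = null;
    string closestIdName = null;
    foreach (Match id in matchingIds)
    {
        int idDistance = Math.Abs(id.Index - keywordLocation);
        // Keep this match if it is the first one, or nearer than the best so far.
        if (closestDistance == null || idDistance < closestDistance.Value)
        {
            closestDistance = idDistance;
            closestIdName = id.Value;
        }
    }
    // Still null if the page contained no ids at all.
    return closestIdName;
}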
The function, which I named closestId, takes two parameters. The first one is the index of our original keyword search (which is described in the first part of the tutorial), and the second is the collection of regex matches. What is important is that each match in this collection carries both the matched text (Value) and its position in the page (Index). The function iterates through the matches in order to find the one closest to our keywordLocation. The distance between each match and the keyword is calculated with the absolute value function Math.Abs (now that is a handy public function!). Every time a new minimum distance is found, we store it along with the matched text, and keep it until a closer match replaces it. Initially the closest distance is null, so the first match in the collection is always taken as the closest in the first iteration. Once the loop ends, we just return the text of the closest id that we found, or null if the page had no ids. The function would then be called from the main function like this:
string matchedId = closestId(keywordLocation, pageIds);
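As a worked example of the search, the console sketch below runs a static copy of the function on a made-up page. The class ClosestIdDemo, the helper FindNearest and the use of IndexOf to locate the keyword are assumptions for illustration; the real keyword search is the one from the first part of the tutorial.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ClosestIdDemo
{
    // Static copy of the nearest-match search so it can run outside the form.
    public static string ClosestId(int keywordLocation, MatchCollection matchingIds)
    {
        int? closestDistance = null;
        string closestIdName = null;
        foreach (Match id in matchingIds)
        {
            int idDistance = Math.Abs(id.Index - keywordLocation);
            if (closestDistance == null || idDistance < closestDistance.Value)
            {
                closestDistance = idDistance;
                closestIdName = id.Value;
            }
        }
        return closestIdName; // null when the page has no ids at all
    }

    public static string FindNearest(string pageContent, string keyword)
    {
        // Assumption: the keyword position comes from a plain IndexOf search.
        int keywordLocation = pageContent.IndexOf(keyword, StringComparison.OrdinalIgnoreCase);
        if (keywordLocation < 0) return null;
        MatchCollection pageIds = Regex.Matches(pageContent, @"id=""\s*?\S*?""");
        return ClosestId(keywordLocation, pageIds);
    }

    public static void Main()
    {
        // Hypothetical sample markup with two ids; the keyword sits next to the second.
        string pageContent =
            "<div id=\"top\">Menu</div><p>filler text</p><div id=\"news\">C# keyword here</div>";
        Console.WriteLine(FindNearest(pageContent, "keyword"));
    }
}
```

With this sample the second id is only a few characters away from the keyword, so it wins over the first one at the top of the page.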
Actually, we just need the id of the element without the id=" prefix and the closing quote, so let’s go ahead and strip them off:
string idTag = matchedId.Substring(4, matchedId.Length - 5);
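The Substring call relies on the match always having the exact shape id="name". As an alternative sketch, a capture group can pull out the name directly and cope with a missing match. The class IdNameDemo and the helper ExtractIdName are hypothetical names, and the pattern is a small variant of the one used above, not the tutorial's original.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IdNameDemo
{
    // Variant of the article's pattern with a capture group around the name.
    // Returns null when the text is missing or not of the form id="...".
    public static string ExtractIdName(string matchedId)
    {
        if (string.IsNullOrEmpty(matchedId)) return null;
        Match m = Regex.Match(matchedId, @"id=""\s*(\S*?)""");
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string matchedId = "id=\"news\"";
        // Both approaches yield the bare name for a well-formed match.
        Console.WriteLine(matchedId.Substring(4, matchedId.Length - 5));
        Console.WriteLine(ExtractIdName(matchedId));
    }
}
```

The null check matters because the closest-id search has nothing to return when a page contains no ids, and calling Substring on null would throw an exception.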
This last piece of code can also go inside the closestId function, so feel free to put it there. The last piece of the puzzle is to navigate to the page as we did before, but this time we append the id to the url, prefixed with a hash sign. This makes the browser scroll the element with that id into view.
brwPreview.Navigate(url + "#" + idTag);
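If no id was found, appending "#" followed by nothing is not very useful, so one option is to build the address in a small helper that falls back to the plain url. This is only a sketch: the class PreviewUrlDemo, the helper BuildPreviewUrl and the example address are hypothetical and not part of the tutorial's source.

```csharp
using System;

public static class PreviewUrlDemo
{
    // Hypothetical helper: appends "#id" only when an id was actually found.
    public static string BuildPreviewUrl(string url, string idTag)
    {
        if (string.IsNullOrEmpty(idTag)) return url;
        return url + "#" + idTag;
    }

    public static void Main()
    {
        // Placeholder address used purely for illustration.
        Console.WriteLine(BuildPreviewUrl("http://example.com/page.html", "news"));
        Console.WriteLine(BuildPreviewUrl("http://example.com/page.html", null));
    }
}
```

In the form, the call would then look like brwPreview.Navigate(BuildPreviewUrl(url, idTag)), assuming the helper has been added to the form class.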
This method is not guaranteed to work 100% of the time, as some websites may not have any elements with an id, or the closest element with an id may not be that close to our content, but it’s a start. I also increased the size of the window from the previous tutorial so that we have more space for the preview pane. The full source code for this tutorial is available on GitHub. Here is a sample screenshot.