In this short introduction to C#, I tried to do something different by writing a program that scrapes a website and reports whether it is showing content that we want to see today. Scraping, for an automated program, means retrieving a website's content in order to extract some useful information from it. The example that we will build checks whether our favourite website (e.g. gametrailers.com) is posting anything about our favourite game (e.g. Final Fantasy) or any of its modern versions and updates. If so, we can then visit the Gametrailers website safe in the knowledge that we will see something about Final Fantasy.

First, open Visual Studio and create a new console application. I named mine keywordCheck, but you are free to choose your own name.

This will create a standard Program class containing a Main method that is executed every time we run the program. It is currently empty, so let us fix that.
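If you have not used the console application template before, the generated file looks roughly like this (the exact using directives and namespace name depend on your project and Visual Studio version):

using System;

namespace keywordCheck
{
    class Program
    {
        static void Main(string[] args)
        {
        }
    }
}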

Since we will be using the system's web client library to connect to and fetch the required page, let us add a using directive for it at the top of our file:

using System.Net;

Now let us try to fetch the page that we require, using the following code:

static void Main(string[] args)
{
    // Create the client that will fetch the page for us
    var client = new WebClient();
    var url = "http://www.gametrailers.com";

    // Download the page's HTML and print it to the console
    Console.Write(client.DownloadString(url));
    Console.ReadKey();
}

Here, we first initialise a new instance of a web client and assign it to the client variable. Then we set the url variable to the address of the page we want to fetch, and finally we instruct the client to download that url and output the page's HTML to the console. When we run the program, the console fills with the page's markup, confirming that we are indeed fetching the page.
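Since DownloadString returns the entire page as one string, the output can easily flood the console. If you only want a quick sanity check, one small variation (reusing the client and url variables from the code above) is to print the length and just the first few hundred characters:

var html = client.DownloadString(url);
Console.WriteLine("Downloaded " + html.Length + " characters.");

// Print at most the first 500 characters as a preview
Console.WriteLine(html.Substring(0, Math.Min(html.Length, 500)));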

That’s great, but we’re not quite there yet. Let us add a new variable to hold the keywords. Then we can check whether these keywords appear in the downloaded web page. If the page includes the text that we are looking for, we display a confirmation message:

var client = new WebClient();
var url = "http://www.gametrailers.com";
var keywords = "final fantasy";
var pageContent = client.DownloadString(url);

// IndexOf returns -1 when the keywords are not found
if (pageContent.IndexOf(keywords, StringComparison.OrdinalIgnoreCase) >= 0)
{
    Console.WriteLine(url + " are talking about " + keywords + " today.");
}
Console.ReadKey();

The IndexOf method returns the position in the page at which the keywords were first found, or -1 if they were not found at all. We also pass StringComparison.OrdinalIgnoreCase so that the comparison ignores case, which means we still find the keywords even if they appear in a different case on the page. The if statement therefore displays the message whenever IndexOf returns zero or greater, i.e. whenever the keywords are present.
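To make those return values concrete, here is a tiny standalone example (the sample strings are made up, but the behaviour is exactly what the check above relies on):

var text = "Final Fantasy VII Remake announced";

// Found at position 0, even though the case differs
Console.WriteLine(text.IndexOf("final fantasy", StringComparison.OrdinalIgnoreCase)); // prints 0

// Not found: IndexOf returns -1
Console.WriteLine(text.IndexOf("chrono trigger", StringComparison.OrdinalIgnoreCase)); // prints -1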

To finish off this tutorial, we will also display a snippet of the text starting where the keywords appear in the fetched page. Nothing big and fancy, but it will give us a general idea of what the page’s content is.

static void Main(string[] args)
{
    var client = new WebClient();
    var url = "http://www.gametrailers.com";
    var keywords = "final fantasy";
    var pageContent = client.DownloadString(url);

    // Position of the keywords in the page, or -1 if they are not there
    var keywordLocation = pageContent.IndexOf(keywords, StringComparison.OrdinalIgnoreCase);
    if (keywordLocation >= 0)
    {
        Console.WriteLine(url + " are talking about " + keywords + " today.");

        // Show the 100 characters starting at the keywords' position
        Console.WriteLine("\nSnippet:\n" + pageContent.Substring(keywordLocation, 100));
    }
    Console.ReadKey();
}

Running the program now prints the confirmation message followed by a 100-character snippet of the surrounding markup.
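One small caveat: Substring(keywordLocation, 100) assumes there are at least 100 characters left in the page after the keywords. That will almost always be true for a full HTML page, but if you want to be safe you could clamp the length, for example:

// Never read past the end of the downloaded page
var snippetLength = Math.Min(100, pageContent.Length - keywordLocation);
Console.WriteLine("\nSnippet:\n" + pageContent.Substring(keywordLocation, snippetLength));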

Next time we will see how to improve upon this code, for example by adding command line parameters or a GUI. A full version of the code is also available on GitHub here, and part 2 of the tutorial is here.
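As a small taste of the command line idea, one minimal way to accept the url and keywords as arguments (falling back to the defaults used above when none are supplied) would be something like:

// Hypothetical sketch: read optional arguments passed to Main
var url = args.Length > 0 ? args[0] : "http://www.gametrailers.com";
var keywords = args.Length > 1 ? args[1] : "final fantasy";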
