Getting content out of web pages - The WebRequest class


A simple project may do just fine using only its local resources and database. A more advanced application, however, may be expected to use remote resources, such as data from web pages. In that case we need to make requests of our own to get that info. .NET contains the WebRequest class, which can be quite helpful in such cases. We are going to use WebRequest's methods to get remote info, create a few useful examples and finally find out how we can create asynchronous requests using .NET 4.5.
 
 

Using the WebRequest

 
A request could be regarded as a basic part of the web architecture. A client sends a request to the server in order to get data. In return, the server will send the data back to the client.
 
 
We can also create requests with a few lines of code using the WebRequest class. This is what this article is about. To begin with, we are going to create a simple GET request to see how we can use it.
 
 
First we need to create a WebRequest object
WebRequest request = WebRequest.Create("http://dotnethints.com");
 
A request is a GET request by default. If we would like to change that, we can set the request.Method property.
Now we send the request and get the response
WebResponse response = request.GetResponse();
 
So far, so good. Now we are going to get the page response. That's what we wanted in the first place. We will use a Stream and a StreamReader
 
Stream dataStream = response.GetResponseStream();
StreamReader reader = new StreamReader(dataStream);
string responseFromServer = reader.ReadToEnd();
 
responseFromServer is the data we are looking for. However we need to release the resources as soon as we are done.
 
reader.Close();
dataStream.Close();
response.Close();
 
That's all. Let's put it all together and create some C# source code which gets the response from a page requested.
 
private string GetWebPageString(string targetURL)
    {
        //Create a request to the referenced URL
        WebRequest request = WebRequest.Create(targetURL);
        //Get the response
        WebResponse response = request.GetResponse();
        //Create a Stream and StreamReader containing info from the server
        Stream dataStream = response.GetResponseStream();
        StreamReader reader = new StreamReader(dataStream);
        //Get the info
        string responseFromServer = reader.ReadToEnd();

        //Release resources
        reader.Close();
        dataStream.Close();
        response.Close();

        return responseFromServer;
    }
 
For example, calling GetWebPageString("http://dotnethints.com"); will return the dotnethints home page HTML.
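The method above releases its resources only if every call succeeds. If GetResponse or ReadToEnd throws, the Close calls are skipped. A minimal sketch of one way to handle this, assuming a hypothetical helper named TryGetWebPageString (not part of the original article) that returns null when the request fails:

```csharp
using System;
using System.IO;
using System.Net;

public static class PageFetcher
{
    // Hypothetical helper: returns the page body, or null when the request fails.
    public static string TryGetWebPageString(string targetURL)
    {
        try
        {
            WebRequest request = WebRequest.Create(targetURL);
            // 'using' disposes each object when its block exits,
            // even if an exception is thrown while reading
            using (WebResponse response = request.GetResponse())
            using (Stream dataStream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(dataStream))
            {
                return reader.ReadToEnd();
            }
        }
        catch (WebException)
        {
            // Thrown for network failures, timeouts and HTTP error status codes
            return null;
        }
    }
}
```

Returning null is just one possible convention; a caller that needs the failure reason could let the WebException propagate instead. Note that recent .NET versions mark WebRequest as obsolete in favor of HttpClient, so this sketch fits the .NET Framework setting the article describes.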
 
 
This way we got our HTML info. However this is usually not what we want to do. Most likely we would prefer to get the part of the page which is of interest to us.
 
Supposing a web page contains the following div
<div id="testDiv">This is a test div</div>
and what we want is to get the HTML contained within the testDiv div.
 
In that case we could use a method similar to the following,
 
private string HandleResponse(string responseText, string startString, string endString)
    {
        //Split on the start marker; the part we want begins right after its first occurrence
        string[] responseParts = responseText.Split(new string[] { startString }, StringSplitOptions.None);
        //If the start marker is missing there is nothing to extract
        if (responseParts.Length < 2)
            return "";
        //Keep everything up to the first end marker
        string handledResponse = responseParts[1].Split(new string[] { endString }, StringSplitOptions.None)[0];

        //We could use regular expressions instead (markers escaped, lazy match up to the first end marker)
        //string partPattern = Regex.Escape(startString) + "(.*?)" + Regex.Escape(endString);
        //Match partMatch = Regex.Match(responseText, partPattern, RegexOptions.Singleline);
        //string handledResponse = partMatch.Success ? partMatch.Groups[1].Value : "";

        return handledResponse;
    }
 
We can now get the desired info using
HandleResponse(GetWebPageString("http://dotnethints.com"), "<div id=\"testDiv\">", "</div>");
and get the result we've been expecting.
This is a test div
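The same extraction can also be done without splitting the whole page into an array. The following is a sketch only, assuming a hypothetical helper named ExtractBetween (not part of the original article) that uses IndexOf and returns null when either marker is missing:

```csharp
using System;

public static class HtmlSlicer
{
    // Hypothetical helper: returns the text between the first startString
    // and the next endString, or null when either marker is missing.
    public static string ExtractBetween(string responseText, string startString, string endString)
    {
        int start = responseText.IndexOf(startString, StringComparison.Ordinal);
        if (start < 0)
            return null;

        // Move past the start marker itself
        start += startString.Length;

        int end = responseText.IndexOf(endString, start, StringComparison.Ordinal);
        if (end < 0)
            return null;

        return responseText.Substring(start, end - start);
    }
}
```

For the testDiv example, ExtractBetween(html, "<div id=\"testDiv\">", "</div>") would return the text inside the div. Like the Split version, it stops at the first end marker, so a div that contains nested divs would be cut short. For anything beyond simple pages an HTML parser is likely a better fit.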
 
 
So far we have created a GET request. In order to create a POST request, we first have to send the POST parameters along with our WebRequest object. The following piece of code looks like the one we used in the previous GET request. However, we need to set the request's Method, ContentType and ContentLength, and use a Stream object to send our parameters, which we have encoded as a byte array, to the server.
 
 
    private string GetWebPageStringPost(string targetURL)
    {
        //Create a request to the referenced URL
        WebRequest request = WebRequest.Create(targetURL);
        //Set request method
        request.Method = "POST";
        string formContent = "par1=1&par2=2";
        //Set parameters to a byte array
        byte[] byteArray = Encoding.UTF8.GetBytes(formContent);
        //Set content type and length
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = byteArray.Length;
        //Use a Stream to send parameters
        Stream dataStream = request.GetRequestStream();
        dataStream.Write(byteArray, 0, byteArray.Length);
        dataStream.Close();
        //Get the response
        WebResponse response = request.GetResponse();
        dataStream = response.GetResponseStream();
        //Create a StreamReader containing info from the server
        StreamReader reader = new StreamReader(dataStream);
        //Get the info
        //The response body is not URL-encoded, so it is read as it is
        string responseFromServer = reader.ReadToEnd();
 
        //Release resources
        reader.Close();
        dataStream.Close();
        response.Close();
 
        return responseFromServer;
    }
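The formContent string above is written by hand. When parameter names or values come from user input, they need to be URL-encoded first; otherwise characters such as & or = would break the form body. A minimal sketch, assuming a hypothetical helper named BuildFormContent (not part of the original article):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FormEncoder
{
    // Hypothetical helper: joins name/value pairs into a form body,
    // percent-encoding every name and value.
    public static string BuildFormContent(IEnumerable<KeyValuePair<string, string>> parameters)
    {
        return string.Join("&", parameters.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));
    }
}
```

The result can be passed to Encoding.UTF8.GetBytes exactly like the hand-written formContent. Uri.EscapeDataString encodes a space as %20 rather than the + that browsers send for forms; servers generally accept both.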
 
 

A WebRequest example

 
 
Before we go on, let's take a look at an interesting example. Suppose you want to create a dictionary and you are familiar enough with Dictionary.com to know it is quite a trustworthy website. Doing some research, we find out that the search keyword is placed within the search URL in the form of http://dictionary.reference.com/browse/search_keyword?s=t.
We also know that the primary result is always placed inside the following HTML code
<span class="dnindex">1.</span><div class="dndata">search_results</div>
Using both facts mentioned we can create a simple method that gets a string parameter and returns the primary result.
 
This is our method
private string UseOnlineDictionary(string keyword)
    {
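        //Escape the keyword so that spaces and special characters do not break the URL
        return HandleResponse(GetWebPageString("http://dictionary.reference.com/browse/" + Uri.EscapeDataString(keyword) + "?s=t"), "<span class=\"dnindex\">1.</span><div class=\"dndata\">", "</div>");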
    }
 
Using the following code
        Response.Write("dot: " + UseOnlineDictionary("dot"));
        Response.Write("<br/>net: " + UseOnlineDictionary("net"));
        Response.Write("<br/>hint: " + UseOnlineDictionary("hint"));
 
we get
 
dot: a small, roundish mark made with or as if with a pen.
net: a bag or other contrivance of strong thread or cord worked into an open, meshed fabric, for catching fish, birds, or other animals: a butterfly net.
hint: an indirect, covert, or helpful suggestion; clue: Give me a hint as to his identity.
 
We have succeeded in creating a simple dictionary!
 
Well, our dictionary will not be effective all the time, as Dictionary.com does not always produce the HTML format we presumed it does. For the sake of our example, however, we do not take all possible results and error handling into account.

Keep in mind that the content of web pages usually comes with usage restrictions. You may not be allowed to use or copy the content of a web page directly unless its owner is aware of it. Thus the previous dictionary example, no matter how simple or useful it may seem, is suggested for development practice only.
 

Getting web pages info asynchronously

 
We have already covered how to make asynchronous calls in a previous article. If you are not familiar with asynchronous operations, I'd suggest you take a look before going any further. Moving on, let's redo the previous dictionary example as a WPF application.
 
GetStringAsync is a method of the HttpClient class. It starts an asynchronous request and returns a Task<string>. Once that task has completed, its result is the response HTML following our request.
 
 
All we need is a TextBox named KeywordTextBox, a Label named ResultsLabel and a Button that calls the GetResultsButton_Click method. The user types a keyword in the KeywordTextBox and, when the button is pressed, gets the results in the ResultsLabel. This time, however, the request will be executed asynchronously. While we wait for the response to get to us, we may use that valuable time to compute other stuff.
 
In just a few words, a method marked as async may use the await keyword to suspend its own execution until the awaited task is completed. The calling thread is not blocked in the meantime, so the user interface stays responsive.
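To see the order of execution in isolation, here is a small console sketch. It assumes a hypothetical SlowOperationAsync method that stands in for a web request by waiting 500 ms with Task.Delay; neither name comes from the original article.

```csharp
using System;
using System.Threading.Tasks;

public static class AwaitDemo
{
    // Stand-in for a slow web request (assumption: a 500 ms delay
    // instead of real network I/O)
    private static async Task<string> SlowOperationAsync()
    {
        await Task.Delay(500);
        return "response";
    }

    public static async Task<string> RunAsync()
    {
        // Starting the task does not block; execution continues on the next line
        Task<string> pending = SlowOperationAsync();
        Console.WriteLine("Request started, doing other work...");

        // await suspends RunAsync here until the task completes,
        // then yields the task's result
        string result = await pending;
        Console.WriteLine("Got: " + result);
        return result;
    }
}
```

The "doing other work" line is printed before the delay has elapsed, which is the same pattern the dictionary example relies on when it calls ComputeOtherStuff before awaiting the response.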
 
Here's the source code.
 
 
        // Mark the event handler as async, so it can use the await keyword
        private async void GetResultsButton_Click(object sender, RoutedEventArgs e)
        {
            await CreateAsyncRequest();
        }
 
        private async Task CreateAsyncRequest()
        {
            string keyword = KeywordTextBox.Text;
 
            //pageHTML will contain all page HTML
            string pageHTML = await GetDictionaryAsync(keyword);
 
            //Show results
            if (pageHTML != "")
                ResultsLabel.Content = HandleResponse(pageHTML, "<span class=\"dnindex\">1.</span><div class=\"dndata\">", "</div>");
            else
                //In case sth goes wrong with the GetDictionaryAsync we get an empty string
                ResultsLabel.Content = "Something went wrong. Please try again.";
        }
 
 
        private async Task<string> GetDictionaryAsync(string keyword)
        {
            try
            {
                HttpClient client = new HttpClient();
 
                //Create asynchronous operation
                //We are going to get the content of the dictionary web page asynchronously
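                //The keyword is escaped so that spaces and special characters do not break the URL
                Task<string> getDotNetHintsStringTask = client.GetStringAsync("http://dictionary.reference.com/browse/" + Uri.EscapeDataString(keyword) + "?s=t");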
 
                //The method will still run at the same time as the asynchronous operation
                ComputeOtherStuff();
 
                //Here the method must wait for the response to complete
                //Awaiting the task yields its result once the response has arrived
                return await getDotNetHintsStringTask;
            }
            catch
            {
                return "";
            }
        }
 
 
        private void ComputeOtherStuff()
        {
            ResultsLabel.Content = "Getting data . . . In the meanwhile several operations are executed.";
        }
        
        //Get the part of the page we are interested in.
        private string HandleResponse(string responseText, string startString, string endString)
        {
            string[] responseParts = responseText.Split(new string[] { startString }, StringSplitOptions.None);
            //If the start marker is missing there is nothing to extract
            if (responseParts.Length < 2)
                return "";
            string handledResponse = responseParts[1].Split(new string[] { endString }, StringSplitOptions.None)[0];

            return handledResponse;
        }
 
This is the way the example works: ResultsLabel first shows the "Getting data" message and then the definition, once the response has arrived.
 

Summary

 
WebRequest is a useful class that helps us create requests using .NET code. We can create both GET and POST requests and then get the response HTML using a stream object. Using .NET 4.5 we can create asynchronous requests, here through HttpClient, so that we can continue working until the response is complete.
