How to load an HTML string in a WebView the Google-recommended way

Nitesh goyal
6 min read · Jan 6, 2022

When it comes to loading an HTML string in a WebView, most Android developers end up using the loadData() API. In fact, if I do a Google search for “how to load an html string in a WebView”, most of the links give me the same answer, i.e. use the loadData() API.

It may seem too late to be talking about the loadData() API at this point, but…

What if I told you that the loadData() API is neither the best way nor Google’s recommended way to load an HTML string in a WebView, and yet it has spread almost everywhere on the internet? The most common source is StackOverflow.com.

Note: It is better not to use the loadData() API that such answers recommend for this case. Use the one that Google recommends, which is discussed later in this article.

The StackOverflow answer in question has 147 upvotes and counting. Shocking, right? By now you must be wondering:

1. What is wrong with the loadData() API, and why is it used so extensively if it is not the recommended way?

2. What is the better, Google-recommended way of achieving the same task?

I will try to answer both of these questions as best I can.

Let’s look at this API’s signature and its definition first: public void loadData(String data, String mimeType, String encoding)

Loads the given data into WebView using a ‘data’ scheme URL. The encoding parameter specifies whether the data is base64 or URL encoded.

Let’s look at the potential problems with this API now.

As you might have noticed, this API takes an encoded HTML string as input, which means that if the input HTML string is not properly encoded, it may lead to errors.

The other issue with this API is something called an opaque origin. When your content is loaded this way, it has an opaque origin, which means it is going to fail all same-origin checks on the web.

“Same Origin Policy, When a page is loaded into a WebView to be displayed, all code in this page runs “in the context” of that page (its origin). The Same Origin Policy (SOP) is a mechanism that restricts JavaScript running in the context of one origin to access objects from another origin.”

These same-origin checks are critical for providing powerful web APIs securely. If your content fails them, you can’t use great APIs like XMLHttpRequest.
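To make the idea of an origin a bit more concrete, here is a simplified sketch in plain Java. It is not how WebView implements the check: originOf and sameOrigin are hypothetical helpers, and the URLs are placeholders. It assumes that an origin is the scheme, host, and port of an http(s) URL, and that anything else, such as a data URL, gets an opaque origin that serializes as "null".

```java
import java.net.URI;
import java.util.Locale;

class OriginSketch {
    /** Simplified origin: "scheme://host[:port]" for http/https URLs, "null" otherwise. */
    static String originOf(String url) {
        URI uri = URI.create(url);
        String scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
        boolean isWeb = scheme.equals("http") || scheme.equals("https");
        if (!isWeb || uri.getHost() == null) {
            return "null"; // opaque origin, e.g. for data: URLs
        }
        int defaultPort = scheme.equals("https") ? 443 : 80;
        int port = uri.getPort(); // -1 when the URL has no explicit port
        String host = uri.getHost().toLowerCase(Locale.ROOT);
        return (port == -1 || port == defaultPort)
                ? scheme + "://" + host
                : scheme + "://" + host + ":" + port;
    }

    /** In this sketch an opaque origin is treated as never matching any other origin. */
    static boolean sameOrigin(String a, String b) {
        String originA = originOf(a);
        return !originA.equals("null") && originA.equals(originOf(b));
    }

    public static void main(String[] args) {
        System.out.println(originOf("https://mydomain.com/articles/1"));
        System.out.println(originOf("data:text/html;base64,PGI+aGk8L2I+"));
        System.out.println(sameOrigin("https://mydomain.com/a", "https://mydomain.com/b"));
        System.out.println(sameOrigin("data:text/html,hi", "data:text/html,hi"));
    }
}
```

The point of the sketch is only that content loaded from a data URL has no scheme, host, and port to compare, so it cannot pass a same-origin check.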

The loadData() API was introduced in API level 1, and in those days developers tended to do the percent-encoding manually. With API level 8, however, Android introduced an API for Base64 encoding that does the encoding job correctly, so developers no longer have to do the encoding by hand. The example mentioned above shows this as well.
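As a rough illustration of the Base64 step described above, here is a minimal sketch in plain Java. It uses java.util.Base64 from the JVM as a stand-in for Android's android.util.Base64 so that it can run outside an app. The class name, helper names, and sample HTML are assumptions made up for the illustration. The sample deliberately contains '#' and '%', two characters that are easy to get wrong with manual percent-encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class LoadDataEncodingSketch {
    /** Base64-encodes the HTML (without padding) so it contains only the Base64 alphabet. */
    static String encodeForLoadData(String unencodedHtml) {
        byte[] bytes = unencodedHtml.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().withoutPadding().encodeToString(bytes);
    }

    /** Decodes the Base64 text back to the original HTML, used here only to verify the round trip. */
    static String decode(String encodedHtml) {
        return new String(Base64.getDecoder().decode(encodedHtml), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical content with characters that are special inside a URL.
        String html = "<html><body><a href='#top'>100% done</a></body></html>";
        String encoded = encodeForLoadData(html);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(html));
    }
}
```

On Android, the encoded string would then be passed as webView.loadData(encoded, "text/html", "base64"), with android.util.Base64 doing the encoding instead of java.util.Base64.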

But the same-origin restrictions I mentioned earlier are still pretty important, and this brings us to the answer to the second question.

The way Google recommends to get around this is to use loadDataWithBaseURL(), which has also been available since API level 1. There are articles that suggest using this API instead of loadData(), but most of them fail to explain its security features and do not use it properly. Let’s look at the signature of this API first.

public void loadDataWithBaseURL(String baseUrl, String data, String mimeType, String encoding, String historyUrl)

I am only going to discuss the key advantages of this API and why it should be used instead of loadData(). The complete documentation can be found at loadDataWithBaseURL.

Let’s look at some of the benefits of loadDataWithBaseURL() API.

One of the nicest things about this API is that it accepts the HTML string content as is, without any encoding. So as developers, we don’t have to worry about Base64 or percent-encoding. This also makes it proof against future Android updates, as we don’t have to rely on any encoding method anymore.

String html = "my html content";
webView.loadDataWithBaseURL("https://mydomain.com", html, "text/html", "UTF-8", null);

The other attraction of this API is baseUrl, its first parameter. The baseUrl configures the origin that the content operates with, which means you can control which origin you get without disabling the important same-origin security settings. We briefly discussed same origin above; this is the security feature that developers mostly skip even when they use this API.

The key question here is: how do we actually choose the right baseUrl?

We can look at a couple of use cases to decide which baseUrl is the right one.

1. Cached content: use the original URL as baseUrl

Many apps fetch HTML content from the internet, cache it locally, and display it in a WebView whenever needed. In such cases, baseUrl should be the original URL from which the HTML came. For example:

Suppose you have to load some HTML content into your WebView that looks like this:

String html = "<br /><br />Read the handouts please for tomorrow.<br /><br /><!--homework help homework" +
"help help with homework homework assignments elementary school high school middle school" +
"// --><font color='#60c000' size='4'><strong>Please!</strong></font>" +
"<img src='http://www.homeworknow.com/hwnow/upload/images/tn_star300.gif' />";

For this example, assume the html value is coming from the server. The right way to load it is to provide the URL of the domain it came from as baseUrl.

webView.loadDataWithBaseURL("http://www.homeworknow.com", html, "text/html", "UTF-8", null);

2. Your own content: use your own domain

The other use case is where apps bundle the HTML content with the app itself. In this case the recommended baseUrl is a real internet URL (HTTPS or HTTP). In most cases it would be your organisation’s domain. The advantage of this approach is that, firstly, you get same-origin security and, secondly, you can safely use resources from your domain while showing the HTML content.

The same example that we saw in 1. Cached content applies here as well. The only difference is that the HTML content is not fetched from the internet/server but bundled with the app.

The preferred scheme in this case is HTTPS, but you can also use HTTP in case you have to load insecure resources.

“If a valid HTTP or HTTPS base URL is not specified in baseUrl, then content loaded using this method will have a window.origin value of "null". This must not be considered to be a trusted origin by the application or by any JavaScript code running inside the WebView (for example, event sources in DOM event handlers or web messages), because malicious content can also create frames with a null origin. If you need to identify the main frame's origin in a trustworthy way, you should use a valid HTTP or HTTPS base URL to set the origin.”

3. Avoid custom schemes

Sometimes apps make up their own custom schemes and use those, for example myapp://path. The problem is that web standards only understand the standard URL schemes and do not expect custom ones. Hence, they don’t know how to handle custom schemes, and the resulting behaviour is very inconsistent. Eventually, this may result in arbitrary app breakage. So it is recommended to always use the standard internet URL schemes to avoid such issues.
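One way to guard against this in app code is to check the scheme before passing a base URL to the WebView. The sketch below is only an illustration in plain Java: isWebBaseUrl is a hypothetical helper, not part of the WebView API, and the URLs in main are placeholders.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

class BaseUrlCheck {
    /** Returns true only for syntactically valid http or https URLs that have a host. */
    static boolean isWebBaseUrl(String baseUrl) {
        if (baseUrl == null) {
            return false;
        }
        try {
            URI uri = new URI(baseUrl);
            String scheme = uri.getScheme();
            if (scheme == null || uri.getHost() == null) {
                return false;
            }
            String lower = scheme.toLowerCase(Locale.ROOT);
            return lower.equals("https") || lower.equals("http");
        } catch (URISyntaxException e) {
            return false; // not parseable as a URI at all
        }
    }

    public static void main(String[] args) {
        System.out.println(isWebBaseUrl("https://mydomain.com"));
        System.out.println(isWebBaseUrl("myapp://path"));
        System.out.println(isWebBaseUrl(null));
    }
}
```

An app could run such a check in debug builds and fail fast when a custom scheme or a null base URL slips in.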

I hope the information in this article helps you understand the importance of using loadDataWithBaseURL() over loadData() and the right way to use this API.

PS: I have used null as the value for the historyUrl parameter of this API, which is fine for most cases. I will try to gather more information on this parameter and update the article later.

Thanks for reading, Happy coding!!
