August 10, 2015
A large component of an iOS app I'm currently working on involves scraping content from web pages. After some trial and error, I found a method that works well for me and wanted to share my choice and the reasoning behind it. While researching I found the following methods to be the most common:
Parse the HTML with NSXMLParser
Parse the HTML manually with regular expressions and string operations
Third-party libraries
Load the page in a web view and query the DOM with JavaScript
Though it was slower, scraping through a web view meant that I didn't have to deal with parsing HTML myself and could take full advantage of the DOM for finding and manipulating elements. I decided to stay with the web view option, and focused on making it faster. My first thought was to try loading the page in a WKWebView instead of a UIWebView, but it didn't improve the speed noticeably. After putting print statements at various stages of the page load process I found that the slowest part of the process was after the page's body loaded but before the web view was completely done loading (i.e., it called the
Here's an example version of my final code, which just gets the web page's title:
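What follows is a minimal sketch of this kind of scraper written against the current WKWebView API, not a verbatim listing. It assumes the title is read by a user script injected at document end and sent back through a script message handler. The class name `TitleScraper`, the handler name `scraper`, the helper `titleScript(handlerName:)`, and the URL are all illustrative choices.

```swift
import Foundation
#if canImport(WebKit)
import WebKit
#endif

/// Builds JavaScript that posts the page title to the named message handler.
/// (Hypothetical helper, introduced only for this sketch.)
func titleScript(handlerName: String) -> String {
    return "window.webkit.messageHandlers.\(handlerName).postMessage(document.title);"
}

// WebKit only exists on Apple platforms, so the scraper is compiled only there.
#if canImport(WebKit)
final class TitleScraper: NSObject, WKScriptMessageHandler {
    private let handlerName = "scraper"  // assumed name; any identifier works
    private var webView: WKWebView?

    /// Call on the main thread; WKWebView must be created there.
    func startScrape() {
        let controller = WKUserContentController()

        // Inject at document end: the DOM is parsed, but the web view has not
        // necessarily finished loading images and other subresources.
        let script = WKUserScript(source: titleScript(handlerName: handlerName),
                                  injectionTime: .atDocumentEnd,
                                  forMainFrameOnly: true)
        controller.addUserScript(script)

        // Note: the controller holds a strong reference to its handler.
        controller.add(self, name: handlerName)

        let configuration = WKWebViewConfiguration()
        configuration.userContentController = controller

        let webView = WKWebView(frame: .zero, configuration: configuration)
        self.webView = webView  // keep the web view alive while it loads

        // Assumed URL, chosen only for illustration.
        if let url = URL(string: "https://www.apple.com/iphone/") {
            webView.load(URLRequest(url: url))
        }
    }

    // Called when the injected script posts its message.
    func userContentController(_ userContentController: WKUserContentController,
                               didReceive message: WKScriptMessage) {
        print("Received script message: \(message.body)")
    }
}
#endif
```

The point of the design is that the message arrives as soon as the document has been parsed, so the app doesn't have to wait for the web view to report that loading has finished.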
Calling the startScrape method above prints: "Received script message: iPhone - Apple", matching the web page's title.
Using this method cut scraping time enough that I was happy to use it in the app. In some extreme cases it saved about 5 seconds per scrape, changing parts of the app from unbearably slow to quick and responsive.
Feel free to use my example code or contact me if you have questions. I hope this is helpful to others in the same position I was in!