Joe Conley Tagged rss Random thoughts on technology, books, golf, and everything else that interests me Everything is a Function <p>I’ve been working on a platform that would help free us from our <a href="">dependence on apps</a>. I’m hoping this ends up being a <a href="">horizontal tool</a> with a combinatorial number of use cases (hence the name DataCombinator). The main idea is to turn all digital interactions into simple functions (à la Excel) that can be composed to do cool stuff.</p> <p>Here are the main concepts you need to understand in order to execute and compose these functions:</p> <ul> <li>HTTP methods/requests and APIs (<a href="">tutorial</a>)</li> <li>JSON (<a href="">tutorial</a>)</li> <li>JSONPath (<a href="">tutorial</a>)</li> <li>Handlebars.js for templating (<a href="">tutorial</a>)</li> </ul> <p>That’s it. Composing functions is as easy as writing a sequence of commands separated by semicolons. Each function returns a valid JSON value, which is passed into the context of the next function. <strike>The "current" JSON value can be accessed using the <code class="highlighter-rouge">_</code> character, and can also be used in argument strings using the Handlebars.js syntax.</strike></p> <p><strong>UPDATE</strong> As I worked through various examples, it became clear that I needed access to both the most recent JSON value and past ones, so I’ve modified the scripting language to use <code class="highlighter-rouge">this</code> as the current JSON value and <code class="highlighter-rouge">_</code> to represent the array of all results in the script (with the 0th entry being the JSON value representing the arguments passed to the script). I’ve updated the examples below accordingly.</p> <h2 id="examples">Examples</h2> <p>To quote George R. R. Martin, “Words are wind”, so let’s look at some actual code. You can try these scripts out and see the full list of available functions <a href="">here</a>.
If any of the following examples are confusing, run them in the worksheet, which shows the result of each function call.</p> <h3 id="weather-alerts">Weather Alerts</h3> <p>Here’s an example of a custom weather alert service (one of the <a href="">most popular IFTTT recipes</a>).</p> <script src=""></script> <p>This script does the following:</p> <ul> <li>Performs a geo lookup using <a href="">Google Maps</a> to get the geocoding details of the address we’re interested in (in this case my office address). <ul> <li>I added a Handlebars.js helper called <code class="highlighter-rouge">urlEncode</code> to URL-encode values such as the address</li> </ul> </li> <li>Navigates the result to find the latitude and longitude</li> <li>Gets the short-term weather forecast from <a href=""></a> (sign up to get your own API key), passing in the <code class="highlighter-rouge">lat</code> and <code class="highlighter-rouge">long</code> values from the previous call via Handlebars</li> <li>Sends an SMS message to my phone if the temperature is below 40 degrees</li> </ul> <p>Pretty straightforward. The beauty of this approach is that we can customize it for our preferred notification channel, and could easily swap out the SMS call for e-mail or Twitter.</p> <h3 id="auto-generate-rss-based-on-website-updates">Auto-generate RSS based on website updates</h3> <p>Despite the demise of Google Reader, I’ve always been a big fan of RSS. It tends to be a less noisy channel of information than social media, favoring less frequent but longer-form communication.
This example checks the AV Club for reviews of new episodes of a given show (I chose game-of-thrones-experts for this example; you’ll have to inspect the URL to find the identifier for your show) and generates an RSS feed.</p> <script src=""></script> <ul> <li>Performs a web request to get the HTML source of the webpage</li> <li>Performs an XPath query to extract the HTML elements we need (implicitly converting them to JSON)</li> <li>Uses the resulting JSON to build an RSS XML file from a Handlebars.js template</li> </ul> <p>This uses some more advanced concepts like <a href="">XPath</a> (not to mention knowing proper RSS formatting), but the script is still relatively easy to understand.</p> <h3 id="website-monitoring">Website monitoring</h3> <p>As the owner/maintainer of a few websites, I need to know when any of them go down. There are several free services that will do this for you, but what if we wanted a customized view into our website uptime? Let’s write a script that monitors a <a href="">really cool webapp you can use to track your golf scores and handicap, SwingStats</a>.</p> <script src=""></script> <ul> <li>Performs a web request to the site we’re monitoring</li> <li>Saves the response info in a MongoDB database collection</li> <li>If the status is not 200 (OK), sends an e-mail</li> </ul> <p>This script not only handles notifications but also saves website data in a database (MongoDB). We can then build a separate script to access this data and calculate the uptime percentage for SLA purposes.</p> <h2 id="next-steps">Next Steps</h2> <p>I hope these examples were interesting. DataCombinator will soon have the ability to save scripts, expose them as API endpoints, and schedule script execution on a specific time interval using <a href="">Cron expressions</a>. The next version will also expose your favorite social network functions, which could lead to things like one massive interleaved activity stream for all of your social media sites.
If you’re interested in updates, follow along on the <a href="">DataCombinator tag</a> or <a href="">sign up for email updates</a>.</p> Mon, 26 Jan 2015 00:00:00 +0000 Roll Your Own Notification Service <p>Have you ever wished you could receive customized updates whenever your favorite websites update their content? Most sites offer the means to get notified when a new blog post hits the wire or new products are added to their catalog (RSS, social media, e-mail, etc.). But what if the site doesn’t use any of these services? Or what if you only want specific updates (e.g. blog posts from author X, new products containing the name Y)? Then you’re left with only one course of action: build your own notification service!</p> <p>Armed with the mighty powers of HTML scraping, the Scala programming language, and a recurring scheduling mechanism (in this case Play’s Akka scheduler), you have all the tools you need to set up your own custom notifications.</p> <h2 id="my-new-ebook-notification-service">My New EBook Notification Service</h2> <p>Let’s create a notification service that lets us know when new ebooks are available at my local digital library, <a href="">Delaware County Library System</a>. At the time of this writing, no such notification service exists. Since I’d prefer not to miss any notifications, I’d like to set up an RSS feed. Specifically, we’ll write a process that periodically checks the digital library site for new ebooks and updates an RSS feed accordingly.</p> <h3 id="scalasbt">Scala/SBT</h3> <p>We’ll start out by creating a basic Scala application using SBT (you can check out a skeleton project <a href="">here</a>). Let’s add the <a href="">HTMLUnit</a> and <a href="!/overview">Scala IO</a> libraries to our project. We’ll use HTMLUnit to parse the HTML code of the library’s website, and Scala IO to write our XML to a file.
Your build file should now look like this (assuming you named your project “ebook”):</p> <script src=""></script> <h3 id="scala-xml">Scala XML</h3> <p>Let’s start by building an abstraction for an RSS feed (you can read about the basics of RSS <a href="">here</a>). We’ll begin with an Item case class that holds the basic properties of an RSS item, along with a method to generate XML. Similarly, we define the basic properties of a Feed using a trait, which we keep abstract in anticipation of reusing it for other feeds.</p> <script src=""></script> <h3 id="screen-scraping-with-htmlunit">Screen Scraping with HTMLUnit</h3> <p>Let’s build a NewEBookFeed that implements Feed. When we implement the items method, we’ll use HTMLUnit to parse the HTML code from <a href="">Delaware County Library System</a> to find the newest items. This requires digging around the source HTML a bit to understand its structure and find useful patterns, and basic knowledge of <a href="">XPath</a> is required to leverage those patterns. After inspecting the source code and following the appropriate links, we can view the New Ebook page source and parse out the new titles, authors, and image URLs.</p> <script src=""></script> <p>That’s it! You can find my complete code as part of my <a href="">scrape library</a>, specifically the <code class="highlighter-rouge">com.josephpconley.books</code> and <code class="highlighter-rouge">com.josephpconley.rss</code> packages. We can test the code by running the following:</p> <script src=""></script> <h2 id="deploy-using-play">Deploy using Play</h2> <p>Now that we have a way to generate an up-to-date RSS feed, we need a way to update our feed periodically and make it publicly available to an RSS reader like <a href="">feedly</a> (my personal favorite). We could handle this a few different ways (e.g.
schedule a cron job to push a file to our Dropbox folder), but I’d like to demonstrate how to handle both the scheduling and the file writing/serving using the <a href="">Play Framework</a>.</p> <p>Start a new Play Scala project, then either package the ebook project as a jar and copy it to the lib folder, or simply copy the source code into the new Play project (I’ve done the former).</p> <h3 id="akka-scheduler">Akka Scheduler</h3> <p>To hook into Play’s Akka scheduler, we create a Global object in the app folder and override the onStart method, which lets us run code once the application starts. The Akka system scheduler allows you to schedule a recurring process at a given interval, expressed as a Duration. In our case, since the site doesn’t update that frequently and we want to be respectful by not overloading it with requests, we’ll set the interval to 12 hours.</p> <script src=""></script> <p>From there, it’s simply a matter of building out a controller with some routes to serve the updated file (a straightforward exercise I’ll leave to the reader). I personally included this code and hosted the RSS feeds in <a href="">my own Play app</a> running on Heroku.</p> <h2 id="drawbacks">Drawbacks</h2> <p>One drawback you might have noticed in this specific example is that the target site’s source code could change. We relied on very specific HTML tags, text, and class attributes to query the information we needed, so if the site were significantly rewritten, we would likely have to rewrite our scraping code to match.</p> <h2 id="conclusion">Conclusion</h2> <p>Managing the daily flow of information can be a challenge. With a little bit of coding, however, we can gain finer control over the information we consume, helping us be more productive in our everyday lives.</p> Mon, 27 Jan 2014 00:00:00 +0000
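<p>For readers who want a starting point for the Item case class and Feed trait from the notification service above, here is a minimal Scala sketch. It is not the original code. The field names (title, link, description) are assumptions based on the core RSS 2.0 elements. To stay dependency-free, it builds the XML as escaped strings rather than using the scala-xml module.</p>

```scala
// Minimal sketch of an RSS Item/Feed abstraction (assumed field names,
// not the original post's code). Uses only the Scala standard library.
object Rss {
  // Escape XML special characters. "&" is replaced first so that the
  // entities produced by later replacements are not escaped twice.
  def escape(s: String): String =
    s.replace("&", "&amp;")
      .replace("<", "&lt;")
      .replace(">", "&gt;")
      .replace("\"", "&quot;")
      .replace("'", "&apos;")

  // A single RSS <item>; every field is escaped before being embedded.
  case class Item(title: String, link: String, description: String) {
    def toXml: String =
      s"<item><title>${escape(title)}</title>" +
        s"<link>${escape(link)}</link>" +
        s"<description>${escape(description)}</description></item>"
  }

  // Abstract feed: a concrete feed supplies the channel metadata and the
  // current items (e.g. by scraping a web page); toXml renders RSS 2.0.
  trait Feed {
    def title: String
    def link: String
    def description: String
    def items: Seq[Item]

    def toXml: String =
      """<?xml version="1.0" encoding="UTF-8"?>""" +
        """<rss version="2.0"><channel>""" +
        s"<title>${escape(title)}</title>" +
        s"<link>${escape(link)}</link>" +
        s"<description>${escape(description)}</description>" +
        items.map(_.toXml).mkString +
        "</channel></rss>"
  }
}
```

<p>A concrete feed such as the NewEBookFeed described above would then only need to supply the channel metadata and implement <code class="highlighter-rouge">items</code>, for example by running the HTMLUnit/XPath scraping step. The resulting string could be written to a file or served from a Play controller.</p>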