Joe Conley Tagged opendata Random thoughts on technology, books, golf, and everything else that interests me http://www.josephpconley.com/name/opendata Query JSON/XML/CSV using SQL <p>Ever wish you could use your favorite query language across different data formats? Or get query results in several formats (XML, JSON, and CSV/XLS)? Then check out <a href="http://www.datacombinator.com/query">DataCombinator’s new query engine</a>.</p> <h2 id="data-sources">Data sources</h2> <p>You can copy and paste structured data manually, point to a URL, or connect to a database directly (H2, MongoDB, MySQL, or PostgreSQL). The engine hasn’t been optimized yet to handle large documents or tables, so please be mindful of input size.</p> <h2 id="query-languages">Query languages</h2> <p>The engine supports JSONPath (powered by <a href="http://www.josephpconley.com/2014/04/15/jsonpath-for-play.html">my open-source Play library</a>), XPath, and SQL. You can use any of these languages to query data in any of the JSON, XML, or CSV formats. Since JSONPath and XPath are fairly similar and straightforward, the more interesting use cases tend to involve SQL.</p> <h3 id="sql">SQL</h3> <p>The FROM clause isn’t necessary, as the query only applies to one “table”, that is, the data being queried. 
For SQL to work against JSON, the JSON must be an array of objects, e.g.</p> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="s2">"id"</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s2">"name"</span><span class="p">:</span><span class="s2">"Joe"</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="s2">"id"</span><span class="p">:</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="s2">"name"</span><span class="p">:</span><span class="s2">"Janine"</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">]</span><span class="w"> </span></code></pre></div></div> <p>If the objects in the array have nested levels, each object will be flattened, and the keys concatenated with an “_”, e.g.</p> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="s2">"id"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s2">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Joe"</span><span class="p">,</span><span class="w"> </span><span class="s2">"address"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="s2">"street"</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="s2">"123 Main St."</span><span class="p">,</span><span class="w"> </span><span class="s2">"city"</span><span class="p">:</span><span class="w"> 
</span><span class="s2">"Springfield"</span><span class="p">,</span><span class="w"> </span><span class="s2">"state"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PA"</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">]</span><span class="w"> </span></code></pre></div></div> <p>would be flattened to</p> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="s2">"id"</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s2">"name"</span><span class="p">:</span><span class="s2">"Joe"</span><span class="p">,</span><span class="w"> </span><span class="s2">"address_street"</span><span class="p">:</span><span class="s2">"123 Main St."</span><span class="p">,</span><span class="w"> </span><span class="s2">"address_city"</span><span class="p">:</span><span class="s2">"Springfield"</span><span class="p">,</span><span class="w"> </span><span class="s2">"address_state"</span><span class="p">:</span><span class="s2">"PA"</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">]</span><span class="w"> </span></code></pre></div></div> <p>Similarly, an XML document must be in a “table format” for a SQL query to work against it, e.g.</p> <div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nt">&lt;table</span> <span class="na">class=</span><span class="s">"ui table"</span><span class="nt">&gt;</span> <span class="nt">&lt;row&gt;</span> <span class="nt">&lt;id&gt;</span>1<span class="nt">&lt;/id&gt;</span> <span class="nt">&lt;name&gt;</span>Joe<span class="nt">&lt;/name&gt;</span> <span class="nt">&lt;/row&gt;</span> <span class="nt">&lt;row&gt;</span> <span 
class="nt">&lt;id&gt;</span>2<span class="nt">&lt;/id&gt;</span> <span class="nt">&lt;name&gt;</span>Janine<span class="nt">&lt;/name&gt;</span> <span class="nt">&lt;/row&gt;</span> <span class="nt">&lt;/table&gt;</span> </code></pre></div></div> <h3 id="supported-sql-functions">Supported SQL functions</h3> <p>The engine supports basic single-table query functionality (no self joins yet) with simple clauses (WHERE, GROUP BY, and ORDER BY) and a few basic aggregation functions (COUNT, MIN, MAX, SUM). I’ll be working to expand upon this, so if you have any requests, <a href="http://www.datacombinator.com/contact">let me know</a>.</p> <h2 id="query-results">Query results</h2> <p>The query engine outputs results in JSON and XML, and also in CSV, HTML table, or Excel format if the result can be converted to a table structure.</p> <h2 id="examples">Examples</h2> <p>Here are a few examples where I’ve found the query engine helpful.</p> <h3 id="espn-apis---json">ESPN APIs - JSON</h3> <p>ESPN has released a <a href="http://developer.espn.com/docs">variety of APIs</a> that allow developers to access headlines and basic team statistics. You’ll need to create a free account and register for a key, at which point you’ll have immediate access to the Public APIs.</p> <p>So for example, if I wanted to find out stats on my beloved Philadelphia Phillies, I would enter http://api.espn.com/v1/sports/baseball/mlb/teams?apikey=MY_API_KEY as the URL in DataCombinator. Using the JSON Raw tab, I can see the pretty-printed response and quickly search for Phillies to find their id of 22. Using this id, I can get the latest news on the Phightins by using the URL http://api.espn.com/v1/sports/baseball/mlb/teams/22/news?apikey=MY_API_KEY. I can then use JSONPath to include only the part of the response I want. 
For example, if I just want all the latest headlines associated with the Phillies, I take a quick look at the structure and apply the <code class="highlighter-rouge">$..headline</code> JSONPath query to return an array of headlines:</p> <div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span> <span class="s2">"Mets end 5-game skid, rally past Phils 5-4 in 11"</span><span class="p">,</span> <span class="s2">"Howard, Rollins lead Phillies past slumping Mets"</span><span class="p">,</span> <span class="s2">"Byrd's double lifts Phillies over Mets 3-2 in 11"</span><span class="p">,</span> <span class="s2">"The base: Approach at your own risk"</span><span class="p">,</span> <span class="s2">"Phillies fall to hot-hitting Blue Jays in 20,000th game"</span><span class="p">,</span> <span class="s2">"Adam Lind activated by Blue Jays"</span><span class="p">,</span> <span class="s2">"Mark Buehrle posts MLB-best sixth win as Blue Jays rock Phillies"</span><span class="p">,</span> <span class="s2">"Blue Jays edge Phillies on sac fly in 10th after blowing 5-run lead"</span><span class="p">,</span> <span class="s2">"Happ stifles Phillies, Blue Jays win 3-0"</span><span class="p">,</span> <span class="s2">"Hernandez outduels Gonzalez, Phillies edge Nats"</span> <span class="p">]</span> </code></pre></div></div> <h3 id="weather-data---xml">Weather Data - XML</h3> <p>OpenWeatherMap.org provides a <a href="http://openweathermap.org/API">free weather API</a> which returns data in XML format. For example, if I wanted to get the current weather in my hometown of Springfield, PA, I could use the URL</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>http://api.openweathermap.org/data/2.5/weather?q=Springfield&amp;mode=xml&amp;units=imperial </code></pre></div></div> <p>to get an XML document back. 
I could then query the document using XPath to get just the temperature via <code class="highlighter-rouge">//temperature</code>.</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;temperature max="71.52" min="71.52" unit="fahrenheit" value="71.52"/&gt; </code></pre></div></div> <h3 id="opendata---csv">OpenData - CSV</h3> <p>Public institutions are starting to embrace open data practices, enabling civic-minded hackers to build useful applications that provide a public service. In this spirit, the city of Philadelphia has made <a href="https://github.com/CityOfPhiladelphia">various data sets</a> available for public consumption. Most of these data sets are in CSV format. We’ll take one such data set, <a href="https://github.com/CityOfPhiladelphia/phl-site-stats">phl-site-stats</a>, and use the raw URL from GitHub to query it (I picked this data set as it’s relatively small).</p> <p>We’ll take a look at the latest month’s stats found at <a href="https://raw.githubusercontent.com/CityOfPhiladelphia/phl-site-stats/master/SiteStats0514.csv">https://raw.githubusercontent.com/CityOfPhiladelphia/phl-site-stats/master/SiteStats0514.csv</a>. Without entering a query, we would get the entire data set in the results. One point to note is that the query engine will try to convert strings to numbers, making it easy to query based on numeric conditions. 
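</p> <p>To see why that conversion matters, here is a minimal Python sketch of string-to-number coercion applied to CSV rows. The engine’s actual implementation isn’t shown in this post, so the <code>coerce</code> helper and the sample rows are hypothetical; only the <code>page_count</code> column name is taken from the query that follows.</p>

```python
import csv
import io


def coerce(value):
    """Return value as an int or float when it parses as one, else unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


# Hypothetical sample rows; the real SiteStats CSV has more columns.
SAMPLE = "site,page_count\nphila.gov/a,120\nphila.gov/b,95\nphila.gov/c,1003\n"

rows = [
    {key: coerce(val) for key, val in row.items()}
    for row in csv.DictReader(io.StringIO(SAMPLE))
]

# With numeric values, this ordering matches "order by page_count desc".
top = sorted(rows, key=lambda r: r["page_count"], reverse=True)
```

<p>Left as strings, those counts would sort lexicographically ("95" ahead of "1003"), which is presumably why coercing up front makes ordering and comparisons behave as expected.</p> <p>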
If we wanted to view the most popular sites for phila.gov, we would simply enter a query of</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">select</span> <span class="o">*</span> <span class="k">order</span> <span class="k">by</span> <span class="n">page_count</span> <span class="k">desc</span> </code></pre></div></div> <p>Or we could get the total number of unique hits for the month of May:</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">select</span> <span class="k">sum</span><span class="p">(</span><span class="n">unique_page_count</span><span class="p">)</span> </code></pre></div></div> <h2 id="next-steps">Next steps</h2> <p>This query engine will be the foundation of DataCombinator’s platform of data collection and composition tools. Our next step is not only to host structured data via API endpoints, but also to combine multiple data sources into one document (which in turn would be hosted as well!). If you’re interested in learning more, <a href="http://www.datacombinator.com">sign up</a> for e-mail updates or <a href="https://www.twitter.com/DataCombinator">follow us on Twitter @DataCombinator</a>.</p> Tue, 13 May 2014 00:00:00 +0000 http://www.josephpconley.com/2014/05/13/datacombinator-query-engine.html (Triz)Swagging out at the Philly Codefest <p>This past weekend I teamed up with some buddies from <a href="http://point.io">Point.io</a> (<a href="https://twitter.com/twrivera">Angel</a>, <a href="https://twitter.com/jxshin75">Jon</a>, and <a href="https://twitter.com/dyang_pointio">Dylan</a>) to participate in my first hackathon, <a href="http://phillycodefest.com/">Philly Codefest</a>. We spent the weekend bringing Angel’s dream to life: a platform called TrizSwagger to analyze and track the use of “swag” (i.e. 
T-shirts, office supplies, and other marketing mathoms). We leveraged social media and geolocation to give companies real-time visibility into their marketing campaigns. Feel free to check out the app <a href="http://www.trizswagger.com/">here</a>.</p> <table class="image"> <caption align="bottom">Angel demoing Point.io's apiDoc service</caption> <tr><td><img src="/assets/angel.jpg" alt="Angel promoting Point.io" /></td></tr> </table> <p> <br /></p> <h2 id="lessons-learned">Lessons Learned</h2> <p>Good programming is always concerned with simplicity and efficiency, whether it’s using efficient data structures and algorithms, keeping your codebase concise, or even naming variables properly. Building an app in 24 hours, however, throws the need for simplicity and efficiency into sharp relief. Here are a few takeaways from my experience.</p> <h3 id="coast-to-coast-json">Coast-to-Coast JSON</h3> <p>I’ve always been a big fan of <a href="http://en.wikipedia.org/wiki/Domain-driven_design">Domain-driven Design</a>. Writing POJOs in Java and case classes in Scala can provide a clear crystallization of the main actors of your application. However, models may not always be necessary, especially if your app is backed by a service/database which gives you JSON (TrizSwagger is backed by <a href="http://www.mongodb.com/">MongoDB</a> and served by <a href="http://flask.pocoo.org/">Flask</a>). The extra layer of complexity in marshalling/unmarshalling between JSON and your model can hinder the performance and readability of your code, especially if you’re using heavy ORM frameworks like Hibernate. During a hackathon, if you’re rapidly making changes to the model, you’ll surely get slowed down. 
For a more thorough treatment of JSON Coast-to-Coast, check out <a href="http://mandubian.com/2013/01/13/JSON-Coast-to-Coast/">the Mandubian Blog</a>.</p> <h3 id="knockoutjs-mapping-plugin">Knockout.js Mapping plugin</h3> <p>In order to implement coast-to-coast design effectively, it’s important to have a front-end framework that manages JSON well. One such framework I’m fond of is Knockout.js, more specifically its <a href="http://knockoutjs.com/documentation/plugins-mapping.html">mapping plugin</a>. This plugin automatically maps a JSON message into a JavaScript observable object. You can then code the front-end directly against object properties without having to pre-define a viewmodel. You can also customize how objects are mapped by either modifying or enhancing the created object. This strategy proved quite helpful during the hackathon, as any change to our back-end API had to be accommodated in only one place (the front-end).</p> <p>One caveat is the creation of a new object using this plugin. Since the plugin requires a JSON object to build out the observable, I wrote a basic method in Play to take an expected JSON object and “empty” it, setting default values that would be used in the new object form.</p> <script src="https://gist.github.com/josephpconley/9207995.js"></script> <h3 id="understanding-your-tools">Understanding your tools</h3> <p>TrizSwagger integrates with both Twitter and Facebook. Understanding and setting up those integrations, however, occupied a lot of our time. We also ran into issues with a server on OpenShift, slowing us down further. Ultimately I think simpler is better, and every choice in technology needs to be well thought-out and well-suited to its use case.</p> <h2 id="conclusion">Conclusion</h2> <p>Overall it was an awesome experience. 
Even though we didn’t win (or place, or show for that matter), we learned a lot and we still took the time to mentor other teams who were using the <a href="http://point.io/pointio-platform">Point.io API</a>. Great job Team TrizSwagger!</p> <table class="image"> <caption align="bottom">Jon, Dylan, and Joe doing some last-minute coding</caption> <tr><td><img src="/assets/triz.jpg" alt="Jon, Dylan, and Joe doing some last-minute coding" /></td></tr> </table> Tue, 25 Feb 2014 00:00:00 +0000 http://www.josephpconley.com/2014/02/25/phillycodefest-trizswagger-lessons-learned.html