Data Hero – Will be awesome

Data Hero Logo
Data Hero, with the mission to “help you unmask the answers in your data”, launched this week. Everyone who has ever touched Excel should check it out and see a possible future for online analytics and reporting. To get my head around what they have created, I took DH for a spin.
For my test of Data Hero, I wanted to upload and work with the Apple Enterprise Partner Feed (EPF) in txt format. The DropBox/Box/GoogleDrive plugins are a nice feature, and allow you to quickly get data from other people into your workspace. They don’t let you get around the upload limits, however.
The first issue I ran into was the 10MB limit on file upload size. The EPF files are 40MB, 500MB, and 2GB for the TV, Music, and App data feeds respectively. So for getting done what I wanted to get done, Data Hero is a non-starter at this point. From an experimental standpoint, it was easy enough to split the smallest file into something that I could upload to try DH out.
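Splitting a feed to fit under the cap can be scripted. The following is a minimal sketch, assuming the file has already been read into an array of lines; `splitLinesBySize` is a hypothetical helper of my own, not part of any Data Hero tooling.

```javascript
// Sketch: group the lines of a large text file into chunks that each stay
// under a byte limit (for example 10 * 1024 * 1024 for a 10MB upload cap).
// splitLinesBySize is a hypothetical helper name, not a Data Hero API.
function splitLinesBySize(lines, maxBytes) {
  var encoder = new TextEncoder();
  var chunks = [];
  var current = [];
  var currentBytes = 0;
  lines.forEach(function (line) {
    // +1 accounts for the newline added when the lines are joined again.
    var size = encoder.encode(line).length + 1;
    // Start a new chunk when this line would push the current one over the limit.
    if (current.length > 0 && currentBytes + size > maxBytes) {
      chunks.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(line);
    currentBytes += size;
  });
  if (current.length > 0) {
    chunks.push(current);
  }
  return chunks;
}
```

Each chunk can then be joined with newlines and written to its own file. A single line larger than the limit still ends up in a chunk by itself, so that case would need separate handling.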
The upload was fine, but the first thing I ran into was that the EPF files have comments at the top with legal notices, column names, and file formats. I didn’t see an easy way to specify comment lines, so my only option was to strip out the comments. Again, easy enough, but it would be convenient if DH allowed me to specify comment lines, as many feeds like the EPF files, as well as syndicated data feeds in the pharma space, have comments at the top or even interspersed with the data (sounds crazy, but really).
Stripping out the comment lines meant that I lost the column names and the column format hints that were in the file. It would have been nice if DH allowed me to specify which row had the column names. I’m not sure how common data types embedded at the top of the file are, but support for them would be nice. This is a bit of escalating functionality, but being able to point to a commented row as the column names would be nice too. DH did take the first row as the column names, so that was a help. In the EPF feeds, the first row is:
#title network production_number season original_release_date copyright …..
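The clean-up I did by hand could be sketched as below. It assumes that comment lines start with `#` and that the first such line is the column-name row, as in the sample row above; `stripComments` is a hypothetical helper, and a feed laid out differently would need a different rule.

```javascript
// Sketch: drop "#" comment lines but keep the column names.
// Assumptions: comments start with "#", and the first "#" line holds the
// column names. stripComments is a hypothetical helper name.
function stripComments(lines) {
  var header = null;
  var rows = [];
  lines.forEach(function (line) {
    if (line.charAt(0) === "#") {
      if (header === null) {
        // Keep the first comment line as the header, without the leading "#".
        header = line.slice(1);
      }
      return; // every other comment line is dropped
    }
    if (line.length > 0) {
      rows.push(line);
    }
  });
  return header === null ? rows : [header].concat(rows);
}
```

The result starts with a plain header row, which matches how DH picked up column names from the first row.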
The dates in the EPF feed are of the form “2006 03 10”, without the quotes. DH wasn’t able to parse those. When I switched the column type to date, I just got a – . I don’t think it was too optimistic to expect DH to be able to parse the column, but failing that I would have expected some backdoor that allowed me to write a RegEx or parsing snippet for cases like this one. As with the file size limit, the lack of functionality to import this column means DH is not an option for me to use with this data right now. Still worth an experiment, though. If I wanted to get the column into shape so that DH could recognize it as an actual date, I would probably write a snippet of Python to clean it up. I think, based on what I interpret DH’s target segment to be, that most people would do this in Excel … which I might still do, as the file size that I can upload to DH is limited, so I won’t run into limits in Excel.
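The kind of parsing snippet I had in mind is small. This sketch uses JavaScript, like the other code on this blog, rather than Python; `toIsoDate` is a hypothetical name, and it only assumes the space-separated year, month, day layout shown above.

```javascript
// Sketch: rewrite "2006 03 10" as "2006-03-10" so that date parsers are more
// likely to recognize it. toIsoDate is a hypothetical helper name.
function toIsoDate(text) {
  var match = /^(\d{4}) (\d{2}) (\d{2})$/.exec(String(text).trim());
  if (match === null) {
    return null; // leave unexpected values for manual inspection
  }
  return match[1] + "-" + match[2] + "-" + match[3];
}
```

Returning `null` for anything that doesn't match makes the odd rows easy to find instead of silently passing them through.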
The next columns in the EPF feeds are URLs. It is common in DBs to just treat URLs as a text field, but with a web-based SaaS, I would like to see built-in functionality to handle URLs. It would be nice to be able to parse a URL out to a minimum of Protocol, Domain (maybe TLD, too), Path, and Query. I could imagine building a chart that groups by the first 2 parts of this. Again, maybe this isn’t something you expect in a standard relational DB, but for a system that lives on the web it is needed.
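To illustrate the breakdown I mean, here is a sketch using the standard `URL` class available in modern browsers and Node.js. `splitUrl` is a hypothetical helper, and treating the last host label as the TLD is a simplification (it gets "uk" rather than "co.uk", for example).

```javascript
// Sketch: break a URL into the parts I would want as separate columns.
// splitUrl is a hypothetical helper; the TLD logic is deliberately naive.
function splitUrl(text) {
  var url = new URL(text);
  var labels = url.hostname.split(".");
  var protocol = url.protocol.replace(/:$/, "");
  return {
    protocol: protocol,
    domain: url.hostname,
    tld: labels[labels.length - 1],
    path: url.pathname,
    query: url.search.replace(/^\?/, ""),
    // Grouping key built from the first two parts: protocol and domain.
    groupKey: protocol + "://" + url.hostname
  };
}
```

For example, `splitUrl("https://example.com/a/b?x=1")` gives protocol `https`, domain `example.com`, TLD `com`, path `/a/b`, and query `x=1`.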
With the data loaded, and the columns mapped as best I could, I created a Histogram broken out by Network with a count of the episodes for that network as the value. Surprisingly, a large proportion of the episodes don’t have a network associated with them.
It was easy enough to filter those out, but the first thing that I wanted was some sort of list view that would allow me to look at the records in question, and see if there was a data issue or if the data was just missing. I know in my specific case, I was able to just sort in the data view, and luckily the non-network records sorted to one end, but if I was looking for a data error that was in the middle of the data domain for a given column, I wouldn’t have been able to find it.
Dropping in a new dimension to see average price instead of just a count of records was pretty straight-forward.
One slightly annoying thing was when adding or removing dimensions to the chart, it would switch back to Pie charts every time. What was really annoying was the taunting message at the top telling me to use a Bar chart!
As mentioned above, DH doesn’t provide any tooling to allow me to fix my data problems. For example, while trying to show both the average SD and HD prices in 1 bar chart, I now see that I’ve got a data error. I can’t see any way to fix this or identify the record from DH itself. Note: Yes, I can just search the raw file for the id.
I really love Data Hero, and I can’t wait to find the right use case for me so that I can use it in the future. However, after my first experiment, detailed above, I think it might be best suited to someone using one of the available SaaS feeds (Salesforce.com or MailChimp). My attempts to use DH to analyze some data from the Apple Enterprise Partner Feed met with only limited success. With DH’s strong support for pulling data in from GoogleDrive/Box/DropBox, my guess is that they expect people to use more than the SaaS sources, but my impression is that DH isn’t ready for anything but 100% clean data that fits their specific use cases.
I know they just launched this week, but one benefit of using online data tools (e.g. Google Spreadsheets) is that they facilitate sharing and collaboration to a far greater extent than Excel/LibreOffice (which DH is certainly competing against). Without obvious sharing features right now, I’m again not sure where I can best use this new tool, but I’m really going to try.

Jelly Balance – DENIED!

 

Denied Stamp

My hopes and dreams of an exciting new career in iOS App development have been dashed by the faceless, corporate gatekeepers at Apple. Jelly Balance was deemed boring without any redeeming entertainment value.

The feedback was that there was only 1 level, no way to store high scores, and that the app tester repeatedly fell asleep while testing the application because it was so boring. I might actually be charged for wasting their time and for any coffee consumed during the testing of my application.

They were very encouraging about resubmitting, however.

Can’t say I blame them. Jelly Balance was pretty boring. It was an experiment on my part to get an App into the App Store as quickly and easily as possible. I did most of the development on a Windows machine using JavaScript. I packaged it up for the App Store using Trigger.io. The development process was really straightforward going this route, as I didn’t need to learn Objective-C. I did beta testing using TestFlight for distribution to a number of people I found on Reddit.

The hardest part of this whole endeavor was sorting out the certificates so that I was able to upload the correct IPA. The biggest issue for me turned out to be that I had an old version of Xcode on my old MacBook and it was using the wrong code/certificates. Kind of an amateur mistake, but I’ll cut myself some slack as it was my first time with App publishing. While I didn’t get my app into the App Store, it finally gave me an excuse to buy a brand spanking new MacBook Pro!

Jelly Balance Waiting for Review

I finally scratched that itch to make an iPhone game, and submitted it to Apple the other night. I built it using Trigger.io and JavaScript. I’ll document my choice of this toolchain in a future post.

Jelly Balance is waiting for review in the App Store.

 

Get Jiggly With it – Constant Volume Joint for Box2dWeb

Inspired by the Balance Blob post by Byron Knoll, I ported the Constant Volume Joint from the JBox2D project over to Box2dWeb, one of the JavaScript implementations of Box2d.

 

 

This new joint type for Box2dWeb lets you add blob-like shapes into the rigid-body world of Box2d.

Box2d, with its many ports, is the physics engine behind hundreds of games on a variety of platforms, including Angry Birds, Cut the Rope, and other favorites. It is also one of the physics engine options in Cocos2d.

The updated Box2dWeb library with the new joint can be downloaded here: Constant Volume Joint.

Many thanks to ewjordan for implementing this in the first place, I think.

 

 

Using Spotify and The Echo Nest to Keep the Pace

TL;DR: Spotify makes it easy to build Spotify apps. The Echo Nest is a fantastic data provider. Keep reading to see how I built an app.

In this post, I talk a bit about why & how I created a Spotify application, and enriched it with data from Echo Nest, to build the perfect playlists for running.

Get the code: Keep The Pace Spotify App

Why Did I Need an App?

I started using Spotify’s mobile app on my iPhone to play music and keep me motivated during my run. As with any playlist that you create for use while you are running, having the right song with the right beat is key to having it act as a motivation tool. Trial and error isn’t too bad, but it would be really, really great, I thought, if I could see the Beats Per Minute for a song right in my search results.

I’d need 2 things: some way to hack or enhance Spotify and some way to figure out the BPM of a given song.

With some very quick Googling, I realized that the Spotify desktop application is actually a platform that would allow me to build my own app, and add in my own columns. One down. One to go. A bit of searching on the BPM topic, and I started to get mired in a bunch of apps that depend on iTunes and the music files on your computer (Tangerine, MixMeister), limited specialty databases, and code examples (SO1, SO2). When you start digging into the BPM area, Echo Nest pretty quickly stands out as the place to go. They have a well-designed API as well as a phenomenally rich database of music information.

 

The Spotify App

For hackers and experimenting developers, Spotify makes it easy to get started writing Spotify Apps. Getting your Apps into the Spotify App store for general distribution is, I think, a whole other ballgame, which I won’t cover here. To get started, you need to enable your account for developer status (free accounts work as well). Spotify provides a nice Introduction page for this.

Once you have your account enabled, you can get started by looking at a combination of the Integration Guidelines documentation and the rock solid Spotify Apps API Tutorial on GitHub.

I initially tried doing my development on Ubuntu. There were a number of annoying quirks (e.g. Quirk 1), but I was able to overcome most of them. The blocker, however, came when I tried to run the Tutorial App. There are some dependencies in the manifest.json that point to the Linux version of the Spotify desktop app being at a different version level from the Windows version. I don’t typically work on a Mac, so I can’t speak to that platform.

I recommend downloading, or cloning, the Tutorial from GitHub, and getting that to work before you try to make your own app.

 

Hello World

Once you have the Tutorial running, you can be confident that you have your account enabled with developer status, and you know where to put the files. Below, I will focus on building your app on Windows. We’ll start with the bare minimum to see something in the Spotify UI, and go from there.

For our application, create the folders & files (empty text files):

  • My Documents\keepthepace
    • index.html
    • manifest.json
    • js
      • keepthepace.js
    • css
      • main.css


The manifest, per the extension, is a JSON file. The bare minimum fields aren’t too onerous, and are documented in the Integration Guidelines. (Tip: If you are having problems, make sure your JSON will parse by validating it.)

 

{
    "AppDescription": {
        "en": "Find track to keep the pace"
    },
    "AppName": {
        "en": "Keep The Pace"
    },
    "BundleIdentifier": "keepthepace",
    "BundleType": "Application",
    "BundleVersion": "0.0.1",
    "RequiredInterface": 1,
    "SupportedLanguages": [
        "en"
    ],
    "VendorIdentifier": "com.webfanatic"
}


Make sure the BundleIdentifier and the application folder are the same and match case. This might not be a big deal on Windows, but I know it was important on Ubuntu.
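Both checks, that the manifest parses and that the BundleIdentifier matches the folder name exactly, are easy to automate. This is a minimal sketch; `checkManifest` is a hypothetical helper of my own, not something Spotify provides, and it only covers these two points rather than the full set of required fields.

```javascript
// Sketch: sanity-check a manifest before loading the app in Spotify.
// checkManifest is a hypothetical helper, not part of the Spotify Apps API.
function checkManifest(jsonText, folderName) {
  var manifest;
  try {
    manifest = JSON.parse(jsonText);
  } catch (err) {
    return ["manifest.json is not valid JSON: " + err.message];
  }
  var problems = [];
  // The comparison is case-sensitive on purpose, since case mattered on Ubuntu.
  if (manifest.BundleIdentifier !== folderName) {
    problems.push(
      "BundleIdentifier '" + manifest.BundleIdentifier +
      "' does not match folder '" + folderName + "'"
    );
  }
  return problems;
}
```

An empty array means both checks passed; otherwise each entry describes one problem.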

Now drop “Hello World!” into the index.html

Jump back into the Spotify app, and type: spotify:app:keepthepace into the search box, and voila! You just built your first app!


Two important tools are built right into Spotify for you during development.

The Spotify.Console, accessible by pressing CTRL+ALT+HOME, shows you a valuable log file of what the Spotify client is doing under the covers.

The second tool is the Inspector, accessible either via Right-Click on your app or from the Develop menu item. Spotify chose to embed the Chromium rendering engine into their desktop client, so you have access to all the goodies (debugging included) of a modern web browser. Don’t forget to leverage the JavaScript console for debug purposes.


 

Design

My initial goal is to really just add an additional column to the existing track list pages, but since I can’t figure out how to hack into that, I’ll roll my own results. I want the functionality to mimic the existing app closely.

 

Pulling the UI Together

Grid

Key to the results is a good grid/data-table library. I tested out a number of different jQuery based libraries. There are a lot of good ones out there, but some, I think, make assumptions about file locations, and I had a bit of trouble getting some of them to work properly inside the Spotify desktop client. I finally settled on SlickGrid. Not only was I able to get it working pretty quickly, but the default styling was acceptable. I know I’m not trying to win any design contests, but it has got to be presentable.

 

Spotify UI Elements

Spotify allows you to reuse some of the standard button styles that they use in the app itself, by including the correct CSS. Note when adding any resources into your Spotify App, you need to use the protocol “sp” in your URI.

<style>
@import url('sp://import/css/api.css');
</style>

In the tutorial, there are a couple of intriguing links that aren’t yet active.

I think that once these links are active, I will be able to skip SlickGrid.

Getting the Data

The Spotify API is straightforward to use, and one call searches for your term over artist, track, and album. The callback approach means you can put up wait icons, and not block the UI while the results are being returned. In my case, once I get the track information back, I need to go out and get the tempo information from Echo Nest, so I put the results into an array for later display.

// Search Spotify for the term and request a tempo for every track found.
function startSearch(searchTerm) {
    var sp = getSpotifyApi(1);
    var models = sp.require('sp://import/scripts/api/models');
    var search = new models.Search(searchTerm);
    search.localResults = models.LOCALSEARCHRESULTS.APPEND;
    // The callback runs when the search results change.
    search.observe(models.EVENT.CHANGE, function() {
        var results = search.tracks;
        // Save this for later: how many tempo lookups are still pending.
        tempoToGetCounter = results.length;
        if (results.length > 0) {
            for (var i = 0; i < results.length; i++) {
                var trackId = parseTrackId(results[i].uri);
                searchResults[trackId] = results[i];
                getTempo(trackId);
            }
        } else {
            var resultsSection = document.getElementById('results');
            resultsSection.innerHTML = "No results found.";
        }
    });
    search.appendNext();
}
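The search code above calls a `parseTrackId` helper that isn't shown. One possible version is sketched here, assuming track URIs have the form `spotify:track:<id>`; this is an illustration, not necessarily the implementation in the app.

```javascript
// Sketch of a parseTrackId helper (hypothetical implementation).
// Assumption: track URIs look like "spotify:track:<id>".
function parseTrackId(uri) {
  var parts = String(uri).split(":");
  if (parts.length === 3 && parts[0] === "spotify" && parts[1] === "track") {
    return parts[2];
  }
  // Anything else (for example a local file URI) has no usable track id.
  return null;
}
```

Callers would need to skip `null` results before storing a track or asking for its tempo.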


Getting the Tempo

As mentioned above, the tempo is coming from calls to the Echo Nest API. You will need to set up a free account at Echo Nest to get your API key.

Many of the methods on the Echo Nest API are straightforward HTTP GET calls, making it easy to experiment. I’m still a bit confused as to the difference between a Track and a Song. I think a Song can have multiple Tracks (e.g. versions or covers), but I’m not 100% sure. In the context of this exercise, what I need is the Track API.

I didn’t check if the Echo Nest API has a batch option, so to keep it simple, I fired off a new call for each tempo that I needed to get.

In the back of my head, I was a bit worried about how I was going to link the Spotify results and IDs with the Echo Nest IDs. I was pleasantly surprised to see that Echo Nest has done an amazing job of linking the IDs from a number of different ID spaces including Spotify, Musicbrainz, Rdio, and more through their Project Rosetta Stone.  For my current job, we do a lot of big data handling, and one of the biggest hassles is building the links between disconnected datasets. At the inception of Project Rosetta Stone, Echo Nest stated, “…we want to make the world easier for music app developers…”. I think they nailed it.
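A tempo lookup by Spotify ID might be sketched like this. The endpoint path, the `bucket=audio_summary` parameter, the `spotify-WW:track:` ID prefix, and the shape of the response are all assumptions based on version 4 of the Echo Nest API and should be checked against the Echo Nest documentation; `buildTempoUrl` and `extractTempo` are hypothetical helper names.

```javascript
// Sketch: build a Track API request for one Spotify track and read the tempo.
// ASSUMPTIONS: endpoint, parameters, ID prefix, and response shape follow the
// Echo Nest v4 API; verify them against the official documentation.
var ECHO_NEST_TRACK_PROFILE = "http://developer.echonest.com/api/v4/track/profile";

function buildTempoUrl(apiKey, spotifyTrackId) {
  // Project Rosetta Stone style ID that points at a Spotify track (assumed format).
  var rosettaId = "spotify-WW:track:" + spotifyTrackId;
  return ECHO_NEST_TRACK_PROFILE +
    "?api_key=" + encodeURIComponent(apiKey) +
    "&format=json" +
    "&id=" + encodeURIComponent(rosettaId) +
    "&bucket=audio_summary";
}

function extractTempo(payload) {
  // Walk the assumed response shape defensively; return null if anything is missing.
  var track = payload && payload.response && payload.response.track;
  var summary = track && track.audio_summary;
  return summary && typeof summary.tempo === "number" ? summary.tempo : null;
}
```

Keeping the URL building and the response parsing separate from the HTTP call makes both easy to test without an API key.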

While the Spotify API was cool, I am far more impressed with the depth and breadth of the Echo Nest API. There just seems to be an endless amount of information about everything in the music world.

 

Show the Results

With the Track search results and the Tempo results populated, it’s now time to update the Grid. I’ve been doing a lot of backend work, as well as some pretty cool Flex development on the front end, but haven’t done anything serious with JavaScript in some time.

// Rebuild the grid rows from the search and tempo results, then redraw.
function updateResultsTable() {
    // Empty the array in place rather than replacing it.
    resultsData.length = 0;
    var resultsSection = document.getElementById('results');
    resultsSection.innerHTML = "Updating...";
    for (var trackId in searchResults) {
        resultsData.push({
            track: searchResults[trackId],
            artist: searchResults[trackId].artists[0].name,
            time: searchResults[trackId].duration,
            album: searchResults[trackId].album.name,
            tempo: tempoResults[trackId]
        });
    }
    grid.resizeCanvas();
    grid.autosizeColumns();
    grid.invalidate();
    grid.render();
    resultsSection.innerHTML = "";
}
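The `time` column holds the raw track duration, so the grid needs a formatter to show it the way the Spotify client does. The sketch below assumes the duration is in milliseconds, which I haven't confirmed here; `formatDuration` and `durationFormatter` are hypothetical names, and the formatter uses SlickGrid's standard `(row, cell, value, columnDef, dataContext)` argument order.

```javascript
// Sketch: show a track duration as m:ss.
// Assumption: the duration value is in milliseconds.
function formatDuration(milliseconds) {
  var totalSeconds = Math.floor(milliseconds / 1000);
  var minutes = Math.floor(totalSeconds / 60);
  var seconds = totalSeconds % 60;
  // Pad the seconds so 3 minutes 5 seconds reads "3:05", not "3:5".
  return minutes + ":" + (seconds < 10 ? "0" : "") + seconds;
}

// SlickGrid column formatter wrapper (hypothetical name).
function durationFormatter(row, cell, value, columnDef, dataContext) {
  return formatDuration(value);
}
```

The formatter would be attached through the `formatter` property of the `time` column definition.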


Next Steps

Some thoughts on next steps for this app are

  • Add a Tempo calculator at the top
  • Add a Playlist Tempo view where you can see the Tempo of items in your playlist
  • Rewrite using backbone.js. There is no good reason to do this other than I’ve been wanting to try out a JS MVC framework
  • Add in energy & danceability from the Echo Nest and take it for a test run.
  • Make batch calls to Echo Nest to try and avoid the API Call Limit.

Freakin’ Genius

Genius Scan - PDF Scanner

Over the years my expense reporting habits have been relatively poor. The hassle of making copies, for my own records mostly, and/or scanning the documents has been a real pain. This is of course one of those self defeating habits, that I’ll go into in some other post, but let’s leave it at, I’ve probably left some money on the table in regards to reimbursable expenses.

For me the iPhone changed things up a bit, as I started to take pictures of my expenses, usually after I got back to the office … some weeks later. At this point, I was still just making copies for my own records, which I have at times really been happy about (again the leaving money on the table problem!).

Recently at my day job, we started using Open Air to report expenses in conjunction with a single merged PDF containing all the expense receipts. No more paper copies, just the single PDF.

After a bit of looking and a recommendation from a co-worker, I started using “Genius Scan – PDF Scanner” from Grizzly Labs. With Genius Scan, I take a pic of all of my receipts as soon as I get them, usually before I even walk away from the register. There is a nice set of crop, rotate, and enhance features that allow me to get exactly what I need on the fly.

It also has good document management features that allow me to add new images to an existing PDF or create a new one.

The killer feature for me however was the ability to put the PDF directly into a DropBox! This was the major time saver. I know mailing the PDF to yourself (available in other apps of this sort) is not that big of a deal, but my workflow for expenses is that I dump all of my e-receipts right into a staging folder in DropBox. This just saves one extra step.

This process is working out so great for me, that I usually THROW AWAY the paper receipts as soon as I get them.

When I listen to my peers struggle (complain) about doing their expense reports, I can’t help but think that I’m a Freakin’ Genius for changing up my workflow.

 

Buy Them All. Back Now.

This blog took a long sleep, and it turns out tweets are just as hard as blogging on a consistent basis.

Stay tuned to this space for a restart of “Buy Them All”. There have been a lot of big changes in the app store over the past couple of years, and my scripts for downloading and parsing the App Store data no longer work. :(

However, I found out that Apple now offers some fantastic Affiliate tools for getting all of the application meta-data and building links. I’ve got things in motion to get access to this data. Once I do, expect to see tons of cool visualizations and analytics.

Blog posts are too hard. Follow my tweets

Follow me on Twitter, http://twitter.com/ericestabrooks .

BUY THEM ALL – 36,687 APPS FOR $ 76,785

As of June 8, 2009, there are 36,687 applications in the Apple App Store. If you wanted to purchase all of those applications it would set you back around $ 76,785.
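As a quick worked check on those two figures, the average price per application works out to roughly two dollars. This is just the arithmetic, assuming both numbers cover the same set of apps.

```javascript
// Worked arithmetic: average price per app from the two figures above.
var appCount = 36687;
var totalCost = 76785; // US dollars
var averagePrice = totalCost / appCount;
console.log("$" + averagePrice.toFixed(2)); // prints "$2.09"
```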

Old Thailand Blog Back Online

As I was poking around an old hard drive tonight, I realized that I hadn’t re-uploaded the content from a trip to Thailand that Monika and I took back in 2002. Check out the Thailand Travel Log.

It was really excellent finding the old content, and Monika and I spent a while reminiscing.

I originally built the Thailand Travel Log using Blogger. Notice the veeery narrow content area and the very small images. I was targeting a much smaller screen at the time. Back in 2002 Blogger allowed you to export your blog to static HTML pages that you could then upload to your own web host. Luckily for me, I still had the static pages. I have no idea where my content on the Blogger site itself actually went to. 

This is an interesting lesson for me regarding good backup and archiving practices for my personal content. Be consistent!