Will's Blog: October 2012

October's been a fun and busy month.

I ran my first half marathon. I signed up and ran with coworkers in Central Park last week. This was my first long run and I tried it using a new pair of minimalist shoes. We were all pretty casual about the whole thing; we chatted during the run and I even had to stop to pee. All in all, we did pretty well and I averaged 10:01 minutes per mile.

A friend and I went to the NYC Comic Con. It wasn't as big as the San Diego Comic Con, but it was just as much fun. I love seeing how creative people can get. You can tell people's passion through the level of detail in costumes. For example, here's a Ghostbusters costume with blinking LEDs.

I also applied to graduate school. It's a long shot, especially since I only applied to one school. I figured if I don't get into the program I want, I shouldn't settle for a second choice. That said, the new program is a Master of Science in Business Analytics at the NYU Stern School of Business. It'll be an inaugural class, limited to 60 students. (Big thanks to Sherry Patheal and Marshall Ellis for the recommendations.) Fingers crossed!

Even if I don't get into graduate school, I've been getting the itch to experiment with more data projects. I've been attending a lot of data meetups and it's neat to see how data analytics is changing. It's funny seeing articles that mention data being the next big thing as if no one's ever mined or used data to influence business decisions. People have done this for years.

As an analyst, I find it interesting that we're capturing so much more information, and just as interesting that the technical infrastructure underneath it is changing.

I'm so used to data being structured in a relational database management system (e.g. this table holds these fields in these columns). This has the advantage of linking tables and fields, but it requires a lot of data cleaning and is hard to scale to very large amounts of data. If you're trying to store big data (think petabytes, as in 1 million gigabytes) with varying data sets, this system feels outdated.

With technology catching up, we have new systems like Apache Hadoop (a distributed storage and processing framework) and NoSQL databases like MongoDB that scale horizontally: you add more machines to a cluster to get more storage, as opposed to scaling vertically with a single, more powerful server. We also have Amazon's EC2 service (for more computing power) and S3 service (for more storage), both of which scale to as little or as much as you need. To organize information without relational tables, there are tags and non-visible metadata (data that tells you about the data).
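
To make the horizontal scaling and tagging ideas a bit more concrete, here's a toy sketch in Python. It is not how Hadoop or MongoDB actually work internally; `TinyCluster`, `shard_for`, and the sample records are all made up for illustration. The idea is just that each record's key is hashed to pick one of several "nodes", records don't need to share the same fields, and you find things by tag rather than by joining tables.

```python
import hashlib


def shard_for(key: str, num_nodes: int) -> int:
    """Pick a node for a key by hashing it (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


class TinyCluster:
    """Toy stand-in for a horizontally scaled document store (illustration only)."""

    def __init__(self, num_nodes: int):
        self.num_nodes = num_nodes
        # Each "node" is just an in-memory dict of key -> document.
        self.nodes = [dict() for _ in range(num_nodes)]

    def put(self, key: str, document: dict) -> None:
        # The hash of the key decides which node stores the document.
        self.nodes[shard_for(key, self.num_nodes)][key] = document

    def get(self, key: str) -> dict:
        # The same hash tells us which node to ask.
        return self.nodes[shard_for(key, self.num_nodes)][key]

    def find_by_tag(self, tag: str) -> list:
        # No joins: ask every node for documents carrying the tag.
        return sorted(
            key
            for node in self.nodes
            for key, doc in node.items()
            if tag in doc.get("tags", [])
        )


cluster = TinyCluster(num_nodes=3)
# Documents can have different fields; only the tags are shared.
cluster.put("half-marathon", {"miles": 13.1, "tags": ["running", "nyc"]})
cluster.put("comic-con", {"city": "New York", "tags": ["nyc"]})
cluster.put("grad-app", {"school": "NYU Stern", "tags": ["school"]})

print(cluster.find_by_tag("nyc"))  # ['comic-con', 'half-marathon']
```

One thing this simple modulo scheme glosses over: if you change the number of nodes, most keys end up hashing to a different node, so real systems use smarter partitioning to avoid moving everything around.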

Anyway, it'll be neat to see what people do with these new tools and how they affect current systems.

Tuesday, October 23, 2012
