How do I become a data scientist?

The Data Science Central blog pointed out an interesting discussion on Quora.

The blog post summarizes the lengthy discussion on Quora as follows:

Here’s a summary of the very long and detailed top answer:

  1. Learn about matrix factorizations
  2. Learn about distributed computing
  3. Learn about statistical analysis
  4. Learn about optimization
  5. Learn about machine learning
  6. Learn about information retrieval
  7. Learn about signal detection and estimation
  8. Master algorithms and data structures
  9. Practice
  10. Study Engineering

All the numerous other answers go along the same lines. We strongly disagree with this – in the sense that these posters miss 50% of what makes a real data scientist: business acumen, domain expertise, craftsmanship and tricks of the trade, data vision (both metaphorically and literally), leadership, communication skills, vendor selection, consulting skills, and expertise in finding data sets (not just insights) and metrics. Also, I believe matrix factorizations and some other stuff (eigenvalues) are not part of modern data science anymore. These answers by young, very smart, educated people illustrate the mismatch between what hiring managers are looking for, and what potential hires think they should learn (reinforced by university curricula) to become a data scientist.

Posted in analytics, big data, data science

Shark, Real-time queries and analytics on big data

Shark is an interactive SQL system for Hadoop that claims to provide blazing fast (even real-time) performance that is comparable to MPP databases. It is a highly scalable system that works on top of Spark and includes features for data co-partitioning, fault tolerance, and even the integration of machine learning. Shark supports many Hive data formats as well as HDFS, HBase, and Amazon S3.

Documentation for Shark is available on GitHub. According to the project website, it takes around 5 minutes to set up Shark on a single node for a quick spin, and about 20 minutes on an Amazon EC2 cluster.

Posted in analytics, big data, hadoop, technology

Font Awesome – An Awesome Font for Awesome Web Applications


The self-professed iconic font designed for use with the Bootstrap framework is available for download on GitHub.

The font looks really nice with tons of icons in a variety of categories, including:

  • Web Application
  • Text Editor
  • Directions
  • Video Player
  • Social Media
  • Healthcare

The project site has instructions for integrating with Bootstrap, with or without LESS, SCSS, and SASS. And guess what? It even works with IE7!

Tons of examples are included on the project site to get you up and running in no time.

Posted in library, web development



Backbone.js gives structure to web applications by providing models with key-value binding and custom events, collections with a rich API of enumerable functions, views with declarative event handling, and connects it all to your existing API over a RESTful JSON interface.

The project is hosted on GitHub, and the annotated source code is available, as well as an online test suite, an example application, a list of tutorials and a long list of real-world projects that use Backbone. Backbone is available for use under the MIT software license.

The library is very well documented, with well-commented source code and an excellent list of projects that use it.
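
To make "models with key-value binding and custom events" a little more concrete, here is a minimal sketch of the general pattern in TypeScript. It does not use Backbone and is not how Backbone is implemented; the class name `TinyModel`, the `Listener` type, and the sample data are all hypothetical, chosen only for illustration. The `change:<attribute>` event naming mirrors the convention Backbone models use.

```typescript
// Hypothetical names throughout: TinyModel and Listener are illustrative,
// not part of Backbone's API.
type Listener = (...args: unknown[]) => void;

class TinyModel<T extends Record<string, unknown>> {
  private attributes: T;
  private listeners = new Map<string, Listener[]>();

  constructor(initial: T) {
    // Copy the initial values so outside mutation cannot bypass set().
    this.attributes = { ...initial };
  }

  // Register a callback for a named event such as "change:title".
  on(event: string, callback: Listener): void {
    const existing = this.listeners.get(event) ?? [];
    existing.push(callback);
    this.listeners.set(event, existing);
  }

  // Call every callback registered for the event, in registration order.
  trigger(event: string, ...args: unknown[]): void {
    for (const callback of this.listeners.get(event) ?? []) {
      callback(...args);
    }
  }

  get<K extends keyof T>(key: K): T[K] {
    return this.attributes[key];
  }

  // Update one attribute and notify listeners only when the value changed.
  set<K extends keyof T>(key: K, value: T[K]): void {
    if (this.attributes[key] === value) return;
    this.attributes[key] = value;
    this.trigger(`change:${String(key)}`, value);
    this.trigger("change", key, value);
  }
}

// Usage: a view-like listener reacts whenever the title changes.
const post = new TinyModel({ title: "Draft", views: 0 });
const log: string[] = [];
post.on("change:title", (value) => log.push(`title is now ${String(value)}`));
post.set("title", "Published");
post.set("title", "Published"); // no event: the value did not change
```

The point of the pattern is that code displaying the data never polls the model; it subscribes once and is told when a specific key changes.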

Posted in javascript, library

NPR News App Architecture and Development

The NPR blog has a nice article on how to build a scalable web app using an effective technology stack and low-maintenance, low-cost Amazon-powered servers. The post also provides links to their code.

Technologies used include:

Hardware/Server resources needed to deploy:

  • Amazon S3
  • Amazon EC2 Small Instance (Only if you need to run cronjobs)
  • That’s it. What were you expecting????


Posted in apps, architecture, technology