Book: What Is Data Science?

I just finished reading the book What Is Data Science?


It is a small book (25 pages) and one of the many good starting points for learning about Data Science. This is not a review, just a few quotes from the book:

  • According to Mike Driscoll (@dataspora), statistics is the “grammar of data science.”
  • According to Martin Wattenberg (@wattenberg, founder of Flowing Media), visualization is key to data conditioning: if you want to find out just how bad your data is, try plotting it.
  • Making data tell its story isn’t just a matter of presenting results; it involves making connections, then going back to other data sources to verify them.
  • Data science requires skills ranging from traditional computer science to mathematics to art.
  • According to DJ Patil (@dpatil), the best data scientists tend to be “hard scientists,” particularly physicists, rather than computer science majors. Physicists have a strong mathematical background, computing skills, and come from a discipline in which survival depends on getting the most from the data. They have to think about the big picture, the big problem. When you’ve just spent a lot of grant money generating data, you can’t just throw the data out if it isn’t as clean as you’d like. You have to make it tell its story. You need some creativity for when the story the data is telling isn’t what you think it’s telling.
  • What Patil calls “data jiujitsu”—using smaller auxiliary problems to solve a large, difficult problem that appears intractable (he has a book on Data Jujitsu)
  • Patil’s first flippant answer to “what kind of person are you looking for when you hire a data scientist?” was “someone you would start a company with.”
  • Data scientists combine entrepreneurship with patience, the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution. They are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions. They can think outside the box to come up with new ways to view the problem, or to work with very broadly defined problems: “here’s a lot of data, what can you make from it?”
  • The future belongs to the companies who figure out how to collect and use data successfully. Google, Amazon, Facebook, and LinkedIn have all tapped into their datastreams and made that the core of their success. They were the vanguard, but newer companies are following their path.
  • The part of Hal Varian’s quote that nobody remembers says it all: “The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades.”

This graphic from BigData Startups shows that lots of organizations still do not understand Big Data and predicts a shortage of 140k-190k big data scientists and 1.5M big data managers in the USA alone by 2018.


I am reading a bunch of books and will probably do more of these posts. BTW, big data is not always about big data. It is an umbrella term to cover different areas that deal with deriving value out of data.


Guardian’s Data Blog – How it Came About

From the Guardian’s Data Blog:

We are drowning in information. The web has given us access to data we would never have found before, from specialist datasets to macroeconomic minutiae. But, look for the simplest fact or statistic and Google will present a million contradictory ones. Where’s the best place to start?

That’s how this blog came about. Every day we work with datasets from around the world. We have had to check this data and make sure it’s the best we can get, from the most credible sources. But then it lives for the moment of the paper’s publication and afterward disappears into a hard drive, rarely to emerge again before updating a year later.



Love The Motivation Behind This Book

I really like this para – The Motivation Behind the Book Doing Data Science:

The world is opening up with possibilities for people who are quantitatively minded and interested in putting their brains to work to solve the world’s problems. I consider it my goal to help these students to become critical thinkers, creative solvers of problems (even those that have not yet been identified), and curious question askers. While I myself may never build a mathematical model that is a piece of the cure for cancer, or identifies the underlying mystery of autism, or that prevents terrorist attacks, I like to think that I’m doing my part by teaching students who might one day do these things. And by writing this book, I’m expanding my reach to an even wider audience of data scientists who I hope will be inspired by this book, or learn tools in it, to make the world better and not worse.

The solutions to all the world’s problems may not lie in data and technology—and in fact, the mark of a good data scientist is someone who can identify problems that can be solved with data and is well-versed in the tools of modeling and code. But I do believe that interdisciplinary teams of people that include a data-savvy, quantitatively minded, coding-literate problem-solver (let’s call that person a “data scientist”) could go a long way.

Good Reads: Single Most Important Habit that Shaped up my Career

Kunal Jain on Single habit that defined the trajectory of my career

What is the single most important habit that shaped up my career? This is the habit which propelled me from being just an ordinary analyst to someone who can influence, manage and mentor people in the Analytics industry.

Here is the habit:

Spend a defined fraction of your day working on the most important project / problem you have.

Please note the importance of two words here: defined and most important. You need to fix what fraction of your time you would spend and what is the most important task for you.


I discovered Kunal through an article on KDNuggets. I found his Twitter account, followed him, and went from there to his LinkedIn account and then to this article. It is nice to see people sharing so much of their knowledge through Tweets and blog posts.

A couple of other useful links from Kunal, if you are interested in Analytics:

Must read books and blogs on Web Analytics

Analytics Vidhya Twitter Account

Thanks Kunal. We need more people like you.

Interview: On High Level Languages

An interesting discussion about higher level languages and reinventing data science.

…it will be a disappointing moment in technology history if our definition of a language was a thing that had ‘for’ loops and had to involve this very low-level, procedural way of approaching creating functionality.

Ultimately, what we want to do with language is we want to express ourselves and get the things we want to do done. My theory of that is the more we can have that done automatically by a system, the better for us.

Declarative languages like SQL do that to an extent. But Stephen Wolfram talks about something even higher level. Here are some examples.
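To make the contrast concrete, here is a small sketch of my own (it is not from the interview). It computes the same per-city totals twice: once with an explicit ‘for’ loop, and once by handing a SQL statement to Python's built-in sqlite3 module. The sales data, the table name and the function names are all made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (city, amount) sales records.
sales = [("Pune", 120), ("Austin", 80), ("Pune", 30), ("Austin", 45), ("Oslo", 10)]


def totals_procedural(rows):
    """Spell out *how* to aggregate: loop, branch, accumulate."""
    totals = {}
    for city, amount in rows:
        if city not in totals:
            totals[city] = 0
        totals[city] += amount
    return totals


def totals_declarative(rows):
    """State *what* is wanted and let the SQL engine decide how."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE sales (city TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT city, SUM(amount) FROM sales GROUP BY city"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(totals_procedural(sales))
    print(totals_declarative(sales))
```

Both functions return the same totals. The difference is only in who works out the steps: in the first version I do, in the second the database engine does.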


Good Read: Learning How To Program – Top Down or Bottom Up?

From A concrete approach to learning how to program:

there are good arguments for and against both of these approaches. But my experience has shown me that bottom-up approaches tend to be better for the vast majority of beginners, while only a select few beginners can easily excel at the top-down approach. My hunch is that most of these folks who excel at learning via top-down approaches have some kind of prior experience with programming, or have some natural inclination towards being adept with technology.

I am biased since I started with the bottom-up approach. What is your experience in training developers?

Good Reads: Ways to Acquire Knowledge

14 Ways to Acquire Knowledge: A Timeless Guide from 1936 is a great read.

Here are a few fragments from the original post. There is a lot more to read and reflect upon.

On Reading:

If you must read in order to acquire knowledge, read critically. Believe nothing till it’s understood, till it’s clearly proven.

On Writing:

To know it — write it! If you’re writing to explain, you’re explaining it to yourself! If you’re writing to inspire, you’re inspiring yourself! If you’re writing to record, you’re recording it on your own memory.

On Learning:

To learn, experiment! Try something new. See what happens. Lindbergh experimented when he flew the Atlantic. Pasteur experimented with bacteria and made cow’s milk safe for the human race. Franklin experimented with a kite and introduced electricity.

On Knowledge:

If you would have knowledge, knowledge sure and sound, teach. Teach your children, teach your associates, teach your friends. In the very act of teaching, you will learn far more than your best pupil.

On Reasoning:

Animals have knowledge. But only men can reason. The better you can reason the farther you separate yourself from animals.


Mark Zuckerberg on Oculus Acquisition

Mark Zuckerberg on the Oculus acquisition:

But this is just the start. After games, we’re going to make Oculus a platform for many other experiences. Imagine enjoying a court side seat at a game, studying in a classroom of students and teachers all over the world or consulting with a doctor face-to-face — just by putting on goggles in your home.

This is really a new communication platform. By feeling truly present, you can share unbounded spaces and experiences with the people in your life. Imagine sharing not just moments with your friends online, but entire experiences and adventures.

I am really excited by the “studying in a classroom of students and teachers all over the world” part. Will every MOOC recreate its learning material to take advantage of emerging technologies like this?

More about What is Oculus Rift and Why Facebook is Buying it. 

Tall Claims

Once in a while I get an email that starts out with:

We are a world leading….

When I read that, I cringe a bit and promptly delete the email. I don’t understand why there is a compulsion to claim to be “world leading”.

What is wrong with:

  • We are a product company growing fast (if that is really true)
  • Our products are liked by our customers (can be proven)
  • We provide good service (can be proven)
  • We are always trying to please our customers
  • We try hard to keep improving (a modest claim, but one that can be believed)

I am still trying to understand some of these marketing communications. I think more than anything else, what you say must be believable. Trust me. If you are world leading in anything, people are likely to know you without your saying so.


I am guilty of this tendency too, to some extent. We claim that “x is the coolest way to…”. That was my immaturity in clear evidence.