Generating Value from Data

Education hasn’t kept up with developments in Big Data, according to Deng.

“We are facing a huge deficit in people to not only handle big data, but more importantly to have the knowledge and skills to generate value from data — dealing with the non-stop tsunami. How do you aggregate and filter data, how do you present the data, how do you analyze them to gain insights, how do you use the insights to aid decision-making, and then how do you integrate this from an industry point of view into your business process? The whole thing is hugely important for the future.”

Good questions. But education alone may not be the answer. How about moving some of the data(base) experts and content-analysis experts into this field?

The problem with focusing solely on teaching this material is that students do not have enough context about industry needs. Educational institutions teach theory, but make little effort to align with, or even understand, what industry actually requires. This problem cannot be solved in isolation; it has to be part of a bigger picture that includes spending a year or so on a variety of projects helping industry.

6 thoughts on “Generating Value from Data”

  1. The obvious stuff in business intelligence, articulated as relative improvements such as improving revenue growth by 5% or increasing margin by 10%, can be nicely packaged and pushed down the decision-making stack. However, the next level of analysis, where you take external data and start hypothesizing and characterizing how external forces operate on the business, requires a thorough understanding of the business and its externalities, not to mention its strategic planning. As very few people in the organization are privy to the strategic-planning dimension, the deep analytics needed to derive value is localized at the top of the organization.

    This explains why the deep analytics space has bifurcated into the super high end, driven by IBM and SAS for the FORTUNE 500, and the super low end, driven by scrappy entrepreneurs in the consumer Web 2.0 space. There is no opportunity for deep analytics in the missing middle. Stated otherwise, there is sufficient talent already being educated for the few spots at the high end, and the low end is defining/inventing it as it goes along.

  2. At the nuts-and-bolts level, people underestimate how different it is to run algorithms on 1,000 or more servers, even if you run the job for only a few hours on a shared resource. Very few people have this experience. I wonder how one can close that gap. Is on-the-job experience the only option?

    1. There is a good opportunity for a service that takes large jobs, partitions them, and deploys them. I am not sure whether companies like Cloudera actually provide this service; Cloudera uses Hadoop. I wonder whether there is a list of such providers on Amazon (on top of the elastic Hadoop services).

      If you are interested, let me know; I will do some research and get back to you.

      1. Where is the data located? It is difficult to move data around (from servers to the cloud), so the path of least resistance is to move computation to the data (almost as if data has mass/gravity).

        If the data is on customer servers, they will prefer to work on it themselves. The alternative is to help source the data, reach criticality, and become a data black hole!

        Taking on a large job would necessarily involve getting access to the data, and moving data around has a cost. How should one approach this aspect of the problem?

        1. If you are going to process it on 1,000 servers, you may have to move it, I guess. Optimizing the movement of data itself needs to be part of the solution. You are not moving all the data, only parts of it.

          Consider how the New York Times used Amazon’s cloud to great effect. The Times’ Derek Gottfrid blogged about it six weeks ago, but I think it’s worth revisiting.

          His challenge was to turn 11 million archived articles — Times coverage from 1851 to 1980 — into PDF files so that all readers could access them quickly from the Web. I won’t go into detail here, but he implemented the open-source software Hadoop, bought computing power on Amazon’s cloud, and did the entire job in one day. That’s something that would have taken weeks, or longer, on in-house machines at most publishing companies.
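The partition-and-deploy pattern discussed in the comments above can be sketched in a few lines. This is an illustrative sketch only, assuming a map-only job in the spirit of the NYT conversion: `convert_article` and the worker count are hypothetical stand-ins, not the Times’ actual Hadoop code, and the real work (rendering to PDF, shipping results to storage) is elided.

```python
# Illustrative sketch: split a large batch of items into independent units of
# work and process them in parallel, like mappers running over input splits.
# `convert_article` is a hypothetical stand-in for the real per-item work.
from multiprocessing import Pool


def convert_article(article_id):
    # Stand-in for real work, e.g. rendering one archived article to PDF.
    return (article_id, "converted")


def run_batch(article_ids, workers=4):
    # Each worker process pulls items independently; Pool.map preserves the
    # input order in the returned results.
    with Pool(processes=workers) as pool:
        return pool.map(convert_article, list(article_ids))


if __name__ == "__main__":
    results = run_batch(range(100))
    print(len(results))
```

On a single machine this only parallelizes across cores; the comments’ point about data gravity is that at cluster scale the scheduler should instead send `convert_article` to wherever each input split already lives, rather than moving the data to the workers.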

Comments are closed.