Notes from Data Science, Past, Present and Future

One of my favorite Data Science podcasts is the Data Show Podcast, hosted by Ben Lorica of O’Reilly. You can find more about it here.

This episode is about Data Science: past, present and future. It is an interview with DJ Patil, a well-known Data Scientist. Here are a few takeaways.

  1. Data Science is a big tent model. It requires knowledge of Mathematics, Statistics and Programming, a hacker mentality of exploration, and the ability to communicate well.
  2. Current Data Science applications are geared towards the consumer internet, but there are lots of opportunities in other areas, such as the social sciences.
  3. A Data Driven Organization is one that uses data to test hypotheses and make decisions.
  4. A Chief Data Officer (CDO) is responsible for the good stewardship of data in an organization. Sometimes the role is also called CAO – Chief Analytics Officer.
  5. Towards the end of the podcast, there was a discussion on data ethics, the dilemmas of decision making in automated systems, and broader societal questions.

Restarting My Read Log

I am restarting my Read Log. A read log is a blog post listing things I have read and found useful. I tweet some of them, but tweets have a short half-life.

The inspiration for the Read Log comes from several sources: Four Short Links by Nat, Brain Pickings by Maria, Farnam Street by Shane, and a few others.

Some of the best bloggers I know work hard at writing their posts, and sometimes I feel like I am cheating. But these links are worth sharing, and if I am lucky, some of them may even start conversations.

Half Life of Knowledge

How long does your knowledge stay relevant? In other words, what is the half-life of your knowledge?

Wikipedia has a nice description of the half-life of knowledge:

The half-life of knowledge or half-life of facts is the amount of time that has to elapse before half of the knowledge or facts in a particular area is superseded or shown to be untrue. These coined terms belong to the field of quantitative analysis of science known as scientometrics.

Here are a few things to think about:

  • What is the half-life of entrepreneurial knowledge? Can we take lessons from the past and use them today?
  • What is the half-life of knowledge about software architecture and design?
  • What is the half-life of knowledge about sales and marketing techniques?

Some knowledge has a short half-life; some of it lasts much longer. To stay relevant in your industry, you need to figure out how much of your knowledge is still useful.
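One way to put a number on this is to borrow the exponential-decay model behind radioactive half-life. This is an assumption: the Wikipedia definition only says that "half" is superseded after one half-life. Under that model, the fraction of knowledge still valid after t years is (1/2)^(t/h), where h is the half-life. Here is a minimal Python sketch. The function names and the 5-year half-life are hypothetical and only for illustration.

```python
import math

def fraction_still_valid(years: float, half_life_years: float) -> float:
    """Fraction of knowledge still valid after `years`, assuming exponential decay."""
    return 0.5 ** (years / half_life_years)

def years_until(fraction: float, half_life_years: float) -> float:
    """Years until only `fraction` is still valid (inverse of fraction_still_valid)."""
    return -half_life_years * math.log2(fraction)

if __name__ == "__main__":
    HALF_LIFE = 5.0  # hypothetical half-life in years, for illustration only
    for t in (0, 5, 10, 15):
        print(f"after {t:>2} years: {fraction_still_valid(t, HALF_LIFE):.3f} still valid")
    print(f"years until only 10% is valid: {years_until(0.10, HALF_LIFE):.1f}")
```

With a 5-year half-life, half is still valid after 5 years and a quarter after 10. Only about 10% survives after roughly 16.6 years.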

Why I Retweet

There are several reasons. In no particular order:

  • I like the message in the tweet; it resonates with me.
  • I like the link – typically a pointer to good reading material
  • It provides a different point of view
  • I use it as a marker in my life – a part of my daily log
  • It is a hat tip to an author who made me pause and think.
  • I think this tweet deserves recognition and I would like to spread the idea
  • It may be part of a discussion. I jump in and add my two bits.
  • It may be an event and I want to share it (a picture, a quote, a sound bite)
  • Same reason I tweet – to start a conversation
  • Same reason I tweet – to ask a question

Applied ML – How Uber Uses Machine Learning at Scale

This is one of the most comprehensive engineering blog posts on how Uber uses Machine Learning (ML) at scale. It covers:

  • Uber’s ML Platform – Michelangelo
  • Uber’s research and production efforts and how they interrelate
  • How Uber improves model developer velocity

I made a list of a few terms and concepts from the article:

  • ML deployment use cases
  • Pervasive deployment of ML in several applications
  • Distributed training of ML
  • Aligning ML applications with Uber’s priorities
  • ML tools across the company (where and what)
  • Internal events like ML conferences, ML reading groups, and talk series
  • Data Science Workbench (a tool to build and iterate ML models)
  • ML Platform team and how they work to support ML development inside Uber
  • Technology stacks – Spark, Cassandra, Python and others
  • Experiments with external tools both open source and commercial
  • Uber’s open source contributions

It is nice to see how a dynamic company uses Machine Learning. There is a lot to learn here. If you are thinking about building and deploying ML applications, Scaling Machine Learning at Uber with Michelangelo (Uber Engineering Blog) is a must-read. I may go back and read it again.

A Great CTO Talk about Technology at Walmart

OrangeScape has this new initiative called CTO Talks. I think it is a brilliant idea. “While there are a lot of conversations taking place at the software development level, there are none at the CTO level”, says Suresh. I agree. We need different levels of conversations on technology.

I enjoyed the talk “Technology at Walmart – A Few Glimpses”. I hope to see other, more comprehensive blog posts. I was looking for the use of Machine Learning at Walmart, and I was not disappointed.

Here is a list of uses of Machine Learning (ML) at Walmart.

    • Competitive Intelligence and Analytics
    • Crawl frequency prediction: how frequently you can crawl certain sites for price information. Crawl too often, and you will be blocked; too rarely, and you can miss useful information. Different sites update information at different intervals.
    • Natural Language Processing (NLP) of product catalogs
    • Bossa Nova robots roaming the aisles of Walmart stores, checking for out-of-stock items, mislabeled shelf tags, and incorrect prices
    • IoT at Walmart – monitoring refrigerator temperatures in real time
    • Visual inspection and spoilage prediction
    • Predictive analytics of future failure of equipment
    • Predicting attrition (they have 2.3 million associates in over 11,000 locations)
    • Predicting absenteeism based on weather patterns, an HR application and an important contributor to maintaining service levels in their stores
    • Hari briefly touched upon blockchain. They are looking at using it to trace grocery items from source to customer.
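The crawl-frequency trade-off can be illustrated with a simple rule-based heuristic. This is not the ML model Hari described. It is a hypothetical Python sketch that works in three steps:

- It shortens the interval when a site changed since the last crawl.
- It backs off when the site did not change.
- It clamps the result, so we neither crawl so often that we get blocked nor so rarely that we miss updates.

The function name, the halving and 1.5x factors, and the 1-hour and 1-week bounds are all assumptions.

```python
def next_crawl_interval(current_hours: float, page_changed: bool,
                        min_hours: float = 1.0, max_hours: float = 168.0) -> float:
    """Hypothetical adaptive crawl scheduler (not Walmart's actual method).

    Halve the interval if the page changed since the last crawl (we may be
    missing updates); otherwise grow it by 1.5x (we may be crawling too often).
    Clamp to [min_hours, max_hours]: the floor limits how hard we hit a site,
    the ceiling limits how stale our price data can get.
    """
    proposed = current_hours / 2 if page_changed else current_hours * 1.5
    return max(min_hours, min(max_hours, proposed))

if __name__ == "__main__":
    interval = 24.0  # start by crawling once a day
    # Made-up observations: did the page change between consecutive crawls?
    for changed in (False, False, True, True, False):
        interval = next_crawl_interval(interval, changed)
        print(f"changed={changed!s:<5} -> next crawl in {interval:.1f} h")
```

This sketch uses fixed factors. An ML approach would presumably learn each site's update pattern from history instead, which is what makes predicting crawl frequency worthwhile.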

Walmart is one of the leading indicators of technology adoption in retail. Hari mentioned that they were the first to introduce satellite dishes at their store locations, barcode scanning, and RFID, and to give their suppliers a direct view of store items.

It was a great talk, so it was no wonder that we had an amazing turnout (more than 250 registrations). Hari answered all the questions patiently and in depth.

Little Bits of History of my Programming Journey

1972 – I wrote my first program in PDP-8 assembly language

1973-74 – Wrote diagnostics for TDC-16, a clone of the PDP-11, and early device drivers (then called IOCS – input/output control systems). Early programs were written in machine language (coded in octal) since no assembler was available. I keyed them into the console as binary code using toggle switches and debugged them there.

1974 – Worked with paper tape: first on an ASR-35, later on a high-speed reader, and with Mylar tapes

1974 – Debugged device drivers for magnetic tapes and discs, and wrote memory diagnostics that detected noise in core memory (which required shielding)

1975 – My first commercial program, in assembly language, for the Bombay Stock Exchange: matching buy and sell orders for stocks. The records were punched on cards, fed to the computer, stored on magnetic tapes, and then matched. The memory configuration was a whopping 16KB.

1976 – Taught RSX-11M (a real-time operating system for the PDP-11) at Tata Electric. Wrote my first set of PDP-11 programs under RSX-11M.

1978 – Learned several PDP-11 operating systems (RT-11, RSX-11M, IAS, RSTS/E)

1978-79 – Built the first soap survey program on RSTS/E in Basic-Plus (for IMRB)

1979 – Wrote my first commercial applications in Cobol (mostly for training others) and several small Basic-Plus utilities. Worked on performance tuning of the RSTS/E operating system.

Patched corrupted RSTS/E disks by writing programs in Basic-Plus

1980-81 – Wrote commercial programs in Cobol for a consulting engagement at Ashok Leyland

1983 – Developed benchmarks in Cobol for Wipro

1984 – First C program (a database schema analyzer in Decus C)

1984 – My first Comdex in Las Vegas

1985 – My first relational database metadata design, as part of Integra SQL development. Also wrote small C programs, mostly for testing the database.

1986 – Integra SQL Version 1 (with no nested selects), designed and built entirely from C.J. Date’s book on Relational Database Systems

1986 – Licensed Integra SQL to SCO (The Santa Cruz Operation)

1987-1990 – C-Trieve (an ISAM file management system); Objectrieve (C-Trieve extended to support BLOBs); licensing of C-Trieve to the White Water Group (they called it WinTrieve)

1991 – Objectrieve/VB was born and exhibited at Comdex in May 1991

1992 – DbControls, a set of custom controls for building database applications

1993 – Integra VDB, the first set of relational database components. It was covered in BYTE magazine.

1994-1996 – Layered SQL on top of dBase, Paradox, and Btrieve (the last one was a project for Varian systems). Most of my coding was writing small examples in C and VB.

1996-2008 – Coding winter

2009-Now – Dabbling in Python, little bits of ML, Chatbots

Some Ideas for a Newbie Tweeter

I am always urging people who will listen (and even those who will not) to blog, tweet or learn Python. A friend of mine who finally bought into the idea asked me, “What should I tweet about?” I wrote a list. I thought it might be useful to others too, so I am sharing it here.
 
I assume that you know your target audience. When you start out, you may not. Make your best educated guess, and confirm it as you tweet and get responses.
 
  1. Tweet about your professional self, especially lessons you have learned that you think may be relevant to your audience.
  2. Tweet about your profession. Talk about the aspects you enjoy most.
  3. Tweet about events. Not just that the event happened, but what caused it and what you see as its effects.
  4. Tweet about your learning (related to your profession). 
  5. Advice to my younger self is a nice format in which you can share your insights and wisdom about life. 
  6. Share little bits of knowledge. A one-pager or a paragraph about a topic in your industry would be a great start.
  7. Share tweets you like. Please annotate them with your observations.
  8. Ask your audience a simple open question and start a conversation. Use a hashtag to watch these conversations. 
  9. Tweet about something worth reading, listening to or watching. Mention why you are recommending it.
  10. Tweet about ideas and trends in your industry and their potential impact. This can be another interesting conversation starter.

Please share your ideas on tweeting. If you write blog posts, please tweet them and use #tweetideas as a hashtag.

Thinking Through the Design of a Product is Fun

I was talking to a student who is fascinated by a robot that cleans pipes. He had built a prototype and won some awards, and he wanted to discuss it.

We sat with him and brainstormed many ideas for the design at a very high level. I encouraged him to think about a different cleaner robot – one that cleans water tanks. Our discussion lasted half an hour and it was one of the most rewarding exercises I did today.

Thinking through the design of products is fun. When you do it as a small passionate group, it is even more fun. That is one of the reasons I hang out with a lot of engineering students.