Machine Learning – A Few Links and Tweets

On Machine Learning, from a free book on ML – A First Encounter with Machine Learning by Max Welling

The first reason for the recent successes of machine learning and the growth of the field as a whole is rooted in its multidisciplinary character. Machine learning emerged from AI but quickly incorporated ideas from fields as diverse as statistics, probability, computer science, information theory, convex optimization, control theory, cognitive science, theoretical neuroscience, physics and more.
The second, perhaps more important reason for the growth of machine learning is the exponential growth of both available data and computer power. While the field is built on theory and tools developed in statistics, machine learning recognizes that the most exciting progress can be made by leveraging the enormous flood of data that is generated each year by satellites, sky observatories, particle accelerators, the human genome project, banks, the stock market, the army, seismic measurements, the internet, video, scanned text and so on.

On why this book was written

Much of machine learning is built upon concepts from mathematics such as partial derivatives, eigenvalue decompositions, multivariate probability densities and so on. I quickly found that these concepts could not be taken for granted at an undergraduate level.

“Machine learning will be one of the most important tech trends over the next three to five years for innovation”

Startups making machine learning an elementary affair

Use Cases Machine Learning on Big Data for Predictive Analytics #ml usecases

A startup journey, the improvement in Python’s data science capabilities and hosted machine learning #techtrends

RT @woycheck: Zico Kolter wants to use machine learning to analyze electrical current behavior and provide details about your power bill (@…

Microsoft Research Machine Learning Summit: April 22-24, 2013

RT @siah: A free ebook by Max Welling “A First Encounter with Machine Learning”

Google Hires Brains that Helped Supercharge Machine Learning | Wired Enterprise |

RT @siah: PyMADlib: A Python wrapper for MADlib – an open source library for scalable in-database machine learning algorithms http://t.c

Peekaboo: Machine Learning Cheat Sheet (for scikit-learn)

Panels and Discussions

This is a panel from the Churchill Club featuring
Peter Norvig (Director of Research, Google), Gurjeet Singh (Co-founder & CEO, Ayasdi), and Jeremy Howard (President and Chief Scientist, Kaggle).


Once in a while, I go and gather my recent tweets and create a Tweet Cloud (a project developed by a student). I find some interesting topics, save the tweets and start a blog post. I have written about this Linked Tweet Cloud a couple of times.


On Respecting the Intrinsic Limitations of the Human Mind

On respecting the intrinsic limitations of the human mind and approaching the (programming) task as Very Humble Programmers – from Dijkstra's The Humble Programmer

… the amount of intellectual effort needed to design a program depends on the program length. It has been suggested that there is some kind of law of nature telling us that the amount of intellectual effort needed grows with the square of program length. But, thank goodness, no one has been able to prove this law. And this is because it need not be true. We all know that the only mental tool by means of which a very finite piece of reasoning can cover a myriad cases is called “abstraction”; as a result the effective exploitation of his powers of abstraction must be regarded as one of the most vital activities of a competent programmer. In this connection it might be worth-while to point out that the purpose of abstracting is not to be vague, but to create a new semantic level in which one can be absolutely precise.

the tools we are trying to use and the language or notation we are using to express or record our thoughts, are the major factors determining what we can think or express at all! The analysis of the influence that programming languages have on the thinking habits of its users, and the recognition that, by now, brainpower is by far our scarcest resource, they together give us a new collection of yardsticks for comparing the relative merits of various programming languages.

Programming will remain very difficult, because once we have freed ourselves from the circumstantial cumbersomeness, we will find ourselves free to tackle the problems that are now well beyond our programming capacity.

Hierarchical systems seem to have the property that something considered as an undivided entity on one level, is considered as a composite object on the next lower level of greater detail; as a result the natural grain of space or time that is applicable at each level decreases by an order of magnitude when we shift our attention from one level to the next lower one. We understand walls in terms of bricks, bricks in terms of crystals, crystals in terms of molecules etc. As a result the number of levels that can be distinguished meaningfully in a hierarchical system is kind of proportional to the logarithm of the ratio between the largest and the smallest grain, and therefore, unless this ratio is very large, we cannot expect many levels.

I do not know of any other technology covering a ratio of 10^10 or more: the computer, by virtue of its fantastic speed, seems to be the first to provide us with an environment where highly hierarchical artefacts are both possible and necessary. This challenge, viz. the confrontation with the programming task, is so unique that this novel experience can teach us a lot about ourselves. It should deepen our understanding of the processes of design and creation, it should give us better control over the task of organizing our thoughts.
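Dijkstra's back-of-the-envelope claim above is easy to check numerically: if each level of a hierarchy spans roughly an order of magnitude, the number of meaningful levels is just the base-10 logarithm of the grain ratio. A quick sketch (my own illustration, not from the talk; the function name and parameters are made up for clarity):

```python
import math

def meaningful_levels(largest_grain: float, smallest_grain: float,
                      decades_per_level: float = 1.0) -> int:
    """Estimate the number of hierarchy levels, assuming each level
    spans `decades_per_level` orders of magnitude (Dijkstra's heuristic
    that grain size shrinks ~10x per level)."""
    ratio = largest_grain / smallest_grain
    return round(math.log10(ratio) / decades_per_level)

# A wall down to its molecules, or a program covering the 10^10
# ratio Dijkstra mentions: about ten meaningful levels.
print(meaningful_levels(1e10, 1.0))  # -> 10
```

With a modest ratio, say 10^3, you get only three levels, which matches the quote's point that "unless this ratio is very large, we cannot expect many levels."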

Once in a while you come across an essay that is timeless. A lot has changed in the world of software development since this talk was delivered (in 1972). By a funny coincidence, my programming career started in 1972, and I was blissfully ignorant of the challenges Dijkstra was talking about. It has been both an exhilarating and humbling experience to be a developer for so long.

We are Greedy. We Want More.

Julia is a new language for data analysis. From four of the original developers – Jeff Bezanson, Stefan Karpinski, Viral Shah, and Alan Edelman – on why they invented Julia.

We are power Matlab users. Some of us are Lisp hackers. Some are Pythonistas, others Rubyists, still others Perl hackers. There are those of us who used Mathematica before we could grow facial hair. There are those who still can’t grow facial hair. We’ve generated more R plots than any sane person should. C is our desert island programming language.

We love all of these languages; they are wonderful and powerful. For the work we do — scientific computing, machine learning, data mining, large-scale linear algebra, distributed and parallel computing — each one is perfect for some aspects of the work and terrible for others. Each one is a trade-off.

We are greedy: we want more.

That is the good kind of greed: it makes you build things because you want more, and an entire community benefits in the process.

More at A Julia Meta Tutorial.

A Few TED Talks on Education

For the past few days I have been watching a few TED talks on Education.

I want to share a couple of my favorites.

Shimon Schocken and Noam Nisan developed a curriculum for their students to build a computer, piece by piece. When they put the course online — giving away the tools, simulators, chip specifications and other building blocks — they were surprised that thousands jumped at the opportunity to learn, working independently as well as organizing their own classes in the first Massive Open Online Course (MOOC). A call to forget about grades and tap into the self-motivation to learn.

Daphne Koller is enticing top universities to put their most intriguing courses online for free — not just as a service, but as a way to research how people learn. With Coursera (cofounded by Andrew Ng), each keystroke, quiz, peer-to-peer discussion and self-graded assignment builds an unprecedented pool of data on how knowledge is processed.

With Coursera, Daphne Koller and co-founder Andrew Ng are bringing courses from top colleges online, free, for anyone who wants to take them.

Some observations:
  • I like the approach taken by the self organizing computer course going from fundamental principles (NAND gates) to building a computer, writing an OS, a compiler and a game. It may be worth starting a community just to do that for interested students and enthusiasts.
  • The Coursera talk was fascinating. MOOCs are a popular but also a controversial topic. Daphne, in her talk, mentions some of what they learned from teaching students online. It was cool to see that they were using machine learning to spot trends, and how they started personalizing certain aspects of the course based on their analysis.
  • I think online learning and learning communities can help existing educational institutions. They do not replace teachers or classroom learning, but complement them.
  • Anything that sparks interest or curiosity, or helps students follow a specific path of their own interest (even if it is not part of the curriculum), will be a great tool for improving the learning experience.
  • Learning by doing is probably one of the better methods of learning but the existing labs do not seem to fulfill that need.
  • Finally, teachers need help. We need to help teachers use more interesting tools to make learning engaging.
I think some of the autonomous colleges can take some of these ideas and adapt them for their own needs, or offer them as optional courses to interested students.
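The appeal of the Schocken–Nisan course is how far first principles go: every gate in a computer can be derived from NAND alone. A minimal Python sketch of that first step (function names are mine, not from the course materials):

```python
# Building the basic logic gates from NAND alone -- the starting point
# of the bottom-up "NAND to computer" curriculum described above.

def nand(a: int, b: int) -> int:
    """The primitive gate: outputs 0 only when both inputs are 1."""
    return 0 if (a == 1 and b == 1) else 1

def not_(a: int) -> int:
    # NAND with both inputs tied together inverts the signal.
    return nand(a, a)

def and_(a: int, b: int) -> int:
    # AND is just NAND followed by NOT.
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    # De Morgan: a OR b == NOT(NOT a AND NOT b) == NAND(NOT a, NOT b).
    return nand(not_(a), not_(b))

def xor_(a: int, b: int) -> int:
    # XOR: true when (a OR b) and not (a AND b).
    return and_(or_(a, b), nand(a, b))

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", and_(a, b), or_(a, b), xor_(a, b))
```

From here the course stacks the same trick level by level: gates into adders, adders into an ALU, and onward to an OS, a compiler and a game.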

Recommended Reading – July 8, 2013

This is a list of some of my tweets and some context so that you can decide which ones are worth reading.

Notes by Tim Berners-Lee: These statements of architectural principle explain the thinking behind the specifications. 

I have been reading some of these notes and found them pretty inspiring. You can get a sense of how the web evolved and the kind of thinking that goes behind some of the standards efforts at W3C. If you want to track a list of W3C standards and drafts, you may want to take a look at this list.

Writing as a Thinking Tool

There are times when a tweet is not enough. I feel that I should take a fragment from an article and make it a LinkLog (a style of blog post which is basically a link to a recommended post with a teaser). Sometimes, you like the title of a post/article. You read it and you immediately like the author’s style and content. That is what happened in this case. It was an email alert from LinkedIn that got me following Ben Casnocha. 

Mastering a skill involves hundreds of stages of incremental improvement over a very long period of time. From How to Draw an Owl.

This is a really short but very compelling article on what it takes to master a skill. The lessons from this apply to developing products or an innovation too.

If You Aren’t Taking Notes, You Aren’t Learning

A very different perspective on note taking. I take a lot of notes, so when I saw this post, I was really interested. I highly recommend both reading and practicing it. A bonus link includes some cool note-keeping techniques from Tim Ferriss's extreme “take notes like an alpha geek”.

Unimaginative Marketing

I loved this fragment from Copy Blogger:

Unimaginative marketers attempt to stand out with message frequency, or by exchanging bribes for attention (resulting in an explosion of Facebook contests and giveaways, among other tactics).

You can’t survive by shouting the loudest and relying solely on anachronistic interruption marketing. You can’t proclaim you’re featuring the “biggest sale ever!” every day (I’m looking at you, Macy’s) or simply rewrite a portion of your online brochure and hope that Google funnels customers to your website.

I think we instinctively know this but we still resort to some of the interruption marketing techniques. I like this advice:

Use social media to promote useful information first, and your company second.

Read Write Linked Data – Notes from Tim Berners-Lee

From TimB on Read-Write Linked Data

There is an architecture in which a few existing or Web protocols are gathered together with some glue to make a world wide system in which applications (desktop or Web Application) can work on top of a layer of commodity read-write storage. The result is that storage becomes a commodity, independent of the application running on it.

I must confess that I am a big fan of TimB. Ever since I read his note on the Semantic Web, I keep going back for more. His ideas are fascinating and have a deep impact on the direction of the web and web technologies.

Pycon India 2013

If you are interested in Python, PyCon is definitely a conference worth attending. If you are working with Python, it may be a great place to share your knowledge with others and you always meet cool developers there.

PyCon India this year is on August 30 – September 1. Please take a look at the topics and vote for the ones you like. If you have experiences to share, please submit a speaking proposal here.