In 2016, we saw a wide range of breakthroughs having to do with artificial intelligence and deep learning in particular. Google, Facebook, and Baidu announced several breakthroughs using deep learning. Google’s AlphaGo also defeated a top human Go player.

Deep learning is one specific class of machine learning algorithms. It has a long history, taking its roots in the earlier days of computer science. However, not all of machine learning is deep learning.

We are in 2017, it is only January, and the breakthroughs in artificial intelligence keep on being announced. These days, we hear about how the best human poker players are being defeated.

In particular, a system from Carnegie Mellon University called Libratus seems to be having a lot of success.

Details are scarce regarding Libratus. What caught my attention was the mention that it used “counterfactual regret minimization”, a conventional machine-learning technique that is not a form of deep learning.

Given all of the hype going to deep learning, I find it almost surprising… are there really still artificial intelligence researchers working on techniques other than deep learning? (I’m being half serious.)

Last year, I participated in a panel on algorithms and their impact on Canadian culture. The organizers retroactively renamed the panel “Are we promoting Canadian content through deep-learning algorithms?” Yet the panel did not address deep learning per se. The successes of deep learning have been so remarkable that “deep learning” has become synonymous with “algorithm”.

I was recently on a graduate scholarship committee… and it seems that every smart young computer scientist is planning to work on deep learning. Maybe I exaggerate a little, but barely. I have seen proposals to apply deep learning to everything, from recognizing cancer cells all the way to tutoring kids.

A similar process is under way in business. If you are a start-up in artificial intelligence, and you are not focused on deep learning, you have to explain yourself.

Of course, machine learning is a vast field with many classes of techniques. However, one almost gets the impression that all of the major problems are going to be solved using deep learning. In fact, some proponents of deep learning have almost made this explicit claim… they often grant that other techniques may work well… on the small problems… but they often stress that for the large problems, deep learning is bound to win out.

We will keep on seeing very hard problems being defeated using various techniques, often unrelated to deep learning. If I am right, this means that these young computer scientists and start-up founders who flock to deep learning should be cautious. They may end up in an overcrowded field, missing out on important breakthroughs happening elsewhere.

It is hard to predict the future. Maybe deep learning is, indeed, the silver bullet and we will soon “solve intelligence” using deep learning… all problems will fall using variations on deep learning… Or it could be that researchers will soon hit diminishing returns. They will need to work harder and harder for ever smaller gains.

There are significant limitations to deep learning. For example, when I reviewed scholarship applications… many of the young computer scientists aiming to solve hard problems with deep learning did not have correspondingly massive data sources. Having an almost infinite supply of data is a luxury few can afford.

I believe that one unstated assumption is that there must be a universal algorithm. There must exist some technique that makes software intelligent in a general way. That is, we can “solve intelligence”… we can build software in a generic way to solve all other problems.

I am skeptical of the notion of general intelligence. Kevin Kelly, in his latest book, suggests that there is no such thing. All intelligence is specialized. Our intelligence “feels” general to us, but that’s an illusion. We think we are good at solving most problems but we are not.

For example, the human brain is showing its limits with advanced mathematics. Given a lot of training and dedication, some of us can write formal proofs without error. However, we are highly inefficient at it. I predict that it is a matter of a decade or two before the unassisted human brain is recognized as being obsolete for research in advanced mathematics. The old guy doing mathematics on a blackboard? That’s going to look quaint.

So our brain is no silver bullet with respect to intelligence and thus I don’t believe that any one machine-learning technique can be a silver bullet.

I should qualify my belief. There might be one overarching “algorithm”. Matt Ridley points out in his latest book that everything evolves by a process akin to natural selection. Nature acquires an ever-growing bag of tricks that are being constantly refined. In effect, there is an overarching process of trial and error. This is truly general, but with a major trade-off: it is expensive. Our biology evolved, but it took all of Earth’s ecosystem millions of years to produce Homo sapiens. Our technology evolves, but it takes all of the power of human civilization to keep it going. It is also not fool-proof. There are regular extinction events.

**Credit**: Thanks to Martin Brooks for inspiring this blog post.

**Further reading**: DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker. Simon Funk’s Welcome to AI. Performance Trends in AI by Sarah.

It seems to me that extinction events are necessary.

They clear away the obsolete and are a source of renewal.

Like spring.

I think that there is at least one example of connectionist models solving general intelligence: humans!

1. Yes, I do think that humans are a form of quite “general” intelligence. We can solve general tasks, we are useful (I am not sure that this is up for debate), and AI at human level would be too.

2. Yes, “Deep Learning” ≠ “Brain”, but I could imagine that we can go from Deep Learning to brain-style performance without passing through a disruptive change in science. At some point “System 2 style” reasoning will be necessary for general AI, but, again, humans can do that with connectionist models.

> I do think that humans are a form of quite “general” intelligence.

It is already the case that cheap computers using little power can solve problems that no human being could ever solve in a reasonable amount of time. So it seems that there is no question about the lack of generality of our brains. We can’t come close with our own computers in basic cognitive tasks.

I totally agree with your words: “machine learning is a vast field with many classes of techniques”. That’s why I welcome and follow developments like IBM’s Watson, which continue in the tradition of the “symbolic school”, with algorithms that are less “black box” than neural networks.

Dear Daniel, I apologize in advance for taking the floor but deep learning and AI are topics for which I feel very passionate, and I work hard to keep up.

I agree there is a lot of hype in deep learning (DL), but that’s in large part justified by the awesome results obtained on many difficult problems which were considered until recently out of reach for artificial intelligence.

Just last year, AlphaGo beating the Go grandmaster Lee Sedol and Google Neural Machine Translation bridging the gap with human translators achieved two long-sought goals that date back to the early days of AI. These were historic moments.

That said, in machine learning there is a well-known theorem called “no free lunch”, which can be loosely interpreted as the lack of a priori distinctions between learning algorithms. This means that there is no “silver bullet” or “magic pill” in machine learning. It always depends on the datasets and the intended target.

Deep learning is powerful and is now the dominant solution for computer vision, speech recognition and machine translation. But there is a price to be paid… as you rightly wrote, that’s the huge quantity of data needed (Google-sized) to train any reasonable deep network and the computing power in keeping with it.

Moreover, neural networks are black boxes, and human interpretability is one of the biggest challenges of deep neural networks.

That leaves plenty of room for more classical machine learning algorithms like XGBoost, SVM, KNN, linear models and random forests…

In conclusion, research proceeds in waves, and now it’s the deep learning wave. Hype is not always a bad thing since it motivates people and allows an innovation to be pushed to its limits.

In my opinion, deep learning is a real breakthrough and moves us closer to strong AI, but I would be very surprised if deep learning were the end of the road.

> In my opinion, deep learning is a real breakthrough and moves us closer to strong AI, but I would be very surprised if deep learning were the end of the road.

I agree with that.

Thanks for an informative, interesting post.

However some advice to students, learners, and industry professionals excited about blindly applying Deep Neural Networks to everything:

1. Beware of over-fitting and bringing unnecessary complexity to what may turn out to be a problem with a simple structure.

Remember that so far the killer apps for DNNs have been problems with inherent complexity like understanding hierarchically structured data produced by nature e.g. speech, vision, etc. If you are analysing business metrics for example, DNNs may be overkill and may even give poor generalization in the field if not designed well.

2. Use a multiple-learner approach where you try different activation functions and combine or select from their outputs in some manner, just as is done in the bagging or boosting approaches in decision tree methods.

I will give you a simple example of using the wrong activation function that is quite realistic for many predominantly linear problems:

Consider a regression problem where you know that its structure is predominantly linear. There is one input x and one output y which you know can be modeled well as

y = mx + e where e is minor noise to be neglected for practical accuracies. You *know* this beforehand. What you don’t know is the value of m. Then the problem is simply to find m. In this case the solution is trivial:

– Take any one sample point (x,y) and find m as y/x.

However, consider what happens when you don’t apply your domain knowledge of this problem and blindly apply a sigmoidal neural network to it just because everyone else is doing it. Then say you will start with a single layer perceptron i.e. y’ = sigmoid(ax). Why are you choosing a sigmoid activation function here? Because the Universal Approximation Theorem told everyone that a 3-layer network of sigmoids can approximate anything to any accuracy.

But for our simple linear problem you will then find that, no matter what value of a you converge to in your training, you will never reproduce the correct linear fit y = mx with m = y/x! This is simply because, mathematically, a zero-centered sigmoidal function such as tanh(mx) is close to mx only for sufficiently small x.

Even if you keep madly deepening the network with one sigmoid layer after another to try to improve your accuracy, you will still find that it works as well as the linear function only for small x.

The lesson here is to use your domain knowledge or intuition of the problem to choose the right activation function or model.

3. Consider always the tradeoff between accuracy and cost of implementation and operation, both for the training as well as the inference phases.
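The linear-versus-sigmoid example in point 2 can be checked numerically. What follows is only a rough sketch under stated assumptions: the slope `M_TRUE = 2.0`, the input range, and the grid search over the weight `a` are all made up for illustration, and the grid search stands in for whatever training procedure one would really use. It compares the one-sample estimate m = y/x against the best single-unit fit y’ = activation(ax) for both the logistic sigmoid and tanh.

```python
import math

def logistic(z):
    """Standard logistic sigmoid, with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def mse(predict, xs, ys):
    """Mean squared error of a one-argument predictor over the data."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def best_single_unit_mse(activation, xs, ys):
    """Lowest MSE of y' = activation(a * x) over a grid of weights a.

    The grid (a from -10 to 10 in steps of 0.01) is an assumption that
    replaces gradient-based training in this sketch.
    """
    candidates = [i / 100.0 for i in range(-1000, 1001)]
    return min(mse(lambda x, a=a: activation(a * x), xs, ys) for a in candidates)

# Hypothetical noise-free linear data: y = M_TRUE * x on [-3, 3].
M_TRUE = 2.0
xs = [i / 10.0 for i in range(-30, 31)]
ys = [M_TRUE * x for x in xs]

# Domain knowledge: one sample with x != 0 is enough to recover m.
m_hat = ys[-1] / xs[-1]
linear_mse = mse(lambda x: m_hat * x, xs, ys)

# Blind approach: a single sigmoidal unit, with the best weight we can find.
logistic_mse = best_single_unit_mse(logistic, xs, ys)
tanh_mse = best_single_unit_mse(math.tanh, xs, ys)

# tanh(m x) does track m x closely, but only when x is small.
small_x_gap = abs(math.tanh(M_TRUE * 0.01) - M_TRUE * 0.01)

print("linear MSE:", linear_mse)
print("best logistic MSE:", logistic_mse)
print("best tanh MSE:", tanh_mse)
print("tanh gap at x = 0.01:", small_x_gap)
```

Because both activations are bounded while y is not, no choice of `a` can make the error vanish over this range, which is the point of the example.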

All right! In other words, don’t use a sledgehammer to kill a fly.

Because it’s trendy, deep learning is used carelessly for anything and everything.

Along the same lines, I would like to share with you a hilarious blog post by Joel Grus about a recruiter completely lost facing an obsessive TensorFlow developer… https://goo.gl/2oD6xS

> All right! In other words, don’t use a sledgehammer to kill a fly.

Ha. Well said. Actually in my example, the sledgehammer doesn’t even kill the fly while the rolled up newspaper does. What the sigmoid does is add more nonlinearity and worsen the mean square error!

Sometimes though, sigmoid NNs can reasonably approximate Volterra models as I found out in my 1993 Masters Thesis in DSP on Adaptive Nonlinear Echo Cancellation using feedforward nets. The MSE with a single layer NN was worse than the 2nd order Volterra model for the echo canceler but good enough for practical accuracies i.e. echo amplitudes.

The advantage of using sigmoid NN here is that it can be super-efficiently implemented as a single transistor in analog VLSI.

Volterra filters are a linear combination plus additional nonlinear terms that are higher powers and cross products of the inputs.
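To make that description concrete, here is a minimal sketch of a second-order Volterra filter evaluated on one window of recent inputs. The function name `volterra2`, the kernel names `h1` and `h2`, and all coefficient values are hypothetical, chosen only for illustration; they are not taken from the thesis mentioned above.

```python
def volterra2(x_window, h1, h2):
    """Second-order Volterra output for one window of recent inputs.

    x_window[i] is the input delayed by i samples.
    h1[i] weights the linear term x[i].
    h2[i][j] (for j >= i) weights the product x[i] * x[j], which covers
    both the squares (i == j) and the cross products (i < j).
    """
    n = len(x_window)
    linear = sum(h1[i] * x_window[i] for i in range(n))
    quadratic = sum(
        h2[i][j] * x_window[i] * x_window[j]
        for i in range(n)
        for j in range(i, n)
    )
    return linear + quadratic

# Hypothetical two-tap example.
window = [1.0, 2.0]
h1 = [0.5, 0.25]
h2 = [[0.1, 0.2],
      [0.0, 0.3]]  # the entry below the diagonal is unused

output = volterra2(window, h1, h2)
linear_only = volterra2(window, h1, [[0.0, 0.0], [0.0, 0.0]])
print(output, linear_only)
```

Setting every entry of `h2` to zero reduces the filter to an ordinary linear (FIR) filter, which is why the Volterra model can be read as a linear model with extra polynomial terms.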

I read Joel Grus’ post and it is hilarious! We can make a comedy skit out of this or something..

A few years ago I followed the machine learning course by Andrew Ng. He started with linear regression, a mathematical technique that is over a hundred years old! And I think he was right in doing so: why use a neural network for cases where linear regression suffices? Then he explained logistic regression, a simplified neural network for classification, with the advantage over general neural networks that it is guaranteed to converge to the optimal solution.

More recently I’ve been experimenting with the Spark library. I found decision trees a great technique, because it is quite simple, and has as output an if-else tree that is easy to understand, like this:

If (feature 434 0.0)

Predict: 0.0

Looking at all the weights in a neural network will make you none the wiser.

Quite true! Linear regression is reliable, robust and easy to interpret. Furthermore, linear regression is often used to introduce gradient descent, which is THE optimization method for deep neural networks.

Decision trees and particularly ensembles of trees (XGBoost, Random forest, Gradient Tree Boosting) are very powerful learning algorithms for a large range of applications and often usable directly out-of-the-box. And trees are easy to interpret!

In fact, real Deep Learning should be reserved for very complex models with typically hundreds of thousands to many millions of parameters.
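As a rough illustration of how linear regression introduces gradient descent, the sketch below fits a line y = mx + b by repeatedly stepping against the gradient of the mean squared error. The data, learning rate and step count are assumptions picked so that this toy problem converges; nothing here is specific to any course or library.

```python
def fit_line_gd(xs, ys, lr=0.05, steps=2000):
    """Fit y = m * x + b by plain batch gradient descent on the MSE.

    lr and steps are hypothetical defaults that happen to work for the
    small example below; real problems need tuning.
    """
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current line on every sample.
        residuals = [m * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_m = (2.0 / n) * sum(r * x for r, x in zip(residuals, xs))
        grad_b = (2.0 / n) * sum(residuals)
        # Step against the gradient.
        m -= lr * grad_m
        b -= lr * grad_b
    return m, b

# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

m, b = fit_line_gd(xs, ys)
print(m, b)
```

The same update rule, applied to many more parameters through backpropagation, is what trains a deep network; the difference is that for linear regression the error surface is convex, so there is a single optimum to find.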

And yes, neural networks are black boxes.

Nice post. Sure, there is no such thing as GAI. Intelligence is a verb, not a noun. Like language, it emerges and gains substance within a context; outside of it, it is meaningless. What is really interesting is the societal aspects of intelligence, dealing with conflicts between two types of intelligence: us and algorithms that can know us better than we do.

Nice post. In regard to the “one overarching algorithm”, you might be interested in Prof. Pedro Domingos’s book “The Master Algorithm” (https://www.amazon.com/Master-Algorithm-Ultimate-Learning-Machine/dp/0465065708). Here’s his relevant talk at Google: https://youtu.be/B8J4uefCQMc. (Disclaimer: I don’t agree with all ideas expressed in the talk, but I’m sharing it for the sake of comprehensiveness).