ML for Noobs: The Landscape


By: Manas Reddy

It’s just a joke, even if they do sue me for all I’m worth, go ahead. You can take my three dollars.

Okay, actually getting into the details.

Machine Learning and Artificial Intelligence are terms that have just been thrown around SO much that people have no idea what they mean, how they work, or how they’re used. People picture robots (“Thanks Elon”), real-life Terminators, and CNBC.

When in actuality it’s a very simple concept. If you’ve ever had a pet, you’ll remember that when you wanted it to do something, you’d try to teach it: if your pet did it, you’d give it a treat, and if it didn’t, well, you’d punish it. The same thing applies here: you give your computer a task, tell it what the correct correlation looks like, and let it try whatever it wants. When the computer identifies the correct correlation, it’s rewarded, so it figures out rules that lead to that correct correlation. Sorta like “Buddy” your Golden Retriever.
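To make the treat-and-reward picture concrete, here is a toy sketch in Python. It is purely illustrative, with a made-up odd/even task and made-up rule names, not any real ML library: the “computer” tries a few candidate rules and keeps whichever one earns the most treats.

```python
# A toy version of the treat-and-reward idea (purely illustrative, no real ML here):
# the "computer" tries a few candidate rules and keeps the one that earns the most
# treats, i.e. the most correct answers on the examples it was shown.
examples = [(1, "odd"), (2, "even"), (3, "odd"), (4, "even"), (5, "odd")]

candidate_rules = {
    "always say odd": lambda n: "odd",
    "always say even": lambda n: "even",
    "check the remainder": lambda n: "odd" if n % 2 else "even",
}

# One treat per example a rule gets right
treats = {name: sum(rule(n) == label for n, label in examples)
          for name, rule in candidate_rules.items()}

best_rule = max(treats, key=treats.get)
print(treats)                     # how many treats each rule earned
print("Buddy keeps:", best_rule)  # the rule that earned the most treats
```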

For example, consider programming a spam filter using a traditional programming approach:

  1. First, you’d consider what spam e-mails look like. You might notice that some words or phrases (“4u”, “credit card”, “free”, and “amazing”) come up a lot in the subject line.
  2. Then you’d build a detection algorithm for each of the patterns you noticed, and your program would flag emails based on the words or phrases you gave it.
  3. Simple, but not efficient, because spammers would just change the keywords, and you’d have to go back and change your rules again. See how tedious that gets?

In contrast, a spam filter based on Machine Learning would automatically learn the rules based on what you classify as spam and filter new emails accordingly. Much less overhead, easier to maintain, and thus more efficient.
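To make the contrast concrete, here is a minimal sketch of both approaches. The emails, labels, and keyword list are all made up for illustration, and the learning half assumes scikit-learn is available; a real spam filter would be far more involved.

```python
# A minimal sketch of both approaches, with made-up emails and labels.
# The ML half assumes scikit-learn is installed (pip install scikit-learn).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "amazing offer, get a free credit card 4u",  # spam
    "free vacation, click now",                  # spam
    "meeting notes from monday",                 # ham
    "lunch tomorrow?",                           # ham
]
labels = ["spam", "spam", "ham", "ham"]

# 1) Traditional approach: hand-written keyword rules
SPAM_WORDS = {"4u", "credit card", "free", "amazing"}

def rule_based(subject):
    return "spam" if any(word in subject.lower() for word in SPAM_WORDS) else "ham"

# 2) Machine Learning approach: learn which words signal spam from labeled examples
vectorizer = CountVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(emails), labels)

# Spammers switch wording, so the hand-written rules miss the new email...
new_spam = "win a brand new prize today"
print(rule_based(new_spam))  # 'ham' -- you would have to edit SPAM_WORDS by hand

# ...but the ML filter only needs the newly flagged example and a re-fit.
emails.append(new_spam)
labels.append("spam")
model = MultinomialNB().fit(vectorizer.fit_transform(emails), labels)
print(model.predict(vectorizer.transform(["win a prize now"]))[0])  # 'spam'
```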

Another area where Machine Learning shines is problems that are either too complex for traditional approaches or have no known algorithm. Take recognizing spoken words: you could hardcode an algorithm that measures high-pitch sound intensity and uses it to distinguish words, but that algorithm will not scale to identifying words in a noisy environment. The best solution is to write an algorithm that learns by itself, given many example recordings of each word.

To summarize, Machine Learning is great for:

  1. Problems for which existing solutions require a lot of fine-tuning or a long list of rules
  2. Complex problems for which using the traditional approach yields no solution
  3. Fluctuating environments
  4. Getting insights about complex problems and large amounts of data.

Types of Machine Learning

Machine Learning systems are broadly classified according to three criteria:

  1. Whether or not they are trained with human supervision (Supervised, Unsupervised, Semi-Supervised, and Reinforcement Learning)
  2. Whether or not they can learn incrementally on the fly (Online versus Batch Learning)
  3. Whether they work by simply comparing new data points to known data points, or instead by detecting patterns in the training data and building a predictive model (Instance-based vs Model-based; see the sketch below)
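As a rough illustration of point 3, here is a tiny sketch with made-up data (hours studied versus passing an exam), again assuming scikit-learn: the instance-based learner compares a new point to the stored examples, while the model-based learner fits a model and predicts from it.

```python
# A tiny sketch of point 3 with made-up data (scikit-learn assumed):
# hours studied -> whether a student passed (1) or not (0).
from sklearn.neighbors import KNeighborsClassifier   # instance-based
from sklearn.linear_model import LogisticRegression  # model-based

hours = [[1], [2], [3], [6], [7], [8]]
passed = [0, 0, 0, 1, 1, 1]

# Instance-based: store the examples and compare new points to their nearest neighbors
knn = KNeighborsClassifier(n_neighbors=3).fit(hours, passed)

# Model-based: fit a model to the data and predict from the model
logreg = LogisticRegression().fit(hours, passed)

new_student = [[5]]
print(knn.predict(new_student))     # majority vote of the 3 closest known students
print(logreg.predict(new_student))  # prediction from the fitted model
```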

Main challenges of Machine Learning

  1. Insufficient Quantity of Training Data: For a toddler to learn what a ball is, all you do is point to a ball and say “ball”. Machine Learning isn’t quite there yet. Even for simple problems you typically need thousands of examples, and for complex problems such as image or speech recognition, you may need millions.
  2. Non-representative Training Data: In order for the model to generalize well, it is crucial that the training data be representative of the new cases you want to generalize to. With a non-representative training set, you are prone to training a model that is unlikely to make accurate predictions.
  3. Poor Quality Data: Obviously if your training set is full of errors, outliers, and noise, it will make it harder for the system to detect the underlying patterns, so your system is less likely to perform well.
  4. Irrelevant features: As the saying goes “Garbage in, Garbage Out”. Your system will only be capable of learning if the training data contains enough relevant features and not too many irrelevant ones. A critical part of success is coming up with a good set of features to train on.
  5. Overfitting the data: Suppose your girlfriend cheats on you (“This hit too close to home”); generalizing that all girls are bad is wrong. (“Guilty”) Machine Learning is prone to this too: the model fits the training data so well that it fails to generalize correctly to new data. See the sketch after this list.
  6. Underfitting: As you probably would have guessed, underfitting is the opposite of overfitting; it occurs when the model is too simple to learn the underlying structure of the data.
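To see underfitting and overfitting side by side, here is a rough NumPy sketch with made-up data: polynomials of increasing degree are fit to a few noisy samples of a simple curve, and the error on the training points is compared with the error on fresh points.

```python
# A rough sketch of under- vs overfitting with made-up data: fit polynomials of
# increasing degree to noisy samples of a simple curve and compare errors.
import numpy as np

rng = np.random.default_rng(42)
x_train = np.linspace(0, 1, 10)
x_test = np.linspace(0, 1, 100)
y_train = np.sin(np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
y_test = np.sin(np.pi * x_test) + rng.normal(0, 0.2, x_test.size)

for degree in (1, 3, 9):  # too simple, reasonable, very flexible
    coeffs = np.polyfit(x_train, y_train, degree)                      # fit on training data
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)  # error on seen data
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)     # error on new data
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

# Typically the degree-1 line does badly on both sets (underfitting), while the
# degree-9 polynomial scores near zero on training but worse on new data (overfitting).
```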

I know this is a lot to take in. But trust me, it gets easier with some more practice. If you relate what you learn to scenarios around you, like things that just happen in everyday life, I believe learning anything gets easier. So if you’re learning something, just think of something that uses the same logic, and you’ll always remember it.
