Machine Learning, Deep Learning, Neural networks… The Artificial Intelligence lexicon is full of terms and expressions, and it’s easy to get lost in the “world of data”.
Here you’ll find Nuukik’s simple explanations to help you better understand the subject.
Artificial Intelligence is the set of theories and techniques used to simulate human intelligence through machines.
This concept has been around for decades, but has become far more visible in recent years. It has inspired curiosity and fantasies that are still a long way from reality.
Machine Learning enables machines to solve problems without having been explicitly programmed to do so. Instead, mathematical models are built from the analysis of past observations and then used to make predictions or decisions.
Some examples of Machine Learning algorithms: collaborative filtering, linear regression, random forest, …
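To make one of these algorithms concrete, here is a minimal sketch of a linear regression: a model is fitted to a few past observations and then used to predict a new value. The numbers and the function name `fit_line` are invented for illustration, and the sketch assumes NumPy is installed.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y ≈ slope * x + intercept."""
    # Design matrix: one column for x, one column of ones for the intercept.
    A = np.column_stack([x, np.ones_like(x)])
    (slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
    return slope, intercept

# Hypothetical past observations (illustrative numbers only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])  # generated from y = 2x + 1

slope, intercept = fit_line(x, y)

# Use the fitted model to predict the value for an unseen input.
prediction = slope * 6.0 + intercept
```

Nobody wrote the rule "multiply by 2 and add 1" into the program; the model recovered it from the observations.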
Neural networks are directly inspired by the human brain. A neural network consists of interconnected artificial neurons that pass data to one another in order to solve complex tasks.
Deep Learning extends the concept of neural networks by stacking many “layers” of neurons. The goal is to detect patterns and concepts that are too difficult for humans to describe explicitly. Deep networks are often used to analyze unstructured data (images, sounds, texts, etc.), where the aim is to capture abstract concepts.
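As a rough illustration of how data travels through interconnected neurons, the sketch below passes one input through a tiny network with a single hidden layer. The layer sizes, the random (untrained) weights and the names `sigmoid` and `forward` are assumptions chosen for the example, not a reference design.

```python
import numpy as np

def sigmoid(z):
    # Squashes any number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Pass an input through one hidden layer and one output neuron."""
    hidden = sigmoid(W1 @ x + b1)       # each hidden neuron combines all inputs
    output = sigmoid(W2 @ hidden + b2)  # the output neuron combines the hidden neurons
    return output

# Untrained, random weights: 2 inputs -> 3 hidden neurons -> 1 output.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))
b1 = np.zeros(3)
W2 = rng.normal(size=(1, 3))
b2 = np.zeros(1)

x = np.array([0.5, -1.0])
y = forward(x, W1, b1, W2, b2)
```

Adding more hidden layers between input and output is, loosely speaking, what makes such a network "deep".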
[Data Scientist / Data Engineer / Data Analyst]
In the world of data, there are three key professions:
Data Scientists build algorithms and predictive models, perform analyses and make recommendations, and have a very good understanding of the company’s business issues.
Data Engineers set up, develop and contribute to the entire data infrastructure (construction, maintenance, operations, etc.) in order to make data available to the company and its business units.
Data Analysts analyze and cross-reference the company’s data, and interpret it either through one-shot analyses or by building dashboards.
[Training or Learning]
In machine learning, and in deep learning in particular, Training (or Learning) is the phase in which an algorithm adjusts itself over successive iterations in order to find the best solution.
Overfitting degrades the performance of machine learning algorithms. It occurs when the algorithm overlearns: it learns not only the useful signal in the data but also patterns (schemas, structures) that are unrelated to the problem, such as noise. Noise alters the collected data and can make the relationship we are trying to predict difficult to learn, or even make modeling impossible. An overfitted model performs well on its training data but poorly on new data.
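The idea of learning through successive attempts can be sketched with gradient descent, one common training technique (the text above does not name a specific one). The model here has a single weight `w`, and the data, learning rate and number of steps are illustrative assumptions.

```python
import numpy as np

# Toy data: the relationship to discover is y = 3 * x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w = 0.0               # initial guess for the weight
learning_rate = 0.01  # how far to move at each step
losses = []

for step in range(200):
    error = w * x - y                    # how wrong the current model is
    losses.append(np.mean(error ** 2))   # mean squared error before the update
    gradient = 2.0 * np.mean(error * x)  # slope of the loss with respect to w
    w -= learning_rate * gradient        # move w in the direction that lowers the loss
```

Each pass through the loop is one "experiment": the model measures its error and corrects itself slightly, so `w` moves toward 3 and the recorded losses shrink.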
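Overfitting can be observed on a small synthetic example. In the sketch below, the true relationship is a straight line with added noise; a simple model (degree 1) and a very flexible one (degree 9) are fitted to the same ten noisy points. The data, noise level and variable names are assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)

def true_relationship(x):
    return 2.0 * x

# Ten noisy training observations.
x_train = np.linspace(0.0, 1.0, 10)
y_train = true_relationship(x_train) + rng.normal(scale=0.3, size=x_train.size)

# New, noise-free points that were not seen during fitting.
x_new = np.linspace(0.0, 1.0, 200)
y_new = true_relationship(x_new)

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

simple = Polynomial.fit(x_train, y_train, deg=1)    # matches the true shape
flexible = Polynomial.fit(x_train, y_train, deg=9)  # enough freedom to fit the noise

train_error_simple = mse(simple, x_train, y_train)
train_error_flexible = mse(flexible, x_train, y_train)
new_error_simple = mse(simple, x_new, y_new)
new_error_flexible = mse(flexible, x_new, y_new)
```

The flexible model passes through every noisy training point, so its training error is close to zero. Comparing the two `new_error_*` values shows how each model fares on data it has not seen; in setups like this one the flexible model usually does worse, because it has memorized the noise.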
[Training set / Validation set / Testing set]
The three stages of data-driven development are training, validation, and evaluation.
Training sets are used for learning: algorithms receive the data and the data scientist uses it to design their model.
Validation sets are used to evaluate the model while it is being trained. They are also used to compare different algorithms or settings.
Testing sets (also called evaluation sets) are used to evaluate the model once, at the end, and give it a reliability score.
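The three sets described above are usually obtained by shuffling one dataset and cutting it into three parts. Here is a minimal sketch; the 70/15/15 proportions and the function name `split_dataset` are assumptions, and other ratios are common.

```python
import numpy as np

def split_dataset(data, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle the data and cut it into training, validation and testing sets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(data))  # random order, no repeats
    n_train = round(train_frac * len(data))
    n_val = round(val_frac * len(data))
    train = data[indices[:n_train]]
    val = data[indices[n_train:n_train + n_val]]
    test = data[indices[n_train + n_val:]]  # whatever remains
    return train, val, test

data = np.arange(100)  # stand-in for 100 observations
train_set, val_set, test_set = split_dataset(data)
```

Because every observation lands in exactly one set, the final score on the testing set is measured on data the model never saw during training or tuning.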
[Supervised learning / Unsupervised learning]
There are two main types of machine learning. Both consist of training a machine on data that has been integrated, structured, and then analyzed with human intervention.
Supervised learning is when, during training, the algorithm is given the “right answer” to the question we want to answer.
Unsupervised learning is when we simply ask the algorithm to group the data based on their similarity.
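The difference between the two can be sketched on the same six measurements. The supervised part uses the known labels to learn one center per class; the unsupervised part ignores the labels and groups the values by similarity with a basic k-means loop (k = 2). The data and names are invented for illustration.

```python
import numpy as np

# Hypothetical measurements forming two visibly separate groups.
values = np.array([1.0, 1.2, 0.8, 5.0, 5.3, 4.9])
labels = np.array([0, 0, 0, 1, 1, 1])  # the "right answers" (supervised case only)

# Supervised: learn one center per known label, then classify new values.
centers = np.array([values[labels == k].mean() for k in (0, 1)])

def predict(v):
    """Return the label whose center is closest to v."""
    return int(np.argmin(np.abs(centers - v)))

# Unsupervised: no labels are used; values are grouped by similarity.
guess = np.array([values.min(), values.max()])  # initial centers
for _ in range(10):
    # Assign each value to its nearest center, then recompute the centers.
    groups = np.argmin(np.abs(values[:, None] - guess[None, :]), axis=1)
    guess = np.array([values[groups == k].mean() for k in (0, 1)])
```

In the supervised case the algorithm is told which group each value belongs to; in the unsupervised case it finds the two groups on its own, without knowing what they mean.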
[Feature / Feature engineering]
A feature is a measurable characteristic of the data that is given to a model as input. Feature engineering uses domain knowledge to extract such features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms.
Big Data refers to a quantity of data so large that a single conventional data-processing tool cannot handle it alone, and that therefore requires parallel processing on several machines.
A Data Lake is a storage repository that can hold large volumes of structured, semi-structured and unstructured data. It is a huge container ready to receive data in its “raw” form.
A Data Warehouse is a system designed to store and manage data from different source systems for analysis. The data stored in a data warehouse has been pre-processed and structured for future use.
A Data Leak, or data breach, is the intentional or unintentional release of an organization’s confidential information into an untrusted environment. Data leakage threats typically occur via the web and email, but also via mobile data storage devices such as optical media, USB sticks and laptops.
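As a small illustration of feature engineering, the sketch below turns a raw timestamp into features a model can use. The domain knowledge assumed here is hypothetical: that behavior differs by hour of day and between weekdays and weekends. The function name `extract_features` and the timestamp format are also assumptions.

```python
from datetime import datetime

def extract_features(timestamp: str) -> dict:
    """Turn a raw 'YYYY-MM-DD HH:MM' string into model-ready features."""
    moment = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
    return {
        "hour": moment.hour,
        "day_of_week": moment.weekday(),      # Monday = 0 ... Sunday = 6
        "is_weekend": moment.weekday() >= 5,  # Saturday or Sunday
    }

features = extract_features("2024-03-09 14:30")  # 9 March 2024 was a Saturday
```

The raw string means little to an algorithm, whereas "hour 14 on a weekend" is something it can relate to the outcome being predicted.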
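The principle of parallel processing can be sketched at a very small scale: the data is cut into chunks, each chunk is handled by a separate worker, and the partial results are combined. In this illustration, threads on one computer stand in for the several machines of a real Big Data system, and the names `partial_sum` and `parallel_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """Work done independently by one worker."""
    return sum(chunk)

def parallel_sum(values, n_workers=4):
    """Split values into chunks, process them in parallel, combine the results."""
    size = max(1, -(-len(values) // n_workers))  # ceiling division, at least 1
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)

total = parallel_sum(list(range(1, 1001)))
```

Real Big Data tools apply the same split, process and combine pattern, but distribute the chunks across many machines instead of threads.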
Complex expressions but definitions within everyone’s reach. Thanks to Nuukik, improve your understanding of Artificial Intelligence, its technologies and its uses. Do not hesitate to contact us to discuss data.