[Infographic] Data Lexicon

Machine Learning, Deep Learning, Neural networks… The Artificial Intelligence lexicon is full of terms and expressions, and it’s easy to get lost in the “world of data”.

Here you’ll find Nuukik’s simple explanations to help you better understand the subject.

[Artificial Intelligence]

Artificial Intelligence is the set of theories and techniques used to simulate human intelligence through machines.
The concept has existed for decades but has become far more widespread in recent years. It has inspired curiosity and fantasies that are still far from reality.

Figure: nested fields, from the broadest to the most specific: Artificial Intelligence > Machine Learning > Neural networks > Deep Learning.

[Machine Learning]

Machine Learning enables machines to solve problems without being explicitly programmed to do so. Instead, mathematical models are built by analyzing past observations and are then used to make predictions or decisions.

Some examples of Machine Learning algorithms: collaborative filtering, linear regression, random forest, …
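As an illustration of “learning from past observations”, here is a minimal sketch of linear regression, one of the algorithms listed above, fitted by ordinary least squares in plain Python. The data (weekly advertising spend versus units sold) is invented for the example and the function name is hypothetical; this is not a description of any particular production system.

```python
def fit_linear_regression(xs, ys):
    """Learn a slope and an intercept from past observations (ordinary least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of co-deviations / sum of squared deviations, i.e. cov(x, y) / var(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical past observations: weekly ad spend (thousands of euros) and units sold.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [12.0, 19.0, 31.0, 38.0, 50.0]

slope, intercept = fit_linear_regression(ad_spend, units_sold)
prediction = slope * 6.0 + intercept  # estimate for a spend level never observed
print(f"units = {slope:.1f} * spend + {intercept:.1f}; predicted units at 6.0: {prediction:.1f}")
```

The rule linking spend and sales is never written into the program: on this data the model estimates units = 9.5 × spend + 1.5 from the observations, then reuses it on a value it has not seen.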

[Neural network]

Neural networks are inspired by the human brain. A neural network consists of interconnected artificial neurons that act as pathways through which data flows in order to solve complex tasks.
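A minimal sketch of how data flows through artificial neurons: each neuron computes a weighted sum of its inputs plus a bias, then applies an activation function (a sigmoid here, one common choice). The weights, biases and helper names below are arbitrary, hypothetical values for illustration; in a real network they are learned during training.

```python
import math


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of its inputs plus a bias, then activation."""
    return sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)


def layer(inputs, weight_rows, biases):
    """A layer is several neurons reading the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
x = [0.5, -1.0]
hidden = layer(x, weight_rows=[[0.8, -0.2], [0.4, 0.9]], biases=[0.1, -0.3])
output = neuron(hidden, weights=[1.5, -1.1], bias=0.2)  # hidden outputs feed the next neuron
print(hidden, output)
```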

[Deep Learning]

Deep Learning takes the concept of neural networks further by stacking many “layers” of neurons. The goal is to detect patterns and concepts that are too complex for humans to describe explicitly. Deep learning models are often used to analyze unstructured data (images, sounds, texts, etc.), where the aim is to capture abstract concepts.
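To show what stacking layers means structurally, the sketch below chains several fully connected layers and passes data through them one after another. The layer sizes, the ReLU activation, the random weights and the function names are illustrative assumptions, and the network is untrained, so its outputs carry no meaning.

```python
import random


def relu(z):
    """A common activation function: keep positive values, zero out negatives."""
    return max(0.0, z)


def dense_layer(inputs, weights, biases):
    """Fully connected layer: every neuron sees every input."""
    return [relu(sum(x * w for x, w in zip(inputs, row)) + b) for row, b in zip(weights, biases)]


def make_deep_network(layer_sizes, seed=0):
    """Create random (untrained) weights for each pair of consecutive layers."""
    rng = random.Random(seed)
    network = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        network.append((weights, biases))
    return network


def forward(network, inputs):
    """Pass the data through every layer in turn; each layer builds on the previous one."""
    activations = inputs
    for weights, biases in network:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hypothetical shape: 4 inputs -> three hidden layers of 8 neurons -> 2 outputs.
net = make_deep_network([4, 8, 8, 8, 2])
print(len(net), forward(net, [0.2, -0.5, 1.0, 0.3]))
```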

[Data Scientist / Data Engineer / Data Analyst]

In the world of data, there are 3 key professions:

Data Scientist
Data Engineer
Data Analyst

Data Scientists build algorithms and predictive models, perform analyses and make recommendations, and have a very good understanding of the company’s business issues.

Data Engineers set up, develop and contribute to the entire data infrastructure (construction, maintenance, operations, etc.) in order to make data available to the company and its business units.

Data Analysts analyze and cross-reference the company’s data and interpret it, either through one-off analyses or by building dashboards.

[Training or Learning]

In deep learning, Training or Learning is the phase in which algorithms learn through successive trials, gradually adjusting their parameters in order to find the best solution.
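A common way to carry out training is gradient descent: predict, measure the error, nudge the parameters in the direction that reduces it, and repeat. The sketch below applies it to a one-neuron linear model y = w*x + b on toy data; the learning rate, the number of epochs, the data and the `train` function are assumptions for illustration, and deep learning frameworks compute the gradients automatically instead of by hand.

```python
def train(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # start from an uninformed guess
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors with the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient: each iteration slightly reduces the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 3x + 2 (an assumption for the example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x + 2.0 for x in xs]
w, b = train(xs, ys)
print(round(w, 3), round(b, 3))  # -> 3.0 2.0
```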

[Overfitting]

Overfitting degrades the performance of machine learning algorithms on new data. It occurs when the algorithm over-learns: in other words, it learns not only from the data but also from patterns (structures) that are unrelated to the problem, such as noise. Noise alters the collected data and can make it difficult to learn the relationship we are trying to predict, or even make modeling impossible.
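A small, deterministic illustration of overfitting under assumed toy data: the true relationship is y = 2x + 1, and the training points carry fixed, made-up noise. A least-squares straight line is compared with an overly flexible model, the polynomial that passes exactly through every training point (Lagrange interpolation), which therefore also learns the noise. All names are hypothetical helpers for this example.

```python
# Illustrative data (assumption): true relationship y = 2x + 1 plus fixed "noise".
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.3, -0.4, 0.5, -0.3, 0.4, -0.5]
train_y = [2.0 * x + 1.0 + e for x, e in zip(train_x, noise)]
# Test points lie between the training points and carry no noise.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [2.0 * x + 1.0 for x in test_x]


def fit_line(xs, ys):
    """Simple model: least-squares straight line y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b


def fit_interpolator(xs, ys):
    """Overly flexible model: the polynomial through every training point (Lagrange form)."""
    def predict(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            weight = 1.0
            for k, xk in enumerate(xs):
                if k != j:
                    weight *= (x - xk) / (xj - xk)
            total += yj * weight
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on a set of points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


line = fit_line(train_x, train_y)
poly = fit_interpolator(train_x, train_y)
for name, model in [("line", line), ("interpolating polynomial", poly)]:
    print(f"{name}: train MSE = {mse(model, train_x, train_y):.3f}, "
          f"test MSE = {mse(model, test_x, test_y):.3f}")
```

On this data the interpolating polynomial reaches zero training error but a much larger test error than the straight line: it has memorized the noise instead of the underlying relationship.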

[Training set / Validation set / Testing set]

In machine learning, the available data is usually split into three sets, matching the three stages of development: training, validation, and evaluation (testing).

Training set
Validation set
Testing set

The training set is used for learning: the algorithm receives this data, and the data scientist uses it to build the model.

The validation set is used to evaluate the model while it is being developed, and to compare different algorithms or settings.

The test set (or evaluation set) is used only once, at the end, to evaluate the final model and give it a reliability score.
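A minimal sketch of how a dataset might be split into the three sets, assuming 70/15/15 proportions and a fixed random seed (both arbitrary choices for illustration; `split_dataset` is a hypothetical helper, not a library function).

```python
import random


def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle the samples, then cut them into training, validation and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the rest is held out until the final evaluation
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # -> 70 15 15
```

The three sets never share a sample, so the final score measures performance on data the model has never seen.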

[Supervised learning / Unsupervised learning]

There are two main types of machine learning. Both consist of training a machine on data that has been integrated, structured and then analyzed with human intervention.

Supervised learning
Unsupervised learning

In supervised learning, the algorithm is given, during training, the “right answer” to the question we want it to answer.
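A minimal sketch of supervised learning using a 1-nearest-neighbor classifier, chosen here only because it is short: each training example comes with its “right answer” (a label), and a new point receives the label of the most similar training example. The measurements, labels and function name are hypothetical.

```python
def nearest_neighbor_predict(labeled_examples, point):
    """Return the label of the training example closest to `point` (squared Euclidean distance)."""
    _, closest_label = min(
        labeled_examples,
        key=lambda example: sum((a - b) ** 2 for a, b in zip(example[0], point)),
    )
    return closest_label


# Hypothetical training data: (height_cm, weight_kg) with a size label provided by a human.
training_data = [
    ((150, 50), "small"),
    ((155, 55), "small"),
    ((185, 85), "large"),
    ((190, 90), "large"),
]

print(nearest_neighbor_predict(training_data, (152, 52)))  # -> small
print(nearest_neighbor_predict(training_data, (188, 88)))  # -> large
```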

In unsupervised learning, no answers are provided: we simply ask the algorithm to group the data by similarity.
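An unsupervised algorithm receives no labels at all. The sketch below is a basic k-means clustering, one classic way to group data by similarity; the points, the choice of k = 2, the naive initialization (the first k points) and the fixed number of iterations are assumptions made to keep it short.

```python
import math


def k_means(points, k, iterations=10):
    """Group points into k clusters by similarity; no labels are given.

    Simplifications (assumptions): the first k points are the initial centroids,
    and a fixed number of iterations replaces a convergence test.
    """
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters


# Hypothetical 2-D measurements, with no "right answer" attached.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
for cluster in k_means(points, k=2):
    print(cluster)
```

On these points the algorithm separates the three points near (1, 1) from the three near (8.5, 8.5) without ever being told which group is which.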

[Feature / Feature engineering]

A feature is a measurable characteristic of the data that a model uses as input. Feature engineering uses domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms.
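A minimal sketch of feature engineering on hypothetical e-commerce data: raw order rows are turned into one feature vector per customer. Choosing which features to build (here order count, average basket and days since last order) is where domain knowledge comes in; these three, the data and the `build_features` helper are illustrative assumptions, not a recommended set.

```python
from datetime import date

# Hypothetical raw data: one row per order (customer id, order date, basket amount in euros).
raw_orders = [
    ("alice", date(2024, 1, 5), 40.0),
    ("alice", date(2024, 2, 10), 60.0),
    ("bob", date(2024, 2, 20), 15.0),
]


def build_features(orders, reference_day):
    """Turn raw order rows into one dictionary of features per customer."""
    per_customer = {}
    for customer, day, amount in orders:
        per_customer.setdefault(customer, []).append((day, amount))
    features = {}
    for customer, rows in per_customer.items():
        amounts = [amount for _, amount in rows]
        last_order = max(day for day, _ in rows)
        features[customer] = {
            "order_count": len(rows),
            "average_basket": sum(amounts) / len(amounts),
            "days_since_last_order": (reference_day - last_order).days,
        }
    return features


print(build_features(raw_orders, reference_day=date(2024, 3, 1)))
```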

[Big Data]

Big Data refers to a quantity of data so large that conventional data processing software cannot handle it on a single machine; it therefore requires parallel processing across several machines.
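Parallel processing of large datasets often follows a split, process, combine pattern (popularized as MapReduce). The sketch below shows that pattern on a word count; threads on a single computer stand in for separate machines, an assumption made so the example runs anywhere, whereas real Big Data systems distribute the chunks over a cluster. The function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """'Map' step: each worker counts the words in its own slice of the data."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, n_workers=4):
    """Split the data, process the chunks in parallel, then combine the partial results."""
    chunks = [lines[i::n_workers] for i in range(n_workers)]  # one chunk per worker
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # 'Reduce' step: merge each worker's partial counts into one total.
        for partial in pool.map(count_words, chunks):
            total.update(partial)
    return total


print(parallel_word_count(["big data needs", "parallel processing", "big data"]))
```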

[Data Lake]

A Data Lake is a storage repository that can hold large volumes of structured, semi-structured and unstructured data. It is a huge container ready to receive large volumes of “raw” data.
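A toy sketch of the “raw container” idea, assuming a local temporary folder stands in for the lake: files of different formats are stored as-is, and structure is applied only when a given use case reads them (often called schema-on-read). File names and contents are hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical lake: a temporary local folder where data lands as-is, whatever its format.
lake = Path(tempfile.mkdtemp()) / "raw"
lake.mkdir()
(lake / "orders.csv").write_text("order_id,amount\n1,40.0\n2,15.5\n")  # structured
(lake / "clicks.json").write_text(json.dumps({"user": "alice", "pages": ["home", "cart"]}))  # semi-structured
(lake / "review.txt").write_text("Great product, fast delivery!")  # unstructured

# Schema-on-read: structure is applied only when a specific use case reads the data.
with (lake / "orders.csv").open(newline="") as f:
    total = sum(float(row["amount"]) for row in csv.DictReader(f))
clicks = json.loads((lake / "clicks.json").read_text())

print(sorted(p.name for p in lake.iterdir()), total, clicks["pages"])
```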

[Data Warehouse]

A Data Warehouse is a system designed to store and manage data from different source systems for exploratory analysis. The data stored in a data warehouse has been pre-processed and structured in advance for future use.
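A toy sketch of the warehouse approach, using Python’s built-in `sqlite3` module with an in-memory database as a stand-in for a real warehouse (an assumption for illustration). Records from two hypothetical source systems arrive in inconsistent formats; they are cleaned and harmonized before loading, so that analytical queries run directly on structured data.

```python
import sqlite3

# Hypothetical extracts from two source systems, in inconsistent formats.
shop_sales = [("2024-03-01", "FR", "40,00"), ("2024-03-01", "BE", "15,50")]  # text amounts, comma decimals
marketplace_sales = [{"day": "2024-03-01", "country": "fr", "amount": 22.5}]  # lowercase country codes

conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
conn.execute("CREATE TABLE sales (day TEXT, country TEXT, amount REAL, source TEXT)")

# Pre-processing before loading: harmonize country codes and numeric formats.
rows = [(day, country.upper(), float(amount.replace(",", ".")), "shop") for day, country, amount in shop_sales]
rows += [(r["day"], r["country"].upper(), r["amount"], "marketplace") for r in marketplace_sales]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Exploratory analysis on clean, structured data.
query = "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
for country, total in conn.execute(query):
    print(country, total)  # -> BE 15.5, then FR 62.5
```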

[Data Leak]

A Data Leak, or data breach, is the intentional or accidental release of an organization’s information into an untrusted environment. Data leakage threats typically occur via the web and email, but also via portable storage devices such as optical media, USB sticks and laptops.

Complex expressions, but definitions within everyone’s reach. With Nuukik, improve your understanding of Artificial Intelligence, its technologies and its uses. Do not hesitate to contact us to discuss data.

Want to find out more about our company and our services? We are available by email, phone and chat.