LaDissertation.com

Introduction to Random Forests


11 October 2021 • Dissertation • 4,057 words (17 pages)



Table of contents

Project instructions
Introduction
Required Knowledge
Ensemble learning
Decision trees
A. Classification decision trees
B. Regression trees
C. Limitations of decision trees
The Random Forest Method
The model
A. Bagging and pasting
B. The creation of the data sets
C. Training of the decision trees
D. Iteratively repeating steps B and C
When should Random Forest be applied?
A. Inference and prediction
B. Advantages and disadvantages of the random forest model
Applications
Detection
Conclusion

Project instructions:


Write a 10- to 15-page document on any topic related to machine learning. It could be an investigation of a method seen in class, a study of a new machine learning method, or an application of machine learning that you find appealing.

Introduction:

Humans have always learned from past experience, while machines follow the instructions we give them. Combining these two ideas gave rise to machine learning, whose objective is to give machines the tools to learn from data sets and improve on their own. Contrary to common belief, machine learning is everywhere, and its applications are not as hard to grasp as they may seem. The price of your Uber ride and Spotify's “songs you might like” playlist both rely on machine learning models. If Spotify gave the real name of the program behind that playlist (a similar playlist can be built with a k-nearest neighbors algorithm, if you want to try it yourself), the feature would sound far less attractive. The field also extends well beyond Uber and Spotify: machine learning is used in healthcare, sentiment analysis, advertising, fraud detection, booking and much more.

Two points govern the accuracy of machine learning. First, the more data you have, the more accurate the predictions generally are. Second, because data can come from any domain you can think of, there is no universal machine learning model: you have to find the right model for the data you want to analyze.

We have chosen to give an overview of the random forest algorithm, developed by Leo Breiman and Adele Cutler. The model combines decision trees with ensemble learning theory and is one of the most understandable models available.

Our objective is to make the topic understandable to everyone, at least the non-programming part. We have confined the mathematics to a dedicated part rather than writing a fully mathematical document. Outside the explanation of the algorithm, you will only meet a few simple equations, and even if you do not follow the mathematical reasoning, you will still be able to understand the random forest model. This accessibility is one of the reasons we chose this model.

We begin with the required background on ensemble learning and decision trees, then describe the random forest model itself, its domain of application, and its strengths and weaknesses. Next, we look at one application field of the model, detection, and finally we summarize the whole method in the conclusion.

The appendix contains an implementation we found, together with our explanation of the program.

We hope you will find this model at least as interesting as we did while writing about it.

Required Knowledge:

Ensemble learning:

“Ensemble learning is a machine learning paradigm where multiple learners are trained to solve the same problem. In contrast to ordinary machine learning approaches which try to learn one hypothesis from training data, ensemble methods try to construct a set of hypotheses and combine them to use”

Zhi-Hua Zhou – Ensemble learning

Ensemble learning theory was created in response to a major problem. Most programmers cannot afford thousands of euros for the most powerful machine to run one huge, high-performance system. The theory therefore asks: why not split the work across many simple machine learning models, each weaker on its own, and then combine their results? Today many of the best machine learning systems are built on this idea. One of its main strengths is that the weaknesses of one model can be compensated by the strengths of another, which in the end produces a powerful and versatile machine learning system.
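To see why combining weaker models can pay off, here is a minimal sketch under idealized assumptions of our own: each model is a binary classifier that is right with the same probability p, the models make their errors independently, and the ensemble takes a majority vote. The function name `majority_vote_accuracy` is hypothetical; it simply sums binomial probabilities.

```python
from math import comb

def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of n_models independent
    binary classifiers, each correct with probability p, is correct.
    Assumes n_models is odd so that a vote can never be tied."""
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    k_min = n_models // 2 + 1  # smallest number of correct votes that wins
    # Sum P(exactly k models are correct) for every winning k
    return sum(comb(n_models, k) * p**k * (1 - p)**(n_models - k)
               for k in range(k_min, n_models + 1))

# Accuracy of the vote as the number of weak models grows (p = 0.6 each)
for n in (1, 5, 25, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

With p = 0.6, five voters already reach an accuracy of about 0.683, and the figure keeps rising with more voters. Two caveats follow from the assumptions: if each model is worse than chance (p < 0.5), voting makes things worse, and real models are rarely fully independent, so actual gains are smaller than this idealized calculation suggests.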

Using several systems instead of one introduces variability, in which we distinguish two major kinds: intra-operability and inter-operability. Intra-operability is the variability of the results given by the same component (as we saw in the CIRI session with picture encoders, which do not return exactly the same rebuilt picture each time). Inter-operability is the variability between the results returned by several systems: for reliability, we use several different systems to execute a task and then study how much their results vary.

To interpret these results, we usually rely on the wisdom of the crowd, the idea that the collective opinion is better than the opinion of any single member of the crowd. In other words, however expert one system may be, it is generally less accurate than the average of the results of many systems. For this to hold, the crowd must satisfy three conditions: diversity, meaning the decision is based on the results of different systems; independence, meaning each system produces its result without being influenced by the others; and decentralization, meaning the results are aggregated without an authority that rejects some of them.
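A standard variance identity makes the independence and diversity conditions concrete. Assume, as a simplification, that the crowd has B members whose numerical results X_1, ..., X_B each have variance σ² and every pair has the same correlation ρ. The variance of their average is then:

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} X_b\right)
  = \frac{1}{B^2}\Big[\,B\sigma^2 + B(B-1)\,\rho\,\sigma^2\,\Big]
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

When the members are independent (ρ = 0), the variance falls to σ²/B and keeps shrinking as the crowd grows. When they all agree perfectly (ρ = 1), averaging leaves σ² unchanged. Whatever the size of the crowd, the term ρσ² remains, which is why lowering the correlation between members matters as much as adding more of them.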

...
