There are a number of prediction functions in XGBoost with various parameters. This document attempts to clarify some of the confusion around prediction, with a focus on the Python binding. The R package behaves similarly when strict_shape is specified (see below).
Prediction Options
There are a number of different prediction options for the xgboost.Booster.predict() method, ranging from pred_contribs to pred_leaf. The output shape depends on the type of prediction. Also, for multi-class classification, XGBoost builds one tree for each class, and the trees for each class are called a "group" of trees, so the output dimension may change depending on the model. In the 1.4 release we added a new parameter called strict_shape; set it to True to request a more restricted output. Assuming you are using xgboost.Booster, here is a list of possible returns:
When using normal prediction with strict_shape set to True:

Output is a 2-dim array with the first dimension as rows and the second as groups. For regression/survival/ranking/binary classification this is equivalent to a column vector with shape[1] == 1. But for multi-class with multi:softprob, the number of columns equals the number of classes. If strict_shape is set to False, XGBoost may output a 1- or 2-dim array.

When using
output_margin to avoid transformation with strict_shape set to True:

Similar to the previous case: output is a 2-dim array, except that multi:softmax has the same output shape as multi:softprob because the transformation is dropped. If strict_shape is set to False, the output can have 1 or 2 dims depending on the model used.

When using
pred_contribs with strict_shape set to True:

Output is a 3-dim array with shape (rows, groups, columns + 1). Whether approx_contribs is used does not change the output shape. If strict_shape is not set, the output can be a 2- or 3-dim array depending on whether a multi-class model is being used.

When using
pred_interactions with strict_shape set to True:

Output is a 4-dim array with shape (rows, groups, columns + 1, columns + 1). As in the prediction contribution case, whether approx_contribs is used does not change the output shape. If strict_shape is set to False, the output can have 3 or 4 dims depending on the underlying model.

When using
pred_leaf with strict_shape set to True:

Output is a 4-dim array with shape (n_samples, n_iterations, n_classes, n_trees_in_forest). n_trees_in_forest is specified by the num_parallel_tree parameter during training. When strict_shape is set to False, the output is a 2-dim array with the last 3 dims concatenated into 1. Also, the last dimension is dropped if it equals 1. When using the apply method in the scikit-learn interface, strict_shape is set to False by default.
For the R package, when strict_shape is specified, an array is returned with the same values as in Python, except that R arrays are column-major while Python numpy arrays are row-major, so all the dimensions are reversed. For example, a Python pred_leaf output obtained with strict_shape=True has 4 dimensions: (n_samples, n_iterations, n_classes, n_trees_in_forest), while R with strict_shape=TRUE outputs (n_trees_in_forest, n_classes, n_iterations, n_samples).
Other than these prediction types, there's also a parameter called iteration_range, which is similar to model slicing. But instead of actually splitting the model into multiple stacks, it simply returns the prediction formed by the trees within the range. The number of trees created in each iteration equals \(trees_i = num\_class \times num\_parallel\_tree\). So if you are training a boosted random forest of size 4 on a 3-class classification dataset and want to use the first 2 iterations of trees for prediction, you need to provide iteration_range=(0, 2). Then the first \(2 \times 3 \times 4\) trees will be used in this prediction.
Early Stopping
When a model is trained with early stopping, there is an inconsistency between the native Python interface and the sklearn/R interfaces. By default, the R and sklearn interfaces automatically use best_iteration so prediction comes from the best model, while the native Python interface xgboost.Booster.predict() and xgboost.Booster.inplace_predict() use the full model. Users can combine the best_iteration attribute with the iteration_range parameter to achieve the same behavior. Also, the save_best parameter from xgboost.callback.EarlyStopping might be useful.
Predictor
There are 2 predictors in XGBoost (3 if you have the one-api plugin enabled), namely cpu_predictor and gpu_predictor. The default option is auto, so that XGBoost can employ some heuristics for saving GPU memory during training. They might have slightly different outputs due to floating point errors.
Base Margin
There’s a training parameter in XGBoost called base_score
, and a meta data forDMatrix
called base_margin
(which can be set in fit
method if you are usingscikit-learn interface). They specifies the global bias for boosted model. If the latteris supplied then former is ignored. base_margin
can be used to train XGBoost modelbased on other models. See demos on boosting from predictions.
Staged Prediction
Using the native interface with DMatrix, prediction can be staged (or cached). For example, one can first predict on the first 4 trees, then run prediction on 8 trees. After running the first prediction, the result from the first 4 trees is cached, so when you run the prediction with 8 trees XGBoost can reuse the result from the previous prediction. The cache expires automatically upon the next prediction, training, or evaluation if the cached DMatrix object expires (for example by going out of scope and being collected by the garbage collector in your language environment).
In-place Prediction
Traditionally XGBoost accepts only DMatrix for prediction; with wrappers like the scikit-learn interface, the construction happens internally. We added support for in-place prediction to bypass the construction of DMatrix, which is slow and memory consuming. The new predict function has limited features but is often sufficient for simple inference tasks. It accepts some commonly found data types in Python like numpy.ndarray, scipy.sparse.csr_matrix and cudf.DataFrame instead of xgboost.DMatrix. You can call xgboost.Booster.inplace_predict() to use it. Be aware that the output of in-place prediction depends on the input data type: when the input is GPU data, the output is a cupy.ndarray; otherwise a numpy.ndarray is returned.
Categorical Data
Aside from users performing the encoding themselves, XGBoost has experimental support for categorical data using gpu_hist and gpu_predictor. No special operation needs to be done on input test data since the information about categories is encoded into the model during training.
Thread Safety
After the 1.4 release, all prediction functions, including normal predict with various parameters like SHAP value computation, as well as inplace_predict, are thread safe when the underlying booster is gbtree or dart. This means that as long as a tree model is used, prediction itself should be thread safe. But the safety is only guaranteed for prediction: if one tries to train a model in one thread and run prediction in another using the same model, the behaviour is undefined. This happens more easily than one might expect; for instance, we might accidentally call clf.set_params() inside a predict function:
def predict_fn(clf: xgb.XGBClassifier, X):
    X = preprocess(X)
    clf.set_params(predictor="gpu_predictor")  # NOT safe!
    clf.set_params(n_jobs=1)  # NOT safe!
    return clf.predict_proba(X, iteration_range=(0, 10))

with ThreadPoolExecutor(max_workers=10) as e:
    e.submit(predict_fn, ...)