Advanced predictive methods for wine age prediction: Part I - A comparison study of single-block regression approaches based on variable selection, penalized regression, latent variables and tree-based ensemble methods.

Ricardo Rendall, Ana Cristina Pereira, Marco S Reis

Talanta 2017 August 16

In this paper we test and compare advanced predictive approaches for estimating wine age in the context of the production of a high quality fortified wine - Madeira Wine. We consider four different data sets, namely, volatile, polyphenols, organic acids and the UV-vis spectra. Each one of these data sets contain chemical information of a different nature and present diverse data structures, namely a different dimensionality, level of collinearity and degree of sparsity. These different aspects may imply the use of different modelling approaches in order to better explore the data set's information content, namely their predictive potential for wine age. This happens to be so, because different regression methods have different prior assumptions regarding the predictors, response variable(s) and the data generating mechanism, which may or may not find good adherence to the case study under analysis. In order to cover a wide range of modelling domains, we have incorporated in this work methods belonging to four very distinct classes of approaches that cover most applications found in practice: linear regression with variable selection, penalized regression, latent variables regression and tree-based ensemble methods. We have also developed a rigorous comparison framework based on a double Monte Carlo cross-validation scheme, in order to perform the relative assessment of the performance of the various methods. Upon comparison, models built using the polyphenols and volatile composition data sets led to better wine age predictions, showing lower errors under testing conditions. Furthermore, the results obtained for the polyphenols data set suggest a more sparse structure that can be further explored in order to reduce the number of measured variables. In terms of regression methods, tree-based methods, and boosted regression trees in particular, presented the best results for the polyphenols, volatile and the organic acid data sets, suggesting a possible presence of a nonlinear relationship between predictors and response. Regarding the UV-vis data set, penalized regression methods (ridge regression, LASSO and elastic nets) presented the best results, albeit methods such as partial least squares (PLS) or principal component regression (PCR) are often the practitioners' preferred choice.

Full text links

We have located links that may give you full text access.

Show additional links to paperHide additional links to paper

PubMed

Add to Saved Papers

Get 1-tap access

Related Resources

For the best experience, use the Read mobile app

Get seemless 1-tap access through your institution/university

For the best experience, use the Read mobile app

All material on this website is protected by copyright, Copyright © 1994-2024 by WebMD LLC.
This website also contains material copyrighted by 3rd parties.

By using this service, you agree to our terms of use and privacy policy.

Your Privacy Choices

You can now claim free CME credits for this literature searchClaim now

Get seemless 1-tap access through your institution/university

For the best experience, use the Read mobile app