CHU WEI's HOMEPAGE

User behavior provides many cues to improve the relevance of search results through personalization. One aspect of user behavior that provides especially strong signals for delivering better relevance is an individual’s history of queries and clicked documents. Previous studies have explored how short-term behavior or long-term behavior can be predictive of relevance. Ours is the first study to assess how short-term (session) behavior and long-term (historic) behavior interact, and how each may be used in isolation or in combination to optimally contribute to gains in relevance through search personalization. [pdf]

L. Li, W. Chu, J. Langford, T. Moon, and X. Wang (2012) An unbiased offline evaluation of contextual bandit algorithms with generalized linear models, Journal of Machine Learning Research - Workshop and Conference Proceedings 26 (JMLR W&CP-26)

Contextual bandit algorithms have become popular tools in online recommendation and advertising systems. Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their "partial-label" nature. The purpose of this paper is two-fold. First, we review a recently proposed offline evaluation technique. Unlike simulator-based approaches, the method is completely data-driven, is easy to adapt to different applications, and, more importantly, provides provably unbiased evaluations. We argue for the wide use of this technique as standard practice when comparing bandit algorithms in real-life problems. Second, as an application of this technique, we compare and validate a number of new algorithms based on generalized linear models. Experiments using real Yahoo! data suggest substantial improvement over algorithms with linear models when the rewards are binary. [pdf]

W. Chu, M. Zinkevich, L. Li, A. Thomas, and B. Tseng (2011) Unbiased online active learning in data streams, ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-17)

Unlabeled samples can be intelligently selected for labeling to minimize classification error. In many real-world applications, a large number of unlabeled samples arrive in a streaming manner, making it impossible to maintain all the data in a candidate pool. In this work, we consider the unbiasedness property in the sampling process, and design optimal instrumental distributions to minimize the variance in the stochastic process. Meanwhile, Bayesian linear classifiers with weighted maximum likelihood are optimized online to estimate parameters. [pdf]
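The unbiasedness property mentioned in this abstract can be illustrated with a small importance-weighting sketch. It assumes that each arriving sample is queried for a label with some probability p and that a queried sample's loss is reweighted by 1/p. The function name, the losses and the query probabilities are hypothetical and are not taken from the paper, which additionally optimizes the sampling distribution.

```python
import random


def importance_weighted_total(losses, query_probs, rng):
    """Estimate sum(losses) while only "labeling" a random subset.

    Sample i is queried with probability query_probs[i]; a queried sample
    contributes losses[i] / query_probs[i]. The expectation of the result
    equals sum(losses), so the estimate is unbiased.
    """
    total = 0.0
    for loss, p in zip(losses, query_probs):
        if not 0.0 < p <= 1.0:
            raise ValueError("query probabilities must lie in (0, 1]")
        if rng.random() < p:      # a label is requested for this sample
            total += loss / p     # reweight to undo the sampling bias
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    losses = [0.2, 1.5, 0.1, 0.9]   # hypothetical per-sample losses
    probs = [0.3, 0.9, 0.2, 0.7]    # hypothetical query probabilities
    runs = [importance_weighted_total(losses, probs, rng) for _ in range(20000)]
    print("mean estimate:", sum(runs) / len(runs), "true total:", sum(losses))
```

Lower query probabilities save labels but increase the variance of the estimate, which is the trade-off the choice of sampling distribution has to balance.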

T. Moon, W. Chu, L. Li, Z. Zheng, Y. Chang (2012) Online learning framework for refining recency search results with user click feedback, to appear in Transactions on Information Systems

In this paper, we focus on recency search and study a number of algorithms to improve ranking results by leveraging user click feedback. Our contributions are three-fold. First, we use real search sessions collected in a random exploration bucket for reliable offline evaluation of these algorithms, which provides an unbiased comparison across algorithms without online bucket tests. Second, we propose a re-ranking approach to improve search results for recency queries using user clicks. Third, our empirical comparison of a dozen algorithms on real-life search data suggests the importance of a few algorithmic choices in these applications, including generalization across different query-document pairs, specialization to popular queries, and real-time adaptation to user clicks. [pdf]

L. Li, W. Chu, J. Langford and X. Wang (2011) Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, in Proc. of ACM Web Search and Data Mining (WSDM-04) 297-306

In this paper, we introduce a replay methodology for contextual bandit algorithm evaluation. Unlike simulator-based approaches, our method is completely data-driven and very easy to adapt to different applications. More importantly, our method can provide provably unbiased evaluations. Our empirical results on a large-scale news article recommendation dataset collected from Yahoo! Front Page conform well with our theoretical results. Furthermore, comparisons between our offline replay and online bucket evaluation of several contextual bandit algorithms show the accuracy and effectiveness of our offline evaluation method. [pdf]
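The replay idea can be sketched in a few lines. The sketch assumes the logged events were collected by choosing actions uniformly at random, which is the setting in which this kind of estimate is unbiased. The function name, the event layout and the synthetic click rates below are hypothetical, chosen only for illustration.

```python
import random


def replay_evaluate(logged_events, policy):
    """Replay estimate of a policy's average reward.

    logged_events: iterable of (context, logged_action, reward) tuples,
        assumed to come from a uniformly random logging policy.
    policy: callable mapping (context, history) to an action, where
        history is the list of matched events seen so far.
    Returns (average reward over matched events, number of matched events).
    """
    history = []
    total_reward = 0.0
    for context, logged_action, reward in logged_events:
        if policy(context, history) == logged_action:
            # Matched: treat the event as if the policy had been run live.
            history.append((context, logged_action, reward))
            total_reward += reward
        # Otherwise the event is skipped, because its reward says nothing
        # about the action the policy wanted to take.
    matched = len(history)
    average = total_reward / matched if matched else float("nan")
    return average, matched


if __name__ == "__main__":
    rng = random.Random(0)
    click_rate = {0: 0.05, 1: 0.10}   # hypothetical per-article click rates
    log = []
    for _ in range(100000):
        action = rng.choice([0, 1])   # uniform random logging policy
        reward = 1.0 if rng.random() < click_rate[action] else 0.0
        log.append((None, action, reward))
    value, matched = replay_evaluate(log, lambda context, history: 1)
    print(f"estimated click rate: {value:.3f} from {matched} matched events")
```

Because only matched events are kept, roughly a 1/K fraction of a log with K actions is usable, so the log needs to be large relative to the number of actions.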

L. Li, W. Chu, J. Langford and R. E. Schapire (2010) A contextual-bandit approach to personalized news article recommendation, in Proc. of International World Wide Web Conference (WWW-19)

Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web services feature dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. Second, the scale of most web services of practical interest calls for solutions that are fast in both learning and computation. In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks. [pdf]

W. Chu and Z. Ghahramani (2009) Probabilistic models for incomplete multi-dimensional arrays, in Proc. of International Conference on Artificial Intelligence and Statistics (AISTATS-12)

In multiway data, each sample is measured by multiple sets of correlated attributes. We develop a probabilistic framework, known as pTucker, for modeling structural dependency from partially observed multi-dimensional array data. Latent components associated with individual array dimensions are jointly retrieved while the core tensor is integrated out. The resulting algorithm is capable of handling large-scale data sets. We verify the usefulness of this approach by comparing against classical models on applications to modeling amino acid fluorescence, collaborative filtering and a number of benchmark multiway array data sets. [pdf] [third-party pTucker code]

W. Chu and S.-T. Park (2009) Personalized recommendation on dynamic content using predictive bilinear models, in Proc. of International World Wide Web Conference (WWW-18)

In Web-based services of dynamic content (such as news articles), recommender systems face the difficulty of identifying high-quality new items in a timely manner and providing recommendations for new users. We propose a feature-based machine learning approach to personalized recommendation that is capable of handling the cold-start issue effectively. The proposed framework is general and flexible for other personalized tasks. The superior performance of our approach is verified on a large-scale data set collected from the Today-Module on Yahoo! Front Page, with comparison against six competitive approaches. [pdf] [slides]
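As a rough illustration of how a feature-based bilinear model can score a user-item pair that has no interaction history, the sketch below computes a score of the form x^T W z from a user feature vector x, an item feature vector z and a weight matrix W. The function name, the feature vectors and the weights are hypothetical; how the weights are fitted is described in the paper and is not shown here.

```python
def bilinear_score(user_features, item_features, weights):
    """Return sum over a, b of user_features[a] * weights[a][b] * item_features[b].

    weights is a list of rows, one row per user feature and one column
    per item feature. Only features are needed, so a brand-new user or
    item can be scored as soon as its features are known.
    """
    if len(weights) != len(user_features):
        raise ValueError("weights needs one row per user feature")
    if any(len(row) != len(item_features) for row in weights):
        raise ValueError("each weights row needs one entry per item feature")
    return sum(
        x * sum(w * z for w, z in zip(row, item_features))
        for x, row in zip(user_features, weights)
    )


if __name__ == "__main__":
    # Hypothetical one-hot features: user segment and article category.
    new_user = [1.0, 0.0]
    new_article = [0.0, 1.0]
    weights = [[0.1, 0.7],
               [0.3, 0.2]]
    print(bilinear_score(new_user, new_article, weights))
```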

S.-T. Park and W. Chu (2009) Pairwise preference regression for cold-start recommendation, in Proc. of ACM Recommender Systems (RecSys-03)

Recommender systems are widely used in online e-commerce applications to improve user engagement and, in turn, increase revenue. A key challenge for recommender systems is providing high-quality recommendations to users in "cold-start" situations. We consider three types of cold-start problems: 1) recommendation on existing items for new users; 2) recommendation on new items for existing users; 3) recommendation on new items for new users. We propose predictive feature-based regression models that leverage all available information of users and items, such as user demographic information and item content features, to tackle cold-start problems. The resulting algorithms scale efficiently as a linear function of the number of observations. We verify the usefulness of our approach in three cold-start settings on the MovieLens and EachMovie datasets, by comparing with five alternatives including random, most popular, segmented most popular, and two variations of the Vibes affinity algorithm widely used at Yahoo! for recommendation.

R. Silva, W. Chu and Z. Ghahramani (2007) Hidden common cause relations in relational learning, in Advances in Neural Information Processing Systems (NIPS-20)

We consider the case when relationships are postulated to exist due to hidden common causes. We discuss how the resulting graphical model differs from Markov networks, and how it describes different types of real-world relational processes. A Bayesian nonparametric classification model is built upon this graphical representation and evaluated with several empirical studies. See Ricardo Silva's homepage for [pdf], [data] and [code]

K. Yu and W. Chu (2007) Gaussian process models for link analysis and transfer learning, in Advances in Neural Information Processing Systems (NIPS-20)

In this paper we model relational random variables on the edges of a network using Gaussian processes (GPs). We describe appropriate GP priors, i.e., covariance functions, for directed and undirected networks connecting homogeneous or heterogeneous nodes. The framework suggests an intimate connection between link prediction and transfer learning, which were traditionally two separate topics. [pdf]

P. K. Shivaswamy, W. Chu and M. Jansche (2007) A support vector approach to censored targets, in Proc. of IEEE International Conference on Data Mining (ICDM-07)

Censored targets, such as the time to events in survival analysis, can generally be represented by intervals on the real line. In this paper, we propose a novel support vector technique (named SVCR) for regression on censored targets. Interestingly, this approach provides a general formulation for both standard regression and binary classification tasks. [pdf] [longer version]
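To see why an interval representation can cover both regression and binary classification, consider a loss that is zero whenever the prediction falls inside the target interval and grows linearly outside it. The sketch below is an illustrative interval loss under that assumption, not the SVCR objective itself; the function name and the example values are hypothetical.

```python
import math


def interval_loss(prediction, lower, upper):
    """Zero if lower <= prediction <= upper, else the distance to the interval.

    Use -math.inf or math.inf for an open side (a censored target).
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return max(0.0, lower - prediction, prediction - upper)


if __name__ == "__main__":
    # Exact target y = 3: the interval [3, 3] gives the absolute error.
    print(interval_loss(2.0, 3.0, 3.0))
    # Right-censored target (event time known to exceed 3): [3, +inf).
    print(interval_loss(5.0, 3.0, math.inf))
    # Binary label +1 as [1, +inf): the hinge loss max(0, 1 - f).
    print(interval_loss(0.25, 1.0, math.inf))
    # Binary label -1 as (-inf, -1]: the hinge loss max(0, 1 + f).
    print(interval_loss(0.5, -math.inf, -1.0))
```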

W. Chu, V. Sindhwani, Z. Ghahramani and S. S. Keerthi (2006) Relational learning with Gaussian processes, in Advances in Neural Information Processing Systems (NIPS-19)

Correlation between instances is often modelled via a kernel function using input attributes of the instances. Relational knowledge can further reveal additional pairwise correlations between variables of interest. In this paper, we develop a class of models which incorporates both reciprocal relational information and input attributes using Gaussian process techniques. This approach provides a novel non-parametric Bayesian framework with a data-dependent prior for supervised learning tasks. We also apply this framework to semi-supervised learning. Experimental results on several real world data sets verify the usefulness of this algorithm. [pdf]

S. K. Shevade and W. Chu (2006) Minimum enclosing spheres formulations for support vector ordinal regression, in Proc. of IEEE International Conference on Data Mining (ICDM-06):1054-1058

We present two new support vector approaches for ordinal regression. These approaches find the concentric spheres with minimum volume that contain most of the training samples. [pdf]

V. Sindhwani, W. Chu and S. S. Keerthi (2007) Semi-supervised Gaussian process classifiers, in Proc. of International Joint Conferences on Artificial Intelligence (IJCAI-20):1059-1064

We consider the problem of utilizing unlabeled data for Gaussian process inference. Using a geometrically motivated data-dependent prior, we propose a graph-based construction of semi-supervised Gaussian processes. We demonstrate this approach empirically on several classification problems. [pdf]

S. S. Keerthi and W. Chu (2005) A matching pursuit approach to sparse Gaussian process regression, in Advances in Neural Information Processing Systems (NIPS-18)

In this paper, we propose a new basis selection criterion for building sparse GP regression models that provides promising gains in accuracy as well as efficiency over previous methods. Our algorithm is much faster than that of Smola and Bartlett, while in generalization it greatly outperforms the information gain approach proposed by Seeger et al., especially on the quality of predictive distributions. [ps] [code]

W. Chu and Z. Ghahramani (2005) Preference learning with Gaussian processes, in Proc. of International Conference on Machine Learning (ICML-22):137-144

In this paper, we propose a probabilistic kernel approach to preference learning based on Gaussian processes. A new likelihood function is proposed to capture the preference relations in the Bayesian framework. The generalized formulation is also applicable to tackle many multiclass problems. [ps] [code]
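One common way to write a likelihood for a pairwise preference, assumed here for illustration rather than quoted from the paper, is to corrupt each latent function value with independent Gaussian noise and take the probability that the preferred item's noisy value is larger. The function names and the noise parameter below are hypothetical.

```python
import math


def std_normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def preference_likelihood(f_preferred, f_other, noise_std=1.0):
    """P(preferred item beats other item | latent values).

    Each latent value is assumed to be observed with independent
    N(0, noise_std^2) noise, so the difference of the two noisy values
    has standard deviation sqrt(2) * noise_std.
    """
    if noise_std <= 0.0:
        raise ValueError("noise_std must be positive")
    z = (f_preferred - f_other) / (math.sqrt(2.0) * noise_std)
    return std_normal_cdf(z)


if __name__ == "__main__":
    # Hypothetical latent values for two items.
    print(preference_likelihood(2.0, 0.5, noise_std=0.8))
    print(preference_likelihood(0.5, 2.0, noise_std=0.8))
```

The two printed probabilities sum to one, since exactly one of the two orderings holds.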

W. Chu and S. S. Keerthi (2005) New approaches to support vector ordinal regression, in Proc. of International Conference on Machine Learning (ICML-22):145-152

In this paper, we propose two new support vector formulations for ordinal regression, which optimize multiple thresholds to define parallel discriminant hyperplanes for the ordinal scales. Both approaches guarantee that the thresholds are properly ordered at the optimal solution. [ps] [code]

W. Chu and Z. Ghahramani (2005) Gaussian processes for ordinal regression, Journal of Machine Learning Research 6(Jul):1019-1041

In this paper, we present a probabilistic approach to ordinal regression in Gaussian processes. In the Bayesian framework of Gaussian processes, we propose a likelihood function for ordinal variables that is a generalization of the probit function. Two inference techniques, based on Laplace approximation and expectation propagation respectively, are applied for model selection. [ps] [code]
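A threshold model is one standard way to generalize the probit function to ordered categories: the real line is cut into contiguous intervals by increasing thresholds, and the probability of a category is the Gaussian mass of its interval around the latent value. The sketch below assumes that form. The function names, thresholds and noise scale are hypothetical, and the inference techniques named in the abstract are not shown.

```python
import math


def std_normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordinal_likelihood(f, category, thresholds, noise_std=1.0):
    """P(y = category | latent value f) under a threshold model.

    thresholds: increasing values b_1 < ... < b_{r-1} separating r
        ordered categories, which are indexed 0 .. r-1.
    """
    if not 0 <= category <= len(thresholds):
        raise ValueError("category out of range")
    bounds = [-math.inf] + list(thresholds) + [math.inf]
    upper = std_normal_cdf((bounds[category + 1] - f) / noise_std)
    lower = std_normal_cdf((bounds[category] - f) / noise_std)
    return upper - lower


if __name__ == "__main__":
    thresholds = [-1.0, 0.5, 2.0]     # hypothetical cut points, 4 categories
    probs = [ordinal_likelihood(0.3, k, thresholds) for k in range(4)]
    print(probs, sum(probs))
```

With a single threshold at zero and two categories, the probability of the upper category reduces to the ordinary probit likelihood.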

W. Chu, Z. Ghahramani, F. Falciani, and D. L. Wild (2005) Biomarker discovery with Gaussian processes in microarray gene expression data, Bioinformatics 2005(21):3385-3393

In this paper, we describe a gene selection algorithm based on Gaussian processes to discover consistent gene expression patterns associated with ordinal clinical phenotypes. The technique of automatic relevance determination is applied to represent the significance level of the genes in a Bayesian framework. [pdf] [code]

W. Chu, Z. Ghahramani and D. L. Wild (2004) A graphical model for protein secondary structure prediction, in Proc. of International Conference on Machine Learning (ICML-21):161-168

In this paper, we present a graphical model that extends segmental semi-Markov models (SSMM) to exploit multiple sequence alignment profiles for protein structure prediction. A novel parameterized model is proposed as the likelihood function for the SSMM. By incorporating the information from long range interactions in beta-sheets, this model is capable of carrying out inference on contact maps. [pdf] [webserver]

W. Chu, S. S. Keerthi and C. J. Ong (2004) Bayesian support vector regression using a unified loss function, IEEE Transactions on Neural Networks 15(1):29-44

In this paper, we use the soft insensitive loss function in likelihood evaluation and describe a Bayesian framework in a stationary Gaussian process. Bayesian methods are used to implement model adaptation while keeping the merits of support vector regression, such as quadratic programming and sparseness. Moreover, confidence intervals are provided in prediction. [code]

CHU, WEI

Short CV

@chuwei.website

@LinkedIn

@Google Scholar

1. Recent Work

2. Publications

3. Source Code

1. Recent Work

P. Bennett, R. White, W. Chu, S. Dumais, P. Bailey, F. Borisyuk and X. Cui (2012) Modeling and measuring the impact of short and long-term behavior on search personalization, ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-35)

3. Source Code

email : email dot chuwei at gmail.com

2017.02.19