CHU WEI's PUBLICATIONS

2. Publications@Google Scholar

Journal Article & Book Chapter

T. Moon, W. Chu, L. Li, Z. Zheng, Y. Chang (2012) Online learning framework for refining recency search results with user click feedback, Transactions on Information Systems 30(4) (View Abstract)

In this paper, we focus on recency search and study a number of algorithms to improve ranking results by leveraging user click feedback. Our contributions are three-fold. First, we use real search sessions collected in a random exploration bucket for reliable offline evaluation of these algorithms, which provides an unbiased comparison across algorithms without online bucket tests. Second, we propose a re-ranking approach to improve search results for recency queries using user clicks. Third, our empirical comparison of a dozen algorithms on real-life search data suggests the importance of a few algorithmic choices in these applications, including generalization across different query-document pairs, specialization to popular queries, and real-time adaptation to user clicks. [pdf]
W. Chu and S. S. Keerthi (2007) Support vector ordinal regression, Neural Computation 19(3):792-815 (View Abstract)

In this paper, we propose two new support vector formulations for ordinal regression, which optimize multiple thresholds to define parallel discriminant hyperplanes for the ordinal scales. Both approaches guarantee that the thresholds are properly ordered at the optimal solution.
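
As a rough illustration of how ordered thresholds turn a single discriminant score into an ordinal label, the sketch below assigns rank 1 plus the number of thresholds lying strictly below the score. It is a minimal sketch under stated assumptions, not the formulation from the paper: the function name predict_rank, the linear score w.x, and the tie-breaking convention are all hypothetical choices made for this example.

```python
from bisect import bisect_left


def predict_rank(w, thresholds, x):
    """Map a linear score to an ordinal rank in 1..len(thresholds)+1.

    Assumption for this sketch: thresholds b_1 <= ... <= b_{r-1} are already
    learned and properly ordered, and the score is the dot product w.x.
    """
    # The thresholds must be non-decreasing for the ranks to be well defined.
    if any(a > b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be in non-decreasing order")
    score = sum(wi * xi for wi, xi in zip(w, x))
    # bisect_left counts the thresholds strictly below the score.
    return 1 + bisect_left(thresholds, score)


if __name__ == "__main__":
    w = [1.0, -0.5]
    thresholds = [-1.0, 0.0, 2.0]  # three thresholds, hence four ordinal levels
    for x in ([-2.0, 0.0], [0.5, 1.0], [3.0, 0.0]):
        print(x, predict_rank(w, thresholds, x))
```

Because all ranks share one weight vector and differ only in their thresholds, the decision boundaries are parallel hyperplanes, which is the geometric picture the abstract describes.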
W. Chu, Z. Ghahramani, A. Podtelezhnikov and D. L. Wild (2006) Bayesian segmental models with multiple sequence alignment profiles for protein secondary structure and contact map prediction, IEEE/ACM Transactions on Computational Biology and Bioinformatics 3(2):98-113 (View Abstract)

In this paper, we develop a segmental semi-Markov model (SSMM) for protein secondary structure prediction which incorporates multiple sequence alignment profiles with the purpose of improving the predictive performance. By incorporating the information from long range interactions in beta-sheets, this model is also capable of carrying out inference on contact maps. [ps][supplement]
W. Chu, Z. Ghahramani, F. Falciani and D. L. Wild (2005) Biomarker discovery with Gaussian processes in microarray gene expression data, Bioinformatics 21:3385-3393 (View Abstract)

In this paper, we describe a gene selection algorithm based on Gaussian processes to discover consistent gene expression patterns associated with ordinal clinical phenotypes. The technique of automatic relevance determination is applied to represent the significance level of the genes in a Bayesian framework. [pdf] [ps] [code]
W. Chu and Z. Ghahramani (2005) Gaussian processes for ordinal regression, Journal of Machine Learning Research 6(Jul):1019-1041 (View Abstract)

In this paper, we present a probabilistic approach to ordinal regression in Gaussian processes. In the Bayesian framework of Gaussian processes, we propose a likelihood function for ordinal variables that is a generalization of the probit function. Two inference techniques, based on Laplace approximation and expectation propagation respectively, are applied for model selection. [pdf] [ps] [zip] [code]
W. Chu, C. J. Ong and S. S. Keerthi (2005) An improved conjugate gradient scheme to the solution of least squares SVM, IEEE Transactions on Neural Networks 16(2):498-501 (View Abstract)

In this paper, we propose an improved method for the numerical solution of LS-SVM. Compared with the existing algorithm (Suykens et al., 1999) for LS-SVM, our approach is about twice as efficient.
W. Chu, S. S. Keerthi and C. J. Ong (2004) Bayesian support vector regression using a unified loss function, IEEE Transactions on Neural Networks 15(1):29-44 (View Abstract)

In this paper, we use the soft insensitive loss function in likelihood evaluation, and describe a Bayesian framework in a stationary Gaussian process. Bayesian methods are used to implement model adaptation, while keeping the merits of support vector regression, such as quadratic programming and sparseness. Moreover, a confidence interval is provided in prediction. [pdf] [ps] [zip] [code]
W. Chu, S. S. Keerthi and C. J. Ong (2003) Bayesian trigonometric support vector classifier, Neural Computation 15(9):2227-2254 (View Abstract)

In this paper, we propose a Bayesian support vector classifier by introducing a novel likelihood function, known as the trigonometric likelihood function. Model adaptation and ARD feature selection can be implemented intrinsically in hyperparameter inference. Another benefit is the class probability provided when making predictions. [pdf] [code]
W. Chu, S. S. Keerthi, C. J. Ong and Z. Ghahramani (2006) Bayesian support vector machines for feature ranking and selection, In I. Guyon, S. Gunn, M. Nikravesh, and L. Zadeh, editors, Feature Extraction, Foundations and Applications Springer:403-418
K. Duan, S. S. Keerthi, W. Chu, S. K. Shevade and A. N. Poo (2003) Multi-category classification by soft-max combination of binary classifiers, Multiple Classifier Systems (MCS-04) Lecture Notes in Computer Science 2709 Springer:125-134

Refereed Conference

B. Bi, H. Ma, B. Hsu, W. Chu, K. Wang and J. Cho (2015) Learning to recommend related entities to search users, ACM International Conference on Web Search and Data Mining (WSDM-08) (View Abstract)

Over the past few years, major web search engines have introduced knowledge bases to offer popular facts about people, places, and things on the entity pane next to regular search results. In addition to information about the entity searched by the user, the entity pane often provides a ranked list of related entities. To keep users engaged, it is important to develop a recommendation model that tailors the related entities to individual user interests. We propose a probabilistic Three-way Entity Model (TEM) that provides personalized recommendation of related entities using three data sources: knowledge base, search click log, and entity pane log. Specifically, TEM is capable of extracting hidden structures and capturing underlying correlations among users, main entities, and related entities. Moreover, the TEM model can also exploit the click signals derived from the entity pane log. We further provide an inference technique to learn the parameters in TEM, and propose a principled preference learning method specifically designed for ranking related entities. Extensive experiments with two real-world datasets show that TEM with our probabilistic framework significantly outperforms a state of the art baseline, confirming the effectiveness of TEM and our probabilistic framework in related entity recommendation.
J. Yan, W. Chu, R. W. White (2014) Cohort modeling for enhanced personalized search, ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-37) (View Abstract)

Web search engines utilize behavioral signals to develop search experiences tailored to individual users. To be effective, such personalization relies on access to sufficient information about each user's interests and intentions. For new users or new queries, profile information may be sparse or non-existent. To handle these cases, and perhaps also improve personalization for those with profiles, search engines can employ signals from users who are similar along one or more dimensions, i.e., those in the same cohort. In this paper we describe a characterization and evaluation of the use of such cohort modeling to enhance search personalization. We experiment with three pre-defined cohorts (topic, location, and top-level domain preference), independently and in combination, and also evaluate methods to learn cohorts dynamically. We show via extensive experimentation with large-scale logs from a commercial search engine that leveraging cohort behavior can yield significant relevance gains when combined with a production search engine ranking algorithm that uses similar classes of personalization signal but at the individual searcher level. Additional experiments show that our gains can be extended when we dynamically learn cohorts and target easily-identifiable classes of ambiguous or unseen queries.
H. Wang, X. He, M. Chang, Y. Song, R. W. White, W. Chu (2013) Personalized ranking model adaptation for web search, ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-36) (View Abstract)

In this paper, we propose a general ranking model adaptation framework for personalized search. Using a given user-independent ranking model trained offline and a limited number of adaptation queries from individual users, the framework quickly learns to apply a series of linear transformations, e.g., scaling and shifting, over the parameters of the given global ranking model such that the adapted model can better fit each individual user's search preferences. Extensive experimentation based on a large set of search logs from a major commercial Web search engine confirms the effectiveness of the proposed method compared to several state-of-the-art ranking model adaptation methods.
R. W. White, W. Chu, A. Hassan, X. He, Y. Song, H. Wang (2013) Enhancing personalized search by mining and modeling task behavior, International World Wide Web Conference (WWW-22) (View Abstract)

Personalized search systems tailor search results to the current user intent using historic search interactions. This relies on being able to find pertinent information in that user's search history, which can be challenging for unseen queries and for new search scenarios. Building richer models of users' current and historic search tasks can help improve the likelihood of finding relevant content and enhance the relevance and coverage of personalization methods. The task-based approach can be applied to the current user's search history, or as we focus on here, all users' search histories as so-called "groupization" (a variant of personalization whereby other users' profiles can be used to personalize the search experience). We describe a method whereby we mine historic search-engine logs to find other users performing similar tasks to the current user and leverage their on-task behavior to identify Web pages to promote in the current ranking. We investigate the effectiveness of this approach versus query-based matching and finding related historic activity from the current user (i.e., group versus individual). As part of our studies we also explore the use of the on-task behavior of particular user cohorts, such as people who are expert in the topic currently being searched, rather than all other users. Our approach yields promising gains in retrieval performance, and has direct implications for improving personalization in search systems.
H. Wang, Y. Song, M. Chang, X. He, R. W. White, W. Chu (2013) Learning to extract cross-session search tasks, International World Wide Web Conference (WWW-22) (View Abstract)

Search tasks, comprising a series of search queries serving the same information need, have recently been recognized as an accurate atomic unit for modeling user search intent. Most prior research in this area has focused on short-term search tasks within a single search session, and depends heavily on human annotations for supervised classification model learning. In this work, we target the identification of long-term, or cross-session, search tasks (transcending session boundaries) by investigating inter-query dependencies learned from users' searching behaviors. A semi-supervised clustering model is proposed based on the latent structural SVM framework, and a set of effective automatic annotation rules is proposed as weak supervision to relieve the burden of manual annotation. Experimental results based on a large-scale search log collected from Bing.com confirm the effectiveness of the proposed model in identifying cross-session search tasks and the utility of the introduced weak supervision signals. Our learned model enables a more comprehensive understanding of users' search behaviors via search logs and facilitates the development of dedicated search-engine support for long-term tasks.
P. Bennett, R. W. White, W. Chu, S. Dumais, P. Bailey, F. Borisyuk and X. Cui (2012) Modeling and measuring the impact of short and long-term behavior on search personalization, ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-35) (View Abstract)

User behavior provides many cues to improve the relevance of search results through personalization. One aspect of user behavior that provides especially strong signals for delivering better relevance is an individual's history of queries and clicked documents. Previous studies have explored how short-term behavior or long-term behavior can be predictive of relevance. Ours is the first study to assess how short-term (session) behavior and long-term (historic) behavior interact, and how each may be used in isolation or in combination to optimally contribute to gains in relevance through search personalization. [pdf]
W. Chu, M. Zinkevich, L. Li, A. Thomas, and B. Tseng (2011) Unbiased online active learning in data streams, ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-17) (View Abstract)

Unlabeled samples can be intelligently selected for labeling to minimize classification error. In many real-world applications, a large number of unlabeled samples arrive in a streaming manner, making it impossible to maintain all the data in a candidate pool. In this work, we consider the unbiasedness property in the sampling process, and design optimal instrumental distributions to minimize the variance in the stochastic process. Meanwhile, Bayesian linear classifiers with weighted maximum likelihood are optimized online to estimate parameters. [pdf]
L. Zhang, J. Yang, W. Chu, and B. Tseng (2011) A machine-learned proactive moderation system for auction fraud detection, ACM Conference on Information and Knowledge Management (CIKM-20 Short Paper) (View Abstract)

Online auction and shopping are gaining popularity with the growth of web-based eCommerce. Criminals are also taking advantage of these opportunities to conduct fraudulent activities against honest parties with the purpose of deception and illegal profit. In practice, proactive moderation systems are deployed to detect suspicious events for further inspection by human experts. Motivated by real-world applications in commercial auction sites in Asia, we develop various advanced machine learning techniques in the proactive moderation system. [pdf]
L. Li, W. Chu, J. Langford and X. Wang (2011) Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, ACM International Conference on Web Search and Data Mining (WSDM-04) 297-306 (View Abstract) Winner of the Best Paper Award

In this paper, we introduce a replay methodology for contextual bandit algorithm evaluation. Different from simulator-based approaches, our method is completely data-driven and very easy to adapt to different applications. More importantly, our method can provide provably unbiased evaluations. Our empirical results on a large-scale news article recommendation dataset collected from Yahoo! Front Page conform well with our theoretical results. Furthermore, comparisons between our offline replay and online bucket evaluation of several contextual bandit algorithms show accuracy and effectiveness of our offline evaluation method. [pdf]
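
The replay idea can be sketched in a few lines: step through logged events, keep only those where the policy under evaluation picks the same arm that was actually shown, and average the rewards of the kept events. This is a minimal sketch, not the authors' implementation. It assumes the logged arm was chosen uniformly at random, and the names replay_evaluate and GreedyMeanPolicy, as well as the event tuple layout, are hypothetical choices made for this example.

```python
class GreedyMeanPolicy:
    """Toy policy for the sketch: picks the arm with the best empirical mean."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def _mean(self, arm):
        n = self.counts.get(arm, 0)
        return self.sums.get(arm, 0.0) / n if n else 0.0

    def select(self, context, arms):
        # max() returns the first maximal arm, so ties go to the earlier arm.
        return max(arms, key=self._mean)

    def update(self, context, arm, reward):
        self.sums[arm] = self.sums.get(arm, 0.0) + reward
        self.counts[arm] = self.counts.get(arm, 0) + 1


def replay_evaluate(policy, logged_events):
    """Return (average reward, number of matched events) for a policy.

    Each event is (context, logged_arm, reward, available_arms).
    Assumption: logged_arm was drawn uniformly at random when the log was made.
    """
    total_reward = 0.0
    matched = 0
    for context, logged_arm, reward, arms in logged_events:
        chosen = policy.select(context, arms)
        if chosen != logged_arm:
            continue  # mismatch: discard the event, the policy learns nothing
        # Match: the logged reward is what the policy would have observed.
        policy.update(context, chosen, reward)
        total_reward += reward
        matched += 1
    average = total_reward / matched if matched else 0.0
    return average, matched


if __name__ == "__main__":
    events = [
        ("u1", 0, 1.0, [0, 1]),
        ("u2", 1, 0.0, [0, 1]),
        ("u3", 0, 0.0, [0, 1]),
    ]
    print(replay_evaluate(GreedyMeanPolicy(), events))
```

Only matched events update the policy, so its learning history under replay mirrors what it would have seen if it had been run live on the same traffic.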
W. Chu, L. Li, L. Reyzin, and R. E. Schapire (2011) Contextual bandits with linear payoff functions, International Conference on Artificial Intelligence and Statistics (AISTATS-14) (View Abstract)

In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. We prove a high-probability regret upper bound. We also prove a lower bound for this setting, matching the upper bound up to logarithmic factors. [pdf]
T. Moon, L. Li, W. Chu, C. Liao, Z. Zheng and Y. Chang (2010) Online learning for recency search ranking using real-time user feedback, International Conference on Information and Knowledge Management (CIKM-19 Short Paper) 1501-1504 (View Abstract)

In this paper, we propose an online learning algorithm that can quickly learn the best re-ranking of the top portion of the original ranked list based on real-time users' click feedback. In order to devise our algorithm and evaluate it accurately, we collected exploration bucket data that removes positional biases on clicks on the documents for recency-classified queries. Our initial experimental result shows that our scheme is more capable of quickly adjusting the ranking to track the varying relevance of documents reflected in the click feedback, compared to batch-trained ranking functions. [pdf]
L. Li, W. Chu, J. Langford and R. E. Schapire (2010) A contextual-bandit approach to personalized news article recommendation, International World Wide Web Conference (WWW-19) 661-670 (View Abstract)

Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web service is featured with dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. Second, the scale of most web services of practical interest calls for solutions that are both fast in learning and computation. In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks. [pdf]
S.-T. Park and W. Chu (2009) Pairwise preference regression for cold-start recommendation, ACM Recommender Systems (RecSys-03):21-28 (View Abstract)

Recommender systems are widely used in online e-commerce applications to improve user engagement and then to increase revenue. A key challenge for recommender systems is providing high quality recommendation to users in "cold-start" situations. We consider three types of cold-start problems: 1) recommendation on existing items for new users; 2) recommendation on new items for existing users; 3) recommendation on new items for new users. We propose predictive feature-based regression models that leverage all available information of users and items, such as user demographic information and item content features, to tackle cold-start problems. The resulting algorithms scale efficiently as a linear function of the number of observations. We verify the usefulness of our approach in three cold-start settings on the MovieLens and EachMovie datasets, by comparing with five alternatives including random, most popular, segmented most popular, and two variations of Vibes affinity algorithm widely used at Yahoo! for recommendation.
W. Chu and Z. Ghahramani (2009) Probabilistic models for incomplete multi-dimensional arrays, International Conference on Artificial Intelligence and Statistics (AISTATS-12):89-96 (View Abstract)

In multiway data, each sample is measured by multiple sets of correlated attributes. We develop a probabilistic framework for modeling structural dependency from partially observed multi-dimensional array data, known as pTucker. Latent components associated with individual array dimensions are jointly retrieved while the core tensor is integrated out. The resulting algorithm is capable of handling large-scale data sets. We verify the usefulness of this approach by comparing against classical models on applications to modeling amino acid fluorescence, collaborative filtering and a number of benchmark multiway array data. [pdf] [third-party pTucker code]
W. Chu and S.-T. Park (2009) Personalized recommendation on dynamic content using predictive bilinear models, International World Wide Web Conference (WWW-18):692-700 (View Abstract)

In Web-based services of dynamic content (such as news articles), recommender systems face the difficulty of identifying new high-quality items in a timely manner and providing recommendations for new users. We propose a feature-based machine learning approach to personalized recommendation that is capable of handling the cold-start issue effectively. We maintain profiles of content of interest, in which temporal characteristics of the content, e.g. popularity and freshness, are updated in real time. We also maintain profiles of users including demographic information and a summary of user activities within Yahoo! properties. Based on all features in user and content profiles, we develop predictive bilinear regression models to provide accurate personalized recommendations of new items for both existing and new users. This approach results in an offline model with light computational overhead compared with other recommender systems that require online re-training. The proposed framework is general and flexible for other personalized tasks. The superior performance of our approach is verified on a large-scale data set collected from the Today-Module on Yahoo! Front Page, with comparison against six competitive approaches. [pdf] [slides]
W. Chu, et al. (2009) A case study of behavior-driven conjoint analysis on Yahoo! Front Page Today Module, ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-15 Industry Track):1097-1104 (View Abstract)

In this paper, we report a successful large-scale case study of conjoint analysis on click-through streams in a real-world application at Yahoo!. We consider identifying users' heterogeneous preferences from millions of click/view events and building predictive models to classify new users into segments of distinct behavior patterns. A scalable conjoint analysis technique, known as tensor segmentation, is developed by utilizing logistic tensor regression in the standard partworth framework for solutions. [pdf]
R. Silva, W. Chu and Z. Ghahramani (2007) Hidden common cause relations in relational learning, Neural Information Processing Systems (NIPS-20):1345-1352 (View Abstract)

We consider the case when relationships are postulated to exist due to hidden common causes. We discuss how the resulting graphical model differs from Markov networks, and how it describes different types of real-world relational processes. A Bayesian nonparametric classification model is built upon this graphical representation and evaluated with several empirical studies. See Ricardo Silva's homepage for [pdf], [data] and [code]
K. Yu and W. Chu (2007) Gaussian process models for link analysis and transfer learning, Neural Information Processing Systems (NIPS-20):1657-1664 (View Abstract)

In this paper we model relational random variables on the edges of a network using Gaussian processes (GPs). We describe appropriate GP priors, i.e., covariance functions, for directed and undirected networks connecting homogeneous or heterogeneous nodes. The framework suggests an intimate connection between link prediction and transfer learning, which were traditionally two separate topics. [pdf]
P. K. Shivaswamy, W. Chu and M. Jansche (2007) A support vector approach to censored targets, IEEE International Conference on Data Mining (ICDM-07):655-660 (View Abstract)

Censored targets, such as the time to events in survival analysis, can generally be represented by intervals on the real line. In this paper, we propose a novel support vector technique (named SVCR) for regression on censored targets. Interestingly, this approach provides a general formulation for both standard regression and binary classification tasks. [pdf] [longer version]
V. Sindhwani, W. Chu and S. S. Keerthi (2007) Semi-supervised Gaussian process classifiers, International Joint Conference on Artificial Intelligence (IJCAI-20):1059-1064 (View Abstract)

We consider the problem of utilizing unlabeled data for Gaussian process inference. Using a geometrically motivated data-dependent prior, we propose a graph-based construction of semi-supervised Gaussian processes. We demonstrate this approach empirically on several classification problems. [pdf]
W. Chu, V. Sindhwani, Z. Ghahramani and S. S. Keerthi (2006) Relational learning with Gaussian processes, Neural Information Processing Systems (NIPS-19):289-296 (View Abstract)

Correlation between instances is often modelled via a kernel function using input attributes of the instances. Relational knowledge can further reveal additional pairwise correlations between variables of interest. In this paper, we develop a class of models which incorporates both reciprocal relational information and input attributes using Gaussian process techniques. This approach provides a novel non-parametric Bayesian framework with a data-dependent prior for supervised learning tasks. We also apply this framework to semi-supervised learning. Experimental results on several real world data sets verify the usefulness of this algorithm. [pdf]
K. Yu, W. Chu, S. Yu, V. Tresp and Z. Xu (2006) Stochastic relational models for discriminative link prediction, Neural Information Processing Systems (NIPS-19):1553-1560 (View Abstract)

We introduce a Gaussian process (GP) framework, stochastic relational models (SRM), for learning social, physical, and other relational phenomena where interactions between entities are observed. The key idea is to model the stochastic structure of entity relationships (i.e., links) via an interplay of multiple GPs, each defined on one type of entities. [pdf]
S. K. Shevade and W. Chu (2006) Minimum enclosing spheres formulations for support vector ordinal regression, IEEE International Conference on Data Mining (ICDM-06):1054-1058 (View Abstract)

We present two new support vector approaches for ordinal regression. These approaches find the concentric spheres with minimum volume that contain most of the training samples. [pdf]
W. Chu, Z. Ghahramani, R. Krause and D. L. Wild (2006) Identifying protein complexes in high-throughput protein interaction screens using an infinite latent feature model, Pacific Symposium on Biocomputing (PSB-11):231-242 (View Abstract)

We propose a Bayesian approach to identify protein complexes and their constituents from high-throughput protein-protein interaction screens. An infinite latent feature model that allows for multi-complex membership by individual proteins is coupled with a graph diffusion kernel that evaluates the likelihood of two proteins belonging to the same complex. Gibbs sampling is then used to infer a catalog of protein complexes from the interaction screen data. An advantage of this model is that it places no prior constraints on the number of complexes and automatically infers the number of significant complexes from the data. Validation results using affinity purification/mass spectrometry experimental data from yeast RNA-processing complexes indicate that our method is capable of partitioning the data in a biologically meaningful way.
S. S. Keerthi and W. Chu (2005) A matching pursuit approach to sparse Gaussian process regression, Neural Information Processing Systems (NIPS-18):643-650 (View Abstract)

In this paper, we propose a new basis selection criterion for building sparse GP regression models that provides promising gains in accuracy as well as efficiency over previous methods. Our algorithm is much faster than that of Smola and Bartlett, while in generalization it greatly outperforms the information gain approach proposed by Seeger et al., especially on the quality of predictive distributions. [ps] [code]
W. Chu and Z. Ghahramani (2005) Preference learning with Gaussian processes, International Conference on Machine Learning (ICML-22):137-144 (View Abstract)

In this paper, we propose a probabilistic kernel approach to preference learning based on Gaussian processes. A new likelihood function is proposed to capture the preference relations in the Bayesian framework. The generalized formulation is also applicable to tackle many multiclass problems. [pdf] [ps] [zip] [code]
W. Chu and S. S. Keerthi (2005) New approaches to support vector ordinal regression, International Conference on Machine Learning (ICML-22):145-152 (View Abstract)

In this paper, we propose two new support vector formulations for ordinal regression, which optimize multiple thresholds to define parallel discriminant hyperplanes for the ordinal scales. Both approaches guarantee that the thresholds are properly ordered at the optimal solution. [pdf] [ps] [zip] [code]
W. Chu, Z. Ghahramani and D. L. Wild (2004) A graphical model for protein secondary structure prediction, International Conference on Machine Learning (ICML-21):161-168 (View Abstract)

In this paper, we present a graphical model that extends segmental semi-Markov models (SSMM) to exploit multiple sequence alignment profiles for protein structure prediction. A novel parameterized model is proposed as the likelihood function for the SSMM. By incorporating the information from long range interactions in beta-sheets, this model is capable of carrying out inference on contact maps. [pdf] [ps] [zip] [webserver]
W. Chu, Z. Ghahramani and D. L. Wild (2004) Protein secondary structure prediction using sigmoid belief networks to parameterize segmental semi-Markov models, European Symposium on Artificial Neural Networks (ESANN-05):81-86
W. Chu, S. S. Keerthi and C. J. Ong (2002) A general formulation for support vector machines, International Conference on Neural Information Processing (ICONIP-09)
W. Chu, S. S. Keerthi and C. J. Ong (2002) A new Bayesian design method for support vector classification, International Conference on Neural Information Processing (ICONIP-09)
W. Chu, S. S. Keerthi and C. J. Ong (2001) A unified loss function in Bayesian framework for support vector regression, International Conference on Machine Learning (ICML-18):51-58

Refereed Workshop

W. Chu and Z. Ghahramani (2005) Extensions of Gaussian processes for ranking: semi-supervised and active learning, NIPS Workshop on Learning to Rank (NIPS-18):29-34 (View Abstract)

Unlabelled examples in supervised learning tasks can be optimally exploited using semi-supervised methods and active learning. We focus on ranking learning from pairwise instance preference to discuss these important extensions, semi-supervised learning and active learning, in the probabilistic framework of Gaussian processes. [ps]
S. S. Keerthi, et al. (2002) A machine learning approach for the curation of Biomedical literature - KDD Cup 2002 (Task 1), SIGKDD Explorations Newsletter, 4(2)
W. Chu (2006) Model selection: an empirical study on two kernel classifiers, International Joint Conference on Neural Networks (IJCNN-06):1673-1679
L. Li, W. Chu, J. Langford, T. Moon, and X. Wang (2012) An unbiased offline evaluation of contextual bandit algorithms with generalized linear models, Journal of Machine Learning Research - Workshop and Conference Proceedings 26 (JMLR W&CP-26) (View Abstract)

Contextual bandit algorithms have become popular tools in online recommendation and advertising systems. Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their "partial-label" nature. The purpose of this paper is two-fold. First, we review a recently proposed offline evaluation technique. Different from simulator-based approaches, the method is completely data-driven, is easy to adapt to different applications, and more importantly, provides provably unbiased evaluations. We argue for the wide use of this technique as standard practice when comparing bandit algorithms in real-life problems. Second, as an application of this technique, we compare and validate a number of new algorithms based on generalized linear models. Experiments using real Yahoo! data suggest substantial improvement over algorithms with linear models when the rewards are binary. [pdf]

Thesis

W. Chu (2003) Bayesian approach to support vector machines, Doctoral Dissertation, National University of Singapore

In this thesis, we develop Bayesian support vector machines for regression and classification. This work can also be regarded as support vector variants of Gaussian processes. [pdf] [zip] [code]

email : email dot chuwei at gmail.com

2017.02.19