Modeling Online Auctions - Wolfgang Jank - E-Book

Description

Explore cutting-edge statistical methodologies for collecting, analyzing, and modeling online auction data.

Online auctions are an increasingly important marketplace, as the new mechanisms and formats underlying these auctions have enabled the capturing and recording of large amounts of bidding data that are used to make important business decisions. As a result, new statistical ideas and innovation are needed to understand bidders, sellers, and prices. Combining methodologies from the fields of statistics, data mining, information systems, and economics, Modeling Online Auctions introduces a new approach to identifying obstacles and asking new questions using online auction data. The authors draw upon their extensive experience to introduce the latest methods for extracting new knowledge from online auction data. Rather than approach the topic from the traditional game-theoretic perspective, the book treats the online auction mechanism as a data generator, outlining methods to collect, explore, model, and forecast data.

Topics covered include:

* Data collection methods for online auctions and related issues that arise in drawing data samples from a Web site
* Models for bidder and bid arrivals, treating the different approaches for exploring bidder-seller networks
* Data exploration, such as integration of time series and cross-sectional information; curve clustering; semi-continuous data structures; and data hierarchies
* The use of functional regression as well as functional differential equation models, spatial models, and stochastic models for capturing relationships in auction data
* Specialized methods and models for forecasting auction prices and their applications in automated bidding decision rule systems

Throughout the book, R and MATLAB software are used for illustrating the discussed techniques. In addition, a related Web site features many of the book's datasets and R and MATLAB code that allow readers to replicate the analyses and learn new methods to apply to their own research. Modeling Online Auctions is a valuable book for graduate-level courses on data mining and applied regression analysis. It is also a one-of-a-kind reference for researchers in the fields of statistics, information systems, business, and marketing who work with electronic data and are looking for new approaches for understanding online auctions and processes. Visit this book's companion website at http://modelingonlineauctions.com/.

Number of pages: 578

Year of publication: 2010

Contents

Cover

Statistics in Practice

Title Page

Copyright

Dedication

Preface

Acknowledgments

Chapter 1: Introduction

1.1 Online Auctions and Electronic Commerce

1.2 Online Auctions and Statistical Challenges

1.3 A Statistical Approach to Online Auction Research

1.4 The Structure of this Book

1.5 Data and Code Availability

Bibliography

Chapter 2: Obtaining Online Auction Data

2.1 Collecting Data from the Web

2.2 Web Data Collection and Statistical Sampling

Bibliography

Chapter 3: Exploring Online Auction Data

3.1 Bid Histories: Bids versus “Current Price” Values

3.2 Integrating Bid History Data with Cross-sectional Auction Information

3.3 Visualizing Concurrent Auctions

3.4 Exploring Price Evolution and Price Dynamics

3.5 Combining Price Curves with Auction Information via Interactive Visualization

3.6 Exploring Hierarchical Information

3.7 Exploring Price Dynamics via Curve Clustering

3.8 Exploring Distributional Assumptions

3.9 Exploring Online Auctions: Future Research Directions

Bibliography

Chapter 4: Modeling Online Auction Data

4.1 Modeling Basics (Representing the Price Process)

4.2 Modeling The Relation Between Price Dynamics and Auction Information

4.3 Modeling Auction Competition

4.4 Modeling Bid and Bidder Arrivals

4.5 Modeling Auction Networks

Conclusion

Bibliography

Chapter 5: Forecasting Online Auctions

5.1 Forecasting Individual Auctions

5.2 Forecasting Competing Auctions

5.3 Automated Bidding Decisions

Bibliography

Bibliography

Color Plates

Index

Statistics in Practice

Advisory Editor

Marian Scott

University of Glasgow, UK

Founding Editor

Vic Barnett

Nottingham Trent University, UK

The texts in the series provide detailed coverage of statistical concepts, methods, and worked case studies in specific fields of investigation and study.

With sound motivation and many worked practical examples, the books show in down-to-earth terms how to select and use an appropriate range of statistical techniques in a particular practical field. Readers are assumed to have a basic understanding of introductory statistics, enabling the authors to concentrate on those techniques of most importance in the discipline under discussion.

The books meet the need for statistical support required by professionals and research workers across a range of employment fields and research environments. Subject areas covered include medicine and pharmaceutics; industry, finance, and commerce; public services; the earth and environmental sciences.

A complete list of titles in this series appears at the end of the volume.

Human and Biological Sciences

Brown and Prescott · Applied Mixed Models in Medicine

Ellenberg, Fleming and DeMets · Data Monitoring Committees in Clinical Trials: A Practical Perspective

Lawson, Browne and Vidal Rodeiro · Disease Mapping With WinBUGS and MLwiN

Lui · Statistical Estimation of Epidemiological Risk

[1] Marubini and Valsecchi · Analysing Survival Data from Clinical Trials and Observation Studies

Parmigiani · Modeling in Medical Decision Making: A Bayesian Approach

Senn · Cross-over Trials in Clinical Research, Second Edition

Senn · Statistical Issues in Drug Development

Spiegelhalter, Abrams and Myles · Bayesian Approaches to Clinical Trials and Health-Care Evaluation

Turner · New Drug Development: Design, Methodology, and Analysis

Whitehead · Design and Analysis of Sequential Clinical Trials, Revised Second Edition

Whitehead · Meta-Analysis of Controlled Clinical Trials

Earth and Environmental Sciences

Buck, Cavanagh and Litton · Bayesian Approach to Interpreting Archaeological Data

Cooke · Uncertainty Modeling in Dose Response: Bench Testing Environmental Toxicity

Gibbons, Bhaumik, and Aryal · Statistical Methods for Groundwater Monitoring, Second Edition

Glasbey and Horgan · Image Analysis in the Biological Sciences

Helsel · Nondetects and Data Analysis: Statistics for Censored Environmental Data

McBride · Using Statistical Methods for Water Quality Management: Issues, Problems and Solutions

Webster and Oliver · Geostatistics for Environmental Scientists

Industry, Commerce and Finance

Aitken and Taroni · Statistics and the Evaluation of Evidence for Forensic Scientists, Second Edition

Brandimarte · Numerical Methods in Finance and Economics: A MATLAB-Based Introduction, Second Edition

Brandimarte and Zotteri · Introduction to Distribution Logistics

Chan and Wong · Simulation Techniques in Financial Risk Management

Jank · Statistical Methods in eCommerce Research

Jank and Shmueli · Modeling Online Auctions

Lehtonen and Pahkinen · Practical Methods for Design and Analysis of Complex Surveys, Second Edition

Ohser and Mücklich · Statistical Analysis of Microstructures in Materials Science

1. Now available in paperback.

Copyright © 2010 by John Wiley & Sons, Inc. All rights reserved

Published by John Wiley & Sons, Inc., Hoboken, New Jersey

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

ISBN: 978-0-470-47565-2

To our families and mentors who have inspired and encouraged us, and always supported our endeavors:

Angel, Isabella, Waltraud, Gerhard and Sabina

–Wolfgang Jank

Boaz and Noa Shmueli, Raquelle Azran, Zahava Shmuely, and Ayala Cohen

–Galit Shmueli

Preface

Our fascination with online auction research started in 2002. Back then, empirical research using online auction data was just beginning to appear, and it was concentrated primarily in the fields of economics and information systems. We started our adventure by scrutinizing online auction data in a statistically oriented, very simple, and descriptive fashion, asking questions such as “How do we plot a single bid history?”, “How do we plot 1000 bid histories without information overload?”, “How do we represent the price evolution during an ongoing auction?”, and “How do we collect lots and lots of data efficiently?” During that period, we started to collaborate with colleagues (primarily non-statisticians) who introduced us to auction theory, to its limitations in the online environment, and to modern technologies for quickly and efficiently collecting large amounts of online auction data. Taking advantage of our “biased” statistical thinking, we started looking at existing questions such as “What auction factors affect the final price?” or “How can we quantify the winner's curse?” in a new, data-driven way. Then, after having been exposed to huge amounts of data and after having developed a better understanding of how the online auction mechanism works, we started asking more “unusual” and risky questions such as “How can we capture the price dynamics of an auction?” or “How can we forecast the outcome of an auction in a dynamic way?” Such questions have led to fruitful research endeavors ever since. In fact, they have led to several rather unique innovations. First, since we posed these questions not only to ourselves but also to our PhD students, they have led to a new PhD specialization. Our graduating students combine a strong background in mathematics or statistics with training in problem solving related to business and electronic commerce. This combination is essential for competing and advancing in the current business analytics environment, as described in the recent monograph Competing on Analytics by Davenport and Harris or many other articles and books on the same topic.

Our research has also led to the birth of a new research area called statistical challenges in eCommerce research. Our interactions and collaborations with many colleagues have led to a new annual symposium, now in its sixth year, that carries the same name (see also http://www.statschallenges.com). The symposium attracts researchers from different disciplines who all share the same passion: using empirical techniques to take advantage of the wealth of eCommerce data for the purpose of better understanding the online world.

There are several aspects that make online auction research a topic to be passionate about. Looking long and hard at data from online auctions, we have discovered surprising structures in the data, structures that were not straightforward to represent or to model using standard statistical tools. Two examples are the very uneven spacing of event arrivals and the “semicontinuous” nature of online auction data. To handle such data and tease out as much information as possible from them, we adapted existing statistical methods or developed new tools and methodology. Our statistical approach to online auction research has resulted in a wide variety of methods and tools that support the exploration, modeling, and forecasting of auction data. While this book primarily focuses on auctions, the lessons learned can be applied more generally and could be of interest to any researcher studying online processes.

The book is intended for researchers and students from a wide variety of disciplines interested in applied statistical modeling of online auctions. On the one hand, we believe that this book will add value to the statistician, interested in developing methodology and in search of new applications—this book carefully describes several areas where statisticians can make a difference and it also provides a wealth of new online auction data. On the other hand, the book can add value to the marketer, the economist, or the information systems researcher, looking for new approaches to derive insights from online auctions or online processes—we are very careful in describing our way of scrutinizing online auction data to extract even the last bit of (possibly surprising) knowledge from them. To allow researchers to replicate (and improve) our methods and models, we also provide the code to our software programs. For this purpose, we make available related data and code at a companion website (http://ModelingOnlineAuctions.com) for those who are interested in hands-on learning. Our own learning started with data, and we therefore encourage others to explore online auction data directly.

Wolfgang Jank and Galit Shmueli
January 2010

Acknowledgments

There are many people who helped and influenced us and we would like to thank them for making this book come to fruition:

First of all, we would like to thank the following students who have worked (and sometimes suffered) with us while developing many of the ideas and concepts described in this book: PhD students Shanshan Wang, Valerie Hyde, Shu Zhang, Aleks Aris, and Inbal Yahav, as well as Master's students Brian Alford, Lakshmi Urimi, and others who participated in our first Research Interaction Team on online auctions.

We also thank our many co-authors and co-contributors. The generation of new knowledge and ideas, and their implementation and application would have never seen the light of day without our close collaborations. A special thanks to Ravi Bapna (University of Minnesota), who introduced us early on to the research community studying online auctions and to the immense potential of empirical methods in this field. We thank our University of Maryland colleagues Ben Shneiderman and Catherine Plaisant from the Human-Computer Interaction Lab, PK Kannan from the Marketing department, and Paul Smith from the Mathematics department. We also thank Ralph Russo (University of Iowa), Mayukh Dass (Texas Tech University), N. D. Shyamalkumar (University of Iowa), Paolo Buono (University of Bari), Peter Popkowski Leszczyc (University of Alberta), Ernan Haruvy (University of Texas at Dallas), and Gerhard Tutz (University of Munich).

We express our deepest gratitude to our many colleagues who have helped shape our ideas through fruitful discussions and who have shown enthusiasm and support for our endeavors: Paulo Goes (University of Connecticut), Alok Gupta (University of Minnesota), Rob Kauffman (University of Arizona), Ramayya Krishnan (Carnegie Mellon University), Ram Chellapa (Emory University), Anindya Ghose (NYU), Foster Provost (NYU), Sharad Borle (Rice University), Hans-Georg Mueller (University of California at Davis), Gareth James (University of Southern California), Otto Koppius (Erasmus University), Daryl Pregibon (Google), and Chris Volinsky (AT&T).

We also greatly thank our statistician colleagues Ed George (University of Pennsylvania), Jim Ramsay (McGill University), Steve Marron (University of North Carolina), Jeff Simonoff (NYU), David Steinberg (Tel Aviv University), Don Rubin (Harvard University), David Banks (Duke University), and Steve Fienberg (Carnegie Mellon University), who supported and encouraged our interdisciplinary research in many, many ways.

Finally, we thank Steve Quigley and the team at Wiley for their enthusiasm and welcoming of our manuscript ideas.

Chapter 1

Introduction

Online auctions have seen a surge in popularity in recent years. Websites such as eBay.com, uBid.com, or Swoopo.com are marketplaces where buyers and sellers meet to exchange goods or information. Online auction platforms are different from fixed-price retail environments such as Amazon.com since transactions are negotiated between buyers and sellers. The popularity of online auctions stems from a variety of reasons. First, online auction websites are constantly available, so sellers can post items at any time and bidders can place bids day or night. Items are typically listed for several days, giving purchasers time to search, decide, and bid. Second, online auctions face virtually no geographical constraints, and individuals in one location can participate in an auction that takes place in a completely different part of the world. The vast geographical reach also contributes to the variety of products offered for sale, both new and used. Third, online auctions also provide entertainment, as they engage participants in a competitive environment. In fact, the social interactions during online auctions have sometimes been compared to gambling, where bidders wait in anticipation to win and often react emotionally to being outbid in the final moments of the auction.

Online auctions are relatively new. By an “online auction” we refer to a Web-based auction, where transactions take place on an Internet portal. However, even before the advent of Internet auctions as we know them today, auctions were held electronically via email messages, discussion groups, and newsgroups. David Lucking-Reiley (2000) describes the newsgroup rec.games.deckmaster where Internet users started trading “Magic” cards (related to the game Magic: the Gathering) as early as 1995. He writes:

By the spring of 1995, nearly 6,000 messages were being posted each week, making rec.games.tradingcards.marketplace the highest-volume newsgroup on the Internet. Approximately 90 percent of the 26,000 messages per month were devoted to the trading of Magic cards, with the remaining 10 percent devoted to the trading of cards from other games.

Lucking-Reiley (2000) presents a brief history of the development of Internet auctions and also provides a survey of the existing online auction portals as of 1998. The first online auction websites, launched in 1995, went by the names of Onsale and eBay. Onsale (today Egghead) later sold its auction service to Yahoo! and moved to fixed-price retailing. Yahoo! and Amazon each launched their own online auction services in 1999. Within 10 years or so, both shut down their auction services and now focus exclusively on fixed-price operations. (At the time of writing, Yahoo! maintains online auctions in Hong Kong, Taiwan, and Japan.) Thus, from 1995 until today (i.e., 2010) the consumer-to-consumer online auction marketplace has followed the pattern of eCommerce in general: An initial mushrooming of online auction websites was followed by a strong period of consolidations, out of which developed the prominent auction sites that we know today: eBay, uBid, or Swoopo (for general merchandise), SaffronArt (for Indian art), or Prosper (for peer-to-peer lending).

Empirical research on online auctions is booming. In fact, it has grown much faster than empirical research on traditional, brick-and-mortar auctions. It is only fair to ask the question: “Why has data-driven research of online auctions become so much more popular compared to that of traditional auctions?” We believe the answer is simple and can be captured in one word: data! In fact, the public access to ongoing and past auction transactions online has opened new opportunities for empirical researchers to study the behavior of buyers and sellers. Moreover, theoretical results, founded in economics and derived for the offline, brick-and-mortar auction, have often proven not to hold in the online environment. Possible reasons that differentiate online auctions from their offline counterparts are the worldwide reach of the Internet, anonymity of its users, virtually unlimited resources, constant availability, and continuous change.

In one of the earliest examinations of online auctions (e.g., Lucking-Reiley et al., 2000), empirical economists found that bidding behavior, particularly on eBay, often diverges significantly from what classical auction theory predicts. Since then, there has been a surge in empirical analysis using online auction data in the fields of information systems, marketing, computer science, statistics, and related areas. Studies have examined bidding behavior in the online environment from many angles: identification and quantification of new bidding behavior and phenomena, such as bid sniping (Roth and Ockenfels, 2002) and bid shilling (Kauffman and Wood, 2005); creation of a taxonomy of bidder types (Bapna et al., 2004); and development of descriptive probabilistic models to capture bidding and bidder activity (Shmueli et al., 2007; Russo et al., 2008), as well as bidder behavior in terms of bid timing and amount (Borle et al., 2006; Park and Bradlow, 2005). Another stream of research focuses on the price evolution during an online auction. Related to this are studies on price dynamics (Wang et al., 2008a, 2008b; Bapna et al., 2008b; Dass and Reddy, 2008; Reddy and Dass, 2006; Jank and Shmueli, 2006; Hyde et al., 2008; Jank et al., 2008b, 2009a, 2009b) and the development of novel models for dynamically forecasting auction prices (Wang et al., 2008a; Jank and Zhang, 2009a, 2009b; Zhang et al., 2010; Jank and Shmueli, 2010; Jank et al., 2006; Dass et al., 2009). Further topics of research are quantifying economic value such as consumer surplus in eBay (Bapna et al., 2008a), and more recently, online auction data are also being used for studying bidder and seller relationships in the form of networks (Yao and Mela, 2007; Dass and Reddy, 2008; Jank and Yahav, 2010), or competition between products, between auction formats, and even between auction platforms (Haruvy et al., 2008; Hyde et al., 2006; Jank and Shmueli, 2007; Haruvy and Popkowski Leszczyc, 2009). All this illustrates that empirical research of online auctions is thriving.

1.1 Online Auctions and Electronic Commerce

Online auctions are part of a broader trend of doing business online, often referred to as electronic commerce, or eCommerce. eCommerce is often associated with any form of transaction originating on the Web. eCommerce has had a huge impact on the way we live today compared to a decade ago: It has transformed the economy, eliminated borders, opened doors to innovations that were unthinkable just a few years ago, and created new ways in which consumers and businesses interact. Although many predicted the death of eCommerce with the “burst of the Internet bubble” in the late 1990s, eCommerce is thriving more than ever. eCommerce transactions include buying, selling, or investing online. Examples are shopping at online retailers such as Amazon.com or participating in online auctions such as eBay.com; buying or selling used items through websites such as Craigslist.com; using Internet advertising (e.g., sponsored ads by Google, Yahoo!, and Microsoft); reserving and purchasing tickets online (e.g., for travel or movies); posting and downloading music, video, and other online content; posting opinions or ratings about products on websites such as Epinions or Amazon; requesting or providing services via online marketplaces or auctions (e.g., Amazon Mechanical Turk or eLance); and many more.

Empirical eCommerce research covers many topics, ranging from very broad to very specific questions. Examples of rather specific research questions cover topics such as the impact of online used goods markets on sales of CDs and DVDs (Telang and Smith, 2008); the evolution of open source software (Stewart et al., 2006); the optimality of online price dispersion in the software industry (Ghose and Sundararajan, 2006); the efficient allocation of inventory in Internet advertising (Agarwal, 2008); the optimization of advertisers' bidding strategies (Matas and Schamroth, 2008); the entry and exit of Internet firms (Kauffman and Wang, 2008); the geographical impact of online sales leads (Jank and Kannan, 2008); the efficiency and effectiveness of virtual stock markets (Spann and Skiera, 2003; Foutz and Jank, 2009); or the impact of the online encyclopedia Wikipedia (Warren et al., 2008).

Broad research questions include issues of privacy and confidentiality of eCommerce transactions (Fienberg, 2006, 2008) and other issues related to mining Internet transactions (Banks and Said, 2006), modeling clickstream data (Goldfarb and Lu, 2006), and understanding time-varying relationships in eCommerce data (Overby and Konsynski, 2008). They also include questions on how online experiences advance our understanding of the offline world (Forman and Goldfarb, 2008); the economic impact of user-generated online content (Ghose, 2008); challenges in collecting, validating, and analyzing large-scale eCommerce data (Bapna et al., 2006) or conducting randomized experiments online (Van der Heijden and Böckenholt, 2008); as well as questions on how to assess the causal effect of marketing interventions (Rubin and Waterman, 2006; Mithas et al., 2006) and the effect of social networks and word of mouth (Hill et al., 2006; Dellarocas and Narayan, 2006).

Internet advertising is another area where empirical research is growing, but currently more so inside of companies and to a lesser extent in academia. Companies such as Google, Yahoo!, and Microsoft study the behavior of online advertisers using massive data sets of bidding and bidding outcomes to more efficiently allocate inventory (e.g., ad placement) (Agarwal, 2008). Online advertisers and companies that provide services to advertisers also examine bid data. They study relationships between bidding and profit (or other measures of success) for the purpose of optimizing advertisers' bidding strategies (Matas and Schamroth, 2008).

Another active and growing area of empirical research is that of prediction markets, also known as “information markets,” “idea markets,” or “betting exchanges.” Prediction markets are mechanisms used to aggregate the wisdom of crowds (Surowiecki, 2005) from online communities to forecast outcomes of future events and they have seen many interesting applications, from forecasting economic trends to natural disasters to elections to movie box-office sales. While several empirical studies (Spann and Skiera, 2003; Forsythe et al., 1999; Pennock et al., 2001) report on the accuracy of final trading prices to provide forecasts, there exists evidence that prediction markets are not fully efficient, which brings up interesting new statistical challenges (Foutz and Jank, 2009).

There are many similarities between the statistical challenges that arise in the empirical analysis of online auctions and that of eCommerce in general. Next, we discuss some of these challenges in the context of online auctions; for more on the aspect of eCommerce research, see, for example, Jank et al. (2008a) or Jank and Shmueli (2008a).

1.2 Online Auctions and Statistical Challenges

A key reason for the booming of empirical online auctions research is the availability of data: lots and lots of data! However, while data open the door to investigating new types of research questions, they also bring up new challenges. Some of these challenges are related to data volume, while others reflect the new structure of Web data. Both issues pose serious challenges for the empirical researcher.

In this book, we offer methods for handling and modeling the unique data structure that arises in online auction Web data. One major aspect is the combination of temporal and cross-sectional information. Online auctions (e.g., eBay) are a case in point. Online auctions feature two fundamentally different types of data: the bid history and the auction description. The bid history lists the sequence of bids placed over time and as such can be considered a time series. In contrast, the auction description (e.g., product information, information about the seller, and the auction format) does not change over the course of the auction and therefore is cross-sectional information. The analysis of combined temporal and cross-sectional data poses challenges because most statistical methods are geared only toward one type of data. Moreover, while methods for panel data can address some of these challenges, these methods typically assume that events arrive at equally spaced time intervals, which is not at all the case for online auction data. In fact, Web-based temporal data that are user-generated create nonstandard time series, where events are not equally spaced. In that sense, such temporal information is better described as a process. Because of the dynamic nature of the Web environment, many processes exhibit dynamics that change over the course of the process. On eBay, for instance, prices speed up early, then slow down later, only to speed up again toward the auction end. Classical statistical methods are not geared toward capturing the change in process dynamics and toward teasing out similarities (and differences) across thousands (or even millions) of online processes.
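To make the two data types concrete, the following minimal Python sketch pairs a cross-sectional auction description with a time-stamped bid history and computes the gaps between consecutive bids. The class name, the field names, and all numbers are hypothetical illustrations, not data or code from the book.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Auction:
    """One auction: fixed (cross-sectional) attributes plus a bid history."""
    # Cross-sectional information: does not change during the auction.
    item: str
    seller_rating: int
    opening_bid: float
    duration_days: float
    # Temporal information: (days since auction start, bid amount) pairs.
    bids: List[Tuple[float, float]] = field(default_factory=list)

    def interarrival_times(self) -> List[float]:
        """Gaps (in days) between consecutive bids, in time order."""
        times = sorted(t for t, _ in self.bids)
        return [later - earlier for earlier, later in zip(times, times[1:])]


# Hypothetical 7-day auction with made-up bids.
auction = Auction(
    item="example item",
    seller_rating=250,
    opening_bid=1.00,
    duration_days=7.0,
    bids=[(0.5, 5.00), (0.75, 8.50), (4.0, 12.00), (6.5, 20.00), (6.75, 26.00)],
)

gaps = auction.interarrival_times()
print(gaps)  # [0.25, 3.25, 2.5, 0.25]
```

In this made-up example the gaps range from a quarter of a day to more than three days, which is the kind of unequal spacing that methods assuming equally spaced observations do not accommodate.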

Another challenge related to the nature of online auction data is capturing competition between auctions. Consider again the example of eBay auctions. On any given day, there exist tens of thousands of identical (or similar) products being auctioned that all compete for the same bidders. For instance, at the time of writing, a simple search under the keywords “Apple iPod” reveals over 10,000 available auctions, all of which vie for the attention of the interested bidder. While not all of these 10,000 auctions may sell an identical product, some may be more similar (in terms of product characteristics) than others. Moreover, even among identical products, not all auctions will be equally attractive to the bidder due to differences in sellers' perceived trustworthiness or differences in auction format. For instance, to bidders who seek immediate satisfaction, auctions that are 5 days away from completion may be less attractive than auctions that end in the next 5 minutes. Modeling differences in product similarity and their impact on bidders' choices is challenging (Jank and Shmueli, 2007). Similarly, understanding the effect of misaligned (i.e., different starting times, different ending times, different durations) auctions on bidding decisions is equally challenging (Hyde et al., 2006) and solutions are not readily available in classical statistical tools. For a more general overview of challenges associated with auction competition, see Haruvy et al. (2008).
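As a small illustration of what "misaligned" auctions look like in data, the sketch below finds the listings that are open at some point while a given auction is open and computes the time remaining in each open listing. The `Listing` type, the function names, and the start and end times are assumptions made for this example only; it is not a model of competition, merely the bookkeeping such a model would start from.

```python
from typing import List, NamedTuple


class Listing(NamedTuple):
    """Start and end of one auction, in days on a common calendar."""
    auction_id: str
    start: float
    end: float


def competitors(target: Listing, listings: List[Listing]) -> List[str]:
    """IDs of other auctions that are open at some time while `target` is open."""
    return [
        other.auction_id
        for other in listings
        if other.auction_id != target.auction_id
        and other.start < target.end
        and target.start < other.end
    ]


def time_remaining(listing: Listing, now: float) -> float:
    """Days left until `listing` closes, or 0.0 if it has already closed."""
    return max(listing.end - now, 0.0)


# Hypothetical listings with different start times, end times, and durations.
listings = [
    Listing("A", start=0.0, end=7.0),
    Listing("B", start=2.0, end=5.0),
    Listing("C", start=6.0, end=9.0),
    Listing("D", start=8.0, end=11.0),
]

print(competitors(listings[0], listings))  # ['B', 'C']

# Time remaining in every auction that is open on day 4.
open_now = [l for l in listings if l.start <= 4.0 < l.end]
print([time_remaining(l, now=4.0) for l in open_now])  # [3.0, 1.0]
```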

Another challenge to statistical modeling is the existence of user networks and their impact on transaction outcomes. Networks have become an increasingly important component of the online world, particularly in the “new web,” Web 2.0, and its network-fostering enterprises such as Facebook, MySpace, and LinkedIn. Networks also exist in other places (although less obviously) and impact transaction outcomes. On eBay, for example, buyers and sellers form networks by repeatedly transacting with one another. This raises the question about the mobility and characteristics of networks across different marketplaces and their impact on the outcome of eCommerce transactions. Answers to these questions are not obvious and require new methodological tools to characterize networks and capture their impact on the online marketplace.

1.3 A Statistical Approach to Online Auction Research

In this book, we provide empirical methods for tackling the challenges described above. As with many books, we present both a description of the problem and potential solutions. It is important to remember that our main focus is statistical. That is, we discuss methods for collecting, exploring, and modeling online auction data. Our models are aimed at capturing empirical phenomena in the data, at gaining insights about bidders' and sellers' behavior, and at forecasting the outcome of online auctions. Our approach is pragmatic and data-driven in that we incorporate domain knowledge and auction theory in a less formalized fashion compared to typical exposés in the auction literature. We make extensive use of nonparametric methods and data-driven algorithms to avoid making overly restrictive assumptions (many of which are violated in the online auction context) and to allow for the necessary flexibility in this highly dynamic environment. The online setting creates new opportunities for observing human behavior and economic relationships “in action,” and our goal is to provide tools that support the exploration, quantification, and modeling of such relationships.

We note that our work has been inspired by the early research of Lucking-Reiley et al. (2000) who, to the best of our knowledge, were the first to conduct empirical research in the context of online auctions. The fact that it took almost 9 years from the first version of their 1999 working paper until its publication in 2007 (Lucking-Reiley et al., 2007) shows the hesitation with which some of this empirical research was greeted in the community. We believe though that some of this hesitation has subsided by now.

1.4 The Structure of this Book

The order of the chapters in this book follows the chronology of empirical data analysis: from data collection, through data exploration, to modeling and forecasting.

We start in Chapter 2 by discussing different ways of obtaining online auction data. In addition to the standard methods of data purchasing or collaborating with Internet businesses, we describe the currently most popular methods of data collection: Web crawling and Web services. These two technologies generate large amounts of rich, high-quality online auction data. We also discuss Web data collection from a statistical sampling point of view, noting the various issues that arise in drawing data samples from a website, and how the resulting samples relate to the population of interest.

Chapter 3 continues with the most important step in data analysis: data exploration. While the availability of huge amounts of data often tempts the researcher to directly jump into sophisticated models and methods, one of the main messages of this book is that it is of extreme importance to first understand one's data, and to explore the data for patterns and anomalies. Chapter 3 presents an array of data exploration methods and tools that support the special structures that arise in online auction data. One such structure is the uneven spacing of time series (i.e., the bid histories) and their combination with cross-sectional information (i.e., auction details). Because many of the models presented in the subsequent chapters make use of an auction's price evolution, we describe plots for displaying and exploring curves of the price and its dynamics. We also discuss curve clustering, which allows the researcher to segment auctions by their different price dynamics.

Another important facet is the concurrent nature of online auctions and their competition with other auctions. We present methods for visualizing the degree of auction concurrency as well as its context (e.g., collection period and data volume). We also discuss unusual data structures that can often be found in online auctions: semicontinuous data. These data are continuous but contain several “too-frequent” values. We describe where and how such semicontinuous data arise and propose methods for presenting and exploring them in Chapter 3.
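As a rough illustration of what “too-frequent” values look like in practice, the sketch below flags values whose share of the sample reaches a chosen threshold. The function name `too_frequent_values`, the threshold, and the sample of opening prices are all hypothetical. This is a simple frequency count, not the exploration method developed in Chapter 3.

```python
from collections import Counter


def too_frequent_values(values, min_share):
    """Return {value: share} for values making up at least min_share of the data."""
    counts = Counter(values)
    n = len(values)
    return {v: c / n for v, c in counts.items() if c / n >= min_share}


# Hypothetical opening prices: mostly spread out, but with spikes
# at 0.99 and 1.00, as one might expect from popular default amounts.
opening_prices = [0.99] * 4 + [1.00] * 3 + [
    1.37, 2.41, 3.08, 4.62, 5.55, 6.13, 7.77,
    8.02, 9.45, 10.31, 11.18, 12.96, 13.50,
]
spikes = too_frequent_values(opening_prices, min_share=0.10)
```

In a truly continuous variable, exact ties should be rare, so any value that accounts for a sizable share of the sample is a candidate spike worth inspecting before a continuous model is fitted.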

The chapter continues with another prominent feature of online auction data: data hierarchies. Hierarchies arise due to the structure of online auction websites, where listings are often organized in the form of categories, subcategories, and subsubcategories. This organization plays an important role in how bidders locate information and, ultimately, in how listings compete with one another.

Chapter 3 concludes with a discussion of exploratory tools for interactive visualization that allow the researcher to “dive” into the data and make multidimensional exploration easier and more powerful.

Chapter 4 discusses different statistical models for capturing relationships in auction data. We open with a more formal exposition of the price curve representation, which estimates the price process (or price evolution) during an ongoing auction. The price process captures much of the activity of individual bidders and also captures interactions among bidders, such as bidders competing with one another or changes in a bidder's bidding strategies as a result of the strategies of other bidders. Moreover, the price process allows us to measure all of this change in a very parsimonious manner, via the price dynamics. Chapter 4 hence starts out by discussing alternatives for capturing price dynamics and then continues to propose different models for price dynamics. In that context, we propose functional regression models that allow the researcher to link price dynamics with covariate information (such as information about the seller, the bidders, or the product). We then extend the discussion to functional differential equation models that capture the effect of the process itself in addition to covariate information.
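The idea of price dynamics can be previewed with a small Python sketch that approximates the first and second derivatives of a price curve (its “velocity” and “acceleration”) by finite differences. It assumes the price curve has already been evaluated on an evenly spaced daily grid. The function name and the price values are hypothetical, and plain finite differences stand in here for the smoothed curve representations discussed in Chapter 4.

```python
def finite_differences(values, step):
    """Central differences in the interior, one-sided differences at the ends.

    Requires at least two values on an evenly spaced grid with spacing `step`.
    """
    n = len(values)
    derivative = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        derivative.append((values[hi] - values[lo]) / ((hi - lo) * step))
    return derivative


# Hypothetical price curve of a 7-day auction, one value per day.
price = [0.0, 4.0, 6.0, 7.0, 7.5, 8.0, 10.0, 16.0]

velocity = finite_differences(price, step=1.0)         # price change per day
acceleration = finite_differences(velocity, step=1.0)  # change in velocity per day
```

For these made-up numbers the velocity is high at the start, drops in the middle of the auction, and rises again toward the end, which mirrors the early speed-up, mid-auction slowdown, and late speed-up described in Chapter 1.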

We then discuss statistical models for auction competition. By competition, we mean many auctions that sell similar (i.e., substitute) products and hence vie for the same bidders. Modeling competition is complicated because it requires the definition of “similar items.” We borrow ideas from spatial models to capture the similarity (or dissimilarity) of products in the associated feature space. But competition may be more complex. In fact, competition also arises from temporal concurrency: Auctions that are listed only a few minutes or hours apart from one another may show stronger competition compared to auctions that end on different days. Modeling temporal relationships is challenging since the auction arrival process is extremely uneven and hence requires a new definition of the traditional “time lag.”

Chapter 4 continues with discussing models for bidder arrivals and bid arrivals in online auctions. Modeling the arrival of bids is not straightforward because online auctions are typically much longer compared to their brick-and-mortar counterparts and hence they experience periods of little to no activity, followed by “bursts” of bidding. In fact, online auctions often experience “deadline effects” in that many bids are placed immediately before the auction closes. These different effects make the process deviate from standard stochastic models. We describe a family of stochastic models that adequately capture the empirically observed bid arrival process. We then tie these models to bidder arrival and bid placement strategies. Modeling the arrival of bidders (rather than bids) is even more challenging because while bids are observed, the entry (or exit) of bidders is unobservable.
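To give a feel for why bid arrivals deviate from a constant-rate process, the sketch below simulates bid times from a nonhomogeneous Poisson process whose intensity rises sharply toward the deadline, using the standard thinning (acceptance-rejection) technique. The intensity function and all of its parameters are assumptions made purely for illustration. This is not the family of models presented in Chapter 4.

```python
import random


def intensity(t, duration, base=0.5, surge=20.0, sharpness=8.0):
    """Hypothetical bids-per-day rate: low baseline plus a surge near the deadline."""
    return base + surge * (t / duration) ** sharpness


def simulate_bid_times(duration, seed=0):
    """Simulate bid arrival times on [0, duration] by thinning.

    Candidates come from a constant-rate Poisson process at the maximum
    intensity; each candidate at time t is kept with probability
    intensity(t) / max_intensity.
    """
    rng = random.Random(seed)
    # The intensity above is increasing in t, so its maximum is at the deadline.
    max_intensity = intensity(duration, duration)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(max_intensity)
        if t > duration:
            return times
        if rng.random() < intensity(t, duration) / max_intensity:
            times.append(t)


bid_times = simulate_bid_times(duration=7.0, seed=1)
```

Under this assumed intensity, most simulated bids tend to fall in the last day of the auction, with long quiet stretches before it.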

Chapter 4 concludes with a discussion of auction networks. Networks have become omnipresent in our everyday lives, not least because of the advent of social networking sites such as MySpace or Facebook. While auction networks are a rather new and unexplored concept, one can observe that links between certain pairs of buyers and sellers are stronger than others. In Chapter 4, we discuss some approaches for exploring such bidder–seller networks.

Finally, in Chapter 5 we discuss forecasting methods. We separated “forecasting” from “modeling” (in Chapter 4) because the process of developing a model (or a method) that can predict the future is typically different from retroactively building a model that can describe or explain an observed relationship.

Within the forecasting context, we consider three types of models, each adding an additional layer of information and complexity. First, we consider forecasting models that only use the information from within a given ongoing auction to forecast its final price. In other words, the first—and most basic—model only uses information that is available from within the auction to predict the outcome of that auction. The second model builds upon the first model and considers additional information about other simultaneous auctions. However, the information on outside auctions is not modeled explicitly. The last—and most powerful—model explicitly measures the effect of competing auctions and uses it to achieve better forecasts.

We conclude Chapter 5 by discussing useful applications of auction forecasting such as automated bidding decision rule systems that rely on auction forecasters.

1.5 Data and Code Availability

In the spirit of publicly (and freely) available information (and having experienced the tremendous value of rich data for conducting innovative research firsthand), we make many of the data sets described in the book available at http://www.ModelingOnlineAuctions.com. The website also includes computer code used for generating some of the results in this book. Readers are encouraged to use these resources and to contribute further data and code related to online auctions research.

Bibliography

Agarwal, D. (2008). Statistical challenges in Internet advertising. In Jank, W. and Shmueli, G. (eds.), Statistical Methods in eCommerce Research, Wiley, New York.

Banks, D. and Said, Y. (2006). Data mining in electronic commerce. Statistical Science, 21(2): 234–246.

Bapna, R., Goes, P., Gopal, R., and Marsden, J. (2006). Moving from data-constrained to data-enabled research: experiences and challenges in collecting, validating, and analyzing large-scale e-commerce data. Statistical Science, 21: 116–130.

Bapna, R., Goes, P., Gupta, A., and Jin, Y. (2004). User heterogeneity and its impact on electronic auction market design: an empirical exploration. MIS Quarterly, 28(1): 21–43.

Bapna, R., Jank, W., and Shmueli, G. (2008a). Consumer surplus in online auctions. Information Systems Research, 19: 400–416.

Bapna, R., Jank, W., and Shmueli, G. (2008b). Price formation and its dynamics in online auctions. Decision Support Systems, 44: 641–656.

Borle, S., Boatwright, P., and Kadane, J. B. (2006). The timing of bid placement and extent of multiple bidding: an empirical investigation using eBay online auctions. Statistical Science, 21(2): 194–205.

Dass, M., Jank, W., and Shmueli, G. (2009). Dynamic price forecasting in simultaneous online art auctions. In Casillas, J. and Martínez-López, F. J. (eds.), Marketing Intelligent Systems Using Soft Computing, Springer.

Dass, M. and Reddy, S. K. (2008). An analysis of price dynamics, bidder networks and market structure in online auctions. In Jank, W. and Shmueli, G. (eds.), Statistical Methods in eCommerce Research. Wiley, New York.

Dellarocas, C. and Narayan, R. (2006). A statistical measure of a population's propensity to engage in post-purchase online word-of-mouth. Statistical Science, 21(2): 277–285.

Fienberg, S. E. (2006). Privacy and confidentiality in an e-commerce world: data mining, data warehousing, matching and disclosure limitation. Statistical Science, 21(2): 143–154.

Fienberg, S. E. (2008). Is privacy protection for data in an eCommerce world an oxymoron? In Jank, W. and Shmueli, G. (eds.), Statistical Methods in eCommerce Research, Wiley, New York.

Forman, C. and Goldfarb, A. (2008). How has electronic commerce research advanced understanding of the offline world? In Jank, W. and Shmueli, G. (eds.), Statistical Methods in eCommerce Research, Wiley, New York.

Forsythe, R., Rietz, T. A., and Ross, T. W. (1999). Wishes, expectations, and actions: a survey on price formation in election stock markets. Journal of Economic Behavior & Organization, 39: 83–110.

Foutz, N. and Jank, W. (2009). Pre-release demand forecasting for motion pictures using functional shape analysis of virtual stock markets. Marketing Science, in press. Published online in Articles in Advance, December 2, 2009. DOI: 10.1287/mksc.1090.0542.

Ghose, A. (2008). The economic impact of user-generated and firm-published online content: directions for advancing the frontiers in electronic commerce research. In Jank, W. and Shmueli, G. (eds.), Statistical Methods in eCommerce Research, Wiley, New York.

Ghose, A. and Sundararajan, A. (2006). Evaluating pricing strategy using e-commerce data: evidence and estimation challenges. Statistical Science, 21(2): 131–142.

Goldfarb, A. and Lu, Q. (2006). Household-specific regressions using clickstream data. Statistical Science, 21(2): 247–255.

Haruvy, E. and Popkowski Leszczyc, P. (2009). What does it take to make consumers search? Working Paper, Department of Marketing, Business Economics and Law, University of Alberta.

Haruvy, E., Popkowski Leszczyc, P., Carare, O., Cox, J., Greenleaf, E., Jap, S., Jank, W., Park, Y., and Rothkopf, M. (2008). Competition between auctions. Marketing Letters, 19(3–4): 431–448.

Hill, S., Provost, F., and Volinsky, C. (2006). Network-based marketing: identifying likely adopters via consumer networks. Statistical Science, 21(2): 256–276.

Hyde, V., Jank, W., and Shmueli, G. (2006). Investigating concurrency in online auctions through visualization. The American Statistician, 60: 241–250.

Hyde, V., Jank, W., and Shmueli, G. (2008). A family of growth models for representing the price process in online auctions. In Jank, W. and Shmueli, G. (eds.), Statistical Methods in eCommerce Research, Wiley, New York.

Jank, W. and Kannan, P. K. (2008). Spatial models for online mortgage leads. In Jank, W. and Shmueli, G. (eds.), Statistical Methods in eCommerce Research, Wiley, New York.

Jank, W. and Shmueli, G. (2006). Functional data analysis in electronic commerce research. Statistical Science, 21(2): 155–166.

Jank, W. and Shmueli, G. (2007). Modelling concurrency of events in on-line auctions via spatiotemporal semiparametric models. Journal of the Royal Statistical Society, Series C, 56(1): 1–27.

Jank, W. and Shmueli, G. (2008a). Statistical Methods in eCommerce Research, Wiley, New York.

Jank, W. and Shmueli, G. (2010). Forecasting online auctions using dynamic models. In Soares, C. and Ghani, R. (eds.), Data Mining for Business Applications, IOS Press, in press.

Jank, W., Shmueli, G., Dass, M., Yahav, I., and Zhang, S. (2008a). Statistical challenges in eCommerce: modeling dynamic and networked data. INFORMS Tutorials in Operations Research, 2008 edition, pp. 31–54.

Jank, W., Shmueli, G., and Wang, S. (2006). Dynamic, real-time forecasting of online auction via functional models. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2006), Philadelphia, PA, August 20–23, 2006.

Jank, W., Shmueli, G., and Wang, S. (2008b). Modeling price dynamics in online auctions via regression trees. In Jank, W. and Shmueli, G. (eds.), Statistical Methods in eCommerce Research, Wiley, New York.

Jank, W. and Yahav, I. (2010). E-loyalty networks in online auctions. The Annals of Applied Statistics, in press.

Jank, W. and Zhang, S. (2009a). An automated and data-driven bidding strategy for online auctions. Technical Report, RH Smith School of Business, University of Maryland. Available at SSRN: http://ssrn.com/abstract=1427212.

Jank, W. and Zhang, S. (2009b). Competition in online markets: model selection for improved forecasting. Technical Report, RH Smith School of Business, University of Maryland.

Kauffman, R. and Wang, B. (2008). Developing rich insights on public internet firm entry and exit based on survival analysis and data visualization. In Jank, W. and Shmueli, G. (eds.), Statistical Methods in eCommerce Research, Wiley, New York.

Kauffman, R. J. and Wood, C. A. (2005). The effects of shilling on final bid prices in online auctions. Electronic Commerce Research and Applications, 4(2): 21–34.

Lucking-Reiley, D. (2000). Auctions on the Internet: what's being auctioned, and how? Journal of Industrial Economics, 48(3): 227–252.

Lucking-Reiley, D., Bryan, D., Prasad, N., and Reeves, D. (2007). Pennies from eBay: the determinants of price in online auctions. The Journal of Industrial Economics, 55(2): 223–233.

Lucking-Reiley, D., Bryan, D., and Reeves, D. (2000). Pennies from eBay: the determinants of price in online auctions. Working Paper 00-W03, Department of Economics, Vanderbilt University.

Matas, A. and Schamroth, Y. (2008). Optimization of search engine marketing bidding strategies using statistical techniques. In Jank, W. and Shmueli, G. (eds.), Statistical Methods in eCommerce Research, Wiley, New York.

Mithas, S., Almirall, D., and Krishnan, M. S. (2006). Do CRM systems cause one-to-one marketing effectiveness? Statistical Science, 21(2): 223–233.

Overby, E. and Konsynski, B. (2008). Modeling time-varying relationships in pooled cross-sectional eCommerce data. In Jank, W. and Shmueli, G. (eds.), Statistical Methods in eCommerce Research, Wiley, New York.

Park, Y.-H. and Bradlow, E. (2005). An integrated model for whether, who, when, and how much in Internet auctions. SSRN eLibrary.

Pennock, D. M., Lawrence, S., Giles, C. L., and Nielsen, F. A. (2001). The real power of artificial markets. Science, 291(5506): 987–988.

Reddy, S. K. and Dass, M. (2006). Modeling on-line art auction dynamics using functional data analysis. Statistical Science, 21(2): 179–193.

Roth, A. E. and Ockenfels, A. (2002). Last-minute bidding and the rules for ending second price auctions: evidence from eBay and Amazon on the Internet. American Economic Review, 92: 1093–1103.

Rubin, D. B. and Waterman, R. P. (2006). Estimating the causal effects of marketing interventions using propensity score methodology. Statistical Science, 21(2): 206–222.

Russo, R. P., Shmueli, G., and Shyamalkumar, N. D. (2008). Models of bidder activity consistent with self-similar bid arrivals. In Jank, W. and Shmueli, G. (eds.), Statistical Methods in eCommerce Research, Wiley, New York, pp. 325–339.

Shmueli, G., Russo, R., and Jank, W. (2007). The BARISTA: a model for bid arrivals in online auctions. The Annals of Applied Statistics, 1(2): 412–441.

Spann, M. and Skiera, B. (2003). Internet-based virtual stock markets for business forecasting. Management Science, 49(10): 1310–1326.

Stewart, K., Darcy, D., and Daniel, S. (2006). Opportunities and challenges applying functional data analysis to the study of open source software evolution. Statistical Science, 21(2): 167–178.

Surowiecki, J. (2005). The Wisdom of Crowds. Random House Inc., New York.

Telang, R. and Smith, M. D. (2008). Internet exchanges for used digital goods. SSRN eLibrary.

Van der Heijden, P. and Böckenholt, U. (2008). Applications of randomized response methodology in eCommerce. In Jank, W. and Shmueli, G. (eds.), Statistical Methods in eCommerce Research, Wiley, New York.

Wang, S., Jank, W., and Shmueli, G. (2008a). Explaining and forecasting online auction prices and their dynamics using functional data analysis. Journal of Business and Economic Statistics, 26(2): 144–160.

Wang, S., Jank, W., Shmueli, G., and Smith, P. (2008b). Modeling price dynamics in eBay auctions using principal differential analysis. Journal of the American Statistical Association, 103(483): 1100–1118.

Warren, R., Airoldi, E., and Banks, D. (2008). Shared knowledge systems with value: statistical aspects of Wikipedia. In Jank, W. and Shmueli, G. (eds.), Statistical Methods in eCommerce Research, Wiley, New York.

Yao, S. and Mela, C. F. (2007). Online auction demand. SSRN eLibrary.

Zhang, S., Jank, W., and Shmueli, G. (2010). Real-time forecasting of online auctions via functional k-nearest neighbors. International Journal of Forecasting, in press.

Chapter 2

Obtaining Online Auction Data

2.1 Collecting Data from the Web

Where do researchers get online auction data? In addition to traditional channels such as obtaining data directly from the company via purchase or working relationships, the Internet offers several new avenues for data collection. In particular, online auction data are much more widely and easily available than ordinary “offline” auction data, which has contributed to the large and growing research literature on online auctions. Because transactions take place online in these marketplaces, and because of the need to attract as many sellers and buyers as possible, information on ongoing auctions is usually made publicly available by the website. Moreover, due to the need of buyers and sellers to study the market to determine and update their strategies, online auction websites often also make data on historical auctions publicly available, thereby providing access to large archival data sets. Different websites vary in the length of available history and the type of information made available for an auction. For example, eBay makes publicly available the data on all ongoing and recently closed auctions, and for each auction the data include the entire bid history (time stamp and bid amount) except for the highest bid, as well as information about the seller, the auctioned item, and the auction format. In contrast, SaffronArt, which auctions contemporary Indian art, provides past-auction information about the winning price, the artwork details, and the initial estimate of closed auctions, but the bid history is available only during the live auction. On both eBay and SaffronArt websites, historical data can be accessed only after logging in.
