At the time I am writing this paragraph (2008A.D.), the keywords "RANK" and "WEB" seem to mislead sophisticated readers to the "Google Page Rank". This page is on a mathematical model of the Amazon Sales Rank, and mathematically has very little to do with the Google Page Rank. The two are mathematically VERY different, and I am interested on the perhaps less popular subject, in other words, on a subject in the long tail.
The following is a summary of what is planned to be explained in this page.
(added 2018/02)
When one visits Amazon.co.jp and look for a book, one would face a web page which looks typically like the right figure. (The original web page is rather large, so only a part of the page is shown in the right figure.)
The page is more or less similar to a corresponding page at Amazon.com (or a corresponding page in any other countries), except that the page is written in Japanese language. Similarly to the case in Amazon.com, there is a line in this page saying "Amazon.co.jp ranking: 384,057th among the books". The number is updated every hour, as is the case in Amazon.com.
Focusing attention on such a number perhaps does not interest many people, but there are more than one person deeply interested in Amazon ranking or Sales Ranks. M. Rosenthal accumulates a large data of Amazon.com sales ranks and summarizes in his web page a schematic relation between the rank numbers and the sales rates of books at Amazon.com. There are also research papers on econometrical studies based on time evolutions of sales ranks. J.Chevalier and A.Goolsbee, Measuring prices and price competition online: Amazon.com and BarnesandNoble.com, Quantitative Marketing and Economics 1 (2) (2003) 203-222, reports a method of estimating the sales rate of a book title from the book's sales ranks. Their method of relating sales rates to sales ranks is to buy, say three books, and wait for 3 weeks, then the rank number at that point should correspond to the sales rate of a book per week. Mathematically, this method assumes that the ranking number increases linearly in time and that the rate is proportional to the number of books ordered.
Let us see what is happening in Amazon.co.jp ranking number.
With a closer look into the time evolution of the ranking number at Amazon.co.jp, we immediately see that for Amazon.co.jp, the ranking number is not linear in time, nor is proportional to sales rate. In fact, it behaves very wildly.
Choose a book which sells one per a few weeks to a few month, and stick to the time evolution of its Amazon.co.jp ranking number for an year. If you have never done so, you might then be amazed to find a rather wild behavior as in the following schematic figure.
(The figure is a schematic drawing and oversimplifies the actual time evolution. However, it is based on actual observations, and the numbers are not exaggeration.) |
The vertical axis represents the ranking number (the higher up in the graph, the less popular). The ranking number 384,057 given as an example at the beginning of this web page, is about one third from top in this figure.
The main question is: " What could be a mathematical definition of a ranking number, such that would reproduce such a curve?"
It may be worthwhile to take a bit of abstract mathematical approach to consider this realistic question, because I have heard that even among the professional publishers not much is known about how Amazon bookstore is doing. In another econometrical research paper also using data from ranking numbers (again with linearity assumption), E.Brynjolfsson, Y.Hu, M.D.Smith, Consumer surplus in the digital economy: Estimating the value of increased product variety at online booksellers, Management Science 49 (2003) 1580-1596, it is written 'Internet retailers are extremely hesitant about releasing specific sales data'.
There is a saying in Japan "Better than hearing 100 times, see once". (Original meaning of course is, experience yourself, rather than trying to collect information of other people's experiences.) Here is a sample movie of the motion of stochastic ranking process:
movie (500KB, 30seconds, wmv file) |
For interested readers, click here for a simple explanation of the stochastic ranking process, a simple mathematical model explaining the wild and random time evolution of the Amazon.co.jp ranking numbers. A full mathematical definitions are in the original paper linked in the References at the end of this web page.
To summarize,
The stochastic ranking process does reproduce observed wild behaviors of the Amazon.co.jp ranking numbers. In fact, the schematic curve drawn in the figure above was calculated by the stochastic ranking process!
The movie or the definition of the model shows that each particle in the model (a book, in the case of online bookstores) randomly jumps to the top ranking and then increases the ranking number (moves gradually towards the less popular tail side), a behavior in common with the figure above with a wild curve. One might then doubt, if such is the case, then the Ranking number may not convey any serious information about the corresponding book's popularity (= sales rate = contribution to the bookstore = the Rank).
The answer to this question is, the time evolution of the number does convey a rich information.
A brief explanation would be as follows. A great hit title sells constantly, so even if the ranking number starts to increase, it quickly jumps back to top, so the number is constantly near the top. The average maximum increase in the number reflects (in a non-linear and rather complex, but in a calculable manner) the sales rate of the book.
On the other hand, for a very unpopular book (i.e., for a book in the far "long tail"), the mean interval between sales is longer than the observation time, and would not sell in the worst case, while one is watching at the book. This means that for a book title in the long tail, a standard short-time observation is too brief to find the book's true popularity, hence any reasonable index for popularity of sales rate should fluctuate.
With these thinking from the view point of the stochastic ranking process, the definition of the model works as a collect index of the popularity and sales rate of a book.
A mathematical model may not be practically useful if the model itself is too hard to solve. Fortunately, the stochastic ranking model is deterministic and solvable in the infinite particle limit, hence for a system with large particles (such as Amazon.co.jp with over 2 million books), the predictions of the model are not too difficult to find out.
Again, better than hearing 100 times, see once. Here is another sample movie of the motion of stochastic ranking process, this time 4 times large in file size and 3 time longer in the movie length:
movie (2MB, 90 seconds, wmv file) |
Allow me a little bit about mathematics. Not only an existence of an infinite particle limit, but also the time developement in the limit can be solved. A key observation is that limit quantities satisfy a system of partial differential equations for 1+1 dimensional mixed incompressible liquid driven by evaporation. Please refer to the original paper linked in the References at the end of this web page, for precise mathematical results.
A mathematical analysis combined with the actual time evolution data of Amazon.co.jp rankings suggests an interesting reality about long tail structures of the online bookstore.
Note that such a simple, rather abstract mathematical, model as stochastic ranking process fits rather well with actual data observed at Amazon.co.jp web. The figure below shows a result of observation of a series of time developement of Amazon.co.jp ranking of a book (77 black circles), together with a result of a best statistical fit to the observed data of a formula obtained as a mathematical consequence of stochastic ranking process (solid curve). The horizontal and vertical axis respectively denotes time in units of hour and the ranking. We see that the black circles and the solid curve is rather close, indicating a good chance that a simple model as stochastic ranking process works as a good model of time developement of actual Amazon.co.jp rankings.
Please refer to the original paper linked in the References at the end of this web page, for details of statistical fits. (I again skipped a lot in this paragraph, to avoid mathematical details...) A consequence of statistical fit clearly shows that, the Amazon.co.jp, in spite of Amazon bookstore's reputation as a pioneering company of web based long tail business model, is not a long tail, in that the vast amount of rarely sold books such as books on mathematics counts little in the total sales of the bookstore, and that the bookstore's sales is essentially supported by a very small titles of best-sellers, as in any standard bookstores.
References to previous studies must be noted.
The mathematical model, which so far has been named
stochastic ranking process, has in fact been noticed since mid 20th century:
M.L.Tsetlin,
Finite automata and models of simple forms of behaviour,
Russian Math. Surv. 18(4) (1963) 1-27.
This pioneering work seems unfortunatelly remained unnoticed for
about ten years, a reason perhaps is that the model has been
given as one of simple examples of automata covered by the general study
of the paper. Meanwhile, the model seems `re-discovered"
(studied without reference to Tsetlin) by
J.McCabe,
On serial files with relocatable records,
Oper. Res. 13 (1965) 609-618,
W.J.Hendricks,
The stationary distribution of an interesting Markov chains,
J. Appl. Probab. 9 (1972) 231-233.
The model seems to have become well-known after the work by Hendricks, and a first peak of published research papers appeared with a name move-to-front rules (see the right figure, which gives a number of papers I found and kept with the keyword move-to-front together with their relevant references, published within each bin of 4 years). The collection is far from complete, nor systematic, but the figure does give an idea that, after the 2 papers in the 1960s, there is a first small peak in the 1970s, and a second and large peak starting from 1980s which, from the observed tendency, may end by 2010. |
A main streem of the research in the move-to-fron rules is its application to theoretical studies of least-recently-used (LRU) caching in the research field of data structure in computer sciences. It is usually the case that there is a hierarchy in the data storage system, with small amount of expensive but quick memory devices, named cache memory, reserved for frequently called data. In using cache memory, one needs an algorithm to decide quickly and costlessly which pieces of data would be more frequently used, or equivalently, which data should be dropped off from the cache each time a new candidate data appears. One of the simplest (quickest and of low cost) algorithm named as LRU caching decides that each called data as likely to be called soon again, and places at the top position in the cache, and the least recently used data will be dropped to make room for the newly cached data. With a bit of thought, we see that this algorithm is equivalent to move-to-front rules (or stochastic ranking process). In this application, it is important to know the cache hit or cache fault probabilities, which evaluate search costs.
As we see, our original motivation was very remote from fashonable and important streem of studies, but a basic mathematical model turned out to be common. After going through these references, we found out that the main consequences of our study are new, in spite of this half a century long history of extensive researches related to the model. Our main results are, as introduced in this page so far,
We also note that, though our motivation had nothing to do with LRU caching, our results on the hydrodynamic limit can be successfully applied to this long existing problem. For details please see the original papers in the reference list at the end of this page, together with further list of references of extensive studies on move-to-front rules and application to LRU caching. Concerning the references, I heartly thank Dr. Nobuaki Sugimine for bringing the keyword `move-to-front" to my attention.
(added 2018/02) Here is a separate web page for major mathematical advance after I wrote the contents above in this page. It focuses on additional stochastic dependence among particles (processes) arising from position depndence of intensity functions.
The page is more for enthusiast on mathematical analysis technique (than, eg., for practical interest on how you can read business success in a specific company from ranking data :-).
270KB pdf file |
---|
180KB pdf file |
---|
120KB pdf file |
---|
240KB pdf file |
---|
200KB pdf file |
---|
270KB pdf file |
---|
480KB pdf file |
---|
160KB pdf file |
---|
180KB pdf file | arXiv:0908.3222 |
---|
210KB pdf file | arXiv:0804.1837 |
---|
170KB pdf file |
---|
150KB pdf file |
---|