A puzzle of apples and oranges

I first read the following wonderful little puzzle on Mathoverflow, where it was posted by David Futer. I don’t know who invented it or where it appeared first – if you have any information, please let me know. (Note: Toshihiro Shimizu points out the following link: AoPS)

There are 99 bags, each containing some number of oranges and some number of apples (any of these numbers can be zero). Show that we can always select 50 bags out of the 99, such that these 50 bags together will contain at least half of all the apples and at least half of all the oranges.

I like this puzzle because there are many different ways to solve it. Before reading on, do yourself a favor and try solving it yourself – it is not difficult, but very rewarding. If you find a proof that is different from those below, let me know and I’ll add it to the list.


Why look for different solutions, once we already have one? Partly because it’s fun, partly because each solution might give some new insight into the problem, partly because we can showcase different mathematical techniques, and perhaps most importantly, because each new proof lends itself to generalization slightly differently – so a new proof might lead to new results. The proofs below are sketchy in places, in order to keep them short – it can be instructive to try to fill in the gaps, or to try coming up with the proof after reading just the name of the technique.

Sometimes proving a more general statement is easier (at least psychologically – by removing distracting elements). In our case, it will be more “fruitful” to prove the following statement:

Given \(n\) bags that contain oranges and apples, we can select \( \lceil \frac{n+1}{2} \rceil \) bags such that they contain at least half of each type of fruit. (we only really care about the case when \(n\) is odd)

Here are all the proofs that I am aware of.

Proof 1. (sorting) (source: I vaguely recall seeing this proof on a math competition website but I can’t find it anymore. If you have any information, please let me know see AoPS link above).

Sort the \(n\) bags in decreasing order of the number of apples (break ties arbitrarily). Select the first bag, and from each following consecutive pair select the one with more oranges (break ties arbitrarily).
It is easy to see that the selected bags satisfy the required properties.

Proof 2. (induction) (source: idea from David Futer).

First we show that for \( n=3 \) the statement is true. Indeed, we can select the bag with the most apples and the one with the most oranges (if the two are the same, we also pick another bag).
Assume that the statement is true for \(n-2\). We then prove it for \(n\):
Consider a “good split” of the \(n-2\) bags into A and B, such that A has at least half of both fruits in A+B (such a split exists by the induction assumption). Now we try adding the other two bags, x and y.
If A has as many of both fruits as B+x or as B+y, we are done (we add the other bag to A).
Otherwise if B+x or B+y have as many of both fruits as A, we are done (we add both bags to B).
Otherwise if A has less apples than B+x and less apples than B+y we are done (add to A the one of x and y with more apples). Symmetric argument for oranges.
Otherwise if A has less apples than B+x and less oranges than B+y, it means that A has less of both fruits than B+x+y, which we treated already. Symmetric argument after swapping the terms “apples” and “oranges”. There are no other cases.

Proof 3. (contradiction or “canonical counter-example”)

Suppose there is no solution, then consider a set of \( \frac{n+1}{2} \) bags which has at least half of the apples (there is always such a set), and among such sets the one with the most oranges possible.
Let this set be A, and the remaining set B (A has at least half of apples and B has more than half of oranges).
Take the bag from A with the least oranges and move it to B. Now take the bag from B with the most oranges and move it to A. If we just moved the same bag back, it means that all bags originally in B have at most as many oranges as the bag with the least oranges in A. This contradicts the fact that B has more than half of all oranges.
Otherwise there are two cases: 1) the number of apples in A drops below half when we move a bag, which means that B becomes a solution, or 2) the number of apples in A remains at least half throughout, then after the swap we increased the number of oranges in A, which contradicts the statement that the chosen set had the highest such value.

Proof 4. (counting) (source: I read this proof written in probabilistic form by Erez Druk, after being asked by Dror Ozeri on a puzzle forum – as it is a closed forum, I cannot link to it)

I write this and the next proof in the n=99 case only, to avoid excessive notation. The number of 50-sets of bags is the same as the number of 49-sets \({99 \choose 49}\). Denote the number of 50-sets that contain at least half of all oranges by \(N_{50}\), and the number of 49-sets that contain at least half of all oranges by \(N_{49}\). We have \(N_{50}>N_{49}\), unless none of the bags contain any oranges, which is an easy case anyway (there is one-to-one mapping from 50-sets to 49-sets, such that each 50-set contains its mapped 49-set). Clearly, more than half of all 50-sets contain at least half of all oranges (otherwise the previous inequality would be violated: if an 50-set does not contain at least half of all oranges, then its complementing 49-set does).
The same argument holds for apples. As more than half of the 50-sets contain at least half of the oranges, and more than half of the 50-sets contain at least half of the apples, there is at least one 50-set containing at least half of both fruits.

Proof 5. (price function) (source: from David Futer)

Assign price \(x\) to an orange, and \(1-x\) to an apple. Select the 50 most expensive bags.
\(f(x) = (\mbox{# of oranges in top 50}) / (\mbox{total # of oranges})\)
\(g(x) = (\mbox{# of apples in top 50}) / (\mbox{total # of apples})\)

We have \(g(0) \geq 1/2, f(1) \geq 1/2\), and \(f\) is increasing, \(g\) is decreasing. Clearly there is some \(x\) for which both \(f\) and \(g\) are at least 1/2, which denotes a solution (if for some \(x\) both functions would have values below 1/2, then the top 50 would not be the top 50, as its complement would have more apples and more oranges, therefore higher total price).

Proof 6. (geometry – Ham-Sandwich) (source: from David Futer, and a variant explained by Dror Ozeri on the same forum)

We represent the bags as sets in the plane, with the restriction that a straight line can cross at most two of them (this can always be achieved). We somehow encode the number of oranges and apples in each bag. There are more possible variants here: map each fruit to a red or blue point (depending on its type); or map all oranges of a bag to a single point with a weight that tells the number of oranges, and similarly for all apples of a bag; or map the whole bag to a single point for which we define an “apple-measure”, and an “orange-measure”. Now we apply the Ham sandwich theorem to obtain a balanced cut (there are different variants of this theorem, we choose the one suitable for our representation). If any “bags” are cut by the obtained line, we move them to the side which has less bags. This gives a valid solution.


  • Prove the following more general statement. Which proof technique is suitable?
    Given \( n\) bags that contain \( f\) kinds of fruits, we can select \( \lceil \frac{n+f-1}{2} \rceil \) bags that contain at least half of all kinds of fruits.
  • Prove the following variant. Which proof technique is suitable?
  • Given 100 bags that contain apples and oranges, we can select 34 bags that contain at least one third of the apples and one third of the oranges.

  • Find a general statement with \(n\) bags, \(f\) kinds of fruits and finding a subset of the bags that contains at least a \(p\)-fraction of the fruits of each kind, for arbitrary \( 0 \le p \le 1\).
  • Assume that we know the contents of each bag. Which proof leads to an efficient algorithm for finding a good subset of bags in the sense described above? In particular, which proof leads to a linear time \(( O(n) )\) algorithm?
  • Can you come up with new proofs for the puzzle? If so, I would love to hear them. Do any of the proofs admit new generalizations?

Thanks to David Futer, Lavinia Dinu, Manuel Arora, and Tobias Mömke for interesting discussions about this puzzle.

Nonlinear speedometer

As everyone knows, excessive speed is one of the main causes of traffic accidents. One of the reasons for reckless speeding is that when we think of the possible impact of a collision, our intuition fools us. We tend to assume that if we go twice as fast (say, at 80 km/h instead of 40 km/h), then the impact of a collision will be twice as large. In fact, the impact of a collision is proportional not to the speed (V), but to the kinetic energy of the vehicle (m/2 * V²), which is transformed to work as the car decelerates to zero. Here m is the mass of the vehicle, which we cannot do much about, but the other term is the square of the velocity. This means that the “impact” of hitting a wall at 80 km/h is four times as large as that of the same collision at 40 km/h.

To capture this intuition, the idea of this post is a speedometer design, that scales as the square of the velocity, to give the driver a more realistic view of the effects of speeding.

nonlinear speedometer
(larger image)


  • after sketching this drawing, I found that in 1995 Goetz Kluge was thinking along similar lines and produced a similar design. There are however some differences – see next.
  • besides showing the speed in the more familiar, circular layout, I also made the decision to draw the smaller lines at uniform density (i.e. they are spaced proportionally to the kinetic energy, not to the velocity). This design might make it harder to estimate the exact speed, say between 40 km/h and 50 km/h, but it makes the increased kinetic energy easier to grasp – when we speed from 120 km/h to 130 km/h, the hand crosses more “small lines” than from 20 km/h to 30 km/h.
  • while I haven’t seen a similar speedometer in any real car, some of the existing speedometers are in fact nonlinear. Unfortunately, they seem to achieve the exact opposite effect to the design shown above. See this example. On the linked image, and in many other modern speedometers, the manufacturers try to put more resolution in at the lower half of the range, dilating the velocities between 0 and 80. My guess is that car makers do this, because the car accelerates faster at low speeds, so dilating the lower range makes the hand movement seem more uniform at all speeds – this is probably aesthetically more pleasing, but it might come at a cost of an increased (false) sense of security, and in effect a reduced safety.

    (I got this exactly wrong – thanks Tobias for pointing it out – it is actually the opposite, on the proposed speedometer design, the hand would move more evenly – so besides being safer, it would be more aesthetical as well, the only downside seems to be the reduced resolution in the low range)

Map of named colors

In this post I present a visualization of all named colors, ranging from the more common to the more obscure, including green, blue, yellow, orchid, lavender, turquoise, acquamarine, azure, etc. etc.

From various sources [1] I downloaded around 2600 such names, I merged the different lists, and slightly edited the entries to remove duplicates and to make the collection more consistent. Then I trained a SOM (Kohonen map) [2] with 6000 units, using the open source toolbox [3] for Matlab. Finally I exported the results with some custom scripts I wrote. This last part was done similarly to the visualization of languages described in my previous post. Otherwise the data was much more reasonable this time than in the previous post: 2600 data points, 3 features (red, green, and blue values) and no missing entries.

The first plot shows the trained SOM map with each hexagonal unit being colored according to its location in the three-dimensional RGB space. The other three maps show only the red, green, and blue values of the units, respectively.

exp_som.png (~6 MB)
Figure 1. SOM map of named colors.

The second plot contains the actual names of the colors and the corresponding hexadecimal RGB values. The goal was to place similar colors near each other as much as possible.

exp_som_text_scale.png (~12 MB)
exp_som_text.svg (~3 MB) (NEW: svg format)
Figure 2. Map with color names.


Name That Color project (and references therein) by
Chirag Mehta.
Wikipedia: List of colors.

Color Space Dimensionality Reduction project by Aubrey Jaffer and references therein. This project includes two dimensional maps of colors created using different techniques.

In this project SOM is used to visualize colors, but only a handful of colors are shown.

Some of the colors in the final list I used seem to have been submitted by users at the Colourlovers website.

[2] Scholarpedia: http://www.scholarpedia.org/article/Kohonen_network
Wikipedia: http://en.wikipedia.org/wiki/Self-organizing_map

[3] SOM Toolbox: http://www.cis.hut.fi/projects/somtoolbox/

Visualizing the languages of the world

I recently came across the WALS (World Atlas of Language Structures) data set [1] which contains structural information about 2676 languages and dialects [2] from all around the world. The WALS website contains a detailed explanation of all the features and it also shows on the world map the geographical distribution of the languages and of the different features. As soon as I saw the data I started thinking about how to visualize it differently in order to highlight similarities between languages and language families.

When laypeople (myself included) naively think about similarities between languages, they mostly consider similarities between words (such as between the English “brother” and German “Bruder”). As the WALS data set ignores the vocabulary, and describes instead the deeper, structural aspects of language, I was hoping to obtain some interesting visualizations.

This post describes in detail the steps I took to visualize the data. If you want to see the resulting images only, scroll down to the end.

1. The data set

The WALS data set contains 2676 languages and 192 features, mostly related to phonology and grammar. For example, one feature describes whether a language has one, two or three grammatical genders, another feature refers to the inflection of nouns, yet another refers to the usual position of the verb in the sentence, and so on.

The features seem to have been meticulously collected and curated, however, many values are missing. More precisely, out of roughly half a million possible entries, only 15% are available. It is not clear whether a missing feature means that it has not been observed, or that it does not make sense for a given language. I assume it is mostly the first case: for languages spoken by very few people it must be difficult to collect reliable data and on some languages there are few publications available. However, there are also cases when a feature is not applicable to all languages. For example, one of the features is “Nasal Vowels in West Africa” which is missing for all languages outside of West Africa. It would improve the analysis if we could distinguish between these two types of missing values, but as I had insufficient information to do this labeling myself, I simply considered all missing values of the same type.

Besides the structural description of the languages, the data set contains the following meta-data: the classification of languages into families, subfamilies and genera, the ISO-code of each language and the geographical location where the language is spoken. The latter information is not always useful, as it points to a single location. In case of Russian, for example, this location is Moscow. For Portuguese, there is no separate entry for Brazilian Portuguese, yet the geographic location indicates only Lisbon.

2. Preprocessing

Looking into the data, one thing that can be observed is that a few languages share the same ISO-code – these seem to be closely related dialects. When this is the case, it seems that only one of the languages has the full description, and for the others only the differences are encoded. Therefore, as a first preprocessing step I copied over features between pairs of languages that have the same ISO-code, whenever a feature was present for one, but not for the other language. Overall, this operation affected the proportion of missing values only marginally (about 100 languages of the 2676 have had some feature copied over). However, this again raises the question of the exact meaning of missing data. If there are other cases when some features are missing because they are “too obvious”, that could distort our analysis – however, we have to live with this uncertainty for now.

As a next preprocessing step, I “binarized” the data. All the features in the data set are “categorical”. For example, feature #52, called “Comitatives and Instrumentals” [3] has the following possible values: 1 for “Identity”, 2 for “Differentiation”, and 3 for “Mixed”.
Without getting into the details of what this feature (or other features) really mean, it is clear that the assignment of numerical values is quite arbitrary here. Therefore, to perform numerical operations on the data further on, it made sense to split this feature into three. The new features (called 52.1, 52.2, 52.3) encode whether the value of this feature is 1, 2, or 3. (for example, a value of 2 is transformed into 010 and 3 is transformed into 001).

Some of the features, however are of “ordinal” type. For example, the first feature has the following five values: 1 “Small”, 2 “Moderately small”, 3 “Average”, 4 “Moderately large”, 5 “Large”. In this case (and for some other features of such type) it is more sensible to binarize the data as follows: 1 becomes 10000, 2 becomes 11000, …, 5 becomes 11111. That is, we split the feature into different parts that encode whether the value is larger than some threshold. I binarized all features in one way or the other (overall I used the “ordinal” method for 12 features of the 192).

I performed these preprocessing steps with some short Matlab scripts I wrote, and in the end I obtained a binary data set with 1141 columns and 2676 rows and (still) most of the entries missing.

3. Dimensionality reduction

Besides the missing entries, the biggest obstacle in the way of visualizing the data is that it lives in 1141 dimensions. As a first step, I reduced the number of dimensions to 30, to make the data easier to handle. Typically, this is performed using a linear projection method called principal component analysis (PCA) [4]. When some data is missing, usually a simple “imputation” method is used, such as deleting the columns or rows with missing entries, or replacing the missing data with column averages. However, when such a large portion of the data is missing, these simple tricks are insufficient.

Instead, we can use a nonlinear version of PCA, tuned especially for the case with many missing values. This method, while reducing the number of dimensions, also attempts to “fill in” the missing values. In this case, the conceptual simplicity and efficiency of PCA is mostly lost, but the methods are still quite efficient. The theory and assumptions behind such methods are described in this paper written by A.Ilin and T.Raiko [5] (disclaimer: former colleagues). Implementations are available in an open source toolbox for Matlab [6] developed by the same authors.

Using PCA when the data is binary is not quite the optimal choice, and there exist variants especially tuned for binary data (see for example our 2009 paper [7] for some theory and experiments), however, as my goal here was not prediction or learning, just visualization, I decided to stick to the simpler methods.

After this operation, we are left with only 30 “aggregate” features and no missing data. We are now ready to visualize the data.

4. Visualization

I first decided to look at some very small subset of the data, therefore I selected the 24 rows corresponding to Romance languages [8] according to their “genus” label. To visualize these data points, I trained a SOM (Kohonen map) [9], using the open source toolbox [10] available for Matlab. This method embeds a two-dimensional grid into the higher dimensional (in this case 30 dimensions) space, such that the grid is “close” to the data points. At the same time the neighbors in the grid exert some “pulling” on each other, which serves as a kind of regularization. In the end we can visualize the grid points in the so-called U-Matrix. For visualization I wrote some custom scripts that plot the output that I saved from the SOM toolbox.

Figure 1. Romance languages U-Matrix.

Here the grid is hexagonal and has 40 cells. On the image we can see each of the 24 data points (Romance languages) mapped to the closest grid cell (in the 30 dimensional space). The color of a cell is indicative of the distances from neighboring cells (brighter color means larger distance), in this way if there are clear clusters, they will appear as dark valleys separated by bright “ridges”.

Our hope with the visualization is that pairs of languages that are similar would be mapped nearby and pairs that are different would be mapped far away. Unfortunately, in general, not all of these constraints can be satisfied, so we can’t always draw conclusions from the fact that two languages appear close to each other. This is where the coloring of the U-Matrix should help. In many cases, though, it seems that proximity does indeed correspond to linguistic similarity (here, the dialects of Romansch [13] are mapped together, just as Romanian together with Moldavian, etc.)

To get some intuition on how the languages were laid out on the map, I made a plot of the original features on the same map. This is on the next image. There are 192 copies of the same hexagonal map (one for each feature). The locations of the languages are the same as in the U-Matrix. On these maps the value of the feature is indicated for each language. The descriptions of the features can be found on the WALS website [11]. Looking at these maps we can check for two languages that appear nearby, on which features they agree. Black is for “missing data”.

Figure 2. Romance languages feature map.

Let’s also plot the geographic location of the languages on the same map. Note however, that this information was not used during the training phase. Here again the languages appear in the same positions as before, and their geographic location in degrees is indicated by color. The goal of this plot is to see if there is some correlation between geographic distance and linguistic difference. We can read off some obvious facts from the plots, such that Canary Islands Spanish is both the westernmost and the southernmost, and Moldavian is the second easternmost after Ladino [13]. Otherwise the number of data points is a bit too small here to observe interesting correlations.

Figure 3. Romance languages Lat/Lon.

On the two previous types of visualizations a hexagonal cell is colored with a single color if there is a single language mapped to it, or if all languages in that cell share the same value for a feature. Otherwise a small pie-chart is displayed showing the share of each value.

Now let’s look at a larger subset. I selected all languages from the Indo-European family of languages [12], according to their “family” label. This meant 176 rows in the data set. The visualizations are similar to the earlier ones, but now I also made an extra plot showing the different genera within this family. Here the black color shows whether a language belongs to a given genus. It can be observed that the clustering on the SOM is in many places consistent with the classification information. This means that the usual taxonomy of languages is largely consistent with the structural linguistic data, which is hardly surprising. There are however, some unlikely pairs of languages placed close to each other on the map. One could examine the data to check in each case whether there is really some structural similarity, or we can simply write it off as noise. We can also observe a quite sharp East/West separation between languages, and a bit less pronounced North/South separation (the latter picture is complicated by an outlier – the southernmost language of the family is Afrikaans [13]). For many of the features a nice clustering of the values can be observed.

Figure 4. Indo-European languages U-Matrix. (large file)

Figure 5. Indo-European languages feature map. (very large file)

Figure 6. Indo-European languages Lat/Lon.

Figure 7. Indo-European languages genera.

Finally, let’s visualize all languages from the data set in a similar way. To restrict attention to spoken languages only, I removed the 40 sign languages from the data set. There remain 2636 languages. The resulting maps are very large, but quite informative, again, there is an apparent clustering that closely resembles the known taxonomy of languages, but other similarity relations can also be observed on the map. As the maps became very large, here I only include the U-Matrix.

Figure 8. All languages U-Matrix. (very large file)

Please note that two languages can be placed nearby even if there is very little similarity between them, especially if one of the languages has very few features filled in. This is a constraint of the method used, but also inherent to the problem itself – we lose information by reducing the number of dimensions. So please think twice before using these plots for drawing conclusions about the relatedness of very distant exotic languages or for supporting such theories :)

To (partially) overcome the previously mentioned problem, I also plot the data when restricted to languages having relatively few missing entries. This also makes the maps somewhat more managable in size. Here are the plots for the data sets having less than 160 missing features.

Figure 9. All languages with sufficient data: U-Matrix.

Figure 10. All languages with sufficient data: Lat/Lon.

Figure 11. All languages with sufficient data: families.

All figures: [link to album]


[1] Dryer, Matthew S. & Haspelmath, Martin (eds.). 2011. The World Atlas of Language Structures Online. Munich: Max Planck Digital Library.

[2] “A language is a dialect with an army and a navy” – Max Weinreich
The question of what constitutes a language and the distinction between language and dialect is a difficult, and often emotionally and politically charged question – for example, in this data set Romanian and Moldavian appear as different languages, whereas many consider them to be too similar even to be considered separate dialects.

[3] Stolz, Thomas & Stroh, Cornelia & Urdze, Aina. 2011. Comitatives and Instrumentals.
In: The World Atlas of Language Structures Online. Max Planck Digital Library, chapter 52.

[4] https://en.wikipedia.org/wiki/Principal_component_analysis

[5] Ilin, Raiko: Practical Approaches to Principal Component Analysis in the Presence of Missing Values, JMLR, 2010.

[6] PCA with Missing Values software: http://users.ics.aalto.fi/alexilin/software/

[7] Kozma, Ilin, Raiko: Binary Principal Component Analysis in the Netflix Collaborative Filtering Task, MLSP, 2009.

[8] https://en.wikipedia.org/wiki/Romance_languages

[9] Scholarpedia: http://www.scholarpedia.org/article/Kohonen_network
Wikipedia: http://en.wikipedia.org/wiki/Self-organizing_map

[10] SOM Toolbox: http://www.cis.hut.fi/projects/somtoolbox/

[11] WALS features: http://wals.info/feature

[12] https://en.wikipedia.org/wiki/Indo-European_languages

[13] Romansch: http://en.wikipedia.org/wiki/Romansh_language
Ladino: http://en.wikipedia.org/wiki/Judaeo-Spanish
Afrikaans: http://en.wikipedia.org/wiki/Afrikaans

Disk intersection game

Here’s a small puzzle/game that I made, using a simple geometric concept. You can try it here: disk intersection game. The game idea is loosely inspired by the game planarity, but it is also significantly different.

disk game screenshot

There are some disks, and you can move them around arbitrarily (drag with the mouse). The goal is to make all of them “good”. When a disk is “good”, its color turns to green. The other colors indicate different levels of “badness”, as shown in the legend to the right. With some experimentation you can probably discover what you need to do to achieve this, but here is an explanation of the idea:

There is a hidden “model”, created randomly, which you need to discover. The model determines how the disks need to be interconnected, meaning which pairs of disks have to intersect each other (overlap with each other). If you “realize” this model, by moving the disks in such a position as to intersect each other in the required way, you win the game. The colors of the disks indicate how far the current configuration is from the correct model. This is computed as follows: if a certain disk should intersect disk A and no other disks, but instead it intersects disk B and C, and it doesn’t intersect A, then its “badness” is 3, so it has the color corresponding to the number 3. In other words, “badness” shows how many errors there are due to a given disk. When the “badness” of all disks is 0, you have solved the puzzle.

If you find the color codes hard to recognize, you can click the “#” button, and the “badness” will appear as a number. You can also ask for help three times during the game. Clicking the help button will display the model that has to be realized (i.e. the connections between the disks, shown as lines). There are easy and hard games, in a hard game the model tends to be more complicated, therefore harder to discover.

Try the disk intersection game.


  • I came up with the idea for the game, while working on a problem related to unit disk graphs. This is the first iteration of the game, so probably the gameplay and the design could be significantly improved. If you have any suggestions, please let me know in the comments.
  • There is nothing special about disks here, maybe other intersection graphs would make more interesting (or more challenging) games.
  • One reason I wrote the game, besides wanting to try out the idea, was to get familiar with HTML5 Canvas – in the future I’d like to make another, somewhat more complex game. As this was my first encounter with Canvas, and because I wrote the game in one sitting (~3 hours), the implementation is a bit rough around the edges. Please let me know if you find any bugs or if you have suggestions.

Books I read in 2012

Here is the list of books that I read (in English) during 2012, or at least those that I found interesting enough to describe in one or two sentences each.

Part 1

Philip K. Dick: Do Androids Dream of Electric Sheep? The inspiration behind the Blade Runner movie, including replicants and the Voight-Kampff test but excluding the Tannhauser Gate and C-Beams. Wonderfully coherent science fiction, both deeper and psychologically better motivated than the movie.

Andre Agassi: Open: An Autobiography. Entertaining and motivating. The inner dialogues during tennis matches are especially interesting, as is the (one-sided) description of the rivalry with Sampras. His great comeback is inspiring and fun to read, the tone becomes slightly preachy towards the end.

Jessica Livingston: Founders at Work: Stories of Startups’ Early Days. Interviews with famous and obscure founders of companies. The selection is excellent, the interviewees are diverse, and each interview touches on some different aspect of startup-life, although there are many recurring themes. The book is consistently good, radiating with optimism.

Jack Kerouac: On the Road. A book that needs no introduction, obviously. Probably it means different things to different people, I just found it thoroughly entertaining and the style refreshingly sharp and concentrated. The characters in the plot seem to be searching for the ultimate experience with varying levels of success.

Part 2

Wilhelm Reich: The Mass Psychology of Fascism. Unlike Reich’s later, controversial stuff, this book is lucid and coherent, the main theses are quite convincingly argued. The strictly historical account is interesting in itself, but the book also explores more general themes, like the connection between the appeal of totalitarianism, fake morality (especially around sexuality), irrational mysticism. (1933)

John Maynard Keynes: The General Theory of Employment, Interest and Money. As Keynes appears to be as influential as ever, this book is probably more often debated than actually read. Which is a pity, although not an easy read, the book is written with amazing clarity. The style is elegant and old-fashioned, academic and precise: Keynes is careful to delimit the range of applicability of what he says (I suppose most detractors take his ideas outside of this range). But once you accept the boundaries he sets, there is a certain inevitability to his claims: given this and that, “ceteris paribus”, this and that relation has to hold between this and that economic quantity. The book is worth reading for the precise definitions of economic terms alone. It would be foolish to claim that I fully understood (let alone retained) a large fraction of the ideas in the book, and I did skip pages in some of the chapters that I found less interesting, but overall, it was a joy following through the arguments. (1936)

F. A. Hayek: The Road to Serfdom. Just as in the case of Keynes, the debates surrounding this book seem to be much less interesting than the ideas in the book itself. First, one would assume that the advocation of “economic liberalism” would be at odds with “keynesianism”. However, Keynes himself warmly praises the book in his review. Here, the main theme is the relation between individual and state. Hayek convincingly argues that the difference between the political “far left” and “far right” is less interesting than the difference between totalitarianism and classical liberalism. Hayek derides many forms of centralized planning, and describes the harmful mechanisms set in motion by them, as well as the transition from an all-is-allowed-except-… to an all-is-forbidden-except-… society. The historical accounts, particularly about Germany, are insightful, however, his predictions seem to have been either mistaken, or at least on a different timescale than plausibly assumed (or perhaps too influenced by the times in which the book was written). (1944)

Primo Levi: Survival in Auschwitz. Originally titled “If This Is a Man”, the book relates the arrest, incarceration and eventual liberation of the author from the Auschwitz death camp. It describes the dehumanizing experience in an understated, precise way. Despite the tragedy beyond description, the book offers a vision of hope and humanity.

Part 3

Rosamund Stone Zander, Benjamin Zander: The Art of Possibility. A very nice collection of anecdotes and distilled advice, mostly about how to defuse conflicts and how to look at difficult situations from a new perspective. The authors draw from their vast experience in music, education and counselling/consulting. It does get too self-congratulatory at places, but as far as self-help-feel-good-type books go, it is surprisingly good.

Victor Pelevin: Omon Ra. Quirky, cynical, intelligent science fiction. At the surface, a satire of the Soviet space program (or rather, of its caricature), in reality, a more universal tale about aspirations, ideology, and the human condition.

George Orwell: Animal Farm. Again, a classic about which it is impossible to say anything new. Despite being intented (apparently) as a satire of events in a concrete place and time, the book is as relevant and as universally accessible today as ever. (And this isn’t something new to say about it either.)

John E. Littlewood: Littlewood’s Miscellany. “The surprising thing about this paper is that a man who could write it – would.” The book contains many similar quips, more or less connected to mathematics or the process of doing mathematics. There are also many interesting and deep mathematical puzzles (some of them have become classics since the book was written), linguistic paradoxes, short anecdotes about mathematicians, amusing mathematical errors, unintentionally funny formulations from books or papers, references to contemporary results. The book is a collection of fragments assembled by Littlewood, refreshingly lacking any serious organization. Written in 1953, it seems to have stood the test of time, the few parts that feel outdated do so only because of the attention to detail of the author that seems uncommon today.

Part 4

I.M. Gelfand, Mark Saul: Trigonometry. It is often claimed that in order to effectively teach a topic, one needs to know more than what is being taught. Taking this advice to the extreme, here one of the great mathematicians of the twentieth century teaches early high-school level trig. The result is predictably pleasing: the right mix of formal clarity and informal insight, it seems to anticipate and answer any question the reader might have and provides the right level of challenge at every point. If this much thought went into every textbook, the state of math education would be quite different. Part of a series on high school math by the same author.

G-C. Rota: Indiscrete Thoughts. A wonderful collection of essays loosely related to mathematics. The first part contains anecdotes about famous mathematicians of the twentieth century, often focusing on their character flaws and amusing aspects of their life – part of the reason for “indiscrete” in the title. Most interesting – and perhaps least indiscreet – is a moving tribute to Stanislaw Ulam, mixed with nostalgic observations about pre-war Central Europe. The second part contains various essays about topics such as the philosophy of mathematics and science, mathematical discovery, philosophy of the mind, aesthetics in mathematics, etc. While less colourful and entertaining than the first half of the book, Rota manages to keep it readable and interesting for the most part. The book ends on a lighter tone, with observations about concrete mathematical fields, tips on how to do mathematics, and career advice of sorts ranging from time-management to creativity – refreshingly unconventional and thought-provoking.

G. Boltjansky, I. Gohberg: Results and Problems in Combinatorial Geometry. A beautiful small book containing three general geometric problems, which are shown to be intimately related. The discussion is entirely elementary, using only early high school level mathematics, but the book highlights deep connections between disparate fields and ends with difficult open problems.

Douglas Hofstadter: Metamagical Themas. A collection of Hofstadter’s columns for Scientific American. The essays are self-contained and easy to read. Due to the format, the author had less of a tendency of trying to tie up all loose ends, than in Gödel-Escher-Bach, maybe for this reason, I enjoyed the book much more than GEB. Some of the topics, such as the “prisoner’s dilemma” may have been new at the time Hofstadter wrote about them, but have become mainstream and have been discussed endlessly since then. Other topics include self-reference, typography, analogies, Rubik’s cube, sexism in language, consciousness, creativity, Lisp. Due to the format, the level of interestingness in the book is uneven, but I found most of the chapters stimulating and thought-provoking. The richness of ideas and the variety of topics make this my favorite book of 2012.

What was the best book you read in 2012?

Obstruction game

Some years ago I was reading about the game of Nim, then a bit more about the theory of impartial games, and as the rabbithole seemed to go deeper and deeper, I needed some pretext to continue. So I came up with the rules of a simple paper-pencil game that I could try to analyze with the theory that I was reading about. I never managed to fully analyze the game, but it seemed to be reasonably fun to play anyway. Later I found that some very similar games have already been described.

The rules are simple: two players take turns in marking squares on a grid (say, of size 8*8). You can only mark a square if all of its neighbors (including the diagonal neighbors) are empty. The first player unable to move loses.

More details of the game are here.

Recently, I had the pleasure of discovering a website that contains the descriptions of many paper and pencil games, most of them playable on the website against the computer. The site is maintained by David Johnson-Davies. In fact, I found out about the website from him, when he told me that he also included the game I described above. The game is filed under the name “Obstruction game” which very well describes what it is about. You can even play it against the computer. Here is the website with all the games and here is the link to the Obstruction game. There is even a description with some useful strategy-tips.

Happy playing!

Ordering cyclic graphs

Suppose there are n sports teams, some of them play each other, in each game one team wins and the other loses and in the end we want to order the teams such that if A beats B than A is before B in the order. If two teams A and B played each other multiple times but, say, A clearly dominated B, we consider as if they would have played once and A beat B. If they played each other several times and there is no clear winner, or if they played only one draw, we consider as if they would have played twice and once A won, once B.

We can represent the whole situation as a directed graph with n vertices {1, …, n} and an edge from i to j if i won against j. Clearly, there are no self-loops. If the graph has no cycles, we can efficiently compute a topological sorting of the vertices and we are done. What if the graph has cycles (two teams that played a draw, or a sequence of teams where A beat B, B beat C, …, X beat A), but we still want to output a reasonable ordering? What is a reasonable ordering?

This is the vertex ordering problem and we can formulate many different criteria. I’ll go through some variants and mention for each of them, whether they are NP-hard or polynomially solvable. It turns out that almost all of them are NP-hard, and those which are not are quite trivial (plus the topological sorting mentioned before, which is also polynomially – in fact, linearly – computable). I’ll try to refer to sources where I found some, where I didn’t find any, the result is simple enough to be considered folklore. Not all the criteria that I list are reasonable – a reasonable criterion should actually lead to the topological sorting when there are no cycles. Did I forget some sensible criteria? Let me know what you think.

The general set-up is the following: we search for a permutation f:[n] -> [n], such as to minimize or maximize some quantity as follows:

1. The first criterion is that we want to minimize/maximize some reasonable function of the number of wins and the number of losses of a team. This includes the methods used in most sport tournaments where wins and loses are worth some number of points and in the end the teams are ranked by their accumulated number of points. This criterion is easy, because we can just compute the indegree/outdegree of each vertex, compute the necessary function values and sort vertices according to it.

Now we move to some more interesting criteria.

  • We either minimize (MIN) or maximize (MAX) some quantity.
  • The quantity is either the SUM, the MAXimum, the MINimum or the COUNT of something.
  • That something is either the length of EDGEs, the length of BACK-EDGEs, or SIGNED length of EDGEs.

This gives 2*4*3=24 cases.

For example, if i->j means that there is an edge from i to j:

MIN COUNT EDGE refers to    

MAX SUM EDGE refers to    

MIN SUM BACK-EDGE refers to    

MAX MIN SIGNED-EDGE refers to     .

These are all for directed graphs, as mentioned earlier. Before starting to enumerate all these cases, let’s define the same problem for undirected graphs. For undirected graphs there are less cases, as we don’t have to consider separately any edge or back-edge or signed edge. Thus there are just 2*4=8 possibilities here.

Now let’s see what we’ve got, numbering continuously all the cases.

Undirected graphs

2. MIN SUM: This is called Minimum Linear Arrangement, a known NP-hard problem.

3. MIN MAX: This is called Minimum Bandwidth, a known NP-hard problem.

4. MIN MIN: makes no sense, it can always be 1.

5. MIN COUNT: makes no sense, it is always the number of edges.

6. MAX SUM: this is equivalent to 2. on the complement graph, therefore NP-hard.

7. MAX MAX: makes no sense, it can always be n-1.

8. MAX MIN: This is the anti-bandwidth problem, studied in literature, see here, or here (leads to paywall). It is NP-hard. This can be seen via the following reduction: “Does G have a Hamiltonian path?” is the same as “can the vertices of G be arranged such that there are edges between neighbors?” is the same as “can the vertices of G be arranged such that the complement of G has anti-bandwith at least 2?”. If we could answer the last question efficiently, we could answer the first.

9. MAX COUNT: makes no sense, it is always the number of edges.

Now we move on to the original question:

Directed graphs

10. MIN SUM EDGE: NP-hard. From 2. if we replace every undirected edge with a directed edge.

11. MIN SUM BACK-EDGE: NP-hard. From 2. if we replace every undirected edge with two directed edges.

12. MIN SUM SIGNED-EDGE: this is quite interesting. Notice that we can replace a path a->b->c with a single edge a->c without changing the sum. If we do this sufficiently many times we will be left with vertices with only incoming or only outgoing edges. To minimize the sum we clearly have to put vertices with outgoing edges first, then vertices with incoming edges. Otherwise we could make a simple replacement that would decrease the sum. Furthermore, vertices with outgoing edges have to be ordered in decreasing order of degree, vertices with incoming edges ordered in increasing order of degree. Again, otherwise, a simple swap would decrease the sum. That’s all.. Observe that what we’ve got is the same as ordering the original vertices in decreasing order of [outderee-indegree]. In this sense, this is a special case of 1.

13. MAX SUM EDGE: NP-hard. From 6. if we replace every edge with a directed edge.

14. MAX SUM BACK-EDGE: NP-hard. From 6. if we replace every edge with two directed edges.

15. MAX SUM SIGNED-EDGE: solution is the reverse order of 12.

16. MIN MAX EDGE: NP-hard. From 3. if we replace every undirected edge with a directed edge.

17. MIN MAX BACK-EDGE: NP-hard. From 3. if we replace every undirected edge with two directed edges.

18. MIN MAX SIGNED-EDGE: NP-hard. From 3. if we replace every undirected edge with two directed edges. Known as Minimum Directed Bandwidth.

19. MAX MAX EDGE: makes no sense, it can always be n-1.

20. MAX MAX BACK-EDGE: makes no sense, it can always be n-1.

21. MAX MAX SIGNED-EDGE: makes no sense, it can always be n-1.

22. MIN MIN EDGE: makes no sense, it can always be 1.

23. MIN MIN BACK-EDGE: makes no sense, it can always be 1 and has to be at least 1 if there are cycles.

24. MIN MIN SIGNED-EDGE: makes no sense, it can always be -(n-1).

25. MAX MIN EDGE: NP-hard. From 8. if we replace every undirected edge with a directed edge.

26. MAX MIN BACK-EDGE: NP-hard. From 8. if we replace every undirected edge with two directed edges.

27. MAX MIN SIGNED-EDGE: NP-hard. From 8. if we replace every undirected edge with two directed edges.

28. MIN COUNT EDGE: makes no sense, it is always the number of edges.

29. MIN COUNT BACK-EDGE: NP-hard, known as Minimum Feedback Arc Set.

30. MIN COUNT SIGNED-EDGE: makes no sense, it is always the number of edges.

31. MAX COUNT EDGE: makes no sense, it is always the number of edges.

32. MAX COUNT BACK-EDGE: NP-hard, it is the reverse order of 29.

33. MAX COUNT SIGNED-EDGE: makes no sense, it is always the number of edges.

Phew :) Did I forget any interesting criteria?

From the point of view of the original motivation (a ranking of teams) the criteria that seem to some extent reasonable are: 11, 12, 17, 18, 29.

Self-counting sentences II

Let’s revisit the topic of the previous post on self-counting sentences. We looked at sentences like this:

In this sentence the number of occurrences of 1 is __, of 2 is __, ..., of n is __.

Let’s go up one more step on the ladder of abstraction, to arrive at sentences of the type:

In this sentence, there are 3 numbers appearing 1 time, 1 number appearing 2 times, 1 number appearing 3 times, 0 numbers appearing 4 times.

In this sentence, there are 2 numbers appearing 1 time, 3 numbers appearing 2 times, 0 numbers appearing 3 times, 0 numbers appearing 4 time.

Observe, that both say the truth about themselves. Below are three puzzles. In all cases the task is to fill in the gaps such as to make the sentence say the truth and the questions are: for what values of n does a solution exist and what are the solutions (if they exist).

In this sentence there are __ numbers occurring 1 time, __ numbers occurring 2 times, ..., __ numbers occurring n times.

In this sentence, from the numbers 0,1, ..., n-1, there are __ numbers occurring underlined 0 time, __ numbers occurring underlined 1 time, ..., __ numbers occurring underlined n-1 times.

In this sentence there are __ numbers occurring underlined 1 time, __ numbers occurring underlined 2 times, ..., __ numbers occurring underlined n times.

Note: I submitted the first variant of the problem to Kömal, where it appeared in the 2013 September issue (Hungarian, English – unfortunately, as of this writing, the English version appears mistaken there). Later, Toshihiro Shimizu pointed out that a formulation essentially equivalent to the second variant above appeared earlier as an IMO problem (Combinatorics 5)







The second problem is equivalent to the first problem. There exists a solution to problem 1, if and only if the same sequence of numbers is a solution to problem 2 (note that in problem 2 the counting starts from 0). Therefore we only look at the first and the third problems.

Solution of problem 1:

For small values of n, we can find the solutions by trial and error or with a small program. We denote the numbers of the solution as f(1), f(2), …, f(n).

For n=1, there is a single solution: f(1)=2.

For n=2, there is a single solution: f(1)=4, f(2)=0.

For n=3, there is a single solution: f(1)=4, f(2)=1, f(3)=0.

For n=4, there are two solutions (see the beginning of the post).

For n=5, n=6, there are no solutions.

For n=7, there are two solutions: f(1)=4, f(2)=3, f(3)=0, f(4)=1, f(5)=0, f(6)=0, f(7)=0, and f(1)=5, f(2)=1, f(3)=1, f(4)=1, f(5)=0, f(6)=0, f(7)=0.

For n=8, there are two solutions: f(1)=5, f(2)=2, f(3)=1, f(4)=1, f(5)=0, f(6)=0, f(7)=0, f(8)=0, and f(1)=5, f(2)=3, f(3)=0, f(4)=0, f(5)=1, f(6)=0, f(7)=0, f(8)=0.

For every n>=9 there are exactly three solutions:
a) f(1)=n-3, f(2)=3, f(n-3)=1, and the remaining values 0,
b) f(1)=n-2, f(2)=1, f(4)=1, f(n-4)=1, and the remaining values 0,
c) f(1)=n-3, f(2)=2, f(3)=1, f(n-4)=1, and the remaining values 0.

These solutions clearly work. Let’s prove that no other solutions exist.

We can observe two properties of the sequence f(i):

Property 1: ∑ i * f(i) = 2n. (where i goes from 1 to n)
There are 2n numbers in the sentence and each i*f(i) term counts how many numbers are there with multiplicity i. To have a multiplicity greater than n would mean that all gaps are filled with the same value, which cannot yield a true sentence for any value. Thus the possible multiplicities are between 1 and n, therefore ∑ i * f(i) gives the total number of numbers in the sentence, which is 2n.

Property 2: ∑ f(i) = n+1. (where i goes from 1 to n)
Clearly, for all i, f(i)>=0 and f(i)<=n. If for all i, f(i)>0 would hold, ∑ i * f(i) would be larger than 2n, violating Property 1. Therefore, some of the f(i) values have to be equal to 0, which means that all numbers 0,1,…,n appear in the sentence. Furthermore, all numbers that appear in the sentence have multiplicity between 1 and n. Thus, ∑ f(i) gives the total number of distinct values, which is n+1.

Now we prove that a) b) c) are the only solutions for n>=9.

If f(1)>n-2,
it must be that f(1)=n-1, but now n-1 occurs twice, so to maintain that n-1 values occur once, we need to write n-1 in every other position as well. This is not a correct solution.

If f(1)=n-2,
we must have exactly one other nonzero value (besides n-2) written in the gaps. We cannot write n-2 anywhere else, as that would violate Property 2. To get a sum of n+1, the possibilities are writing one time 3, or three times 1. Writing 3 anywhere doesn’t make the sentence correct, so it remains to write three 1’s. But then the number of 0’s will be n-4, the number of 1’s will be 4 and the number of (n-2)’s will be 2. This gives us solution b).

If f(1)=n-3,
we have two other nonzero values (besides n-3) written in the gaps. They also have to add up to 4 (Property 2). That can be as 3+1 or as 2+1+1. The first case means that 1,3 and n-3 have multiplicities 2 and 0 has multiplicity n-3. The second case means that 2 and n-3 have multiplicity 2, 0 has multiplicity n-4 and 1 has multiplicity 3. These cases uniquely lead to solutions a) and c).

If f(1)<n-3,
suppose f(1)=n-k (where k>3). The value 0 has to appear more than one time (otherwise the sum of Property 1 would be too large). Therefore, among the values 1, …, n, there are k distinct values which are written somewhere in the gaps (to make their multiplicities larger than one in the whole sentence). We know that n-k appears in the first gap, so we have k-1 other values in the gaps starting from the second. Let’s denote these values by a1 < a2 < . . .  < ak − 1. Since they are written in the gaps after the first one, it follows that there are at least a1 + a2 + . . .  + ak − 1 distinct numbers that appear more than once in the sentence. Even if one of them is n-k, and one of them is 0, that still leaves S = a1 + a2 + . . .  + ak − 1 − 2 distinct numbers appearing more than once (therefore, also in the gaps). Since a1 ≥ 1, we have S >= k(k-1)/2 – 2. Even if the smallest of these numbers is 1, we get that f(i) >> n+1, for k>3. This violates Property 2, therefore no solution of this kind is possible.

Solution of problem 3:

For n=1, we have f(1)=1, unique solution.
For n=2, we have f(1)=2, f(2)=0, unique solution.
For n=3, we have f(1)=1, f(2)=1, f(3)=0, unique solution.
For n=4, we have f(1)=2, f(2)=1, f(3)=0, f(4)=0, unique solution.

For n>=5, we have exactly two solutions:
a) f(2)=1, f(n-2)=1, the remaining values 0
b) f(1)=2, f(n-2)=1, the remaining values 0.


Similarly to the first problem, now we have the property:
∑ i * f(i) = n.

Clearly f(n)=0, otherwise all values would have to be equal, which is not a correct solution.
Also, f(n-1)=0. The only alternative would be f(n-1)=1, and all other values equal to zero, but this is not a correct solution either.

If f(n-2)=1 we need that ∑ i * f(i) over all other values is 2. This leads to solutions a) and b) only.
f(n-2)>=2 is not possible, as it would violate the property for n>=5.

Now suppose f(n-k) is the last nonzero value, for k>=3. This means that there are k numbers that are not equal to one of the most frequent values. If 0 is one of the most frequent values, then there are k nonzero values. All of them cannot be equal to 1, as that would not be a correct solution, so ∑ i * f(i) >= k(k-1)/2 + n-k + 1, which is larger than n for k>=3, violating the required property. On the other hand, if less than half of the elements are 0, the property is clearly violated, which shows that there are no other solutions.

Self-counting sentences

In this sentence the number of occurrences of 1 is 2, of 2 is 3, of 3 is 2, of 4 is 1.

This self-referential, true sentence is a variant of a puzzle attributed to the logician Raphael Robinson, popularized in Douglas Hofstadter’s book Metamagical Themas. In the book the puzzle appears in the following form:

Fill in the gaps such as to make the sentence true:

In this sentence the number of occurrences of the digit 0 is __, of digit 1 is __, of digit 2 is __, ..., of digit 9 is __.

With some trial and error you can find two different solutions, and it has been shown that no other solutions are possible. The problem has also been studied using some theory from the field of dynamical systems. Here are two interesting links, both containing the solution of the puzzle.

In this post, however, I would like to look at a slightly different puzzle. Instead of counting digits, I am counting the numbers as indivisible units. If, for instance, “15” appears in the sentence, I count it as “15”, not as one “1” and one “5”. In this way the problem is independent of the base in which the numbers are represented.

In both of the following two puzzles the task is to fill in the gaps such as to make the sentences true. In the first sentence the values can be between 1 and n (including 1 and n), in the second sentence the values can be between 0 and n-1 (both included). For what values of n are the problems solvable? How many solutions are there?

In this sentence the number of occurrences of 1 is __, of 2 is __, ..., of n is __.

In this sentence the number of underlined occurrences of 0 is __, of 1 is __, of 2 is __, ..., of n-1 is __.

Think about the puzzles for some time, or scroll down for the solution. Let me know if you find a simpler solution. I also wrote a follow-up post, with a different variant of the problem.







It can be shown that the two problems are equivalent: let’s denote the numbers written in the gaps of the first sentence by f(1), …, f(n). It can be seen easily that f(1), …, f(n) form a solution for the first problem if and only if f(1)-1, …, f(n)-1 form a solution for the second problem. So it’s enough to look at the first puzzle only.

With some case-based analysis we can show that there is no solution for the cases n=1, n=2, n=3 and n=6.
For n=4, there are two solutions, the one given in the beginning of the post and the following: f(1)=3, f(2)=1, f(3)=3, f(4)=1.

For n=5, the solution is given by the sequence: f(1)=3, f(2)=2, f(3)=3, f(4)=1, f(5)=1 and it is unique. These can be verified manually.

For every n>=7, there is a solution, given by f(1)=n-3, f(2)=3, f(3)=2, f(n-3)=2, and f(k)=1, for every k other than 1,2,3, and n-3. Let’s prove that this is the only solution:

Suppose there is some other solution.
If f(1)=n-3, then f(n-3)>=2.
—> If f(n-3)=2, then we need f(2)>=3.
——> If f(2)=3, then we get the same solution.
——> If f(2)>3, then there exist distinct x and y (also different from n-3, 1 and 2), for which f(x)=f(y)=2, so we have a value different from 1 at places 1,2,x,y,n-3, which means that we don’t have enough places to put 1’s, to satisfy f(1)=n-3.
—> If f(n-3)>2, then we have some x (different from 1 and n-3), for which f(x)=n-3. But then there are n-4 places where we need to put x, again not leaving enough places to satisfy f(1)=n-3.

If f(1)>n-3, then f(f(1))>=2.
—> If f(f(1))=2, then we need f(2)>=3 and f(f(2))>=2. We ran out of places for 1’s, since 1,2,f(2) and f(1) are distinct and they map to values different from 1.
—> If f(f(1))>2, then there is some x (other than 1) with f(x)=f(1)>=n-2, which means that we need to fill at least n-3 places with the value x, not leaving enough place for the 1’s.

The interesting case remains when f(1)<n-3. This condition means that in at least 5 places we need to write a number different from 1.
Suppose there are exactly k places, corresponding to the indices x1, …, xk, where we have an entry different from 1. This means that f(x1), …, f(xk) are different from 1 and k>=5. Let x1=1, since we know that in place 1 we have an entry different from 1.

Observe that among f(x1), …, f(xk) there have to be exactly k-1 distinct elements. This is because we need a value different from 1 in all places 1, f(x1), …, f(xk), and unless there are exactly k such places, we reached a contradiction. Since we have k-1 distinct elements all different from 1, the maximum among them is at least k and the second largest element is at least k-1. This means that there is a number t different from 1, such that f(t)>=k-1, therefore, t appears as a count in at least k-2 places. If out of k counts we have k-2 equal, we cannot have k-1 distinct values, so we reached a blatant contradiction. Therefore this case is impossible and the solution given above is unique.