How to make the IEEEtran bibliography font smaller

It’s ridiculously overspaced by default. Besides fixing that, you may also want to shrink the font size to reclaim that extra half page you desperately need to meet the page limit.

To accomplish these two goals, put the following in your preamble:

\renewcommand{\IEEEbibitemsep}{0pt plus 2pt}

I know just slightly more than exactly enough to get myself in trouble, so caveat utilitor.

Morals, smorals

I just spent several disappointing minutes on Amazon checking out two books, one by Pinker and one by Harris, that were recommended on one of the atheist podcasts I follow; both have theses about the development of morals. Unsurprisingly, I found a scathing review of Harris’s book (full disclosure: I *really* *really* don’t like the guy, ever since I bought his book “The End of Faith” on the recommendation of another atheist podcast and was disgusted by some comments he made to the effect that it’s ok to preemptively kill Muslims), but I was surprised to also find several criticisms of Pinker’s book arguing that he engages in a horrendous amount of cherry-picking. So where can I find a scientifically rigorous look at morals that delves into the philosophical implications of an atheistic worldview? I don’t know…

But the point of this post was less to kvetch and more to point readers to the full text of Crane Brinton’s “History of Western Morals”, one of my favorite books by one of my favorite authors. I wish I’d discovered this sooner.

Spark for Linear Algebra: don’t

Here are some pretty pictures of empirical orthogonal functions (EOFs, the climatology jargon for principal components) of the 3D field of ocean temperatures, using data collected over 32 years. They were generated by computing a rank-20 PCA of a 2 TB matrix.

[Figures: surface temperature portions of the first, second, and third EOFs; mean sea surface temperature; first two years of the first, second, and third temporal EOFs]
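For readers unfamiliar with EOFs, here is a minimal single-machine sketch of what they are, assuming the data are arranged as a (grid points × time steps) matrix: remove the temporal mean at each grid point (the mean field pictured above), take a thin SVD of the anomalies, and read the spatial EOFs off the left singular vectors and the temporal EOFs off the right singular vectors. This is plain NumPy on a toy matrix, not the distributed code that produced the figures; the function name `compute_eofs` is hypothetical, and any area or depth weighting is omitted.

```python
import numpy as np

def compute_eofs(data, k):
    """Leading k EOFs of a (grid points x time steps) data matrix via a thin SVD.

    Returns the temporal mean field, the spatial EOFs (columns of U), the singular
    values, and the temporal EOFs (rows of Vt), so anomalies ~= U_k diag(s_k) Vt_k.
    """
    mean = data.mean(axis=1, keepdims=True)   # mean field at each grid point
    anomalies = data - mean                   # EOFs describe departures from the mean
    U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
    return mean[:, 0], U[:, :k], s[:k], Vt[:k]

# Toy example: 50 grid points, 40 time steps, anomalies of rank 2 by construction.
rng = np.random.default_rng(1)
data = rng.standard_normal((50, 1)) + rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
mean, eofs, s, pcs = compute_eofs(data, k=2)
```

At the scale in the post, the dense `np.linalg.svd` call is exactly the step that cannot be done on one machine, which is where a truncated, distributed SVD comes in.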

I’ve been doing a lot of programming recently, implementing truncated SVDs in Spark on large datasets (hence the pictures above). I’m new to this game, but I suspect that Spark should only be used for linear algebra if you really need linear algebra as part of a larger Spark workflow. In my experience, you have to fiddle with too many esoteric knobs (e.g., several different memory settings, several network timeout settings, using the correct type of serialization at the correct places in your code), all dependent on the size and nature of your data, just to make Spark run without strange errors, much less run efficiently. Then there’s the annoying JVM array size restriction: arrays are indexed by signed 32-bit integers, so an array can have at most 2^31-1 (about 2.147 billion) entries. Thus a matrix with 6 million rows stored in a single array can have only 357 columns, and the largest square matrix you can store that way is about 46K × 46K. Another way to look at this is that an array of Java floats can hold at most about 8.6 GB. Compare this with C/C++, where your array size is essentially limited only by the amount of memory on your machine.
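The arithmetic behind these limits is easy to check. A small sketch, assuming a matrix is stored as one flat array whose length cannot exceed 2^31-1 (in practice some JVMs refuse lengths a few elements below that); the helper names are hypothetical:

```python
import math

# Java arrays are indexed by signed 32-bit ints, so no array can have more than
# Integer.MAX_VALUE = 2**31 - 1 entries (some JVMs refuse lengths slightly below that).
JVM_MAX_ARRAY_LENGTH = 2**31 - 1

def max_columns(num_rows: int) -> int:
    """Most columns a num_rows-row matrix can have if stored in one flat array."""
    return JVM_MAX_ARRAY_LENGTH // num_rows

def max_square_side() -> int:
    """Largest n such that an n x n matrix fits in one flat array."""
    return math.isqrt(JVM_MAX_ARRAY_LENGTH)

def max_array_bytes(bytes_per_entry: int) -> int:
    """Largest single-array payload: 4 bytes per Java float, 8 per double."""
    return JVM_MAX_ARRAY_LENGTH * bytes_per_entry

print(max_columns(6_000_000), max_square_side(), max_array_bytes(4))
```

This prints `357 46340 8589934588`: 357 columns for 6 million rows, a 46340 × 46340 square, and about 8.6 GB of floats. Switching to doubles doubles the byte limit but not the number of entries.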

If *all* you’re doing is linear algebra, I recommend using a purpose-built tool, like Skylark (which we’re working on making callable from Spark). Possibly the only serious advantage Spark has over MPI-based solutions is that it can theoretically work with an unlimited amount of data, since it doesn’t need to store it all in RAM. You just have to be very careful how you process it, due to the aforementioned JVM array size restrictions.
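To illustrate the kind of computation involved when linear algebra is all you need, here is a minimal single-machine sketch of a rank-k truncated SVD using a randomized range finder with a few power iterations (the approach popularized by Halko, Martinsson, and Tropp). It is not Skylark’s API and not the code behind the figures above; the oversampling and iteration defaults are illustrative assumptions.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, n_iter=2, rng=None):
    """Approximate rank-k truncated SVD A ~= U diag(s) Vt via a randomized range finder."""
    rng = np.random.default_rng() if rng is None else rng
    # Sample the range of A with a Gaussian test matrix (k + oversample columns).
    Omega = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)
    # Power iterations, re-orthonormalized each pass, help when the spectrum decays slowly.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # Project A onto the captured subspace and take a small dense SVD there.
    Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

rng = np.random.default_rng(0)
A = rng.standard_normal((300, 5)) @ rng.standard_normal((5, 80))  # exactly rank 5
U, s, Vt = randomized_svd(A, k=5, rng=rng)
```

For a rank-20 PCA like the one above you would pass `k=20` and the mean-removed data matrix; for data that don’t fit in memory, the products with `A` and `A.T` are the passes a distributed or out-of-core implementation has to supply.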

Installing Spark with Hadoop 2 using spark-ec2

YARN does not seem to be configured correctly when you use the spark-ec2 script to install a Spark cluster on EC2. Here’s my short workaround for getting YARN to work (with a simple python script at the bottom):

  1. launch a cluster with, e.g., spark-ec2 -k <keyname> -i <keyfile> -s <numslaves> --instance-type=<type> --placement-group=<placementgroupname> --hadoop-major-version=2 --copy-aws-credentials launch <clustername>
    This automatically copies your AWS access keys into Hadoop’s core-site.xml configuration file so you can pull data from S3 into hdfs. Unfortunately, although Hadoop is configured to use YARN, the YARN installation is broken. The next few steps fix this; they are only outlines, so to implement them, run the python script at the bottom after exporting your AWS keys in step 3.
  2. ssh into the cluster master; you can use spark-ec2 get-master <clustername> to get the public dns for the master
  3. export AWS_ACCESS_KEY_ID=<key> and export AWS_SECRET_ACCESS_KEY=<key>
  4. shut down yarn, the (ephemeral) hdfs, tachyon and spark
  5. change the mapred-site.xml and yarn-site.xml configuration files of (ephemeral) hdfs to correctly configure YARN
  6. open ports 8025, 8030, 8040 (and maybe 8033, 9000) of the master group to the slave group
  7. copy the (ephemeral) hdfs configuration files to all the slave machines
  8. start up (ephemeral) hdfs, yarn, tachyon, then spark in this order
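Step 5 amounts to writing Hadoop-style XML property files. The sketch below renders them from Python dicts. The property names are standard Hadoop 2 settings, but the specific set shown, and the mapping of step 6’s ports 8025 and 8030 to the resource-tracker and scheduler addresses (older Hadoop 2.0 defaults), are my assumptions rather than what the original script writes; check them against your version’s yarn-default.xml. Older releases also expect `mapreduce.shuffle` instead of `mapreduce_shuffle`.

```python
import xml.etree.ElementTree as ET

def hadoop_config_xml(properties: dict) -> str:
    """Render a Hadoop-style <configuration> document from name -> value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")

def yarn_site(master: str) -> dict:
    # ASSUMPTION: a minimal yarn-site.xml pointing the slaves at the master.
    # 8025 -> resource tracker and 8030 -> scheduler follow older Hadoop 2.0 defaults.
    return {
        "yarn.resourcemanager.resource-tracker.address": f"{master}:8025",
        "yarn.resourcemanager.scheduler.address": f"{master}:8030",
        # Hadoop 2.2+ spelling; older releases use "mapreduce.shuffle".
        "yarn.nodemanager.aux-services": "mapreduce_shuffle",
    }

def mapred_site() -> dict:
    # Run MapReduce jobs on YARN.
    return {"mapreduce.framework.name": "yarn"}

# Hypothetical master hostname; substitute your own master's hostname.
print(hadoop_config_xml(yarn_site("ip-10-0-0-1.ec2.internal")))
print(hadoop_config_xml(mapred_site()))
```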

You should now be able to pull data from S3 (using s3n:// URLs) into hdfs, use Hadoop, run Spark jobs, etc.

Run this python script on the master to implement steps 4–8. You may need to open some ports (e.g., 8033, 9000) manually in the master security group; if you have issues, check the YARN log files under /mnt/ephemeral-hdfs on the master and on a slave.

My thoughts on the confederate flag

The Confederate flag is literally a symbol of treason, and not just any treason: the most treasonous act in the entire history of the United States.

Anyone who buys into the ‘States Rights’ explanation for the Secession either doesn’t understand which right we’re talking about or doesn’t care: the right to slavery. Elide motivations all you want; the southern states withdrew because, with western states entering the union and the North blocking the expansion of slavery into the west, they felt their way of life and economic foundations were threatened. ‘States Rights’ is a euphemism for the right to slavery. If you disagree, list for me, off the top of your head, two other ‘states rights’ that motivated the secession.

Put the issue of motivation aside. What good came of the Confederacy? At the end of the day, their treasonous secession led directly to the death of hundreds of thousands, devastated the south, and increased the national debt from $60 million before the war to $2.7 billion afterwards! Why defend a symbol of such a wretched and shameful attempt to tear apart the United States? As you say, the only good I can think of coming from the Civil War is the Emancipation Proclamation. But somehow I doubt anyone’s flying the flag to represent that.

It goes without saying that individuals can do whatever they want on their own property (just as it goes without saying that I reserve the right to think you’re grossly insensitive at best, and racist at worst and most likely), but the Confederate flag should not be flown on government property.

A quick thought on Supernatural and some other tv shows

Just finished season 9 of Supernatural. You’ve got to give that show credit for being one of the few that *demand* a deus ex machina ending. Anything less, after all this fighting over who’s going to take God’s place and Castiel’s moments of mysterious grace, would be a letdown. I can’t wait to see what season 10 has to offer.

While we’re on the topic of shows that appropriate Christian mythology for their own ends, I want to say that Messengers is crap. We’re supposed to believe that God turned some humans into angels and sent the Devil to earth to tempt some other humans into becoming the Four Horsemen of the Apocalypse, all because he’s testing humanity? What a convoluted and cruel way for an omnipotent being to decide what he wants to do next. It seems ridiculous: objectively, no more ridiculous than a strictly biblical eschatology, but the latter has the weight of time and belief to lend it a patina of respectability. The plot of Messengers is both silly and unoriginal, a combination that’s hard to get excited about. The actual execution is also pretty bland.

Continuing the theme, I’m looking forward to the Lucifer show. D. B. Woodside’s character (the angel trying to pressure Lucifer back into hell) reminds me of the angel from Constantine, but hopefully that’ll be the only point of similarity between the two shows. The concept of Lucifer turning to crime fighting is almost too much for me to wrap my head around (I can only imagine the reaction of various Christian action organizations), but I have a history of enjoying shows that revolve around immortals assisting cops. I hope he ends up having more powers than just the ability to convince people to tell him their innermost desires.

Marco Polo: I approve, so far

I finally got around to watching Marco Polo. This is perhaps surprising news, as anyone who knows me could guess this would be right up my alley: almost superhuman martial arts (at least in the promo material), an eastern setting, and a clash of nations …

I’m on episode three, and I’m surprised to say that I’m not at all disappointed! I’m enjoying seeing the tensions building within his empire as Kublai Khan tries to preserve the Mongolian spirit of his empire while incorporating the disparate cultures and religions of his client states. In particular, Khan wants to incorporate the learning and wisdom of the Chinese, but his warlords are uneasy with the changes they see in his court and they think he is moving away from the Mongol ways. Jingim, Khan’s half-Chinese son and heir, is particularly affected by this cultural conflict: the warlords of the Empire see him as weak and not Mongol enough to be the next Khan, and he himself blames his father for not raising him to be more Chinese.

I haven’t yet decided how I feel about the portrayal of the women in this series. On the one hand, it’s completely believable that in the societies depicted at that time women had no path to influence other than through the bedroom. On the other, the sexual intrigue seems like an excuse to show a lot of writhing naked women.

Probably the aspect of the show that excites me the most is the casting. There’s only ONE white person on the cast, that being of course Marco Polo. He is an outsider in a vast world that does not involve white people at all… and he has no real power to influence events. He is, literally, an observer. It’s refreshing to see a big budget American production tackle a story from another culture and make authentic casting decisions. I really hope Marco Polo doesn’t end up being another Great White Savior.

Wilkinson on a priori error analysis

I’ve been reading a lot of numerical linear algebra (NLA) lately (e.g., a recent paper on communication-avoiding rank-revealing QR, or RRQR), and necessarily brushing up on details I paid scant attention to in my NLA courses, like the different types of pivoting. This led me to the following quote by a famous numerical analyst, J. H. Wilkinson:

There is still a tendency to attach too much importance to the precise error bounds obtained by an a priori error analysis. In my opinion, the bound itself is the least important part of it. The main object of such an analysis is to expose the potential instabilities, if any, of an algorithm so that hopefully from the insight thus obtained one might be led to improved algorithms. Usually the bound itself is weaker than it might have been because of the necessity of restricting the mass of detail to a reasonable level and because of the limitations imposed by expressing the errors in terms of matrix norms. A priori bounds are not, in general, quantities that should be used in practice. Practical error bounds should usually be determined by some form of a posteriori error analysis, since this takes full advantage of the statistical distribution of rounding errors and of any special features, such as sparseness, in the matrix.

Can I get an amen? This could be the epigraph of the career I’m building. I strive for a priori analyses, whether of algorithms or of physical systems, because in the best cases they enhance our understanding of the factors relevant to our problems. I seek them out in others’ work and try to provide them in my own because I’m deeply skeptical of purely empirical results: without sufficient theory, how do you know you’re not just avoiding inputs that would expose some failing in your idea? This is why I’m an applied mathematician.
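To make Wilkinson’s distinction concrete, here is a minimal sketch of a posteriori error estimation for a linear solve Ax = b. Instead of bounding the error before computing, take the residual r = b - Ax̂ of the solution actually obtained. Since x - x̂ = A⁻¹r and ‖b‖ ≤ ‖A‖‖x‖, the relative error satisfies ‖x - x̂‖/‖x‖ ≤ cond(A)·‖r‖/‖b‖. The setup (a random 200 × 200 system solved in single precision, with the residual and a reference solution computed in double precision) is my illustrative assumption, not something from the quote.

```python
import numpy as np

def a_posteriori_bound(A, b, x_hat):
    """Residual-based bound on ||x - x_hat|| / ||x|| (2-norm), where A x = b."""
    r = b - A @ x_hat
    # np.linalg.cond uses an SVD; fine at this size, but at scale you'd use an estimate.
    return np.linalg.cond(A) * np.linalg.norm(r) / np.linalg.norm(b)

rng = np.random.default_rng(0)
n = 200
A32 = rng.standard_normal((n, n)).astype(np.float32)
b32 = rng.standard_normal(n).astype(np.float32)

# The computation being assessed: a single-precision solve.
x_hat = np.linalg.solve(A32, b32)

# float32 values are exactly representable in float64, so this is the same system;
# the double-precision residual and reference solution are accurate enough to judge x_hat.
A, b = A32.astype(np.float64), b32.astype(np.float64)
x_ref = np.linalg.solve(A, b)
actual = np.linalg.norm(x_ref - x_hat) / np.linalg.norm(x_ref)
bound = a_posteriori_bound(A, b, x_hat.astype(np.float64))
print(f"actual relative error {actual:.2e}, a posteriori bound {bound:.2e}")
```

The bound uses the rounding errors that actually occurred (through r), which is the advantage Wilkinson attributes to a posteriori analysis.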

Nystrom vs Random Feature Maps

I haven’t seen a truly convincing study comparing Nystrom approximations to random feature map (RFM) approximations. On the one hand, a NIPS 2012 paper compared the two and argued that Nystrom approximations are more efficient because the bases they use adapt to the problem, whereas those used by RFMs do not.

This is an indisputable point, but the paper’s experiments are not convincing: they used the same number of sample points in the Nystrom approximations as random features in the RFMs. A fairer comparison would give both methods the same number of FLOPs. Since Nystrom methods also require pseudoinverting an m × m matrix, where m is the number of sample points (O(m^3) work, which becomes huge when m is large), one can potentially afford more random features than sample points for the same number of FLOPs. Also, as always, it is important to choose an appropriate kernel; this paper only considered RBF kernels.
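To make the two approximations concrete, here is a minimal NumPy sketch for the Gaussian (RBF) kernel: uniform-sampling Nystrom (K ≈ C W⁺ Cᵀ, where C holds the kernel columns at m sampled points and W is the m × m block among them) and random Fourier features (K ≈ Z Zᵀ with D features, following Rahimi and Recht). The eigendecomposition of W is where the extra O(m^3) work comes in. The bandwidth, sizes, and truncation tolerance are illustrative assumptions, and the sketch only measures kernel approximation error; it does not attempt a FLOP-matched comparison.

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all row pairs."""
    sq = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def nystrom_approx(X, m, sigma, rng, tol=1e-10):
    """Uniform-sampling Nystrom approximation K ~= C W^+ C^T from m sampled points."""
    idx = rng.choice(X.shape[0], size=m, replace=False)
    C = rbf_kernel(X, X[idx], sigma)    # n x m: kernel columns at the sampled points
    W = C[idx]                          # m x m block of K among the sampled points
    evals, evecs = np.linalg.eigh(W)    # the O(m^3) step
    keep = evals > tol * evals.max()    # truncated pseudoinverse for numerical stability
    F = C @ (evecs[:, keep] / np.sqrt(evals[keep]))  # K ~= F F^T
    return F @ F.T

def rff_approx(X, D, sigma, rng):
    """Random Fourier features for the same kernel: K ~= Z Z^T with D features."""
    omega = rng.normal(scale=1.0 / sigma, size=(X.shape[1], D))  # spectral samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)                     # random phases
    Z = np.sqrt(2.0 / D) * np.cos(X @ omega + b)
    return Z @ Z.T

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
K = rbf_kernel(X, X, sigma=2.0)

def rel_err(K_approx):
    """Relative Frobenius-norm error against the exact kernel matrix."""
    return np.linalg.norm(K - K_approx) / np.linalg.norm(K)

print("Nystrom, m = 50:", rel_err(nystrom_approx(X, 50, 2.0, rng)))
print("RFF,     D = 50:", rel_err(rff_approx(X, 50, 2.0, rng)))
```

A FLOP-matched experiment would instead choose D so that forming and using Z costs about as much as forming C and decomposing W, rather than setting D = m.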

On the other hand, IBM researchers have recently used large-scale RFM approaches to get state-of-the-art performance on vision and speech tasks. Their results use the simplest of RFM approaches: linear regression on top of a very large number (~400K) of random Fourier features. The key to their success is a well-engineered ADMM approach to parallelizing the solution of the resulting linear system. It’s not clear to me that a similar approach couldn’t be used to scale up a Nystrom-based solution and obtain similar results. Also, I’ve not seen anyone implement Wainwright et al.’s divide-and-conquer approach to kernel regression; theoretically, this could also be used to distribute the cost of a truly large-scale Nystrom implementation.
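For reference, the divide-and-conquer idea is simple to state: randomly split the n training points into p parts, solve kernel ridge regression on each part independently (O((n/p)^3) work each with a dense solver, instead of O(n^3)), and average the p resulting predictors. The sketch below is my illustration of that idea with an RBF kernel and dense solves, not code from the paper; the function names and parameters are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all row pairs."""
    sq = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def krr_predict(X, y, X_test, lam, sigma):
    """Kernel ridge regression minimizing (1/n) sum (f(x_i) - y_i)^2 + lam ||f||_H^2."""
    n = X.shape[0]
    alpha = np.linalg.solve(rbf_kernel(X, X, sigma) + lam * n * np.eye(n), y)
    return rbf_kernel(X_test, X, sigma) @ alpha

def divide_and_conquer_krr(X, y, X_test, num_parts, lam, sigma, rng):
    """Split the data at random, fit KRR on each part independently, average predictions."""
    parts = np.array_split(rng.permutation(X.shape[0]), num_parts)
    preds = [krr_predict(X[idx], y[idx], X_test, lam, sigma) for idx in parts]
    return np.mean(preds, axis=0)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(300, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(300)
X_test = np.linspace(-1.0, 1.0, 50)[:, None]
y_dc = divide_and_conquer_krr(X, y, X_test, num_parts=4, lam=1e-3, sigma=0.3, rng=rng)
```

Each part is solved independently, so the parts can run on different machines; only the final averaging needs communication.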

Personally, I’m of the opinion that a well-engineered Nystrom solution (even one using uniform sampling) should always outperform a well-engineered RFM solution. But I’d like to see this convincingly demonstrated.