Remember when a 20kB image took a minute to load? Back then, when dinosaurs were roaming the earth?
Data has become big.
Today we have more data than ever before, more data in fact than we know how to analyze or even handle. Big data is a big topic. Big data changes the way we do science and the way we think about science. Big data even led Chris Anderson to declare the End of Theory:
“We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.”
That was 5 years ago. Theory hasn’t ended yet, and it’s unlikely to end anytime soon. That’s because there is a slight problem with Anderson’s vision: one still needs an algorithm that is able to find patterns. And to write that algorithm, one needs to know what one is looking for to begin with. But pattern-finding algorithms for big data are difficult. One could say they are a science in themselves, so theory had better not end before we have found them.
Those of us working on the phenomenology of quantum gravity would be happy if we had data at all, so I can’t say the big data problem is big on my mind, but I have a story to tell. Alexander Balatsky recently took on a professorship in condensed matter physics at Nordita, and he told me about a previous work of his that illustrates the challenge of big data in physics. It comes with an interesting lesson.
The conduction bands of electrons in crystals cannot be calculated analytically except in very simplified approximations. Determining the behavior of electrons in crystals to high accuracy requires three-dimensional many-body calculations of multiple bands and their interactions. This produces a lot of data. Big data.
You can find and download some of that data in the 3D Fermi Surface Database. Let me just show you a random example of Fermi surfaces, this one for a gold-indium lattice:
The Fermi surface, roughly speaking, tells you how the electrons are packed. It’s pretty in a nerdy way, but what is the relevant information here?
The particular type of crystal that Alexander and his collaborators, Hari Dahal and Athanasios Chantis, were interested in is the so-called non-centrosymmetric crystal, which has a relativistic spin-splitting of the conduction bands. This type of crystal symmetry exists in certain semiconductors and metals and plays a role in unconventional superconductivity, which is still a theoretical challenge. Understanding the behavior of electrons in these crystals may hold the key to the production of novel materials.
The many-body, many-band numerical simulation of the crystals produces a lot of numbers. You pipe them into a file, but now what? What is it really that you are looking for? What is relevant for the superconducting properties of the material? What pattern-finding algorithm do you apply?
Let’s see...
The human eye, and its software in the visual cortex, is remarkably good at finding patterns, so good in fact that it frequently finds patterns where none exist. And so the big data algorithm is to visualize the data and let humans scrutinize it, giving them the possibility to interact with the data while studying it. This interaction might mean selecting different parameters or axes, rotating in several dimensions, changing colors or markers, and zooming in and out. The hardware for this visualization was provided by the Los Alamos-Sandia Center for Integrated Nanotechnologies, VIZ@CINT; the software is ParaView, which is open source. Here, big data meets theory again.
Intrigued by how this works in practice, I talked to Hari and Athanasios the other day. Athanasios recalls:
“I was looking at the data before in conventional ways, [producing 2-dimensional cuts in the parameter space], and missed it. But in the 3-d visualization I immediately saw it. It took like 5 minutes. I looked at it and thought ‘Wow’. To see this in conventional ways, even if I had known what to look for, I would have had to do hundreds of plots.”
The irony is that I had no idea what he was talking about, because all I had to look at was a (crappy print of a) 2-dimensional projection. “Yes,” Athanasios says, “it’s in the nature of the problem. It cannot be translated onto paper.”
So I’ll give it a try, but don’t be disappointed if you don’t see much in the image; that’s the very raison d’être of interactive data visualization software.
3-d band structure of GaAs. Image credit: Athanasios Chantis.
The two horizontal axes in the figure show the momentum space of the electrons in the directions away from the high-symmetry direction of the crystal. The pattern is periodic, so you’re actually seeing the same patch four times, and in the atomic lattice it goes on repeating. In the vertical direction, two different functions are shown simultaneously. One is the height profile, whose color code you see on the left; it shows the energy of the electrons. The other function, shown (rescaled) as colored bullets, is the spin-splitting of three different conduction bands; you see them in (bright) red, white and pink. Towards the middle of the front, note the white band getting close to the pink one. They don’t cross; instead they seem to repel and move apart again. This is called an anti-crossing.
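The anti-crossing is the generic behavior of two energy levels that are coupled to each other. As a minimal sketch, assuming a textbook two-level model rather than the many-band GaAs calculation behind the figure, the following Python snippet diagonalizes a hypothetical 2×2 Hamiltonian. The names `two_band_energies`, `slope` and `coupling` are illustrative only. Without coupling, the two bands would cross at k = 0; with a coupling Δ they stay apart, and the smallest gap, 2Δ, sits where the crossing would have been.

```python
import numpy as np

def two_band_energies(k, slope=1.0, coupling=0.1):
    """Eigenvalues of a hypothetical 2x2 model Hamiltonian H(k).

    Without coupling the two bands are E = +slope*k and E = -slope*k,
    which cross at k = 0. The off-diagonal coupling mixes them.
    """
    h = np.array([[slope * k, coupling],
                  [coupling, -slope * k]])
    return np.linalg.eigvalsh(h)  # sorted ascending: (lower band, upper band)

# Sample the momentum along a line through the would-be crossing point.
ks = np.linspace(-1.0, 1.0, 201)
bands = np.array([two_band_energies(k) for k in ks])
gaps = bands[:, 1] - bands[:, 0]  # energy separation of the two bands

# The bands never touch: the gap is smallest (2 * coupling) at k = 0.
i_min = int(np.argmin(gaps))
print(f"smallest gap {gaps[i_min]:.3f} at k = {ks[i_min]:.3f}")
```

Setting `coupling=0.0` recovers a true crossing with zero gap, which is a quick way to see that it is the coupling that makes the bands repel.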
The relevant feature in the data, the one that’s hard if not impossible to see in two-dimensional projections, is that the energy peaks coincide with the locations of these anti-crossings. This property of the conduction bands, caused by the spin-splitting in this type of non-centrosymmetric crystal, affects how electrons travel through the crystal, and in particular it affects how electrons can form pairs. Because of this, materials with an atomic lattice of this symmetry (or rather, absence of symmetry) should be unconventional superconductors. This theoretical prediction has meanwhile been tested experimentally by two independent groups. Both groups observed signs of unconventional pairing, pointing to a strong connection between noncentrosymmetry and unconventional superconductivity.
This isn’t the only dataset that Hari studied by way of interactive visualization, and not the only case where visualization was not merely helpful but necessary to extract the scientific information. Another example is this analysis of a data set on the composition of the tip of a scanning tunneling microscope, and there are a few other projects he has worked on.
And so it seems to me that, at least for now, the best pattern-finding algorithm for these big data sets is the eye of a trained theoretical physicist. News of the death of theory, it seems, has been greatly exaggerated.