Today marks the midpoint between the September equinox and the coming solstice. The days are continuing to increase in length in the southern hemisphere, so there are a lot of bright, sunny days still ahead. If we count the September equinox as 0 degrees in Earth’s orbit about the Sun, then we’re now 45 degrees along (or π/4 radians, for you geeks)—we’re now hitting the stride of spring, in an astronomical sense.
In the spirit of this ongoing cosmic dance, I’ve again returned to the idea of participating in the astronomy community, this time once I complete my honours programme.
No, I don’t intend to dump computer science for astrophysics. Instead, I’m hoping for a threesome.
My current research involves data mining—that is, analysing data (via computational methods) in search of interesting patterns. In other words, we extract knowledge from raw data using methods such as clustering (i.e. “What groupings do the data naturally fall into?”), classification (i.e. “Can we construct a set of rules to classify new instances?”), etcetera. Closely related is the field of informatics, which aims to build and study information systems—indeed, a data mining component is often crucial to such systems.
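To make those two questions a little more concrete, here is a minimal sketch in plain Python. It is not the method I use in my research; it simply pairs a tiny k-means loop (clustering) with a nearest-centroid rule (classification). The data points, the starting centroids and the function names are all made up for illustration.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def kmeans(points, centroids, iterations=10):
    """Tiny k-means: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster leaves its centroid where it was).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


def classify(point, centroids):
    """Nearest-centroid rule: label a new instance by its closest cluster."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


# Hypothetical 2-D data: two loose blobs.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
label = classify((2.0, 1.0), centroids)  # index of the cluster the new point joins
```

Real data mining tools are far more careful about choosing starting centroids and deciding when to stop, but the basic loop of "group, summarise, then label new instances" is the same idea.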
Two emerging subfields are bioinformatics and geoinformatics. Bioinformatics (in practice) is primarily concerned with analysing and managing the masses of data involved in molecular biology, such as those found in genetic studies; geoinformatics, similarly, deals with analysing and managing Earth-centric data. If you’ve ever used a GPS, you’ve already brushed up against it.
Astronomy is taking its cues from both these subfields, introducing the idea of astroinformatics. Take a look at Google Sky (if you haven’t already) and you’ll immediately get a taste of the potential. But it’s much bigger than that…
Consider the planned Large Synoptic Survey Telescope (LSST). To quote Wikipedia:
Allowing for maintenance, bad weather, etc., the camera is expected to take over 200,000 pictures (1.28 petabytes uncompressed) per year, far more than can be reviewed by humans. Managing and effectively data mining the enormous output of the telescope is expected to be the most technically difficult part of the project.
1.28 petabytes is equal to 1.28 million gigabytes. Now, if you’re lucky enough to have a 2 terabyte hard drive, that’s still 640 times your hard drive’s size in raw image data, per year. So we need to design systems to effectively mine these data, because humans just can’t manage an avalanche of data on that scale.
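For anyone who wants to check the arithmetic, here is a quick back-of-envelope sketch. It assumes decimal (SI) units, so a petabyte is 10^15 bytes, and it takes the two figures from the Wikipedia quote at face value. The variable names are my own.

```python
# Decimal (SI) unit sizes in bytes; binary units would shift the numbers slightly.
GB = 10**9
TB = 10**12

yearly_bytes = 1_280 * TB    # 1.28 petabytes of raw images per year
images_per_year = 200_000    # pictures per year, per the quote
drive_bytes = 2 * TB         # a 2 terabyte hard drive

gigabytes_per_year = yearly_bytes // GB                      # 1,280,000 GB
drives_per_year = yearly_bytes // drive_bytes                # 640 drives' worth
gigabytes_per_image = yearly_bytes / images_per_year / GB    # 6.4 GB per picture

print(f"{gigabytes_per_year:,} GB per year")
print(f"{drives_per_year} two-terabyte drives per year")
print(f"{gigabytes_per_image} GB per image")
```

So that is 640 two-terabyte drives filled every year, and each individual picture weighs in at roughly 6.4 gigabytes uncompressed.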
But even on a smaller scale, astronomical data are being gathered from other sources, enough to still overwhelm the research community. Researchers must be able to effectively manage and meaningfully analyse what is already there. There’s no point in having petabytes of data if you can’t make sense of them.
So astronomy and astrophysics need computer science desperately.
That’s where I come in.
Once I complete my honours, I’m hoping to help design and develop systems that will aid the astronomy community in making new discoveries that were heretofore unachievable. With the advent of such projects as the National Virtual Observatory in the U.S., we suddenly have an amazing opportunity to understand the heavens like never before, and by applying techniques from artificial intelligence, we can allow computers to do the grunt-work while humans make the big leaps in discovery.
It’s an exciting time to be involved in the scientific community, and I’m glad to be a part of it.