Most of the projects that my coworkers and I work on involve analyses of big datasets with information about algal specimens. One of Tom Schils’ projects that I’m helping out with aims to sketch a picture of the geographical patterns of seaweed diversity using a combination of tools. Tom’s been accumulating floristic information that we are now trying to complement with DNA sequence data to characterize how species diversity and phylogenetic diversity are distributed on Earth.
The Hawaiian Algal Database is a superb resource of information about — you guessed it — Hawaiian algae. The data were generated, compiled and put online by Alison Sherwood and Gernot Presting of the University of Hawaii at Manoa, and a report about the dataset was published in BMC Plant Biology. It’s a specimen-centered database that has all sorts of metadata including geographical coordinates, information about the collection site, and in many cases DNA sequences of up to 3 markers from different genomes (yes, algae have 3, at least).
Because the data are available only through the online HADB interface, Tom encouraged me to write a script to download the information we needed to integrate the Hawaiian data with ours. I wrote a Perl script that uses the LWP library to download and store the information in a more analysis-friendly format. In case anyone is interested, I’m linking the script here.
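For readers who want to try something similar, here is a minimal sketch of the download-and-reformat idea. It is not the Perl script mentioned above: it uses Python's standard library instead of LWP, and the URL pattern, the two-column table layout of a specimen page, and the field names are all assumptions made for illustration rather than a description of the real HADB interface.

```python
import csv
import io
import urllib.request
from html.parser import HTMLParser

# Hypothetical URL pattern; the real HADB interface may look different.
BASE_URL = "http://example.org/hadb/specimen?id={specimen_id}"


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_specimen(html):
    """Turn an assumed two-column (field, value) HTML table into a dict."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return {row[0]: row[1] for row in parser.rows if len(row) == 2}


def fetch_specimen(specimen_id):
    """Download one specimen page (needs network access)."""
    url = BASE_URL.format(specimen_id=specimen_id)
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def to_tsv(records, fields):
    """Write records as tab-separated text, one specimen per line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Missing fields become empty cells so every line has the same columns.
        writer.writerow({f: record.get(f, "") for f in fields})
    return out.getvalue()
```

In use, one would loop over specimen identifiers, call `fetch_specimen` and `parse_specimen` on each, and pass the resulting list to `to_tsv`. Keeping the parsing separate from the download makes it possible to check the parser against a saved page without hitting the server again.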
I downloaded information for the 221 specimens of brown algae, 238 of green algae and 2163 of red algae in the dataset. What’s absolutely great is that for the reds, 61% of the samples have been sequenced; that’s 1333 sequenced specimens belonging to 213 unambiguously named species! Unfortunately, far fewer specimens of greens (43) and browns (25) have been sequenced.
So, HADB is a great addition to the data we have from GenBank and other sources and will no doubt help us understand the geographical distribution of algal phylogenetic diversity. Thanks to the Hawaii group for generating these data and making them available.