Expediting Results

Bioinformatics can offer efficient, automated, accelerated analysis of genetic data
By Erin Voegele | January 30, 2012

Many biorefining companies engineer more than equipment and mechanical processes in technology development. They also engineer, design or breed microorganisms, whether algae, yeast or bacteria. To design the most efficient, cost-effective process technology possible, these companies must ensure that the microorganisms they employ are highly effective at performing whatever conversion is necessary. This often involves extensive work with gene sequencing and analysis.

These operations can generate huge amounts of raw data that must be organized and analyzed. A variety of software solutions, both commercial and open-source, are available to help researchers analyze genetic data sets. However, a software platform offered by Denmark-based CLC bio has become the solution of choice for several biorefining companies, including Sapphire Energy Inc., BioGasol ApS, and Synthetic Genomics Inc.

According to Cecilie Boysen, CLC bio’s senior consulting manager of the Americas, one common industrial use of the company’s bioinformatics solutions begins when scientists identify a gene they want to optimize so that a particular organism performs better under industrial conditions. A company working with algae might want to optimize the organism’s ability to produce oil or to grow under very specific conditions. Similarly, a company developing a cellulosic fermentation technology may need to optimize its yeast or bacteria to produce at particular temperatures.

The technologies and processes used in this type of selection work generally produce an overwhelming amount of data. Many companies have traditionally spent days or weeks analyzing it semi-automatically with several stand-alone software packages, and in some cases with fully manual work in Excel spreadsheets. According to Boysen, CLC bio’s bioinformatics solution essentially serves as a pipeline that automates all of these activities. The result is an analysis process that is significantly less time-intensive and less prone to human error.

CLC bio has offered bioinformatics solutions since its founding in January 2005. According to Lasse Görlitz, the company’s director of communications, CLC bio was established by two brothers in Denmark. “The original idea was to make some powerful analysis tools and put them into the hands of scientists who can interpret the resulting data,” he says. In the past, molecular biologists and other researchers had to collaborate with programmers to analyze genetic data, because the algorithms were not available through a graphical interface. Instead, they had to be scripted and programmed, not only at the outset but throughout the analysis process whenever the algorithms needed to be tweaked to perform better. “You had to be able to script and program to actually run these analyses,” Görlitz says. “We wanted to be able to put the tools in the hands of those who actually had the knowledge to interpret the data.”

In the first three years of the company’s existence, approximately 1 million users downloaded the program. Considering the relatively small size of the scientific community that needs this type of analysis tool, Görlitz calls that reach outstanding and previously unheard of in the sector. When next-generation gene sequencing became available in 2007, CLC bio decided to serve that segment of the scientific community as well. Next-gen sequencing is vastly less expensive than traditional methods and produces even greater amounts of data. The decision has paid off, Görlitz says: the company’s software sales grew more than 30 percent last year, and profits grew by more than 50 percent.

The bioinformatics solution has traditionally been used widely in several sectors, including plant research, but interest from the biorefining community is relatively new. Görlitz stresses that his company is confident its presence in the sector will continue to grow dramatically.

Görlitz says CLC bio’s bioinformatics solution is organism-agnostic. Any organism with DNA has genes that can be sequenced, and if the genes can be sequenced, CLC bio’s product can assemble and analyze the data. “One of the key strengths of our platform is that it caters to the agricultural sector, specifically plant genomes,” Görlitz adds. “While it might sound strange, plant genomes are typically a lot bigger than those of mammals. So, some of these plant species actually require some very sophisticated lab-scale setups to handle and analyze all of these large genomes, and to interpret that data.”

The system is an open platform, Görlitz says, which means a customer can integrate almost any other system into it. That includes open-source algorithms some researchers may already be using to analyze data, as well as third-party solutions already installed in a lab. The bioinformatics solution essentially integrates the flow of data among all of these systems. “Our architecture is extremely flexible in that way,” he adds. “We can go into virtually any organization, look at what they already have, look at what their needs are, and then add our platform and tailor it so everything is strung together” into a pipeline the organization can tweak on its own.

“Our system focuses on sequence analysis,” Görlitz says. “You used to have to use a lot of different software packages to do different types of analyses. That is what we have actually integrated into our platform. Once you are doing sequence analysis, you are pretty much staying within our software platform…If a customer has developed its own specific tool, we can just integrate that with the platform. You are still using the same framework and you don’t have to import/export or convert data, which are tedious tasks. You can actually focus on the science and being productive.”

Benefits, Industry Feedback

According to Görlitz, the platform not only increases the accuracy of analysis but also vastly speeds it up. Whenever people have to perform repetitive manual tasks, he points out, errors occur; when a computer completes those tasks, the potential for inconsistencies is reduced.

The benefits of speed are even more pronounced. “The speed is actually something that is very unique about our software,” Görlitz says. The human genome is a good example. A common open-source assembler used for human genomes is called ABySS. “If you want to use that to assemble a human genome, you would have to have it running on 20 computers for seven days. If you use our algorithm, you can do it on a single computer in about seven hours.” And these are not supercomputers, he stresses, just relatively standard computers already installed in most laboratories.
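To put Görlitz’s figures in perspective, the short calculation below converts them into wall-clock time and total compute. It is a back-of-the-envelope sketch only: it takes the quoted numbers at face value and assumes that every computer involved is comparable and fully busy for the entire run, so total compute is simply machines multiplied by hours. The helper name machine_hours is illustrative, not part of any CLC bio or ABySS tool.

```python
# Back-of-the-envelope comparison using the figures quoted in the article.
# Assumption: all computers are comparable and fully utilized for the whole
# run, so total compute (machine-hours) = number of machines x wall-clock hours.

HOURS_PER_DAY = 24


def machine_hours(machines: int, wall_clock_hours: float) -> float:
    """Total compute consumed by a run, in machine-hours (hypothetical helper)."""
    return machines * wall_clock_hours


# ABySS run as described: 20 computers for seven days
abyss_wall = 7 * HOURS_PER_DAY                 # 168 hours of elapsed time
abyss_total = machine_hours(20, abyss_wall)    # 3,360 machine-hours

# CLC bio run as described: one computer for about seven hours
clc_wall = 7                                   # hours of elapsed time
clc_total = machine_hours(1, clc_wall)         # 7 machine-hours

wall_clock_speedup = abyss_wall / clc_wall     # how much sooner results arrive
compute_ratio = abyss_total / clc_total        # how much less hardware time is used

print(f"Wall-clock: {abyss_wall} h vs {clc_wall} h ({wall_clock_speedup:.0f}x faster)")
print(f"Compute: {abyss_total:.0f} vs {clc_total:.0f} machine-hours ({compute_ratio:.0f}x less)")
```

On those assumptions, results arrive roughly 24 times sooner while using roughly 480 times fewer machine-hours. Real-world ratios would depend on the hardware, data set and settings used, which the quoted comparison does not specify.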

The speed is “a tremendous advantage” for biofuel companies that are “in pipeline mode,” Görlitz says, “where you simply have a lot of data all the time, and you keep procuring data and you want to analyze it. Time is money.”

In December, CLC bio announced that Sapphire Energy had installed its bioinformatics solution. “At the core of our science operations and biology operations here, we institute a high throughput of biology capabilities, meaning we look at thousands and thousands of different traits simultaneously,” says Tim Zenk, Sapphire Energy’s vice president of corporate affairs. “Our objective in working with CLC bio was to bring the biotechnology informatics capability in-house so we can categorize and speed up the process of identifying traits and genes that we are looking for.”

According to Zenk, the solution implemented at Sapphire Energy required minimal customization and allowed employees to go to work immediately on a relatively standard platform they were already used to. Some adjustments were needed to accommodate the specific genes the company is looking for, but Zenk says implementation required very little work on Sapphire Energy’s side. “We implemented our internal intrascience intranet that enables our scientists to access it through the enterprise network,” he says. “CLC bio customized the software to meet our particular high-throughput biology needs, focused in on the algae genomes that we are looking for.”

BioGasol has also recently installed CLC bio’s bioinformatics solution. The cellulosic ethanol company is working to develop a thermophilic strain of bacteria. According to Torbjørn Jensen, a BioGasol research scientist, the company evaluated competing solutions but found CLC bio’s to be the best package for its needs. “When we do this bioinformatic work, we use a lot of different functions and all of these functions are integrated into the same program,” he says. “That was one of our main focuses.” Jensen says BioGasol began using the product last year and is very satisfied with it, describing it as an affordable, robust solution. “The team behind it has been really open and eager to help.”

Görlitz says his company is continually working to update and improve its bioinformatics solution, because new protocols and technologies are always being introduced in genome sequencing. CLC bio spends a great deal of time and energy ensuring that its platform can integrate new developments in the sequencing space. “One of the really new, promising things that scientists are excited about is our new feature, coming out later this year, that allows you to compare and visualize differences between different genomes and different genome databases,” he explains. “It may sound trivial, but it is extremely complicated when you take into account the quantity of data you have to crunch to do this. It’s nothing anyone else has demonstrated they can do, so this is something that is unique to our software.”

Author: Erin Voegele
Associate Editor, Biorefining Magazine
(701) 540-6986