(Corrects name of MMRF, adding "Research," in 19th paragraph)
By Sharon Begley
NEW YORK, June 5 - Amazon.com Inc is in a
race against Google Inc to store data on human DNA,
seeking both bragging rights in helping scientists make new
medical discoveries and market share in a business that may be
worth $1 billion a year by 2018.
Academic institutions and healthcare companies are picking
sides between their cloud computing offerings - Google Genomics
or Amazon Web Services - spurring the two to one-up each other
as they win high-profile genomics business, according to
interviews with researchers, industry consultants and
That growth is being propelled by, among other forces, the
push for personalized medicine, which aims to base treatments on
a patient's DNA profile. Making that a reality will require
enormous quantities of data to reveal how particular genetic
profiles respond to different treatments.
Already, universities and drug manufacturers are embarking
on projects to sequence the genomes of hundreds of thousands of people.
The human genome is the full complement of DNA, or genetic
material, a copy of which is found in nearly every cell of the body.
Clients view Google and Amazon as doing a better job storing
genomics data than they can do using their own computers,
keeping it secure, controlling costs and allowing it to be
The cloud companies are going beyond storage to offer
analytical functions that let scientists make sense of DNA data.
Microsoft Corp and International Business Machines Corp
are also competing for a slice of the market. The
"cloud" refers to data or software that physically resides on a
server and is accessible via the internet, which allows users to
access it without downloading it to their own computers.
Now an estimated $100 million to $300 million business
globally, the cloud genomics market is expected to grow to $1
billion by 2018, said research analyst Daniel Ives of investment
bank FBR Capital. By that time, the entire cloud market should
have $50 billion to $75 billion in annual revenue, up from about
$30 billion now.
"The cloud is the entire future of this field," Craig
Venter, who led a private effort to sequence the human genome in
the 1990s, said in an interview. His new company, San
Diego-based Human Longevity Inc, recently tried to import
genomic data from servers at the J. Craig Venter Institute in
The transmission was so slow, scientists had to resort to
sending disks and thumb drives by FedEx and human messengers, or
"sneakernet," he said. The company now uses Amazon Web Services.
So does a collaboration between Regeneron Pharmaceuticals
Inc and Pennsylvania-based Geisinger Health System to
sequence 250,000 genomes. Raw DNA data is uploaded to Amazon's
cloud, where software from privately held DNAnexus assembles the
millions of chunks into the full, 3-billion-letter-long genome.
DNAnexus's algorithms then determine where an individual
genome differs from the "reference" human genome, the company's
chief scientist Dr. David Shaywitz said, in hopes of identifying
new drug targets.
HOSTING FOR FREE
Showing how important Google and Amazon consider this business,
and how they hope to use existing customers to lure future ones,
each is hosting well-known genomics datasets for free.
Neither company discloses the amount of genomics data it
holds, but based on interviews with analysts and genomic
scientists, as well as the companies' own announcements of what
customers they've won, Amazon Web Services may be bigger.
Data from the "1000 Genomes Project," an international
public-private effort that identified genetic variations found
in at least 1 percent of humans, reside at both Amazon and
Google "without charge," said Kathy Cravedi of the U.S. National
Institutes of Health (NIH), one of the project's sponsors.
Other paying clients with a more specific focus are picking
Google, for instance, won a project from the Autism Speaks
foundation to collect and analyze the genomes of 10,000 affected
children and their parents for clues to the genetic basis of autism.
Another customer is Tute Genomics, whose database of 8.5
billion human DNA variants can be searched for how frequently
any given variant appears, what traits it's associated with and
how people with a certain variant respond to particular drugs.
Amazon is hosting the Multiple Myeloma Research Foundation's
project to collect complete-genome sequences and other data from
1,000 patients to identify new drug targets. It also won the
Alzheimer's Disease Sequencing Project, which has similar aims.
Amazon charges about $4 to $5 a month to store one full
human genome, and Google about $3 to $5 a month. The companies
also charge for data transfers or computing time, as when
scientists run analytical software on stored data.
Amazon's database-analysis tool, Redshift, costs 25 cents an
hour or $1,000 per terabyte per year, the company said. A
terabyte is 1 trillion bytes, or 1,000 gigabytes, about enough
to hold 300 hours of high-quality video.
Another part of the cloud services' pitch to would-be
customers is that their analytic tools can fish out genetic gold
- a drug target, say, or a DNA variant that strongly predicts
disease risk - from a sea of data. Any discoveries made through
such searches belong to the owners of the data.
"On the local university server it might take months to run
a computationally-intense" analysis, said Alzheimer's project
leader Dr. Gerard Schellenberg of the University of
Pennsylvania. "On Amazon, it's, 'how fast do you need it done?',
and they do it."
Another selling point is security. Universities are
"generally pretty porous," said Ryan Permeh, chief scientist at
cybersecurity company Cylance Inc, of Irvine, California, and
the security of federal government computers is "not at the top
of the class."
While academic and pharmaceutical research projects are the
biggest customers for genomics cloud services, they will be
overtaken by clinical applications in the next 10 years, said
Google Genomics director of engineering David Glazer.
Individual doctors will regularly access a cloud service to
understand how a patient's genetic profile affects the risk of
various diseases or the likely response to medication.
"We are at that transition point now," Glazer said.
Matt Wood, general manager for Data Science at Amazon Web
Services, sees cloud demand in genomics now as "a perfect
storm," as the amount of data being created, the need for
collaboration and the move of genomics into clinical care
Experts on DNA and data say that without access to the cloud,
modern genomics would grind to a halt.
Bioinformatics expert Dr. Atul Butte of the University of
California, San Francisco, said that now, when researchers at
different universities are jointly working on NIH and other
genomic data, they don't have to figure out how to make their
computers talk to each other. In March, NIH cleared the way for
major research on the cloud when it began allowing scientists to
upload important genomic data.
"My response was, it's about time," Butte said.
(Reporting by Sharon Begley and Caroline Humer; Editing by
Michele Gershberg and John Pickering)