By Elizabeth Dwoskin
Even in hype-filled Silicon Valley, few buzz phrases are
freighted with higher expectations than big data. Salespeople are
knocking on the doors of Fortune 500 companies, promising to help
them analyze a mounting flood of information from websites,
smartphones, social networks and an increasing array of
sensor-laden devices.
A brick-and-mortar retailer, for instance, might discover that a
returning customer, based on her purchase history, social-media
feed and location, is pregnant and ping her smartphone with a
discount on diapers the moment she enters the store.
Underpinning the big-data craze is Hadoop, a software suite
named for a toy elephant belonging to the son of a Yahoo programmer
who helped develop the software in the mid-2000s. While traditional
databases like those offered by Oracle Corp. store predefined
information in rows and columns on individual servers, Hadoop can
spread uncategorized data across a network of thousands of cheap
computers, making it a less costly, more scalable way to catalog
multiplying streams of input.
The software, distributed under an open-source license, is free
to use, share and modify, and many vendors, from database stalwarts
like Microsoft Corp. to analytics services like Splunk Inc., have
embraced it to push big data beyond its Silicon Valley
stronghold.
The market for big-data tools may be valued at $41.5 billion by
2018, International Data Corp. researchers say. Investors have
poured over $2 billion into businesses built on Hadoop, including
Hortonworks Inc., which went public last week, its rivals Cloudera
and MapR Technologies, and a growing list of tiny startups.
Yet companies that have tried to use Hadoop have met with
frustration. Bank of New York Mellon used it to locate glitches in
a trading system. It worked well enough on a small scale, but it
slowed to a crawl when many employees tried to access it at once,
and few of the company's 13,000 information-technology workers had
the expertise to troubleshoot it. David Gleason, the bank's chief
data officer at the time, said that while he was a proponent of
Hadoop, "it wasn't ready for prime time."
"The dirty secret is that a significant majority of big-data
projects aren't producing any valuable, actionable results," said
Michael Walker, a partner at Rose Business Technologies, which
helps enterprises build big-data systems. According to a recent
report from the research firm Gartner Inc., "through 2017, 60% of
big-data projects will fail to go beyond piloting and
experimentation."
It turns out that faith in Hadoop has outpaced the technology's
ability to bring big data into the mainstream. Demand for Hadoop is
on the rise, yet customers have found that a technology built to
index the Web may not be sufficient for mainstream big-data tasks,
said Nick Heudecker, research director for information management
at Gartner.
It can take a lot of work to combine data stored in legacy
repositories with the data stored in Hadoop. And while Hadoop can
be much faster than traditional databases for some purposes, it
often isn't fast enough to respond to queries immediately or to
work on incoming information in real time. Satisfying requirements
for data security and governance also poses a challenge.
"Venture capitalists were sold on this idea that Hadoop was
going to supplant traditional database technology in the
enterprise," Mr. Heudecker said. "But enterprises didn't just jump
on the bandwagon."
Even as Hortonworks's IPO boosts the technology's profile, a new
generation of tools is emerging to fill the gaps.
Hortonworks has suffered not only from immature technology but
also from a firm commitment to base its business on free software.
The company's revenue comes mainly from providing tech support to
companies experimenting with Hadoop. In a fast-growing market that
requires specialized expertise, Hortonworks is positioning itself
to offer highly qualified assistance at a competitive price.
In November, Hortonworks reported its revenue for the first nine
months of 2014 was $33.4 million--far short of the $100 million
that Chief Executive Rob Bearden had said in March he expected for
the year. It racked up an $87 million loss in the period, nearly
double its loss in the previous quarter and a number that "set the
new high-water mark for the scale of operating losses public
investors are willing to tolerate," said Amplify Partners founder
Sunil Dhaliwal.
Hortonworks priced its first batch of public stock 34% below
what investors had paid in a private funding round in March. The
move underscored some observers' doubts about the prospects for a
company based solely on Hadoop. But investors in last Friday's IPO
pushed Hortonworks's capitalization to $1.1 billion, excluding
stock awarded to employees.
"It's hard to sell free stuff," said John Schroeder, chief
executive of rival MapR. Although many startups have sprung up to
commercialize open-source software, only one public company in that
line is widely regarded as successful: Red Hat, which distributes
and supports the open-source Linux operating system. And Red Hat
doesn't look that successful compared with leading companies, from
Amazon to VMware, that augment open-source software with
proprietary code, notes Peter Levine, a general partner at
Andreessen Horowitz.
In an interview Friday, Hortonworks's Mr. Bearden said the
company's IPO was "certainly validating that open source is an
incredibly viable business model."
Hortonworks rivals MapR and Cloudera offer proprietary
accessories to Hadoop intended to make it more valuable to large
companies. Cloudera, which pioneered the Hadoop market in 2008, has
raised more than $1 billion at a valuation of about $4.1 billion.
MapR, founded the following year, has raised $174 million. Both Mr.
Schroeder and Cloudera CFO Jim Frankola acknowledged challenges in
bringing Hadoop to corporate America. "We've learned what Hadoop is
good at and what Hadoop is not good at," Mr. Frankola said.
Meanwhile, enterprises are eager to forge into areas where
Hadoop falls short, especially tasks that require processing
incoming data in real time, such as using smartphone location data
to offer just-in-time deals when a prospective customer enters a
store.
For corporate big-data projects, Hadoop may be only one arrow in
an expanding quiver. A host of new startups is emerging to address
the technology's weaknesses. Databricks, with $47 million in
venture funding, commercializes Spark, which is open-source
software that's more adept than Hadoop at handling real-time data.
Altiscale, with $42 million, offers Hadoop as a service delivered
in the cloud. Splice Machine, which has raised $22 million, makes a
tool that queries Hadoop as though it were a traditional database.
Other tools, including the recent Google spinoff Metanautix, aim to
supplant Hadoop entirely.
The Hadoop vendors are responding with improvements and
additions. Hortonworks spearheaded an update that lets other
applications run on top of Hadoop. Cloudera and MapR have extended
the software with proprietary, enterprise-grade features like
automatic backup, and MapR is building solutions tailored to
specific industries, including financial services, health care and
telecommunications. All three will contend with an increasingly
chaotic, rapidly evolving marketplace.
"Right now, there's a whole alphabet soup of technologies out
there, which in many ways makes the market more confusing," says
T.M. Ravi, founder of The Hive, an incubator for big-data
companies. "In the end, there may be room for one stand-alone
company--if that."
Deborah Gage and Shira Ovide contributed to this article.
Write to Elizabeth Dwoskin at elizabeth.dwoskin@wsj.com