Can Big Data reduce fraud?

Too much data can pose security concerns and can become overwhelming to manage. In its latest Data Breach Investigations Report, Verizon finds that most organizations are overwhelmed by the volume of data they hold.

Chris Novak, who works in Verizon’s investigative response unit, says most organizations struggle to collect the right data and properly store it. “They don’t necessarily know where they have data … and how it’s being handled,” he says.

Like all organizations, financial institutions struggle with data. But many banks outsource data management to help ensure the data they collect is, in theory, protected and properly managed.

Can Data Reduce Fraud?

Here’s the question: Could institutions take advantage of their data to support fraud prevention? Experts at credit reporting bureau Experian say yes.

Experian is pushing ID theft management in a new way: to help banks prevent and detect fraud. Keir Breitenfeld, director of product management within Experian’s decision analytics team, says banking institutions are doing better jobs of capturing data.

“Institutions are saying, ‘We have to have a more enterprise-level approach,'” he says. “They know they need to warehouse data, so they can bring channels together, from a cost perspective and customer experience perspective.”

But the residual effect is that banks have a lot more data at their fingertips to track accountholders, rather than just accounts, for fraud.

The ability to capture data and warehouse it has improved so much that credit bureaus now have the ability to provide customized scores for individual accountholders.

So, the more data banks can leverage about new accountholders, in particular, the better their chances are of detecting fraud.

If banks routinely compare the data they collect about customers with information credit bureaus store, they could improve their fraud detection rates on new accounts by 20 percent or more, Breitenfeld says.

“If you can monitor accounts after they are opened, you can better detect fraud,” he adds.
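A minimal sketch of the comparison Breitenfeld describes: match the data a bank captured at account opening against the bureau’s record, and flag applications where too many fields disagree. The field names, sample records, and mismatch threshold below are illustrative assumptions, not Experian’s actual method.

```python
# Hypothetical sketch: flag a new-account application when the data the
# bank collected disagrees with what the credit bureau has on file.

def mismatches(applicant: dict, bureau: dict,
               fields=("name", "dob", "address", "ssn_last4")) -> int:
    """Count fields where the applicant's data contradicts the bureau record."""
    return sum(
        1
        for f in fields
        if f in applicant and f in bureau
        and applicant[f].strip().lower() != bureau[f].strip().lower()
    )

def flag_for_review(applicant: dict, bureau: dict, threshold: int = 2) -> bool:
    """Route the application to manual fraud review if too many fields disagree."""
    return mismatches(applicant, bureau) >= threshold

applicant = {"name": "Jane Doe", "dob": "1980-04-02",
             "address": "12 Elm St", "ssn_last4": "1234"}
bureau    = {"name": "Jane Doe", "dob": "1975-11-30",
             "address": "99 Oak Ave", "ssn_last4": "1234"}

print(flag_for_review(applicant, bureau))  # two fields disagree -> True
```

In practice the threshold and field weights would be tuned per institution; the point is simply that the comparison only works if both sides have captured and warehoused the data in the first place.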

How to reduce the cost of data?

“How can you control the cost of data growth?”

Is your company an “Information Hoarder”?  If so, answer the five questions below and you’ll be able to reduce the cost of your data.

1 – Do you need the data that you have?

The first step is to determine whether you need all of the data in your production systems. You might be surprised by the answer: often, up to 85% of the data in production systems is old and unused. Information lifecycle management software will manage the lifecycle (i.e., the expiry date) of your information, as well as archive it to lower-cost storage alternatives. The cost savings are immediate and significant.

2 – Do you know the expiry date for your data?

All data has an expiry date.  Or at least it should.  Most organizations do not determine the expiry date for their data.  The result?  They keep data indefinitely.  There’s a simple rule on the TV show Hoarders – if you haven’t worn an item of clothing in the past year, throw it away.  The same rule applies to data.  If you haven’t accessed it recently, consider deleting or archiving it.

3 – What is the cost of managing your data?

Every database has a cost.  Do you know the cost of yours?  There are several published performance benchmarks for relational database software.  The performance of the database is directly correlated to cost.  Migrating to another database may be easier than you think, as some vendors have invested in portability and migration capabilities.

4 – The quality of data carries a cost

The poorer the quality, the higher the cost. Seems like an obvious relationship, doesn’t it? But it is not always obvious in practice. Cleaning address data reduces downstream costs such as returned mail, and standardized addresses can qualify for postage discounts. Aside from the direct savings, consider the indirect cost of inefficiency: how much time do your employees spend investigating and correcting data errors?
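As a rough illustration of address cleansing, the toy normalizer below trims whitespace, uppercases, and standardizes a few street-type abbreviations so that variant spellings of the same address compare equal. The abbreviation table is a small illustrative sample; production cleansing relies on postal reference data.

```python
# Illustrative address cleanup: simplified rules, not a postal-grade cleanser.
import re

ABBREVIATIONS = {"street": "ST", "st.": "ST",
                 "avenue": "AVE", "ave.": "AVE",
                 "road": "RD", "rd.": "RD"}

def clean_address(raw: str) -> str:
    """Trim, collapse whitespace, uppercase, and standardize street types."""
    tokens = re.sub(r"\s+", " ", raw.strip()).split(" ")
    return " ".join(ABBREVIATIONS.get(t.lower(), t.upper()) for t in tokens)

print(clean_address("  12  Elm   Street "))  # -> "12 ELM ST"
print(clean_address("12 elm st."))           # -> "12 ELM ST"
```

Once variants normalize to the same string, duplicates and bad records surface immediately, which is where the returned-mail savings come from.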

5 – Unifying fragmented data reduces cost

In the typical organization, many data entities are fragmented across dozens of systems.  Customers, products, locations, suppliers, to name just a few.  The fragmentation drives a cost that is not easy to detect.  It’s the cost of your employees manually searching for data in multiple systems, the cost of duplicating data entry in multiple systems, and the cost of the inevitable errors that result from fragmentation and duplication of effort.  Unifying data into one master data system often has far reaching cost savings implications.
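The unification idea can be sketched as a merge keyed on a normalized identifier. This toy example assumes email is the match key and fills each golden-record field from the first system that has a value; real master data management adds survivorship rules, fuzzy matching, and stewardship workflows.

```python
# Toy sketch of consolidating fragmented customer records from several
# systems into one "golden record" per customer, keyed on normalized email.
from collections import defaultdict

def unify(records: list[dict]) -> dict[str, dict]:
    """Merge records sharing an email; later non-empty fields fill gaps."""
    golden: dict[str, dict] = defaultdict(dict)
    for rec in records:
        key = rec["email"].strip().lower()
        for field, value in rec.items():
            if value and not golden[key].get(field):
                golden[key][field] = value
    return dict(golden)

crm     = {"email": "Jane@Example.com", "name": "Jane Doe", "phone": ""}
billing = {"email": "jane@example.com", "name": "", "phone": "555-0100"}
print(unify([crm, billing]))
# one customer record, with the name from CRM and the phone from billing
```

Even this naive merge shows where the savings come from: one lookup instead of two systems searched, one data-entry path instead of duplicated effort.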

There are many ways to reduce the cost of your data. If you are looking for low-hanging fruit and a fast ROI, take a look at the data you have today and determine whether you need all of it. Likely the answer will be “no,” and information lifecycle management software will help you realize an immediate cost savings.



Great article from David Corrigan, posted on his blog:

White House is Spending Big Money on Big Data

It’s typical in an election year to see an administration spend money on new initiatives. A new initiative unveiled back in March has gone largely unnoticed by the mainstream technology media. Called the “Big Data Research and Development Initiative,” the program is focused on improving the U.S. federal government’s ability to extract knowledge and insights from large and complex collections of digital data, and it promises to help solve some of the nation’s most pressing challenges.


The program includes several federal agencies, including NSF, HHS/NIH, DOE, DOD, DARPA and USGS, which pledged more than $200 million in new commitments that they promise will greatly improve the tools and techniques needed to access, organize, and glean discoveries from huge volumes of digital data.

In a statement, Dr. John P. Holdren, Assistant to the President and Director of the White House Office of Science and Technology Policy, said: “In the same way that past Federal investments in information-technology R&D led to dramatic advances in supercomputing and the creation of the Internet, the initiative we are launching today promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education, and national security.”



One of the more interesting aspects of this project is the use of public cloud infrastructure, as in cloud computing services provided by private industry. Confusing, I know. A great example of this plan in action is the National Institutes of Health, which announced that the world’s largest set of data on human genetic variation – produced by the international 1000 Genomes Project – is now freely available on the Amazon Web Services (AWS) cloud. At 200 terabytes – the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs – the current 1000 Genomes Project data set is a prime example of big data, where data sets become so massive that few researchers have the computing power to make best use of them. AWS is storing the 1000 Genomes Project as a publicly available data set for free, and researchers will pay only for the computing services that they use.

According to a recent article, this is part of a larger strategy to reduce the number of federal data centers from the current 3,133 by “at least 1,200” by 2015 – a roughly 40% cutback representing some $5 billion in savings. This also extends the work started with the administration’s Cloud First policy, outlined last year as part of the White House’s Federal Cloud Computing Strategy.

In a world that is more dependent on data than ever before, the stakes are high and so is the money. It will be interesting to follow this initiative over the coming months.




Forbes: VMware Can Disrupt Big Data Analytics With Cetas Acquisition

Trefis Team, Contributor

VMware, the leading virtualization company, has acquired Cetas, an early-stage startup focused on making access to big data analytics easier and cheaper. Terms of the deal haven’t been disclosed yet. [1] VMware competes with Microsoft, Oracle and Citrix in the virtualization space.

Pure Play Cloud Platform

Unlike most big data applications, Cetas’ software is designed to run on virtual resources like Amazon Web Services and VMware’s vSphere. With this software application, there is no need to sell physical servers along with the software, making this easier to scale and cheaper to use. This acquisition makes sense for VMware as its applications are deployed on vSphere.

This makes the business model an Analytics-as-a-Service model, which opens it up to small and medium businesses. By deploying on the cloud, the application will not only be cheaper but also potentially much faster and more easily scalable, as its primary resources are virtual.

Cetas will continue to operate as a startup under the VMware umbrella and plans to integrate the software more tightly into the VMware product suite.

More Than Just Big Data Analytics

The leading providers of big data analytics are Hewlett-Packard and International Business Machines, but VMware’s foray into this space might not be to compete with the leading players. The strategic target of these acquisitions is most likely Amazon and OpenStack. The Platform-as-a-Service offering can be greatly enhanced if it comes with a built-in analytics tool such as the one provided by Cetas.

By integrating this into vSphere, the bundled offering becomes cheaper to run than, say, deploying an application on Amazon Web Services and buying an analytics engine on top of that. Because cross-vendor integration can be an issue as well, the combined offering will seem more attractive to most buyers. [2] We expect sales of the platforms to improve because of this acquisition, though it is unlikely that analytics will be a major source of revenue in the short term.

We have a $109 Trefis price estimate for VMware, which is slightly above the current market price.

Are you ready for the era of ‘big data’?

In the article below, recently published by McKinsey, it is reported that in the US, across most business sectors, companies with more than 1,000 employees store, on average, over 235 terabytes of data – more data than is contained in the entire US Library of Congress.


From a recent McKinsey article, “Are you ready for the era of ‘big data’?”:

Radical customization, constant experimentation, and novel business models will be new hallmarks of competition as companies capture and analyze huge volumes of data. Here’s what you should know.

The top marketing executive at a sizable US retailer recently found herself perplexed by the sales reports she was getting. A major competitor was steadily gaining market share across a range of profitable segments. Despite a counterpunch that combined online promotions with merchandizing improvements, her company kept losing ground.

When the executive convened a group of senior leaders to dig into the competitor’s practices, they found that the challenge ran deeper than they had imagined. The competitor had made massive investments in its ability to collect, integrate, and analyze data from each store and every sales unit and had used this ability to run myriad real-world experiments. At the same time, it had linked this information to suppliers’ databases, making it possible to adjust prices in real time, to reorder hot-selling items automatically, and to shift items from store to store easily. By constantly testing, bundling, synthesizing, and making information instantly available across the organization—from the store floor to the CFO’s office—the rival company had become a different, far nimbler type of business.

Welcome to the era of ‘big data’

VMware buys big data startup Cetas

VMware has acquired Cetas, a Palo Alto, Calif.-based big data startup that provides analytics atop the Hadoop platform. Terms of the deal haven’t been disclosed yet, but Cetas is an 18-month-old company with tens of paying customers, including some in the Fortune 1000, that didn’t need to rush into an acquisition. So, why did VMware make such a compelling offer?

Because VMware is all about applications, and big data applications are the next big thing. Hypervisor virtualization is the foundation of everything VMware does, but it’s just that — the foundation. VMware can only become the de facto IT platform within enterprise data centers if applications can run atop those virtualized servers.

That’s why VMware bought SpringSource, GemStone and WaveMaker, then actual application providers Socialcast and SlideRocket. It’s why VMware developed vFabric and created the Cloud Foundry platform-as-a-service project and service to make it as easy as possible to develop and run applications.

Cetas deployed on-premise

Cetas is the logical next step, a big data application that’s designed to run on virtual resources – specifically Amazon Web Services and VMware’s vSphere. In fact, Co-Founder and CEO Muddu Sudhakar told me, its algorithms were designed with elasticity in mind. Jobs consume resources while they’re running and then the resources go away, whether the software is running internally or in the cloud. There’s no need to sell physical servers along with the software.

It doesn’t hurt, either, that Cetas can help VMware compete on bringing big data to bear on its own infrastructure software. As Splunk’s huge IPO  illustrated, there’s a real appetite for providing analytics around operational data, coming from both virtual machines and their physical hosts. In this regard, Cetas will be like the data layer that sits atop virtual servers, application platforms and the applications themselves, providing analytics on everything.

Sudhakar said this type of operational analysis is one of Cetas’ sweet spots, along with online analytics a la Google and Facebook, and enterprise analytics. The product includes many algorithms and analytics tools designed for those specific use cases out of the box (it even surfaces some insights automatically), but it also allows skilled users to build custom jobs.

Going forward, Sudhakar said Cetas will continue to operate as a startup under the VMware umbrella — which means little will change for its customers or business model — while also working to integrate the software more tightly with the VMware family.

Big-data investors look for the next Splunk

SAN FRANCISCO (Reuters) – Splunk Inc’s impressive debut on Nasdaq Thursday, where it doubled its $17 initial public offering price, has investors suddenly paying attention to a sector that has grown in relative obscurity: big data.

Essentially a catch-all term, big data refers to the ability to collect and analyze massive amounts of information on almost every dimension of the human experience.

Splunk allows companies to analyze data cheaply and simply, compared to the expensive data warehouses and specialized, hard-to-deploy technology they might have needed in the past.

From general business aspects to narrow sectors ranging from retail to healthcare to climate, the possibility of capitalizing off this data has the investment community excited.

Investors “know a heck of a lot more about big data now than they did two weeks ago, and they’ll know a heck of a lot more in a month than they do now,” said John Connors, a former Microsoft chief financial officer who now works as a venture capitalist at Ignition Partners, where he headed up the firm’s Splunk investment.

Some backers of big-data companies didn’t realize they had such a hot ticket until well after they made their initial investment.

Dave Hornik, an early backer of Splunk and who also backs traffic-information company Inrix, said he did not think of Inrix as a big data investment at first.

“But it turns out it is absolutely about big data,” he said.

Most of the pure-play big data companies are still a couple of years away from IPOs, said Asheem Chandna at Greylock Partners, which invests in big data companies Cloudera and Sumo Logic. But he predicts stepped-up mergers and acquisitions of smaller big data companies in the short term, as well as blockbuster IPOs in the future.

“We’re at the start of a decade-long run around new opportunities in big data,” he said, singling out sub-areas such as analytics, business intelligence and automated pattern detection.

Greylock Partners fields around two dozen pitches a month from big data start-ups, more than double the rate of a year ago, Chandna noted.

“Splunk’s IPO is going to have a huge impact,” said Ted Tobiason, who runs technology equity capital markets at Deutsche Bank. “You can’t ignore the valuation.”

Splunk shares opened at $32, up 256 percent from the mid-point of the company’s original IPO filing range, valuing the business at about eight times forecast 2013 revenue. That is a lot higher than Jive Software, a recent hot software IPO that priced its offering at 4.7 times 2013 revenue, Tobiason noted.

While there are few pure-play big data companies nearing IPOs soon, Splunk’s stock-market splash may encourage venture capital firms to invest more when big data firms look for new rounds of financing, Tobiason said.

VC firm Accel Partners launched a big data fund headed by Ping Li recently, but investors in other VC firms are likely asking how they plan to get more involved in the sector, he added.

It may also help big data start-ups to persuade more talented software engineers to jump ship from established tech companies, Tobiason said.

“This is the expansion of a huge market and you’re going to have a lot more winners like Splunk,” said Rob Ward of venture capital firm Meritech Capital, which invests in Cloudera and another big data company called Tableau Software.

Tableau is growing quickly and is likely considering an IPO some time next year, Ward added.

Curt Monash, an independent tech industry analyst, said he is busy meeting several big data companies, including Cloudera, Couchbase and Hortonworks.

“Cloudera probably has the second-best traction after Splunk,” Monash said. “They have doubled in every metric you can measure them by. They have about 220 employees and a number of subscription customers. Subscription is the majority of their revenue now, which they are happy about.”

Couchbase is part of a group of companies including 10gen that offer a hot type of database technology that can handle massive amounts of variable data, he added.

Other companies doing well in the big data space include MetaMarkets and Infobright, according to Monash.

Here are a handful of promising big-data businesses highlighted by venture capitalists, bankers and tech industry experts:

Cloudera: Helps other companies, including Nokia, Qualcomm and Groupon, store and crunch big data using Hadoop, a popular type of open-source software.

Cloudera is run by Michael Olson, former CEO of database company Sleepycat Software, which was acquired by Oracle in 2006. Cloudera’s Chief Scientist is Jeff Hammerbacher, who built Facebook’s data team.

While most people complain about the avalanche of data spewing from social networks and other sources, Hammerbacher thinks there is not enough data in the world.

Cloudera raised $40 million in Series D funding in late 2011 led by Frank Artale of Ignition Partners and existing investors Accel, Greylock, Meritech Capital Partners and In-Q-Tel, known as the investment arm of the CIA.

Hortonworks: Formed from the team that helped develop Hadoop as an open-source project inside Yahoo several years ago. The company is trying to get Hadoop used by as many people as possible, and it boldly predicts the technology will process half of the world’s data within the next five years.

Hortonworks has support, training and partner programs to help other companies learn how to use Hadoop. CEO Rob Bearden was COO of SpringSource and JBoss, two successful open source companies. The CTO and co-founder is Eric Baldeschwieler, who led the evolution of Hadoop at Yahoo.

Hortonworks has been coy about its funding. However, Benchmark Capital general partner Peter Fenton is an investor and sits on the company’s board of directors.

Sumo Logic: Think of this company as a next-generation Splunk, building in cloud-based capabilities right from the start. Currently, Splunk is premises-based, meaning it uses on-site computer servers rather than remote, cloud-based servers.

The company raised $15 million earlier this year from investors including Greylock and Sutter Hill Ventures. Chandna at Greylock Partners says on top of its existing business, Sumo Logic’s cloud-based technologies mean it could one day sell anonymized industry insights gleaned from across the spectrum of companies it does business with.

Inrix: This company represents one of many industry-specific big data plays venture capitalists are making. It uses several hundred thousand receivers based around the roads to gather millions of pieces of data every hour on factors such as how fast cars are moving, whether they are slowing down or speeding up, whether windshield wipers are on, and so on.

Together, the data gives a complete picture of traffic patterns now and likely ones in the future. The data can be sold to entities ranging from GPS makers to municipalities that need to plan roads. Its latest funding round was last year, when it raised $37 million led by Kleiner Perkins Caufield & Byers and August Capital.

10gen: The company develops MongoDB, an open-source database that can handle lots of different types of data, partly because it does not require information to be stored in traditional tables and rows.
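To illustrate the schema flexibility described above, the snippet below mimics MongoDB-style documents with plain Python dicts: two “documents” in the same collection carry different fields, and a query simply checks for a field’s presence. In MongoDB itself the same heterogeneous shapes could be inserted into one collection unchanged; this stand-in avoids needing a running database server.

```python
# Unlike rows in a relational table, documents in one collection need not
# share a fixed set of columns. These dicts mimic MongoDB documents.
products = [
    {"_id": 1, "name": "laptop", "specs": {"ram_gb": 16, "cpu": "i7"}},
    {"_id": 2, "name": "novel", "author": "B. Author", "pages": 320},
]

# Query across heterogeneous documents without a fixed schema:
with_pages = [doc["name"] for doc in products if "pages" in doc]
print(with_pages)  # -> ['novel']
```

The same presence-based query pattern is what lets a document store absorb new data types without a schema migration.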

10gen was founded by former DoubleClick Founder and CTO Dwight Merriman and former DoubleClick engineer and ShopWiki Founder and CTO Eliot Horowitz.

10gen raised $20 million in Series D funding last year from Sequoia Capital, Flybridge Capital and Union Square Ventures.

MetaMarkets: This company helps online media businesses analyze high volumes of streaming data in areas such as online advertising, gaming and social media.

MetaMarkets is run by Michael Driscoll, who led Dataspora, which delivers data science to telecom companies, insurers and retail banks.

MetaMarkets raised $6 million last year from investors including IA Ventures, Village Ventures and True Ventures.

(Reporting By Sarah McBride and Alistair Barr; Editing by Bernard Orr)

(c) Copyright Thomson Reuters 2012.

Splunk IPO Raises $229.5 million ($1.6 billion valuation)

From Businessweek:

Splunk Inc. (SPLK), a maker of software that helps companies analyze Web data, surged on its first day of trading after pricing its shares 70 percent above the originally proposed range in an initial public offering.

The stock, listed on the Nasdaq Stock Market under the symbol SPLK, rose 88 percent to $32 at 11:19 a.m. in New York. The San Francisco-based company raised $229.5 million in its IPO, selling 13.5 million shares at $17 apiece, it said in a statement yesterday. Through the sale, Splunk gained a market value of $1.57 billion.

Splunk is the first of the so-called big data companies to go public, providing software that helps businesses monitor and analyze data to improve service, cut operations costs and reduce security risks. Revenue in the fiscal year that ended Jan. 31 rose 83 percent to $121 million. Splunk’s net loss widened to $11 million from $3.8 million a year earlier as the company stepped up spending on sales and marketing.

Most technology companies that have gone public this year have gained since their offerings. Among the best performers are Guidewire Software Inc. (GWRE), which has more than doubled, and Millennial Media Inc., which has climbed 47 percent. Infoblox Inc., a network and data-services provider, and Proofpoint Inc., a maker of security software, are also scheduled to start trading this week.

Big Competitors

Splunk said in its filing that it has a variety of competitors, including some of the world’s biggest software and Web companies. The company named Google Inc. (GOOG) and Adobe Systems Inc. in the Web analytics market, and business intelligence vendors EMC Corp. (EMC) and International Business Machines Corp.

Splunk is backed by venture firms August Capital, JK&B Capital, Ignition Partners and Sevin Rosen Funds. The company raised $25 million in 2007 to expand sales and marketing, build its international operations and develop partnerships.

Founded in 2004 by Erik Swan, Rob Das and Michael Baum, Splunk counted more than 3,700 customers, including Bank of America Corp., Zynga Inc. and Harvard University, as of Jan. 31.

The company is led by Godfrey Sullivan, who joined as chief executive officer in 2008. He was previously CEO of Hyperion Solutions Corp. and helped sell that company to Oracle Corp. (ORCL) for $3.3 billion in 2007. Sullivan replaced co-founder Baum, who left Splunk and is now a venture partner at investment firm Rembrandt Venture Partners.

On April 16, Splunk increased the proposed price range for its IPO to $11 to $13 apiece, from a prior target of $8 to $10.

Jaspersoft Closes Record Fiscal Year

MarketWatch recently highlighted Jaspersoft’s record fiscal year, driven by the Big Data market:

Significant sales growth, strategic partnerships, and global expansion are highlights of substantial momentum for the leading open source Business Intelligence (BI) company.


Jaspersoft added nearly 500 new customers, capitalizing on industry demand for business intelligence solutions around Big Data, embeddable BI, and data-driven applications. New customers in 2011 include CardSmith, Push to Test, the National Institute of Food and Agriculture, Revol Wireless, and QMetry.

During 2011, Jaspersoft advanced strategic partnerships with IBM, EMC and Red Hat to support its Big Data and Platform-as-a-Service (PaaS) initiatives. These partnerships, aligned with innovative new products, delivered easy and powerful reporting capabilities using any data source and a choice of deployment models including on-premises and public, private or hybrid clouds.

Key 2011 Jaspersoft milestones included:

Release of Jaspersoft 4, the first end-to-end BI platform for web applications;

Release of the first native reporting BI platform for any Big Data system;

Release of Jaspersoft 4.1, featuring a new GUI for relational, non-relational, and OLAP data sources;

Release of Jaspersoft Studio, an Eclipse-based version of the popular iReport development environment;

Release of Jaspersoft 4.2 with iPad support and the BI industry’s first iOS mobile SDK;

Release of Jaspersoft 4.5 with superior integrated Big Data Analytics and a host of related performance improvements.

Splunk files for IPO and may be valued at $1 billion

According to Bloomberg, Splunk Inc. hired Morgan Stanley to lead an initial public offering that may value the software maker at about $1 billion.

For the record, Godfrey Sullivan has served as Splunk’s CEO since 2008. Prior to Splunk, Sullivan was CEO of Hyperion Solutions Corp., which he helped sell to Oracle for $3.3 billion in 2007.

Splunk might quickly emerge as a first-choice acquisition target. Dell? Microsoft? IBM? Webtrends? Oracle? SAP or EMC? It seems it’s time for Big Data technologies to capitalize.

Splunk currently has over 3,300 customers, including Bank of America, Zynga, and Comcast.