In his PASS Summit 2011 keynote on 10/12/2011, Ted Kummert announced a partnership with Hortonworks to port Apache Hadoop to Windows Azure by the end of 2011 and to Windows Server in 2012. From the “Microsoft Expands Data Platform With SQL Server 2012, New Investments for Managing Any Data, Any Size, Anywhere” press release of the same date:
Microsoft is committed to helping customers manage any data, any size, anywhere with the SQL Server data platform, Windows Server and Windows Azure. Hortonworks has a rich history in leading the design and development of Apache Hadoop. Their experience and expertise in this space helps us accelerate our delivery of our Hadoop based distribution on Windows Server and Windows Azure while maintaining compatibility and interoperability with the broader ecosystem.
Ted posted “Microsoft Expands Data Platform to Help Customers Manage the ‘New Currency of the Cloud’” at 9:00 AM:
This morning, I gave a keynote at the PASS Summit 2011 here in Seattle, a gathering of about 4,000 IT professionals and developers worldwide. I talked about Microsoft’s roadmap for helping customers manage and analyze any data, of any size, anywhere — on premises, and in the private or public cloud.
Microsoft makes this possible through SQL Server 2012 and through new investments to help customers manage ‘big data’, including an Apache Hadoop-based distribution for Windows Server and Windows Azure and a strategic partnership with Hortonworks. Our announcements today highlight how we enable our customers to take advantage of the cloud to better manage the ‘currency’ of their data.
We often talk about the economics of the cloud, detailing how customers can achieve unmatched economies of scale by taking advantage of public or private cloud architectures. As an example, an enterprise with a small incubation project could theoretically take it to production overnight, thanks to the elasticity and scalability benefits of the cloud.
As we turn more and more to the cloud, data becomes its currency. The exchange of data is the heart of all cloud transactions, and, as in a real-world economy, more value is created whenever data is generated or consumed. But there are new business challenges that this currency creates: How do we deal with the scope and scale of the data we manage? How do we deal with the diversity of types and sources of data? How do we most efficiently process and gain insight from datasets ranging from megabytes to petabytes?
How do we bring the world’s data to bear on the tasks of the enterprise, as businesses ask themselves questions like: “What can data from social media sites tell me about the sentiment of my brands and products?” And, how do we enable all end-users to gain the critical business insights they need – no matter where they are and what device they are using? Customers need a data platform that fully embraces the cloud, the diversity and scale of data both inside and outside of their ‘firewall’ and gives all end-users a way to translate data into insights – wherever they are.
Microsoft has a rich, decades-long legacy in helping customers get more value from their data. Beginning with OLAP Services in SQL Server 7, and extending to SQL Server 2012 features that span beyond relational data, we have a solid foundation for customers to take advantage of today. The new addition of an Apache Hadoop-based distribution for Windows Azure and Windows Server is the next building block, seamlessly connecting all data sizes and types. Coupled with our new investments in mobile business intelligence, and the expansion of our data ecosystem, we are advancing data management in a whole new way. …
Ted introduced Hortonworks’ Eric Baldeschwieler, who reported that “Yahoo now has 40,000 computers running Apache Hadoop” and “Over 80 percent of new data being generated is from unstructured sources,” and predicted that “Hadoop could be storing half the world’s data within five years.”
Kummert said a Community Technology Preview (CTP) of the Hadoop-based service for Windows Azure will be available by the end of 2011, and a CTP of the Hadoop-based service for Windows Server will follow in 2012.
Denny Lee demonstrated a HiveQL query against log data in a Hadoop for Windows cluster, using a HiveODBC driver that Ted Kummert said will be available as a CTP next month (November 2011).
Denny’s “Revelations – rolling the hard six to SQL BI and Hadoop” post of 10/12/2011 provides more information about Apache Hadoop on Windows Azure and Windows Server and its integration with SQL Server:
Okay! With today’s Ted Kummert’s Day 1 Keynote of the SQL Server PASS Summit 2011, I had the honor of demonstrating how SQL BI and Hadoop rock together! As you can see from the Port 25 Microsoft, Hadoop, and Big Data and the Microsoft News Center for SQL Server 2012 posts there are a number of cool things that are happening:
- It started with the Hadoop connectors for SQL Server and PDW. Key call out here is that these connectors are bi-directional to allow data movement back and forth between SQL Server and Hadoop.
- Windows Server and Windows Azure optimized Hadoop distributions; out of the box (or cloud), the distributions include support for HDFS, Hive, Pig-Latin, FTP, etc.
- Our partnership with Hortonworks to help us push forward faster with optimizing Hadoop to run on Windows as noted in their post Bringing Apache Hadoop to Windows.
- As part of the demo today, I showed the integration of the SQL BI stack with Hadoop by having PowerPivot (for Excel and SharePoint) interact with Hadoop for Windows cluster via Hive and the soon to be released HiveODBC driver.
- Not shown today, but just as cool will be the release of the Excel Hive Add-in.
More information will be posted at www.microsoft.com/bigdata as it becomes available, eh?!
Cool, so why did I use “embrace Hadoop”?
A key call out during my conversation with Ted during the keynote is that our offering is 100% compatible with Apache Hadoop – if your code works on Apache Hadoop then it will work on ours and vice versa. But, it’s not just about the code, it’s also about this shift that we are embracing the open source community!
Our VB moment in Big Data
So why is Big Data / Hadoop important for a BI dude or dudette?
I’ll probably have a number of posts for this question alone, but let me give you one answer right now – this is an excerpt from my post “Hadoop: A movement, not just a technology”:
Why am I excited about Hadoop and Big Data even though I’ve been a Microsoft BI person for most of my career? Because first and foremost, BI is all about making sense of the information. And the greatness of Big Data isn’t just about exploring, understanding, and asking even more questions of this information, but doing it in distribution (vs. silos) and putting more emphasis on the data (i.e. this is where the real IP is).
Any other cool information on Big Data at SQLPASS this week?
Both Ted Kummert’s and David DeWitt’s keynotes will cover Big Data. If you cannot attend, check out the SQL Server PASS Summit 2011 Live Streaming. As well, there are two breakout sessions on Big Data, both on Thursday:
- AD-216-M: Overview of Big Data on Windows and Windows Azure by Saptak Sen
- BIA-408-A: SQLCAT: Tier-1 BI in the world of Big Data by Thomas Kejser and myself – with special guest Kenneth Lieu from Yahoo!
Also don’t forget that I will be hosting the Big Data table at the Birds of a Feather luncheon and a bunch of us will be floating around the Big Data Kiosk in the product pavilion.
Whew! I think that’s it for today!
For more details about Big Data in the cloud, see my “Choosing a cloud data store for big data” (June 2011) and “Microsoft’s, Google’s big data [analytics] plans give IT an edge” (August 2011) articles for SearchCloudComputing.com, which include links to resources.