Windows Azure Blog
Microsoft Cloud Computing Platform

Introducing Apache Hadoop Services for Windows Azure

Posted On Monday, January 30, 2012 By rss. Under OakLeaf Systems    

The SQL Server Team (@SQLServer) announced Apache Hadoop Services for Windows Azure, a.k.a. Apache Hadoop on Windows Azure and Hadoop on Azure, at the Professional Association for SQL Server (PASS) Summit in October 2011.

Val Fontama’s Availability of Community Technology Preview (CTP) of Hadoop based Service on Windows Azure post of 12/14/2011 described how to obtain an invitation to the CTP:

In October at the PASS Summit 2011, Microsoft announced expanded investments in “Big Data”, including a new Apache Hadoop™ based distribution for Windows Server and service for Windows Azure. In doing so, we extended Microsoft’s leadership in BI and Data Warehousing, enabling our customers to glean and manage insights for any data, any size, anywhere. We delivered on our promise this past Monday, when we announced the release of the Community Technology Preview (CTP) of our Hadoop based service for Windows Azure.

Today this preview is available to an initial set of customers. Those interested in joining the preview may request to do so by filling out this survey. Microsoft will issue a code that will be used by the selected customers to access the Hadoop based Service. We look forward to making it available to the general public in early 2012. Customers will gain the following benefits from this preview:

  • Broader access to Hadoop through simplified deployment and programmability. Microsoft has simplified setup and deployment of Hadoop, making it possible to set up and configure Hadoop on Windows Azure in a few hours instead of days. Since the service is hosted on Windows Azure, customers only download a package that includes the Hive Add-in and Hive ODBC Driver. In addition, Microsoft has introduced new JavaScript libraries to make JavaScript a first-class programming language in Hadoop. Through this library, JavaScript programmers can easily write MapReduce programs in JavaScript and run these jobs from simple web browsers. These improvements reduce the barrier to entry by enabling customers to easily deploy and explore Hadoop on Windows.
  • Breakthrough insights through integration with Microsoft Excel and BI tools. This preview ships with a new Hive Add-in for Excel that enables users to interact with data in Hadoop from Excel. With the Hive Add-in, customers can issue Hive queries to pull and analyze unstructured data from Hadoop in the familiar Excel environment. Second, the preview includes a Hive ODBC Driver that integrates Hadoop with Microsoft BI tools. This driver enables customers to integrate and analyze unstructured data from Hadoop using award-winning Microsoft BI tools such as PowerPivot and PowerView. As a result, customers can gain insight on all their data, including unstructured data stored in Hadoop.
  • Elasticity, thanks to Windows Azure. This preview of the Hadoop based service runs on Windows Azure, offering an elastic and scalable platform for distributed storage and compute.

We look forward to your feedback! Learn more at www.microsoft.com/bigdata.
Val Fontama
Senior Product Manager
SQL Server Product Management
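
The announcement above mentions writing MapReduce programs. For readers new to the model, here is a minimal single-process Python sketch of the map, shuffle and reduce phases applied to a word count. It is an illustration only: it uses neither Hadoop nor the CTP's JavaScript library, and the function names (map_words, shuffle, reduce_counts, word_count) are hypothetical.

```python
from collections import defaultdict


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: collapse the values for one key into a total."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_words(line)]
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop on Azure", "Hadoop on Windows"]))
```

On a real cluster the map and reduce functions run on many nodes in parallel and the framework performs the shuffle; the sketch only shows how data flows between the phases.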


Following is a step-by-step tutorial for running the first process of the 10GB GraySort sample project:

1. After you receive your invitation code, navigate to https://www.hadooponazure.com/ and log in with your Windows Live ID and invitation code to open the Account page with the Request a New Cluster content active. Type a globally unique DNS Name for your cluster (hadoop1 for this example), select a Cluster Size (Large for this example), and type an administrative Username, Password and password confirmation:

image

Note: There is no charge for Windows Azure resources used during the CTP, so you don’t need to provide a credit card to create your cluster.

2. When your cluster is provisioned, the Account page’s content changes to include tiles to create a new job as well as access your cluster by different methods:

image

Note: You must renew your cluster every three days.

3. Click the Samples tile to open the Account/Samples page, which describes the currently available samples.

image

4. The GraySort MapReduce sample is a useful starting point because it runs in a reasonably short time (about 4 minutes with a Large cluster), so click the 10GB GraySort tile to open its Account/SampleDetails page, which describes the sample:

image

5. Click the Deploy to Your Cluster button to automatically populate text boxes with values for the TeraGen program, which generates 10 GB of data:

image
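
As a rough check on the size of this job: TeraGen writes fixed-size rows of 100 bytes, so 10 GB corresponds to about 100 million rows. The sketch below works through that arithmetic, plus the share handled by each of the 50 maps that the job output reports later in this walkthrough. It assumes decimal gigabytes (10^9 bytes) and an even split across maps; the function names are hypothetical and this is not the portal's code.

```python
ROW_BYTES = 100  # TeraGen row size: a 10-byte key plus 90 bytes of payload


def rows_for(total_bytes, row_bytes=ROW_BYTES):
    """Number of whole rows needed to produce total_bytes of data."""
    return total_bytes // row_bytes


def per_map(total_rows, map_tasks, row_bytes=ROW_BYTES):
    """Rows and bytes written by each map task, assuming an even split."""
    rows = total_rows // map_tasks
    return rows, rows * row_bytes


if __name__ == "__main__":
    total_rows = rows_for(10 * 10**9)   # 10 GB, decimal (assumption)
    print(total_rows)                   # 100000000
    print(per_map(total_rows, 50))      # (2000000, 200000000)
```

Under these assumptions each map writes 2 million rows, or about 200 MB.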

Note: If you have tried SQL Azure Labs’ Microsoft Codename “Cloud Numerics” CTP, you’ll notice that the process for creating the Hadoop cluster and executing the first MapReduce job is much more automated than that described in my Deploying “Cloud Numerics” Sample Applications to … post of 1/28/2012 (updated 1/30/2012).

6. Click the Execute Job button to run the TeraGen program, which initially displays this Job Info page:

image

7. After a few seconds, the program begins adding lines of Debug Output for the 50 maps in increments close to 1 percent:

image

Note: Hadoop automatically recovers from the failures reported above, but it’s surprising that lines for 78 and 79 percent are missing.

8. When processing completes, click the left-arrow button to return to the Account page with a tile for the TeraGen process added:

image

9. Click the Job History tile to display a summary of the preceding operation, which confirms successful completion with an Exit Code = Ok cell:

image

10. Click the left-arrow button to return to the main Account page and click the Manage Cluster tile to display total storage used (30 GB) and other data source options (Data Market, Windows Azure blob storage, and Amazon S3):

image
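
The 30 GB reported here is three times the 10 GB that TeraGen wrote. One plausible explanation, which the portal does not confirm, is HDFS block replication, whose default factor is 3. A minimal sketch of that arithmetic follows; the function names are hypothetical and the replication factor is an assumption about this cluster.

```python
def raw_storage_gb(logical_gb, replication_factor=3):
    """Raw disk space used when every block is stored replication_factor times."""
    return logical_gb * replication_factor


def implied_replication(raw_gb, logical_gb):
    """Replication factor implied by raw storage used versus data written."""
    return raw_gb / logical_gb


if __name__ == "__main__":
    print(raw_storage_gb(10))           # 30
    print(implied_replication(30, 10))  # 3.0
```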

Stay tuned for more details about the second and third MapReduce operations.
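
Until then, the How-To list below names the three options of the 10GB sort job: Teragen, TeraSort and TeraValidate. The following single-machine Python sketch mimics that generate, sort and validate sequence on a few in-memory rows. It is a toy illustration with hypothetical function names and does not reproduce the Hadoop programs themselves; the only detail taken from TeraGen is the 100-byte row with a 10-byte key.

```python
import os

KEY_BYTES = 10       # sort key at the start of each row
PAYLOAD_BYTES = 90   # remainder of the 100-byte row


def generate(n_rows):
    """Stand-in for the generate step: rows with random keys and zeroed payload."""
    return [os.urandom(KEY_BYTES) + bytes(PAYLOAD_BYTES) for _ in range(n_rows)]


def sort_rows(rows):
    """Stand-in for the sort step: order rows by their key prefix."""
    return sorted(rows, key=lambda row: row[:KEY_BYTES])


def validate(rows):
    """Stand-in for the validate step: check that keys never decrease."""
    return all(
        rows[i][:KEY_BYTES] <= rows[i + 1][:KEY_BYTES]
        for i in range(len(rows) - 1)
    )


if __name__ == "__main__":
    rows = generate(1000)
    print(validate(sort_rows(rows)))
```

On a cluster the sort is distributed across many map and reduce tasks, which is what makes the real job a useful benchmark.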


Apache Hadoop on Windows Azure Resources

Wesley McSwain posted an Apache Hadoop Based Services for Windows Azure How To Guide to the TechNet wiki on 12/13/2011. The latest update when this post was written was 1/18/2012. Here’s its content:

This content is a work in progress for the benefit of the Hadoop Community. Please feel free to contribute to this wiki page based on your expertise and experience with Hadoop.

If you have any questions, please use the groups DL http://tech.groups.yahoo.com/group/hadooponazurectp/

Table of Contents
  • How-Tos
  • FAQs
  • More Information
  • Blogs / Twitter to Follow
How-Tos
  1. Set up your Hadoop cluster
    • Provision a temporary Hadoop cluster on Microsoft’s Elastic Map Reduce Portal
    • Provision a new Hadoop cluster on your Windows Azure subscription.
    • Provision a new Hadoop Cluster on your on-premise hardware.
  2. Running Sample Jobs
    • How to run Sample Pi Estimator Job
    • Running Sample WordCount Hadoop Job with a few twists
    • Running 10GB Sort Hadoop Job with Teragen, TeraSort and TeraValidate Options
  3. Writing your own Job and running on Cluster
    • Writing your very own WordCount Hadoop Job in Java and deploying to Windows Azure Cluster
    • Running a JavaScript Map/Reduce Job
  4. Job Administration
    • Understanding MapReduce Job administration by running 10GB Sort Hadoop Job with TeraSort Option
  5. Interactive Console:
    • Tasks with the Interactive JavaScript Console
      • How to create and run a JavaScript Map Reduce Job
    • Tasks with Hive on the Interactive Console
      • How to run Hive Queries from the Interactive Console
  6. Remote Desktop
    • How to remote login to Hadoop Cluster
    • Using The Hadoop Command Shell
    • View the Job Tracker
    • View Hdfs
  7. Connecting Windows Azure Blob Storage from Hadoop Cluster
    • Configuring Hadoop Cluster to connect with Azure Storage
    • Running a Hadoop job using Azure Storage as Input and Output parameters
  8. Open Ports
    • How to connect Excel to Hadoop on Azure via HiveODBC
    • How to FTP data to Hadoop on Azure
    • How to SFTP data to Hadoop on Azure
  9. Manage Data
    • Import Data from Data Market
    • Set up ASV – Use your Windows Azure Blob Storage Account
    • Set up S3 – Use your Amazon S3 account
  10. Apache Hadoop on Windows Azure:
    • Tips and Tricks to manage your Hadoop Cluster
    • Running Apache Pig (Pig Latin) at Apache Hadoop on Windows Azure
  11. Scenarios
    • Querying a Web Log File via HiveQL
FAQs
  • Frequently asked questions with Hadoop on Windows Azure
More Information
  • See Apache Hadoop On Windows.
Blogs / Twitter to Follow

Below are some blogs to follow on Hadoop on Azure [links added]

  • Alexander Stojanovic (Founder and [General Manager] of Hadoop on Azure and Windows), @stojanovic, http://conceptualorigami.blogspot.com/
  • Dave Vronay, @davevr, http://dvronay.blogspot.com/2011/12/design-of-portal-for-hadooponazurecom.html
  • Brad Sarsfield, @bradoop
  • Denny Lee, @dennylee, http://dennyglee.com/
  • Avkash Chauhan, @avkashchauhan, http://blogs.msdn.com/b/avkashchauhan/

Technorati Tags: Windows Azure, Apache Hadoop on Windows Azure, Apache Hadoop, Hadoop, MapReduce, Cloud Computing, Big Data, Big Data Analytics, Windows Azure Marketplace DataMarket, DataMarket, Windows Azure Blobs, Amazon S3

http://oakleafblog.blogspot.com/2012/01/introducing-apache-hadoop-services-for.html
