Apache books

Apache Spark in 24 Hours, Sams Teach Yourself

Author: Jeffrey Aven
List price: $44.99
Amazon price: $21.93
Publisher: Sams Publishing (27 August 2016)

Apache Spark is a fast, scalable, and flexible open source distributed processing engine for big data systems and is one of the most active open source big data projects to date. In just 24 lessons of one hour or less, Sams Teach Yourself Apache Spark in 24 Hours helps you build practical Big Data solutions that leverage Spark’s amazing speed, scalability, simplicity, and versatility.
This book’s straightforward, step-by-step approach shows you how to deploy, program, optimize, manage, integrate, and extend Spark, now and for years to come. You’ll discover how to create powerful solutions encompassing cloud computing, real-time stream processing, machine learning, and more. Every lesson builds on what you’ve already learned, giving you a rock-solid foundation for real-world success.
Whether you are a data analyst, data engineer, data scientist, or data steward, learning Spark will help you to advance your career or embark on a new career in the booming area of Big Data.
Learn how to
• Discover what Apache Spark does and how it fits into the Big Data landscape
• Deploy and run Spark locally or in the cloud
• Interact with Spark from the shell
• Make the most of the Spark Cluster Architecture
• Develop Spark applications with Scala and functional Python
• Program with the Spark API, including transformations and actions
• Apply practical data engineering/analysis approaches designed for Spark
• Use Resilient Distributed Datasets (RDDs) for caching, persistence, and output
• Optimize Spark solution performance
• Use Spark with SQL (via Spark SQL) and with NoSQL (via Cassandra)
• Leverage cutting-edge functional programming techniques
• Extend Spark with streaming, R, and Sparkling Water
• Start building Spark-based machine learning and graph-processing applications
• Explore advanced messaging technologies, including Kafka
• Preview and prepare for Spark’s next generation of innovations

Instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Spark to solve a wide spectrum of Big Data problems.

Apache Spark 2 for Beginners

Author: Rajanarayanan Thottuvaikkatumana
List price: $39.99
Amazon price: $39.99
Publisher: Packt Publishing (6 October 2016)

Key Features

  • This book offers an easy introduction to the Spark framework, based on the latest version of Apache Spark 2
  • Perform efficient data processing, machine learning and graph processing using various Spark components
  • A practical guide aimed at beginners to get them up and running with Spark

Book Description

Spark is one of the most widely used large-scale data processing engines and runs extremely fast. It is a framework with tools that are equally useful for application developers and data scientists.

This book starts with the fundamentals of Spark 2 and covers the core data processing framework and API, installation, and application development setup. Then the Spark programming model is introduced through real-world examples followed by Spark SQL programming with DataFrames. An introduction to SparkR is covered next. Later, we cover the charting and plotting features of Python in conjunction with Spark data processing. After that, we take a look at Spark's stream processing, machine learning, and graph processing libraries. The last chapter combines all the skills you learned from the preceding chapters to develop a real-world Spark application.

By the end of this book, you will have all the knowledge you need to develop efficient large-scale applications using Apache Spark.

What you will learn
  • Get to know the fundamentals of Spark 2 and the Spark programming model using Scala and Python
  • Know how to use Spark SQL and DataFrames using Scala and Python
  • Get an introduction to Spark programming using R
  • Perform Spark data processing, charting, and plotting using Python
  • Get acquainted with Spark stream processing using Scala and Python
  • Be introduced to machine learning using Spark MLlib
  • Get started with graph processing using Spark GraphX
  • Bring together all that you've learned and develop a complete Spark application

About the Author

Rajanarayanan Thottuvaikkatumana, Raj, is a seasoned technologist with more than 23 years of software development experience at various multinational companies. He has lived and worked in India, Singapore, and the USA, and is presently based out of the UK. His experience includes architecting, designing, and developing software applications. He has worked on various technologies including major databases, application development platforms, web technologies, and big data technologies. Since 2000, he has been working mainly in Java related technologies, and does heavy-duty server-side programming in Java and Scala. He has worked on very highly concurrent, highly distributed, and high transaction volume systems. Currently he is building a next generation Hadoop YARN-based data processing platform and an application suite built with Spark using Scala.

Raj holds one master's degree in Mathematics, one master's degree in Computer Information Systems and has many certifications in ITIL and cloud computing to his credit. Raj is the author of Cassandra Design Patterns - Second Edition, published by Packt.

When not working on the assignments his day job demands, Raj is an avid listener to classical music and watches a lot of tennis.

Table of Contents
  1. Spark Fundamentals
  2. Spark Programming Model
  3. Spark SQL
  4. Spark Programming with R
  5. Spark Data Analysis with Python
  6. Spark Stream Processing
  7. Spark Machine Learning
  8. Spark Graph Processing
  9. Designing Spark Applications

Apache Drill: The SQL query engine for Hadoop and NoSQL

Author: Ted Dunning
List price: $29.99
Amazon price: $29.96
Publisher: O'Reilly Media (25 December 2016)

Apache Drill is a significant new tool in the Hadoop ecosystem that enables users to execute queries in a Hadoop cluster and get results quickly. This practical book provides a first touch introduction to Drill and its ability to handle large files containing data in flexible formats with nested data structures and tables.

Developers and analysts with moderate technical skills will learn the basics of installing and running Drill, and advanced users will understand how to incorporate the framework into complex programs, such as using Drill queries to replace some of the MapReduce operations in a large-scale program.

  • Gain a basic understanding of how Apache Drill works and what it helps you do
  • Get a complete language reference to Drill
  • Learn the use cases that make the most sense for Drill
  • Use a detailed technical reference for extending the framework

Pro Apache Phoenix: An SQL Driver for HBase

Author: Shakil Akhtar
List price: $29.99
Amazon price: $19.11
Publisher: Apress (30 December 2016)

Leverage Phoenix as an ANSI SQL engine built on top of the highly distributed and scalable NoSQL framework HBase. Learn the basics and best practices that are being adopted in Phoenix to enable a high write and read throughput in a big data space.

This book includes real-world cases such as Internet of Things devices that send continuous streams to Phoenix, and the book explains how key features such as joins, indexes, transactions, and functions help you understand the simple, flexible, and powerful API that Phoenix provides. Examples are provided using real-time data and data-driven businesses that show you how to collect, analyze, and act in seconds.

Pro Apache Phoenix covers the nuances of setting up a distributed HBase cluster with Phoenix libraries, running performance benchmarks, configuring parameters for production scenarios, and viewing the results. The book also shows how Phoenix plays well with other key frameworks in the Hadoop ecosystem such as Apache Spark, Pig, Flume, and Sqoop.

You will learn how to:

  • Handle a petabyte data store by applying familiar SQL techniques
  • Store, analyze, and manipulate data in a NoSQL Hadoop ecosystem with HBase
  • Apply best practices while working with a scalable data store on Hadoop and HBase
  • Integrate popular frameworks (Apache Spark, Pig, Flume) to simplify big data analysis
  • Demonstrate real-time use cases and big data modeling techniques

Who This Book Is For
Data engineers, Big Data administrators, and architects.

Apache Maven Cookbook

Author: Raghuram Bharathan
List price: $49.99
Amazon price: $42.21
Publisher: Packt Publishing (30 April 2015)

Over 90 hands-on recipes to successfully build and automate development life cycle tasks following Maven conventions and best practices

About This Book
  • Understand the features of Apache Maven that make it a powerful tool for build automation
  • Full of real-world scenarios covering multi-module builds and best practices to make the most out of Maven projects
  • A step-by-step tutorial guide full of pragmatic examples

Who This Book Is For

If you are a Java developer or a manager who has experience with Apache Maven and wants to extend your knowledge, then this is the ideal book for you.

Apache Maven Cookbook is for those who want to learn how Apache Maven can be used for build automation. It is also meant for those who are familiar with Apache Maven but want to understand the finer nuances of Maven and solve specific problems.

What You Will Learn
  • Install Apache Maven successfully on your preferred OS
  • Explore the various features of Apache Maven to build efficient automation tools
  • Discover when and how to use the various Apache Maven plugins
  • Generate and publish your project documentation using Apache Maven
  • Analyze and control code quality and code coverage using Apache Maven
  • Build various types of Java projects as well as other binaries
  • Set up complex projects using the concept of inheritance

In Detail

Apache Maven offers a comprehensive set of features to build, test, release, and deploy software projects and maintain enterprise development infrastructure.

This book is a hands-on guide that enables you to explore the vast potential of Apache Maven, the leading software build tool. You will start off by successfully installing Apache Maven on your favorite OS, and then you will create your first working project. Furthermore, the book walks you through setting up and using Maven with popular Java Integrated Development Environments (IDEs) so you can exploit Maven features to build standard Java applications. You will also learn to create site reports and documentation for your project, handle typical build requirements, and apply other advanced Maven techniques.

Apache Hive Essentials

Author: Dayong Du
List price: $39.99
Amazon price: $30.66
Publisher: Packt Publishing (27 March 2015)

Immerse yourself in a fantastic journey to discover the attributes of big data by using Hive

About This Book
  • Discover how Hive can coexist and work with other tools in the Hadoop ecosystem to create big data solutions
  • Grasp the skills needed, learn the best practices, and avoid the pitfalls in writing efficient Hive queries to analyze big data
  • Create an environment to analyze big data using practical, example-oriented scenarios

Who This Book Is For

If you are a data analyst, developer, or simply someone who wants to use Hive to explore and analyze data in Hadoop, this is the book for you. Whether you are new to big data or an expert, with this book, you will be able to master both the basic and the advanced features of Hive. Since Hive uses an SQL-like query language, some previous experience with SQL and databases will help you get more out of this book.

What You Will Learn
  • Create and set up the Hive environment
  • Discover how to use Hive's definition language to describe data
  • Discover interesting data by joining and filtering datasets in Hive
  • Transform data by using Hive sorting, ordering, and functions
  • Aggregate and sample data in different ways
  • Boost Hive query performance and enhance data security in Hive
  • Customize Hive to your needs by using user-defined functions and integrate it with other tools

In Detail

In this book, we prepare you for your journey into big data by first introducing you to the background of the big data domain, along with the process of setting up and getting familiar with your Hive working environment. Next, the book guides you through discovering and transforming the values of big data with the help of examples. It also hones your skill in using the Hive language efficiently. Towards the end, the book focuses on advanced topics such as performance, security, and extensions in Hive, which will guide you on exciting adventures on this worthwhile big data journey.

By the end of the book, you will be familiar with Hive and able to work efficiently to find solutions to big data problems.

Apache Spark Machine Learning Blueprints

Author: Alex Liu
List price: $39.99
Amazon price: $32.86
Publisher: Packt Publishing (30 May 2016)

Key Features

  • Customize Apache Spark and R to fit your analytical needs in customer research, fraud detection, risk analytics, and recommendation engine development
  • Develop a set of practical Machine Learning applications that can be implemented in real-life projects
  • A comprehensive, project-based guide to improve and refine your predictive models for practical implementation

Book Description

There's a reason why Apache Spark has become one of the most popular tools in Machine Learning – its ability to handle huge datasets at an impressive speed means you can be much more responsive to the data at your disposal. This book shows you Spark at its very best, demonstrating how to connect it with R and unlock maximum value not only from the tool but also from your data.

Packed with a range of project "blueprints" that demonstrate some of the most interesting challenges that Spark can help you tackle, you'll find out how to use Spark notebooks and access, clean, and join different datasets before putting your knowledge into practice with some real-world projects, in which you will see how Spark Machine Learning can help you with everything from fraud detection to analyzing customer attrition. You'll also find out how to build a recommendation engine using Spark's parallel computing powers.

What you will learn
  • Set up Apache Spark for machine learning and discover its impressive processing power
  • Combine Spark and R to unlock detailed business insights essential for decision making
  • Build machine learning systems with Spark that can detect fraud and analyze financial risks
  • Build predictive models focusing on customer scoring and service ranking
  • Build a recommendation system using SPSS on Apache Spark
  • Tackle parallel computing and find out how it can support your machine learning projects
  • Turn open data and communication data into actionable insights by making use of various forms of machine learning

About the Author

Alex Liu is an expert in research methods and data science. He is currently one of IBM's leading experts in Big Data analytics and also a lead data scientist, where he serves big corporations, develops Big Data analytics IPs, and speaks at industrial conferences such as STRATA, Insights, SMAC, and BigDataCamp. In the past, Alex served as chief or lead data scientist for a few companies, including Yapstone, RS, and TRG. Before this, he was a lead consultant and director at RMA, where he provided data analytics consultation and training to many well-known organizations, including the United Nations, Indymac, AOL, Ingram Micro, GEM, Farmers Insurance, Scripps Networks, Sears, and USAID. At the same time, he taught advanced research methods to PhD candidates at University of Southern California and University of California at Irvine. Before this, he worked as a managing director for CATE/GEC and as a research fellow for the Asia/Pacific Research Center at Stanford University. Alex has a Ph.D. in quantitative sociology and a master of science degree in statistical computing from Stanford University.

Table of Contents
  1. Spark for Machine Learning
  2. Data Preparation for Spark ML
  3. A Holistic View on Spark
  4. Fraud Detection on Spark
  5. Risk Scoring on Spark
  6. Churn Prediction on Spark
  7. Recommendations on Spark
  8. Learning Analytics on Spark
  9. City Analytics on Spark
  10. Learning Telco Data on Spark
  11. Modeling Open Data on Spark

Learning Spark: Lightning-Fast Big Data Analysis

Author: Holden Karau
List price: $39.99
Amazon price: $21.23
Publisher: O'Reilly Media (27 February 2015)

Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates.

Written by the developers of Spark, this book will have data scientists and engineers up and running in no time. You’ll learn how to express parallel jobs with just a few lines of code, and cover applications from simple batch jobs to stream processing and machine learning.

  • Quickly dive into Spark capabilities such as distributed datasets, in-memory caching, and the interactive shell
  • Leverage Spark’s powerful built-in libraries, including Spark SQL, Spark Streaming, and MLlib
  • Use one programming paradigm instead of mixing and matching tools like Hive, Hadoop, Mahout, and Storm
  • Learn how to deploy interactive, batch, and streaming applications
  • Connect to data sources including HDFS, Hive, JSON, and S3
  • Master advanced topics like data partitioning and shared variables

Streaming Architecture: New Designs Using Apache Kafka and MapR Streams

Author: Ted Dunning
List price: $24.99
Amazon price: $12.25
Publisher: O'Reilly Media (26 May 2016)

More and more data-driven companies are looking to adopt stream processing and streaming analytics. With this concise ebook, you’ll learn best practices for designing a reliable architecture that supports this emerging big-data paradigm.

Authors Ted Dunning and Ellen Friedman (Real World Hadoop) help you explore some of the best technologies to handle stream processing and analytics, with a focus on the upstream queuing or message-passing layer. To illustrate the effectiveness of these technologies, this book also includes specific use cases.

Ideal for developers and non-technical people alike, this book describes:

  • Key elements in good design for streaming analytics, focusing on the essential characteristics of the messaging layer
  • New messaging technologies, including Apache Kafka and MapR Streams, with links to sample code
  • Technology choices for streaming analytics: Apache Spark Streaming, Apache Flink, Apache Storm, and Apache Apex
  • How stream-based architectures are helpful to support microservices
  • Specific use cases such as fraud detection and geo-distributed data streams

Ted Dunning is Chief Applications Architect at MapR Technologies, and active in the open source community. He currently serves as VP for Incubator at the Apache Foundation, as a champion and mentor for a large number of projects, and as committer and PMC member of the Apache ZooKeeper and Drill projects. Ted is on Twitter as @ted_dunning.

Ellen Friedman, a committer for the Apache Drill and Apache Mahout projects, is a solutions consultant and well-known speaker and author, currently writing mainly about big data topics. With a PhD in Biochemistry, she has years of experience as a research scientist and has written about a variety of technical topics. Ellen is on Twitter as @Ellen_Friedman.

Apache Solr : For Starters

Author: Ryan Velasco
List price: $9.99
Amazon price: $9.14
Publisher: CreateSpace Independent Publishing Platform (26 November 2016)

Solr (pronounced "solar") is an open source enterprise search platform, written in Java, from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features, and rich document (e.g., Word, PDF) handling. Providing distributed search and index replication, Solr is designed for scalability and fault tolerance. Solr is the second-most popular enterprise search engine after Elasticsearch.

Solr runs as a standalone full-text search server. It uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable from most popular programming languages. Solr's external configuration allows it to be tailored to many types of application without Java coding, and it has a plugin architecture to support more advanced customization.

This updated and expanded second edition provides a user-friendly introduction to the subject. Using a clear structural framework, it guides the reader through the subject's core elements. A flowing writing style combines with illustrations and diagrams throughout the text to ensure the reader understands even the most complex concepts. This succinct overview is required reading for all those interested in the subject. We hope you find this book useful in shaping your future career and business.
