data-engineering

Sams Teach Yourself Hadoop in 24 Hours

2017-04-07 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jeffrey Aven

API Big Data Cloud Computing Hadoop HDFS Hive Java Spark data

Apache Hadoop is the technology at the heart of the Big Data revolution, and Hadoop skills are in enormous demand. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you'll need to deploy each key component of a Hadoop platform in your local environment or in the cloud, building a fully functional Hadoop cluster and using it with real programs and datasets. Each short, easy lesson builds on all that's come before, helping you master all of Hadoop's essentials and extend it to meet your unique challenges. Apache Hadoop in 24 Hours, Sams Teach Yourself covers all this, and much more: understanding Hadoop and the Hadoop Distributed File System (HDFS); importing data into Hadoop and processing it there; mastering basic MapReduce Java programming and using advanced MapReduce API concepts; making the most of Apache Pig and Apache Hive; implementing and administering YARN; taking advantage of the full Hadoop ecosystem; managing Hadoop clusters with Apache Ambari; working with the Hadoop User Environment (HUE); scaling, securing, and troubleshooting Hadoop environments; integrating Hadoop into the enterprise; deploying Hadoop in the cloud; and getting started with Apache Spark. Step-by-step instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Hadoop to solve a wide spectrum of Big Data problems.

Usage-Driven Database Design: From Logical Data Modeling through Physical Schema Definition

2017-04-07 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by George Tillmann

Big Data Cassandra Data Modelling Hadoop NoSQL Oracle SQL data data-models

Design great databases—from logical data modeling through physical schema definition. You will learn a framework that finally cracks the problem of merging data and process models into a meaningful and unified design that accounts for how data is actually used in production systems. Key to the framework is a method for taking the logical data model that is a static look at the definition of the data, and merging that static look with the process models describing how the data will be used in actual practice once a given system is implemented. The approach solves the disconnect between the static definition of data in the logical data model and the dynamic flow of the data in the logical process models. The design framework in this book can be used to create operational databases for transaction processing systems, or for data warehouses in support of decision support systems. The information manager can be a flat file, Oracle Database, IMS, NoSQL, Cassandra, Hadoop, or any other DBMS. Usage-Driven Database Design emphasizes practical aspects of design, and speaks to what works, what doesn't work, and what to avoid at all costs. Included in the book are lessons learned by the author over his 30+ years in the corporate trenches. Everything in the book is grounded on good theory, yet demonstrates a professional and pragmatic approach to design that can come only from decades of experience. Presents an end-to-end framework from logical data modeling through physical schema definition. Includes lessons learned, techniques, and tricks that can turn a database disaster into a success. 
Applies to all types of database management systems, including NoSQL such as Cassandra and Hadoop, and mainstream SQL databases such as Oracle and SQL Server What You'll Learn Create logical data models that accurately reflect the real world of the user Create usage scenarios reflecting how applications will use a new database Merge static data models with dynamic process models to create resilient yet flexible database designs Support application requirements by creating responsive database schemas in any database architecture Cope with big data and unstructured data for transaction processing and decision support systems Recognize when relational approaches won't work, and when to turn toward NoSQL solutions such as Cassandra or Hadoop Who This Book Is For System developers, including business analysts, database designers, database administrators, and application designers and developers who must design or interact with database systems

Exam Ref 70-761 Querying Data with Transact-SQL, 1st Edition

2017-04-06 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Itzik Ben-Gan

Azure Cloud Computing Data Management JSON Microsoft SQL XML data microsoft-sql-server relational-databases transact-sql

Prepare for Microsoft Exam 70-761, and help demonstrate your real-world mastery of SQL Server 2016 Transact-SQL data management, queries, and database programming. Designed for experienced IT professionals ready to advance their status, Exam Ref focuses on the critical-thinking and decision-making acumen needed for success at the MCSA level. Focus on the expertise measured by these objectives: filter, sort, join, aggregate, and modify data; use subqueries, table expressions, grouping sets, and pivoting; query temporal and non-relational data, and output XML or JSON; create views, user-defined functions, and stored procedures; implement error handling, transactions, data types, and nulls. This Microsoft Exam Ref: organizes its coverage by exam objectives; features strategic, what-if scenarios to challenge you; assumes you have experience working with SQL Server as a database administrator, system engineer, or developer; includes downloadable sample database and code for SQL Server 2016 SP1 (or later) and Azure SQL Database. About the Exam: Exam 70-761, Querying Data with Transact-SQL, focuses on the skills and knowledge necessary to manage and query data and to program databases with Transact-SQL in SQL Server 2016. About Microsoft Certification: Passing this exam earns you credit toward a Microsoft Certified Solutions Associate (MCSA) certification that demonstrates your mastery of essential skills for building and implementing on-premises and cloud-based databases across organizations. Exam 70-762 (Developing SQL Databases) is also required for MCSA: SQL 2016 Database Development certification. See full details at: microsoft.com/learning

Oracle SQL Tuning with Oracle SQLTXPLAIN: Oracle Database 12c Edition, Second Edition

2017-04-06 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Stelios Charalambides

Oracle SQL data

Learn through this practical guide to SQL tuning how Oracle's own experts do it, using a freely downloadable tool called SQLTXPLAIN. This new edition has been expanded to include AWR, Oracle 12c Statistics, interpretation of SQL Monitor reports, Parallel execution, and Exadata-related features. Reading this book and using SQLTXPLAIN helps you learn to tune even the most complex SQL, and you'll learn to do it quickly, without the huge learning curve usually associated with tuning as a whole. Firmly based in real-world problems, this book helps you reclaim system resources and avoid the most common bottleneck in overall performance, badly tuned SQL. You'll learn how the optimizer works, how to take advantage of its latest features, and when it's better to turn them off. Best of all, the book is updated to cover the very latest feature set in Oracle Database 12c. Covers AWR report integration; helps with SQL Monitor Report interpretation; provides a reliable method that is repeatable; shows the very latest tuning features in Oracle Database 12c; enables the building of test cases without affecting production. What You Will Learn: identify how and why complex SQL has gone wrong; correctly interpret AWR reports generated via SQLTXPLAIN; collect the best statistics for your environment; know when to invoke built-in tuning facilities; recognize when tuning is not the solution; spot the steps in a SQL statement's execution plan that are critical to performance of that statement; modify your SQL to solve performance problems and increase the speed and throughput of production database systems. Who This Book Is For: Anyone who deals with SQL and SQL tuning. Both developers and DBAs will benefit from learning how to use the SQLTXPLAIN tool, and from the problem solving methodology in this book.

Mastering Spark for Data Science

2017-03-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Matthew Hallett, David George, Antoine Amend (Databricks), Andrew Morgan

AI/ML Analytics API Big Data Data Science Spark SQL Data Streaming apache-spark data

Master the techniques and sophisticated analytics used to construct Spark-based solutions that scale to deliver production-grade data science products. About This Book: develop and apply advanced analytical techniques with Spark; learn how to tell a compelling story with data science using Spark’s ecosystem; explore data at scale and work with cutting edge data science methods. Who This Book Is For: This book is for those who have beginner-level familiarity with the Spark architecture and data science applications, especially those who are looking for a challenge and want to learn cutting edge techniques. This book assumes working knowledge of data science, common machine learning methods, and popular data science tools, and assumes you have previously run proof of concept studies and built prototypes. What You Will Learn: learn the design patterns that integrate Spark into industrialized data science pipelines; see how commercial data scientists design scalable, reusable code for data science services; explore cutting edge data science methods so that you can study trends and causality; discover advanced programming techniques using RDD and the DataFrame and Dataset APIs; find out how Spark can be used as a universal ingestion engine tool and as a web scraper; practice the implementation of advanced topics in graph processing, such as community detection and contact chaining; get to know the best practices when performing Extended Exploratory Data Analysis, commonly used in commercial data science teams; study advanced Spark concepts, solution design patterns, and integration architectures; demonstrate powerful data science pipelines. In Detail: Data science seeks to transform the world using data, and this is typically achieved through disrupting and changing real processes in real industries. In order to operate at this level you need to build data science solutions of substance: solutions that solve real problems.
Spark has emerged as the big data platform of choice for data scientists due to its speed, scalability, and easy-to-use APIs. This book deep dives into using Spark to deliver production-grade data science solutions. This process is demonstrated by exploring the construction of a sophisticated global news analysis service that uses Spark to generate continuous geopolitical and current affairs insights. You will learn all about the core Spark APIs and take a comprehensive tour of advanced libraries, including Spark SQL, Spark Streaming, MLlib, and more. You will be introduced to advanced techniques and methods that will help you to construct commercial-grade data products. Focusing on a sequence of tutorials that deliver a working news intelligence service, you will learn about advanced Spark architectures, how to work with geographic data in Spark, and how to tune Spark algorithms so they scale linearly. Style and approach: This is an advanced guide for those with beginner-level familiarity with the Spark architecture and working with Data Science applications. Mastering Spark for Data Science is a practical tutorial that uses core Spark APIs and takes a deep dive into advanced libraries including Spark SQL, visual streaming, and MLlib. This book expands on titles like Machine Learning with Spark and Learning Spark. It is the next learning curve for those comfortable with Spark and looking to improve their skills.

PostgreSQL High Performance Cookbook

2017-03-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dinesh Kumar, Chitij Chauhan

data postgresql relational-databases

This book is your definitive guide to understanding and improving PostgreSQL database performance. You'll learn about query optimization, database monitoring, and advanced memory and configuration techniques. With examples and clear explanations, you'll gain the skills to identify performance bottlenecks and make your database system highly efficient. What this Book will help me do: Effectively optimize PostgreSQL queries to enhance response times. Utilize robust server monitoring techniques to identify and address inefficiencies. Implement memory optimization strategies to maximize server performance. Master replication and failover methods for high availability. Build strategies for secure and efficient database migrations. Author(s): Dinesh Kumar and Chitij Chauhan are experienced database professionals with years of expertise in working with PostgreSQL. They have been involved in database design, optimization, and innovations in open-source database technologies. Their teaching approach is clear and actionable, making the topics accessible to both beginners and seasoned professionals. Who is it for? This book is ideally suited for developers and database administrators with a basic understanding of PostgreSQL. If you're seeking practical guidance to enhance your PostgreSQL performance tuning and maintenance skills, this book is designed for you. It covers concepts for professionals looking to advance their database expertise. Beginners in database management who are motivated to learn advanced techniques will also find it approachable.

Learning Apache Spark 2

2017-03-28 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Muhammad Asif Abbasi

AI/ML Analytics Big Data Data Analytics Scala Spark SQL Data Streaming apache-spark data

Dive into the world of Big Data with "Learning Apache Spark 2". This book introduces you to the powerful Apache Spark framework, tailored for real-time data analytics and machine learning. Through practical examples and real-world use-cases, you'll gain hands-on experience in leveraging Spark's capabilities for your data processing needs. What this Book will help me do: Master the fundamentals of Apache Spark 2 and its new features. Effectively use Spark SQL, MLlib, RDDs, GraphX, and Spark Streaming to tackle real-world challenges. Gain skills in data processing, transformation, and analysis with Spark. Deploy and operate your Spark applications in clustered environments. Develop your own recommendation engines and predictive analytics models with Spark. Author(s): Muhammad Asif Abbasi brings a wealth of expertise in Big Data technologies with a keen focus on simplifying complex concepts for learners. With substantial experience working in data processing frameworks, their approach to teaching creates an engaging and practical learning experience. With "Learning Apache Spark 2", Abbasi empowers readers to confidently tackle challenges in Big Data processing and analytics. Who is it for? This book is ideal for aspiring Big Data professionals seeking an accessible introduction to Apache Spark. Beginners in Spark will find step-by-step guidance, while those familiar with earlier versions will appreciate the insights into Spark 2's new features. Familiarity with Big Data concepts and Scala programming is recommended for optimal understanding.

Oracle Database 12c Release 2 Performance Tuning Tips & Techniques

2017-03-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Richard Niemiec

Cloud Computing Oracle Cyber Security SQL data oracle-database-solutions

Proven Database Optimization Solutions, Fully Updated for Oracle Database 12c Release 2. Systematically identify and eliminate database performance problems with help from Oracle Certified Master Richard Niemiec. Filled with real-world case studies and best practices, Oracle Database 12c Release 2 Performance Tuning Tips and Techniques details the latest monitoring, troubleshooting, and optimization methods. Find out how to identify and fix bottlenecks on premises and in the cloud, configure storage devices, execute effective queries, and develop bug-free SQL and PL/SQL code. Testing, reporting, and security enhancements are also covered in this Oracle Press guide. • Properly index and partition Oracle Database 12c Release 2 • Work effectively with Oracle Cloud, Oracle Exadata, and Oracle Enterprise Manager • Efficiently manage disk drives, ASM, RAID arrays, and memory • Tune queries with Oracle SQL hints and the Trace utility • Troubleshoot databases using V$ views and X$ tables • Create your first cloud database service and prepare for hybrid cloud • Generate reports using Oracle’s Statspack and Automatic Workload Repository tools • Use sar, vmstat, and iostat to monitor operating system statistics

SQL Server 2016 Developer's Guide

2017-03-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Miloš Radivojević, William Durkin, Dejan Sarka

Analytics JSON Cyber Security SQL data microsoft-sql-server relational-databases

SQL Server 2016 Developer's Guide provides an in-depth overview of the new features and enhancements introduced in SQL Server 2016 that can significantly improve your development process. This book covers robust techniques for building high-performance, secure database applications while leveraging cutting-edge functionalities such as Stretch Database, temporal tables, and enhanced In-Memory OLTP capabilities. What this Book will help me do: Master the new development features introduced in SQL Server 2016 and understand their applications. Use In-Memory OLTP enhancements to significantly boost application performance. Efficiently manage and analyze data using temporal tables and JSON integration. Explore SQL Server security enhancements to ensure data safety and access control. Gain insights into integrating R with SQL Server 2016 for advanced analytics. Author(s): Miloš Radivojević, Dejan Sarka, and William Durkin are experienced database developers and architects with a strong focus on SQL Server technologies. They bring years of practical experience and a clear, insightful approach to teaching complex concepts. Their expertise shines in this comprehensive guide, providing readers with both foundational knowledge and advanced techniques. Who is it for? This guide is perfect for database developers and solution architects looking to harness the full potential of SQL Server 2016's new features. It's intended for professionals with prior experience in SQL Server or similar platforms who aim to develop efficient, high-performance applications. You'll benefit from this book if you are keen to master SQL Server 2016 and elevate your development skills.

Introduction to Bayesian Estimation and Copula Models of Dependence

2017-03-20 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Arkady Shemyakin, Alexander Kniazev

Microsoft Monte Carlo data data-models

Presents an introduction to Bayesian statistics with an emphasis on Bayesian methods (prior and posterior), Bayes estimation, prediction, MCMC, Bayesian regression, and Bayesian analysis of statistical models of dependence, and features a focus on copulas for risk management. Introduction to Bayesian Estimation and Copula Models of Dependence emphasizes the applications of Bayesian analysis to copula modeling and equips readers with the tools needed to implement the procedures of Bayesian estimation in copula models of dependence. This book is structured in two parts: the first four chapters serve as a general introduction to Bayesian statistics with a clear emphasis on parametric estimation, and the following four chapters stress statistical models of dependence with a focus on copulas. A review of the main concepts is discussed along with the basics of Bayesian statistics including prior information and experimental data, prior and posterior distributions, with an emphasis on Bayesian parametric estimation. The basic mathematical background of both Markov chains and Monte Carlo integration and simulation is also provided. The authors discuss statistical models of dependence with a focus on copulas and present a brief survey of pre-copula dependence models. The main definitions and notations of copula models are summarized followed by discussions of real-world cases that address particular risk management problems.
In addition, this book includes: • Practical examples of copulas in use including within the Basel Accord II documents that regulate the world banking system as well as examples of Bayesian methods within current FDA recommendations • Step-by-step procedures of multivariate data analysis and copula modeling, allowing readers to gain insight for their own applied research and studies • Separate reference lists within each chapter and end-of-the-chapter exercises within Chapters 2 through 8 • A companion website containing appendices: data files and demo files in Microsoft® Office Excel®, basic code in R, and selected exercise solutions. Introduction to Bayesian Estimation and Copula Models of Dependence is a reference and resource for statisticians who need to learn formal Bayesian analysis as well as professionals within analytical and risk management departments of banks and insurance companies who are involved in quantitative analysis and forecasting. This book can also be used as a textbook for upper-undergraduate and graduate-level courses in Bayesian statistics and analysis. ARKADY SHEMYAKIN, PhD, is Professor in the Department of Mathematics and Director of the Statistics Program at the University of St. Thomas. A member of the American Statistical Association and the International Society for Bayesian Analysis, Dr. Shemyakin's research interests include information theory, Bayesian methods of parametric estimation, and copula models in actuarial mathematics, finance, and engineering. ALEXANDER KNIAZEV, PhD, is Associate Professor and Head of the Department of Mathematics at Astrakhan State University in Russia. Dr. Kniazev's research interests include representation theory of Lie algebras and finite groups, mathematical statistics, econometrics, and financial mathematics.

Designing Data-Intensive Applications

2017-03-16 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Martin Kleppmann

NoSQL RDBMS data

Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords? In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications. Peer under the hood of the systems you already use, and learn how to use and operate them more effectively Make informed decisions by identifying the strengths and weaknesses of different tools Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity Understand the distributed systems research upon which modern databases are built Peek behind the scenes of major online services, and learn from their architectures

DS8000 Copy Services

2017-03-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Lukasz Drózda, Warren Stanley, Roland Wolf, Lisa Gundy, Alcides Bertazi, Axel Westphal, Michael Frankenberg, Bert Dufrasne, Cay-Uwe Kulzer

IBM data

This IBM® Redbooks® publication helps you plan, install, tailor, configure, and manage Copy Services on the IBM DS8000® operating in an IBM z Systems® or Open Systems environment. This book helps you design and implement a new Copy Services installation or migrate from an existing installation. It includes hints and tips to maximize the effectiveness of your installation, and information about tools and products to automate Copy Services functions. It is intended for anyone who needs a detailed and practical understanding of the DS8000 Copy Services.

Understanding Metadata

2017-03-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Scott Gidley, Federico Castanedo

Big Data Data Governance Data Lake IBM Informatica Teradata Trifacta data metadata

One viable option for organizations looking to harness massive amounts of data is the data lake, a single repository for storing all the raw data, both structured and unstructured, that floods into the company. But that isn’t the end of the story. The key to making a data lake work is data governance, using metadata to provide valuable context through tagging and cataloging. This practical report examines why metadata is essential for managing, migrating, accessing, and deploying any big data solution. Authors Federico Castanedo and Scott Gidley dive into the specifics of analyzing metadata for keeping track of your data (where it comes from, where it’s located, and how it’s being used) so you can provide safeguards and reduce risk. In the process, you’ll learn about methods for automating metadata capture. This report also explains the main features of a data lake architecture, and discusses the pros and cons of several data lake management solutions that support metadata. These solutions include: traditional data integration/management vendors such as the IBM Research Accelerated Discovery Lab; tooling from open source projects, including Teradata Kylo and Informatica; and startups such as Trifacta and Zaloni that provide best of breed technology.

Oracle Database Upgrade and Migration Methods: Including Oracle 12c Release 2

2017-03-01 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Nassyam Basha, K M Krishnakumar, Y V Ravikumar

Oracle data oracle-database-solutions

Learn all of the available upgrade and migration methods in detail to move to Oracle Database version 12c. You will become familiar with database upgrade best practices to complete the upgrade in an effective manner and understand the Oracle Database 12c patching process. So it’s time to upgrade Oracle Database to version 12c and you need to choose the appropriate method while considering issues such as downtime. This book explains all of the available upgrade and migration methods so you can choose the one that suits your environment. You will be aware of the practical issues and proactive measures to take to upgrade successfully and reduce unexpected issues. With every release of Oracle Database there are new features and fixes to bugs identified in previous versions. As each release becomes obsolete, existing databases need to be upgraded. Oracle Database Upgrade and Migration Methods explains each method along with its strategy, requirements, steps, and known issues that have been seen so far. This book also compares the methods to help you choose the proper method according to your constraints.
Also included in this book: pre-requisite patches and pre-upgrade steps; patching to perform changes at the binary and database level to apply bug fixes. What You Will Learn: understand the need and importance of database upgrading and migration; be aware of the challenges associated with database upgrade decision making; compare all upgrade/migration methods; become familiar with database upgrade best practices and recommendations; understand database upgrade concepts in high availability and multi-tenant environments; know the database downgrade steps in case the upgraded database isn’t compatible with the environment; discover the features and benefits to the organization when it moves from the old database version to the latest database version; understand Oracle 12c patching concepts. Who This Book Is For: Core database administrators, solution architects, business consultants, and database architects

Mastering Elastic Stack

2017-02-28 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ravi Kumar Gupta, Yuvraj Gupta

Analytics Data Analytics ELK Kibana Logstash Cyber Security data elastic-stack-elk-stack elastic stack (elk stack) elasticsearch search

Mastering Elastic Stack is your complete guide to advancing your data analytics expertise using the ELK Stack. With detailed coverage of Elasticsearch, Logstash, Kibana, Beats, and X-Pack, this book equips you with the skills to process and analyze any type of data efficiently. Through practical examples and real-world scenarios, you'll gain the ability to build end-to-end pipelines and create insightful dashboards. What this Book will help me do: Build and manage log pipelines using Logstash, Beats, and Elasticsearch for real-time analytics. Develop advanced Kibana dashboards to visualize and interpret complex datasets. Efficiently utilize X-Pack features for alerting, monitoring, and security in the Elastic Stack. Master plugin customization and deployment for a tailored Elastic Stack environment. Apply Elastic Stack solutions to real-world cases for centralized logging and actionable insights. Author(s): The authors, Ravi Kumar Gupta and Yuvraj Gupta, are experienced technologists who have spent years working at the forefront of data processing and analytics. They are well-versed in Elasticsearch, Logstash, Kibana, and the Elastic ecosystem, having worked extensively in enterprise environments where these tools have transformed operations. Their passion for teaching and thorough understanding of the tools culminate in this comprehensive resource. Who is it for? The ideal reader is a developer already familiar with Elasticsearch, Logstash, and Kibana who wants to deepen their understanding of the stack. If you're involved in creating scalable data pipelines, analyzing complex datasets, or looking to implement centralized logging solutions in your work, this book is an excellent resource. It bridges the gap from intermediate to expert knowledge, allowing you to use the Elastic Stack effectively in various scenarios. Whether you are transitioning from a beginner or enhancing your skill set, this book meets your needs.

QGIS: Becoming a GIS Power User

2017-02-28 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alex Mandel, Víctor Olaya Ferrero, Ben Mearns, Alexander Bruy, Anita Graser (Austrian Institute of Technology)

Data Management GIS Master Data Management Python data geographic-information-system-gis geographic information system (gis) location-data

Master data management, visualization, and spatial analysis techniques in QGIS and become a GIS power user. About This Book: Learn how to work with various types of data and create beautiful maps using this easy-to-follow guide. Give a touch of professionalism to your maps, both for functionality and look and feel, with the help of this practical guide. This progressive, hands-on guide builds on geospatial data and adds more reactive maps using geometry tools. Who This Book Is For: If you are a user, developer, or consultant and want to know how to use QGIS to achieve the results you are used to from other types of GIS, then this learning path is for you. You are expected to be comfortable with core GIS concepts. This Learning Path will make you an expert with QGIS by showing you how to develop more complex, layered map applications. It will launch you to the next level of GIS users. What You Will Learn: create your first map by styling both vector and raster layers from different data sources; use parameters such as precipitation, relative humidity, and temperature to predict the vulnerability of fields and crops to mildew; re-project vector and raster data and see how to convert between different style formats; use a mix of web services to provide a collaborative data system; use raster analysis and a model automation tool to model the physical conditions for hydrological analysis; get the most out of the cartographic tools in QGIS to reveal the advanced tips and tricks of cartography. In Detail: The first module, Learning QGIS, Third Edition, covers the installation and configuration of QGIS. You'll become a master in data creation and editing, and creating great maps. By the end of this module, you'll be able to extend QGIS with Python, getting in-depth with developing custom tools for the Processing Toolbox. The second module, QGIS Blueprints, gives you an overview of the application types and the technical aspects along with a few examples from the digital humanities.
After estimating unknown values using interpolation methods and demonstrating visualization and analytical techniques, the module ends by creating an editable and data-rich map for the discovery of community information. The third module QGIS 2 Cookbook covers data input and output with special instructions for trickier formats. Later, we dive into exploring data, data management, and preprocessing steps to cut your data to just the important areas. At the end of this module, you will dive into the methods for analyzing routes and networks, and learn how to take QGIS beyond the out-of-the-box features with plug-ins, customization, and add-on tools. This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products: Learning QGIS, Third Edition by Anita Graser QGIS Blueprints by Ben Mearns QGIS 2 Cookbook by Alex Mandel, Víctor Olaya Ferrero, Anita Graser, Alexander Bruy Style and approach This Learning Path will get you up and running with QGIS. We start off with an introduction to QGIS and create maps and plugins. Then, we will guide you through Blueprints for geographic web applications, each of which will teach you a different feature by boiling down a complex workflow into steps you can follow. Finally, you'll turn your attention to becoming a QGIS power user and master data management, visualization, and spatial analysis techniques of QGIS. Downloading the example code for this book. You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the code file.

Learning PySpark

2017-02-27 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Denny Lee (Databricks) , Tomasz Drabas

AI/ML Big Data Cloud Computing Data Engineering PySpark Python Spark Data Streaming apache-spark data

"Learning PySpark" guides you through mastering the integration of Python with Apache Spark to build scalable and efficient data applications. You'll delve into Spark 2.0's architecture, efficiently process data, and explore PySpark's capabilities ranging from machine learning to structured streaming. By the end, you'll be equipped to craft and deploy robust data pipelines and applications. What this Book will help me do: Master the Spark 2.0 architecture and its Python integration with PySpark. Leverage PySpark DataFrames and RDDs for effective data manipulation and analysis. Develop scalable machine learning models using PySpark's ML and MLlib libraries. Understand advanced PySpark features such as GraphFrames for graph processing and TensorFrames for deep learning models. Gain expertise in deploying PySpark applications locally and on the cloud for production-ready solutions. Author(s): Tomasz Drabas and Denny Lee bring extensive experience in data engineering and Python programming. They combine a practical, example-driven approach with deep insights into Apache Spark's ecosystem. Their expertise and clarity in writing make this book accessible for individuals aiming to excel in big data technologies with Python. Who is it for? This book is best suited for Python developers who want to integrate Apache Spark 2.0 into their workflow to process large-scale data. Ideal readers will have foundational knowledge of Python and seek to build scalable data-intensive applications using Spark, regardless of prior experience with Spark itself.

Oracle Database 12c Release 2 In-Memory: Tips and Techniques for Maximum Performance

2017-02-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Joyjeet Banerjee

Oracle data oracle-database-solutions

Master Oracle Database 12c Release 2’s powerful In-Memory option. This Oracle Press guide shows, step-by-step, how to optimize database performance and cut transaction processing time using Oracle Database 12c Release 2 In-Memory. Oracle Database 12c Release 2 In-Memory: Tips and Techniques for Maximum Performance features hands-on instructions, best practices, and expert tips from an Oracle enterprise architect. You will learn how to deploy the software, use In-Memory Advisor, build queries, and interoperate with Oracle RAC and Multitenant. A complete chapter of case studies illustrates real-world applications.
• Configure Oracle Database 12c and construct In-Memory enabled databases
• Edit and control In-Memory options from the graphical interface
• Implement In-Memory with Oracle Real Application Clusters
• Use the In-Memory Advisor to determine what objects to keep In-Memory
• Optimize In-Memory queries using groups, expressions, and aggregations
• Maximize performance using Oracle Exadata Database Machine and In-Memory option
• Use Swingbench to create data and simulate real-life system workloads

Mastering Elasticsearch 5.x - Third Edition

2017-02-21 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Bharvi Dixit

Analytics Big Data ELK data elasticsearch search

This comprehensive guide dives deep into the functionalities of Elasticsearch 5, the widely used search and analytics engine. Leveraging the power of Apache Lucene, this book will help you understand advanced concepts like querying, indexing, and cluster management to build efficient and scalable search solutions. What this Book will help me do: Master advanced features of Elasticsearch such as text scoring, sharding, and aggregation. Understand how to handle big data efficiently using Elasticsearch's architecture. Learn practical implementation techniques for Elasticsearch features through hands-on examples. Develop custom plugins for Elasticsearch to tailor its functionalities to specific needs. Scale and optimize Elasticsearch clusters for high performance in production environments. Author(s): Bharvi Dixit is an experienced software engineer and a recognized expert in implementing Elasticsearch solutions. With a strong background in distributed systems and database management, Bharvi's writing is informed by real-world experience and a focus on practical applications. Who is it for? This book is ideal for developers and data engineers with existing experience in Elasticsearch who wish to deepen their knowledge. It serves as a valuable resource for professionals tasked with creating scalable search applications. A working understanding of Elasticsearch basics and query DSL is recommended to fully benefit from this guide.

IBM Power Systems L and LC Server Positioning Guide

2017-02-16 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Scott Vetter , Andrew Laidlaw , Tonny Bastiaans

IBM data ibm-power-systems

This IBM® Redpaper™ publication is written to assist you in locating the optimal server/workload fit within the IBM Power Systems™ L and IBM OpenPOWER LC product lines. IBM has announced several scale-out servers, and as a partner in the OpenPOWER organization, unique design characteristics that are engineered into the LC line have broadened the suite of available workloads beyond typical client OS hosting. This paper looks at the benefits of the Power Systems L servers and OpenPOWER LC servers, and how they are different, providing unique benefits for Enterprise workloads and use cases.

talk-data.com
