talk-data.com

Topic

Data Modelling

data_governance data_quality metadata_management

355 tagged

Activity Trend

Peak of 18 activities per quarter · 2020-Q1 to 2026-Q1

Activities

355 activities · Newest first

In this session, Brett McLaughlin, Chief Data Strategist at Akamai, discussed his journey to creating a forecasting solution. He shed light on some limitations, some innovative thinking, and some hacks that one could use to structure a good forecasting model.

Timeline: 0:29 Brett's journey. 15:06 Data scientists fulfilling the vision of the CEO. 24:25 The art of doing business and the science of doing business. 29:23 Data science and mathematics. 34:55 Salesforce defining the value of algorithms. 38:14 Capturing feedback to improve data models. 46:14 First steps in building a futuristic data model. 54:27 Using algorithms to forecast. 1:01 Tips for data leaders to build a team.

Podcast link: https://futureofdata.org/discussing-forecasting-brett-mclaughlin-akabret-akamai/

Here's Brett's Bio: Twenty-one years of experience transforming business operations through more intelligent use of data. Expertise in leading organizations in data transformation, predictive analytics (e.g., forecasting, linear programming, operational simulations, etc.), world-class visualizations and interfaces, and tight integration into existing operations.

About #Podcast:

The FutureOfData podcast is a conversation starter, bringing leaders, influencers, and lead practitioners together to discuss their journeys toward creating the data-driven future.

Wanna Join? If you or anyone you know wants to join in, register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

In this session, Nathaniel discussed how NFPA empowers fire stations worldwide with data-driven insights, and we explored the future of fire in this tech-driven world.

Timeline: 0:29 Nathaniel's journey. 3:50 What's NFPA? 6:12 Nathaniel's role at NFPA. 8:50 Nathaniel's book. 12:21 The data science team at NFPA. 15:01 Working with the government. 18:50 Interesting use cases at NFPA. 25:49 Fine-tuning the data model at NFPA. 28:11 NFPA's alliance with the insurance industry. 31:33 Recruiting an idea, concept, or tool. 33:16 How to approach NFPA? 36:03 Nathaniel's role: inward-facing or outward-facing? 40:41 Suggestions for non-profits building a data science practice. 43:49 Putting together a data science team. 46:34 Predicting fire outcomes. 48:11 Closing remarks.

Podcast link: https://futureofdata.org/futureofdata-nathaniel-lin-chief-data-scientist-nfpa/

Bio: Nathaniel Lin has an extensive background in business and marketing analytics, with strategic roles in both start-ups and Fortune 500 companies. He offers the National Fire Protection Association (NFPA) an agency and client perspective gleaned from his work at Fidelity Investments, OgilvyOne, Aspen Marketing, and IBM Worldwide. During his tenure with IBM Asia Pacific, he also built and led a marketing analytics group that won a DMA/NCDM Gold Award in B2B Marketing.

Lin served as an adjunct professor of business analytics at Boston College and the Georgia Tech College of Management. He is also the founder of two LinkedIn groups related to big data analytics and the author of Applied Business Analytics – Integrating Business Process, Big Data, and Advanced Analytics (2014). Lin has an MBA in Management of Technology (Sloan Fellows) from the MIT Sloan School of Management and earned both a Ph.D. in Environmental Engineering and an Honors B.S. from Birmingham University in England.

Founded in 1896, NFPA is a global, nonprofit organization devoted to eliminating death, injury, and property and economic loss due to fire, electrical, and related hazards. The association delivers information and knowledge through more than 300 consensus codes and standards, research, training, education, outreach, and advocacy, and partners with others who share an interest in furthering the NFPA mission. For more information, visit www.nfpa.org.

The podcast is sponsored by TAO.ai (https://tao.ai), an Artificial Intelligence-driven career coach.

About #Podcast:

The FutureOfData podcast is a conversation starter, bringing leaders, influencers, and lead practitioners together to discuss their journeys toward creating the data-driven future.

Wanna Join? If you or anyone you know wants to join in, register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Analyzing Data with Power BI and Power Pivot for Excel, First Edition

Renowned DAX experts Alberto Ferrari and Marco Russo teach you how to design data models for maximum efficiency and effectiveness. How can you use Excel and Power BI to gain real insights into your information? As you examine your data, how do you write a formula that provides the numbers you need? The answers to both of these questions lie with the data model. This book introduces the basic techniques for shaping data models in Excel and Power BI. It’s meant for readers who are new to data modeling as well as for experienced data modelers looking for tips from the experts. If you want to use Power BI or Excel to analyze data, the many real-world examples in this book will help you look at your reports in a different way—like experienced data modelers do. As you’ll soon see, with the right data model, the correct answer is always a simple one!

By reading this book, you will:
• Gain an understanding of the basics of data modeling, including tables, relationships, and keys
• Familiarize yourself with star schemas, snowflakes, and common modeling techniques
• Learn the importance of granularity
• Discover how to use multiple fact tables, like sales and purchases, in a complex data model
• Manage calendar-related calculations by using date tables
• Track historical attributes, like previous addresses of customers or manager assignments
• Use snapshots to compute quantity on hand
• Work with multiple currencies in the most efficient way
• Analyze events that have durations, including overlapping durations
• Learn what data model you need to answer your specific business questions

About This Book
• For Excel and Power BI users who want to exploit the full power of their favorite tools
• For BI professionals seeking new ideas for modeling data
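
To make the star schema and granularity ideas concrete, here is a small, hedged sketch in Python with pandas (the book itself works in Excel and Power BI; these tables and columns are hypothetical): one fact table at a daily, per-product granularity joined to two dimension tables, then aggregated by a calendar attribute from a date table.

```python
# A minimal, hypothetical star-schema sketch in pandas (the book's own
# examples use Excel / Power BI): one fact table joined to two dimensions.
import pandas as pd

# Dimension tables: keys plus descriptive attributes (hypothetical data).
dim_product = pd.DataFrame({"product_id": [1, 2], "category": ["Audio", "Video"]})
dim_date = pd.DataFrame({"date": pd.date_range("2024-01-01", periods=60)})
dim_date["month"] = dim_date["date"].dt.to_period("M")  # calendar attribute

# Fact table at daily/product granularity: one row per product per day.
fact_sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-17", "2024-02-02"]),
    "product_id": [1, 2, 1],
    "amount": [120.0, 300.0, 80.0],
})

# Star-schema query: join facts to dimensions, then aggregate by month/category.
report = (fact_sales
          .merge(dim_product, on="product_id")
          .merge(dim_date, on="date")
          .groupby(["month", "category"])["amount"].sum())
print(report)
```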

Tabular Modeling in Microsoft SQL Server Analysis Services, Second Edition

Build agile and responsive business intelligence solutions. Create a semantic model and analyze data using the tabular model in SQL Server 2016 Analysis Services to create corporate-level business intelligence (BI) solutions. Led by two BI experts, you will learn how to build, deploy, and query a tabular model by following detailed examples and best practices. This hands-on book shows you how to use the tabular model’s in-memory database to perform rapid analytics—whether you are new to Analysis Services or already familiar with its multidimensional model.

Discover how to:
• Determine when a tabular or multidimensional model is right for your project
• Build a tabular model using SQL Server Data Tools in Microsoft Visual Studio 2015
• Integrate data from multiple sources into a single, coherent view of company information
• Choose a data-modeling technique that meets your organization’s performance and usability requirements
• Implement security by establishing administrative and data user roles
• Define and implement partitioning strategies to reduce processing time
• Use Tabular Model Scripting Language (TMSL) to execute and automate administrative tasks (see the sketch below)
• Optimize your data model to reduce the memory footprint for VertiPaq
• Choose between in-memory (VertiPaq) and pass-through (DirectQuery) engines for tabular models
• Select the proper hardware and virtualization configurations
• Deploy and manipulate tabular models from C# and PowerShell using AMO and TOM libraries

Get code samples, including complete apps, at: https://aka.ms/tabular/downloads

About This Book
• For BI professionals who are new to SQL Server 2016 Analysis Services or already familiar with previous versions of the product, and who want the best reference for creating and maintaining tabular models.
• Assumes basic familiarity with database design and business analytics concepts.
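
Since the list above mentions automating administrative tasks with TMSL, here is a minimal sketch of composing a TMSL command as JSON from Python; the model name is hypothetical, and in practice the script would be executed through SSMS, PowerShell (Invoke-ASCmd), or an XMLA endpoint rather than from Python itself.

```python
# A minimal sketch of composing a TMSL "refresh" command as JSON.
# The database name is hypothetical; execution happens via SSMS, PowerShell
# (Invoke-ASCmd), or an XMLA endpoint, not via this script.
import json

tmsl_refresh = {
    "refresh": {
        "type": "full",  # reprocess data and recalculate; "dataOnly" is another option
        "objects": [
            {"database": "SalesTabularModel"}  # hypothetical model name
        ],
    }
}
print(json.dumps(tmsl_refresh, indent=2))
```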

Usage-Driven Database Design: From Logical Data Modeling through Physical Schema Definition

Design great databases—from logical data modeling through physical schema definition. You will learn a framework that finally cracks the problem of merging data and process models into a meaningful and unified design that accounts for how data is actually used in production systems. Key to the framework is a method for taking the logical data model that is a static look at the definition of the data, and merging that static look with the process models describing how the data will be used in actual practice once a given system is implemented. The approach solves the disconnect between the static definition of data in the logical data model and the dynamic flow of the data in the logical process models. The design framework in this book can be used to create operational databases for transaction processing systems, or for data warehouses in support of decision support systems. The information manager can be a flat file, Oracle Database, IMS, NoSQL, Cassandra, Hadoop, or any other DBMS. Usage-Driven Database Design emphasizes practical aspects of design, and speaks to what works, what doesn't work, and what to avoid at all costs. Included in the book are lessons learned by the author over his 30+ years in the corporate trenches. Everything in the book is grounded on good theory, yet demonstrates a professional and pragmatic approach to design that can come only from decades of experience. Presents an end-to-end framework from logical data modeling through physical schema definition. Includes lessons learned, techniques, and tricks that can turn a database disaster into a success. Applies to all types of database management systems, including NoSQL such as Cassandra and Hadoop, and mainstream SQL databases such as Oracle and SQL Server What You'll Learn Create logical data models that accurately reflect the real world of the user Create usage scenarios reflecting how applications will use a new database Merge static data models with dynamic process models to create resilient yet flexible database designs Support application requirements by creating responsive database schemas in any database architecture Cope with big data and unstructured data for transaction processing and decision support systems Recognize when relational approaches won't work, and when to turn toward NoSQL solutions such as Cassandra or Hadoop Who This Book Is For System developers, including business analysts, database designers, database administrators, and application designers and developers who must design or interact with database systems

Summary

What exactly is data engineering? How has it evolved in recent years and where is it going? How do you get started in the field? In this episode, Maxime Beauchemin joins me to discuss these questions and more.

Transcript provided by CastSource

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. You can help support the show by checking out the Patreon page, which is linked from the site. To help other people find the show you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers. Your host is Tobias Macey and today I’m interviewing Maxime Beauchemin.

Questions

Introduction
How did you get involved in the field of data engineering?
How do you define data engineering and how has that changed in recent years?
Do you think that the DevOps movement over the past few years has had any impact on the discipline of data engineering? If so, what kinds of cross-over have you seen?
For someone who wants to get started in the field of data engineering what are some of the necessary skills?
What do you see as the biggest challenges facing data engineers currently?
At what scale does it become necessary to differentiate between someone who does data engineering vs data infrastructure and what are the differences in terms of skill set and problem domain?
How much analytical knowledge is necessary for a typical data engineer?
What are some of the most important considerations when establishing new data sources to ensure that the resulting information is of sufficient quality?
You have commented on the fact that data engineering borrows a number of elements from software engineering. Where does the concept of unit testing fit in data management and what are some of the most effective patterns for implementing that practice? (See the sketch after this list.)
How has the work done by data engineers and managers of data infrastructure bled back into mainstream software and systems engineering in terms of tools and best practices?
How do you see the role of data engineers evolving in the next few years?
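
As a concrete, hedged illustration of the unit-testing question above: one common pattern is to treat each pipeline transformation as a pure function and assert invariants on its output. The function, columns, and data in this pytest-style sketch are all hypothetical.

```python
# A hedged sketch of unit testing a data pipeline step with pytest:
# the transformation is written as a pure function over a DataFrame,
# and the test asserts invariants on its output.
import pandas as pd


def dedupe_events(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pipeline step: keep the latest record per event_id."""
    return (df.sort_values("updated_at")
              .drop_duplicates("event_id", keep="last"))


def test_dedupe_events_keeps_latest():
    raw = pd.DataFrame({
        "event_id": [1, 1, 2],
        "updated_at": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-01"]),
        "value": ["old", "new", "only"],
    })
    out = dedupe_events(raw)
    assert out["event_id"].is_unique                           # no duplicate keys
    assert out.set_index("event_id").loc[1, "value"] == "new"  # latest record wins
```

Run with pytest; the same pattern extends to schema checks and row-count reconciliation against new data sources.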

Keep In Touch

@mistercrunch on Twitter · mistercrunch on GitHub · Medium

Links

Datadog · Airflow · The Rise of the Data Engineer · Druid.io · Luigi · Apache Beam · Samza · Hive · Data Modeling

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA. Support Data Engineering Podcast.

Pro Apache Phoenix: An SQL Driver for HBase, First Edition

Leverage Phoenix as an ANSI SQL engine built on top of the highly distributed and scalable NoSQL framework HBase. Learn the basics and best practices that are being adopted in Phoenix to enable a high write and read throughput in a big data space. This book includes real-world cases such as Internet of Things devices that send continuous streams to Phoenix, and the book explains how key features such as joins, indexes, transactions, and functions help you understand the simple, flexible, and powerful API that Phoenix provides. Examples are provided using real-time data and data-driven businesses that show you how to collect, analyze, and act in seconds. Pro Apache Phoenix covers the nuances of setting up a distributed HBase cluster with Phoenix libraries, running performance benchmarks, configuring parameters for production scenarios, and viewing the results. The book also shows how Phoenix plays well with other key frameworks in the Hadoop ecosystem such as Apache Spark, Pig, Flume, and Sqoop.

You will learn how to:
• Handle a petabyte data store by applying familiar SQL techniques
• Store, analyze, and manipulate data in a NoSQL Hadoop ecosystem with HBase
• Apply best practices while working with a scalable data store on Hadoop and HBase
• Integrate popular frameworks (Apache Spark, Pig, Flume) to simplify big data analysis
• Demonstrate real-time use cases and big data modeling techniques

Who This Book Is For
Data engineers, Big Data administrators, and architects
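
As a rough illustration of the SQL-over-HBase approach described above, here is a hedged sketch using the community phoenixdb Python driver through the Phoenix Query Server; the driver choice, host, port, and table are assumptions on my part, not taken from the book.

```python
# A minimal, hedged sketch using the phoenixdb Python driver (an assumption;
# the book is not tied to Python). Connects through the Phoenix Query Server;
# host, port, and table are hypothetical.
import phoenixdb

conn = phoenixdb.connect("http://localhost:8765/", autocommit=True)
cur = conn.cursor()

# Phoenix maps this SQL table onto an HBase table with a composite row key.
cur.execute("""
    CREATE TABLE IF NOT EXISTS sensor_readings (
        device_id VARCHAR NOT NULL,
        ts TIMESTAMP NOT NULL,
        temperature DOUBLE
        CONSTRAINT pk PRIMARY KEY (device_id, ts)
    )
""")

# Phoenix uses UPSERT rather than INSERT.
cur.execute("UPSERT INTO sensor_readings VALUES (?, CURRENT_TIME(), ?)",
            ("device-42", 21.5))
cur.execute("SELECT device_id, temperature FROM sensor_readings")
print(cur.fetchall())
```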

Data modeling with Cassandra

In this lesson, you’ll learn how to design data models for Cassandra, including a data modeling process and notation. To apply this knowledge, we’ll design the data model for a sample application. This will help show how all the parts fit together. Along the way, we’ll use a tool to help us manage our CQL (Cassandra Query Language) scripts.

What you’ll learn—and how you can apply it
You will learn common patterns and antipatterns for data modeling in Cassandra. This lesson will cover the concepts around data modeling and will compare a Cassandra data model with an equivalent relational database model. You’ll learn about defining queries and about logical and physical database modeling. You’ll learn how to optimize your model for performance, and finally you’ll learn how to implement your model schema using CQL.

This lesson is for you because…
You are an application developer or architect who wants to learn how data is stored and processed in Cassandra. You are a database administrator who wants to learn about Cassandra.

Prerequisites
Helpful but not essential to have a basic understanding of relational vs. distributed databases. Helpful but not essential to understand Cassandra Query Language, CQL.

Materials or downloads needed in advance
None
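
To show how query-first Cassandra modeling looks in practice, here is a hedged sketch using the DataStax Python driver; the lesson itself works directly in CQL, and the keyspace, table, and data below are hypothetical.

```python
# A hedged sketch of query-first Cassandra modeling with the DataStax Python
# driver; keyspace, table, and data are hypothetical examples.
import datetime
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS hotel WITH replication =
        {'class': 'SimpleStrategy', 'replication_factor': 1}
""")

# Design for the query "find available rooms by hotel and date": hotel_id is
# the partition key; date and room_number cluster rows inside the partition.
session.execute("""
    CREATE TABLE IF NOT EXISTS hotel.available_rooms_by_hotel_date (
        hotel_id text,
        date date,
        room_number smallint,
        is_available boolean,
        PRIMARY KEY ((hotel_id), date, room_number)
    )
""")

session.execute(
    "INSERT INTO hotel.available_rooms_by_hotel_date "
    "(hotel_id, date, room_number, is_available) VALUES (%s, %s, %s, %s)",
    ("AZ123", datetime.date(2024, 1, 5), 101, True),
)
```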

Optimizing Cassandra performance

In this lesson, we look at how to tune Cassandra to improve performance. There are a variety of settings in the configuration file and on individual tables. Although the default settings are appropriate for many use cases, there might be circumstances in which you need to change them. We’ll look at how and why to make these changes. We also see how to use the cassandra-stress test tool that ships with Cassandra to generate load against Cassandra and quickly see how it behaves under stress test circumstances. We can then tune Cassandra appropriately and feel confident that we’re ready to deploy to a production environment.

What you’ll learn—and how you can apply it
You’ll learn how to monitor and analyze Cassandra performance. You’ll learn about Cassandra features such as caching, memtables, commit logs, SSTables, hinted handoff, compaction, and threading to improve responsiveness, consistency, and speed and reduce data loss. We’ll also look at timeout properties and JVM settings.

This lesson is for you because…
You are a developer, database administrator, or architect who wants to learn how to tune Cassandra.

Prerequisites
An understanding of Cassandra architecture and the data model. If you want to run cassandra-stress: Cassandra installed, with a running Cassandra cluster.

Materials or downloads needed
A Cassandra cluster, if you want to run cassandra-stress
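
Since cassandra-stress is a command-line tool, a hedged sketch of scripting it from Python follows; the operation count and thread setting are arbitrary examples, not recommendations.

```python
# A hedged sketch of driving the cassandra-stress CLI from Python;
# the operation count and thread setting are arbitrary examples.
import subprocess

# Write 100,000 rows using 50 client threads against a local cluster.
subprocess.run(
    ["cassandra-stress", "write", "n=100000", "-rate", "threads=50"],
    check=True,
)

# A follow-up mixed workload (3 reads per write) against the same data:
subprocess.run(
    ["cassandra-stress", "mixed", "ratio(write=1,read=3)", "n=100000",
     "-rate", "threads=50"],
    check=True,
)
```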

Implementing CDISC Using SAS

For decades researchers and programmers have used SAS to analyze, summarize, and report clinical trial data. Now Chris Holland and Jack Shostak have updated their popular Implementing CDISC Using SAS, the first comprehensive book on applying clinical research data and metadata to the Clinical Data Interchange Standards Consortium (CDISC) standards.

Implementing CDISC Using SAS: An End-to-End Guide, Second Edition, is an all-inclusive guide on how to implement and analyze the Study Data Tabulation Model (SDTM) and the Analysis Data Model (ADaM) data and prepare clinical trial data for regulatory submission. Updated to reflect the 2017 FDA mandate for adherence to CDISC standards, this new edition covers creating and using metadata, developing conversion specifications, implementing and validating SDTM and ADaM data, determining solutions for legacy data conversions, and preparing data for regulatory submission. The book covers products such as Base SAS, SAS Clinical Data Integration, and the SAS Clinical Standards Toolkit, as well as JMP Clinical. New topics in this edition include an implementation of the Define-XML 2.0 standard, new SDTM domains, validation with Pinnacle 21 software, event narratives in JMP Clinical, and of course new versions of SAS and JMP software.

Any manager or user of clinical trial data in this day and age is likely to benefit from knowing how to either put data into a CDISC standard or analyze and find data once it is in a CDISC format. If you are one such person--a data manager, clinical and/or statistical programmer, biostatistician, or even a clinician--then this book is for you.

Apache HBase Primer

Learn the foundations and core concepts of the Apache HBase (NoSQL) open source database. This book covers the HBase data model, architecture, schema design, API, and administration. Apache HBase is the database for the Apache Hadoop framework. HBase is a column-family-based NoSQL database that provides a flexible schema model.

What You'll Learn
• Work with the core concepts of HBase
• Discover the HBase data model, schema design, and architecture
• Use the HBase API and administration

Who This Book Is For
Apache HBase (NoSQL) database users, designers, developers, and admins.
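
To give a feel for the row-key-plus-column-family data model the book covers, here is a hedged sketch using the third-party happybase Python client through HBase's Thrift gateway; the client choice and all names are assumptions (the book itself centers on the native Java API).

```python
# A minimal sketch of the HBase data model (row key + column families) via
# the third-party happybase client; requires a running HBase Thrift server.
# Table, row key, and columns are hypothetical.
import happybase

connection = happybase.Connection("localhost")
table = connection.table("users")

# Columns are addressed as b"family:qualifier"; cell values are raw bytes.
table.put(b"user#1001", {b"info:name": b"Ada", b"info:email": b"ada@example.com"})
row = table.row(b"user#1001")
print(row[b"info:name"])  # b'Ada'
```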

Programming Pig, 2nd Edition

For many organizations, Hadoop is the first step for dealing with massive amounts of data. The next step? Processing and analyzing datasets with the Apache Pig scripting platform. With Pig, you can batch-process data without having to create a full-fledged application, making it easy to experiment with new datasets. Updated with use cases and programming examples, this second edition is the ideal learning tool for new and experienced users alike. You’ll find comprehensive coverage on key features such as the Pig Latin scripting language and the Grunt shell. When you need to analyze terabytes of data, this book shows you how to do it efficiently with Pig.

• Delve into Pig’s data model, including scalar and complex data types
• Write Pig Latin scripts to sort, group, join, project, and filter your data
• Use Grunt to work with the Hadoop Distributed File System (HDFS)
• Build complex data processing pipelines with Pig’s macros and modularity features
• Embed Pig Latin in Python for iterative processing and other advanced tasks (see the sketch below)
• Use Pig with Apache Tez to build high-performance batch and interactive data processing applications
• Create your own load and store functions to handle data formats and storage mechanisms
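
The list above mentions embedding Pig Latin in Python; as a hedged sketch of that feature, the following script would run under Jython via "pig -x local script.py", not CPython, with a hypothetical input path, fields, and output location.

```python
# A hedged sketch of embedding Pig Latin in Python. Embedded Pig scripts run
# under Jython (e.g. `pig -x local script.py`); paths and fields are hypothetical.
from org.apache.pig.scripting import Pig

script = Pig.compile("""
    raw     = LOAD '$input' AS (user:chararray, amount:double);
    by_user = GROUP raw BY user;
    totals  = FOREACH by_user GENERATE group AS user, SUM(raw.amount) AS total;
    STORE totals INTO '$output';
""")

# Bind the parameters and run; rebind with different values for iteration.
stats = script.bind({"input": "sales.txt", "output": "user_totals"}).runSingle()
print("succeeded:", stats.isSuccessful())
```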

Delivering Business Intelligence with Microsoft SQL Server 2016, Fourth Edition, 4th Edition

Distribute Actionable, Timely BI with Microsoft® SQL Server® 2016 and Power BI. Drive better, faster, more informed decision making across your organization using the expert tips and best practices featured in this hands-on guide. Delivering Business Intelligence with Microsoft SQL Server 2016, Fourth Edition, shows, step-by-step, how to distribute high-performance, custom analytics to users enterprise-wide. Discover how to build BI Semantic Models, create data marts and OLAP cubes, write MDX and DAX scripts, and share insights using Microsoft client tools. The book includes coverage of self-service business intelligence with Power BI.

• Understand the goals and components of successful BI
• Build data marts, OLAP cubes, and Tabular models
• Load and cleanse data with SQL Server Integration Services
• Manipulate and analyze data using MDX and DAX scripts and queries
• Work with SQL Server Analysis Services and the BI Semantic Model
• Author interactive reports using SQL Server Data Tools
• Create KPIs and digital dashboards
• Implement time-based analytics
• Embed data model content in custom applications using ADOMD.NET
• Use Power BI to gather, model, and visualize data in a self-service environment

Real World SQL and PL/SQL: Advice from the Experts

Master the Underutilized Advanced Features of SQL and PL/SQL. This hands-on guide from Oracle Press shows how to fully exploit lesser known but extremely useful SQL and PL/SQL features―and how to effectively use both languages together. Written by a team of Oracle ACE Directors, Real-World SQL and PL/SQL: Advice from the Experts features best practices, detailed examples, and insider tips that clearly demonstrate how to write, troubleshoot, and implement code for a wide variety of practical applications. The book thoroughly explains underutilized SQL and PL/SQL functions and lays out essential development strategies. Data modeling, advanced analytics, database security, secure coding, and administration are covered in complete detail.

Learn how to:
• Apply advanced SQL and PL/SQL tools and techniques
• Understand SQL and PL/SQL functionality and determine when to use which language
• Develop accurate data models and implement business logic
• Run PL/SQL in SQL and integrate complex datasets
• Handle PL/SQL instrumenting and profiling
• Use Oracle Advanced Analytics and Oracle R Enterprise
• Build and execute predictive queries
• Secure your data using encryption, hashing, redaction, and masking
• Defend against SQL injection and other code-based attacks
• Work with Oracle Virtual Private Database

Code examples in the book are available for download at www.MHProfessional.com. For a complete list of Oracle Press titles, visit www.OraclePressBooks.com
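
As a rough companion to the list above, here is a hedged sketch of calling a PL/SQL stored procedure from Python with the cx_Oracle driver; the credentials, DSN, and procedure are hypothetical, and the book's own examples work in SQL and PL/SQL directly.

```python
# A minimal, hedged sketch of invoking a PL/SQL stored procedure from Python
# with cx_Oracle; connection details and the procedure are hypothetical.
import cx_Oracle

conn = cx_Oracle.connect("scott", "tiger", "localhost/orclpdb1")
cur = conn.cursor()

# OUT parameters are passed as pre-declared bind variables.
total = cur.var(cx_Oracle.NUMBER)
cur.callproc("calc_order_total", [1001, total])  # hypothetical procedure
print("order total:", total.getvalue())
```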

Enabling Real-time Analytics on IBM z Systems Platform

For online transaction processing (OLTP) workloads, the IBM® z Systems™ platform, with IBM DB2®, data sharing, Workload Manager (WLM), geoplex, and other high-end features, is the widely acknowledged leader. Most customers now integrate business analytics with OLTP by running, for example, scoring functions from a transactional context for real-time analytics, or by applying machine-learning algorithms on enterprise data that is kept on the mainframe. As a result, IBM adds investment so clients can keep the complete lifecycle for data analysis, modeling, and scoring under z Systems control in a cost-efficient way, preserving the qualities of service in availability, security, and reliability that z Systems solutions offer.

Because of the changed architecture and tighter integration, IBM has shown, in a customer proof-of-concept, that a particular client was able to achieve an orders-of-magnitude improvement in performance, allowing that client’s data scientists to investigate the data in a more interactive process. Open technologies, such as the Predictive Model Markup Language (PMML), can help customers update single components instead of being forced to replace everything at once. As a result, you have the possibility to combine your preferred tool for model generation (such as SAS Enterprise Miner or IBM SPSS® Modeler) with a different technology for model scoring (such as Zementis, a company focused on PMML scoring).

IBM SPSS Modeler is a leading data mining workbench that can apply various algorithms in data preparation, cleansing, statistics, visualization, machine learning, and predictive analytics. It has over 20 years of experience and continued development, and is integrated with z Systems. With IBM DB2 Analytics Accelerator 5.1 and SPSS Modeler 17.1, the possibility exists to do the complete predictive model creation, including data transformation, within DB2 Analytics Accelerator. So, instead of moving the data to a distributed environment, algorithms can be pushed to the data, using the cost-efficient DB2 Accelerator for the required resource-intensive operations.

This IBM Redbooks® publication explains the overall z Systems architecture, how the components can be installed and customized, how the new IBM DB2 Analytics Accelerator loader can help efficient data loading for z Systems data and external data, how in-database transformation, in-database modeling, and in-transactional real-time scoring can be used, and what other related technologies are available. This book is intended for technical specialists, architects, and data scientists who want to use the technology on the z Systems platform. Most of the technologies described in this book require IBM DB2 for z/OS®. For acceleration of the data investigation, data transformation, and data modeling process, DB2 Analytics Accelerator is required. Most value can be achieved if most of the data already resides on z Systems platforms, although adding external data (like from social sources) poses no problem at all.

Cassandra: The Definitive Guide, 2nd Edition

Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you’ll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This expanded second edition—updated for Cassandra 3.0—provides the technical details and practical examples you need to put this database to work in a production environment. Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra’s non-relational design, with special attention to data modeling. If you’re a developer, DBA, or application architect looking to solve a database scaling issue or future-proof your application, this guide helps you harness Cassandra’s speed and flexibility.

• Understand Cassandra’s distributed and decentralized structure
• Use the Cassandra Query Language (CQL) and cqlsh—the CQL shell
• Create a working data model and compare it with an equivalent relational model
• Develop sample applications using client drivers for languages including Java, Python, and Node.js
• Explore cluster topology and learn how nodes exchange data
• Maintain a high level of performance in your cluster
• Deploy Cassandra on site, in the Cloud, or with Docker
• Integrate Cassandra with Spark, Hadoop, Elasticsearch, Solr, and Lucene

Big Data

Big Data: Principles and Paradigms captures the state-of-the-art research on the architectural aspects, technologies, and applications of Big Data. The book identifies potential future directions and technologies that facilitate insight into numerous scientific, business, and consumer applications. To help realize Big Data’s full potential, the book addresses numerous challenges, offering the conceptual and technological solutions for tackling them. These challenges include life-cycle data management, large-scale storage, flexible processing infrastructure, data modeling, scalable machine learning, data analysis algorithms, sampling techniques, and privacy and ethical issues.

• Covers computational platforms supporting Big Data applications
• Addresses key principles underlying Big Data computing
• Examines key developments supporting next generation Big Data platforms
• Explores the challenges in Big Data computing and ways to overcome them
• Contains expert contributors from both academia and industry

Mastering Data Visualization with Microsoft Visio Professional 2016

Microsoft Visio Professional 2016 is an essential tool for creating sophisticated data visualizations across a variety of contexts and industries. In 'Mastering Data Visualization with Microsoft Visio Professional 2016', you'll learn how to utilize Visio's powerful features to transform data into compelling graphics and actionable insights.

What this Book will help me do
• Understand how to integrate external data from various sources into your Visio diagrams.
• Master the use of Visio's tools to represent information using data-driven graphics.
• Learn the process of designing and utilizing custom shapes and templates for tailored visualizations.
• Discover methods for automating diagram creation from structured and external data sources.
• Gain techniques to share and present interactive and professional visuals with a wide audience.

Author(s)
John Marshall, the author of 'Mastering Data Visualization with Microsoft Visio Professional 2016,' brings years of experience in data modeling and visualization. With an extensive technical background, Marshall is a renowned expert in leveraging visual tools to communicate complex ideas effectively. His approachable writing style makes highly technical concepts accessible to professionals at various levels.

Who is it for?
If you're a business intelligence professional, technical analyst, or a Microsoft Office power user looking to enhance your skills in creating impactful visualizations, this book is for you. Its step-by-step approach is ideal for users of Visio Professional starting out or seeking advanced techniques. You'll gain practical insights and learn to apply them effectively in your business or technical workflows, achieving refined data presentations.

Practical Data Analysis Cookbook

Practical Data Analysis Cookbook takes you on a comprehensive journey to mastering data exploration and analysis using Python. From data cleaning and transformation to building predictive and classification models, this book provides practical recipes for tackling real-world data challenges and extracting valuable insights.

What this Book will help me do
• Efficiently clean, transform, and explore datasets using tools like pandas and OpenRefine.
• Develop predictive models for time series and other datasets using Python libraries such as scikit-learn and Statsmodels.
• Apply clustering and classification techniques to real-world data problems to gain actionable insights.
• Explore advanced topics like natural language processing and graph theory concepts using specialized tools.
• Build the skills to solve practical data modeling problems encountered in a data science role.

Author(s)
Tomasz Drabas is an experienced data scientist and author who specializes in Python-based data analysis. With a background in tackling intricate data-driven problems, he brings real-world experience to the readers. In creating this Cookbook, he adopts a step-by-step approach, making complex techniques accessible to learners of all backgrounds.

Who is it for?
If you are a data analyst, data scientist, or someone interested in exploring Python for practical data problems, this book is for you. It suits beginners starting their data journey and intermediate professionals looking to enhance their toolset. With clear instructions, it's ideal for anyone willing to build practical skills and tackle real-world challenges in data analysis.
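
In the spirit of the book's recipes, here is a small, hedged sketch combining pandas cleaning with a scikit-learn classifier; the dataset is synthetic and purely illustrative.

```python
# A hedged recipe sketch: clean a small synthetic dataset with pandas,
# then train and evaluate a scikit-learn classifier.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Clean: drop rows with missing values, then derive a feature.
df = pd.DataFrame({
    "income": [42_000, 55_000, None, 31_000, 78_000, 23_000],
    "age":    [34, 45, 29, 41, 52, 23],
    "bought": [0, 1, 0, 0, 1, 0],
})
df = df.dropna()
df["income_per_year_of_age"] = df["income"] / df["age"]

X, y = df[["income", "age", "income_per_year_of_age"]], df["bought"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```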

Measurement Data Modeling and Parameter Estimation

This book discusses the theories, methods, and application techniques of mathematical modeling of measurement data and parameter estimation. It seeks to build a bridge between mathematical theory and engineering practice in the measurement data processing field, so that theoretical researchers and practicing engineers can communicate. It is organized with abundant materials, such as illustrations, tables, examples, and exercises. The authors create examples that apply mathematical theory innovatively to measurement and control engineering. Not only does this reference provide theoretical knowledge, it also shares first-hand practical experience.
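
As a hedged illustration of the book's core task, estimating model parameters from noisy measurements, here is an ordinary-least-squares sketch with NumPy; the linear model and noise level are hypothetical.

```python
# A hedged illustration of parameter estimation from noisy measurement data
# via ordinary least squares; the model y = a + b*t and noise are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 50)                            # measurement times
true_a, true_b = 2.0, -0.5
y = true_a + true_b * t + rng.normal(0, 0.2, t.size)  # noisy measurements

# Design matrix for y = a + b*t; solve min ||X @ theta - y||^2.
X = np.column_stack([np.ones_like(t), t])
theta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print("estimated a, b:", theta)                       # close to (2.0, -0.5)
```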