talk-data.com

Topic: Agile/Scrum
Tags: project_management, software_development, methodology
561 activities tagged
Activity trend: peak of 163 activities per quarter, 2020-Q1 to 2026-Q1

Activities

561 activities · Newest first

In this episode, Wayne Eckerson and Shakeeb Akhter dive into DataOps. They discuss what DataOps is, the goals and principles of DataOps, and reasons to adopt a DataOps strategy. Shakeeb also reveals the benefits gained from DataOps and the tools he uses. He is the Director of Enterprise Data Warehouse at Northwestern Medicine and is responsible for the direction and oversight of data management, data engineering, and analytics.

Pervasive Intelligence Now
This book looks at strategies to help companies become more intelligent, connected, and agile. It discusses how companies can define and measure high-impact outcomes and use analytics technology effectively to achieve them. It also looks at the technology needed to implement the analytics necessary to achieve high-impact outcomes—from both an analytics-tool and a technical-infrastructure perspective. Also discussed are ancillary, but critical, topics such as data security and governance that may not traditionally be part of analytics discussions but are essential in helping companies maintain a secure environment for their analytics and access the quality data they need to gain critical insights and drive better decision-making.

In this podcast, Maksim, CDO at the City of San Diego, discusses the nuances of running big data for big cities. He shares his perspectives on effectively building a central data office in a complex and highly collaborative environment like a big city, offers his thoughts on ways to effectively prioritize which projects to pursue, and explains how leadership and execution can blend to solve civic issues in cities big and small. A great practitioner podcast for folks seeking to build a robust data science practice across a large and collaborative ecosystem.

Timeline: 0:28 Maksim's journey. 6:45 Maksim's current role. 11:46 Collaboration process in creating a data inventory. 14:52 Working with the bureaucracy. 18:35 Dealing with unforeseen circumstances at work. 20:22 Prioritization at work. 22:58 Qualities of a good data leader. 26:15 Collaboration with other cities. 27:40 Cool data projects in other cities. 30:55 Shortcomings of other city representatives. 36:54 Use cases in AI. 39:00 What would Maksim change about himself? 40:50 Future cities and data. 43:55 Opportunities for private investors in the public sector. 45:53 Maksim's success mantra. 50:19 Closing remarks.

Maksim's Book Recommendation: The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win by Gene Kim, Kevin Behr, George Spafford amzn.to/2MAu5Xv

Podcast Link: https://futureofdata.org/understanding-bigdata-for-bigcities-with-maksim-mrmaksimize-cityofsandiego-futureofdata-podcast/

Maksim's BIO: Maksim Pecherskiy: As the CDO for the City of San Diego, working in the Performance & Analytics Department, Maksim strives to bring the necessary components together to allow the City's residents to benefit from a more efficient, agile government that is as innovative as the community around it. He has been solving complex problems with technology for nearly a decade. He spent 2014 working as a Code For America fellow in Puerto Rico, focusing on economic development. His team delivered a product called PrimerPeso that provides business owners and residents a tool to search, and apply for, government programs for which they may be eligible.

Before moving to California, Maksim was a Solutions Architect at Promet Source in Chicago, where he built large web applications and designed complex integrations. He shaped workflow, configuration management, and continuous integration processes while leading and training international development teams. Before his work at Promet, he was a software engineer at AllPlayers, where he was instrumental in the design and architecture of its APIs and the development and documentation of supporting client libraries in various languages.

Maksim graduated from DePaul University with a bachelor of science degree in information systems and from Linköping University, Sweden, with a bachelor of science degree in international business. He is also certified as a Lean Six Sigma Green Belt.

About #Podcast:

The FutureOfData podcast is a conversation starter that brings leaders, influencers, and leading practitioners onto the show to discuss their journeys in creating the data-driven future.

Wanna join? If you or anyone you know wants to join in, register your interest by emailing us @ [email protected]

Want to sponsor? Email us @ [email protected]

Keywords: FutureOfData, DataAnalytics, Leadership, Futurist, Podcast, BigData, Strategy

IBM z14 Model ZR1 Technical Introduction

Abstract This IBM® Redbooks® publication introduces the latest member of the IBM Z platform, the IBM z14 Model ZR1 (Machine Type 3907). It includes information about the Z environment and how it helps integrate data and transactions more securely, and provides insight for faster and more accurate business decisions. The z14 ZR1 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to any digital transformation. The z14 ZR1 is designed for enhanced modularity, in an industry-standard footprint. This system excels at the following tasks: securing data with pervasive encryption; transforming a transactional platform into a data powerhouse; getting more out of the platform with IT Operational Analytics; providing resilience towards zero downtime; accelerating digital transformation with agile service delivery; revolutionizing business processes; and mixing open source and Z technologies. This book explains how this system uses new innovations and traditional Z strengths to satisfy growing demand for cloud, analytics, and open source technologies. With the z14 ZR1 as the base, applications can run in a trusted, reliable, and secure environment that improves operations and lessens business risk.

IBM z14 Technical Introduction

Abstract This IBM® Redbooks® publication introduces the latest IBM Z platform, the IBM z14™. It includes information about the Z environment and how it helps integrate data and transactions more securely, and can infuse insight for faster and more accurate business decisions. The z14 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to the digital era and the trust economy. This system includes the following functionality: securing data with pervasive encryption; transforming a transactional platform into a data powerhouse; getting more out of the platform with IT Operational Analytics; providing resilience with key to zero downtime; accelerating digital transformation with agile service delivery; revolutionizing business processes; and blending open source and Z technologies. This book explains how this system uses both new innovations and traditional Z strengths to satisfy growing demand for cloud, analytics, and mobile applications. With the z14 as the base, applications can run in a trusted, reliable, and secure environment that both improves operations and lessens business risk.

Summary

With the proliferation of data sources that give a more comprehensive view of the information critical to your business, it is even more important to have a canonical view of the entities that you care about. Is customer number 342 in your ERP the same as Bob Smith on Twitter? Using master data management to build a data catalog helps you answer these questions reliably and simplifies the process of building your business intelligence reports. In this episode, the head of product at Tamr, Mark Marinelli, discusses the challenges of building a master data set, why you should have one, and some of the techniques that modern platforms and systems provide for maintaining it.
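
As a rough illustration of the matching decision described above, the sketch below (Python) applies naive normalization and fuzzy matching to decide whether an ERP customer record and a social-media profile refer to the same person; the records, threshold, and rules are hypothetical and this is not Tamr's algorithm:

from difflib import SequenceMatcher

def normalize(name: str) -> str:
    # Lowercase, drop punctuation, collapse whitespace.
    return " ".join("".join(c for c in name.lower() if c.isalnum() or c.isspace()).split())

def name_similarity(a: str, b: str) -> float:
    # Fuzzy similarity between two normalized names, 0.0 to 1.0.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def is_same_entity(erp: dict, social: dict, threshold: float = 0.85) -> bool:
    # Crude rule: an exact email match wins, otherwise fall back to fuzzy name match.
    if erp.get("email") and erp["email"].lower() == social.get("email", "").lower():
        return True
    return name_similarity(erp["name"], social["display_name"]) >= threshold

# Is customer number 342 in the ERP the same as Bob Smith on Twitter?
erp_342 = {"customer_id": 342, "name": "Robert B. Smith", "email": "bob.smith@example.com"}
twitter_bob = {"handle": "@bobsmith", "display_name": "Bob Smith", "email": "bob.smith@example.com"}
print(is_same_entity(erp_342, twitter_bob))  # True, matched on email

A real master data platform layers blocking, richer features, human review, and survivorship rules on top of this kind of pairwise decision.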

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. You work hard to make sure that your data is reliable and accurate, but can you say the same about the deployment of your machine learning models? The Skafos platform from Metis Machine was built to give your data scientists the end-to-end support that they need throughout the machine learning lifecycle. Skafos maximizes interoperability with your existing tools and platforms, and offers real-time insights and the ability to be up and running with cloud-based production-scale infrastructure instantaneously. Request a demo at dataengineeringpodcast.com/metis-machine to learn more about how Metis Machine is operationalizing data science. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat. Your host is Tobias Macey and today I’m interviewing Mark Marinelli about data mastering for modern platforms.

Interview

Introduction. How did you get involved in the area of data management? Can you start by establishing a definition of data mastering that we can work from?

How does the master data set get used within the overall analytical and processing systems of an organization?

What is the traditional workflow for creating a master data set?

What has changed in the current landscape of businesses and technology platforms that makes that approach impractical? What are the steps that an organization can take to evolve toward an agile approach to data mastering?

At what scale of company or project does it make sense to start building a master data set? What are the limitations of using ML/AI to merge data sets? What are the limitations of a golden master data set in practice?

Are there particular formats of data or types of entities that pose a greater challenge when creating a canonical format for them? Are there specific problem domains that are more likely to benefit from a master data set?

Once a golden master has been established, how are changes to that information handled in practice? (e.g. versioning of the data) What storage mechanisms are typically used for managing a master data set?

Are there particular security, auditing, or access concerns that engineers should be considering when managing their golden master that go beyond the rest of their data infrastructure? How do you manage latency issues when trying to reference the same entities from multiple disparate systems?

What have you found to be the most common stumbling blocks for a group that is implementing a master data platform?

What suggestions do you have to help prevent such a project from being derailed?

What resources do you recommend for someone looking to learn more about the theoretical and practical aspects of data mastering?

Data pipelines become chaotic under the pressures of agile development, democratization, self-service, and organizational “pockets” of analytics. From enterprise BI to self-service analysis, data pipeline management should ensure that analysis results are traceable, reproducible, and of production strength. Robust data pipelines rely on eight critical components.

Originally published at https://www.eckerson.com/articles/the-complexities-of-modern-data-pipelines

Summary

Data integration and routing is a constantly evolving problem, and one that is fraught with edge cases and complicated requirements. The Apache NiFi project models this problem as a collection of data flows that are created through a self-service graphical interface. This framework provides a flexible platform for building a wide variety of integrations that can be managed and scaled easily to fit your particular needs. In this episode, project members Kevin Doran and Andy LoPresto discuss the ways that NiFi can be used, how to start using it in your environment, and plans for future development. They also explain how it fits into the broader landscape of data tools, the interesting and challenging aspects of the project, and how to build new extensions.
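
Although flows are designed in NiFi's graphical interface, a running instance can also be inspected programmatically. The short Python sketch below assumes an unsecured local NiFi instance and the /nifi-api/flow/status endpoint; verify the URL, security settings, and endpoint against the REST API reference for your NiFi version:

import requests

NIFI_API = "http://localhost:8080/nifi-api"  # assumed unsecured local instance

def flow_status() -> dict:
    # Fetch high-level controller status (queued flowfiles, running/stopped components).
    response = requests.get(f"{NIFI_API}/flow/status", timeout=10)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    status = flow_status()
    # The payload is expected to carry a "controllerStatus" object; print whatever came back otherwise.
    print(status.get("controllerStatus", status))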

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. Are you struggling to keep up with customer requests and letting errors slip into production? Want to try some of the innovative ideas in this podcast but don’t have time? DataKitchen’s DataOps software allows your team to quickly iterate and deploy pipelines of code, models, and data sets while improving quality. Unlike a patchwork of manual operations, DataKitchen makes your team shine by providing an end-to-end DataOps solution with minimal programming that uses the tools you love. Join the DataOps movement and sign up for the newsletter at datakitchen.io/de today. After that, learn more about why you should be doing DataOps by listening to the Head Chef in the Data Kitchen at dataengineeringpodcast.com/datakitchen. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. Your host is Tobias Macey and today I’m interviewing Kevin Doran and Andy LoPresto about Apache NiFi.

Interview

Introduction. How did you get involved in the area of data management? Can you start by explaining what NiFi is? What is the motivation for building a GUI as the primary interface for the tool when the current trend is to represent everything as code? How did you get involved with the project?

Where does it sit in the broader landscape of data tools?

Does the data that is processed by NiFi flow through the servers that it is running on (à la Spark/Flink/Kafka), or does it orchestrate actions on other systems (à la Airflow/Oozie)?

How do you manage versioning and backup of data flows, as well as promoting them between environments?

One of the advertised features is tracking provenance for data flows that are managed by NiFi. How is that data collected and managed?

What types of reporting are available across this information?

What are some of the use cases or requirements that lend themselves well to being solved by NiFi?

When is NiFi the wrong choice?

What is involved in deploying and scaling a NiFi installation?

What are some of the system/network parameters that should be considered? What are the scaling limitations?

What have you found to be some of the most interesting, unexpected, and/or challenging aspects of building and maintaining the NiFi project and community? What do you have planned for the future of NiFi?

Contact Info

Kevin Doran

@kevdoran on Twitter Email

Andy LoPresto

@yolopey on Twitter Email

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

NiFi HortonWorks DataFlow HortonWorks Apache Software Foundation Apple CSV XML JSON Perl Python Internet Scale Asset Management Documentum DataFlow NSA (National Security Agency) 24 (TV Show) Technology Transfer Program Agile Software Development Waterfall Spark Flink Kafka Oozie Luigi Airflow FluentD ETL (Extract, Transform, and Load) ESB (Enterprise Service Bus) MiNiFi Java C++ Provenance Kubernetes Apache Atlas Data Governance Kibana K-Nearest Neighbors DevOps DSL (Domain Specific Language) NiFi Registry Artifact Repository Nexus NiFi CLI Maven Archetype IoT Docker Backpressure NiFi Wiki TLS (Transport Layer Security) Mozilla TLS Observatory NiFi Flow Design System Data Lineage GDPR (General Data Protection Regulation)

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA. Support Data Engineering Podcast.

Essentials of Time Series for Financial Applications

Essentials of Time Series for Financial Applications serves as an agile reference for upper-level students and practitioners who desire a formal, easy-to-follow introduction to the most important time series methods applied in financial applications (pricing, asset management, quant strategies, and risk management). Real-life data and examples developed with EViews illustrate the links between the formal apparatus and the applications. The examples either directly exploit the tools that EViews makes available or use programs that employ EViews to implement specific topics or techniques. The book balances a formal framework (with as few proofs as possible) against many examples that support its central ideas. Boxes are used throughout to remind readers of technical aspects and definitions and to present examples in a compact fashion, with full details (workout files) available in an online appendix. The more advanced chapters provide discussion sections that refer to more advanced textbooks or detailed proofs. The book provides practical, hands-on examples in time-series econometrics; presents a more application-oriented, less technical treatment of financial econometrics; offers rigorous coverage, including technical aspects and references for the proofs, despite being an introduction; and features examples worked out in EViews (version 9 or higher).

Enhancing the IBM Power Systems Platform with IBM Watson Services

Abstract This IBM® Redbooks® publication provides an introduction to the IBM POWER® processor architecture. It describes the IBM POWER processor and IBM Power Systems™ servers, highlighting the advantages and benefits of IBM Power Systems servers, IBM AIX®, IBM i, and Linux on Power. This publication showcases typical business scenarios that are powered by Power Systems servers. It provides an introduction to the artificial intelligence (AI) capabilities that IBM Watson® services enable, and how existing applications can be augmented with these AI capabilities by using an agile approach to embed intelligence into every operational process. For each use case, the business benefits of adding Watson services are detailed. This publication gives an overview of each Watson service and how each one is commonly used in real business scenarios. It introduces the Watson API explorer, which you can use to try the application programming interfaces (APIs) and their capabilities. The Watson services are positioned against the machine learning capabilities of IBM PowerAI. This publication also provides a guide for setting up a development environment on Power Systems servers, a sample code implementation of one of the business cases, and a description of preferred practices for moving any application that you develop into production. This publication is intended for technical professionals who are interested in learning about or implementing IBM Watson services on AIX, IBM i, and Linux.

Summary

Managing an analytics project can be difficult due to the number of systems involved and the need to ensure that new information can be delivered quickly and reliably. That challenge can be met by adopting practices and principles from lean manufacturing and agile software development, along with the cross-functional collaboration, feedback loops, and focus on automation of the DevOps movement. In this episode Christopher Bergh discusses ways that you can start adding reliability and speed to your workflow to deliver results with confidence and consistency.
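
One concrete, low-cost way to start adding that reliability is to run automated checks against the data itself on every pipeline run. The sketch below is a generic Python/SQLite illustration; the table, columns, and rules are hypothetical, and this is not DataKitchen's product:

import sqlite3

def check_orders(conn: sqlite3.Connection) -> list:
    # Cheap assertions that run on every pipeline execution so bad data fails fast.
    failures = []
    if conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0] == 0:
        failures.append("orders table is empty")
    null_ids = conn.execute("SELECT COUNT(*) FROM orders WHERE customer_id IS NULL").fetchone()[0]
    if null_ids:
        failures.append(f"{null_ids} orders have no customer_id")
    negatives = conn.execute("SELECT COUNT(*) FROM orders WHERE amount < 0").fetchone()[0]
    if negatives:
        failures.append(f"{negatives} orders have a negative amount")
    return failures

if __name__ == "__main__":
    problems = check_orders(sqlite3.connect("warehouse.db"))  # hypothetical local warehouse
    if problems:
        raise SystemExit("Data checks failed: " + "; ".join(problems))
    print("All data checks passed")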

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. For complete visibility into the health of your pipeline, including deployment tracking, and powerful alerting driven by machine learning, DataDog has got you covered. With their monitoring, metrics, and log collection agent, including extensive integrations and distributed tracing, you’ll have everything you need to find and fix performance bottlenecks in no time. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial and get a sweet new T-Shirt. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. Your host is Tobias Macey and today I’m interviewing Christopher Bergh about DataKitchen and the rise of DataOps.

Interview

Introduction. How did you get involved in the area of data management? How do you define DataOps?

How does it compare to the practices encouraged by the DevOps movement? How does it relate to or influence the role of a data engineer?

How does a DataOps-oriented workflow differ from other existing approaches for building data platforms?
One of the aspects of DataOps that you call out is the practice of providing multiple environments as a platform for testing the various aspects of the analytics workflow in a non-production context. What are some of the techniques that are available for managing data in appropriate volumes across those deployments?
The practice of testing logic as code is fairly well understood and has a large set of existing tools. What have you found to be some of the most effective methods for testing data as it flows through a system?
One of the practices of DevOps is to create feedback loops that can be used to ensure that business needs are being met. What are the metrics that you track in your platform to define the value that is being created and how the various steps in the workflow are proceeding toward that goal?

In order to keep feedback loops fast it is necessary for tests to run quickly. How do you balance the need for larger quantities of data to be used for verifying scalability/performance against optimizing for cost and speed in non-production environments?

How does the DataKitchen platform simplify the process of operationalizing a data analytics workflow? As the need for rapid iteration and deployment of systems to capture, store, process, and analyze data becomes more prevalent how do you foresee that feeding back into the ways that the landscape of data tools are designed and developed?

Contact Info

LinkedIn @ChrisBergh on Twitter Email

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

DataOps Manifesto DataKitchen 2017: The Year Of DataOps Air Traffic Control Chief Data Officer (CDO) Gartner W. Edwards Deming DevOps Total Quality Management (TQM) Informatica Talend Agile Development Cattle Not Pets IDE (Integrated Development Environment)

Summary

As software lifecycles move faster, the database needs to be able to keep up. Practices such as version-controlled migration scripts and iterative schema evolution provide the necessary mechanisms to ensure that your data layer is as agile as your application. Pramod Sadalage saw the need for these capabilities during the early days of the introduction of modern development practices, and co-authored a book to codify a large number of patterns to aid practitioners. In this episode he reflects on the current state of affairs and how things have changed over the past 12 years.
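
For readers unfamiliar with the practice, the sketch below shows the basic shape of version-controlled schema migrations: numbered changes applied in order, with the current version recorded in the database so the script can be re-run safely. It is a minimal Python/SQLite illustration, not Flyway, Liquibase, or the book's own tooling, and the tables are hypothetical:

import sqlite3

MIGRATIONS = [
    (1, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE customers ADD COLUMN email TEXT"),
    (3, "CREATE INDEX idx_customers_email ON customers (email)"),
]

def migrate(conn: sqlite3.Connection) -> None:
    # Track which migration was applied last, then apply any newer ones in order.
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    current = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0] or 0
    for version, ddl in MIGRATIONS:
        if version > current:
            conn.execute(ddl)
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
            conn.commit()

if __name__ == "__main__":
    migrate(sqlite3.connect("app.db"))  # safe to re-run: already-applied versions are skipped

Each migration would normally live as its own file in version control, so schema changes are reviewed and deployed the same way application code is.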

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at dataengineeringpodcast.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your data pipelines or trying out the tools you hear about on the show. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. You can help support the show by checking out the Patreon page, which is linked from the site. To help other people find the show you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers. Your host is Tobias Macey and today I’m interviewing Pramod Sadalage about refactoring databases and integrating database design into an iterative development workflow.

Interview

Introduction. How did you get involved in the area of data management?
You first co-authored Refactoring Databases in 2006. What was the state of software and database system development at the time, and why did you find it necessary to write a book on this subject?
What are the characteristics of a database that make it more difficult to manage in an iterative context?
How does the practice of refactoring in the context of a database compare to that of software?
How has the prevalence of data abstractions such as ORMs or ODMs impacted the practice of schema design and evolution?
Is there a difference in strategy when refactoring the data layer of a system when using a non-relational storage system?
How has the DevOps movement and the increased focus on automation affected the state of the art in database versioning and evolution?
What have you found to be the most problematic aspects of databases when trying to evolve the functionality of a system?
Looking back over the past 12 years, what has changed in the areas of database design and evolution?

How has the landscape of tooling for managing and applying database versioning changed since you first wrote Refactoring Databases? What do you see as the biggest challenges facing us over the next few years?

Contact Info

Website pramodsadalage on GitHub @pramodsadalage on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Database Refactoring

Website Book

Thoughtworks Martin Fowler Agile Software Development XP (Extreme Programming) Continuous Integration

The Book Wikipedia

Test First Development DDL (Data Definition Language) DML (Data Modification Language) DevOps Flyway Liquibase DBMaintain Hibernate SQLAlchemy ORM (Object Relational Mapper) ODM (Object Document Mapper) NoSQL Document Database MongoDB OrientDB CouchBase CassandraDB Neo4j ArangoDB Unit Testing Integration Testing OLAP (On-Line Analytical Processing) OLTP (On-Line Transaction Processing) Data Warehouse Docker QA (Quality Assurance) HIPAA (Health Insurance Portability and Accountability Act) PCI DSS (Payment Card Industry Data Security Standard) Polyglot Persistence Toplink Java ORM Ruby on Rails ActiveRecord Gem

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA. Support Data Engineering Podcast.

The Sentient Enterprise

Mohan and Oliver have been very fortunate to have intimate views into the data challenges that face the largest organizations and institutions across every possible industry—and what they have been hearing about for some time is how the business needs to use data and analytics to their advantage. They continually hear the same issues, such as: We're spending valuable meeting time wondering why everyone's data doesn't match up. We can't leverage our economies of scale while remaining agile with data. We need self-serve apps that let the enterprise experiment with data and accelerate the development process. We need to get on a more predictive curve to ensure long-term success. To really address the data concerns of today's enterprise, they wanted to find a way to help enterprises achieve the success they seek. Not as a prescriptive process—but a methodology to become agile and leverage data and analytics to drive a competitive advantage. You know, it's amazing what can happen when two people with very different perspectives get together to solve a big problem. This evolutionary guide resulted from the a-ha moment between these two influencers at the top of their fields—one, an academic researcher and consultant, and the other, a longtime analytics practitioner and chief product officer at Teradata. Together, they created a powerful framework every type of business can use to connect analytic power, business practices, and human dynamics in ways that can transform what is currently possible.

Manage Your SAP Projects with SAP Activate

Dive into SAP Activate, a cutting-edge methodology for SAP S/4HANA implementation, designed to enhance your project management effectiveness. This book delivers a step-by-step introduction to the SAP Activate framework, covering Agile and Scrum approaches. You will learn how this framework facilitates achieving project objectives efficiently, providing you with the tools to streamline your SAP projects.

What this book will help me do: Understand the key components and significance of SAP S/4HANA. Learn the framework and pillars of SAP Activate for successful SAP implementation. Master the application of Agile and Scrum methodologies within SAP projects. Explore real-world case studies demonstrating SAP Activate in action. Develop a sample project using the SAP Activate framework to build hands-on expertise.

Author(s): Vinay Singh is a seasoned SAP consultant with extensive experience in SAP implementations across various industries. With a focus on methodical and actionable guidance, Vinay has crafted his writing to help readers excel in practical SAP implementation. His work is complemented by a rich understanding of Agile methodologies applied to SAP contexts.

Who is it for? This book is ideal for SAP professionals and consultants aspiring to efficiently implement and manage SAP projects using the SAP Activate approach. It is especially beneficial for those familiar with SAP HANA looking to transition from traditional waterfall methods to more agile frameworks. Readers seeking to enhance their project management skillset for SAP S/4HANA will find this book indispensable.

Essentials of Cloud Application Development on IBM Bluemix

Abstract This IBM® Redbooks® publication is based on the Presentations Guide of the course Essentials of Cloud Application Development on IBM Bluemix that was developed by the IBM Redbooks team in partnership with the IBM Skills Academy Program. This course is designed to teach university students the basic skills that are required to develop, deploy, and test cloud-based applications that use the IBM Bluemix® cloud services. The primary target audience for this course is university students in undergraduate computer science and computer engineering programs with no previous experience working in cloud environments. However, anyone new to cloud computing can also benefit from this course. After completing this course, you should be able to accomplish the following tasks:
Define cloud computing
Describe the factors that lead to the adoption of cloud computing
Describe the choices that developers have when creating cloud applications
Describe infrastructure as a service, platform as a service, and software as a service
Describe IBM Bluemix and its architecture
Identify the runtimes and services that IBM Bluemix offers
Describe IBM Bluemix infrastructure types
Create an application in IBM Bluemix
Describe the IBM Bluemix dashboard, catalog, and documentation features
Explain how the application route is used to test an application from the browser
Create services in IBM Bluemix
Describe how to bind services to an application in IBM Bluemix
Describe the environment variables that are used with IBM Bluemix services
Explain what IBM Bluemix organizations, domains, spaces, and users are
Describe how to create an IBM SDK for Node.js application that runs on IBM Bluemix
Explain how to manage your IBM Bluemix account with the Cloud Foundry CLI
Describe how to set up and use the IBM Bluemix plug-in for Eclipse
Describe the role of Node.js for server-side scripting
Describe IBM Bluemix DevOps Services and the capabilities of IBM DevOps Services
Identify the Web IDE features in IBM Bluemix DevOps
Describe how to connect a Git repository client to a Bluemix DevOps Services project
Explain the pipeline build and deploy processes that IBM Bluemix DevOps Services use
Describe how IBM Bluemix DevOps Services integrate with the IBM Bluemix cloud
Describe the agile planning tools in IBM Bluemix
Describe the characteristics of REST APIs
Explain the advantages of the JSON data format
Describe an example of REST APIs using Watson
Describe the main types of data services in IBM Bluemix
Describe the benefits of IBM Cloudant®
Explain how Cloudant databases and documents are accessed from IBM Bluemix
Describe how to use REST APIs to interact with a Cloudant database
Describe Bluemix mobile backend as a service (MBaaS) and the MBaaS architecture
Describe the Push Notifications service
Describe the App ID service
Describe the Kinetise service
Describe how to create Bluemix Mobile applications by using the MobileFirst Services Starter Boilerplate

The workshop materials were created in June 2017. Therefore, all IBM Bluemix features that are described in this Presentations Guide and IBM Bluemix user interfaces that are used in the examples are current as of June 2017.
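
To make the REST and JSON objectives above concrete, here is a brief Python sketch against a Cloudant (CouchDB-compatible) database: create a database, store a JSON document, and read it back over HTTP. The account URL and credentials are placeholders, and the calls assume Cloudant's documented HTTP API (PUT to create a database, POST a document, GET by id):

import requests

BASE = "https://ACCOUNT.cloudant.com"   # placeholder account URL
AUTH = ("API_KEY", "API_SECRET")        # placeholder credentials
DB = "greetings"

# Create the database (returns 412 if it already exists).
requests.put(f"{BASE}/{DB}", auth=AUTH, timeout=10)

# Store a JSON document.
doc = {"message": "Hello from Bluemix", "lang": "en"}
created = requests.post(f"{BASE}/{DB}", json=doc, auth=AUTH, timeout=10).json()

# Read it back by its generated id.
fetched = requests.get(f"{BASE}/{DB}/{created['id']}", auth=AUTH, timeout=10).json()
print(fetched["message"])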

IBM z14 Technical Introduction

Abstract This IBM® Redpaper Redbooks® publication introduces the latest IBM Z platform, the IBM z14®. It includes information about the Z environment and how it helps integrate data and transactions more securely, and can infuse insight for faster and more accurate business decisions. The z14 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to the digital era and the trust economy. These capabilities include: - Securing data with pervasive encryption - Transforming a transactional platform into a data powerhouse - Getting more out of the platform with IT Operational Analytics - Providing resilience with key to zero downtime - Accelerating digital transformation with agile service delivery - Revolutionizing business processes - Blending open source and Z technologies This book explains how this system uses both new innovations and traditional Z strengths to satisfy growing demand for cloud, analytics, and security. With the z14 as the base, applications can run in a trusted, reliable, and secure environment that both improves operations and lessens business risk.

IBM Spectrum Accelerate Deployment, Usage, and Maintenance

Abstract This edition applies to IBM® Spectrum Accelerate V11.5.4. IBM Spectrum Accelerate™, a member of IBM Spectrum Storage™, is an agile, software-defined storage solution for enterprise and cloud that builds on the customer-proven and mature IBM XIV® storage software. The key characteristic of Spectrum Accelerate is that it can be easily deployed and run on purpose-built or existing hardware that is chosen by the customer. IBM Spectrum Accelerate enables rapid deployment of high-performance and scalable block data storage infrastructure over commodity hardware on-premises or off-premises. This IBM Redbooks® publication provides a broad understanding of IBM Spectrum Accelerate. The book introduces Spectrum Accelerate and describes planning and preparation that are essential for a successful deployment of the solution. The deployment is described through a step-by-step approach, by using a graphical user interface (GUI) based method or a simple command-line interface (CLI) based procedure. Chapters in this book describe the logical configuration of the system, host support and business continuity functions, and migration. Although it makes many references to the XIV storage software, the book also emphasizes where IBM Spectrum Accelerate differs from XIV. Finally, a substantial portion of the book is dedicated to maintenance and troubleshooting to provide detailed guidance for the customer support personnel.

Principles of Data Wrangling

A key task that any aspiring data-driven organization needs to learn is data wrangling, the process of converting raw data into something truly useful. This practical guide provides business analysts with an overview of various data wrangling techniques and tools, and puts the practice of data wrangling into context by asking, "What are you trying to do and why?" Wrangling data consumes roughly 50-80% of an analyst’s time before any kind of analysis is possible. Written by key executives at Trifacta, this book walks you through the wrangling process by exploring several factors—time, granularity, scope, and structure—that you need to consider as you begin to work with data. You’ll learn a shared language and a comprehensive understanding of data wrangling, with an emphasis on recent agile analytic processes used by many of today’s data-driven organizations. You'll learn to appreciate the importance—and the satisfaction—of wrangling data the right way; understand what kind of data is available; choose which data to use and at what level of detail; meaningfully combine multiple sources of data; and decide how to distill the results to a size and shape that can drive downstream analysis.
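
As a small taste of those factors in practice, the Python sketch below combines two hypothetical raw sources, reconciles their granularity, and distills a monthly summary suitable for downstream analysis. The file names and columns are made up, and this is generic pandas rather than Trifacta's tooling:

import pandas as pd

# Raw inputs at different granularities: one row per order vs. one row per customer.
orders = pd.read_csv("orders.csv", parse_dates=["order_date"])  # order_id, customer_id, order_date, amount
customers = pd.read_csv("customers.csv")                        # customer_id, region

# Scope and structure: keep only the fields needed and join the sources.
merged = orders.merge(customers[["customer_id", "region"]], on="customer_id", how="left")

# Time and granularity: roll daily orders up to monthly revenue per region.
monthly = (
    merged
    .assign(month=merged["order_date"].dt.to_period("M"))
    .groupby(["region", "month"], as_index=False)["amount"]
    .sum()
    .rename(columns={"amount": "revenue"})
)

print(monthly.head())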

Analytics

For years, organizations have struggled to make sense of their data. IT projects designed to provide employees with dashboards, KPIs, and business-intelligence tools often take a year or more to reach the finish line...if they get there at all. This has always been a problem. Today, though, it's downright unacceptable. The world changes faster than ever. Speed has never been more important. By adhering to antiquated methods, firms fail to see nascent trends—and act upon them—until it's too late. But what if the process of turning raw data into meaningful insights didn't have to be so painful, time-consuming, and frustrating? What if there were a better way to do analytics? Fortunately, you're in luck... Analytics: The Agile Way is the eighth book from award-winning author and Arizona State University professor Phil Simon. Analytics: The Agile Way demonstrates how progressive organizations such as Google, Nextdoor, and others approach analytics in a fundamentally different way. They are applying the same Agile techniques that software developers have employed for years, and they have abandoned large batches in favor of smaller ones...and their results will astonish you. Through a series of case studies and examples, Analytics: The Agile Way demonstrates the benefits of this new analytics mind-set: superior access to information, quicker insights, and the ability to spot trends far ahead of your competitors.