Understanding Data Analytics in Information Security with @JayJarome, @BitSight

2017-09-19 · The Future of Data Podcast | conversation with leaders, influencers, and change makers in the World of Data & Analytics Listen

podcast_episode

by Jay Jarome (BitSight)

Analytics Big Data Cyber Security

In this Podcast, Jay talks about the landscape of Information Security and how businesses are preparing to address their cybersecurity challenges. This is a great podcast for anyone interested in learning about best practices when it comes to managing infrastructure security for their organization.

Timeline: 0:29 Jay's journey. 3:18 What's Scientia Institute? 8:28 The book Data-Driven Security. 10:42 The aha moment while writing the book. 11:53 High points of Jay's book. 14:08 Security level of a typical business today. 16:22 Thoughts on how companies can understand risk. 19:50 Balancing mitigation of threat vs. business continuity. 25:33 Treating security as a financial problem. 27:25 Security predictability and insurance. 28:44 Who should take responsibility for risk and security? 30:15 Measuring the risk of company infrastructure. 31:33 Tackling standards and regulations. 33:04 The concept of best practices. 34:38 The maturity of the model in the security side of businesses. 37:55 The lower limit and higher limit of security. 39:50 Resources to learn about security. 41:11 Who's a good security candidate? 42:20 Jay's favorite read. 43:36 Examples of companies who're doing well in security. 45:28 What's next in the world of security. 47:40 Closing remarks.

Podcast link: https://futureofdata.org/understanding-data-analytics-information-security-jayjarome-bitsight/

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Wanna Join? If you or any you know wants to join in, Register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords: FutureOfData Data Analytics Leadership Podcast Big Data Strategy

Data Warehousing with Greenplum

2017-07-27 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Marshall Presser

Analytics BI Data Engineering DWH RDBMS Cyber Security SQL data data-engineering data-warehouse storage-repositories

Relational databases haven’t gone away, but they are evolving to integrate messy, disjointed unstructured data into a cleansed repository for analytics. With the execution of massively parallel processing (MPP), the latest generation of analytic data warehouses is helping organizations move beyond business intelligence to processing a variety of advanced analytic workloads. These MPP databases expose their power with the familiarity of SQL. This report introduces the Greenplum Database, recently released as an open source project by Pivotal Software. Lead author Marshall Presser of Pivotal Data Engineering takes you through the Greenplum approach to data analytics and data-driven decisions, beginning with Greenplum’s shared-nothing architecture. You’ll explore data organization and storage, data loading, running queries, as well as performing analytics in the database. You’ll learn: How each networked node in Greenplum’s architecture features an independent operating system, memory, and storage Four deployment options to help you balance security, cost, and time to usability Ways to organize data, including distribution, storage, partitioning, and loading How to use Apache MADlib for in-database analytics, and GPText to process and analyze free-form text Tools for monitoring, managing, securing, and optimizing query responses available in the Pivotal Greenplum commercial database

Mastering Apache Spark 2.x - Second Edition

2017-07-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Romeo Kienzler

AI/ML Analytics Big Data Cloud Computing IBM Kubernetes Scala Spark SQL apache-spark data data-engineering

Mastering Apache Spark 2.x is the essential guide to harnessing the power of big data processing. Dive into real-time data analytics, machine learning, and cluster computing using Apache Spark's advanced features and modules like Spark SQL and MLlib. What this Book will help me do Gain proficiency in Spark's batch and real-time data processing with SparkSQL. Master techniques for machine learning and deep learning using SparkML and SystemML. Understand the principles of Spark's graph processing with GraphX and GraphFrames. Learn to deploy Apache Spark efficiently on platforms like Kubernetes and IBM Cloud. Optimize Spark cluster performance by configuring parameters effectively. Author(s) Romeo Kienzler is a seasoned professional in big data and machine learning technologies. With years of experience in cloud-based distributed systems, Romeo brings practical insights into leveraging Apache Spark. He combines his deep technical expertise with a clear and engaging writing style. Who is it for? This book is tailored for intermediate Apache Spark users eager to deepen their knowledge in Spark 2.x's advanced features. Ideal for data engineers and big data professionals seeking to enhance their analytics pipelines with Spark. A basic understanding of Spark and Scala is necessary. If you're aiming to optimize Spark for real-world applications, this book is crafted for you.

#FutureOfData with Robin Thottungal, Chief Data Scientist at EPA

2017-07-13 · The Future of Data Podcast | conversation with leaders, influencers, and change makers in the World of Data & Analytics Listen

podcast_episode

by Robin Thottungal (EPA)

Analytics Big Data Data Management Data Science

In this podcast, Robin discussed how an analytics organization functions in a collaborative culture. He shed some light on preparing a robust framework while working on policy rich setup. This talk is a must for anyone building an analytics organization with a culture-rich or policy rich environment.

Timeline: 0:29 Robin's journey. 6:02 Challenges in working as a chief data scientist. 9:50 Two breeds of data scientists. 13:38 Introducing data science into large companies. 16:57 Creating a center of excellence with data. 19:52 Challenges in working with a government agency. 22:57 Creating a self-serving system. 26:29 Defining chief data officer, chief analytics officer, chief data scientist. 28:28 Designing an architecture for a rapidly changing company culture. 31:39 Future of analytics and data leaders. 35:47 Art of doing business and science of doing business. 42:26 Perfect data science hire. 45:08 Closing remarks.

Podcast link: https://futureofdata.org/futureofdata-with-robin-thottungal-chief-data-scientist-at-epa/

Here's Robin's bio on his current EPA Role: - Leading the Data Analytics effort of 15,000+ member agency through providing strategic vision, program development, evangelizing the value of data-driven decision making, bringing a lean-start up approach to the public sector & building advanced data analytics platform capable of real-time/batch analysis.

-Serving as Chief data scientist for the agency, including directing, coordinating, and overseeing the division’s leadership of EPA’s multimedia data analytics, visualization, and predictive analysis work along with related tools, application development, and services.

-Develop and oversee the implementation of Agency policy on integration analysis of environmental data, including multimedia analysis and assessments of environmental quality, status, and trends.

-Develop, market, and implement tactical and strategic plans for the Agency’s data management, advanced data analytics, and predictive analysis work.

-Lead crossfederal, state, tribal, and local government data partnerships as well as information partnerships with other entities.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Wanna Join? If you or any you know wants to join in, Register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Practical Predictive Analytics

2017-06-30 · O'Reilly Data Science Books O'Reilly Amazon

book

by Ralph Winters

Analytics Big Data Data Science Databricks Marketing Spark business-intelligence data data-science prescriptive-analytics

Dive into the world of predictive analytics with 'Practical Predictive Analytics.' This comprehensive guide walks you through analyzing current and historical data to predict future outcomes. Using tools like R and Spark, you will master practical skills, solve real-world challenges, and apply predictive analytics across domains like marketing, healthcare, and retail. What this Book will help me do Learn the six steps for successfully implementing predictive analytics projects. Acquire practical skills in data cleaning, input, and model deployment using tools like R and Spark. Understand core predictive analytics algorithms and their applications in various industries. Apply data analytics techniques to solve problems in fields such as healthcare and marketing. Master methods for handling big data analytics using Databricks and Spark for effective prediction. Author(s) The author, None Winters, is an experienced data scientist and technical educator. With extensive background in predictive analytics, Winters specializes in applying statistical methods and techniques to real-world consultation scenarios. Winters brings a practical and accessible approach to this text, ensuring that learners can follow along and apply their newfound expertise effectively. Who is it for? This book is ideal for statisticians and analysts with some programming background in languages like R, who want to master predictive analytics skills. It caters to intermediate learners who aim to enhance their ability to solve complex analytical problems. Whether you're looking to advance your career or improve your proficiency in data science, this book will serve as a valuable resource for learning and growth.

Apache Spark 2.x Cookbook

2017-05-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Rishi Yadav (Roost.ai)

AI/ML Analytics Big Data Cloud Computing Kafka Scala Spark Data Streaming apache-spark data data-engineering

Discover how to harness the power of Apache Spark 2.x for your Big Data processing projects. In this book, you will explore over 70 cloud-ready recipes that will guide you to perform distributed data analytics, structured streaming, machine learning, and much more. What this Book will help me do Effectively install and configure Apache Spark with various cluster managers and platforms. Set up and utilize development environments tailored for Spark applications. Operate on schema-aware data using RDDs, DataFrames, and Datasets. Perform real-time streaming analytics with sources such as Apache Kafka. Leverage MLlib for supervised learning, unsupervised learning, and recommendation systems. Author(s) None Yadav is a seasoned data engineer with a deep understanding of Big Data tools and technologies, particularly Apache Spark. With years of experience in the field of distributed computing and data analysis, Yadav brings practical insights and techniques to enrich the learning experience of readers. Who is it for? This book is ideal for data engineers, data scientists, and Big Data professionals who are keen to enhance their Apache Spark 2.x skills. If you're working with distributed processing and want to solve complex data challenges, this book addresses practical problems. Note that a basic understanding of Scala is recommended to get the most out of this resource.

On a Mission in Medicine: Using Data for Personalized Treatment

2017-05-25 · Data Made to Matter by MIT Sloan School of Management Listen

podcast_episode

by Dimitris Bertsimas (MIT Sloan School of Management)

Analytics

Data-based algorithms are personalizing medicine. A family history with diabetes led MIT Sloan professor Dimitris Bertsimas to make breakthroughs in treatment. We spoke with Dimitris about why he works on diabetes, and how he’s using his expertise in data analytics to help.

Nathaniel Lin, Chief Data Scientist, NFPA

2017-05-18 · The Future of Data Podcast | conversation with leaders, influencers, and change makers in the World of Data & Analytics Listen

podcast_episode

by Nathaniel Lin (National Fire Protection Association)

AI/ML Analytics Big Data Data Modelling Data Science IBM Marketing

In this session, Nathaniel discussed how NFPA uses data to empower fire stations worldwide with data-driven insights. We discussed the future of fire in this tech-driven world.

Timeline: 0:29 Nathaniel's journey. 3:50 What's NFPA? 6:12 Nathaniel's role in NFPA. 8:50 Nathaniel's book. 12:21 The data science team at NFPA. 15:01 Working with the government. 18:50 Interesting use cases of NFPA. 25:49 Fining tuning the data model at NFPA. 28:11 NFPA alliance with the Insurance industry. 31:33 Recruiting an idea concept or tool. 33:16 How to approach NFPA? 36:03 Nathaniel's role: in facing or outfacing? 40:41 Suggestions for Non-profits to build a data science practice. 43:49 Putting together a data science team. 46:34 Predicting the fire outcome. 48:11 Closing remarks.

Podcast link: https://futureofdata.org/futureofdata-nathaniel-lin-chief-data-scientist-nfpa/

Bio- Nathaniel Lin has an extensive background in business and marketing analytics with strategic roles in both start-ups and Fortune 500 companies. He offers the National Fire Protection Association (NFPA) agency and client perspective gleaned from his work at Fidelity Investments, OgilvyOne, Aspen Marketing, and IBM Worldwide. During his tenure with IBM Asia Pacific, he also built and led a marketing analytics group that won a DMA/NCDM Gold Award in B2B Marketing.

Lin served as an adjunct professor of business analytics at Boston College and Georgia Tech College of Management. He is also the founder of two LinkedIn groups related to big data analytics and is the 2014 author of Applied Business Analytics – Integrating Business Process, Big Data, and Advanced Analytics. Lin has an MBA in Management of Technology/Sloan Fellows from MIT Sloan School of Management and earned both a Ph.D. In Environmental Engineering and an Honors B.S from Birmingham University in England.

Founded in 1896, NFPA is a global, nonprofit organization devoted to eliminating death, injury, property, and economic loss due to fire, electrical and related hazards. The association delivers information and knowledge through more than 300 consensus codes and standards, research, training, education, outreach, and advocacy; and partner with others who share an interest in furthering the NFPA mission. For more information, visit www.nfpa.org.

The podcast is sponsored by: TAO.ai(https://tao.ai), Artificial Intelligence Driven Career Coach

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Wanna Join? If you or any you know wants to join in, Register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Python: Data Analytics and Visualization

2017-03-31 · O'Reilly Data Visualization Books O'Reilly Amazon

book

by Martin Czygan , Ashish Kumar (Grainite) , Kirthi Raman , Phuong Vo.T.H

AI/ML Analytics API DataViz IoT Matplotlib NumPy Pandas Python Scikit-learn SciPy programming-languages +1 more

Understand, evaluate, and visualize data About This Book Learn basic steps of data analysis and how to use Python and its packages A step-by-step guide to predictive modeling including tips, tricks, and best practices Effectively visualize a broad set of analyzed data and generate effective results Who This Book Is For This book is for Python Developers who are keen to get into data analysis and wish to visualize their analyzed data in a more efficient and insightful manner. What You Will Learn Get acquainted with NumPy and use arrays and array-oriented computing in data analysis Process and analyze data using the time-series capabilities of Pandas Understand the statistical and mathematical concepts behind predictive analytics algorithms Data visualization with Matplotlib Interactive plotting with NumPy, Scipy, and MKL functions Build financial models using Monte-Carlo simulations Create directed graphs and multi-graphs Advanced visualization with D3 In Detail You will start the course with an introduction to the principles of data analysis and supported libraries, along with NumPy basics for statistics and data processing. Next, you will overview the Pandas package and use its powerful features to solve data-processing problems. Moving on, you will get a brief overview of the Matplotlib API .Next, you will learn to manipulate time and data structures, and load and store data in a file or database using Python packages. You will learn how to apply powerful packages in Python to process raw data into pure and helpful data using examples. You will also get a brief overview of machine learning algorithms, that is, applying data analysis results to make decisions or building helpful products such as recommendations and predictions using Scikit-learn. After this, you will move on to a data analytics specialization - predictive analytics. Social media and IOT have resulted in an avalanche of data. You will get started with predictive analytics using Python. You will see how to create predictive models from data. You will get balanced information on statistical and mathematical concepts, and implement them in Python using libraries such as Pandas, scikit-learn, and NumPy. You'll learn more about the best predictive modeling algorithms such as Linear Regression, Decision Tree, and Logistic Regression. Finally, you will master best practices in predictive modeling. After this, you will get all the practical guidance you need to help you on the journey to effective data visualization. Starting with a chapter on data frameworks, which explains the transformation of data into information and eventually knowledge, this path subsequently cover the complete visualization process using the most popular Python libraries with working examples This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products: Getting Started with Python Data Analysis, Phuong Vo.T.H &Martin Czygan Learning Predictive Analytics with Python, Ashish Kumar Mastering Python Data Visualization, Kirthi Raman Style and approach The course acts as a step-by-step guide to get you familiar with data analysis and the libraries supported by Python with the help of real-world examples and datasets. It also helps you gain practical insights into predictive modeling by implementing predictive-analytics algorithms on public datasets with Python. The course offers a wealth of practical guidance to help you on this journey to data visualization

Learning Apache Spark 2

2017-03-28 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Muhammad Asif Abbasi

AI/ML Analytics Big Data Scala Spark SQL Data Streaming apache-spark data data-engineering

Dive into the world of Big Data with "Learning Apache Spark 2". This book introduces you to the powerful Apache Spark framework, tailored for real-time data analytics and machine learning. Through practical examples and real-world use-cases, you'll gain hands-on experience in leveraging Spark's capabilities for your data processing needs. What this Book will help me do Master the fundamentals of Apache Spark 2 and its new features. Effectively use Spark SQL, MLlib, RDDs, GraphX, and Spark Streaming to tackle real-world challenges. Gain skills in data processing, transformation, and analysis with Spark. Deploy and operate your Spark applications in clustered environments. Develop your own recommendation engines and predictive analytics models with Spark. Author(s) None Abbasi brings a wealth of expertise in Big Data technologies with a keen focus on simplifying complex concepts for learners. With substantial experience working in data processing frameworks, their approach to teaching creates an engaging and practical learning experience. With "Learning Apache Spark 2", None empowers readers to confidently tackle challenges in Big Data processing and analytics. Who is it for? This book is ideal for aspiring Big Data professionals seeking an accessible introduction to Apache Spark. Beginners in Spark will find step-by-step guidance, while those familiar with earlier versions will appreciate the insights into Spark 2's new features. Familiarity with Big Data concepts and Scala programming is recommended for optimal understanding.

Effective Business Intelligence with QuickSight

2017-03-10 · O'Reilly Business Intelligence Books O'Reilly Amazon

book

by Rajesh Nadipalli

Analytics AWS Amazon RDS BI Big Data Cloud Computing DataViz QuickSight S3 Cyber Security amazon-quicksight analytics-platforms +2 more

Effective Business Intelligence with QuickSight introduces you to Amazon QuickSight, a modern BI tool that enables interactive visualizations powered by the cloud. With comprehensive tutorials, you'll master how to load, prepare, and visualize your data for actionable insights. This book provides real-world examples to showcase how QuickSight integrates into the AWS ecosystem. What this Book will help me do Understand how to effectively use Amazon QuickSight for business intelligence. Learn how to connect QuickSight to data sources like S3, RDS, and more. Create interactive dashboards and visualizations with QuickSight tools. Gain expertise in managing users, permissions, and data security in QuickSight. Execute a real-world big data project using AWS Data Lakes and QuickSight. Author(s) None Nadipalli is a seasoned data architect with extensive experience in cloud computing and business intelligence. With expertise in the AWS ecosystem, she has worked on numerous large-scale data analytics projects. Her writing focuses on providing practical knowledge through easy-to-follow examples and actionable insights. Who is it for? This book is ideal for business intelligence architects, developers, and IT executives seeking to leverage Amazon QuickSight. It is suited for readers with foundational knowledge of AWS who want to enhance their capabilities in BI and data visualization. If your goal is to modernize your business intelligence systems and explore advanced analytics, this book is perfect for you.

Beginning Power BI: A Practical Guide to Self-Service Data Analytics with Excel 2016 and Power BI Desktop, Second Edition

2017-02-28 · O'Reilly Data Science Books O'Reilly Amazon

book

by Dan Clark

Analytics BI Microsoft Power BI business-intelligence data data-science microsoft-power-platform power-bi

Analyze your company's data quickly and easily using Microsoft's latest tools. You will learn to build scalable and robust data models to work from, clean and combine different data sources effectively, and create compelling visualizations and share them with your colleagues. Author Dan Clark takes you through each topic using step-by-step activities and plenty of screen shots to help familiarize you with the tools. This second edition includes new material on advanced uses of Power Query, along with the latest user guidance on the evolving Power BI platform. Beginning Power BI is your hands-on guide to quick, reliable, and valuable data insight. What You'll Learn Simplify data discovery, association, and cleansing Build solid analytical data models Create robust interactive data presentations Combine analytical and geographic data in map-based visualizations Publish and share dashboards and reports Who This Book Is For Business analysts, database administrators, developers, and other professionals looking to better understand and communicate with data

Big Data Visualization

2017-02-28 · O'Reilly Data Visualization Books O'Reilly Amazon

book

by James D. Miller , Dalong Chen

Analytics Big Data Data Quality DataViz Hadoop IBM Python Splunk Tableau data data-science data-science-tasks +1 more

Dive into 'Big Data Visualization' and uncover how to tackle the challenges of visualizing vast quantities of complex data. With a focus on scalable and dynamic techniques, this guide explores the nuances of effective data analysis. You'll master tools and approaches to display, interpret, and communicate data in impactful ways. What this Book will help me do Understand the fundamentals of big data visualization, including unique challenges and solutions. Explore practical techniques for using D3 and Python to visualize and detect anomalies in big data. Learn to leverage dashboards like Tableau to present data insights effectively. Address and improve data quality issues to enhance analysis accuracy. Gain hands-on experience with real-world use cases for tools such as Hadoop and Splunk. Author(s) James D. Miller is an IBM-certified expert specializing in data analytics and visualization. With years of experience handling massive datasets and extracting actionable insights, he is dedicated to sharing his expertise. His practical approach is evident in how he combines tool mastery with a clear understanding of data complexities. Who is it for? This book is designed for data analysts, data scientists, and others involved in interpreting and presenting big datasets. Whether you are a beginner looking to understand big data visualization or an experienced professional seeking advanced tools and techniques, this guide suits your needs perfectly. A foundational knowledge in programming languages like R and big data platforms such as Hadoop is recommended to maximize your learning.

Mastering Elastic Stack

2017-02-28 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ravi Kumar Gupta , Yuvraj Gupta

Analytics ELK Kibana Logstash Cyber Security data data-engineering elastic-stack-elk-stack elastic stack (elk stack) elasticsearch search

Mastering Elastic Stack is your complete guide to advancing your data analytics expertise using the ELK Stack. With detailed coverage of Elasticsearch, Logstash, Kibana, Beats, and X-Pack, this book equips you with the skills to process and analyze any type of data efficiently. Through practical examples and real-world scenarios, you'll gain the ability to build end-to-end pipelines and create insightful dashboards. What this Book will help me do Build and manage log pipelines using Logstash, Beats, and Elasticsearch for real-time analytics. Develop advanced Kibana dashboards to visualize and interpret complex datasets. Efficiently utilize X-Pack features for alerting, monitoring, and security in the Elastic Stack. Master plugin customization and deployment for a tailored Elastic Stack environment. Apply Elastic Stack solutions to real-world cases for centralized logging and actionable insights. Author(s) The authors, None Kumar Gupta and None Gupta, are experienced technologists who have spent years working at the forefront of data processing and analytics. They are well-versed in Elasticsearch, Logstash, Kibana, and the Elastic ecosystem, having worked extensively in enterprise environments where these tools have transformed operations. Their passion for teaching and thorough understanding of the tools culminate in this comprehensive resource. Who is it for? The ideal reader is a developer already familiar with Elasticsearch, Logstash, and Kibana who wants to deepen their understanding of the stack. If you're involved in creating scalable data pipelines, analyzing complex datasets, or looking to implement centralized logging solutions in your work, this book is an excellent resource. It bridges the gap from intermediate to expert knowledge, allowing you to use the Elastic Stack effectively in various scenarios. Whether you are transitioning from a beginner or enhancing your skill set, this book meets your needs.

Scala: Guide for Data Science Professionals

2017-02-24 · O'Reilly Data Science Books O'Reilly Amazon

book

by Patrick R. Nicolas , Pascal Bugnion , Arun Manivannan

AI/ML Analytics Data Engineering Data Science HDFS Hive JavaScript NoSQL Scala Spark SQL Data Streaming +3 more

Scala will be a valuable tool to have on hand during your data science journey for everything from data cleaning to cutting-edge machine learning About This Book Build data science and data engineering solutions with ease An in-depth look at each stage of the data analysis process — from reading and collecting data to distributed analytics Explore a broad variety of data processing, machine learning, and genetic algorithms through diagrams, mathematical formulations, and source code Who This Book Is For This learning path is perfect for those who are comfortable with Scala programming and now want to enter the field of data science. Some knowledge of statistics is expected. What You Will Learn Transfer and filter tabular data to extract features for machine learning Read, clean, transform, and write data to both SQL and NoSQL databases Create Scala web applications that couple with JavaScript libraries such as D3 to create compelling interactive visualizations Load data from HDFS and HIVE with ease Run streaming and graph analytics in Spark for exploratory analysis Bundle and scale up Spark jobs by deploying them into a variety of cluster managers Build dynamic workflows for scientific computing Leverage open source libraries to extract patterns from time series Master probabilistic models for sequential data In Detail Scala is especially good for analyzing large sets of data as the scale of the task doesn’t have any significant impact on performance. Scala’s powerful functional libraries can interact with databases and build scalable frameworks — resulting in the creation of robust data pipelines. The first module introduces you to Scala libraries to ingest, store, manipulate, process, and visualize data. Using real world examples, you will learn how to design scalable architecture to process and model data — starting from simple concurrency constructs and progressing to actor systems and Apache Spark. After this, you will also learn how to build interactive visualizations with web frameworks. Once you have become familiar with all the tasks involved in data science, you will explore data analytics with Scala in the second module. You’ll see how Scala can be used to make sense of data through easy to follow recipes. You will learn about Bokeh bindings for exploratory data analysis and quintessential machine learning with algorithms with Spark ML library. You’ll get a sufficient understanding of Spark streaming, machine learning for streaming data, and Spark graphX. Armed with a firm understanding of data analysis, you will be ready to explore the most cutting-edge aspect of data science — machine learning. The final module teaches you the A to Z of machine learning with Scala. You’ll explore Scala for dependency injections and implicits, which are used to write machine learning algorithms. You’ll also explore machine learning topics such as clustering, dimentionality reduction, Naïve Bayes, Regression models, SVMs, neural networks, and more. This learning path combines some of the best that Packt has to offer into one complete, curated package. It includes content from the following Packt products: Scala for Data Science, Pascal Bugnion Scala Data Analysis Cookbook, Arun Manivannan Scala for Machine Learning, Patrick R. Nicolas Style and approach A complete package with all the information necessary to start building useful data engineering and data science solutions straight away. It contains a diverse set of recipes that cover the full spectrum of interesting data analysis tasks and will help you revolutionize your data analysis skills using Scala. Downloading the example code for this book. You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the code file.

Global Business By The Big Analytics

2017-02-17 · The Future of Data Podcast | conversation with leaders, influencers, and change makers in the World of Data & Analytics Listen

podcast_episode

by Vishal Kumar (AnalyticsWeek)

AI/ML Analytics Big Data

Wednesday at 9 AM Pacific

Update: Talk is available @ https://www.voiceamerica.com/episode/97300/the-quantum-disruption-in-global-business-driven-by-the-big-analytics

The Quantum disruption in Global Business driven by The Big Analytics Listen to Vishal Kumar, An Author, Innovator, and a Mentor in discussion on one of the most important and relevant subjects of the modern times: The Big Analytics, and how it is changing the landscape of Global Business

Wednesday at 9 AM Pacific Time on VoiceAmerica Business Channel

Featured Guest

Vishal Kumar

Vishal Kumar is CEO & President of AnalyticsWeek. He is a leading advocate for data-driven decision making. He is rated as top 100 global influencers to follow in data analytics by leading research organizations. He has published two books on the topics of analytics. Currently, his work involves using Artificial Intelligence to prepare the workforce for the future. Vishal has been a keynote speaker at various international conferences. He sits as advisor to various analytics startups.

Originally Posted @ VoiceAmerica

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers and lead practitioners to come on show and discuss their journey in creating the data driven future.

Wanna Join? If you or any you know wants to join in, Register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords: FutureOfData Data Analytics Leadership Podcast Big Data Strategy

Learning Kibana 5.0

2017-02-15 · O'Reilly Data Science Books O'Reilly Amazon

book

by Bahaaldine Azarmi

Analytics Dashboard DataViz ELK Kibana Logstash data data-science data-science-tasks data-visualization

Learning Kibana 5.0 is your gateway to mastering the art of data visualization using the powerful features of the Kibana platform. This book guides you through the process of creating stunning interactive dashboards and making data-driven insights accessible with real-time visualizations. Whether you're new to the Elastic stack or seeking to refine your expertise, this book equips you to harness Kibana's full potential. What this Book will help me do Build robust, real-time dashboards in Kibana to visualize complex datasets efficiently. Leverage Timelion to perform time-series data analysis and create metrics-based dashboards. Explore advanced analytics using the Graph plugin to uncover relationships and correlations in data. Learn how to create and deploy custom plugins to tailor Kibana to specific project needs. Understand how to use the Elastic stack to monitor, analyze, and optimize various types of data flows. Author(s) Bahaaldine Azarmi is a seasoned expert in the Elastic stack, known for his dedication to making complex technical topics approachable and practical. With years of experience in data analytics and software development, Bahaaldine shares not only his technical expertise but also his passion for helping professionals achieve their goals through clear, actionable guidance. His writing emphasizes hands-on learning and practical application. Who is it for? This book is perfect for developers, data visualization engineers, and data scientists who aim to hone their skills in data visualization and interactive dashboard development. It assumes a basic understanding of Elasticsearch and Logstash to maximize its practicality. If you aim to advance your career by learning how to optimize data architecture and solve real-world problems using the Elastic stack, this book is ideal for you.

Tabular Modeling with SQL Server 2016 Analysis Services Cookbook

2017-01-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Derek Wilson

Analytics BI DAX SQL SSAS data data-engineering microsoft-sql-server relational-databases

With "Tabular Modeling with SQL Server 2016 Analysis Services Cookbook," you'll discover how to harness the full potential of the latest Tabular models in SQL Server Analysis Services (SSAS). This practical guide equips data professionals with the tools, techniques, and knowledge to optimize data analytics and deliver fast, reliable, and impactful business insights. What this Book will help me do Understand the fundamentals of Tabular modeling and its advantages over traditional methods. Use SQL Server 2016 SSAS features to build and deploy Tabular models tailored to business needs. Master DAX for creating powerful calculated fields and optimized measures. Administer and secure your models effectively, ensuring robust BI solutions. Optimize performance and explore advanced features in Tabular solutions for maximum efficiency. Author(s) None Wilson is an experienced SQL BI professional with a strong background in database modeling and analytics. With years of hands-on experience in developing BI solutions, Wilson takes a practical and straightforward teaching approach. Their guidance in this book makes the complex topics of Tabular modeling and SSAS accessible to both seasoned professionals and newcomers to the field. Who is it for? This book is tailored for SQL BI professionals, database architects, and data analysts aiming to leverage Tabular models in SQL Server Analysis Services. It caters to those familiar with database management and basic BI concepts who are eager to improve their analysis solutions. It's a valuable resource if you aim to gain expertise in using tabular modeling for business intelligence.

Dask with Matthew Rocklin - Episode 2

2017-01-22 · Data Engineering Podcast Listen

podcast_episode

by Matthew Rocklin , Tobias Macey

Airflow Analytics Big Data Data Engineering GitHub Hadoop Kubernetes Luigi NumPy Pandas Python Spark

Summary

There is a vast constellation of tools and platforms for processing and analyzing your data. In this episode Matthew Rocklin talks about how Dask fills the gap between a task oriented workflow tool and an in memory processing framework, and how it brings the power of Python to bear on the problem of big data.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. You can help support the show by checking out the Patreon page which is linked from the site. To help other people find the show you can leave a review on iTunes, or Google Play Music, and tell your friends and co-workers Your host is Tobias Macey and today I’m interviewing Matthew Rocklin about Dask and the Blaze ecosystem.

Interview with Matthew Rocklin

Introduction How did you get involved in the area of data engineering? Dask began its life as part of the Blaze project. Can you start by describing what Dask is and how it originated? There are a vast number of tools in the field of data analytics. What are some of the specific use cases that Dask was built for that weren’t able to be solved by the existing options? One of the compelling features of Dask is the fact that it is a Python library that allows for distributed computation at a scale that has largely been the exclusive domain of tools in the Hadoop ecosystem. Why do you think that the JVM has been the reigning platform in the data analytics space for so long? Do you consider Dask, along with the larger Blaze ecosystem, to be a competitor to the Hadoop ecosystem, either now or in the future? Are you seeing many Hadoop or Spark solutions being migrated to Dask? If so, what are the common reasons? There is a strong focus for using Dask as a tool for interactive exploration of data. How does it compare to something like Apache Drill? For anyone looking to integrate Dask into an existing code base that is already using NumPy or Pandas, what does that process look like? How do the task graph capabilities compare to something like Airflow or Luigi? Looking through the documentation for the graph specification in Dask, it appears that there is the potential to introduce cycles or other bugs into a large or complex task chain. Is there any built-in tooling to check for that before submitting the graph for execution? What are some of the most interesting or unexpected projects that you have seen Dask used for? What do you perceive as being the most relevant aspects of Dask for data engineering/data infrastructure practitioners, as compared to the end users of the systems that they support? What are some of the most significant problems that you have been faced with, and which still need to be overcome in the Dask project? I know that the work on Dask is largely performed under the umbrella of PyData and sponsored by Continuum Analytics. What are your thoughts on the financial landscape for open source data analytics and distributed computation frameworks as compared to the broader world of open source projects?

Keep in touch

@mrocklin on Twitter mrocklin on GitHub

Links

http://matthewrocklin.com/blog/work/2016/09/22/cluster-deployments?utm_source=rss&utm_medium=rss https://opendatascience.com/blog/dask-for-institutions/?utm_source=rss&utm_medium=rss Continuum Analytics 2sigma X-Array Tornado

Website Podcast Interview

Airflow Luigi Mesos Kubernetes Spark Dryad Yarn Read The Docs XData

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Support Data Engineering Podcast

Apache Spark for Data Science Cookbook

2016-12-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Padma Priya Chitturi

AI/ML Analytics Big Data Data Science NLP NumPy Pandas SciPy Spark apache-spark data data-engineering

In "Apache Spark for Data Science Cookbook," you'll delve into solving real-world analytical challenges using the robust Apache Spark framework. This book features hands-on recipes that cover data analysis, distributed machine learning, and real-time data processing. You'll gain practical skills to process, visualize, and extract insights from large datasets efficiently. What this Book will help me do Master using Apache Spark for processing and analyzing large-scale datasets effectively. Harness Spark's MLLib for implementing machine learning algorithms like classification and clustering. Utilize libraries such as NumPy, SciPy, and Pandas in conjunction with Spark for numerical computations. Apply techniques like Natural Language Processing and text mining using Spark-integrated tools. Perform end-to-end data science workflows, including data exploration, modeling, and visualization. Author(s) Nagamallikarjuna Inelu and None Chitturi bring their extensive experience working with data science and distributed computing frameworks like Apache Spark. Nagamallikarjuna specializes in applying machine learning algorithms to big data problems, while None has contributed to various big data system implementations. Together, they focus on providing practitioners with practical and efficient solutions. Who is it for? This book is primarily intended for novice and intermediate data scientists and analysts who are curious about using Apache Spark to tackle data science problems. Readers are expected to have some familiarity with basic data science tasks. If you want to learn practical applications of Spark in data analysis and enhance your big data analytics skills, this resource is for you.

talk-data.com

Data Analytics

Activity Trend

Top Events

Top Speakers

Understanding Data Analytics in Information Security with @JayJarome, @BitSight

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Data Warehousing with Greenplum

Mastering Apache Spark 2.x - Second Edition

#FutureOfData with Robin Thottungal, Chief Data Scientist at EPA

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Practical Predictive Analytics

Apache Spark 2.x Cookbook

On a Mission in Medicine: Using Data for Personalized Treatment

Nathaniel Lin, Chief Data Scientist, NFPA

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Python: Data Analytics and Visualization

Learning Apache Spark 2

Effective Business Intelligence with QuickSight

Beginning Power BI: A Practical Guide to Self-Service Data Analytics with Excel 2016 and Power BI Desktop, Second Edition

Big Data Visualization

Mastering Elastic Stack

Scala: Guide for Data Science Professionals

Global Business By The Big Analytics

FutureOfData podcast is a conversation starter to bring leaders, influencers and lead practitioners to come on show and discuss their journey in creating the data driven future.

Learning Kibana 5.0

Tabular Modeling with SQL Server 2016 Analysis Services Cookbook

Dask with Matthew Rocklin - Episode 2

Apache Spark for Data Science Cookbook