talk-data.com

Topic: Big Data

Tags: data_processing, analytics, large_datasets

1217 tagged activities

Activity trend (2020-Q1 to 2026-Q1): peak of 28 activities per quarter

Activities


In this podcast, @RobertoMaranca shares his thoughts on running a large data-driven organization and on how compliance and privacy will shape the future of data organizations. He explains how businesses can survive regulations such as GDPR and prepare themselves for better data transparency and visibility. This podcast is great for leaders of transnational corporations.

TIMELINE: 0:28 Roberto's journey. 8:18 Best practices as a data steward. 16:58 Data leadership and GDPR. 22:18 Impact of GDPR. 25:34 GDPR creating a better knowledge archive. 29:27 GDPR and IoT infrastructure. 35:08 Shadow IT phenomenon and consumer privacy. 44:54 Suggestions for enterprises to deal with privacy disruption. 50:52 Data debt. 53:10 Opportunities in new privacy frameworks. 57:52 Roberto's success mantra. 1:02:38 Roberto's favorite reads.

Roberto's Recommended Reads:
Team of Teams: New Rules of Engagement for a Complex World by General Stanley McChrystal and Tantum Collins https://amzn.to/2kUxW1K
Do Androids Dream of Electric Sheep?: The inspiration for the films Blade Runner and Blade Runner 2049 by Philip K. Dick https://amzn.to/2xOOpxZ
A Scanner Darkly by Philip K. Dick https://amzn.to/2sAsUMs
Other Philip K. Dick books: https://amzn.to/2JBwwY0

Podcast Link: https://futureofdata.org/data-leadership-through-privacy-gdpr-by-robertomaranca/

Roberto's BIO: With almost 25 years of experience in the world of IT and data, Roberto has spent most of his working life with General Electric in its Capital Division, where, from 2014, as Chief Data Officer for the International Unit, he oversaw the implementation of the Data Governance and Quality frameworks, from supporting risk model validation to enabling divestitures and leading the more recent Basel III data initiatives. For the last year, he has held the role of Chief Data Officer at Lloyds Banking Group, shaping and implementing a new Data Strategy and dividing his time between the BCBS 239 and GDPR programs.

Roberto holds a Master’s degree in Aeronautical Engineering from the “Federico II” University of Naples.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

In this podcast, @BesaBauta from MercyFirst talks about the compliance and privacy challenges faced in hyper-regulated industries. Drawing on her experience in health informatics, Besa shares best practices and challenges faced by data science groups in health informatics and in other regulated spaces. This podcast is great for anyone looking to learn about data science compliance and privacy challenges.

TIMELINE: 0:28 Besa's journey. 6:05 Besa's current role. 9:30 Privacy and compliance in health informatics. 14:44 Are the current privacy regulations sufficient? 16:15 Data management in different organizations. 22:37 The negative effects of compliance policies on data. 26:28 Hiring a good chief data officer. 30:20 Vetting a company as a CDO. 32:38 Challenges for a startup in the healthcare sector. 36:25 Common challenges for data officers in the healthcare sector. 38:29 Millennials and technology. 40:05 Leadership dealing with compliance policies. 46:26 Requirements for working in health informatics. 49:18 Ingredients of a perfect hire. 50:40 Besa's success mantra. 52:35 How does Besa stay updated? 54:37 Besa's favorite read. 57:04 Key takeaway.

Besa's Recommended Read: The Art Of War by Sun Tzu and Lionel Giles https://amzn.to/2Jx2PYm

Podcast Link: https://futureofdata.org/compliance-and-privacy-in-health-informatics-by-besabauta/

Besa's BIO: Dr. Besa Bauta is the Chief Data Officer and Chief Compliance Officer for MercyFirst, a social service organization providing health and mental health services to children and adolescents in New York City. She oversees the Research, Evaluation, Analytics, and Compliance for Health (REACH) division, including data governance and security measures, analytics, risk mitigation, and policy initiatives. She is also an Adjunct Assistant Professor at NYU and previously worked as a Research Director for a USAID project in Afghanistan and as the Senior Director of Research and Evaluation at the Center for Evidence-Based Implementation and Research (CEBIR). She holds a Ph.D. in implementation science with a focus on health services, an MPH in Global Health, and an MSW. Her research has focused on health systems, mental health, and technology integration to improve population-level outcomes.


Mastering Kibana 6.x

Mastering Kibana 6.x is your guide to leveraging Kibana for creating impactful data visualizations and insightful dashboards. From setting up basic visualizations to exploring advanced analytics and machine learning integrations, this book equips you with the skills to dive deep into your data and gain actionable insights at scale. You'll also learn to manage and monitor data effectively with tools such as X-Pack and Beats.

What this book will help me do: Build sophisticated dashboards to visualize Elastic Stack data effectively. Understand and use Timelion expressions to analyze time series data. Incorporate X-Pack capabilities to enhance security and monitoring in Kibana. Extract, analyze, and visualize data from Elasticsearch for advanced analytics. Set up monitoring and alerting using Beats components for reliable data operations.

Author(s): With extensive experience in big data technologies, the author brings a practical approach to teaching advanced Kibana topics. Having worked on real-world data analytics projects, the author aims to make complex concepts accessible while showing how to tackle analytics challenges using Kibana.

Who is it for? This book is ideal for data engineers, DevOps professionals, and data scientists who want to optimize large-scale data visualizations. If you're looking to manage Elasticsearch data through insightful dashboards and visual analytics, or to enhance your data operations with features like machine learning, this book is for you. A basic understanding of the Elastic Stack is helpful, though not required.

In this podcast, @DanDeGrazia from @IBM spoke with @Vishaltx from @AnalyticsWeek about where the chief data scientist role meets open source. He sheds light on some of the big opportunities in open source and on how businesses can work together to make progress in data science. Dan also explains why clear communication is essential to success as a data scientist.

TIMELINE: 0:29 Dan's journey. 9:40 Dan's role in IBM. 11:26 Tips on staying consistent while creating a database. 16:23 Chief data scientist and open-source put together. 20:28 The state of open source when it comes to data. 23:50 Evaluating the market to understand business requirements. 29:19 Future of data and open-source market. 33:23 Exciting opportunities in data. 37:06 Data scientist's role in integrating business and data. 49:41 Ingredients of a successful data scientist. 53:04 Data science and trust issues. 59:35 Human element behind data. 1:01:20 Dan's success mantra. 1:06:52 Key takeaways.

Dan's Recommended Reads:
The Five Temptations of a CEO, Anniversary Edition: A Leadership Fable by Patrick Lencioni https://amzn.to/2Jcm5do
What Every BODY is Saying: An Ex-FBI Agent's Guide to Speed-Reading People by Joe Navarro and Marvin Karlins https://amzn.to/2J1RXxO

Podcast Link: https://futureofdata.org/where-chief-data-scientist-open-source-meets-dandegrazia-futureofdata-podcast/

Dan's BIO: Dan has almost 30 years of experience working with large data sets. Starting with the unusual work of analyzing potential jury pools in the 1980s, Dan also did some of the first PC-based voter registration analytics in the Chicago area, including putting the first complete list of registered voters on a PC (hard as it is to imagine today, a 50-megabyte hard drive on DOS systems was staggering). Interested in almost anything new and technical, he worked at the Chicago Board of Trade. He taught himself BASIC to write algorithms while working as an arbitrager in financial futures. After the military, Dan moved to San Francisco, where he worked with several small companies and startups designing and implementing some of the first PC-based fax systems (who cares now!), enterprise accounting software, and early middleware connections using the early 3GL/4GL languages. Always pursuing the technical edge cases, Dan worked for InfoBright, a column-store database startup, in the US and EMEA; for Lingotek, an In-Q-Tel-funded company working on large-data-set translation; and for big data analytics companies such as Datameer, before taking his current position as Chief Data Scientist for Open Source in the IBM Channels organization. Dan's current just-for-fun project is an app that will record and analyze bird songs and provide the user with information on the bird and the specifics of the current song.


Principles and Practice of Big Data, 2nd Edition

Principles and Practice of Big Data: Preparing, Sharing, and Analyzing Complex Information, Second Edition updates and expands on the first edition, bringing a set of techniques and algorithms tailored to Big Data projects. The book stresses that most data analyses conducted on large, complex data sets can be achieved without specialized software suites (e.g., Hadoop) and without expensive hardware (e.g., supercomputers). The core of every algorithm described in the book can be implemented in a few lines of code using just about any popular programming language (Python snippets are provided). Through multiple new examples, this edition demonstrates that if we understand our data, and if we know how to ask the right questions, we can learn a great deal from large and complex data collections. The book will assist students and professionals from all scientific backgrounds who are interested in stepping outside the traditional boundaries of their chosen academic disciplines.

Presents new methodologies that are widely applicable to just about any project involving large and complex datasets. Offers readers informative new case studies across a range of scientific and engineering disciplines. Provides insights into semantics, identification, de-identification, vulnerabilities, and regulatory/legal issues. Uses a combination of pseudocode and very short snippets of Python code to show readers how they may develop their own projects without downloading or learning new software.

Streaming Systems

Streaming data is a big deal in big data these days. As more and more businesses seek to tame the massive unbounded data sets that pervade our world, streaming systems have finally reached a level of maturity sufficient for mainstream adoption. With this practical guide, data engineers, data scientists, and developers will learn how to work with streaming data in a conceptual and platform-agnostic way. Expanded from Tyler Akidau’s popular blog posts "Streaming 101" and "Streaming 102", this book takes you from an introductory level to a nuanced understanding of the what, where, when, and how of processing real-time data streams. You’ll also dive deep into watermarks and exactly-once processing with co-authors Slava Chernyak and Reuven Lax.

You’ll explore: how streaming and batch data processing patterns compare; the core principles and concepts behind robust out-of-order data processing; how watermarks track progress and completeness in infinite datasets; how exactly-once data processing techniques ensure correctness; how the concepts of streams and tables form the foundations of both batch and streaming data processing; the practical motivations behind a powerful persistent state mechanism, driven by a real-world example; and how time-varying relations provide a link between stream processing and the world of SQL and relational algebra.

In this podcast, @BillFranksGA talks about the ingredients of a successful analytics ecosystem. He shares his analytics journey and his perspective on how businesses are engaging in data analytics practice, and he sheds light on best practices that businesses can adopt to execute a successful data strategy.

Timeline: 0:28 Bill's journey. 4:00 Bill's journey as an analyst. 9:29 Maturity of the analytics market. 11:56 Business, IT, and data. 16:18 Introducing a centralized analytics practice in an enterprise. 19:50 Tips and strategies for chief data officers to deliver the goods. 26:07 What don't businesses get about data analytics? 29:40 Is the future aligned with data or analytics? 34:25 Why leadership needs to understand analytics. 36:35 The role of analytics professionals in the age of AI. 41:42 Upgrading analytics models. 47:50 How much should a business experiment with AI? 55:25 Evaluating blockchain. 59:50 Bill's success mantra. 1:05:25 Bill's favorite reads. 1:07:17 Key takeaway.

Podcast Link: https://futureofdata.org/billfranksga-on-the-ingredients-of-successful-analytics-ecosystem-futureofdata-podcast/

Bill's BIO: Bill Franks is Chief Analytics Officer for The International Institute For Analytics (IIA), where he provides perspective on trends in the analytics and big data space and helps clients understand how IIA can support their efforts to improve analytic performance. He also serves on the advisory boards of multiple university and professional analytic programs. He has held a range of executive positions in the analytics space in the past, including several years as Chief Analytics Officer for Teradata (NYSE: TDC).

Bill is the author of the book Taming The Big Data Tidal Wave (John Wiley & Sons). In the book, he applies his two decades of experience working with clients on large-scale analytics initiatives to outline what it takes to succeed in today’s world of big data and analytics. The book made Tom Peters’s list of 2014 “Must Read” books and also the Top 10 Most Influential Translated Technology Books list from CSDN in China.

His focus has always been to help translate complex analytics into terms that business users can understand and to then help an organization implement the results effectively within their processes. His work has spanned clients in a variety of industries for companies ranging in size from Fortune 100 companies to small non-profit organizations.

He earned a Bachelor’s degree in Applied Statistics from Virginia Tech and a Master’s degree in Applied Statistics from North Carolina State University.


Apache Spark Deep Learning Cookbook

Embark on a journey to master distributed deep learning with the "Apache Spark Deep Learning Cookbook". Designed specifically for leveraging the capabilities of Apache Spark, TensorFlow, and Keras, this book offers over 80 problem-solving recipes to efficiently train and deploy state-of-the-art neural networks that address real-world AI challenges.

What this book will help me do: Set up and configure a working Apache Spark environment optimized for deep learning tasks. Implement distributed training practices for deep learning models using TensorFlow and Keras. Develop and test neural networks such as CNNs and RNNs targeting specific big data problems. Apply Spark's built-in libraries and integrations for enhanced NLP and computer vision applications. Effectively manage and preprocess large datasets using Spark DataFrames for machine learning tasks.

Author(s): Co-authors Ahmed Sherif and Ravindra bring years of experience in deep learning, Apache Spark use cases, and hands-on practical training. Their collective expertise has shaped this cookbook's approach, which focuses on clarity and usability for readers tackling challenging machine learning scenarios.

Who is it for? This book is ideal for IT professionals, data scientists, and software developers with a foundational understanding of machine learning concepts and the Apache Spark framework. If you aim to scale deep learning and combine efficient computing with Spark's power, this guide is for you. Familiarity with Python will help you get the most out of the book.

In this podcast, @AndyPalmer from @Tamr sat with @Vishaltx from @AnalyticsWeek to talk about the emergence of, need for, and market for Data Ops, a specialized capability that merges the data engineering and DevOps ecosystems in response to increasingly convoluted data silos and complicated processes. Andy shares his journey, what some businesses and their leaders are doing wrong, and how businesses need to rethink their data silos to future-proof themselves. This is a good podcast for any data leader thinking about cracking the code on getting high-quality insights from data.

Timeline: 0:28 Andy's journey. 4:56 What's Tamr? 6:38 What's Andy's role at Tamr? 8:16 What's data ops? 13:07 The right time for a business to incorporate data ops. 15:56 Data exhaust vs. data ops. 21:05 Tips for executives in dealing with data. 23:15 Suggestions for businesses working with data. 25:48 Creating buy-in for experimenting with new technologies. 28:47 Using data ops for the acquisition of new companies. 31:58 Data ops vs. dev ops. 36:40 Big opportunities in data science. 39:35 AI and data ops. 44:28 Parameters for a successful start-up. 47:49 What still surprises Andy? 50:19 Andy's success mantra. 52:48 Andy's favorite reads. 54:25 Final remarks.

Andy's Recommended Reads:
Enlightenment Now: The Case for Reason, Science, Humanism, and Progress by Steven Pinker https://amzn.to/2Lc6WqK
The Three-Body Problem by Cixin Liu and Ken Liu https://amzn.to/2rQyPvp

Andy's BIO: Andy Palmer is a serial entrepreneur who specializes in accelerating the growth of mission-driven startups. Andy has helped found and/or fund more than 50 innovative companies in technology, health care, and the life sciences. Andy’s unique blend of strategic perspective and disciplined tactical execution is suited to environments where uncertainty is the rule rather than the exception. Andy has a specific passion for projects at the intersection of computer science and the life sciences.

Most recently, Andy co-founded Tamr, a next-generation data curation company, and Koa Labs, a start-up club in the heart of Harvard Square, Cambridge, MA.

Specialties: Software, Sales & Marketing, Web Services, Service Oriented Architecture, Drug Discovery, Database, Data Warehouse, Analytics, Startup, Entrepreneurship, Informatics, Enterprise Software, OLTP, Science, Internet, eCommerce, Venture Capital, Bootstrapping, Founding Team, Venture Capital firm, Software companies, early-stage venture, corporate development, venture-backed, venture capital fund, world-class, stage venture capital


Podcast link: https://futureofdata.org/emergence-of-dataops-age-andypalmer-futureofdata-podcast/

Wanna join? If you or anyone you know wants to join in, register your interest by emailing [email protected]


Data Management Solutions Using SAS Hash Table Operations

Hash tables can do a lot more than you might think! Data Management Solutions Using SAS Hash Table Operations: A Business Intelligence Case Study concentrates on solving your challenging data management and analysis problems through the power of the SAS hash object, whose environment and tools make it possible to create complete dynamic solutions. To this end, the book provides an in-depth overview of the hash table as an in-memory database, with the CRUD (Create, Retrieve, Update, Delete) cycle rendered by the hash object tools. Using this concept and focusing on real-world problems exemplified by sports data sets and statistics, the book seeks to help you use the hash object productively for tasks including, but not limited to, the following: select proper hash tools to perform hash table operations; use proper hash table operations to support specific data management tasks; use the dynamic, run-time nature of hash object programming; understand the algorithmic principles behind hash table data look-up, retrieval, and aggregation; learn how to perform data aggregation, for which the hash object is exceptionally well suited; manage the hash table memory footprint, especially when processing big data; and use hash object techniques for other data processing tasks, such as filtering, combining, splitting, sorting, and unduplicating. Using this book, you will be able to answer your toughest questions quickly and in the most efficient way possible!

Apache Hive Essentials - Second Edition

"Apache Hive Essentials" provides a focused guide to mastering the essential techniques of processing and analyzing big data with Apache Hive.

What this book will help me do: Set up and configure a Hive environment for big data analysis. Compose effective queries using Hive's SQL-like language to extract insights. Optimize Hive performance to handle complex datasets efficiently. Implement data security and user-defined functions to extend Hive's capabilities. Integrate Hive with Hadoop tools for comprehensive data solutions.

Author(s): Dayong Du, the author of "Apache Hive Essentials," has years of experience working with big data technologies and tools. With hands-on expertise in Hadoop and its broader ecosystem, he brings a practical and informed perspective to this complex field, aiming to make these technologies accessible to developers and analysts of all levels.

Who is it for? This book is perfect for data analysts, developers, and other professionals familiar with SQL who want to get started with Apache Hive for big data processing. It suits readers who are acquainted with Hadoop and its environment, want to expand their skills into efficient data querying and management, and are interested in leveraging big data tools for real-world solutions.

PySpark Cookbook

Dive into the world of big data processing and analytics with the "PySpark Cookbook". This book provides over 60 hands-on recipes for implementing efficient data-intensive solutions using Apache Spark and Python. By mastering these recipes, you'll be equipped to tackle challenges in large-scale data processing, machine learning, and stream analytics.

What this book will help me do: Set up and configure PySpark environments effectively, including working with Jupyter for enhanced interactivity. Understand and use DataFrames for data manipulation, analysis, and transformation tasks. Develop end-to-end machine learning solutions using the ML and MLlib modules in PySpark. Implement structured streaming and graph-processing solutions to analyze and visualize data streams and relationships. Deploy PySpark applications to cloud infrastructure efficiently using best practices.

Author(s): This book is co-authored by Denny Lee and Tomasz Drabas, experienced professionals in data processing and analytics with Python and Apache Spark. With deep technical expertise and a passion for teaching through practical examples, they aim to make the complex concepts of PySpark accessible to developers of varied experience levels.

Who is it for? This book is ideal for Python developers who are keen to delve into the Apache Spark ecosystem. Whether you're just starting with big data or already have some experience with Spark, this book provides practical recipes to enhance your skills. Readers looking to solve real-world data-intensive challenges with PySpark will find it invaluable.

In this podcast, Aaron Black from the Inova Translational Medicine Institute talks about his journey in creating and leading a data science practice in healthcare. He shares best practices, opportunities, and challenges concerning team dynamics, process orientation, and building relationships with leadership. Aaron also shares tactical steps to help build a better data-driven team to execute data-driven strategies. This podcast is great for folks looking to explore the depth of data opportunities in the health and medicine domain.

Timeline: 0:28 Aaron's journey. 8:16 Defining translational medicine. 11:47 Defining precision medicine. 12:47 Data sharing between pharma companies. 15:03 Defining biobanking. 18:50 Data and the healthcare industry. 22:20 Best practices in creating a healthcare database. 25:46 Tackling data regulations. 30:17 Best practices in creating data literacy in employees. 33:27 The culture of data scientists in the healthcare space. 36:09 Challenges that a data science leader faces in the healthcare space. 39:25 Opportunities in the health data space. 42:19 Ingredients of a good data science leader in the healthcare space. 44:38 Tips for data science leaders in the healthcare space. 47:00 Putting together a data team in the healthcare space. 50:22 Aaron's success tips. 52:49 Aaron's reading list. 55:25 Closing remarks.

Podcast link: https://futureofdata.org/understanding-futureofdata-in-health-medicine-thedataguru-inovahealth-futureofdata/

Aaron's Book Recommendations:
Smartcuts: The Breakthrough Power of Lateral Thinking by Shane Snow amzn.to/2rH9xzJ
When: The Scientific Secrets of Perfect Timing by Daniel H. Pink amzn.to/2rElebc

Aaron's BIO: Aaron Black is Chief Data Officer at the Inova Translational Medicine Institute, a healthcare information technology executive, and a data evangelist. He is a results-driven technical leader with a 20+ year record of successful project and program implementations: visionary, collaborative, and able to devise creative solutions, and build the culture to support them, for complex business challenges.

Key thought leader, international speaker, team builder, and data architect in building advanced, one-of-a-kind technical and data infrastructure to support precision medicine initiatives in large, cutting-edge health care institutions. A featured speaker and panelist at national conferences and councils including TEDx Tysons, NIH, Amazon re:Invent, Precision Medicine World Conference, Labroots, and HIMSS, and an invited speaker at the National Research Council’s Standing Committee on Biological and Physical Sciences in Space (CBPSS).

Experience in start-up and new team development. Proven change agent in diverse organizations and politically charged environments. A catalyst to create vision, motivation, and results across an entire enterprise. Creative thinker; organized, resolute, and able to direct multiple competing priorities with great precision while meeting strict deadlines and budget requirements. Strong healthcare and research industry knowledge, particularly in the life sciences, with expertise in developing, implementing, and supporting large data enterprise architectures. Excellent interpersonal skills; works effectively with individuals of diverse backgrounds and inspires teams to work to their fullest potential.


Wanna join? If you or anyone you know wants to join in, register your interest @ play.analyticsweek.com/guest/



Practical Enterprise Data Lake Insights: Handle Data-Driven Challenges in an Enterprise Big Data Lake

Use this practical guide to successfully handle the challenges encountered when designing an enterprise data lake and learn industry best practices to resolve issues. When designing an enterprise data lake, you often hit a roadblock when you must leave the comfort of the relational world and learn the nuances of handling non-relational data. Starting from sourcing data into the Hadoop ecosystem, you will go through stages that raise tough questions about data processing, data querying, and security. Concepts such as change data capture and data streaming are covered. The book takes an end-to-end solution approach in a data lake environment that includes data security, high availability, data processing, data streaming, and more. Each chapter includes the application of a concept, code snippets, and use case demonstrations to give you a practical approach. You will learn the concept, scope, application, and starting point.

What You'll Learn: Get to know data lake architecture and design principles; implement data capture and streaming strategies; implement data processing strategies in Hadoop; understand the data lake security framework and availability model.

Who This Book Is For: Big data architects and solution architects.

Big Data Architect's Handbook

Big Data Architect's Handbook is your comprehensive guide to mastering the art of building sophisticated big data solutions. As you work through this book, you'll learn to design end-to-end big data pipelines and integrate data from various sources for insightful analysis.

What this book will help me do: Understand the Hadoop ecosystem and familiarize yourself with major Apache projects. Make informed decisions when designing cloud infrastructures for big data needs. Gain expertise in analyzing structured and unstructured data using machine learning. Develop skills to implement scalable and efficient big data pipelines. Enhance your ability to visualize and monitor data insights effectively.

Author(s): Akhtar has amassed a wealth of experience in big data architecture and related technologies. With years of hands-on involvement in the development, analysis, and implementation of big data systems, Akhtar brings a pragmatic and insightful perspective, and a passion for educating others about data-driven technologies shines through in a user-first approach to making complex topics accessible.

Who is it for? This book caters to aspiring data professionals, software developers, and tech enthusiasts aiming to enhance their expertise in big data. Readers with basic programming and data analysis skills will find the content approachable yet challenging enough to deepen their understanding. If your career goal involves managing, analyzing, and making decisions based on large datasets, this book will help bridge the gap between skill and application.

In this podcast, Harsh Tiwari, former CDO of CUNA Mutual Group, sheds light on data science leadership in the financial and risk sector. He shares key insights for aspiring leaders managing large enterprise data science practices, discusses the importance of collaboration and a growth mindset through partnership, and explains his "So what" approach to problem-solving. This podcast is great for any listener who wants to understand best practices for being a data-driven leader.

Timeline: 0:28 Harsh's journey. 5:44 Harsh's current role. 10:17 Ideal location for a chief data officer. 14:42 Ideal CDO role and placement. 20:15 Capital One's best practices in managing data. 25:28 How are credit unions and regional banks placed in terms of data management? 31:20 Introducing data to well-performing banks. 38:05 Getting started as a CDO in a bank. 43:21 Checklist for a business to hire a CDO. 48:35 Keeping oneself sane during the technological disruption. 54:13 Harsh's success mantra. 58:51 Harsh's favorite read. 1:02:14 Parting thoughts.

Harsh's Recommended Read: Good to Great: Why Some Companies Make the Leap and Others Don't by Jim Collins https://amzn.to/2I7DHGM

Podcast Link: https://futureofdata.org/harsh-tiwari-talks-about-fabric-of-data-driven-leader-in-financial-sector-futureofdata-podcast/

Harsh's BIO: Harsh Tiwari is the Senior Vice President and Chief Data Officer for CUNA Mutual Group in Madison, Wisconsin. His primary responsibilities include leading enterprise-wide data initiatives and providing strategy and policy guidance for data acquisition, usage, and management. He joined the company in July 2015. Before joining CUNA Mutual Group, Harsh spent many years working in information technology, analytics, and data intelligence. He worked at Capital One Financial Group in Plano, Texas, for 17 years. Most recently there, as Head of Risk Management Data and Business Intelligence, he focused on creating an effective data and business intelligence environment to manage risks across the company. He also served as Divisional CIO for Small Business Credit Card and Consumer Lending, Head of Portfolio and Delivery Management, Head of Auto Finance Data and Business Intelligence, Business Information Officer of Capital One Canada, and Analyst – Senior Manager of Small Business Data & System Analysis.

A native of India, Harsh earned a B.S. in Mechanical Engineering from Mysore University in Mysore, Karnataka, India, and an M.B.A. in Finance/MIS from Drexel University in Philadelphia, Pennsylvania. In his spare time, Harsh enjoys golfing and spending time with his wife, Rashmi, their 12-year-old son, and their 8-year-old daughter.

About #Podcast:

The FutureOfData podcast is a conversation starter that brings together leaders, influencers, and lead practitioners to discuss their journeys toward creating the data-driven future.

Want to join? If you or anyone you know wants to join in, register your interest at http://play.analyticsweek.com/guest/

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Next-Generation Big Data: A Practical Guide to Apache Kudu, Impala, and Spark

Utilize this practical and easy-to-follow guide to modernize traditional enterprise data warehouse and business intelligence environments with next-generation big data technologies. Next-Generation Big Data takes a holistic approach, covering the most important aspects of modern enterprise big data. The book covers not only the main technology stack but also the next-generation tools and applications used for big data warehousing, data warehouse optimization, real-time and batch data ingestion and processing, real-time data visualization, big data governance, data wrangling, big data cloud deployments, and distributed in-memory big data computing. Finally, the book offers extensive, detailed coverage of big data case studies from Navistar, Cerner, British Telecom, Shopzilla, Thomson Reuters, and Mastercard.

What You'll Learn
Install Apache Kudu, Impala, and Spark to modernize enterprise data warehouse and business intelligence environments, complete with real-world, easy-to-follow examples and practical advice.
Integrate HBase, Solr, Oracle, SQL Server, MySQL, Flume, Kafka, HDFS, and Amazon S3 with Apache Kudu, Impala, and Spark.
Use StreamSets, Talend, Pentaho, and CDAP for real-time and batch data ingestion and processing.
Utilize Trifacta, Alteryx, and Datameer for data wrangling and interactive data processing.
Turbocharge Spark with Alluxio, a distributed in-memory storage platform.
Deploy big data in the cloud using Cloudera Director.
Perform real-time data visualization and time series analysis using Zoomdata, Apache Kudu, Impala, and Spark.
Understand enterprise big data topics such as big data governance, metadata management, data lineage, impact analysis, and policy enforcement, and how to use Cloudera Navigator to perform common data governance tasks.
Implement big data use cases such as big data warehousing, data warehouse optimization, Internet of Things, real-time data ingestion and analytics, complex event processing, and scalable predictive modeling.
Study real-world big data case studies from innovative companies, including Navistar, Cerner, British Telecom, Shopzilla, Thomson Reuters, and Mastercard.

Who This Book Is For
BI and big data warehouse professionals interested in gaining practical, real-world insight into next-generation big data processing and analytics using Apache Kudu, Impala, and Spark, and those who want to learn more about other advanced enterprise topics.

In this podcast, Drew Conway (@DrewConway) from Alluvium talks about his journey to start an IoT startup. He sheds light on the opportunities in the industrial IoT space and shares insights into the mechanics of running a data science startup in that space. He also offers tactical suggestions for future leaders. This podcast is great for data science startup entrepreneurs and senior executives in IoT.

Timeline: 0:28 Drew's journey from counter-terrorism to IoT startup. 9:29 Data science in the industrial space. 12:01 Entrepreneurship in the IoT start-up. 18:36 Selling data analysis to executives in the industrial space. 24:14 Automation in the industrial setting. 29:27 What is an IoT ready company? 32:40 Challenges in integrating data tools in the industrial sector. 37:27 Data science talent pool in industrial and manufacturing companies. 41:52 Challenges in IoT adoption for industrial companies. 46:31 Alluvium's interaction with industries. 50:57 Picking the right use case as an IoT start-up. 52:49 Right customers for an IoT start-up. 59:26 Words of wisdom for anyone building an IoT start-up.

Drew's Recommended Listen: Gödel, Escher, Bach: An Eternal Golden Braid by Douglas R. Hofstadter https://amzn.to/2x0uo7d

Podcast Link: https://futureofdata.org/drewconway-on-fabric-of-an-iot-startup-futureofdata-podcast/

Drew's BIO: Drew Conway, CEO and founder of Alluvium, is a leading expert in the application of computational methods to large-scale social and behavioral problems. Drew has been writing and speaking about the role of data, and the discipline of data science, in industry, government, and academia for several years.

Drew has advised and consulted for companies across many industries, ranging from fledgling start-ups to Fortune 100 companies, as well as academic institutions and government agencies at all levels. Drew started his career in counter-terrorism as a computational social scientist in the U.S. intelligence community.


Want to sponsor? Email us @ [email protected]


Big Data Analytics with Hadoop 3

Big Data Analytics with Hadoop 3 is your comprehensive guide to understanding and leveraging the power of Apache Hadoop for large-scale data processing and analytics. Through practical examples, it introduces the tools and techniques needed to integrate Hadoop with other popular frameworks, enabling efficient data handling, processing, and visualization.

What this book will help you do:
Understand the foundational components and features of Apache Hadoop 3, such as HDFS, YARN, and MapReduce.
Integrate Hadoop with programming languages like Python and R for data analysis.
Use tools such as Apache Spark and Apache Flink for real-time data analytics within the Hadoop ecosystem.
Set up a Hadoop cluster and perform analytics in cloud environments such as AWS.
Master the process of building practical big data analytics pipelines for end-to-end data processing.

Author(s): Sridhar Alla is a seasoned big data professional with extensive industry experience in building and deploying scalable big data analytics solutions. Known for his expertise in Hadoop and related ecosystems, Sridhar combines technical depth with clear communication in his writing, providing practical insights and hands-on knowledge.

Who is it for? This book is tailored for data professionals, software engineers, and data scientists looking to expand their expertise in big data analytics using Hadoop 3. Whether you're an experienced developer or new to the big data ecosystem, this book provides the step-by-step guidance and practical examples needed to advance your skills and achieve your analytical goals.

In this podcast, Drew Conway (@DrewConway) from Alluvium talks about his journey toward creating a socially connected and responsible data science practice. He shares tactical steps and suggestions for recruiting the right talent, building the right culture, and nurturing relationships to create a sustained and impactful data science practice. The session is great for anyone looking to build a self-sustaining, growth-oriented data science practice.

Timeline: 0:28 Drew's journey from counter-terrorism to IoT startup. 9:29 Data science in the industrial space. 12:01 Entrepreneurship in the IoT start-up. 18:36 Selling data analysis to executives in the industrial space. 24:14 Automation in the industrial setting. 29:27 What is an IoT ready company? 32:40 Challenges in integrating data tools in the industrial sector. 37:27 Data science talent pool in industrial and manufacturing companies. 41:52 Challenges in IoT adoption for industrial companies. 46:31 Alluvium's interaction with industries. 50:57 Picking the right use case as an IoT start-up. 52:49 Right customers for an IoT start-up. 59:26 Words of wisdom for anyone building an IoT start-up.

Drew's Recommended Listen: Gödel, Escher, Bach: An Eternal Golden Braid by Douglas R. Hofstadter https://amzn.to/2x0uo7d

Podcast Link: https://futureofdata.org/drewconway-on-creating-socially-responsible-data-science-practice-futureofdata-podcast/

Drew's BIO: Drew Conway, CEO and founder of Alluvium, is a leading expert in applying computational methods to large-scale social and behavioral problems. Drew has been writing and speaking about the role of data, and the discipline of data science, in industry, government, and academia for several years.

Drew has advised and consulted for companies across many industries, ranging from fledgling start-ups to Fortune 100 companies, as well as academic institutions and government agencies at all levels. Drew started his career in counter-terrorism as a computational social scientist in the U.S. intelligence community.
