talk-data.com

Topic: Big Data

Tags: data_processing, analytics, large_datasets (1217 tagged)

Activity trend (2020-Q1 to 2026-Q1): peak of 28 activities per quarter

Activities

1217 activities · Newest first

A Data Scientist's Guide to Acquiring, Cleaning, and Managing Data in R

The only how-to guide offering a unified, systematic approach to acquiring, cleaning, and managing data in R. Every experienced practitioner knows that preparing data for modeling is a painstaking, time-consuming process. Adding to the difficulty, most modelers learn the steps involved in cleaning and managing data piecemeal, often on the fly, or develop their own ad hoc methods. This book simplifies the task by providing a unified, systematic approach to acquiring, modeling, manipulating, cleaning, and maintaining data in R. Starting with the very basics, data scientists Samuel E. Buttrey and Lyn R. Whitaker walk readers through the entire process. Beginning with what data looks like and what it should look like, they progress through all the steps involved in getting data ready for modeling. They describe best practices for acquiring data from numerous sources; explore key issues in data handling, including text and regular expressions, big data, parallel processing, merging, matching, and checking for duplicates; and outline efficient and reliable techniques for documenting data and recordkeeping, including audit trails, getting data back out of R, and more.

Highlights: it is the only single-source guide to R data and its preparation, describing best practices for acquiring, manipulating, cleaning, and maintaining data; it begins with the basics and walks readers through all the steps necessary to get data ready for the modeling process; it provides expert guidance on documenting the processes described so that they are reproducible; written by seasoned professionals, it covers both introductory and advanced techniques; and it features case studies with supporting data and R code, hosted on a companion website. A Data Scientist's Guide to Acquiring, Cleaning and Managing Data in R is a valuable working resource and bench manual for practitioners who collect and analyze data, lab scientists and research associates at all levels of experience, and graduate-level data mining students.

In this podcast, Joel Comm from The Bad Crypto Podcast sits down with Vishal Kumar, CEO of AnalyticsWeek, to discuss the world of cryptocurrencies. The discussion sheds light on the nuances of the rapidly exploding cryptocurrency space, the thinking behind these currencies, and the opportunities and risks in the industry. Joel shares his insights on how to think about these currencies and the long-term implications of the algorithms that run them. The podcast is a great listen for anyone who wants to understand the world of cryptocurrencies.

Timeline: 0:29 Joel's journey. 5:45 Thinking behind "Bad Crypto". 7:50 Getting into the domain of cryptocurrency. 13:30 Underlying technology behind cryptocurrency. 17:00 On Bitcoin. 18:50 Tracing back a ledger. 20:36 The use of blockchain. 23:00 Every bitcoin is a country. 25:11 Parameters for investing in cryptocurrency. 26:05 Some better-known cryptocurrencies. 31:52 The security aspect of cryptocurrency. 41:52 Security and regulation of tokens. 44:03 The consensus element of blockchain. 46:25 Alternatives to blockchain. 49:30 Bitcoin as payment. 58:15 Manipulation of the crypto market. 1:00:51 Joel's favorite reads.

Youtube: https://youtu.be/xJucEIDitas iTunes: http://apple.co/2ynxopz

Please note: this podcast and its content do not constitute investment advice and are not intended to influence investment decisions either positively or negatively. Cryptocurrencies are highly volatile, and any investor must exercise extreme caution when evaluating them.

Joel's Recommended Read: Cryptocurrencies 101 by James Altucher http://bit.ly/2Bi5FMv

Podcast Link: https://futureofdata.org/discussing-world-crypto-joelcomm-badcrypto/

Joel's BIO: As a knowledgeable and inspirational speaker, Joel speaks on a variety of business and entrepreneurial topics. He presents a step-by-step playbook on using social media as a leveraging tool to expand the reach of your brand, increase your customer base, and create fierce brand loyalty for your business. Joel also speaks with authority on the various ways to harness the marketing power of technology to explode profits. He offers an inspiring yet down-to-earth call to action for those who dream of achieving growth and financial success. Having gone from only 87 cents in his bank account to creating multiple successful businesses, Joel is uniquely poised to instruct and inspire when it comes to using the various forms of new media as avenues toward the greater goal of business success. He is a broadcast veteran with thousands of hours of experience in radio, podcasting, television, and online video. Joel hosts two popular yet completely different podcasts: FUN with Joel Comm features the lighter side of top business and social leaders, and The Bad Crypto Podcast makes cryptocurrency and bitcoin understandable to the masses.

Joel is the New York Times best-selling author of 14 books, including The AdSense Code; Click Here to Order: Stories from the World's Most Successful Entrepreneurs; KaChing: How to Run an Online Business that Pays and Pays; Twitter Power 3.0; and Self Employed: 50 Signs That You Might Be an Entrepreneur. He has also written over 40 ebooks. He has appeared in The New York Times, on Jon Stewart's The Daily Show, on CNN online, on Fox News, and in many other places.

About #Podcast:

FutureOfData podcast is a conversation starter that brings leaders, influencers, and lead practitioners together to discuss their journeys toward creating a data-driven future.

Wanna join? If you or anyone you know wants to join in, register your interest at http://play.analyticsweek.com/guest/

Want to sponsor? Email us at [email protected]

Keywords: #FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Expert Apache Cassandra Administration

Follow this handbook to build, configure, tune, and secure Apache Cassandra databases. Start with the installation of Cassandra and move on to the creation of a single instance, and then a cluster of Cassandra databases. Cassandra is increasingly a key player in many big data environments, and this book shows you how to use Cassandra with Apache Spark, a popular big data processing framework. Also covered are important day-to-day topics such as backup and recovery of Cassandra databases, choosing the right compression and compaction strategies, and loading and unloading data. Expert Apache Cassandra Administration provides numerous step-by-step examples, starting with the basics of a Cassandra database and going all the way through backup and recovery, performance optimization, and monitoring and securing the data. The book serves as an authoritative and comprehensive guide to building and managing simple to complex Cassandra databases. The book takes you through building a Cassandra database, from installing the software and creating a single database through to complex clusters and data centers; provides numerous examples of actual commands in a real-life Cassandra environment that show how to confidently configure, manage, troubleshoot, and tune Cassandra databases; and shows how to use the Cassandra configuration properties to build a highly stable, available, and secure Cassandra database that always operates at peak efficiency.

What You'll Learn: install the Cassandra software and create your first database; understand the Cassandra data model and the internal architecture of a Cassandra database; create your own Cassandra cluster, step by step; run a Cassandra cluster on Docker; work with Apache Spark by connecting to a Cassandra database; deploy Cassandra clusters in your data center or on Amazon EC2 instances; back up and restore mission-critical Cassandra databases; and monitor, troubleshoot, and tune production Cassandra databases, cutting your spending on resources such as memory, servers, and storage.

Who This Book Is For: database administrators, developers, and architects who are looking for an authoritative and comprehensive single volume for all their Cassandra administration needs, as well as administrators who are tasked with setting up and maintaining highly reliable and high-performing Cassandra databases. An excellent choice for big data administrators, database administrators, architects, and developers who use Cassandra as their key data store, to support high-volume online transactions, or as a decentralized, elastic data store.

PySpark Recipes: A Problem-Solution Approach with PySpark2

Quickly find solutions to common programming problems encountered while processing big data. Content is presented in the popular problem-solution format: look up the programming problem you want to solve, read the solution, and apply it directly in your own code. Problem solved! PySpark Recipes covers Hadoop and its shortcomings, and presents the architecture of Spark, PySpark, and RDDs (Resilient Distributed Datasets). You will learn to apply RDDs to solve day-to-day big data problems. Python and NumPy are included, making it easy for new learners of PySpark to understand and adopt the model.

What You Will Learn: understand the advanced features of PySpark2 and SparkSQL; optimize your code; program SparkSQL with Python; use Spark Streaming and Spark MLlib with Python; and perform graph analysis with GraphFrames.

Who This Book Is For: data analysts, Python programmers, and big data enthusiasts.

When computers became commodity hardware and storage became incredibly cheap, we entered the era of so-called "big" data. Most definitions of big data include something about not being able to process all the data on a single machine; distributed computing is required for such large datasets. Getting an algorithm to run on data spread across many different machines introduces new challenges for designing large-scale systems. First, there are concerns about the best strategy for spreading the data over many machines in an orderly fashion. Resolving ambiguity or disagreements across sources is sometimes required as well. This episode discusses how such algorithms relate to the complexity class NC, the class of problems solvable in polylogarithmic time using a polynomial number of parallel processors.
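One way to see the link to NC is a tree reduction. Summing n values one after another takes n - 1 dependent steps, but if each round combines disjoint neighboring pairs at the same time, only ceil(log2 n) rounds are needed, which is the polylogarithmic depth NC asks for. The Python sketch below is illustrative only: `tree_reduce` is a hypothetical helper, the rounds are simulated sequentially on one machine, and it assumes the combining operation is associative.

```python
from operator import add
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def tree_reduce(values: Sequence[T], combine: Callable[[T, T], T]) -> Tuple[T, int]:
    """Reduce `values` with an associative `combine`, pairing neighbors each round.

    Returns the result and the number of rounds (the parallel depth).
    All combines within one round are independent of each other, so with
    roughly n/2 workers each round could execute in parallel.
    """
    if len(values) == 0:
        raise ValueError("cannot reduce an empty sequence")
    level: List[T] = list(values)
    rounds = 0
    while len(level) > 1:
        # Combine adjacent pairs left to right; order is preserved.
        nxt = [combine(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        # An odd element out is carried unchanged into the next round.
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
        rounds += 1
    return level[0], rounds


if __name__ == "__main__":
    total, depth = tree_reduce(range(1, 17), add)
    print(total, depth)  # 136 4
```

Because pairs are combined left to right and their order is kept, the operation only needs to be associative, not commutative (string concatenation works too). In a real cluster, each round's combines would be dispatched to different workers, and the cost of moving partial results between machines is exactly the data-placement concern the episode raises.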

In this podcast, Igor Volovich (@CyberIgor) talks about the strategic side of cybersecurity. He shares some practices that businesses can adopt to keep their infrastructure safe, sheds light on simple ways to measure security for your business, and discusses the leadership commitment needed to establish a security mindset. Igor also explains the need for metrics-led strategies to quantify outcomes. This podcast is great for future information security leaders who want to understand data science and metrics-led cybersecurity strategy.

Timeline: 0:29 Igor's journey. 10:37 Recognizing innovation in small companies. 16:30 Aligning with an incubator. 25:16 Creating robust risk metrics. 39:29 The right way to think about cybersecurity. 50:42 Can a company be offensive about security? 57:43 Igor's favorite read. 59:17 Igor's upcoming book.

Igor's Recommended Read: How to Measure Anything in Cybersecurity Risk by Douglas W. Hubbard, Richard Seiersen http://amzn.to/2BOoK6D

Podcast Link: https://futureofdata.org/563505-2/

Igor's BIO: Strategist, advisor, advocate, mentor, author, speaker, and cyber leader. Passionate about the craft of cybersecurity and its role in protecting the computing public, the integrity of global commerce and international trade, and defense of critical national infrastructure.

Internationally experienced cybersecurity executive and senior advisor with 20 years of service to the world's largest private- and public-sector entities, including Fortune 100 companies, the US legislative and executive branches, and regulatory agencies.

In this podcast, George Corugedo (@RedpointCTO, @Redpoint) talks about the ingredients of a technologist in a data-driven world. He sheds light on technology and technologist bias and how companies can work progressively to respond in an unbiased manner. He shares insights on leading a data science product as a technologist, along with some takeaways for future technologists. This podcast is great for future technologists thinking of shaping their organization to take advantage of technological disruptions and stay competitive.

Timeline: 0:29 George's journey. 3:35 Challenges in George's journey. 7:22 The relevance of mathematics in this data-driven world. 13:02 Statisticians getting into the technology stack. 22:38 Data-driven customer engagement platform. 24:24 Challenges for a technologist in connecting with various platforms and prospects. 28:52 Customer challenges for businesses. 31:55 What do businesses get about marketing? 34:04 Bridging the gap between data and analytics. 42:42 Hacks for mitigating bias. 46:18 Appification: a bane or an opportunity? 48:45 A candidate for a data analytics startup. 52:40 Important KPIs for a data-driven customer engagement company. 56:33 How does George keep himself updated? 57:58 What keeps George up at night? 59:15 George's favorite read. 1:01:05 Closing remarks.

Youtube: https://youtu.be/u6CtN-TYjXI iTunes: http://apple.co/2AJDnuz

George's Recommended Reads: To Kill a Mockingbird by Harper Lee http://amzn.to/2hZnwwx; Self-Reliance and Other Essays (Dover Thrift Editions) by Ralph Waldo Emerson http://amzn.to/2i0WcOx

Podcast Link: https://futureofdata.org/redpointcto-redpointglobal-on-becoming-an-unbiased-technologist-in-datadriven-world/

George's BIO: A former math professor and seasoned technology executive, RedPoint Chief Technology Officer and Co-Founder George Corugedo has more than two decades of business and technical experience. George is responsible for directing the development of the RedPoint Customer Engagement Hub, RedPoint’s leading enterprise customer engagement solution. George left academia in 1997 to co-found Accenture’s Customer Insights Practice, which specialized in strategic data utilization, analytics, and customer strategy. George’s previous positions include director of client delivery at ClarityBlue, Inc., a provider of hosted customer intelligence solutions, and COO/CIO of Riscuity, a receivables management company that specialized in using analytics to drive collections.

Big Data Analytics with SAS

Discover how to leverage the power of SAS for big data analytics in 'Big Data Analytics with SAS.' This book helps you master key techniques for preparing, analyzing, and reporting on big data effectively using SAS. Whether you're exploring integration with Hadoop and Python or mastering SAS Studio, you'll advance your analytics capabilities.

What this book will help you do: set up a SAS environment for performing hands-on data analytics tasks efficiently; master the fundamentals of SAS programming for data manipulation and analysis; use SAS Studio and Jupyter Notebook to interface with SAS effectively; perform data preparation workflows and advanced analytics, including predictive modeling and reporting; and integrate SAS with platforms like Hadoop, SAP HANA, and Cloud Foundry to scale analytics processes.

Author(s): Pope is a seasoned data analytics expert with extensive experience in SAS and big data platforms. With a passion for demystifying complex data workflows, Pope teaches SAS techniques in an approachable way, offering expert insights and practical examples that help readers confidently analyze and report on data.

Who is it for? If you're a SAS professional or a data analyst looking to expand your skills in big data analysis, this book is for you. It suits readers aiming to integrate SAS into diverse tech ecosystems or seeking to learn predictive modeling and reporting with SAS. Both beginners and those familiar with SAS can benefit.

In this podcast, @CRGutowski from @GE_Digital talks about the importance of data and analytics in transforming sales organizations. She sheds light on the challenges and opportunities in transforming the sales organization of a transnational enterprise using analytics and in instilling a growth mindset. Cate shares some of the tenets of the transformation mindset. This podcast is great for future leaders who are thinking of reshaping their sales organization and empowering it with a digital mindset.

Timeline: 0:29 Cate's journey. 7:40 Cate's typical day. 9:07 How do sales teams cope with disruption? 13:25 Data science in sales. 14:48 Planning digital software for a 25,000-person workforce. 18:00 The thin line between marketing and sales. 22:13 Safeguarding the workforce against tech disruption. 24:57 The culture of sales. 27:55 Designing a digitally connected strategy. 30:08 Designing customer experience. 33:48 Sales strategy for a startup. 36:43 Selling transformative sales strategies to executives. 40:55 How can organizations go digital? 43:25 Digital thread. 44:14 How can a sales organization deal with IT? 45:54 Pitfalls in the process of digitization. 48:44 Challenges for sales folks amid disruption. 50:30 How does Cate keep herself updated? 52:10 Cate's success mantra. 54:06 Closing remarks.

Youtube: https://youtu.be/3jcpYgvIli4 iTunes: http://apple.co/2hM9r5E

Cate's Recommended Read: Start with Why: How Great Leaders Inspire Everyone to Take Action by Simon Sinek http://amzn.to/2hGvc6w

Podcast Link: https://futureofdata.org/crgutowski-ge_digital-using-analytics-transform-sales/

Cate's BIO: Cate has 20 years of technical sales, marketing, and product leadership experience across various global divisions in GE. Cate is currently based in Boston, MA, and works as the VP – Commercial Digital Thread, leading the digital transformation of GE’s 25,000+ sales organization globally. Prior to relocating to Boston, Cate and her family lived in Budapest, Hungary, where she led product management, marketing, and commercial operations across EMEA for GE Current. Cate holds an M.B.A. from the University of South Florida and a Bachelor’s degree in Communications and Business Administration from the University of Illinois at Urbana-Champaign.

Functional Data Structures in R: Advanced Statistical Programming in R

Get an introduction to functional data structures using R, and write more effective code with better performance. Because data in functional languages is not mutable, this book teaches you workarounds: for example, you'll learn how to change variable-value bindings by modifying environments, which can be exploited to emulate pointers and implement traditional data structures. You'll also see how, by abandoning traditional data structures, you can manipulate structures by building new versions rather than modifying them. You'll discover how these so-called functional data structures differ from the traditional data structures you might know, and why they are worth understanding for serious algorithmic programming in a functional language such as R. By the end of Functional Data Structures in R, you'll understand the choices to make to work most effectively with data structures when you cannot modify the data itself. These techniques are especially applicable to algorithmic development in big data, finance, and other data science applications.

What You'll Learn: carry out algorithmic programming in R; use abstract data structures; work with both immutable and persistent data; emulate pointers and implement traditional data structures in R; and build new versions of well-known traditional data structures.

Who This Book Is For: experienced or advanced programmers who are at least comfortable with R. Some experience with data structures is recommended.
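As a rough illustration of "building new versions rather than modifying them," here is a persistent (immutable) stack. The book works in R; this sketch uses Python instead, and the names `Node`, `push`, `pop`, and `to_list` are hypothetical, not taken from the book. Each push returns a new version that shares the previous one as its tail, so older versions stay valid and unchanged.

```python
from dataclasses import dataclass
from typing import Generic, List, Optional, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Node(Generic[T]):
    """An immutable cons cell: a value plus a link to the rest of the stack."""
    head: T
    tail: Optional["Node[T]"]


def push(stack: Optional[Node[T]], value: T) -> Node[T]:
    # O(1): the new version reuses the entire old version as its tail.
    return Node(value, stack)


def pop(stack: Optional[Node[T]]) -> Tuple[T, Optional[Node[T]]]:
    # Return the top value and the previous version; nothing is mutated.
    if stack is None:
        raise IndexError("pop from empty stack")
    return stack.head, stack.tail


def to_list(stack: Optional[Node[T]]) -> List[T]:
    # Walk the links from top to bottom, collecting values.
    out: List[T] = []
    while stack is not None:
        out.append(stack.head)
        stack = stack.tail
    return out


if __name__ == "__main__":
    v1 = push(push(None, 1), 2)  # version 1: [2, 1]
    v2 = push(v1, 3)             # version 2: [3, 2, 1], shares v1
    top, v3 = pop(v2)
    print(to_list(v1), to_list(v2), top, v3 is v1)  # [2, 1] [3, 2, 1] 3 True
```

Because versions share structure, push and pop take constant time and copy nothing; the trade-off is that changing an element deep in the stack means rebuilding every node above it, which is the kind of design choice the book's blurb says you learn to weigh.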

Mastering MongoDB 3.x

"Mastering MongoDB 3.x" is your comprehensive guide to mastering the world of MongoDB, the leading NoSQL database. This book equips you with both foundational and advanced skills to effectively design, develop, and manage MongoDB-powered applications. Discover how to build fault-tolerant systems and dive deep into database internals, deployment strategies, and much more.

What this book will help you do: gain expertise in advanced querying using indexing and data expressions for efficient data retrieval; master MongoDB administration for both on-premises and cloud-based environments; learn data sharding and replication techniques to ensure scalability and fault tolerance; understand the intricacies of MongoDB internals, including performance optimization techniques; and leverage MongoDB for big data processing by integrating it with complex data pipelines.

Author(s): Alex Giamas is a seasoned database developer and administrator with strong expertise in NoSQL technologies, particularly MongoDB. With years of experience guiding teams on creating and optimizing database structures, Alex offers clear and practical methods for learning the essential aspects of MongoDB. His writing focuses on actionable knowledge and practical solutions for modern database challenges.

Who is it for? This book is perfect for database developers, system architects, and administrators who are already familiar with database concepts and are looking to deepen their knowledge of NoSQL databases, specifically MongoDB. Whether you're building web applications, scaling data systems, or ensuring fault tolerance, this book provides the guidance to sharpen your database management skill set.

In this podcast, @EdwardBoudrot from @Optum talks about how leaders can introduce design thinking into product design and process engineering. Ed shares some of the ways organizations, small or big, can create lean processes that not only yield efficient, people-centric products but also help future-proof companies by bringing them closer to their customers. This podcast is great for future leaders who are thinking of shaping their organization around design thinking concepts.

Timeline: 0:29 Edward's journey. 4:55 Innovation in a culturally thick company. 10:46 Life cycle of design thinking. 15:45 Design thinking's role in business strategy. 19:28 Attributes of design thinking in business strategy. 23:07 Edward's expansion strategy. 25:30 Favorite design thinking concepts. 29:40 How to move from a product mindset to a design thinking mindset. 32:22 Lab atmosphere to execute design thinking ideas. 34:15 Tips for startups to get started with design thinking. 35:40 Steps for companies to adopt design thinking. 38:15 Collaboration in design thinking. 41:00 Getting started with human-centered design. 43:42 Tenets of a successful design thinking executive. 46:30 KPIs to measure the success of your design. 48:58 Design thinking and disruption. 53:22 Businesses that are doing well at design thinking. 55:33 How can design thinking protect itself from market changes? 59:17 Edward's favorite reads. 1:00:33 Closing remarks.

Ed's Recommended Read: Ten Types of Innovation: The Discipline of Building Breakthroughs http://amzn.to/2ywxKLx 101 Design Methods: A Structured Approach for Driving Innovation in Your Organization http://amzn.to/2AFiWvE Tools of Titans: The Tactics, Routines, and Habits of Billionaires, Icons, and World-Class Performers http://amzn.to/2zAkwAJ

Podcast Link: https://futureofdata.org/edwardboudrot-optum-designthinking-data-driven-products/

Ed's BIO: Ed Boudrot is the Vice President of Fusion, an enterprise accelerator for Optum. Optum’s mission is to help people live healthier lives and to help make the health system work better for everyone. Boudrot has founded and been part of several startups in the Boston area, as well as Intuit's innovation labs. He specializes in the convergence of human-centered design, business strategy, and rapid development to optimize experiences and business outcomes.

The State of Data Analytics and Visualization Adoption

Businesses, regardless of industry or company size, increasingly rely on data analytics and visualization to gain a competitive advantage. That’s why organizations today are racing to gather, store, and analyze data from many sources in a wide range of formats. In the spring of 2017, Zoomdata commissioned an O’Reilly survey to assess the state of data analytics and visualization technology adoption across several industries, including manufacturing, financial services, and healthcare. Roughly 875 respondents answered questions online about their industry, job role, company size, and reasons for using analytics, as well as the technologies they use in analytics programs, the perceived value of those programs, and many other topics. This report reveals: the industries furthest along in adopting big data analytics and visualization technologies; the most commonly analyzed sources of big data; the most commonly used technologies for analyzing streaming data; which analytics skills are in most demand; the most valued characteristic of big data across all industries; and the types of users that big data analytics and visualization projects typically target. If you’re a technology decision maker, a product manager looking to embed analytics, a business user relying on analytics, or a developer pursuing the most marketable skills, this report provides valuable details on today’s data analytics trends.

In this podcast, Brian Haugli from The Hanover Insurance Group sits down with Vishal to talk about the mindset of a security-led leader. The conversation ranges from leadership mindset to a tactical, practitioner-level guide that helps future security leaders understand how to secure their organizations. This session is great for any security-passionate leader who wants to build a growth mindset wrapped in security.

Timeline: 0:28 Brian's journey. 3:45 Brian's current role. 7:43 CSO combining with physical security. 10:12 Physical security infrastructure. 11:55 Brian's journey from the military to corporate. 14:42 Common challenges for a CSO. 17:37 Do security certifications help professionals secure an organization? 22:14 Advice for those wanting to join the security industry. 27:14 Recommendations for a startup to stay secure. 34:32 Why a CSO needs to understand both tech and business. 36:35 Hacks to cope with new company integrations and operations. 40:50 Security vs. business innovation. 44:13 Security seen as professional janitors. 52:30 The role of government and regulations in providing security. 55:30 Brian's keys to success. 58:36 Closing remarks.

Brian's Read Recommendation: On The Road by Jack Kerouac http://amzn.to/2hMhOhG

Podcast Link: https://futureofdata.org/brianhaugli-the_hanover-%e2%80%8fon-building-leadership-security-mindset/

GooglePlay: http://math.im/gplay

Brian's BIO: Brian Haugli is a Certified Information Systems Security Professional (CISSP) and a Global Industrial Cyber Security Professional (GICSP). Brian previously served as a senior advisor on cybersecurity and information risk management for the Department of Defense, US Army ITA, and Pentagon. He has 20 years of professional experience and expertise in network topologies, design, implementation, architecture, and cybersecurity. He has extensive knowledge of and has implemented risk management frameworks, methodologies, and processes. He has been responsible for creating compliant and secure networks for multiple sites through his extensive background in intrusion detection and full network end-to-end testing. He has outstanding communication skills, a positive demeanor, and the ability to interface with all levels of an organization.

In this podcast, Andrea Gallego, Principal & Global Technology Lead at Boston Consulting Group, talks about her journey as a data science practitioner in the consulting space. She discusses industry practices that up-and-coming data science professionals should adopt, along with some operational hacks for creating a robust data science team. It is a must-listen conversation for practitioners in the industry trying to build a data science team and deliver solutions for a service industry.

Timeline: 0:29 Andrea's journey. 5:41 Andrea's current role. 8:02 From seasoned data professional to COO. 11:27 The essentials for having analytics at scale. 14:56 First steps to creating an analytics practice. 18:33 Defining an engineering-first company. 22:33 A different understanding of data engineering. 26:40 Mistakes businesses make in their data science practice. 30:21 Some good business problems that data science can solve. 36:42 Democratization of data vs. privacy in companies. 38:04 Tech-to-business challenges. 40:11 Important KPIs for building a data science practice. 43:47 Hacks for hiring good data science candidates. 49:07 The art of doing business and the science of doing business. 52:16 Andrea's secret to success. 55:12 Andrea's favorite read. 58:35 Closing remarks.

Andrea's Recommended Reads: Arrival by Ted Chiang http://amzn.to/2h6lJpv; Built to Last by Jim Collins http://amzn.to/2yMCsam; Designing Agentive Technology: AI That Works for People http://amzn.to/2ySDHGp

Podcast Link: https://futureofdata.org/andrea-gallego-bcg-managing-analytics-practice/

Andrea's BIO: Andrea is Principal & Global Technology Lead at Boston Consulting Group. Prior to BCG, Andrea was COO of QuantumBlack’s cloud platform, where she also managed the cloud platform team and helped drive the vision and future of McKinsey Analytics’ digital capabilities. Andrea has broad expertise in computer science, cloud computing, digital transformation strategy, and analytics solutions architecture. Before that, she was a technologist at Booz Allen Hamilton. She holds a BS in Economics and an MS in Analytics (with a concentration in computing methods for analytics).

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Wanna join? If you or anyone you know wants to join in, register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords: FutureOfData Data Analytics Leadership Podcast Big Data Strategy

In this podcast, Sid Probstein, CTO of AI Foundry, talks about the mindset of a technology transformist in a data-driven world. He discusses some of the challenges he faces as a technologist and offers ways to mitigate them. Sid also compares the mindset of technologists at a startup with that of technologists at a larger enterprise. It is a must-listen conversation for technology folks in the industry trying to bridge the divide between technology and business.

Timeline: 0:28 Sid's journey. 7:02 Sid's current role. 15:26 Regulatory bottlenecks. 17:51 Efficiency of today's banking technologies. 20:22 Evolution of storage and processing. 23:30 How can legacy models upgrade themselves to newer ones? 27:40 Breaking the cultural mould and moving to big data. 32:56 Convincing the leadership to adopt new technology. 35:55 The CTO's relationship with the CDO. 39:18 Difference in working style between a startup and an established company. 43:17 Quantifying and evaluating a product for enterprise projects. 46:02 How can the leadership pick the right software? 49:57 Team dynamics and hiring process. 51:55 Sid's success mantra. 53:47 Sid's favorite read. 54:50 Closing remarks.

Sid's Recommended Read: Arrival by Ted Chiang http://amzn.to/2h6lJpv Built to Last by Jim Collins http://amzn.to/2yMCsam Designing Agentive Technology: AI That Works for People http://amzn.to/2ySDHGp

Podcast Link: https://futureofdata.org/sidprobstein-aifoundry-becoming-technology-transformist-data-driven-world/

Sid's BIO: Sid Probstein is the CTO and VP of Solution Delivery for AI Foundry, the enterprise software arm and new face of Kodak Alaris. AI Foundry is disrupting the mortgage business by taking origination automation to the next level: enabling self-service, distributed capture, and the automatic classification and extraction of scanned and imaged documents into actionable intelligence. He was previously co-founder and CTO of Attivio and held executive positions at FAST Search & Transfer, Northern Light Technology, and John Hancock Financial Services.

Machine Learning with R Cookbook - Second Edition

Machine Learning with R Cookbook, Second Edition, is your hands-on guide to applying machine learning principles using R. Through simple, actionable examples and detailed step-by-step recipes, this book will help you build predictive models, analyze data, and derive actionable insights. Explore core topics in data science, including regression, classification, clustering, and more.

What this book will help me do: Apply the Apriori algorithm for association analysis to uncover relationships in transaction datasets. Effectively visualize data patterns and associations using a variety of plots and graphing methods. Master the application of regression techniques to address predictive modeling challenges. Leverage the power of R and Hadoop to perform big data machine learning efficiently. Conduct advanced analyses such as survival analysis and improve machine learning model performance.

Author(s): Yu-Wei Chiu (David Chiu) is an experienced data scientist and R programmer who specializes in applying data science and machine learning principles to solve real-world problems. David's pragmatic and comprehensive teaching style gives readers deep insights and practical methodologies for using R effectively in their projects. His passion for data science and expertise in R and big data make this book a reliable resource for learners.

Who is it for? This book is ideal for data scientists, analysts, and professionals working with machine learning and R. It caters to intermediate users who know the basics of R and want to deepen their skills. If you aim to become the go-to expert for machine learning challenges and improve your efficiency and capability in machine learning projects, this book is for you.

In this podcast, John T. Langton, Director of Applied Data Science at Wolters Kluwer, sat down with Vishal, President of AnalyticsWeek, to discuss his data analytics journey. He shared insights from his startup days through running a data science group within a large enterprise.

Timeline: 0:28 John's journey. 13:28 John's current role. 17:06 Succeeding as a data scientist in different organizations. 26:47 Challenges in putting together a data science company. 38:36 Hacks to selling innovative ideas to clients and customers. 47:20 Defining a good data science hire. 51:50 Maturity level of enterprise AI. 1:00:00 Closing remarks.

John's Recommended Read: Designing Agentive Technology: AI That Works for People http://amzn.to/2ySDHGp

Podcast Link: https://futureofdata.org/johntlangton-wolters_kluwer-discussed-ai-lead-startup-journey/

John's BIO: John Langton is Director of Applied Data Science at Wolters Kluwer. He previously worked as Director of Data Science at athenahealth and was CEO of VisiTrend, a visual analytics company acquired by Carbon Black in 2015. He has a Ph.D. in computer science and an extensive background in AI, machine learning, big data analytics, and visualization. Before founding VisiTrend, John was Principal Investigator (PI) on several DoD projects at Charles River Analytics (CRA). He has taught classes at Brandeis University and has several peer-reviewed publications.

Introduction to GPUs for Data Analytics

Moore’s law has finally run out of steam for CPUs. The number of x86 cores that can be placed cost-effectively on a single chip has reached a practical limit, making higher densities prohibitively expensive for most applications. Fortunately, for big data analytics, machine learning, and database applications, a more capable and cost-effective alternative for scaling compute performance is already available: the graphics processing unit, or GPU. In this report, executives at Kinetica and Sierra Communications explain why incorporating GPUs is ideal for keeping pace with the relentless growth in streaming, complex, and large data confronting organizations today. Technology professionals, business analysts, and data scientists will learn how their organizations can begin implementing GPU-accelerated solutions either on premises or in the cloud. This report explores: how GPUs supplement CPUs to enable continued price/performance gains; the many database and data analytics applications that can benefit from GPU acceleration; why GPU databases with user-defined functions (UDFs) can simplify and unify the machine learning/deep learning pipeline; how GPU-accelerated databases can process streaming data from the Internet of Things and other sources in real time; the performance advantage of GPU databases in demanding geospatial analytics applications; and how cognitive computing, the most compute-intensive application currently imaginable, is now within reach using GPUs.

Jeff Palmucci (@TripAdvisor) talks about building a machine learning team and shares some best practices for running a data-driven startup.

Timeline: 0:29 Jeff's journey. 8:28 Jeff's experience of working in different eras of data science. 10:34 Challenges in working on a futuristic startup. 13:40 Entrepreneurship and ML solutions. 16:42 Putting together an ML team. 20:32 How do you choose the right use case to work on? 22:20 Hacks for putting together a group for ML solutions. 24:40 Convincing the leadership to change the culture. 29:00 Thought process of putting together an ML group. 31:36 How do you gauge the right data science candidate? 35:46 Important KPIs to consider while putting together an ML group. 38:30 The merit of shadow groups within a business unit. 41:05 Jeff's key to success. 42:58 How does having a hobby help a data science leader? 45:05 Is appifying good or bad? 52:07 The fear of what ML throws out. 54:09 Jeff's favorite reads. 55:34 Closing remarks.

Podcast Link: https://futureofdata.org/jeff-palmucci-tripadvisor-discusses-managing-machinelearning-ai-team/

About Jeff Palmucci: As a serial entrepreneur, Jeff has started several companies. He was VP of Software Development for Optimax Systems, a developer of scheduling systems for manufacturing operations that was acquired by i2 Technologies. As founder and CTO of the programmatic hedge fund Percipio Capital Management, he helped lead the company to an acquisition by Link Ventures. Jeff currently leads the Machine Learning group at TripAdvisor, which runs machine learning projects across the company, including natural language processing, review fraud detection, personalization, information retrieval, and machine vision. Jeff has publications in natural language processing, machine learning, genetic algorithms, expert systems, and programming languages. When Jeff is not writing code, he enjoys attending countless rock concerts as a professional photographer.

Jeff's Favorite Authors (Genre: Science Fiction): Vernor Vinge http://amzn.to/2ygDPOu Stephen Baxter http://amzn.to/2ygG6cn
