talk-data.com

Topic

API

Application Programming Interface (API)

integration software_development data_exchange

856 tagged

Activity Trend: 65 peak/qtr, 2020-Q1 to 2026-Q1

Activities

856 activities · Newest first

IBM FlashSystem and VMware Implementation and Best Practices Guide

This IBM® Redbooks® publication details the configuration and best practices for using the IBM FlashSystem® family of storage products within a VMware environment. The first version of this book was published in 2021 and specifically addressed IBM Spectrum® Virtualize Version 8.4 with VMware vSphere 7.0. This second version includes all the enhancements that are available with IBM Spectrum Virtualize 8.5. Topics illustrate planning, configuring, operations, and preferred practices that include integration of IBM FlashSystem storage systems with the VMware vCloud suite of applications: VMware vSphere Web Client (vWC); vSphere Storage APIs - Storage Awareness (VASA); vSphere Storage APIs - Array Integration (VAAI); VMware Site Recovery Manager (SRM); VMware vSphere Metro Storage Cluster (vMSC); and Embedded VASA Provider for VMware vSphere Virtual Volumes (vVols). This book is intended for presales consulting engineers, sales engineers, and IBM clients who want to deploy IBM FlashSystem storage systems in virtualized data centers that are based on VMware vSphere. Note: There is a newer version of this book: "IBM Storage Virtualize and VMware: Integrations, Implementation and Best Practices, SG24-8549". The newer book addresses IBM Storage Virtualize Version 8.6 with VMware vSphere 8 and covers the new IBM Storage plugin for vSphere.

Web Scraping with Python, 3rd Edition

If programming is magic, then web scraping is surely a form of wizardry. By writing a simple automated program, you can query web servers, request data, and parse it to extract the information you need. This thoroughly updated third edition not only introduces you to web scraping but also serves as a comprehensive guide to scraping almost every type of data from the modern web. Part I focuses on web scraping mechanics: using Python to request information from a web server, performing basic handling of the server's response, and interacting with sites in an automated fashion. Part II explores a variety of more specific tools and applications to fit any web scraping scenario you're likely to encounter. Parse complicated HTML pages. Develop crawlers with the Scrapy framework. Learn methods to store the data you scrape. Read and extract data from documents. Clean and normalize badly formatted data. Read and write natural languages. Crawl through forms and logins. Scrape JavaScript and crawl through APIs. Use and write image-to-text software. Avoid scraping traps and bot blockers. Use scrapers to test your website.
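
To make the "parse it to extract the information you need" step concrete, here is a minimal sketch that is not taken from the book. It uses only Python's standard-library `html.parser` and works on an in-memory HTML string, so it runs without network access. The `LinkCollector` class, the `extract_links` helper, and the `SAMPLE_PAGE` markup are hypothetical names invented for this illustration. In a real scraper the HTML would come from an HTTP response, and the book itself may rely on different libraries.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag that has one (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html: str) -> list:
    """Parse an HTML string and return the link targets in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content standing in for a server response.
SAMPLE_PAGE = """<html><body>
<a href="/books">Books</a>
<a name="top">anchor without a link target</a>
<a href="https://example.com/talks">Talks</a>
</body></html>"""

if __name__ == "__main__":
    print(extract_links(SAMPLE_PAGE))
```

The anchor without an `href` is skipped, which is the kind of basic response handling a scraper needs before following links.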

Hands-On Entity Resolution

Entity resolution is a key analytic technique that enables you to identify multiple data records that refer to the same real-world entity. With this hands-on guide, product managers, data analysts, and data scientists will learn how to add value to data by cleansing, analyzing, and resolving datasets using open source Python libraries and cloud APIs. Author Michael Shearer shows you how to scale up your data matching processes and improve the accuracy of your reconciliations. You'll be able to remove duplicate entries within a single source and join disparate data sources together when common keys aren't available. Using real-world data examples, this book helps you gain practical understanding to accelerate the delivery of real business value. With entity resolution, you'll build rich and comprehensive data assets that reveal relationships for marketing and risk management purposes, key to harnessing the full potential of ML and AI. This book covers: challenges in deduplicating and joining datasets; extracting, cleansing, and preparing datasets for matching; text matching algorithms to identify equivalent entities; techniques for deduplicating and joining datasets at scale; matching datasets containing persons and organizations; evaluating data matches; optimizing and tuning data matching algorithms; entity resolution using cloud APIs; and matching using privacy-enhancing technologies.
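
As a rough illustration of joining two sources when no common key exists, the sketch below pairs records by approximate name similarity. It is not code from the book. It uses only the standard-library `difflib.SequenceMatcher`, and the `normalize`, `similarity`, and `match_records` helpers, the sample `crm` and `billing` lists, and the 0.85 threshold are all assumptions made for this example. Production entity resolution typically adds blocking, multiple attributes, and probabilistic scoring.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_records(left, right, threshold=0.85):
    """Pair each left name with its best-scoring right name, if above threshold."""
    pairs = []
    if not right:
        return pairs
    for name in left:
        best = max(right, key=lambda candidate: similarity(name, candidate))
        if similarity(name, best) >= threshold:
            pairs.append((name, best))
    return pairs


# Hypothetical records from two systems that share no common key.
crm = ["Acme Corp.", "Globex Corporation"]
billing = ["ACME Corp", "Initech LLC", "Globex Corporation"]

if __name__ == "__main__":
    for left_name, right_name in match_records(crm, billing):
        print(f"{left_name!r} matches {right_name!r}")
```

The threshold trades precision against recall: raising it avoids false joins but misses names with larger spelling differences.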

The usage of GA4 and BigQuery real-time reports features can be quite challenging, especially in high-traffic volume websites and other demanding environments. For example, assuming that you find a viable solution for a specific project, it is crucial to determine in advance the projected BigQuery expenses, in order to avoid unpleasant surprises. Architecture, data management, limits and quotas on API Requests are also part of this complex equation. Matteo and Roberto will share some real-world solutions for GA4 and BigQuery real-time needs tested with several clients in different industries.

Data Engineering with Scala and Spark

Data Engineering with Scala and Spark guides you through building robust data pipelines that process massive datasets efficiently. You will learn practical techniques leveraging Scala and Spark with a hands-on approach to mastering data engineering tasks including ingestion, transformation, and orchestration. What this Book will help me do: Set up a data pipeline development environment using Scala. Utilize Spark APIs like DataFrame and Dataset for effective data processing. Implement CI/CD and testing strategies for pipeline maintainability. Optimize pipeline performance through tuning techniques. Apply data profiling and quality enforcement using tools like Deequ. Author(s): Eric Tome, Rupam Bhattacharjee, and David Radford bring decades of combined experience in data engineering and distributed systems. Their work spans cutting-edge data processing solutions using Scala and Spark. They aim to help professionals excel in building reliable, scalable pipelines. Who is it for? This book is tailored for working data engineers familiar with data workflow processes who desire to enhance their expertise in Scala and Spark. If you aspire to build scalable, high-performance data solutions or transition raw data into strategic assets, this book is ideal.

Summary

Monitoring and auditing IT systems for security events requires the ability to quickly analyze massive volumes of unstructured log data. The majority of products that are available either require too much effort to structure the logs, or aren't fast enough for interactive use cases. Cliff Crosland co-founded Scanner to provide fast querying of high scale log data for security auditing. In this episode he shares the story of how it got started, how it works, and how you can get started with it.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. Your host is Tobias Macey and today I'm interviewing Cliff Crosland about Scanner, a security data lake platform for analyzing security logs and identifying issues quickly and cost-effectively.

Interview

Introduction How did you get involved in the area of data management? Can you describe what Scanner is and the story behind it?

What were the shortcomings of other tools that are available in the ecosystem?

What is Scanner explicitly not trying to solve for in the security space (e.g. SIEM)? A query engine is useless without data to analyze. What are the data acquisition paths/sources that you are designed to work with (e.g. CloudTrail logs, app logs, etc.)?

What are some of the other sources of signal for security monitoring that would be valuable to incorporate or integrate with through Scanner?

Log data is notoriously messy, with no strictly defined format. How do you handle introspection and querying across loosely structured records that might span multiple sources and inconsistent labelling strategies? Can you describe the architecture of the Scanner platform?

What were the motivating constraints that led you to your current implementation? How have the design and goals of the product changed since you first started working on it?

Given the security oriented customer base that you are targeting, how do you address trust/network boundaries for compliance with regulatory/organizational policies? What are the personas of the end-users for Scanner?

How has that influenced the way that you think about the query formats, APIs, user experience, etc. for the product?

For teams who are working with Scanner can you describe how it fits into their workflow? What are the most interesting, innovative, or unexpected ways that you have seen Scanner used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Scanner? When is Scanner the wrong choice? What do you have planned for the future of Scanner?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the s

We talked about:

Ivan’s background; how Ivan became interested in investing; getting financial data to run simulations; Open, High, Low, Close, Volume; risk management strategy; testing your trading strategies; sticking to your strategy; important metrics and remembering about trading fees; important features; deployment; how DataTalks.Club courses helped Ivan; Ivan’s site and course sign-up

Links:

Exploring Finance APIs: https://pythoninvest.com/long-read/exploring-finance-apis
Python Invest Blog Articles: https://pythoninvest.com/blog

Free ML Engineering course: http://mlzoomcamp.com
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Elasticsearch in Action, Second Edition

Build powerful, production-ready search applications using the incredible features of Elasticsearch. In Elasticsearch in Action, Second Edition you will discover: architecture, concepts, and fundamentals of Elasticsearch; installing, configuring, and running Elasticsearch and Kibana; creating an index with custom settings; data types, mapping fundamentals, and templates; fundamentals of text analysis and working with text analyzers; indexing, deleting, and updating documents; indexing data in bulk, and reindexing and aliasing operations; learning search concepts, relevancy scores, and similarity algorithms. Elasticsearch in Action, Second Edition teaches you to build scalable search applications using Elasticsearch. This completely new edition explores Elasticsearch fundamentals from the ground up. You’ll deep dive into design principles, search architectures, and Elasticsearch’s essential APIs. Every chapter is clearly illustrated with diagrams and hands-on examples. You’ll even explore real-world use cases for full text search, data visualizations, and machine learning. Plus, its comprehensive nature means you’ll keep coming back to the book as a handy reference!
About the Technology: Create fully professional-grade search engines with Elasticsearch and Kibana! Rewritten for the latest version of Elasticsearch, this practical book explores Elasticsearch’s high-level architecture, reveals infrastructure patterns, and walks through the search and analytics capabilities of numerous Elasticsearch APIs.
About the Book: Elasticsearch in Action, Second Edition teaches you how to add modern search features to websites and applications using Elasticsearch 8. In it, you’ll quickly progress from the basics of installation and configuring clusters, to indexing documents, advanced aggregations, and putting your servers into production. You’ll especially appreciate the mix of technical detail with techniques for designing great search experiences.
What's Inside: understanding search architecture; full text and term-level search queries; analytics and aggregations; high-level visualizations in Kibana; configure, scale, and tune clusters.
About the Reader: For application developers comfortable with scripting and command-line applications.
About the Author: Madhusudhan Konda is a full-stack lead engineer, architect, mentor, and conference speaker. He delivers live online training on Elasticsearch and the Elastic Stack.
Quotes:
Madhu’s passion comes across in the depth and breadth of this book, the enthusiastic tone, and the hands-on examples. I hope you will take what you have read and put it ‘in action’. - From the Foreword by Shay Banon, Founder of Elasticsearch
Practical and well-written. A great starting point for beginners and a comprehensive guide for more experienced professionals. - Simona Russo, Serendipity
The author’s excitement is evident from the first few paragraphs. Couple that with extensive experience and technical prowess, and you have an instant classic. - Herodotos Koukkides and Semi Koen, Global Japanese Financial Institution

Matteo Pelati: Challenges of Building Blazing Fast Data APIs

Join Matteo Pelati as he delves into the world of blazing fast Data APIs, sharing his extensive experience in overcoming the challenges of crafting efficient, customer-facing data interfaces. 🚀📊 Discover valuable insights and leaner approaches, including the use of cutting-edge tools like Rust, in this enlightening session. 🛠️🔥 #DataAPIs #Efficiency

Bobur Umurzokov: Querying Live Data With LLM App

Unlock the secrets of querying live data with Bobur Umurzokov as he presents 'Querying Live Data With LLM App.' 🌐🤖 Discover how to build your own AI app in just 30 lines of code, harnessing the power of OpenAI's API and Pathway Python libraries. 🚀 Explore a revolutionary approach to handling real-time, ever-changing data for information retrieval, content recommendation, and dynamic chatbots! 📈📚 #LiveData #AIApp #openai

✨ H I G H L I G H T S ✨

🙌 A huge shoutout to all the incredible participants who made Big Data Conference Europe 2023 in Vilnius, Lithuania, from November 21-24, an absolute triumph! 🎉 Your attendance and active participation were instrumental in making this event so special. 🌍

Don't forget to check out the session recordings from the conference to relive the valuable insights and knowledge shared! 📽️

Once again, THANK YOU for playing a pivotal role in the success of Big Data Conference Europe 2023. 🚀 See you next year for another unforgettable conference! 📅 #BigDataConference #SeeYouNextYear

Pasha Finkelshteyn: Sparking Success: Unveiling the Journey of Apache Spark Application Development

Embark on a journey of Apache Spark application development with Pasha Finkelshteyn! 🚀 Explore the stages from concept to execution, delving into data exploration, transformation, and analysis powered by Spark's high-level APIs. 📊 Learn testing and validation approaches for accuracy and reliability, and empower yourself to create robust Spark applications for unlocking insights from massive datasets. 💡🔥 #ApacheSpark #BigData #DevelopmentJourney
