talk-data.com

Topic

SaaS

Software as a Service (SaaS)

cloud_computing software_delivery subscription

310

tagged

Activity Trend

23 peak/qtr
2020-Q1 2026-Q1

Activities

310 activities · Newest first

Summary

One of the biggest challenges in building reliable platforms for processing event pipelines is managing the underlying infrastructure. At Snowplow Analytics the complexity is compounded by the need to manage multiple instances of their platform across customer environments. In this episode Josh Beemster, the technical operations lead at Snowplow, explains how they manage automation, deployment, monitoring, scaling, and maintenance of their streaming analytics pipeline for event data. He also shares the challenges they face in supporting multiple cloud environments and the need to integrate with existing customer systems. If you are daunted by the needs of your data infrastructure, then it’s worth listening to how Josh and his team are approaching the problem.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, a 40Gbit public network, fast object storage, and a brand new managed Kubernetes platform, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. And for your machine learning workloads, they’ve got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Your host is Tobias Macey and today I’m interviewing Josh Beemster about how Snowplow manages deployment and maintenance of their managed service in their customers’ cloud accounts.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by giving an overview of the components in your system architecture and the nature of your managed service?
What are some of the challenges that are inherent to the private SaaS nature of your managed service?
What elements of your system require the most attention and maintenance to keep them running properly?
Which components in the pipeline are most subject to variability in traffic or resource pressure, and what do you do to ensure proper capacity?
How do you manage deployment of the full Snowplow pipeline for your customers?

How has your strategy for deployment evolved since you first began offering the managed service?
How has the architecture of the pipeline evolved to simplify operations?

How much customization do you allow for in the event that the customer has their own system that they want to use in place of one of your supported components?

What are some of the common difficulties that you encounter when working with customers who need customized components, topologies, or event flows?

How does that reflect in the tooling that you use to manage their deployments?

What types of metrics do you track, and what do you use for monitoring and alerting to ensure that your customers’ pipelines are running smoothly?
What are some of the most interesting/unexpected/challenging lessons that you have learned in the process of working with and on Snowplow?
What are some lessons that you can generalize for management of data infrastructure more broadly?
If you could start over with all of Snowplow and the infrastructure automation for it today, what would you do differently?
What do you have planned for the future of the Snowplow product and infrastructure management?

Contact Info

LinkedIn
jbeemster on GitHub
@jbeemster1 on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Links

Snowplow Analytics

Podcast Episode

Terraform Consul Nomad Meltdown Vulnerability Spectre Vulnerability AWS Kinesis Elasticsearch SnowflakeDB Indicative S3 Segment AWS Cloudwatch Stackdriver Apache Kafka Apache Pulsar Google Cloud PubSub AWS SQS AWS SNS AWS Redshift Ansible AWS Cloudformation Kubernetes AWS EMR

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

Summary

Building an ETL pipeline is a common need across businesses and industries. It’s easy to get one started but difficult to manage as new requirements are added and greater scalability becomes necessary. Rather than duplicating the efforts of other engineers it might be best to use a hosted service to handle the plumbing so that you can focus on the parts that actually matter for your business. In this episode CTO and co-founder of Alooma, Yair Weinberger, explains how the platform addresses the common needs of data collection, manipulation, and storage while allowing for flexible processing. He describes the motivation for starting the company, how their infrastructure is architected, and the challenges of supporting multi-tenancy and a wide variety of integrations.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.

For complete visibility into the health of your pipeline, including deployment tracking, and powerful alerting driven by machine learning, DataDog has got you covered. With their monitoring, metrics, and log collection agent, including extensive integrations and distributed tracing, you’ll have everything you need to find and fix performance bottlenecks in no time. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial and get a sweet new t-shirt.

Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.

Your host is Tobias Macey and today I’m interviewing Yair Weinberger about Alooma, a company providing data pipelines as a service.

Interview

Introduction
How did you get involved in the area of data management?
What is Alooma and what is the origin story?
How is the Alooma platform architected?

I want to go into stream vs. batch here.
What are the most challenging components to scale?

How do you manage the underlying infrastructure to support your SLA of 5 nines?
What are some of the complexities introduced by processing data from multiple customers with various compliance requirements?
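The "5 nines" SLA mentioned in the question above translates into a surprisingly small downtime budget. A minimal sketch of the arithmetic (function name is illustrative, not from the episode):

```python
# Hypothetical helper: convert an availability SLA expressed as a number of
# "nines" into an allowed-downtime budget per year.

def downtime_budget_minutes(nines: int, period_days: float = 365.0) -> float:
    """Minutes of allowed downtime per period for an SLA of `nines` nines."""
    availability = 1 - 10 ** (-nines)   # e.g. 5 nines -> 0.99999
    unavailability = 1 - availability
    return unavailability * period_days * 24 * 60

if __name__ == "__main__":
    for n in (3, 4, 5):
        # 5 nines works out to roughly 5.26 minutes of downtime per year
        print(f"{n} nines: {downtime_budget_minutes(n):.2f} min/year")
```

Each extra nine cuts the budget by a factor of ten, which is why five nines effectively rules out any manual recovery procedure.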

How do you sandbox users’ processing code to avoid security exploits?

What are some of the potential pitfalls for automatic schema management in the target database?
Given the large number of integrations, how do you maintain the

What are some challenges when creating integrations? Isn’t it simply a matter of conforming to an external API?

For someone getting started with Alooma, what does the workflow look like?
What are some of the most challenging aspects of building and maintaining Alooma?
What are your plans for the future of Alooma?

Contact Info

LinkedIn
@yairwein on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Alooma Convert Media Data Integration ESB (Enterprise Service Bus) Tibco Mulesoft ETL (Extract, Transform, Load) Informatica Microsoft SSIS OLAP Cube S3 Azure Cloud Storage Snowflake DB Redshift BigQuery Salesforce Hubspot Zendesk Spark The Log: What every software engineer should know about real-time data’s unifying abstraction by Jay Kreps RDBMS (Relational Database Management System) SaaS (Software as a Service) Change Data Capture Kafka Storm Google Cloud PubSub Amazon Kinesis Alooma Code Engine Zookeeper Idempotence Kafka Streams Kubernetes SOC2 Jython Docker Python Javascript Ruby Scala PII (Personally Identifiable Information) GDPR (General Data Protection Regulation) Amazon EMR (Elastic Map Reduce) Sequoia Capital Lightspeed Investors Redis Aerospike Cassandra MongoDB

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

In this podcast, Rahul Kashyap (@RCKashyap) talks about the state of security at the crossroads of technology and business, and the mindset of a security-led technologist. He sheds some light on past, present, and future security risks, discusses common leadership concerns, and explains how a technologist can address them. This podcast is a must for all technologists and aspiring technologists looking to grow their organizations.

Timeline:
0:29 Rahul’s journey.
4:40 Rahul’s current role.
7:58 How the types of cyberattacks have changed.
12:53 How has IT interaction evolved?
16:50 Problems in the security industry.
20:12 Market mindset vs. security mindset.
23:10 Ownership of data.
27:02 Cloud, SaaS, and security.
31:40 Priorities for securing an enterprise.
34:50 How secure is secure enough.
37:40 Providing a stable core to the business.
41:11 The state of data science vis-à-vis security.
44:05 Future of security, data science, and AI.
46:14 Distributed computing and security.
50:30 Tenets of Rahul’s success.
53:15 Rahul’s favorite read.
54:35 Closing remarks.

Rahul's Recommended Read: Mindset: The New Psychology of Success – Carol S. Dweck http://amzn.to/2GvEX2F

Podcast Link: https://futureofdata.org/rckashyap-cylance-on-state-of-security-technologist-mindset-futureofdata-podcast/

Rahul's BIO: Rahul Kashyap is the Global Chief Technology Officer at Cylance, where he is responsible for strategy, products, and architecture.

Rahul has been instrumental in building several key security technologies, viz. Network Intrusion Prevention Systems (NIPS), Host Intrusion Prevention Systems (HIPS), Web Application Firewalls (WAF), Whitelisting, Endpoint/Server Host Monitoring (EDR), and Micro-virtualization. He has been awarded several patents for his innovations. Rahul is an accomplished pen-tester and has in-depth knowledge of OS, networking, and security products.

Rahul has written several security research papers, blogs, and articles that are widely quoted and referenced by media around the world. He has built, led, and scaled award-winning teams that innovate and solve complex security challenges in both large and start-up companies.

He is frequently featured in several podcasts, webinars, and media briefings. Rahul has been a speaker at several top security conferences like BlackHat, BlueHat, Hack-In-The-Box, RSA, DerbyCon, BSides, ISSA International, OWASP, InfoSec UK, and others. He was named 'Silicon Valley's 40 under 40' by Silicon Valley Business Journal.

Rahul mentors entrepreneurs who work with select VC firms and is on the advisory board of tech start-ups.

About #Podcast:

FutureOfData podcast is a conversation starter to bring leaders, influencers, and lead practitioners to discuss their journey to create the data-driven future.

Wanna Join? If you or anyone you know wants to join in, register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Fast Data: Smart and at Scale

The need for fast data applications is growing rapidly, driven by the IoT, the surge in machine-to-machine (M2M) data, global mobile device proliferation, and the monetization of SaaS platforms. So how do you combine real-time, streaming analytics with real-time decisions in an architecture that’s reliable, scalable, and simple? In this O’Reilly report, Ryan Betts and John Hugg from VoltDB examine ways to develop apps for fast data, using pre-defined patterns. These patterns are general enough to suit both the do-it-yourself, hybrid batch/streaming approach and the simpler, proven in-memory approach available with certain fast database offerings. Their goal is to create a collection of fast data app development recipes. We welcome your contributions, which will be tested and included in future editions of this report.

SAP in 24 Hours, Sams Teach Yourself, Fifth Edition

Thoroughly updated and expanded! Includes new coverage on HANA, the cloud, and using SAP’s applications! In just 24 sessions of one hour or less, you’ll get up and running with the latest SAP technologies, applications, and solutions. Using a straightforward, step-by-step approach, each lesson strengthens your understanding of SAP from both a business and technical perspective, helping you gain practical mastery from the ground up on topics such as security, governance, validations, release management, SLA, and legal issues. Step-by-step instructions carefully walk you through the most common questions, issues, and tasks. Quizzes and exercises help you build and test your knowledge. Notes present interesting pieces of information. Tips offer advice or teach an easier way to do something. Cautions advise you about potential problems and help you steer clear of disaster. Learn how to…

• Understand SAP terminology, concepts, and solutions
• Install SAP on premises or in the cloud
• Master SAP’s revamped user interface
• Discover how and when to use in-memory HANA databases
• Integrate SAP Software as a Service (SaaS) solutions such as Ariba, Successfactors, Fieldglass, and hybris
• Find resources at SAP’s Service Marketplace, Developer Network, and Help Portal
• Avoid pitfalls in SAP project implementation, migration, and upgrades
• Discover how SAP fits with mobile devices, social media, big data, and the Internet of Things
• Start or accelerate your career working with SAP technologies

Mule in Action, Second Edition

Mule in Action, Second Edition is a totally revised guide covering Mule 3 fundamentals and best practices. It starts with a quick ESB overview and then dives into rich examples covering core concepts like sending, receiving, routing, and transforming data.

About the Technology
An enterprise service bus is a way to integrate enterprise applications using a bus-like infrastructure. Mule is the leading open source Java ESB. It borrows from the Hohpe/Woolf patterns, is lightweight, can publish REST and SOAP services, integrates well with Spring, is customizable, scales well, and is cloud-ready.

About the Book
You’ll get a close look at Mule’s standard components and how to roll out custom ones. You’ll also pick up techniques for testing, performance tuning, and BPM orchestration, and explore cloud API integration for SaaS applications. Written for developers, architects, and IT managers, this book requires familiarity with Java but no previous exposure to Mule or other ESBs.

What’s Inside
Full coverage of Mule 3. Integration with cloud services. Common transports, routers, and transformers. Security, routing, orchestration, and transactions.

About the Authors
David Dossot is a software architect and has created numerous modules and transports for Mule. John D’Emic is a principal solutions architect and Victor Romero a solutions architect, both at MuleSoft, Inc.

Quotes
“Captures the essence of pragmatism that is the founding principle of Mule.” - From the Foreword by Ross Mason, Creator of Mule
“A new, in-depth perspective.” - Dan Barber, Penn Mutual
“Excellent topic coverage and code examples.” - Davide Piazza, Thread Solutions srl, MuleSoft Partner
“This edition has grown, with more real-world examples and a thorough grounding in messaging.” - Keith McAlister, CGI

The New Era of Enterprise Business Intelligence: Using Analytics to Achieve a Global Competitive Advantage

A Complete Blueprint for Maximizing the Value of Business Intelligence in the Enterprise

The typical enterprise recognizes the immense potential of business intelligence (BI) and its impact upon many facets within the organization, but it’s not easy to transform BI’s potential into real business value. In The New Era of Enterprise Business Intelligence, top BI expert Mike Biere presents a complete blueprint for creating winning BI strategies and infrastructure, and for systematically maximizing the value of information throughout the enterprise. This product-independent guide brings together start-to-finish guidance and practical checklists for every senior IT executive, planner, strategist, implementer, and the actual business users themselves. Drawing on thousands of hours working with enterprise customers, Biere helps decision-makers choose from today’s unprecedented spectrum of options, including the latest BI platform suites and appliances. He offers practical, “in-the-trenches” insights on a wide spectrum of planning and implementation issues, from segmenting and supporting users to working with unstructured data. Coverage includes:

• Understanding the scope of today’s BI solutions and how they fit into existing infrastructure
• Assessing new options such as SaaS and cloud-based technologies
• Avoiding technology biases and other “project killers”
• Developing effective RFIs/RFPs and proofs of concept
• Setting up competency centers and planning for skills development
• Crafting a better experience for all your business users
• Supporting the requirements of senior executives, including performance management
• Cost-justifying BI solutions and measuring success
• Working with enterprise content management, text analytics, and search
• Planning and constructing portals, mashups, and other user interfaces
• Previewing the future of BI

Bolster your data security in the AI era with Microsoft and Netskope

Want to have consistent data security and regulatory compliance with enterprise-wide visibility and policy enforcement across all your endpoints, SaaS, IaaS, and network traffic? Join this session to learn how expanded data discovery and classification from Microsoft Purview integrated with unified security and advanced controls from Netskope for all your data at rest and in motion delivers a seamless administrator and user experience to help you extend the value of your Microsoft investment.

Is your business ready to embrace the software as a service (SaaS) revolution? Discover how Google Cloud can fast-track your SaaS journey. This session reveals how Google Cloud empowers you to scale your SaaS offerings, reduce time to market, and deliver exceptional customer experiences using a comprehensive toolbox with per-tenant controls, feature rollout management, and insightful monitoring. You’ll also learn from Avathon’s success story: how they leveraged these capabilities to conquer the complexities of SaaS management, automate tenant onboarding, and centralize operations.
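The per-tenant controls and feature rollout management described above usually boil down to a stable mapping from tenant to rollout bucket. A generic sketch of that idea (this is not a Google Cloud API; all names here are invented for illustration):

```python
# Toy percentage-based feature rollout: hash each (feature, tenant) pair to a
# stable bucket in [0, 100) so a feature can be enabled for a fixed fraction
# of tenants, and the same tenant always gets the same answer.
import hashlib

def rollout_bucket(tenant_id: str, feature: str) -> int:
    """Deterministically map a (tenant, feature) pair to a bucket in [0, 100)."""
    digest = hashlib.sha256(f"{feature}:{tenant_id}".encode()).hexdigest()
    return int(digest, 16) % 100

def is_enabled(tenant_id: str, feature: str, percent: int) -> bool:
    """Enable `feature` for roughly `percent`% of tenants, stably per tenant."""
    return rollout_bucket(tenant_id, feature) < percent

# Ramping percent from 10 to 50 only ever adds tenants; no tenant that
# already has the feature loses it, which keeps rollouts monotonic.
print(is_enabled("acme-corp", "new-dashboard", 50))
```

Keying the hash on the feature name as well as the tenant id keeps rollout populations independent across features.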

Join Microsoft’s product team for a hands-on lab where you'll design and deploy an AI-powered application using SQL Database in Microsoft Fabric. This session dives into HTAP capabilities, enabling seamless transactional and analytical processing. You'll provision a SaaS-native SQL Database, use Copilot to generate schema and queries, and implement advanced patterns like RAG with vector search. Walk away with practical skills and a working solution you can apply immediately.
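The retrieval step behind "RAG with vector search" mentioned in this session can be reduced to ranking stored embeddings by similarity to a query embedding. A library-free sketch of that core idea (this is not the SQL Database in Fabric API, and the vectors are toy values rather than model output):

```python
# Rank stored documents by cosine similarity to a query vector and return
# the top-k ids -- the heart of any vector-search retrieval step.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k document ids most similar to the query vector."""
    ranked = sorted(store, key=lambda doc: cosine(query_vec, doc["vec"]), reverse=True)
    return [doc["id"] for doc in ranked[:k]]

store = [
    {"id": "pricing",    "vec": [0.9, 0.1, 0.0]},
    {"id": "onboarding", "vec": [0.1, 0.9, 0.1]},
    {"id": "security",   "vec": [0.0, 0.2, 0.9]},
]
print(top_k([0.8, 0.2, 0.0], store))  # most similar document ids first
```

In a real RAG pipeline the retrieved documents would then be stuffed into the model prompt; production systems also replace the linear scan with an approximate nearest-neighbor index.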