talk-data.com

Mandy Chessell

Speaker · 5 talks

Mandy Chessell CBE FREng CEng FBCS is a trusted advisor to executives from large organisations, working with them to develop their strategy and architecture relating to the governance, integration and management of information. Mandy worked for IBM for 35 years, the last 15 as an IBM Distinguished Engineer. She is now one of the founders of Pragmatic Data Research Ltd, dedicated to improving the transparency, security and efficiency of digital operations and data management. Mandy is also the honorary president of the Institution of Engineering Designers (IED).

Mandy has been developing integration software throughout her career. Her focus has always been on using and supporting open standards to achieve interoperability across heterogeneous systems. Today Mandy is the leader of, and top contributor to, the Egeria open source project (https://egeria-project.org), which is part of the LF AI & Data Foundation. Egeria is focused on providing open metadata and governance technology that can exchange, integrate and correlate metadata from different tools, engines and platforms.

Mandy is a Fellow of the Royal Academy of Engineering. In 2015 she received a CBE for services to software engineering. In 2000, she was named to MIT Technology Review's list of one hundred young innovators.

Bio from: Big Data LDN 2025

Talks & appearances

5 activities · Newest first

Face To Face
with Dan Wolfson (Pragmatic Data Research Ltd) and Mandy Chessell (Pragmatic Data Research Ltd)

So this is pretty cool. You have just created a set of data products. The pipelines are running and the data is ready ... but no one is using them. Why not? Taking the consumer's perspective, I will show you 10 top tips on how to attract more people to your data.

Following on from the Building consumable data products keynote, we will dive deeper into the interactions around the data product catalog to show how the network effect of explicit data-sharing relationships starts to pay dividends to the participants:

For the product consumer:

• Searching for products and understanding their content, costs, terms and conditions, licenses, quality certifications, etc.

• Inspecting sample data, choosing preferred data format, setting up a secure subscription, and seeing data provisioned into a database from the product catalog.

• Providing feedback and requesting help

• Reviewing own active subscriptions

• Understanding the lineage behind each product along with outstanding exceptions and future plans

For the product manager/owner:

• Setting up a new product, creating a new release of an existing product and issuing a data correction/restatement

• Reviewing a product’s active subscriptions and feedback/requests from consumers

• Interacting with the technical teams on pipeline implementations along with issues and proposed enhancements

For the data governance team:

• Viewing the network of dependencies between data products (the data mesh) to understand the data value chains and risk concentrations

• Reviewing a dashboard of metrics for the data products, including popularity, errors/exceptions, subscriptions and interactions

• Showing traceability from a governance policy (relating to, say, data sovereignty or data privacy) to the product implementations

• Building trust profiles for producers and consumers

The aim of the demonstrations and discussions is to explore the principles and patterns relating to data products, rather than push a particular implementation approach.

Having said that, all of the software used in the demonstrations is open source: principally Egeria, OpenLineage and Unity Catalog from the Linux Foundation, plus Apache Airflow, Apache Kafka and Apache Superset from the Apache Software Foundation.

Videos of the demonstrations will be available on YouTube after the conference and the complete demo software can be downloaded and run on a laptop so you can share your experiences with your teams after the event.

When data users choose to circumvent official sources of data, they increase the risks to the organisation. How do you encourage teams to share data widely, effectively and legally? Are data products the answer? We explore the data sharing user journey to highlight the key features of a data product strategy that significantly improves the effectiveness of your data.

Graph Processing for Open Metadata and Governance by Mandy Chessell

Big Data Europe, onsite and online, 22-25 November 2022. Learn more about the conference: https://bit.ly/3BlUk9q

Join our next Big Data Europe conference on 22-25 November 2022, where you will be able to learn from global experts giving technical talks and hands-on workshops in the fields of Big Data, High Load, Data Science, Machine Learning and AI. This time, the conference will be held in a hybrid setting, allowing you to attend workshops and listen to expert talks on-site or online.

Designing and Operating a Data Reservoir

Together, big data and analytics have tremendous potential to improve the way we use precious resources, to provide more personalized services, and to protect ourselves from unexpected and ill-intentioned activities. To fully use big data and analytics, an organization needs a system of insight. This is an ecosystem where individuals can locate and access data, and build visualizations and new analytical models that can be deployed into the IT systems to improve the operations of the organization.

The data that is most valuable for analytics is also valuable in its own right, and it typically contains personal and private information about key people in the organization, such as customers, employees, and suppliers. Although universal access to data is desirable, safeguards are necessary to protect people's privacy, prevent data leakage, and detect suspicious activity.

The data reservoir is a reference architecture that balances the desire for easy access to data with information governance and security. It describes the technical capabilities necessary for a system of insight while remaining independent of specific technologies. Technology independence is important because most organizations already have investments in data platforms that they want to incorporate into their solution. In addition, technology is continually improving, and the choice of technology is often dictated by the volume, variety, and velocity of the data being managed.

A system of insight needs more than technology to succeed. The data reservoir reference architecture therefore also includes descriptions of the governance and management processes and definitions that ensure the human and business systems around the technology support a collaborative, self-service, and safe environment for data use.
The data reservoir reference architecture was first introduced in Governing and Managing Big Data for Analytics and Decision Makers, REDP-5120, which is available at: http://www.redbooks.ibm.com/redpieces/abstracts/redp5120.html. This IBM® Redbooks publication, Designing and Operating a Data Reservoir, builds on that material to provide more detail on the capabilities and internal workings of a data reservoir.