Public Sector Code-Along

We at the National Innovation Centre for Data (NICD) are joining forces with Azure Databricks and the Ministry of Justice (MoJ)'s Splink team to bring you the Public Sector Code-Along, taking place at The Catalyst in Newcastle upon Tyne on Tuesday 31st January from 8.30 a.m. to 4.00 p.m.

This event will give participants the opportunity to get to grips with Databricks and the MoJ's Splink package. Databricks is a unified set of tools for building, deploying, sharing, and maintaining enterprise-grade data solutions at scale, while Splink is a PySpark package that allows you to link millions of records that refer to the same entity but lack a consistent identifier.
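To give a flavour of what linking records without a consistent identifier involves, here is a minimal sketch in plain Python. It is not Splink's API: Splink uses a probabilistic model rather than a single similarity threshold, and it is built to run at scale. The function names, field names, threshold, and records below are all made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score, ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def link_records(left, right, threshold=0.85):
    """Pair records whose birth years match and whose names are similar enough.

    The birth-year check acts as a simple "blocking" rule: it avoids scoring
    pairs that cannot plausibly be the same person.
    """
    links = []
    for l in left:
        for r in right:
            if l["birth_year"] != r["birth_year"]:
                continue  # blocked: skip the comparison entirely
            score = similarity(l["name"], r["name"])
            if score >= threshold:
                links.append((l["id"], r["id"], round(score, 2)))
    return links


# Hypothetical records: the two sources share no common identifier.
officers = [
    {"id": "A1", "name": "Jonathan Smith", "birth_year": 1975},
    {"id": "A2", "name": "Mary O'Neill", "birth_year": 1982},
]
payments = [
    {"id": "B1", "name": "Jonathon Smith", "birth_year": 1975},
    {"id": "B2", "name": "Mary Oneill", "birth_year": 1982},
    {"id": "B3", "name": "Peter Jones", "birth_year": 1975},
]

links = link_records(officers, payments)
for left_id, right_id, score in links:
    print(left_id, "<->", right_id, score)
```

Comparing every pair like this does not scale to millions of records, which is the kind of problem tools such as Splink on Databricks are designed to address.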

NB: There is also an event running in parallel with this one at a location in London. If you would like to attend that event, visit this page.

The first mission of the National Data Strategy, devised by the UK Government, is to unlock the value of data across the economy. The strategy notes that there is much untapped potential in linking datasets from different organisations.

However, many organisations struggle to link their own datasets, let alone link them with datasets from a completely different organisation. When analysts and other data professionals spend huge amounts of time retrieving, merging, cleaning and verifying their data, that is time not spent on the valuable work of understanding their analysis and synthesising it into actionable information.

In this Code-Along workshop, you will join us live at The Catalyst in Newcastle upon Tyne as we address this challenge by focussing on real-world examples of linking data. We will look at corporate officers, using publicly available data sourced from Companies House, in conjunction with a dataset of payment practices from large organisations.

Who should attend this event?

Public sector workers who are interested in methods for combining datasets with inconsistent identifiers.

Public sector workers who are interested in learning more about Databricks.

You should attend this event if you are comfortable performing basic exploratory data analysis (EDA) in Python, which will be the predominant programming language used throughout.

There is no formal requirement for attendees to know Databricks or PySpark, or anything about data linking or entity resolution. Enablement and support for this will be provided on the day across both locations.

Groups will be organised by beginner and intermediate skill levels across both locations to allow for the best collaboration and enablement experience. All attendees are required to bring their own laptops.


08.30 a.m. | Registration, Meet Team Members + Refreshments
09.00 a.m. | Keynote + Introductions | Speakers TBC
10.30 a.m. | Code-Along Begins
12.30 p.m. | Informal Lunch
1.30 p.m. | Code-Along Resumes
3.00 p.m. | Sharing of Work + Closing Remarks
4.00 p.m. | Networking + Drinks
5.30 p.m. | Event Finishes

