Infinitypool: Model-Assisted Labeling
Joe Anderson, Daniel Cornell, Alon Greyber, Simon Griggs, Kaylie Naylo, Stephen Williamson, Sean Lynch, Skip Smith, Aaron Wiechmann
Confidence in one’s data is essential for machine learning projects. A recent KDnuggets post puts it succinctly: “Good quality data becomes imperative and a basic building block of an ML [machine learning] pipeline. The ML model can only be as good as its training data.”
Human evaluations of data are often better than automated evaluations, especially when tailoring or evaluating a machine learning model for a specific use case. Beyond jump-starting machine learning projects, human data annotation and feedback are important throughout the machine learning pipeline. Human input can help evaluate model reliability, assess data quality, and provide a basis for pivoting a given model to a new, more impactful version.
Recognizing the need for human interaction with data, LAS has partnered to build a data labeling application called Infinitypool. Infinitypool is a customizable, collaborative, and compliant data labeling and data evaluation application that provides the human interaction needed to support data triage and annotation for supervised machine learning pipelines. Infinitypool is designed to provide a rapid, tiger-team approach to data triage that complements current analytic and machine learning pipelines.
Human evaluations of data are important, yet the cost, in time and money, of creating them can be extremely high. However, there are opportunities to retain data quality while reducing cost through intentional human-machine teaming within the data evaluation process. To address this challenge and pilot a solution, LAS partnered with a North Carolina State University (NC State) Computer Science Senior Design Team to incorporate a model-assisted labeling service into LAS’s Infinitypool application. The goal was to use a deployed machine learning model to provide a “good guess” for the label of a given data object, and then ask the human, who has the expertise, to confirm or correct that guess. This approach relies on the human’s expertise to ensure data quality while reducing the time spent evaluating the data.
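The confirm-or-correct workflow can be illustrated with a minimal sketch. This is not Infinitypool’s implementation: the names `LabelResult` and `model_assisted_label`, and the convention that the reviewer returns `None` to accept the model’s guess, are hypothetical choices made for illustration. The model and the human reviewer are passed in as plain functions so the loop can run without a deployed model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LabelResult:
    item_id: str
    label: str
    source: str  # "model" if the reviewer accepted the guess, "human" if corrected


def model_assisted_label(
    item_id: str,
    data: str,
    predict: Callable[[str], str],
    ask_human: Callable[[str, str], Optional[str]],
) -> LabelResult:
    """Get a model's 'good guess', then let a human confirm or correct it.

    Hypothetical convention: ask_human returns None to confirm the guess,
    or a replacement label to correct it.
    """
    guess = predict(data)
    correction = ask_human(data, guess)
    if correction is None:
        return LabelResult(item_id, guess, "model")
    return LabelResult(item_id, correction, "human")


# Toy stand-ins for a deployed model and a human reviewer.
def toy_predict(text: str) -> str:
    return "car"  # the toy model always guesses "car"


def toy_review(text: str, guess: str) -> Optional[str]:
    # Accept the guess for sedans; correct it for anything else.
    return None if "sedan" in text else "truck"


items = {"img1": "red sedan", "img2": "pickup truck"}
results = [model_assisted_label(i, d, toy_predict, toy_review) for i, d in items.items()]
for r in results:
    print(r.item_id, r.label, r.source)
```

Recording whether each final label came from the model or from a correction is one possible way to track how often the model’s guesses are accepted; it is an illustrative design choice rather than a described feature.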
We begin with an example. Consider the following image, where one is asked to draw a box around each car and label it.
This task will take a long time, and this is just one image. What if you were asked to do this for 100 or even 10,000 images? It would take a very, very long time. The task would be tedious and exhausting, and staying focused while performing it repeatedly with consistent quality would be a challenge.
But what if, instead, a machine learning model placed a box around each car, and all that was needed was to adjust the boxes so they fit correctly and assign the correct label? This would save a lot of time. It would also save one’s sanity.
This is exactly what the NC State Senior Design team implemented into Infinitypool.
The NC State Senior Design team engineered a solution that interacts with a deployed machine learning model through API connections to automatically show inferred labels for tasks loaded into Infinitypool. In addition, the team added components to the user interface to adapt the model-assist feature to complement the existing customizable labeling components. As a result, multiple models may be used for a given labeling exercise.
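As a rough illustration of querying deployed models over an API to pre-populate bounding-box suggestions, the sketch below polls every configured model endpoint for a task and keeps the boxes above a score threshold, tagging each with the model that produced it. Everything here is an assumption rather than Infinitypool’s actual interface: the endpoint URLs, the JSON request and response schema (`{"boxes": [{"label", "x", "y", "w", "h", "score"}]}`), the `min_score` threshold, and the names `BoxPrediction` and `prelabel_task`. The HTTP call is injectable so the example runs offline with a fake transport.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class BoxPrediction:
    label: str
    x: float
    y: float
    width: float
    height: float
    score: float
    model: str  # which model suggested this box


# A transport takes (url, payload) and returns the decoded JSON response.
Transport = Callable[[str, dict], dict]


def http_post_json(url: str, payload: dict) -> dict:
    """POST a JSON payload and decode the JSON response (standard library only)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def prelabel_task(task_id: str, image_url: str, endpoints: Dict[str, str],
                  transport: Transport = http_post_json,
                  min_score: float = 0.5) -> List[BoxPrediction]:
    """Query each model endpoint and collect boxes to show as editable suggestions."""
    suggestions: List[BoxPrediction] = []
    for model_name, url in endpoints.items():
        # Hypothetical response schema: {"boxes": [{"label", "x", "y", "w", "h", "score"}]}
        response = transport(url, {"task_id": task_id, "image_url": image_url})
        for b in response.get("boxes", []):
            if b["score"] >= min_score:  # drop low-confidence guesses
                suggestions.append(BoxPrediction(
                    b["label"], b["x"], b["y"], b["w"], b["h"], b["score"], model_name))
    return suggestions


# Offline usage with a fake transport standing in for two deployed models.
def fake_transport(url: str, payload: dict) -> dict:
    if "detector-a" in url:
        return {"boxes": [
            {"label": "car", "x": 10, "y": 20, "w": 50, "h": 30, "score": 0.9},
            {"label": "car", "x": 200, "y": 40, "w": 45, "h": 28, "score": 0.3},
        ]}
    return {"boxes": [{"label": "truck", "x": 120, "y": 60, "w": 80, "h": 50, "score": 0.7}]}


endpoints = {
    "detector-a": "http://models.example/detector-a/predict",
    "detector-b": "http://models.example/detector-b/predict",
}
suggestions = prelabel_task("task-1", "http://data.example/img1.png", endpoints,
                            transport=fake_transport)
for s in suggestions:
    print(s.model, s.label, s.score)
```

Keeping the model name on each suggestion is one way to let several models contribute to the same labeling exercise while the human reviewer adjusts or rejects each box.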
For more information about this project or the LAS/NC State Senior Design partnership, stop by the demo at the LAS Symposium.