Striking the balance between quality and cost in data labelling
How do we achieve high quality at minimal cost?
That is a question every project manager is familiar with. Companies and teams working in artificial intelligence and machine learning face it over and over again. Usually, the cost of a software product is worth the investment because, as an intangible good, it scales almost without limit. However, there is one thing that makes AI development an extremely resource-intensive undertaking and, unfortunately, it constitutes the foundation of every machine learning model:
Labelled Datasets are a Precondition
There is a common opinion that an algorithm can only be as good as the datasets on which it is trained. How much training data is needed depends on the later use, which is wise to keep in mind when setting up a road map for AI development: a computer vision system that distinguishes ripe tomatoes from unripe ones will need less training than a system that aims to enable automated driving. “If you talk about autonomous driving, one hour of video data can lead up to 800 man-hours of work” says Siddarth Mall, CEO at Playment, an annotation platform. Knowing this, we can safely assume that the annotation process takes up a large share of the human capital dedicated to an AI project.
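To get a feeling for what that figure means for planning, here is a minimal back-of-the-envelope sketch. It treats the quoted 800 man-hours per hour of video as an upper bound; the function name, the team size and the 40-hour working week are hypothetical inputs chosen for illustration, not figures from any real project.

```python
# Upper bound taken from the quote above: up to 800 man-hours of
# labelling work per hour of video for autonomous driving.
MAX_LABEL_HOURS_PER_VIDEO_HOUR = 800


def annotation_effort(video_hours, annotators, hours_per_week=40):
    """Return (total man-hours, calendar weeks) for a labelling job.

    `annotators` and `hours_per_week` are hypothetical planning inputs.
    """
    if annotators <= 0 or hours_per_week <= 0:
        raise ValueError("annotators and hours_per_week must be positive")
    total_hours = video_hours * MAX_LABEL_HOURS_PER_VIDEO_HOUR
    weeks = total_hours / (annotators * hours_per_week)
    return total_hours, weeks


# Example: 10 hours of footage labelled by a team of 20 annotators.
total_hours, weeks = annotation_effort(video_hours=10, annotators=20)
print(f"{total_hours} man-hours, about {weeks:.1f} weeks")
# prints "8000 man-hours, about 10.0 weeks"
```

Even under these simplified assumptions, a modest amount of footage ties up a sizeable team for months.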
Let’s take an example: what if the valuable people who currently spend their time collecting, segmenting and annotating huge datasets went back to the core responsibilities they were hired for and are expert in? It has long been overlooked that the most efficient way of allocating human capital in AI development lies not in this old practice but in outsourcing the annotation process to experts in the market. This way, software developers can return to their core skills for the best results.
Which leads us to an important conclusion:
Leave data annotation to the experts and channel the skills of your valuable in-house resources into work that aligns with the objective of your business. However, this raises another concern that AI teams and project managers face: “How can we leave a project that demands an extraordinary amount of expertise in our field to an external workforce?” What you need is not just a dedicated team that has sufficient knowledge of the model’s application case (can we distinguish a skin tumor from a mole?) but one that is specialized in all the different kinds of image annotation.
Images are not the only data that can be annotated: we are looking at a broad range including LIDAR/RADAR scans, video, and even audio and text data. The annotation techniques that can be applied vary by data type. We have created a short list of annotation techniques that can be used on certain data types to give you a concise overview.
As you can see, there is a large variety of markup techniques, depending on the data type. However, it is difficult to find teams that can work on all the different data types with high-quality results. The problem is twofold: either teams have specialized in a specific subject within the annotation landscape, or they are capable of doing a bit of everything, in which case quality suffers from the broader focus. Once teams realize the need to be not just broad in scope but also high in quality, they tend to become extremely pricey, up to the point where the operation can no longer be financed.
Borek Solutions (BS), however, has taken a different approach. Unlike other annotation services, BS builds teams that are designed to handle a broad range of image annotation projects. Experts in several annotation types are integrated so that together they form a unit with broad expertise. What differentiates this modern data labelling workforce from a conventional one is that these teams are created with a long-term cooperative vision. The focus remains on economies of scale in order to achieve declining marginal cost after an initial investment.
With the vision of pristine services for our clients, BS is able to offer time and cost efficiency at the same time. Our track record shows that Borek Solutions’ clients not only saved 15 to 25% of time by engaging with us, but also achieved cost savings of 30 to 50% on the projects that our teams performed for them. Besides this, our clients save on fixed labor cost and overhead expenses, which can be channeled towards the value-adding objectives of the business.