The 2022 Label Studio Community Survey shows how data teams are
increasing investments in data preparation and management to
achieve better ML/AI outcomes.
SAN FRANCISCO, Dec. 8, 2022 /PRNewswire-PRWeb/ -- Data
science teams are shifting their focus from model development to
dataset development in order to deliver Machine Learning (ML) and
Artificial Intelligence (AI) initiatives that are more performant,
differentiated and aligned with business goals. This and other
findings are available in the first Label Studio Community Survey,
where data scientists, ML engineers and researchers from the global
open source community shared insights into the state of ML and
AI.
Label Studio is the most popular open source data labeling
platform with more than 150,000 users worldwide, 95,000,000+
annotations created and over 11,000 stars on GitHub. Community
members from more than 40 countries participated in the survey, and
75% of the survey respondents currently have ML/AI models in
production with another 15% planning to have models in production
soon.
"We're in the midst of a fundamental shift in how organizations
approach ML and AI," said Michael Malyuk, co-founder and CEO of
Heartex, creators of Label
Studio. "Model development was once the source of differentiated
value, but as the results of this survey highlight, organizations
now spend 50-80% of their time iterating on the dataset and quality
of its labeling to train accurate models. We call this emerging
practice dataset development."
Successful ML and AI applications rely on models trained using
high quality data. The 2022 Label Studio Community Survey explores
the current state of the ML/AI ecosystem, with a focus on how teams
are approaching data labeling, preparation and management as a key
part of the pipeline.
*Key Findings in the Label Studio Community Survey*
Machine Learning and AI are becoming increasingly strategic.
- 73% of respondents noted their organizations will make a higher
level of investment in their ML/AI initiatives in the coming
year.
Data poses the biggest challenge to putting ML/AI models into
production.
- 80% of respondents cited obtaining accurately labeled data as one
of the biggest challenges to getting ML/AI models into production
(the top response), while 46% cited lack of data (the second most
popular response).
Data science teams now spend the majority of their time on
dataset preparation, management and iteration, known as dataset
development.
- 72% of respondents reported spending 50% or more of their time
on data preparation, iteration and management, while more than
one-third (34%) of respondents said they spend 75% or more of their
time on the data.
Data preparation and labeling are becoming increasingly
cross-functional.
- While most respondents have the traditional roles of data
scientists and data engineers, the responsibility for data labeling
is broad, requiring engagement across organizations from interns to
executives and business leaders. Notably, 20% reported that a mix
of roles held the data prep responsibility, including subject
matter experts, who accounted for 5% of responses, and business
analysts, who accounted for 3%.
The Label Studio Community Survey also dives into popular
technology choices, finding that ML/AI workloads are primarily
hosted on cloud offerings, while HuggingFace is the most popular
source for pre-trained models. More details can be found in the
full report.
*About Heartex*
Heartex is the company behind Label Studio, the most popular open
source data labeling platform for Machine Learning and AI. Founded
in 2019 by data scientists and engineers who faced common
challenges with model accuracy due to poor quality training data,
the team believed the only viable solution was to enable internal
teams with domain expertise to annotate and curate training data.
They created Label Studio with a focus on usability, flexibility
and collaborative workflows that support internal data labeling
operations at scale and increase the accuracy of ML/AI models.
Today, Label Studio has been used by more than 150,000 people
around the world to label nearly 100 million pieces of data,
including for production ML/AI initiatives at enterprises like
Bombora, Geberit, Outreach, Wyze, Yext and more. For more
information, visit http://www.heartex.com.
###
Media Contact
Robert Cathey, Cathey Communications for Heartex, +1 865-386-6118, robert@cathey.co
SOURCE Label Studio