In this special guest feature, Ryohei Fujimaki, Ph.D., Founder and CEO of dotData, discusses how AI and ML are having a profound impact on enterprise digital transformation, becoming crucial as a competitive advantage and even for survival. As the field grows, four trends are emerging that will shape data science over the next five years. dotData is a spin-off of NEC Corporation and the first company focused on delivering full-cycle data science automation for the enterprise. Dr. Fujimaki is a world-renowned data scientist and was the youngest research fellow appointed in the 119-year history of NEC.
According to the Gartner Group, digital business reached a tipping point last year, with 49% of CIOs reporting that their enterprises have already changed their business models or are in the process of doing so. When Gartner asked CIOs and IT leaders which technologies they expect to be most disruptive, artificial intelligence (AI) was the top-mentioned technology.
AI and ML are having a profound impact on
enterprise digital transformation, becoming crucial as a competitive advantage
and even for survival. As the field grows, four trends are emerging that will
shape data science over the next five years:
The Full Data Science Life-Cycle
The pressure to grow ROI from AI and ML
initiatives has pushed demand for innovative solutions that accelerate AI
and data science. Although data science processes are iterative and highly
manual, more than 40% of data science tasks are expected to be automated by
2020, according to Gartner, resulting in increased productivity and broader
usage of data across the enterprise.
Recently, automated machine learning (AutoML)
has become one of the fastest-growing technologies for data science. Machine
learning, however, typically accounts
for only 10-20% of the entire data science process. The real pain points come
before the machine learning stage, in data preparation and feature engineering.
The new concept of data science automation
goes beyond machine learning automation, including data preparation, feature
engineering, machine learning, and the production of full data science
pipelines. With data science automation, enterprises can genuinely accelerate
AI and ML initiatives.
Existing Resources for Democratization
Despite substantial investments in data science
across many industries, the scarcity of data science skills and resources often
limits the advancement of AI and ML projects in organizations. The shortage of data scientists has created a
challenge for anyone implementing AI and ML initiatives, forcing a closer look
at how to build and leverage data science resources.
Beyond highly specialized
technical skills and mathematical aptitude, data scientists also need
domain and industry knowledge that is relevant to a specific
business area. Domain knowledge is required for problem definition and result
validation and is a crucial enabler for delivering business value from data science.
Relying on “data science unicorns” who have all these skill sets is
neither realistic nor scalable.
Enterprises are focusing on repurposing existing
resources as “citizen” data scientists. The rise of AutoML and data
science automation can open data science to a broader user base and allow the
practice to scale. When citizen data scientists are empowered to
execute standard use cases, skilled data scientists can focus on high-impact,
technically challenging projects that produce greater value.
Insights for Greater Transparency
As more organizations adopt data science
in their business processes, relying on AI-derived recommendations that lack
transparency is becoming problematic. Increased regulatory oversight, such as the
GDPR, has exacerbated the problem. Transparent insights make AI models more
“oversight” friendly and have the added benefit of being far more actionable.
White-box AI models help organizations maintain
accountability in data-driven decisions and allow them to live within the
boundaries of regulations. The challenge is the need for high-quality and
transparent inputs (aka “features”), often requiring multiple manual
iterations to achieve the needed transparency. Data science automation allows
data scientists to explore millions of hypotheses and augments their ability to
discover transparent and predictive features as business insights.
Data Science in Business
Although ML models are
often tiny pieces of code, deploying them once they are finally deemed ready
for production can be complicated and problematic. For example, because data
scientists are typically not software engineers, the quality of their code may not be
production-ready. Data scientists often validate models with down-sampled
datasets in lab environments, so the models may not scale to
production-size datasets. Also, the performance of deployed models decreases
as data invariably changes, making model maintenance pivotal to extracting
business value from AI and ML models continuously. Data and feature pipelines
are much bigger and more complex than the ML models themselves, and
operationalizing them is even more complicated. One promising approach is to
leverage concepts from continuous deployment through APIs. Data science
automation can generate APIs to execute the full data science pipeline,
accelerating deployments while also providing an ongoing connection to
development systems to accelerate the optimization and maintenance of models.
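To make the point about changing data concrete, here is a minimal sketch of one common way to notice that a production feature has drifted away from the data a model was trained on: the population stability index (PSI). This is an illustration only, not a description of any particular product. The function name, the choice of ten equal-width bins, and the 0.25 alert threshold (a widely quoted rule of thumb) are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10
) -> float:
    """PSI between a training-time sample and a production sample of one feature.

    Hypothetical helper for illustration; bins are equal-width over the
    range of the training-time (expected) sample.
    """
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("expected sample has no spread; cannot bin it")
    width = (hi - lo) / bins

    def bin_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp so production values outside the training range
            # fall into the first or last bin.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e = bin_shares(expected)
    a = bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    training = [i / 100 for i in range(100)]           # feature as seen in the lab
    production = [0.5 + i / 200 for i in range(100)]   # same feature, shifted upward
    psi = population_stability_index(training, production)
    # Assumed rule of thumb: PSI above 0.25 suggests the model needs attention.
    print("retrain candidate" if psi > 0.25 else "stable")
```

A check like this could run on a schedule against each input feature, so that model maintenance is triggered by measured change in the data rather than by a drop in business results.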
Data science is at the heart of AI and ML. While the promise of AI is real, the problems associated with data science are also real. Through better planning, closer cooperation with lines of business, and automation of the more tedious and repetitive parts of the process, data scientists can finally begin to focus on what to solve rather than how to solve it.