Codeless and low-code machine learning platforms still require personnel

Codeless and low-code (horizontal) machine learning platforms are useful for advancing data science in an enterprise. Yet, as many organizations are now discovering, there are many ways data science can go wrong when it is applied to new problems. Zillow suffered billions of dollars in losses buying homes based on a flawed data-driven home valuation model. Data-driven HR technology, especially when based on facial recognition software, has been shown to bias hiring decisions against protected classes.

While automation is a great tool to have in your arsenal, you need to consider the challenges before adopting a horizontal ML platform. These platforms must be flexible, configurable, and controllable in order to be robust and deliver consistent added value over time. They must let users weight the data flexibly and on their own terms, and they need data visualization tools to detect outliers and sources of noise. They also need automated monitoring of model parameters and data drift monitors that alert users to changes. In short, we have not yet reached the point where algorithms surpass human intelligence.
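What “user-controlled weighting” might mean in practice can be sketched in a few lines. This is a minimal illustration, assuming a simple exponential-decay scheme in which a user picks the half-life and can override individual weights (for example, setting a known bad data point to zero). The helpers `decay_weights` and `weighted_mean` are hypothetical and do not correspond to any particular platform’s API.

```python
# A minimal sketch of user-controlled data weighting.
# decay_weights and weighted_mean are hypothetical helpers, not a platform API.

def decay_weights(n, half_life, overrides=None):
    """Return n weights, oldest first, that halve every `half_life` observations.

    `overrides` maps an index to a user-chosen weight, e.g. 0.0 to exclude
    a data point that a human has identified as an error.
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    # The newest observation (index n - 1) gets weight 1.0.
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    for index, weight in (overrides or {}).items():
        weights[index] = weight
    return weights


def weighted_mean(values, weights):
    """Weighted average of values; raises if all weights are zero."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not all be zero")
    return sum(v * w for v, w in zip(values, weights)) / total


values = [2.0, 8.0, 100.0]
weights = decay_weights(len(values), half_life=1, overrides={2: 0.0})
print(weights)                         # [0.25, 0.5, 0.0]
print(weighted_mean(values, weights))  # 6.0
```

The point of the override argument is that the human, not the algorithm, decides that the value 100.0 is an error rather than a real signal.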

So don’t be fooled by AI, ML, or low code: you still need people. Let’s take a closer look at why.

Machines learn from humans

Trying to replace human data scientists, domain experts, and engineers with automation is a hazardous proposition that could spell disaster if applied to critical decision-making systems. Why? Because humans understand data in ways that automated systems still struggle to match.

Humans can tell the difference between data errors and genuinely unusual data (e.g., GameStop/GME trading in February 2021) and can connect unusual patterns in the data to real-world events (e.g., 9/11, COVID, financial crises, elections). We also understand the impact of calendar events such as holidays. Depending on the data fed to ML algorithms and the quantity being predicted, the semantics of the data can be difficult for those algorithms to discover. There is no need to force them to uncover these relationships when they are not hidden from the human operator.
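One way to put that human knowledge to work is to let an algorithm flag statistical outliers and then check them against a human-curated list of real-world events. The sketch below is a minimal example under stated assumptions: it uses a robust (median/MAD) z-score with the commonly used 3.5 cutoff, and the `KNOWN_EVENTS` window is an illustrative, hypothetical entry rather than data from the article. Outliers that match no known event are labeled for human review instead of being silently dropped or kept.

```python
from datetime import date
from statistics import median

# Hypothetical, human-curated list of known real-world events.
# The label and date window are illustrative assumptions only.
KNOWN_EVENTS = {
    "covid_crash": (date(2020, 3, 9), date(2020, 3, 23)),
}


def robust_z_scores(values):
    """Robust z-scores based on the median and median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0 for _ in values]
    # 0.6745 rescales MAD so scores are comparable to standard z-scores for normal data.
    return [0.6745 * (v - med) / mad for v in values]


def triage_outliers(series, events, threshold=3.5):
    """Label each outlier as explained by a known event or as needing human review."""
    scores = robust_z_scores([value for _, value in series])
    labels = {}
    for (day, _), z in zip(series, scores):
        if abs(z) < threshold:
            continue
        matched = [name for name, (start, end) in events.items() if start <= day <= end]
        labels[day] = matched[0] if matched else "needs_review"
    return labels


example = [
    (date(2020, 3, 2), 100.0), (date(2020, 3, 3), 101.0), (date(2020, 3, 4), 99.0),
    (date(2020, 3, 5), 100.5), (date(2020, 3, 6), 100.0),
    (date(2020, 3, 10), 60.0),   # falls inside the curated event window
    (date(2020, 4, 1), 500.0),   # no known event: possibly a data error
]
print(triage_outliers(example, KNOWN_EVENTS))
```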

Semantics aside, the trickiest part of data science is distinguishing between statistically good results and useful results. It’s easy to use evaluation statistics to convince yourself that a model performs well, or that a new model performs better than an old one, when in fact neither model is useful for solving a real problem. And even with valid statistical methodologies, interpreting modeling results still requires human intelligence.
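A simple guard against “statistically better but not useful” is to always score candidate models against a naive baseline. The sketch below assumes a time-series forecasting task and uses a persistence baseline (predict the previous value); other problems would need a different baseline. The function names and the numbers are hypothetical and chosen only to illustrate the trap.

```python
def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def persistence_forecast(last_observed, actual):
    """Naive baseline: predict each value as the previous observed value."""
    return [last_observed] + list(actual[:-1])


def compare_to_baseline(actual, last_observed, models):
    """Return the MSE of each model alongside the naive persistence baseline."""
    scores = {"naive": mse(actual, persistence_forecast(last_observed, actual))}
    for name, predicted in models.items():
        scores[name] = mse(actual, predicted)
    return scores


actual = [10, 11, 12, 13, 14]
scores = compare_to_baseline(
    actual,
    last_observed=9,
    models={"model_a": [12, 13, 14, 15, 16], "model_b": [11.5, 12.5, 13.5, 14.5, 15.5]},
)
print(scores)  # {'naive': 1.0, 'model_a': 4.0, 'model_b': 2.25}
```

Here `model_b` clearly beats `model_a`, yet both are worse than doing nothing clever at all. A comparison that only looked at the two models would miss this.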

When developing a model, you often face questions about which evaluation statistics to measure, how to weight them, how to evaluate them over time, and how to decide which results are significant. Then there’s the whole problem of over-testing: if you test too frequently on the same data set, you end up “learning” your test data, which makes your test results overly optimistic. Finally, you need to build models and figure out how to combine all of these statistics into a simulation methodology that reflects what is achievable in the real world. You should also keep in mind that just because a machine learning platform has been successfully deployed to solve one modeling and prediction problem does not mean that repeating the same process on a different problem in the same domain, or in a different vertical, will lead to the same success.
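One common discipline against over-testing, assuming time-ordered data, is to lock away the most recent observations as a final holdout that is used only once, and to do all model comparison on walk-forward folds that always train on the past and test on the next block. The sketch below shows only the index bookkeeping. `final_holdout_split` and `walk_forward_splits` are hypothetical helpers, and the fold sizes are arbitrary.

```python
def final_holdout_split(n, holdout_size):
    """Reserve the most recent `holdout_size` observations for a single, final test."""
    if not 0 < holdout_size < n:
        raise ValueError("holdout_size must be between 1 and n - 1")
    cutoff = n - holdout_size
    return list(range(cutoff)), list(range(cutoff, n))


def walk_forward_splits(n, initial_train, step):
    """Yield (train, test) index lists that always train on the past and test on the next block."""
    if step <= 0:
        raise ValueError("step must be positive")
    start = initial_train
    while start < n:
        end = min(start + step, n)
        yield list(range(start)), list(range(start, end))
        start = end


development, holdout = final_holdout_split(10, holdout_size=2)
# Tune and compare models only on walk-forward folds within the development data.
for train, test in walk_forward_splits(len(development), initial_train=4, step=2):
    print(train, test)
# [0, 1, 2, 3] [4, 5]
# [0, 1, 2, 3, 4, 5] [6, 7]
# Evaluate the chosen model on `holdout` (indices [8, 9]) once, at the very end.
```

Because each fold only ever tests on data that comes after its training window, the procedure also approximates what a model would have achieved if it had been deployed at each point in time.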

There are so many choices that need to be made at every step of the data science research, development, and deployment process. You need experienced data scientists to design experiments, domain experts to understand the boundary conditions and nuances of data, and production engineers who understand how models will be deployed in the real world.

Visualization is a gem of data science

In addition to data weighting and modeling, data scientists also benefit from data visualization, a largely manual process that is more art than science. Plotting raw data, correlations between inputs and predicted quantities, and time series of coefficients estimated over time can produce observations that feed back into the model-building process.
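A “time series of coefficients” can be produced by re-estimating a model on a rolling window and recording the coefficient each time. The sketch below assumes a single-predictor ordinary least-squares fit and a fixed window, with hypothetical helper names; in practice you would plot `slopes`, but a simple jump detector shows the kind of signal a human would look for.

```python
def ols_slope(xs, ys):
    """Slope of an ordinary least-squares fit of ys on xs (single predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def rolling_slopes(xs, ys, window):
    """Re-estimate the slope on each consecutive window to get a coefficient time series."""
    return [ols_slope(xs[i:i + window], ys[i:i + window]) for i in range(len(xs) - window + 1)]


def flag_jumps(coefficients, max_change):
    """Indices where the coefficient moves more than max_change from the previous estimate."""
    return [
        i for i in range(1, len(coefficients))
        if abs(coefficients[i] - coefficients[i - 1]) > max_change
    ]


xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 40]  # the last point is an outlier
slopes = rolling_slopes(xs, ys, window=3)
print(slopes)
print(flag_jumps(slopes, max_change=5.0))  # [3]
```

The slope is stable at 2.0 until the window containing the outlier, where it jumps to about 16. That kind of sudden move is a hint that the learning algorithm is not handling outliers well.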

You may notice periodicity in the data, perhaps a day-of-week effect, or abnormal behavior around holidays. You may detect extreme movements in the coefficients, which suggest that outliers are not handled well by your learning algorithm. You may notice different behavior across subsets of your data, which suggests that you could separate those subsets to generate more refined models. Again, unsupervised learning algorithms can be used to try to uncover some of these patterns hidden in the data. But a human may be better equipped to find these patterns and then integrate what they reveal into the model-building process.
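A quick numerical check for a day-of-week effect is to average the target by weekday before plotting it. The sketch below is a minimal example using only the standard library. `weekday_means` is a hypothetical helper, and the series is synthetic data built to have higher values on Mondays.

```python
from collections import defaultdict
from datetime import date, timedelta

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def weekday_means(series):
    """Average value per day of week for a sequence of (date, value) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, value in series:
        name = WEEKDAY_NAMES[day.weekday()]  # weekday(): Monday == 0
        totals[name] += value
        counts[name] += 1
    return {name: totals[name] / counts[name] for name in WEEKDAY_NAMES if counts[name]}


# Synthetic example: two weeks of data with higher values on Mondays.
start = date(2024, 1, 1)  # a Monday
days = [start + timedelta(days=i) for i in range(14)]
series = [(d, 150.0 if d.weekday() == 0 else 100.0) for d in days]
print(weekday_means(series))
```

A table like this only tells you that Mondays differ. Deciding whether that is a real business cycle, a reporting artifact, or a data-loading bug is still a human judgment.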

Horizontal ML platforms should be monitored

Another important role people play in deploying ML-based AI systems is model monitoring. Depending on the type of model, what it predicts, and how those predictions are used in production, different aspects of the model need to be monitored so that deviations in behavior are tracked and problems can be anticipated before they degrade real-world performance.
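The simplest form of behavioral monitoring tracks recent prediction errors against a reference level. The sketch below assumes a regression-style model whose outcomes eventually become known, and raises an alert when the mean absolute error over a rolling window exceeds a chosen multiple of the baseline. The `ErrorMonitor` class, its window size, and its tolerance are hypothetical choices that a human would need to tune for the task.

```python
from collections import deque


class ErrorMonitor:
    """Track recent absolute errors and alert when they rise above a reference level."""

    def __init__(self, baseline_mae, window=50, tolerance=1.5):
        self.baseline_mae = baseline_mae
        self.tolerance = tolerance
        self.errors = deque(maxlen=window)  # oldest errors drop off automatically

    def recent_mae(self):
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def alert(self):
        # Only alert once the window is full, to avoid noisy early alerts.
        full = len(self.errors) == self.errors.maxlen
        return full and self.recent_mae() > self.tolerance * self.baseline_mae

    def update(self, actual, predicted):
        self.errors.append(abs(actual - predicted))
        return self.alert()


monitor = ErrorMonitor(baseline_mae=1.0, window=3, tolerance=1.5)
for actual, predicted in [(10, 9), (10, 9), (10, 9), (10, 5)]:
    print(monitor.update(actual, predicted))
# False, False, False, True
```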

If models are regularly retrained on more recent data, it is important to monitor whether the new data entering the training process is consistent with the data used previously. If production tools are updated with new models trained on more recent data, it is important to verify that the new models are as similar to the old ones as one would expect, where those expectations depend on the model and the task.
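Both checks can be sketched with simple statistics. For data consistency, the sketch below swaps in the Population Stability Index (PSI), one common drift measure, computed over equal-width bins taken from the reference data; the article does not prescribe a specific metric. For model similarity, it measures how often the old and new models agree within a tolerance on the same inputs. Function names, bin count, and tolerance are hypothetical. A PSI above roughly 0.25 is often treated as a significant shift, but that is an industry rule of thumb, not a universal threshold.

```python
import bisect
import math


def bin_fractions(values, edges):
    """Fraction of values in each bin; values outside the edges go to the first or last bin."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        counts[min(max(idx, 0), n_bins - 1)] += 1
    return [c / len(values) for c in counts]


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between a reference (training) sample and a new sample, using equal-width bins."""
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference data must not be constant")
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    ref = bin_fractions(reference, edges)
    cur = bin_fractions(current, edges)
    # eps avoids log(0) when a bin is empty in one of the samples.
    return sum((c - r) * math.log((c + eps) / (r + eps)) for r, c in zip(ref, cur))


def prediction_agreement(old_predictions, new_predictions, tolerance):
    """Fraction of inputs on which the old and new models agree within `tolerance`."""
    matches = sum(abs(o - n) <= tolerance for o, n in zip(old_predictions, new_predictions))
    return matches / len(old_predictions)


reference = list(range(100))
shifted = [x + 50 for x in reference]
print(population_stability_index(reference, reference))       # 0.0
print(population_stability_index(reference, shifted) > 0.25)  # True
print(prediction_agreement([1, 2, 3, 4], [1.05, 2.0, 3.5, 4.0], tolerance=0.1))  # 0.75
```

What counts as “as similar as one would expect” still has to be decided by someone who understands the model and the task; the code only produces the numbers.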

There are clearly huge benefits to applying automation to a wide range of problems across many industries, but human intelligence is still intrinsic to these developments. You can automate human behavior to a certain extent and, in controlled environments, replicate the power and performance of people’s work with ML-based, low-code, and codeless AI systems. But in a world where machines still depend heavily on humans, never forget the power of people.
