Calculating machine learning costs: price factors and estimates from the JustSoftLab portfolio

Roman Vinokurov
Sep 17, 2024
10 min read

TL;DR

Generally, machine learning development costs can start from as low as $10,000 and go upwards of $1,000,000 for a customized enterprise solution

Several essential factors come into play when determining the ML costs, including the complexity of the solution, an ML model training method, and data quality and availability, to name a few

Please note that the ML cost estimates we provide in this article refer solely to the development of the machine learning component. Other expenses, such as infrastructure costs, application interfaces, and back-end solution development, must also be added to the final ML cost estimate.

Recently, we published an article shedding light on the costs of developing an AI solution. In this blog post, we will focus on one of AI subsets, machine learning, and estimate how much it costs to train, deploy, and maintain intelligent algorithms.

To keep it practical, we sat down with Illia Sivach, JustSoftLab CTO, and asked him to draw machine learning cost estimates from our portfolio. He also shared his expertise in developing ML solutions and listed the steps businesses should take in order to reduce investments in machine learning—without sacrificing quality or time to market.

Machine learning costs: factors to consider

But before getting down to numbers, let’s quickly highlight the factors determining the final cost of a machine learning solution.

1. The complexity of the solution you’re eyeing to create

Machine learning solves many problems of different complexity. Social media engines making friend and content suggestions, smart surveillance cameras recognizing faces in video footage, and healthcare expert systems predicting heart failures run on machine learning. However, their complexity, performance, responsiveness, compliance requirements, and, hence, costs vary a lot.

2. The approach to training an ML model

Training a machine learning model can take one of three approaches: supervised learning, unsupervised learning, or reinforcement learning.

The choice among these methods significantly influences the costs associated with machine learning development. Let’s investigate how.

Supervised learning. This method involves training algorithms on datasets, with each example labeled with the correct output. It teaches the algorithm how to accurately classify data or predict outcomes using these examples. While supervised learning appears to require fewer computing resources than other methods, it is important to consider the potentially high costs of acquiring or creating a well-labeled dataset.

Unsupervised learning. In unsupervised learning, algorithms analyze and identify patterns in data without any prior labeling. This method necessitates human intervention, not only for validating the results but also for data preprocessing and pattern analysis. Although it requires significant computational power to handle large amounts of unclassified data, unsupervised learning can reveal insights that were not obvious or possible to define in advance.

Reinforcement learning. This method involves an agent that learns to make decisions by performing actions in a given environment and receiving feedback in the form of rewards or penalties. In contrast to supervised learning, which is based on a predefined dataset, reinforcement learning requires the model to interact dynamically with its environment, learning from the consequences of each action. This can be computationally intensive and may necessitate advanced infrastructure, particularly in complex environments.

Let us consider the cost implications of selecting a specific approach to ML model training.

Using supervised learning may require less computational power, making it appear cost-effective. However, the costs associated with creating or obtaining a labeled dataset can be significant.

Unsupervised and reinforcement learning, while computationally more demanding, do not require labeled data, which can lead to ML cost savings in situations where unlabeled data is abundant but labeling is impractical or prohibitively expensive.

Organizations looking to minimize ML model training costs might consider using foundation models like the GPT series from OpenAI. Specifically, this approach works well if you’re working on a generative AI solution.

Foundation models are pre-trained on large datasets and can be customized for specific tasks. This approach significantly reduces the need for large-scale data collection and computational resources, making it a more cost-effective alternative to training a model from scratch.

3. The availability and quality of training data

No matter the approach to machine learning, you will need enough data to train the algorithms on. Machine learning costs thus include the price of acquiring, preparing, and—in the case of supervised learning—annotating training data.

If you have enough training data on hand, you’re lucky. However, it’s rarely the case. Numerous researchers state that around 96% of enterprises do not initially have enough training data. For your reference, a study by Dimensional Research shows that, on average, ML projects need around 100,000 data samples to perform well.

You can synthetically generate the needed volume of data or augment the data you already have. Generating 100,000 data points via Amazon’s Mechanical Turk, for example, can cost you around $70,000.

Once you have enough data on hand, you need to make sure it’s of high quality. The study referenced above suggests that 66% of companies run into errors and bias in their training data sets. Removing those can take 80 to 160 hours for a 100,000-sample data set.

In case you opt for supervised learning (which is often the case for commercial ML solutions), you need to add the price of data annotation to the total machine learning cost, too. Depending on the complexity of labeling, it can take 300 to 850 hours to get 100,000 data samples labeled.

Drawing the line, a solid training data set of high quality can cost you anything from $25,000 to $65,000, depending on the nature of your data, the complexity of annotation, as well as the composition and location of your ML team.

4. The complexity and length of the exploratory stage

During an exploratory phase, you carry out a feasibility study, search for an optimal algorithm, and run experiments to confirm the chosen approach.

The cost of exploration depends on the complexity of the business problem, the expected time to market, and, subsequently, team composition.

As a rule, a team of a business analyst, a data engineer, an ML engineer, and—optionally—a project manager is enough to carry out the task. In that case, you can expect the exploratory stage to round out at $39,000 to $51,000.

5. The cost of production

Machine learning costs include the cost of production, too. Production costs involve the costs of the needed infrastructure (including cloud computing and data storage), integration costs (including designing a data pipeline and developing APIs), and maintenance costs.

Cloud resources
The price of the cloud infrastructure depends on the complexity of the models being trained. If you are building a simpler solution that relies on data of low dimensionality, you may get by with four virtual CPUs running on one to three nodes. This may cost you around $150 to $300 a month, or $1,460 to $3,600 a year.
If the solution you’re eyeing to create requires high latency and relies on complex deep learning algorithms, expect a monthly cost of $10,000 minimum to be added to the total ML price.

Integrations
Developing integrations involves designing and developing the data pipeline and the needed APIs. Putting together a data pipeline takes up around 80 development hours. Putting two to three API endpoints in place and documenting them to be used by the rest of the system requires another 20 to 30 hours, the cost of which should be added to the final machine learning cost estimates.

Support and maintenance
Machine learning models need ongoing support during their entire life cycle: incoming data must be cleansed and annotated; models must be retrained, tested, and deployed
According to the study conducted by Dimensional Research, businesses commit 25% to 75% of the initial resources to maintaining ML algorithms.

Assuming that the initial solution architecture and data pipelines are well designed and part of the recurring tasks is automated, the support cost can range from $20,000 to $150,000 per year based on the selected support model.

6. The cost of consulting

If you’re just tipping toes in the machine learning waters, you can’t really get too far without an experienced ML consultant.

Two main factors determining the cost of ML consulting include:

Consultant’s experience. It is worth making experience a critical factor in your hiring decision. You want to partner with someone who has enough expertise in the field you may not necessarily be familiar with.

Project scope. The more complicated the project, the more consulting involvement it will require. Moreover, if the project scope is undefined, search for a consultant who can carry out a discovery phase for you and offer a compelling proposal with all the necessary estimations.

Our ML consulting rates usually start from $65 per hour and depend on the seniority level of a specialist.

7. Opportunity costs

Opportunity costs can be defined as forfeiting all benefits associated with not taking an alternative route. To put things into perspective, think of Blockbuster, a former leader in the movie rental market. Foregoing innovation, the company lost to a newly emerged leader—Netflix. The opportunity cost equaled $6 billion and a near-bankruptcy.

The same idea goes for machine learning initiatives. Enterprises lagging in ML adoption can’t tap into the predictive insights and informed decision-making that come with it.

On the opposite side, implementing machine learning just for the sake of innovation, say, to solve problems that require rule-based solutions, is a loss as well.

Therefore, you may want to consider the cost-to-benefit ratio and carefully weigh implementation risks before introducing AI in business. This is where expert AI consulting services may help.

So, how much does ML cost? These estimates from JustSoftLab's portfolio might give you an idea

Now that you're familiar with the factors influencing the overall cost of ML, let’s look at a few examples from JustSoftLab's portfolio to help you better understand the associated costs.

Please note that we also estimate efforts. This is because the cost of developing an ML solution largely depends on the composition and location of your ML development team. You can get an idea of the overall cost associated with developing a similar ML solution based on the following hourly rates for ML engineers:

Location	Average Hourly Rate
United States	$130
Central Europe	$75–85
Eastern Europe	$65–75
Asia	$30
Latin America	$20

Keep in mind that the estimated budgets below apply exclusively to the development of the machine learning component within these solutions. It's important to consider additional costs like infrastructure, productization, and other associated expenses, as machine learning works in tandem with various elements within the broader solution.

Project 1: Emotion Recognition Solution

A multinational media and entertainment company wanted to analyze surveillance footage to recognize people's emotions. The challenge was compounded by deteriorating visual conditions, such as the quality of the video, and people wearing masks, glasses, and other items that hindered recognition.

The media giant sought a reliable software provider to conduct extensive research and support future developments. The JustSoftLab team of two ML engineers tested three neural networks, selected the optimal one for the task, fine-tuned it for better performance, and provided additional strategies for achieving higher accuracy.

ML Team Effort:

350 hours

ML Costs:

Approximately $26,000

Project 2: Fitness Mirror with Personal Trainer Inside

The client aimed to create an innovative fitness mirror that could serve as a personal trainer, offering personalized workout plans and providing real-time guidance during exercises.

The JustSoftLab team built the hardware components of the smart device and provided end-to-end software development, infrastructure setup, firmware development, and content management.

Regarding the machine learning component, we developed and trained a deep learning model using a dataset of workout recordings to provide guidance to users. Additionally, we implemented computer vision algorithms for motion tracking and pose estimation, along with object recognition algorithms to monitor the fitness equipment used during workouts.

ML Team Effort:

Approximately 640–700 hours

Cost:

Approximately $51,000–$56,000

Project 3: Automated Document Recognition Solution

Our client wanted to create a solution that would automate document processing. The primary goal of the project was to develop an independent optical character recognition (OCR) solution that could recognize and index incoming document batches and seamlessly integrate with the client’s existing document processing system.

The OCR solution we developed helps automate the traditionally resource-intensive process of labeling and indexing documents, resulting in time and cost savings. By dramatically reducing the manual effort typically allocated to document labeling and indexing, the solution allows more documents to be processed in the same timeframe. The result? Increased efficiency and fast, accurate processing of critical documents.

ML Team Effort:

Approximately 3000–4000 hours

ML Costs:

$225,000–$300,000

How to Reduce ML Development Costs and Ensure a Quick ROI

If you are considering venturing into AI development and looking for ways to reduce machine learning costs without compromising product quality, take a look at our practical recommendations.

Start Small, but Keep the Bigger Picture in Mind

When launching an ML project, it often makes sense to limit the initial scope. By starting with a minimum viable product (MVP), you can focus your resources on a specific problem and iterate quickly. This approach helps save ML costs in several ways:

Starting small allows you to test your ideas and hypotheses with a smaller dataset and a reduced set of features. This, in turn, allows you to quickly assess the feasibility and effectiveness of your ML solution without committing significant resources upfront.
Reducing scope helps you identify and resolve potential issues or bottlenecks in your machine learning pipeline early on, avoiding costly rework in later development stages.
By prioritizing critical use cases and features, you can more effectively allocate resources and focus on areas that deliver the quickest return on investment, rather than tackling the entire project at once.

Follow MLOps Best Practices from Day One to Avoid Scalability Issues

MLOps refers to a set of practices that enhance collaboration and automation in ML development projects. By setting up an MLOps pipeline from the start, you can mitigate potential scalability issues and reduce machine learning costs. Cost savings come from:

Optimized Development Process: MLOps promotes standardization and automation while reducing the need for manual, error-prone operations.
Scalable Infrastructure: MLOps focuses on building scalable infrastructures to support the entire ML development lifecycle, from data preprocessing to model deployment. This helps accommodate growing data volumes, increasing model complexity, and user demand without requiring significant infrastructure changes.

CI/CD: Continuous Integration/Continuous Deployment (CI/CD) practices ensure that changes made to your ML solution are automatically integrated, tested, and deployed in a reliable and automated manner.

Leverage Pre-Trained Machine Learning Models

Using pre-trained machine learning models helps reduce machine learning costs in several ways:

Transfer Learning: Pre-trained models, serving as a starting point for many ML tasks, allow you to transfer knowledge gained from solving another, but related, problem to the one at hand, significantly saving on computational resources and training time.

Reduced Data Requirements: Training machine learning models from scratch requires large volumes of annotated data, which can be costly and time-consuming to collect and label. Pre-trained models can be fine-tuned with relatively small domain-specific datasets.

Faster Prototyping and Iteration: Pre-trained models allow for quick prototyping and iteration of your ML solution.

#MachineLearning #AI #MLDevelopment #ArtificialIntelligence #MLCosts #TechInnovation #SupervisedLearning #ReinforcementLearning #MLOps #AIConsulting #DataScience #JustSoftLab #DigitalTransformation #BusinessEfficiency #GreenTech