Simulating Data Sets for AI Enabled Infrastructure Software Products

- Solving the Data Sets Problem

1. Industry 

AI, and specifically Machine Learning, has come a long way from being just another buzzword in the IT infrastructure software products space. It is now a key differentiator for enterprises worldwide when they choose solutions to manage their IT platforms.

Intelligent inference for anomaly detection and auto-healing can clear the noise and help teams focus on tasks that drive real value. Therefore, new-age ISVs in this space are investing in AI features such as prediction, anomaly detection and auto-healing.

2. Problem of Datasets for Modelling (AI/ML) 

Machine Learning models, as any data scientist will tell you, are only as good as the data used to build them. For any product built on machine learning, a large volume and variety of data needs to be fed to the model to train it before it can be launched for production use. Datasets also need to be generated under varied scenarios so that the ML models can evolve and mature. And herein lies the real problem: enterprises are not going to share production data from their live systems for ISVs to develop their products. So, where would ISVs get all this data on a regular basis?

3. Solving the Dataset Problem 

The answer is for ISVs to create their own data by running simulations, and to train their models on it. Creating data is itself a fairly involved process, so it pays dividends to have a standard approach to creating such data sets, especially when ISVs are targeting multiple tech stacks and variations with their products.

The 1CloudHub Nimbus Automation Framework is spread across 3 phases:

  1. Environment creation for deploying target tech stack, monitoring tools and data generation systems. 
  2. Generating test data and simulation scenarios for the target tech stack to be monitored. 
  3. Running the simulations and monitoring to generate the metrics data to be used in ML model creation. 
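As an illustration of the second phase, simulation scenarios can be enumerated as combinations of stack variations. The following is a minimal sketch and not the Nimbus framework's actual code: the `Scenario` class, the `build_scenarios` helper and the three knobs (`replica_count`, `node_type`, `load_profile`) are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Scenario:
    """One simulation run: a stack configuration plus a load profile."""
    replica_count: int   # hypothetical knob: number of database replicas
    node_type: str       # hypothetical knob: size of the cluster nodes
    load_profile: str    # hypothetical knob: shape of the generated workload


def build_scenarios(replica_counts, node_types, load_profiles):
    """Return one Scenario for every combination of the supplied variations."""
    return [
        Scenario(replicas, node, profile)
        for replicas, node, profile in product(
            replica_counts, node_types, load_profiles
        )
    ]


scenarios = build_scenarios(
    replica_counts=[1, 3],
    node_types=["small", "large"],
    load_profiles=["steady", "spike", "ramp"],
)
print(len(scenarios))  # 2 * 2 * 3 = 12 combinations
```

Keeping the variations as plain lists means a new tech stack or configuration option only adds one more list to the product, and each resulting scenario can then be handed to the environment-creation and simulation phases.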


4. A Success Story – A US Based Product Company

The Need

A cloud application performance assurance company was adding AI/ML features that proactively and continuously assure application performance, using anomaly detection and auto-healing recommendations. To enhance model performance, the company needed additional metrics data generated from MongoDB clusters on Kubernetes under various configurations.

Through our Nimbus Automation Framework

  • Deployed MongoDB on Kubernetes using Kops on EC2 instances.
  • Automated host-level configuration of the cluster nodes.
  • Generated test records to simulate MongoDB load.
  • Deployed Prometheus with a customized node exporter and a MongoDB exporter to collect performance data from the cluster.
  • Ran test simulations on MongoDB with the generated test data.
  • Exported and presented the simulated performance data of MongoDB and the Kubernetes cluster for model creation.
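The test-record step in the list above can be sketched in a few lines. This is an illustrative example only, not the records used in the engagement: the `generate_records` helper and the order-like document fields are assumptions made for this sketch. A seeded random generator is used so that the same data set can be regenerated for repeatable runs.

```python
import random
from datetime import datetime, timedelta, timezone


def generate_records(count, seed=42):
    """Build `count` synthetic order-like documents, reproducibly."""
    rng = random.Random(seed)  # local RNG: the seed fully determines the output
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    records = []
    for i in range(count):
        records.append({
            "order_id": i,                                   # hypothetical field
            "customer_id": rng.randint(1, 1000),             # hypothetical field
            "amount": round(rng.uniform(5.0, 500.0), 2),     # hypothetical field
            "status": rng.choice(["new", "paid", "shipped"]),
            "created_at": start + timedelta(seconds=i),
        })
    return records


records = generate_records(1000)
print(len(records), records[0]["order_id"])
```

The resulting list of dictionaries could then be written to a test collection in batches, for example with a MongoDB driver's bulk insert call, while the monitoring stack records how the cluster behaves under that load.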


Faster deployment of the environment, reducing deployment time from weeks to hours.

Repeatable tests with configurable variations across the stack.

Cost savings by leveraging on-demand cloud resources.

Contact us for more details about solving the data sets problem.
