Software Development

Data as the Fuel for the AI Future: The Role of Data in Modern Software Development


One of the sparks of inspiration for this article was reading the book “Scary Smart: The Future of Artificial Intelligence and How You Can Save Our World.” The author, Mo Gawdat, Google’s former Chief Business Officer, shares his insights, fears, and hopes. He ignites curiosity and caution, unraveling AI’s impact on our world while balancing its potential against our responsibility. Additionally, I’ve incorporated insights about AI from my colleagues at Sigma IT Poland, who offered valuable real-world perspectives.


Artificial intelligence (AI) advancements have significantly impacted various industries in recent years, opening new possibilities for innovation and growth. One of the key approaches involves leveraging vast amounts of data (big data) for machine learning (ML), enabling intelligent decision-making algorithms. This approach is crucial in fields such as finance, healthcare, and transportation.

However, data-driven AI poses new challenges and risks that need careful analysis. Understanding these issues and developing appropriate solutions is essential for introducing artificial intelligence responsibly and ensuring its ethical and sustainable use. The risks associated with AI stem from human influence – those who write the code and teach AI to mimic human behavior.

The Role of Data Quality

In artificial intelligence development, leveraging data is crucial for progress. This approach uses data as the driving force, empowering machines to learn and make smart decisions. AI machines utilize algorithms to sift through extensive datasets, identifying patterns and evolving through a process akin to natural selection. Observing the learning patterns of machines reveals a striking resemblance to how young children learn.

The quality of the data we provide plays a critical role in influencing the capabilities and outcomes of these AI systems. The data’s accuracy, relevance, and diversity shape the learning journey of AI machines, determining the depth and breadth of their understanding and decision-making abilities. In essence, the effectiveness of AI is intricately tied to the quality of the data it is exposed to during its developmental phase.


An experienced Data Scientist, Joanna Dubiel, gained valuable expertise in artificial intelligence while working on a project that predicted customer churn based on behavior. This is what Joanna says about the data preparation phase:

Collecting, cleaning, and processing information was crucial. In our case, we analyzed historical data to identify patterns in customer behavior. It was based on this data that we trained our AI model.
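As a rough illustration of the preparation phase Joanna describes, the sketch below (with hypothetical field names and records, not data from her project) cleans historical customer records and derives simple behavioral features before any model is trained:

```python
# Hypothetical raw history: some records are incomplete and must be dropped.
RAW_RECORDS = [
    {"customer_id": 1, "logins_last_30d": 12, "support_tickets": 0, "churned": False},
    {"customer_id": 2, "logins_last_30d": 1, "support_tickets": 4, "churned": True},
    {"customer_id": 3, "logins_last_30d": None, "support_tickets": 2, "churned": True},
]

def clean(records):
    """Data cleaning: keep only records where every field is present."""
    return [r for r in records if all(v is not None for v in r.values())]

def to_features(record):
    """Turn one cleaned record into a (features, label) pair for training."""
    return ([record["logins_last_30d"], record["support_tickets"]], record["churned"])

cleaned = clean(RAW_RECORDS)
dataset = [to_features(r) for r in cleaned]
```

Real projects do this with dedicated tooling over far larger datasets, but the shape of the work is the same: collect, drop or repair incomplete records, then derive the features the model will learn from.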

Balancing Fear and Excitement

The emergence of previously overlooked challenges and risks is a natural phenomenon when introducing new technologies. Human limitations in processing information lead to a perceived threat from things we cannot fully understand or control, and this is the direction in which AI development is heading.


Kamil Gierczyk is working on a project for a pharmaceutical company where artificial intelligence (AI) is utilized in the drug discovery process. He serves as a backend engineer, with his primary responsibility being the development of applications and the provision of a platform enabling the efficient deployment of AI solutions.

AI models accelerate drug discovery, enabling scientists to respond quickly to evolving project needs. In the model training process, I find it fascinating how rapidly and efficiently we can generate new solutions.


In business, adapting to the pace of model evolution is crucial to avoid obsolescence and maintain competitiveness.

For instance, during the first meeting with a client for whom I designed lamps, utilizing AI solutions allowed us to develop three models in a timeframe that would typically require a week’s work for a single lamp. This demonstrates that the speed of action is pivotal.

Potential Scenarios

Mo Gawdat presents insightful observations on AI, its potential scenarios of development, and coexistence with humans in his book “Scary Smart.” The author anticipates three highly probable, unavoidable stages in the development of artificial intelligence.

Stage 1: Emergence of Uncontrolled, Independent AI Entity

There is virtually no current possibility to halt the ongoing and rapidly accelerating process of artificial intelligence becoming an uncontrolled, independent entity globally.

Stage 2: AI Surpassing Human Intelligence

AI will surpass human intelligence with the ability to process vast amounts of data and instantly identify complex patterns. At this stage, humans will lose understanding of the decision-making processes occurring in AI algorithms, relinquishing control.

Stage 3: Inevitable Challenges

Given our lack of understanding of the algorithms, combined with human factors in the initial stages of their creation, a third, practically unavoidable stage emerges: undesirable events will occur.

Pitfalls in AI Algorithm Development

We live in a time of accelerated development of artificial intelligence, which is opening new areas of potential use. However, the development of this technology often focuses on maximizing profits for individuals, groups, or those with access to power and funds. A single-minded pursuit of profit may blind us to potential threats.

Additionally, the wide availability of online programming resources and the growing media buzz around ML/AI contribute to algorithms that contain human errors introduced in the early stages and then propagated uncontrollably. While mistakes are inevitable, artificial intelligence is built on machine learning algorithms whose behavior depends heavily on the type, quality, and quantity of the data they are given.


This is Joanna’s perspective on human influence, specifically about collaboration with the business team concerning client behavior:

The first step was defining the problem we aimed to solve. Collaborating with the business team, we formulated the hypothesis precisely, so the model could recognize and classify data effectively. This is a crucial stage because people change their behaviors, and as a result, certain assumptions may become outdated. For example, customers used to rely more on computers, but now they use smartphones more. This must be considered to ensure that predictions are as close to reality as possible.

The Role of Machine Learning (ML) in Artificial Intelligence (AI)

Machine learning, by definition, is a subset of artificial intelligence. It focuses on teaching computers how to learn from data and improve as they gain experience, much as a child learns new activities. These learning and improvement processes are not explicitly programmed in code. Instead, algorithms are trained to detect patterns and relationships in extensive datasets and to make optimal decisions or draw conclusions from them. Over time, ML-based systems become more precise, and the quality of their predictions improves with access to better datasets.
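To make “improving with experience” concrete, here is a deliberately tiny toy sketch (not tied to any real ML system): an estimator that predicts the running average of the values it has seen becomes more accurate as more observations arrive.

```python
class MeanEstimator:
    """Predicts the mean of everything it has observed so far."""
    def __init__(self):
        self.total, self.count = 0.0, 0

    def learn(self, value):
        self.total += value
        self.count += 1

    def predict(self):
        return self.total / self.count

# Noisy samples drawn around a true value of 10.
observations = [12, 9, 11, 8, 10, 10, 9, 11]
model = MeanEstimator()
errors = []
for value in observations:
    model.learn(value)                     # gain one more piece of experience
    errors.append(abs(model.predict() - 10))  # error against the true value
```

After the first observation the error is 2.0; after all eight it has fallen to 0.0. Real models learn far richer patterns, but the principle is the same: more (good) data, better predictions.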

The necessity of working with vast amounts of data gave rise to a new term—Big Data. This concept refers to processing enormous, dynamic, and diverse datasets. Despite the complexity of processing such information, analyzing these data sets is valuable, providing new perspectives and knowledge.

With the advent of big data, the IT industry market has evolved, leading to a high demand for data engineers. These specialists assist in proper data collection and processing, evaluating their usefulness for a specific purpose. Often, these individuals possess broad interdisciplinary knowledge with a programming background. 


Once we had a solid data foundation, it was time to choose the appropriate model. We analyzed various architectures, attempting to find the one that best represented the dependencies in our problem.

The AI model training process involved a lengthy stage of testing different configurations to achieve the best results. We monitored and adjusted the results in real time to ensure that our model performed well on the training data and had a high generalization ability for new data.

It was an iterative process where continuous adjustment of parameters and model architecture led to optimal outcomes.
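The iterative loop described above can be sketched in miniature. The toy “model” below is just a decision threshold, standing in for real model configurations: each candidate is scored on held-out validation data, and the configuration that generalizes best is kept.

```python
# Toy (x, label) pairs: train is what the model fits, valid is held out.
train = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]
valid = [(2, 0), (4, 0), (7, 1), (9, 1)]

def accuracy(threshold, dataset):
    """Fraction of points correctly classified by the rule 'x > threshold'."""
    return sum((x > threshold) == bool(y) for x, y in dataset) / len(dataset)

best_threshold, best_valid_acc = None, -1.0
for threshold in range(10):                  # candidate configurations
    train_acc = accuracy(threshold, train)   # fit on training data
    valid_acc = accuracy(threshold, valid)   # generalization to unseen data
    if valid_acc > best_valid_acc:           # keep the best-generalizing config
        best_threshold, best_valid_acc = threshold, valid_acc
```

In a real project the candidates are hyperparameters and architectures rather than a single threshold, but the monitoring loop (evaluate on validation data, keep what generalizes) is the same.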

Working with Large Amounts of Data: New Possibilities and Challenges

Acquiring and working with large amounts of data brings new possibilities and challenges, especially regarding security. At least four general potential threats should be considered when implementing AI-based solutions.

Threat 1: Lack of Proper Data Verification and Data Stream Replication (Process Repeatability)

One of the main challenges in the AI security domain is ensuring the accuracy and standardization of data acquisition. Inadequate data control and a lack of repeatability in the data acquisition process can lead to inconsistency, affecting the quality of models based on AI. Data quality control aims to verify the accuracy and completeness of the dataset, while data stream replication (process repeatability) ensures consistency and reliability in data processing.
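A minimal sketch of such quality control, assuming hypothetical record fields: each record is checked for schema completeness, processing is kept deterministic (filtering plus a stable sort), and a hash of the output lets you confirm that reruns of the pipeline are repeatable.

```python
import hashlib
import json

EXPECTED_FIELDS = {"customer_id", "event", "timestamp"}

def validate(record):
    """Accuracy/completeness check: correct schema, no missing values."""
    return set(record) == EXPECTED_FIELDS and all(v is not None for v in record.values())

def process(records):
    """Deterministic processing: filter invalid records, then stable-sort,
    so every rerun over the same input yields the same output."""
    valid_records = [r for r in records if validate(r)]
    return sorted(valid_records, key=lambda r: (r["timestamp"], r["customer_id"]))

def fingerprint(records):
    """Hash of the processed output; identical reruns give identical hashes."""
    return hashlib.sha256(json.dumps(records, sort_keys=True).encode()).hexdigest()

batch = [
    {"customer_id": 2, "event": "login", "timestamp": 5},
    {"customer_id": 1, "event": "login", "timestamp": 3},
    {"customer_id": 3, "event": None, "timestamp": 4},  # fails the completeness check
]
assert fingerprint(process(batch)) == fingerprint(process(batch))  # repeatable
```

Production pipelines use dedicated validation and lineage tooling, but the idea scales down to this: reject what fails the checks, and make the processing deterministic enough to verify.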


I collaborate with a team of designers who create AI models for training machines used in drug discovery.

Our client’s team possesses deep expertise in the nuances of training these models, while my contribution primarily centers on the infrastructure aspects and ensuring the seamless delivery of solutions for model deployment and serving.

Threat 2: Risks Associated with Data Providers and Data Origin

This is one of the key threats to AI security, especially considering the origin of information. Companies often rely on external data sources (data warehouses), and without proper control, they may unknowingly introduce inaccurate or even harmful data into their AI systems. It is crucial to thoroughly understand data sources, including their context, history, and potential biases.

Threat 3: Insufficient Volume and Complexity of Data Sets

Another crucial element of AI models is data. If datasets are incomplete or insufficiently complex, the quality of the results is at significant risk. Limited data can lead to overfitting, where the model performs well on its training data but struggles with previously unseen information. Training on insufficiently diverse data can also produce biased models.
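Overfitting on limited data can be illustrated with a deliberately extreme toy model (the inputs and labels below are hypothetical): a model that simply memorizes its training examples scores perfectly on data it has seen but fails on unseen inputs.

```python
# Tiny training set and held-out test set (hypothetical churn labels).
train = {1: "stay", 2: "stay", 8: "churn"}
test = {3: "stay", 9: "churn"}

class MemorizingModel:
    """The extreme overfit: a lookup table over the training examples."""
    def __init__(self, examples):
        self.examples = dict(examples)        # memorize everything seen

    def predict(self, x):
        return self.examples.get(x, "stay")   # blind default for unseen inputs

model = MemorizingModel(train)
train_acc = sum(model.predict(x) == y for x, y in train.items()) / len(train)
test_acc = sum(model.predict(x) == y for x, y in test.items()) / len(test)
```

Training accuracy is a perfect 1.0 while test accuracy drops to 0.5, which is exactly the gap between fitting the training set and generalizing that the paragraph above describes.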


In the project, we encounter challenges, particularly in ensuring high system performance under significant load, even with millions of iterations daily.

Our current focus is on orchestrating the entire infrastructure. This involves intricately managing the interplay between various AI models and developing a robust authorization system.

Our goal is to enhance the efficiency and security of our platform, making it an exemplar in the field.

Threat 4: Limited Internal Data Availability

AI models require large amounts of training data to be sufficiently accurate. When internal data sources within an organization are insufficient, organizations often turn to external data providers, which can create additional problems around security and privacy policy. Using data from external sources comes with risks related to data quality, authenticity, and potential biases.

These perspectives aim to provoke thoughtful reflection among AI specialists and contribute to the ongoing dialogue surrounding the responsible development and deployment of artificial intelligence.
