Creating an annotation platform from scratch
Introduction
Annotations are critical to ensuring quality data for natural language processing (NLP) projects. The core idea of this project is to provide a simple and intuitive user interface for annotating any dataset efficiently.
The platform has two APIs, one for the ingestion layer, and another for the serving layer, to simplify the integration process with other applications. The ingestion API is used to upload data and annotations to the platform, while the serving API is used to retrieve annotations for processed data.
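To make that split concrete, here is a minimal sketch of the two layers in FastAPI; the endpoint names and payload fields are placeholders of my own, and the actual project may organize them differently:

```python
# Minimal sketch of the two-API layout (hypothetical endpoint names and fields).
from typing import List, Optional

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Annotation platform")


class TextRecord(BaseModel):
    text: str
    prediction: Optional[str] = None  # optional model prediction to pre-fill the UI


@app.post("/ingest/{dataset}")
def ingest(dataset: str, records: List[TextRecord]):
    """Ingestion layer: receive raw texts (and optional predictions) for annotation."""
    # In the real platform this is where records would be pushed to Argilla.
    return {"dataset": dataset, "received": len(records)}


@app.get("/annotations/{dataset}")
def serve(dataset: str):
    """Serving layer: return the human annotations collected so far."""
    # In the real platform this is where annotated records would be read back.
    return {"dataset": dataset, "annotations": []}
```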
With a product mindset, I focused on building the simplest version first to see the limitations and advantages of the platform; at the end I will point out some improvements that could be made.
In this project I will use FastAPI, Argilla and Docker.
Architecture
The architecture is the following:
The main challenge in getting this project going is to create an easy interface between the data producer and Argilla (ingestion) and between the data consumer and Argilla (serving).
In my opinion, Argilla stands out among data curation platforms for NLP tasks; its user-friendly interface sets it apart from other options on the market.
We can see that the potential failure points consist mainly of adding data to and fetching data from Argilla. The assumption is that, above a certain load, the platform will fail to insert data into Argilla and the serving API will fail to fetch it. This is explored in the next section using load testing.
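As a reference point, the direct path from the ingestion API into Argilla is essentially one logging call per request. Here is a minimal sketch, assuming the Argilla 1.x Python client and the 'toxic_texts' text-classification dataset used in the load tests (URL and API key are placeholders):

```python
# Sketch of the direct ingestion path, assuming the Argilla 1.x Python client.
import argilla as rg

# Connect to the running Argilla server.
rg.init(api_url="http://localhost:6900", api_key="argilla.apikey")

record = rg.TextClassificationRecord(
    text="this comment is fine",
    prediction=[("not_toxic", 0.92), ("toxic", 0.08)],  # optional model scores
)

# This is the call that becomes the failure point under load:
# every ingestion request triggers its own upload to Argilla.
rg.log(records=[record], name="toxic_texts")
```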
Load testing
Load testing is the process of measuring the performance of a system or application under a specific workload. In the context of an API, load testing is used to simulate numerous concurrent requests to the API in order to evaluate how well it can handle the load.
When testing an API solution, it is essential to know how well it will perform under heavy loads. If an API is designed to handle only a few requests at a time and is subjected to a sudden surge in traffic, it could become overwhelmed, leading to decreased performance, errors, or even crashes. Load testing helps identify potential bottlenecks in the architecture of the solution before it is deployed to production, which can save time, money, and prevent negative impacts on end-users.
By performing load testing, developers can simulate real-world scenarios where the API is used by multiple users simultaneously. It measures the API’s ability to handle numerous transactions per second, assesses its response time, and detects any potential issues. It also provides data on the maximum capacity of the API and helps to identify areas of improvement in the architecture.
For example, if the load testing identifies a specific endpoint that is causing high CPU usage, it indicates a potential bottleneck in the code or database. By identifying such issues early on, developers can take steps to optimize the API’s performance, such as caching or query optimization.
In conclusion, load testing is an essential step in the API development process that helps to ensure that the solution can handle large volumes of requests and identify any potential issues before deployment. It provides developers with valuable insights into the API’s performance and scalability, allowing them to make informed decisions and optimize the solution for better performance.
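As a concrete illustration, a load test for the ingestion endpoint can be written in a few lines with Locust (the tool choice here is mine, and the endpoint path and payload are hypothetical):

```python
# Sketch of a Locust load test for the ingestion endpoint.
from locust import HttpUser, task, between


class IngestionUser(HttpUser):
    # Each simulated user waits 1-2 seconds between requests.
    wait_time = between(1, 2)

    @task
    def upload_text(self):
        self.client.post(
            "/ingest/toxic_texts",
            json=[{"text": "load testing example", "prediction": "not_toxic"}],
        )
```

Running it with, for example, `locust -f locustfile.py --users 50 --spawn-rate 1 --host http://localhost:8000` reproduces a 50-user scenario like the one used in the next sections.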
Ingestion API
For the first load test, we run 50 users, spawned at a rate of 1 user per second. We have two interesting results:
1) We find an interesting error:
ERROR:argilla.client.client: client.py:103
Cannot log data in dataset 'toxic_texts'
Error: LiveError
Details: Only one live display may be active at once
This error means that data is lost on the upload to Argilla. As far as I know, this problem comes up because the Ingestion API adds data to Argilla asynchronously through the Python client, and Argilla does not seem to support this kind of concurrent logging. We can work around the problem by delegating the upload to Argilla to another component: for instance, the Ingestion API writes the data to a database or file storage, and after some time this data is uploaded to Argilla in batches.
2) The plot regarding the load testing is the following:
When the number of users increases, the load on the system also increases, which can cause a strain on the available resources such as server capacity and bandwidth. If the resources are insufficient to handle the increased load, it could lead to slower response times and an increased number of failed requests, which could ultimately impact the user experience.
The limited requests per second indicate the maximum number of requests the API can handle in a given period. When this limit is reached, additional requests are rejected, leading to failed requests and a degraded user experience. Response times of ~60 seconds show that the ingestion API is struggling to process requests efficiently; such slow responses lead to a poor user experience and should be addressed.
One way of addressing this is to store the data in a database or file before it reaches Argilla. By doing so, data can be processed and analyzed in batches, which could improve the overall efficiency of the system. This approach also provides a buffer for handling a sudden surge in requests. However, implementing a database or file storage layer requires additional resources and increases the complexity of the system, so it is essential to carefully evaluate the trade-offs between performance, cost, and complexity when considering this solution.
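A minimal sketch of that buffered approach, using a local JSONL file as the buffer and the Argilla 1.x client (file path, batch size, and schedule are illustrative choices):

```python
# Sketch of the buffered approach: the ingestion API only appends to local
# storage, and a separate job flushes the buffer to Argilla in batches.
import json
from typing import Optional

import argilla as rg

BUFFER_PATH = "ingestion_buffer.jsonl"


def buffer_record(text: str, prediction: Optional[str] = None) -> None:
    """Called by the ingestion endpoint: a cheap append, no Argilla round trip."""
    with open(BUFFER_PATH, "a") as f:
        f.write(json.dumps({"text": text, "prediction": prediction}) + "\n")


def flush_to_argilla(batch_size: int = 500) -> None:
    """Called periodically (cron or background task): one bulk upload per batch."""
    rg.init(api_url="http://localhost:6900", api_key="argilla.apikey")
    with open(BUFFER_PATH) as f:
        rows = [json.loads(line) for line in f]
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        records = [rg.TextClassificationRecord(text=r["text"]) for r in batch]
        rg.log(records=records, name="toxic_texts")
    # A real implementation would also clear or mark the flushed rows.
```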
Serving API
This plot shows load testing of the serving API. However, the test has limited data, with only ~100 toxic and ~100 non-toxic texts. As the number of texts increases, the query time will also increase:
The relationship between the number of users and response time is an important consideration when designing and scaling software systems. As the number of users accessing a system grows, the demand for resources increases, and the system may experience performance issues, such as increased response time.
As stated before, the best solution is to serve the data from a database; in this case an analytical database is best suited, as it can handle a huge amount of data.
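For illustration, the serving API could read from a local analytical store instead of querying Argilla on every request. Here is a sketch using DuckDB purely as an example, with a hypothetical annotations table:

```python
# Sketch of serving annotations from an analytical database (DuckDB as an
# example; table and column names are hypothetical).
import duckdb
from fastapi import FastAPI

app = FastAPI()
con = duckdb.connect("annotations.duckdb")


@app.get("/annotations/{dataset}")
def annotations(dataset: str, limit: int = 100):
    rows = con.execute(
        "SELECT text, label FROM annotations WHERE dataset = ? LIMIT ?",
        [dataset, limit],
    ).fetchall()
    return [{"text": text, "label": label} for text, label in rows]
```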
Improvements
One possible enhancement that immediately comes to mind is implementing a database layer prior to loading data into Argilla in batch. This addition would facilitate efficient data management and retrieval processes.
Furthermore, it would be advantageous to incorporate a mechanism for fetching data randomly from the Serving API. This feature would be particularly valuable for training machine learning models, as it would allow users to access diverse datasets each time they interact with the API.
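On top of the same hypothetical table, random fetching could be a single randomized query, for example:

```python
# Sketch of random sampling with DuckDB: ORDER BY RANDOM() returns a
# different batch of annotations on every call.
import duckdb

con = duckdb.connect("annotations.duckdb")
sample = con.execute(
    "SELECT text, label FROM annotations ORDER BY RANDOM() LIMIT ?", [32]
).fetchall()
```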
Another improvement: Argilla now has user role management, which could be added to this platform.
Feel free to check the project on GitHub and give it a star: https://github.com/nahumsa/annotation-platform