Discover how Blue scaled CSV imports and exports 10x with Rust, scalable architecture, and strategic technology choices in B2B SaaS.
July 18, 2024
At Blue, we're constantly pushing the boundaries of what's possible in project management software. Over the years, we've released hundreds of features.
Our latest engineering feat?
A complete overhaul of our CSV import and export system, dramatically improving performance and scalability.
This post takes you behind the scenes of how we tackled this challenge, the technologies we employed, and the impressive results we achieved.
The most interesting thing here is that we had to step outside of our typical technology stack to achieve the results we wanted. This is a decision that has to be made thoughtfully, because the long-term repercussions can be severe in terms of technical debt and maintenance overhead.
Our journey began with a request from an enterprise customer in the events industry. This client uses Blue as their central hub for managing vast lists of events, venues, and speakers, integrating it seamlessly with their website.
For them, Blue isn't just a tool — it's the single source of truth for their entire operation.
While we are always proud to hear that customers use us for such mission-critical needs, there is also a large amount of responsibility on our side to ensure a fast, reliable system.
As this customer scaled their operations, they faced a significant hurdle: importing and exporting large CSV files containing 100,000 to 200,000+ records.
This was beyond the capability of our system at the time. In fact, our previous import/export system was already struggling with more than 10,000 to 20,000 records, so 200,000+ records was out of the question.
Users experienced frustratingly long wait times, and in some cases, imports or exports would fail to complete altogether. This hit the client hard, as they relied on daily imports and exports to manage key parts of their operations.
Multi-tenancy is an architecture where a single instance of software serves multiple customers (tenants). While efficient, it requires careful resource management to ensure one tenant's actions don't negatively impact others.
And this limitation wasn't just affecting this particular client.
Due to our multi-tenant architecture—where multiple customers share the same infrastructure—a single resource-intensive import or export could potentially slow down operations for other users, which in practice often happened.
As usual, we did a build-vs-buy analysis to understand whether we should spend the time upgrading our own system or buy one from someone else. We looked at various possibilities.
The vendor that did stand out was a SaaS provider called Flatfile. Their system and capabilities looked like exactly what we needed.
But after reviewing their pricing, we decided that this would end up being an extremely expensive solution for an application of our scale ($2/file adds up really quickly!), and that it was better to extend our built-in CSV import/export engine.
To tackle this challenge, we made a bold decision: introduce Rust into our primarily JavaScript tech stack. This systems programming language, known for its performance and safety, was the perfect tool for our performance-critical CSV parsing and data mapping needs.
Here's how we approached the solution.
The foundation of our solution was the introduction of background services to handle resource-intensive tasks. This approach allowed us to offload heavy processing from our main server, significantly improving overall system performance. Our background services architecture is designed with scalability in mind. Like all components of our infrastructure, these services auto-scale based on demand.
This means that during peak times, when multiple large imports or exports are being processed simultaneously, the system automatically allocates more resources to handle the increased load. Conversely, during quieter periods, it scales down to optimize resource usage.
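To make the pattern concrete, here is a minimal sketch of what a background worker loop for import jobs can look like, written in Rust with SQLx against Postgres. The `import_jobs` table, its columns, and the `process_csv_import` function are illustrative placeholders for this example, not Blue's actual implementation.

```rust
use sqlx::postgres::PgPoolOptions;
use std::time::Duration;

// Hypothetical shape of a queued import job.
#[derive(sqlx::FromRow)]
struct ImportJob {
    id: i64,
    file_path: String,
}

// Placeholder for the actual CSV parsing and mapping logic described later.
async fn process_csv_import(_pool: &sqlx::PgPool, _file_path: &str) -> Result<(), sqlx::Error> {
    Ok(())
}

async fn run_worker(pool: sqlx::PgPool) -> Result<(), sqlx::Error> {
    loop {
        // Claim one pending job; SKIP LOCKED lets several workers poll the
        // same table without picking up the same job twice.
        let job: Option<ImportJob> = sqlx::query_as(
            "UPDATE import_jobs SET status = 'running'
             WHERE id = (
                 SELECT id FROM import_jobs
                 WHERE status = 'pending'
                 ORDER BY created_at
                 FOR UPDATE SKIP LOCKED
                 LIMIT 1
             )
             RETURNING id, file_path",
        )
        .fetch_optional(&pool)
        .await?;

        match job {
            Some(job) => {
                process_csv_import(&pool, &job.file_path).await?;
                sqlx::query("UPDATE import_jobs SET status = 'done' WHERE id = $1")
                    .bind(job.id)
                    .execute(&pool)
                    .await?;
            }
            // No work available: back off briefly before polling again.
            None => tokio::time::sleep(Duration::from_secs(2)).await,
        }
    }
}

#[tokio::main]
async fn main() -> Result<(), sqlx::Error> {
    let pool = PgPoolOptions::new()
        .max_connections(5)
        .connect("postgres://localhost/blue")
        .await?;
    run_worker(pool).await
}
```

Because each worker only claims one job at a time, adding more worker instances during peak load is what lets the system scale horizontally.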
This scalable background service architecture has benefited Blue well beyond CSV imports and exports. Over time, we've moved a substantial number of features into background services to take load off our main servers.
The heart of our CSV processing solution is a custom Rust module. While this marked our first venture outside our core tech stack of JavaScript, the decision to use Rust was driven by its exceptional performance in concurrent operations and file processing tasks.
Rust's strong suits align perfectly with the demands of CSV parsing and data mapping. Its zero-cost abstractions allow for high-level programming without sacrificing performance, while its ownership model ensures memory safety without the need for garbage collection. These features make Rust particularly adept at handling large datasets efficiently and safely.
For CSV parsing, we leveraged Rust's csv crate, which offers high-performance reading and writing of CSV data. We combined this with custom data mapping logic to ensure seamless integration with Blue's data structures.
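As an illustration, here is a simplified sketch of CSV parsing with the `csv` crate and `serde`-based deserialization. The `ImportedRecord` struct and its column names are stand-ins for the example, not Blue's real data structures.

```rust
use serde::Deserialize;
use std::error::Error;

#[derive(Debug, Deserialize)]
struct ImportedRecord {
    // Column names are assumptions made for this example.
    #[serde(rename = "Title")]
    title: String,
    #[serde(rename = "Tags")]
    tags: Option<String>,
    #[serde(rename = "Due Date")]
    due_date: Option<String>,
}

fn parse_csv(path: &str) -> Result<Vec<ImportedRecord>, Box<dyn Error>> {
    let mut reader = csv::ReaderBuilder::new()
        .has_headers(true)
        .flexible(true) // tolerate rows with a varying number of fields
        .from_path(path)?;

    let mut records = Vec::new();
    for result in reader.deserialize() {
        // Each row is deserialized straight into the target struct;
        // malformed rows surface as errors we can report back to the user.
        let record: ImportedRecord = result?;
        records.push(record);
    }
    Ok(records)
}

fn main() -> Result<(), Box<dyn Error>> {
    let records = parse_csv("events.csv")?;
    println!("parsed {} rows", records.len());
    Ok(())
}
```

Deserializing rows directly into typed structs keeps the mapping logic explicit and pushes format errors to the surface early, instead of letting bad data drift into the database.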
The learning curve for Rust was steep but manageable. Our team dedicated about two weeks to intensive learning of the language before building the module.
The improvements were impressive:
Our new system can process in around 30 seconds the same number of records that took our old system 15 minutes.
For the web server component of our Rust implementation, we chose Rocket as our framework. Rocket stood out for its combination of performance and developer-friendly features. Its static typing and compile-time checking align well with Rust's safety principles, helping us catch potential issues early in the development process. On the database front, we opted for SQLx. This async SQL library for Rust offers several advantages that made it ideal for our needs, most notably compile-time checked queries and fully asynchronous database access.
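To give a flavour of how these pieces fit together, here is a minimal sketch of a Rocket endpoint that enqueues an import job through SQLx. The route path, request shape, and `import_jobs` table are assumptions made for the example rather than Blue's actual API.

```rust
use rocket::{post, routes, serde::json::Json, State};
use serde::{Deserialize, Serialize};
use sqlx::postgres::{PgPool, PgPoolOptions};

#[derive(Deserialize)]
struct ImportRequest {
    file_path: String,
}

#[derive(Serialize)]
struct ImportResponse {
    job_id: i64,
}

#[post("/imports", data = "<req>")]
async fn create_import(pool: &State<PgPool>, req: Json<ImportRequest>) -> Json<ImportResponse> {
    // Enqueue the import as a background job instead of processing it inline,
    // so the HTTP request returns immediately. For brevity this sketch panics
    // on database errors; real code would return a proper error response.
    let (job_id,): (i64,) = sqlx::query_as(
        "INSERT INTO import_jobs (file_path, status) VALUES ($1, 'pending') RETURNING id",
    )
    .bind(&req.file_path)
    .fetch_one(pool.inner())
    .await
    .expect("failed to enqueue import job");

    Json(ImportResponse { job_id })
}

#[rocket::main]
async fn main() -> Result<(), rocket::Error> {
    let pool = PgPoolOptions::new()
        .max_connections(5)
        .connect("postgres://localhost/blue")
        .await
        .expect("failed to connect to database");

    let _rocket = rocket::build()
        .mount("/", routes![create_import])
        .manage(pool)
        .launch()
        .await?;
    Ok(())
}
```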
Our journey to the optimal batching configuration was one of rigorous testing and careful analysis. We ran extensive benchmarks with various combinations of concurrent transactions and chunk sizes, measuring not just raw speed but also resource utilization and system stability.
The process involved creating test datasets of varying sizes and complexity, simulating real-world usage patterns. We then ran these datasets through our system, adjusting the number of concurrent transactions and the chunk size for each run.
After analyzing the results, we found that processing 5 concurrent transactions with a chunk size of 500 records provided the best balance of speed and resource utilization. This configuration allows us to maintain high throughput without overwhelming our database or consuming excessive memory.
Interestingly, we found that increasing concurrency beyond 5 transactions didn't yield significant performance gains and sometimes led to increased database contention. Similarly, larger chunk sizes improved raw speed but at the cost of higher memory usage and longer response times for small to medium-sized imports/exports.
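Here is a sketch of what that batching strategy can look like in code: records are split into chunks of 500 and written in at most 5 concurrent transactions. The `ImportedRecord` type and `items` table are illustrative, not Blue's actual schema.

```rust
use futures::stream::{self, StreamExt, TryStreamExt};
use sqlx::PgPool;

const CHUNK_SIZE: usize = 500;
const MAX_CONCURRENT_TRANSACTIONS: usize = 5;

// Simplified record type for the example.
struct ImportedRecord {
    title: String,
}

async fn insert_chunk(pool: &PgPool, chunk: &[ImportedRecord]) -> Result<(), sqlx::Error> {
    // Each chunk gets its own transaction, so a failure only rolls back
    // that chunk rather than the entire import.
    let mut tx = pool.begin().await?;
    for record in chunk {
        sqlx::query("INSERT INTO items (title) VALUES ($1)")
            .bind(&record.title)
            .execute(&mut *tx)
            .await?;
    }
    tx.commit().await?;
    Ok(())
}

async fn import_records(pool: &PgPool, records: Vec<ImportedRecord>) -> Result<(), sqlx::Error> {
    // Bounded concurrency: at most 5 chunk transactions in flight at once.
    stream::iter(records.chunks(CHUNK_SIZE))
        .map(|chunk| insert_chunk(pool, chunk))
        .buffer_unordered(MAX_CONCURRENT_TRANSACTIONS)
        .try_collect::<Vec<_>>()
        .await?;
    Ok(())
}
```

The two constants are the knobs we tuned during benchmarking; the structure itself stays the same whichever values win out.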
The final piece of our solution addresses the challenge of delivering large exported files to users. Instead of providing a direct download from our web app, which could lead to timeout issues and increased server load, we implemented a system of emailed download links.
When a user initiates a large export, our system processes the request in the background. Once complete, rather than holding the connection open or storing the file on our web servers, we upload the file to a secure, temporary storage location. We then generate a unique, secure download link and email it to the user.
These download links are valid for 2 hours, striking a balance between user convenience and information security. This timeframe gives users ample opportunity to retrieve their data while ensuring that sensitive information isn't left accessible indefinitely.
The security of these download links was a top priority in our design: each link is unique to a single export and automatically stops working once the two-hour window passes.
This approach offers several benefits: it avoids request timeouts, keeps large files off our web servers, and lets users retrieve their data whenever it's convenient within the validity window.
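For illustration, here is a sketch of generating an expiring download link. The post doesn't name the storage provider, so this example assumes an S3-compatible object store accessed through the `aws-sdk-s3` crate, with a hypothetical bucket name.

```rust
use aws_sdk_s3::presigning::PresigningConfig;
use std::time::Duration;

async fn generate_download_link(
    client: &aws_sdk_s3::Client,
    key: &str,
) -> Result<String, Box<dyn std::error::Error>> {
    // The link stays valid for 2 hours, matching the expiry described above.
    let config = PresigningConfig::expires_in(Duration::from_secs(2 * 60 * 60))?;

    // Presign a GET for the exported file; the bucket name is illustrative.
    let presigned = client
        .get_object()
        .bucket("blue-exports")
        .key(key)
        .presigned(config)
        .await?;

    // The signed URI is unique to this object and signature, and stops
    // working once the expiry passes.
    Ok(presigned.uri().to_string())
}
```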
User feedback on this feature has been overwhelmingly positive, with many appreciating the flexibility it offers in managing large data exports.
The other obvious improvement was to allow users to export only the data already filtered in their project view. This means that if there is an active filter for the tag "priority", only records with that tag end up in the CSV export. Users spend less time in Excel filtering out rows they don't care about, and we reduce the number of rows to process.
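As a sketch of how such a filter can be applied, the query below restricts the export to records carrying the active tag. The `records`, `record_tags`, and `tags` tables are hypothetical stand-ins, not Blue's actual schema.

```rust
use sqlx::PgPool;

#[derive(sqlx::FromRow)]
struct ExportRow {
    id: i64,
    title: String,
}

async fn fetch_filtered_rows(
    pool: &PgPool,
    project_id: i64,
    tag_filter: Option<&str>,
) -> Result<Vec<ExportRow>, sqlx::Error> {
    // Only rows matching the active tag filter are fetched, so the export
    // contains exactly what the user sees in their filtered project view.
    sqlx::query_as(
        "SELECT r.id, r.title
         FROM records r
         WHERE r.project_id = $1
           AND ($2::text IS NULL OR EXISTS (
                 SELECT 1 FROM record_tags rt
                 JOIN tags t ON t.id = rt.tag_id
                 WHERE rt.record_id = r.id AND t.name = $2
           ))",
    )
    .bind(project_id)
    .bind(tag_filter)
    .fetch_all(pool)
    .await
}
```

Pushing the filter into the query, rather than exporting everything and trimming afterwards, is also what cuts the number of rows the Rust module has to serialize.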
While we don't have immediate plans to expand our use of Rust, this project has shown us the potential of this technology for performance-critical operations. It's an exciting option we now have in our toolkit for future optimization needs. This CSV import and export overhaul aligns perfectly with Blue's commitment to scalability.
We're dedicated to providing a platform that grows with our customers, handling their expanding data needs without compromising on performance.
The decision to introduce Rust into our technology stack wasn't taken lightly. It raised an important question that many engineering teams face: When is it appropriate to venture outside your core tech stack, and when should you stick with familiar tools?
There's no one-size-fits-all answer, but at Blue, we've developed a framework for making these crucial decisions.
One of the primary risks of frequently adopting new technologies is ending up with what we call a "technology zoo" - a fragmented ecosystem where different parts of your application are written in different languages or frameworks, requiring a wide range of specialized skills to maintain.
This project exemplifies Blue's approach to engineering: we're not afraid to step outside our comfort zone and adopt new technologies when it means delivering a significantly better experience for our users.
By reimagining our CSV import and export process, we've not only solved a pressing need for one enterprise client but improved the experience for all our users dealing with large datasets.
As we continue to push the boundaries of what's possible in project management software, we're excited to tackle more challenges like this.
Stay tuned for more deep dives into the engineering that powers Blue!