6 lessons from scaling payments

To process millions of transactions daily, MoonPay has developed a high-performance payments engine that operates with relentless precision and never skips a beat.

Designing and scaling this system has been one of our most formidable engineering challenges, pushing the boundaries of what we thought was possible.

In this article, I’ll break down the key lessons we’ve learned while scaling our payments engine.

Further reading: Building MoonPay’s on-ramp

Lesson 1: Payments should have their own data model

When building an e-commerce app, it is common (and recommended) to store all the orders in a dedicated table.

Let’s say we have an order table and accept card payments. To keep it simple, we start by storing the paymentId on the same table. Later on, we decide to accept bank transfers through a new provider, but the paymentId column alone no longer tells us how the order was paid. So we add a paymentMethod column that identifies which kind of payment the paymentId refers to.

This goes on and on until the order table is completely clogged with payment-related columns. That’s why it’s best to make the investment early and create a dedicated space to store payments data.

This allows both the payments platform and the order system to evolve separately, making it much easier to apply changes to one or the other.
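As a rough illustration of that separation, here is a minimal sketch in Python. The field names, the `PaymentMethod` values, and the provider names are assumptions made for the example, not MoonPay's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class PaymentMethod(Enum):
    CARD = "card"
    BANK_TRANSFER = "bank_transfer"


@dataclass
class Order:
    # The order knows nothing about how it was paid.
    id: str
    customer_id: str
    amount_minor: int  # amount in minor units (e.g. cents)
    currency: str


@dataclass
class Payment:
    # Everything payment-specific lives here and points back to the order.
    id: str
    order_id: str
    method: PaymentMethod
    provider: str            # hypothetical provider name
    provider_reference: str  # the ID the provider assigned to this payment
    status: str = "pending"


def payments_for_order(payments: list[Payment], order_id: str) -> list[Payment]:
    """Return every payment attempt recorded against an order."""
    return [p for p in payments if p.order_id == order_id]


order = Order(id="ord_1", customer_id="cus_1", amount_minor=5000, currency="USD")
payments = [
    Payment("pay_1", "ord_1", PaymentMethod.CARD, "card_provider", "ch_123", "failed"),
    Payment("pay_2", "ord_1", PaymentMethod.BANK_TRANSFER, "bank_provider", "tr_456", "settled"),
]
```

With this shape, adding a new payment method means adding a value to `PaymentMethod` (and possibly fields on `Payment`), while `Order` stays untouched. It also makes multiple payment attempts per order a natural case rather than a special one.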

Lesson 2: Payments should be processed async

No payment is truly immediate. Most payment methods require one or two requests to complete, which makes it tempting to execute them synchronously when the customer clicks “Buy”. But this makes the request slower and more vulnerable to errors.

Processing requests asynchronously requires a bigger investment upfront, but it brings added value to the system:

Faster feedback for the user, since you’re just dispatching a message instead of processing the payment.
If there’s a network issue, application crash, or API downtime, the message is not lost and the payment can be retried later.
By using messaging, the service that accepts the order and the service that processes the payment can execute in separate runtimes.

MoonPay started with a synchronous approach for simplicity, but ended up moving to async as the system grew more complex. Even if the payment takes slightly more time, the user experience remains the same or better.
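The dispatch-then-process pattern can be sketched as follows. This is a simplified illustration, not MoonPay's implementation: the standard-library `queue.Queue` stands in for a durable message broker, and `handle_buy_click`, `process_next`, and the `charge` callable are hypothetical names.

```python
import queue

# In-process stand-in for a durable message broker (an assumption for this sketch).
payment_queue: "queue.Queue[dict]" = queue.Queue()
MAX_ATTEMPTS = 3


def handle_buy_click(order_id: str) -> dict:
    """Request handler: enqueue the work and return straight away."""
    payment_queue.put({"order_id": order_id, "attempt": 1})
    return {"order_id": order_id, "status": "processing"}


def process_next(charge) -> str:
    """Worker step: take one message and try to charge it.

    `charge` is a hypothetical callable that talks to the payment provider
    and raises ConnectionError on a transient failure.
    """
    message = payment_queue.get()
    try:
        charge(message["order_id"])
        return "completed"
    except ConnectionError:
        if message["attempt"] < MAX_ATTEMPTS:
            # Put the message back so the payment is retried later, not lost.
            payment_queue.put({**message, "attempt": message["attempt"] + 1})
            return "requeued"
        return "failed"
    finally:
        payment_queue.task_done()


# Usage: the first charge attempt fails, the retry succeeds.
calls = []


def flaky_charge(order_id: str) -> None:
    calls.append(order_id)
    if len(calls) == 1:
        raise ConnectionError("provider unreachable")


response = handle_buy_click("ord_1")
first = process_next(flaky_charge)
second = process_next(flaky_charge)
```

The handler returns as soon as the message is enqueued, and a transient provider failure turns into a requeue rather than an error shown to the customer. A real system would use a persistent broker so that messages also survive an application crash.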

Lesson 3: Always work directly with vendors

One of the things we consider important when integrating a third party is to establish a process where it’s possible for the two companies to work together:

Regular check-ins during the implementation phase can save hours of searching the docs.
Asking for advice leads to best practices and gets the best possible approval rates.
Building a good relationship with vendors is useful when there is an incident or an issue that requires collaboration.

Strong vendor ties turn hurdles into quick fixes, so build them early.

Lesson 4: Anticipate vendor failures

Payment providers are software, and therefore can fail. Tolerating vendor failures is important to keep services available, which is why we think it should be considered from day one.

Here are some ways to tolerate vendor failures:

Use multiple payment providers. If one fails, switch to another automatically.
Implement retries to mitigate network issues or API failures.
Use message queues and circuit breakers to prevent failures from spreading.
Create dashboards and alarms for non-2xx responses or other failure scenarios.
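To make the first three ideas concrete, here is a minimal sketch that combines provider failover, retries, and a simple circuit breaker. The class and function names, the thresholds, and the provider callables are all assumptions for illustration. A production version would also add backoff between retries and emit metrics for the dashboards mentioned above.

```python
import time


class ProviderError(Exception):
    """Raised by a provider client when a charge attempt fails."""


class CircuitBreaker:
    """Stops calling a provider after too many consecutive failures."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at = None

    def allows_request(self) -> bool:
        if self.opened_at is None:
            return True
        # After the cool-down, let a trial request through.
        return time.monotonic() - self.opened_at >= self.reset_after_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()


def charge_with_failover(providers, breakers, payment, retries_per_provider=2):
    """Try each provider in order; retry failures, skip open circuits."""
    for name, charge in providers:
        breaker = breakers[name]
        for _ in range(retries_per_provider):
            if not breaker.allows_request():
                break
            try:
                result = charge(payment)
            except ProviderError:
                breaker.record_failure()
                continue
            breaker.record_success()
            return name, result
    raise ProviderError("all providers failed")


# Usage: the primary provider is down, so the charge falls over to the secondary.
def primary(payment):
    raise ProviderError("down")


def secondary(payment):
    return "ok:" + payment


providers = [("primary", primary), ("secondary", secondary)]
breakers = {
    "primary": CircuitBreaker(failure_threshold=2),
    "secondary": CircuitBreaker(),
}
result = charge_with_failover(providers, breakers, "pay_1")
```

After two consecutive failures the primary's circuit opens, so later payments skip it until the cool-down has passed instead of waiting on a provider that is known to be failing.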

Lesson 5: Don't deal with payment data directly

All information processed and generated by cloud services is potentially exposed. The attack surface is huge and includes application logs, spreadsheets and reports, and operational dashboards. Keeping this data safe takes so much effort that it is a business in itself.

It’s better to replace payment data with tokens. Using a PCI-DSS-compliant vendor to tokenize card numbers, bank account details, and other sensitive information massively reduces the risk of a data leak.
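The sketch below shows what working with tokens looks like from the application's side. `FakeTokenVault` is a hypothetical in-memory stand-in for the vendor, used only so the example runs. In a real integration the card number would be sent straight to the vendor and would never pass through your own services.

```python
import secrets


class FakeTokenVault:
    """In-memory stand-in for a PCI-DSS-compliant tokenization vendor.

    Hypothetical: exists only to make this example self-contained.
    """

    def __init__(self):
        self._vault = {}

    def tokenize(self, card_number: str) -> str:
        # The token is random and reveals nothing about the card number.
        token = "tok_" + secrets.token_hex(8)
        self._vault[token] = card_number
        return token

    def last4(self, token: str) -> str:
        return self._vault[token][-4:]


vault = FakeTokenVault()
token = vault.tokenize("4242424242424242")

# Our own records, logs, and dashboards only ever see the token.
payment_record = {
    "order_id": "ord_1",
    "card_token": token,
    "card_last4": vault.last4(token),
}
```

If `payment_record` ends up in a log line or a report, the leak exposes a token that is useless outside the vendor's vault rather than a card number.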

Lesson 6: Build troubleshooting capabilities

Issues will arise and troubleshooting is inevitable. In most cases the problem involves a customer with money stuck somewhere in the system.

Whether this falls under the engineering or the operations team, it is important to have the proper controls in place so these issues can be identified and resolved quickly:

Tracing: An append-only store is a great way to understand the history of a given payment. By storing all the updates that happen during the payment lifecycle we get a permanent log of everything that happened, which makes troubleshooting much easier than digging through application logs.
Visibility: Expose the payment details internally in a way that lets everyone understand what happened, especially when a payment was charged incorrectly. If the customer support agents are able to use that information, they’re likely to escalate fewer cases to engineering.
Tooling: In some scenarios, the payment may need manual intervention like a void, a refund, or just refreshing the data. Have readily available tools that can do this. Again, if customer support agents can use these tools, they’ll have more autonomy to sort these cases out.
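The tracing idea can be sketched as a small append-only event log. This is an in-memory illustration under assumed names (`PaymentEvent`, `PaymentEventLog`, and the status strings are hypothetical). A real system would back it with a durable store.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PaymentEvent:
    # Frozen so a recorded event cannot be modified afterwards.
    payment_id: str
    status: str
    detail: str
    recorded_at: datetime


class PaymentEventLog:
    """Append-only: events are added, never updated or deleted."""

    def __init__(self):
        self._events = []

    def append(self, payment_id: str, status: str, detail: str = "") -> None:
        self._events.append(
            PaymentEvent(payment_id, status, detail, datetime.now(timezone.utc))
        )

    def history(self, payment_id: str) -> list[PaymentEvent]:
        """Full lifecycle of one payment, oldest event first."""
        return [e for e in self._events if e.payment_id == payment_id]

    def current_status(self, payment_id: str):
        """Latest status, derived from the log rather than stored separately."""
        events = self.history(payment_id)
        return events[-1].status if events else None


log = PaymentEventLog()
log.append("pay_1", "created")
log.append("pay_1", "authorized")
log.append("pay_1", "capture_failed", "provider timeout")
log.append("pay_1", "refunded", "manual refund by support")
```

Because every transition is kept, `history("pay_1")` answers the question of what happened to the customer's money without reconstructing it from scattered application logs, and the same data can feed the internal views used by support.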

By putting the right controls in place (tracing, visibility, and tooling), you can resolve payment issues faster, reduce escalations, and keep customers happy. A well-prepared team turns troubleshooting from a fire drill into a smooth, efficient process.

Conclusion

Building a scalable and reliable payments engine has been challenging but rewarding. Through a lot of trial and error, we eventually ended up with a solid set of tools and practices that have helped us scale our platform and stay afloat during bull runs.

Are you passionate about payments engineering? MoonPay is hiring for a number of open roles across engineering, product, and operations. Come be part of the team shaping the future of payments!
