Key Technical Parameters
A mid-sized European skincare brand came to us in early 2023 with a brief that looked straightforward on paper: reformulate their existing 8% glycolic acid toner to perform better on the EU market while keeping the same on-pack claims. What followed was a seven-month project that touched pH engineering, packaging compatibility, preservative rebalancing, and eventually a full consumer study. The brand launched in Q1 2024. The reformulation cost them roughly €0.08 more per unit. Their return rate dropped by 61% within four months of relaunch. This case study walks through what we actually did, what failed, and what we’d do differently if we ran it again.
The Problem That Started Everything
The brand had been selling their original toner for about two years with a modest but steady complaint rate — primarily reports of inconsistent skin feel and, in some cases, visible irritation at the same concentration and pH they’d been using since launch. Our initial assumption was that it was a supply chain issue with the glycolic acid grade. That turned out to be only part of the story.
When their UK distributor flagged that three retail accounts were pulling the SKU from shelves, that triggered an internal escalation on their side. They sent us 18 months of batch records alongside six bottles from returned units. We ran those under our QC-07 incoming risk assessment protocol. pH on the returned units ranged from 3.1 to 3.9 — against a target of 3.5. Free acid fraction calculations showed that at pH 3.1, the bioavailable glycolic acid was running at roughly 82% of total acid load. At pH 3.9 it was closer to 52%. Same label. Very different skin experience.
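For readers who want to sanity-check free acid figures like these, here is a minimal Python sketch of the textbook Henderson-Hasselbalch estimate. It is not the QC-07 calculation itself: the pKa of 3.83 is an assumed literature value for glycolic acid, the function names are illustrative, and the idealized model ignores ionic strength and temperature. Its output shows the same trend but will not exactly match the percentages quoted above.

```python
# Idealized Henderson-Hasselbalch estimate of the undissociated (free) acid fraction.
# ASSUMPTION: pKa = 3.83 for glycolic acid (literature value near 25 °C).
# ASSUMPTION: dilute-solution behavior; ionic strength and temperature are ignored.
GLYCOLIC_PKA = 3.83


def free_acid_fraction(ph: float, pka: float = GLYCOLIC_PKA) -> float:
    """Fraction of the total acid that is undissociated at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def free_acid_pct_of_formula(total_acid_pct: float, ph: float,
                             pka: float = GLYCOLIC_PKA) -> float:
    """Free acid expressed as a percentage of the finished formula."""
    return total_acid_pct * free_acid_fraction(ph, pka)


# pH values taken from the returned-unit range and the target discussed above.
for ph in (3.1, 3.5, 3.9):
    fraction = free_acid_fraction(ph)
    free_pct = free_acid_pct_of_formula(8.0, ph)
    print(f"pH {ph:.1f}: free fraction {fraction:.1%}, "
          f"free acid {free_pct:.2f}% of formula")
```

The useful part is the shape of the curve rather than any single number: near the pKa, a few tenths of a pH unit move a large share of the acid between its free and ionized forms.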
The root cause wasn’t the formula. It was the packaging. The brand had switched bottle suppliers at month 14 to cut component costs. The new PET bottle had a different oxygen transmission rate, and over a 90-day shelf window at typical European retail temperatures (which can swing between 15°C and 28°C in uncontrolled stockrooms), the formula was drifting. The original packaging had masked a buffer capacity problem that had always been there.
This is where the project actually began.
The Parameters That Determined the Reformulation Scope
Once we understood the pH drift mechanism, we ran a full formulation teardown. The original formula used sodium hydroxide as the sole neutralizing agent. No buffer system. Buffer capacity was essentially zero below pH 3.8, which meant any CO₂ ingress or oxidative side reaction could push the formula toward more acidic territory without any self-correction.
We evaluated four reformulation variables in parallel across a 6-week bench phase:
Buffer system selection. Citric acid / sodium citrate at a 1:1 ratio gave us usable buffer capacity across pH 3.3–3.8. Lactic acid / sodium lactate was also trialed — it performed slightly better on skin tolerance in patch testing but added complexity to the free acid calculation for the EU dossier. We went with citrate.
Preservative rebalancing. The original formula ran phenoxyethanol at 0.9% with ethylhexylglycerin at 0.3%. At pH values below 3.4, phenoxyethanol activity increases measurably, which we suspected was contributing to the irritation signal. We pulled phenoxyethanol to 0.7% and increased the ethylhexylglycerin to 0.5%. Challenge testing under our standard ISO 11930 protocol confirmed preserved efficacy across the adjusted pH range.
Packaging requalification. The brand wanted to stay with their existing bottle supplier. We ran a parallel compatibility study across three closure types using the reformulated base at 40°C / 75% RH over 12 weeks. Two closure types showed no measurable pH shift. The third — the current production closure — showed a 0.22 pH unit drop by week eight. That one got replaced.
Humectant adjustment. The original formula had 3% glycerin and no other humectant. We added sodium PCA at 1.5% to support skin feel at the corrected pH, where the higher free acid fraction would otherwise register as more aggressive on compromised barrier skin.
| Parameter | Original Formula | Reformulated Version | Change Rationale |
|---|---|---|---|
| Target pH | 3.5 (unbuffered) | 3.5 (citrate-buffered) | Prevent pH drift during shelf life |
| Buffer system | None | Citrate, 0.05 mol/L | Maintain free acid consistency |
| Phenoxyethanol | 0.9% | 0.7% | Reduce irritation contribution at low pH |
| Ethylhexylglycerin | 0.3% | 0.5% | Compensate preservative adjustment |
| Glycerin | 3.0% | 3.0% | Unchanged |
| Sodium PCA | 0% | 1.5% | Improve skin feel at corrected pH |
| Packaging closure | Production closure (0.22 pH unit drop by week 8) | Requalified closure type | Eliminate pH drift source |
The most commonly overlooked variable in AHA toner reformulations is buffer capacity. Every brand specifies a target pH. Almost none specify a buffer capacity range. Without it, the target pH is a snapshot, not a guarantee.
What the Clinical Phase Actually Showed
With the reformulated base locked, the brand commissioned a split-face consumer study through an independent CRO in Germany. The design was a randomized, single-blind, controlled trial with n=44 participants, 8 weeks, comparing the original formula at pH 3.5 (unbuffered) to the reformulated version at pH 3.5 (citrate-buffered, same stated concentration). Primary endpoint was transepidermal water loss (TEWL) measured by Tewameter at weeks 2, 4, and 8. Secondary endpoints included investigator-graded erythema and participant-reported skin comfort scores.
At week 8, the reformulated version showed a 23% lower TEWL increase versus baseline compared to the original formula. Investigator-graded erythema scores were 34% lower at week 4 on the reformulated side. Participant comfort scores improved by 1.8 points on a 10-point scale by week 8. The formulas were identical in stated glycolic acid concentration. The variables that differed were the buffer system, the preservative adjustment, and the sodium PCA addition.
We’re honestly a bit cautious about overclaiming here — the study was 8 weeks, n=44, single-blind. It’s enough to validate the direction. It’s not a definitive clinical package for a medical device dossier. What it confirmed for us was that the buffer capacity intervention was doing real work, not just theoretical work.
This aligns with guidance in the SCCS Scientific Opinion on AHA formulations, which specifically notes the role of pH stability on skin penetration and tolerability assessment. Something the brand’s original supplier had apparently not flagged when they built the first formula.
Timeline, Cost, and What the Numbers Looked Like
The full project ran 29 weeks from first brief to production sign-off. That’s longer than typical for a toner reformulation. The packaging requalification added about six weeks to a process that would otherwise have completed in 20–22 weeks. Worth it, but brands consistently underestimate how much packaging decisions can extend qualification timelines.
Cost delta per unit landed at €0.08, split roughly as: citrate buffer system (+€0.02), sodium PCA addition (+€0.03), packaging closure upgrade (+€0.02), and formulation documentation overhead for the EU responsible person update (+€0.01 amortized at forecast volume).
On the return side: before relaunch, the brand was averaging a 4.3% return rate on this SKU across their key retail accounts. Four months post-relaunch, that figure was 1.7%. At their sales volume, that translated to a direct saving that recovered the reformulation investment in approximately six weeks of post-launch sales. The retail relationship recovery took longer — one account came back at reduced initial buy. That’s a softer cost that doesn’t show in the unit economics but matters operationally.
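The arithmetic behind these figures is easy to reproduce. The sketch below is illustrative only: the dictionary keys and function names are our own labels, and the inputs are the rounded numbers quoted in this section. The relative drop therefore comes out close to, but not exactly at, the 61% headline figure.

```python
# Cost split per unit, held in euro cents to avoid floating-point rounding.
# Values are the rounded figures quoted in the text; key names are illustrative.
COST_SPLIT_CENTS = {
    "citrate buffer system": 2,
    "sodium PCA addition": 3,
    "packaging closure upgrade": 2,
    "EU responsible person documentation (amortized)": 1,
}


def total_cost_delta_cents(split: dict) -> int:
    """Sum the per-unit cost increases, in euro cents."""
    return sum(split.values())


def relative_drop_pct(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage of `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


total = total_cost_delta_cents(COST_SPLIT_CENTS)
print(f"Cost delta per unit: EUR {total / 100:.2f}")

# Return rates quoted above: 4.3% before relaunch, 1.7% four months after.
drop = relative_drop_pct(4.3, 1.7)
print(f"Relative drop in return rate: {drop:.1f}%")
```

Note the distinction between the absolute change (2.6 percentage points) and the relative change (about 60% of the starting rate). Retail buyers tend to quote the former, marketing the latter.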
We flag this kind of downstream commercial risk in every kickoff call now, particularly when a project is triggered by a retail escalation. The reformulation fixes the product. It doesn’t automatically fix the trust deficit with a buyer who already pulled the SKU.
For brands selling into the EU market, the EU Cosmetics Regulation 1223/2009 requires that any formulation change that could affect safety assessment triggers a dossier update. This project required one. It added two weeks and approximately €800 in responsible person fees — entirely manageable, but it needs to be in the project plan from day one.
For reference on preservative challenge testing standards applied during this project, we worked to ISO 11930:2019 criteria throughout.
Scalability: What Held and What Didn’t
The bench formula scaled to 200kg pilot batches without incident. The citrate buffer system dispersed cleanly, and the pH target of 3.5 ± 0.1 was achievable within a normal mixing cycle. No surprises there.
What did require adjustment at scale was the temperature control during the neutralization step. At bench scale we were adding sodium citrate solution to the glycolic acid base at ambient temperature. At 200kg, the exothermic response from neutralization was pushing the batch to 38–41°C before we completed addition. That’s not catastrophic for this formula, but we’d specified a process temperature ceiling of 35°C to protect the sodium PCA. We adjusted the addition rate and added a cooling water jacket step. Problem solved, but it added about 40 minutes to the batch cycle. Not documented in the bench protocol.
This is a consistent gap we see in acid exfoliation technology scale-up: neutralization exotherms that are invisible at lab scale become real constraints at commercial batch sizes. The fix is straightforward once you know to look for it. The problem is that it only appears at scale, and brands sometimes interpret production delays as quality failures rather than process engineering gaps.
At 500kg batch size, the process ran clean after the protocol adjustment. We currently run this formula at 500kg standard batch, with a documented post-addition cooling hold of 15 minutes before pH measurement.
Our barrier repair and sensitive skin formulation team was involved in the final skin-feel validation rounds, particularly for the sodium PCA addition. Their input on the dose-response relationship at different pH ranges was useful — this kind of cross-team review isn’t standard on every project but it was the right call here.
For US market registration context, the FDA Cosmetics Guidelines don’t impose a specific AHA concentration limit at the federal level, but the brand’s US regulatory counsel independently flagged pH below 3.5 as a potential risk for their specific claims. That’s a separate conversation outside our scope, but we mention it because it’s a question that comes up in almost every multi-market AHA brief we receive.
Formulation Notes for Brand Partners
When you brief us on a reformulation like this, the first questions we ask are: what triggered the brief, what market is it going to, and what’s your current packaging? Those three variables determine the scope before we even look at the formula itself.
The most common mistake we see in reformulation briefs is framing the ask as “adjust the pH” when the actual problem is buffer capacity or packaging compatibility. The pH number is easy to change. Making it stable across 24 months of shelf life under real retail conditions is the work. We’ll push back early if the brief is underspecifying this — it saves time and money for both sides.
A secondary mistake: brands sometimes request a formulation change to resolve a performance complaint without updating their EU cosmetic product safety report. Any change that could affect the safety profile requires a dossier review under EU Cosmetics Regulation 1223/2009. We flag this in the intake form we call the Project Scope Alignment sheet, but brands sometimes want to skip it. We don’t let them.
Timeline for a project like this: bench development and initial stability 4–6 weeks, packaging compatibility 8–12 weeks (run concurrently where possible), consumer study if needed 8–12 weeks (external CRO, outside our control), production sign-off 2–4 weeks. Real-time 24-month stability is initiated at first pilot batch and runs concurrently throughout. Lab samples in 2–3 weeks from brief receipt on most reformulation projects.
Frequently Asked Questions
We want to reformulate our AHA toner — how do we know if it’s a formula problem or a packaging problem?
A: Send us your batch records and, if possible, two or three units from returned stock. We run pH on aged samples under our QC-07 protocol as a first step — if the aged pH has drifted more than 0.3 units from your target, packaging compatibility is almost always part of the story, not just formulation instability.
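If you want to pre-screen your own aged samples before sending them, the 0.3-unit rule of thumb is simple to apply. The following is a minimal sketch, not the QC-07 protocol: the `AgedSample` type, the unit labels, and the function name are hypothetical, and the example readings are made up apart from the 3.1 and 3.9 extremes mentioned in this case study.

```python
from dataclasses import dataclass

DRIFT_THRESHOLD = 0.3  # pH units; the rule-of-thumb limit from the answer above


@dataclass
class AgedSample:
    unit_id: str  # hypothetical label for a returned or retained unit
    ph: float     # measured pH of the aged sample


def flag_drift(samples, target_ph, threshold=DRIFT_THRESHOLD):
    """Return the samples whose pH has moved more than `threshold` from target.

    The difference is rounded to 3 decimals so that a reading exactly
    0.3 units away is not flagged because of floating-point noise.
    """
    return [s for s in samples
            if round(abs(s.ph - target_ph), 3) > threshold]


# Illustrative readings against a 3.5 target.
samples = [
    AgedSample("unit-A", 3.1),
    AgedSample("unit-B", 3.5),
    AgedSample("unit-C", 3.9),
    AgedSample("unit-D", 3.6),
]

for s in flag_drift(samples, target_ph=3.5):
    print(f"{s.unit_id}: pH {s.ph:.1f} is outside the {DRIFT_THRESHOLD} unit window")
```

A flagged unit does not by itself say whether the formula or the packaging is at fault. It only tells you the aged pH is worth investigating alongside the batch and packaging history.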
Does a formula change require us to update our EU Cosmetic Product Safety Report?
A: Yes, if the change affects the safety assessment. Under EU Cosmetics Regulation 1223/2009, any modification that could alter the product’s safety profile requires a CPSR update through your responsible person. Budget roughly €600–1,200 in RP fees and two to three weeks for the update cycle.
We’ve had irritation complaints but our pH is on target every time we test. What’s going wrong?
A: This is the buffer capacity issue. A formula can test at pH 3.5 at the point of manufacture and drift to pH 3.1 by month three in a warehouse, especially in regions with temperature variation. At pH 3.1, the bioavailable free acid fraction of glycolic acid at 8% total concentration is substantially higher than at pH 3.5 — we’ve measured this shift as a near-30% increase in free acid availability across a 0.4 pH unit drop. The number on the batch record looks fine. The product in the consumer’s hands is different.
What’s your MOQ and timeline for a reformulation project like this?
A: MOQ on a toner format is typically 500kg per batch, which translates to roughly 25,000–33,000 units depending on fill volume. Timeline from brief to production sign-off runs 20–30 weeks depending on packaging requalification scope. If your packaging is already qualified and stable, we can compress the schedule. If we’re requalifying closures or bottles, add six to eight weeks.
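To translate a batch size into a unit count for your own fill weight, a rough conversion looks like the sketch below. The function name and the `yield_fraction` parameter are illustrative assumptions. The calculation works in grams of fill (for a water-like toner, 1 mL is roughly 1 g, which is also an assumption) and ignores line losses unless you set a yield below 1.0.

```python
def units_per_batch(batch_kg: float, fill_g: float,
                    yield_fraction: float = 1.0) -> int:
    """Whole units obtainable from a batch, given the fill mass per unit.

    batch_kg       -- batch size in kilograms
    fill_g         -- fill mass per unit in grams
    yield_fraction -- usable share of the batch after losses (assumed 1.0)
    """
    if fill_g <= 0:
        raise ValueError("fill_g must be positive")
    if not 0 < yield_fraction <= 1:
        raise ValueError("yield_fraction must be in (0, 1]")
    return int(batch_kg * 1000 * yield_fraction // fill_g)


# The unit range quoted above corresponds to fills of roughly 15 to 20 g.
for fill in (15, 20):
    print(f"{fill} g fill: {units_per_batch(500, fill)} units per 500 kg batch")
```

Larger fills scale the count down proportionally, so it is worth running your actual fill weight through the conversion before comparing MOQs.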
Is there anything about this type of project you’re still figuring out?
A: Honestly, the long-term interaction between citrate buffer systems and certain PET bottle chemistries over 24+ months is something our dataset is still building. Our accelerated stability results (40°C / 75% RH, 12 weeks) are consistently predictive. But we’ve had one case where a formula that cleared accelerated stability showed unexpected pH behavior at the 18-month real-time read. We don’t have a full explanation yet. Our current practice is to run real-time checks at 6, 12, and 18 months on every AHA toner with a new packaging combination — and we treat the 18-month read as a required gate before recommending any high-volume scale-up.
We had almost the same situation with an unbuffered 10% lactic acid toner we were co-developing with an OEM in Łódź — pH was spec’d at 3.8 but by month four of shelf-life testing we were seeing readings down to 3.3 on units stored at 35°C/75% RH. We didn’t catch it early enough because our stability protocol only pulled samples at T0 and T6, nothing in between. The batch had already gone to our German retail partner before we had T6 data, and pulling it cost us the account.
Curious whether the 0.05 mol/L citrate system caused any compatibility issues with the packaging — we’ve seen citrate-buffered formulas at that concentration accelerate metal ion leaching from certain pump mechanisms, and wondering if that was a factor in your materials qualification process.
We had almost the exact same pH drift issue with our Shenzhen OEM back in 2021 — 8% glycolic, unbuffered, and we were seeing ±0.4 unit swings between production runs because their water treatment system wasn’t consistent batch to batch. We didn’t move to a citrate buffer system until we’d already eaten two chargebacks from a UK retailer. The 0.05 mol/L level mentioned here is actually close to what we landed on after three rounds of reformulation trials.
The phenoxyethanol reduction actually caught my attention — we went through something similar with a Hangzhou OEM in late 2022 where dropping from 0.9% to 0.7% in a low-pH AHA formula triggered a challenge test failure on their end that didn’t show up in our in-house testing. Took us three rounds of back-and-forth to figure out the discrepancy traced back to them using a different ISO 11930 inoculum preparation protocol than our lab.
The “keeping the same on-pack claims” line is the part that doesn’t get talked about enough — because if the reformulation meaningfully changes the free acid fraction, you’re not technically selling the same product anymore, and any existing consumer study data you had is now propping up claims it wasn’t designed to support. We ran into this with a 10% mandelic toner in 2022 where a buffer system change shifted our free acid availability by about 12%, which our legal team flagged as requiring a new efficacy substantiation file before we could keep the “visibly smoother skin in 4 weeks” claim on pack.
Seven months concept-to-reformulation is actually fast for this scope — our 2022 relaunch of a buffered BHA toner with a Polish contract manufacturer ran 11 months just to get stability and challenge testing signed off before we could even book a production slot.
The 61% return rate drop tracks with what we saw after adding a citrate buffer to a mandelic/lactic blend for a German private label client in mid-2023 — complaints about “burning” almost disappeared within the first two quarters post-relaunch, and we’d attributed most of it to the same free acid inconsistency issue.
The pH range on those returned units (3.1 to 3.9) is exactly the kind of variance that’s almost impossible to explain to retail buyers without sounding like you’re making excuses. We had a comparable situation with a glycolic/mandelic hybrid in early 2023 where the returned units were fine by every spec except pH, and it took us weeks to trace it back to the glycolic acid supplier switching their synthesis process mid-contract without notification.
Worth flagging for anyone taking this formula into Southeast Asia — under Thailand’s FDA Notification for cosmetics (effective 2022 amendment), AHA-containing rinse-off and leave-on products above 3% concentration require a cautionary statement specifying the exact acid percentage and a UV protection advisory on-pack. We nearly missed that when filing our buffered glycolic SKU in Bangkok last year because our EU label listed glycolic as part of a “fruit acid complex” rather than by individual INCI percentage.
Curious how the free acid fraction held up across that pH spread on the returned units — a 0.8-unit range at 8% glycolic would put you in meaningfully different bioavailability territory, and I’m wondering whether you used potentiometric titration or just back-calculated from nominal concentration and measured pH when you were running the QC-07 assessment.