The art of leaking Environment Variables

Image generated with Bing AI and modified with Adobe Generative Fill.

Recently, I read an interesting blog post about how an AI tool leaked its Firebase configuration through its JS files. The authors explained how they found it, which really interested me, so I decided to build a similar scanner, but for .bot domains.

Within a few hours, I had already found a few matches. Plenty of them were false positives, as they just happened to reference well-known env variable names in the code. However, several included .default() fallbacks containing... the variable's actual value.

Here is an example with the real creds removed:

DATABASE_URL:n.z.string().url().refine(e=>e.startsWith("postgres")||e.startsWith("mysql")).default("postgresql://root:root@<HOST>:54329/<database>")

Yes, their username and password are both root...
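A rough way to look for this pattern is a regular expression over the bundle text. The sketch below is a minimal, hypothetical helper (findDefaultedSecrets and its name filter are my own illustration, not part of any library) that assumes minified Zod-style schemas of the form NAME:...default("value"), like the one above. It is a heuristic with both false positives and false negatives, and it is meant for checking your own build output before you deploy it.

```javascript
// Hypothetical heuristic: flag env-style keys whose schema chain ends in a
// string .default("...") inside (possibly minified) JavaScript source.
// Names are only reported if they look secret-ish; this list is illustrative.
const SECRETISH = /(KEY|SECRET|TOKEN|PASSWORD|DATABASE|MONGO|DSN)/;

// NAME:<chain without a colon>.default("value")
// [^:]*? stops a match from running on into the next "OTHER_NAME:" property.
const DEFAULT_RE = /\b([A-Z][A-Z0-9_]{2,}):[^:]*?\.default\(\s*(["'`])(.*?)\2\s*\)/g;

function findDefaultedSecrets(source) {
  const hits = [];
  for (const match of source.matchAll(DEFAULT_RE)) {
    const name = match[1];
    const value = match[3];
    // Skip harmless defaults such as APP_NAME or empty strings.
    if (SECRETISH.test(name) && value.length > 0) {
      hits.push({ name, value });
    }
  }
  return hits;
}
```

Run against the .js files your build emits: anything it reports is a value that every visitor to your site can read.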

Result of my database URL checker
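A much simpler, offline check you can run on your own configuration is to parse the connection string and flag trivially guessable credentials like the root/root pair above. This is a hypothetical sketch (weakDbCredentials and the password list are illustrative assumptions), it does not connect to anything, and it assumes single-host postgres:// or mysql:// style URLs, which the standard URL parser splits into username and password.

```javascript
// Hypothetical offline check for your own connection strings (no network access).
// The weak-password list is illustrative, not exhaustive.
const WEAK_PASSWORDS = new Set(["root", "admin", "password", "postgres", "123456"]);

function weakDbCredentials(connectionString) {
  // WHATWG URL parses user:pass@host:port for single-host scheme://... URLs.
  const url = new URL(connectionString);
  const user = decodeURIComponent(url.username);
  const pass = decodeURIComponent(url.password);
  const problems = [];
  if (pass === "") {
    problems.push("no password");
  } else {
    if (WEAK_PASSWORDS.has(pass.toLowerCase())) problems.push("common password");
    if (pass === user) problems.push("password equals username");
  }
  return problems;
}
```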

Next, I shifted my focus to .ai domains. Everyone is jumping on the AI bandwagon at the moment, so we are more likely to see slip-ups. Within a few hours (and ~37,500 scanned websites), I had already found several live .env variables, including OpenAI API keys.

One of these caught my attention: a CDN link for DeepBrain AI, a seemingly well-known AI tool. It contained multiple secret variables: OPENAI_API_KEY, MONGODB and AWS_TRNASCRIBE_SECRETACCESSKEY (yes, it really is misspelt like that)!
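One common way secrets end up in files like this is that server-side values get inlined into client code at build time. A simple pre-deploy check is to search the built bundle text for the literal value of each secret. The sketch below is hypothetical (leakedSecretNames is an illustrative name, not a real tool), and the 8-character minimum is an arbitrary cut-off to avoid matching short, common strings.

```javascript
// Hypothetical pre-deploy check: return the names of secrets whose exact
// values appear anywhere in the bundle text.
function leakedSecretNames(bundleText, secrets, minLength = 8) {
  return Object.entries(secrets)
    // Ignore missing or very short values; they would match by accident.
    .filter(([, value]) => typeof value === "string" && value.length >= minLength)
    .filter(([, value]) => bundleText.includes(value))
    .map(([name]) => name);
}
```

In practice you would pass it the contents of each emitted .js file and the secrets from your deployment environment (for example process.env on the build machine); which files and variables those are depends on your project.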

Unfortunately, it gets worse... On their main website, they had an API URL that returned any environment variable you asked for 🤦
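The underlying mistake is an endpoint that looks up process.env[name] for whatever name the caller sends. A minimal sketch of a safer shape, assuming a framework-agnostic handler (getPublicConfig and PUBLIC_CONFIG_KEYS are hypothetical names), is to return values only for an explicit allowlist of non-secret keys:

```javascript
// Hypothetical config lookup: only names on an explicit allowlist are returned.
// Everything else, secret or not, gets the same 404.
const PUBLIC_CONFIG_KEYS = new Set(["APP_NAME", "NEXT_PUBLIC_API_BASE_URL"]);

function getPublicConfig(name, env = process.env) {
  if (!PUBLIC_CONFIG_KEYS.has(name)) {
    return { status: 404, body: null };
  }
  return { status: 200, body: env[name] ?? null };
}
```

Returning the same 404 for unknown and secret names means the endpoint does not even confirm which variables exist.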

Some of the names I tested were AWS_KEY, AWS_SECRET_KEY and AWS_API_KEY, before I tried SECRET_KEY and got this response:

Next, I tried MONGO_DB and DATABASE_URL before landing on MONGODB, which returned their full MongoDB URL:

Finally, I retried OPENAI_API_KEY, which returned a different key from the one I had found earlier:

Writing this blog post has driven me insane. I also realised that they use NextAuth.js for authentication with its default environment variable names, which makes those variables easy to guess.

Google OAuth creds:

Microsoft OAuth creds:

I tried to contact DeepBrain AI about this, and after 9 days (mostly spent waiting), the issue was finally fixed.

Statistics

Looking at my scanner's logs, the most commonly leaked credentials are as follows:

Closing Notes

In the space of 4 days, my scanner logged just under 1,000 leaked .env variables across only 3 scanned TLDs. That's... a lot! Please remember to keep your credentials PRIVATE! At the end of the day, keeping your data secure is your responsibility, especially when you run a large business. People will try to gain unauthorised access, so please don't leave the key under the mat.

Credit for discovering the API endpoint goes to Whanos!