The Wreck of the Edmund Fitzgerald: Modeling Decomposition in Extreme Environments

Originally appearing on his 1976 album, Summertime Dream, “The Wreck of the Edmund Fitzgerald” is a powerful ballad written and performed by folk singer Gordon Lightfoot. In 1976, the song hit No. 1 in Canada on the RPM chart and No. 2 in the United States on the Billboard Hot 100. The lyrics are a masterpiece, but one line always stood out to me: “The lake, it is said, never gives up her dead.” Following Lightfoot’s death in 2023, the song reconnected with older fans and reached new generations of listeners, climbing to No. 15 on Billboard’s Hot Rock and Alternative chart.

Listening to it again after all these years, I was inspired to research what that line meant and whether there was any truth to it. What I discovered was illuminating: Lake Superior really doesn’t give up her dead, and the science behind it is as haunting as the song itself.

It turns out that line is not poetic license. It’s physics.

When 29 souls went down with the Edmund Fitzgerald on November 10, 1975, they stayed down. Not because of some mystical property of the Great Lakes, but because of a perfect storm of temperature, pressure, and biology that we can model mathematically.


Note: This post contains scientific discussion of decomposition and forensic pathology in the context of maritime disasters.


“The lake, it is said, never gives up her dead / When the skies of November turn gloomy”


Read more →

The Birthday Paradox in Production: When Random IDs Collide

You generate a UUID. It’s 128 bits total, with 122 bits of randomness. That’s 340 undecillion possible values. Collision-proof, right? Your system generates a million IDs per second. Still safe? What about a billion?

As I like to say, common sense and intuition are the enemies of science. Common sense tells you that with 340,000,000,000,000,000,000,000,000,000,000,000,000 possible values, you’d need to generate at least trillions before worrying about duplicates. Maybe fill 1% of the space? 10%?

Math shows us the uncomfortable truth: you’ll hit a 50% collision probability after generating just \(2.7 \times 10^{18}\) IDs. That’s 0.0000000000000000008% of your total space. At a billion IDs per second, you’ve got 86 years. Comfortable, but not infinite. Drop to 64-bit IDs at a million per second? Now you’ve got about 1.4 hours. Just enough time to duck out for a long lunch and return to a disaster. And 32-bit at a billion per second? 77 microseconds. Faster than you can blink.

You might know the birthday paradox: among just 23 people, there’s more than a 50% probability that two share a birthday. What you may not know is that this isn’t just a party trick; it’s the same mathematics that determines when your “guaranteed unique” database IDs collide, why hash tables need careful sizing, and when your distributed system’s assumptions break.
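The math behind those numbers fits in a few lines. Here is a minimal sketch using the standard birthday-bound approximation that a 50% collision probability arrives after roughly \(1.177\sqrt{N}\) draws from a space of \(N\) equally likely values (the rates and bit widths are the ones from the paragraph above, and a version-4 UUID contributes 122 random bits):

```python
import math

def ids_until_50pct_collision(bits: int) -> float:
    """Approximate number of random IDs before collision probability hits 50%.

    Uses the birthday-bound approximation n ~ 1.177 * sqrt(N), where
    N = 2**bits is the number of equally likely ID values.
    """
    return 1.177 * math.sqrt(2 ** bits)

# 122 random bits (UUIDv4) at a billion IDs/s, 64 bits at a million/s, 32 bits at a billion/s.
for bits, rate in [(122, 1e9), (64, 1e6), (32, 1e9)]:
    n = ids_until_50pct_collision(bits)
    seconds = n / rate
    print(f"{bits}-bit IDs at {rate:,.0f}/s: ~{n:.2e} IDs, ~{seconds:,.2f} s to 50% collision")
```

Running it reproduces the figures above: roughly 86 years, 1.4 hours, and 77 microseconds respectively.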


“In a room of 23 people, there’s a greater than 50% chance two share a birthday. In your database, collisions arrive far sooner than intuition suggests.”


Read more →

Hash Collisions: Why Your ‘Unique’ Fingerprints Aren’t (And Why That’s Usually OK)

In 2017, Google researchers generated two different PDF files with identical SHA-1 hashes, finally proving what cryptographers had warned about for years: hash functions don’t create truly unique fingerprints (Stevens et al., 2017). This “SHAttered” attack required 9 quintillion SHA-1 computations, the equivalent of roughly 6,500 years of single-CPU computation. The attack cost approximately $45,000 in cloud computing resources, making it accessible to well-funded adversaries but not casual attackers.

Yet despite this proof, we still trust hash functions for everything from Git commits to blockchain transactions to password storage. The reason is simple: while collisions are mathematically inevitable, meaningful collisions remain virtually impossible. The full story of hash collisions is more nuanced than “unique” versus “not unique.”
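To see why accidental collisions aren’t the worry, here is a small sketch using Python’s hashlib: changing a single character produces a completely different SHA-256 digest, and the birthday bound says a random collision only becomes likely after on the order of \(2^{128}\) hashes.

```python
import hashlib

# One-character difference ("dog" vs "cog") in otherwise identical inputs.
a = hashlib.sha256(b"The quick brown fox jumps over the lazy dog").hexdigest()
b = hashlib.sha256(b"The quick brown fox jumps over the lazy cog").hexdigest()

# The avalanche effect flips roughly half the output bits, so two *meaningful*
# documents colliding by accident is astronomically unlikely.
print(a)
print(b)
print("identical:", a == b)
```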


“In cryptography, ‘secure’ has always meant ‘secure for now’.”


Read more →

How Large Language Models (LLMs) Tokenize Text: Why Words Aren't What You Think

When you type “I love programming” into ChatGPT, you might assume the model reads three words. It doesn’t. It reads somewhere between three and seven tokens, depending on how the text is split.

When you ask Claude to count the letters in the word “strawberry,” it often gets it wrong. The reason is simple. Claude never saw the word “strawberry” as a complete unit. It saw tokens like "str", "aw", "berry" and tried to reason about letters it couldn’t directly access.

And when early GPT-3 users discovered that typing “SolidGoldMagikarp” caused the model to behave erratically, generating nonsense, refusing requests, or producing bizarre outputs, the culprit wasn’t the model’s training. It was a glitch token: a tokenization artifact that never appeared in training data, leaving the model with no learned representation for how to handle it (Rumbelow & Watkins, 2023).
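You can inspect this yourself. Here is a quick sketch using OpenAI’s open-source tiktoken library; the exact splits vary by model and tokenizer, so treat the output as illustrative rather than the precise splits ChatGPT or Claude actually use:

```python
import tiktoken  # pip install tiktoken

# cl100k_base is the encoding used by several recent OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["I love programming", "strawberry", "SolidGoldMagikarp"]:
    token_ids = enc.encode(text)
    pieces = [enc.decode([t]) for t in token_ids]  # the text fragment behind each token ID
    print(f"{text!r} -> {len(token_ids)} tokens: {pieces}")
```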


“To a language model, text isn’t a stream of words. It’s a sequence of tokens. The way those tokens are created determines what the model can and cannot understand.”


Read more →

How Large Language Models (LLMs) Handle Context Windows: The Memory That Isn't Memory

When you have a long conversation with a large language model (LLM) such as ChatGPT or Claude, it feels like the model remembers everything you’ve discussed. It references earlier points, maintains consistent context, and seems to “know” what you talked about pages ago.

But here’s the uncomfortable truth: the model doesn’t remember anything. It’s not storing your conversation in memory the way a database would. Instead, it’s rereading the entire conversation from the beginning every single time you send a message.
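A minimal sketch of what that looks like in practice: the client keeps the whole transcript and ships it back on every turn. The call_model function below is a hypothetical stand-in for whatever chat API you’re using, not a real endpoint.

```python
# Hypothetical chat loop: the model's "memory" is just the transcript we resend each turn.

def call_model(messages: list[dict]) -> str:
    """Stand-in for a real chat completion API; a real call would send the
    full `messages` list over the wire every single time."""
    total_chars = sum(len(m["content"]) for m in messages)
    return f"(model saw {len(messages)} messages, ~{total_chars} characters)"

history: list[dict] = []

def send(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    reply = call_model(history)  # the entire conversation goes back in, every time
    history.append({"role": "assistant", "content": reply})
    return reply

print(send("Summarize our plan."))
print(send("Now compare it to the alternative."))  # the first exchange rides along too
```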


“A context window isn’t memory. It’s a performance where the model rereads its lines before every response.”


Read more →

Rethinking the Three-Second Traffic Rule: When Physics Says It’s Not Enough

While researching why car insurance rates are so high in Las Vegas, I started thinking about the three-second rule and whether it actually holds up. As I’ve always heard it, the three-second rule describes how far you should follow behind the car ahead in traffic: pick out a fixed roadside marker, and you should pass that marker at least three seconds after the car in front of you does. The rule is simple enough, yet deceptively deep once you unpack the physics.
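Here is a back-of-the-envelope sketch of that physics. Total stopping distance is reaction distance plus braking distance, \(d = v t_r + v^2 / (2a)\), which you can compare with the gap a three-second rule actually buys you. The reaction time and braking deceleration below are assumed round numbers, not measurements:

```python
# Compare a 3-second following gap with the distance needed to come to a complete stop.
# Assumed round numbers: 1.5 s reaction time, 7 m/s^2 braking deceleration on dry pavement.
REACTION_TIME_S = 1.5
DECEL_MS2 = 7.0

def stopping_distance_m(speed_ms: float) -> float:
    return speed_ms * REACTION_TIME_S + speed_ms ** 2 / (2 * DECEL_MS2)

for mph in (30, 45, 65, 80):
    v = mph * 0.44704              # mph -> m/s
    gap = 3 * v                    # distance covered during a 3-second gap
    stop = stopping_distance_m(v)
    print(f"{mph} mph: 3-second gap = {gap:.0f} m, stopping distance = {stop:.0f} m")
```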


“Three seconds is a rule of thumb. Physics reveals the truth.”


Read more →

Modeling Heat Capacity and Evaporation with Python: Why Water Warms Slowly but Cools Fast

Every summer, it feels like a small miracle when the pool finally warms up enough to swim. In Nevada, where the air temperature can sit above 100°F (38°C) for weeks, you’d expect the water to keep pace. Yet, somehow, it takes forever to warm, and only a few cool nights can undo all that progress.

The same phenomenon shows up in a stick of butter. Butter melts quickly, while margarine stays stubbornly firm even under the same heat. That’s not coincidence; it’s thermodynamics.

The butter versus margarine comparison is a staple example in nutrition science. It shows how the proportions of fat, water, and solids affect how much energy it takes to change temperature. Butter, with more fat and less water, heats up and melts quickly. Margarine, full of water and unsaturated oils, absorbs more energy before softening because water’s specific heat is much higher.
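The equation \(Q = mc\Delta T\) makes that asymmetry concrete. Here is a small sketch with rough, assumed numbers (a backyard pool’s mass and approximate specific heats for water and butterfat; treat them as illustrative, not lab values):

```python
# Energy required to raise temperature: Q = m * c * dT
# Assumed illustrative values, not measured ones.
SPECIFIC_HEAT_WATER = 4186   # J/(kg*K)
SPECIFIC_HEAT_FAT = 2000     # J/(kg*K), rough figure for butterfat

def heat_energy_joules(mass_kg: float, specific_heat: float, delta_t_c: float) -> float:
    return mass_kg * specific_heat * delta_t_c

pool_mass_kg = 50_000        # roughly a 50 m^3 backyard pool
butter_mass_kg = 0.113       # one stick of butter

print(f"Pool, +5 °C:    {heat_energy_joules(pool_mass_kg, SPECIFIC_HEAT_WATER, 5):.3e} J")
print(f"Butter, +15 °C: {heat_energy_joules(butter_mass_kg, SPECIFIC_HEAT_FAT, 15):.3e} J")
```

The pool needs on the order of a billion joules for a few degrees of warming, which is why a handful of cool nights can erase weeks of desert sun.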


“A pool in the desert and a stick of margarine in the kitchen both tell the same story: water resists change.”


Read more →

How Large Language Models (LLMs) Learn: Calculus and the Search for Understanding

When you interact with a large language model (LLM) such as ChatGPT or Claude, the model seems to respond almost instantly, no matter how difficult the question. What’s easy to forget is that every word it predicts comes from a long history of learning, where billions of gradient steps have slowly sculpted its understanding of language.

Large language models don’t memorize text. They optimize their predictions of it. Behind that optimization lies calculus. I’m not referring to the calculus you did with pencil and paper. I’m talking about a sprawling, automated version that computes millions of derivatives per second.

At its heart, every LLM is a feedback system. It starts with random guesses, measures how wrong it was, and then adjusts itself to be slightly less wrong. The word “slightly” in this context is the essence of calculus.
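Stripped to its core, that feedback loop is gradient descent. Here is a toy sketch of the guess, measure, adjust-slightly cycle, with one parameter, a squared-error loss, and a hand-computed derivative standing in for the automated machinery a real model uses:

```python
# Toy version of the training loop: guess, measure the error, nudge the guess.
target = 3.0          # the "right answer" the model should learn
w = 0.0               # arbitrary starting guess
learning_rate = 0.1   # how "slightly" each adjustment is

for step in range(25):
    loss = (w - target) ** 2       # how wrong the current guess is
    gradient = 2 * (w - target)    # d(loss)/dw, computed by hand here
    w -= learning_rate * gradient  # move a little in the downhill direction

print(f"learned w = {w:.4f} (target {target})")
```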


“Each gradient step represents a measurable reduction in error, guiding the model toward a more stable understanding of language.”


Read more →

How Large Language Models (LLMs) Think: Turning Meaning into Math

When you enter a sentence into a Large Language Model (LLM) such as ChatGPT or Claude, the model does not process words as language. It represents them as numbers.

Each word, phrase, and code token becomes a vector — a list of real-valued coordinates within a high-dimensional space. Relationships between meanings are captured not by grammar or logic but by geometry. The closer two vectors lie, the more similar their semantic roles appear to the model.

This is the mathematical foundation of large language models: linear algebra. Matrix multiplication, vector projection, cosine similarity, and normalization define how the model navigates this vast space of meaning. What feels like understanding is actually the alignment of high-dimensional vectors governed by probability and geometry.
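Cosine similarity is the workhorse behind “closer means more similar.” Here is a minimal sketch with made-up three-dimensional vectors; real embeddings have hundreds or thousands of dimensions and are learned during training:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy, hand-picked coordinates purely for illustration.
king  = np.array([0.9, 0.70, 0.10])
queen = np.array([0.8, 0.75, 0.15])
apple = np.array([0.1, 0.20, 0.90])

print("king vs queen:", round(cosine_similarity(king, queen), 3))
print("king vs apple:", round(cosine_similarity(king, apple), 3))
```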


“Linear algebra and geometry do more than support AI; they create its language of meaning.”


Read more →

How Large Language Models (LLMs) Read Code: Seeing Patterns Instead of Logic

Developers are accustomed to thinking about code in terms of syntax and semantics, the how and the why. Syntax defines what is legal; semantics defines what it means. A compiler enforces syntax with ruthless precision and interprets semantics through symbol tables and execution logic. But a Large Language Model (LLM) reads code the way a seasoned engineer reads poetry, recognizing rhythm, pattern, and context more than explicit rules.


“When an AI system ‘understands’ code, it is not executing logic; it is modeling probability.”


Read more →