The Negative Consequences of LLMs

Large Language Models (LLMs) like OpenAI’s GPT series, Meta’s LLaMA, and Google’s Gemini have revolutionized how we write, code, and communicate. They autocomplete sentences, generate working code, explain abstract concepts, and even replace customer support agents.

But this leap forward comes with real risks—technical, ethical, economic, and professional, that developers need to understand and confront.

1. Code Quality and Overreliance

LLMs can generate functional code quickly, but they do not understand the systems they touch. Developers who rely heavily on them risk introducing subtle bugs, performance regressions, or security vulnerabilities.

A study by Fu et al. (2025) found that approximately 29–36% of Copilot-generated Python and JavaScript snippets contained at least one security weakness (e.g., CWE‑78, CWE‑330, CWE‑94, and CWE‑79).

Blind trust in LLM output leads to a “copy-paste culture”, where developers stop questioning code correctness and drift away from core software engineering principles like test-driven development and design by contract. Vibe coding has become a meme for good reason.

2. Risks of Data Leakage and Model Exploitation

LLMs present significant risks related to data leakage, both due to how they are trained and how they are used.

LLMs are typically trained on massive internet-scale datasets scraped from forums, code repositories, technical documentation, websites, and social media—often without proper consent, licensing, or security filtering. This introduces multiple vectors for data exposure and abuse.

Training-Time Risks: Inadvertent Data Leakage

  • Unintended exposure of sensitive content: Training data can include hardcoded credentials, personally identifiable information (PII), proprietary source code, private medical, and legal documents. Studies have shown that large models like GPT-2 and GPT-3 can reproduce verbatim sequences from their training sets when prompted correctly (Carlini et al., 2021).

Inference-Time Risks: Prompt Injection and Data Exfiltration

  • Prompt injection attacks allow adversaries to manipulate a model’s behavior by embedding malicious instructions into user inputs or third-party content. These attacks can bypass content filters, extract confidential information, or take control of downstream tools integrated with the LLM (e.g., code execution or file access).

  • Data exfiltration through integrations occurred when Google Bard (a precursor to Gemini) had access to user emails and documents via extensions. Attackers used prompt injection techniques to extract private data from Google Docs and Gmail (Embrace The Red, 2023).

  • Data poisoning targets the model’s training data, subtly altering its behavior or introducing backdoors by injecting adversarial examples. This can degrade performance or cause targeted misbehavior when specific trigger inputs are encountered.

These threats are difficult to detect and even harder to mitigate, especially as LLMs are increasingly embedded in tools that process real-time user content such as emails, source code, and internal documentation.

Operational Risks: Prompt Data Sent to Third Parties

  • Transmission of proprietary data to external models: When organizations use third-party LLMs via cloud services, any sensitive or confidential information included in a prompt is sent outside their control. Even if the provider claims not to store or use the data, there’s a risk of unintended retention or future use in model training, particularly if terms of service are vague or subject to change.

Running models on-premise mitigates this risk. Tools like ramalama provide sandboxed execution environments using containers, offering greater control and security for sensitive workloads.

Mitigation Strategies

Protecting against these risks requires a layered security approach:

  • Careful curation and vetting of training data
  • Application of differential privacy techniques (Dwork & Roth, 2014)
  • Strong isolation between system and user context
  • Rigorous prompt validation and input sanitization
  • Fine-grained access control for users and services
  • Use of structured APIs instead of natural language interfaces where feasible

3. Intellectual Property and Licensing

LLMs usually do not cite their training sources. Generated text and code may inadvertently contain copyrighted or GPL-licensed material, especially in longer completions.

This creates a legal gray zone:

  • Who owns the output?
  • Can it be safely used in proprietary software?
  • Is it ethical or legal to deploy AI-generated code in production without auditing its origin?

GitHub Copilot and OpenAI were sued in 2022.

  • DMCA Violations: Did GitHub and OpenAI distribute licensed code without required attribution or license terms?
  • Contract Breaches: Did Defendants violate open-source licenses and GitHub’s own Terms of Service?
  • Unfair Competition: Did Defendants misrepresent licensed code as Copilot’s output and profit unjustly?
  • Privacy Violations: Did GitHub mishandle user data, violate privacy laws, or fail to address a known data breach?
  • Injunctive Relief: Should the court prohibit Defendants from continuing the alleged misconduct?
  • Defenses: Are Defendants protected by any legal defenses or statute of limitations?

4. De-skilling of Developers

LLMs like ChatGPT and GitHub Copilot have boosted productivity, they also introduce a subtle but significant risk: skills atrophy. As junior developers rely on AI to write code, they may skip foundational learning. Meanwhile, senior developers risk disengaging from the deep work of problem solving, debugging, and optimization.

This creates teams that appear to move faster, but often lack the expertise to handle unexpected failure modes. The code may compile, the tests may pass—but the understanding is shallow. Over time, this can lead to a decline in engineering judgment, architectural intuition, and the ability to reason about edge cases.

This phenomenon isn’t just theoretical. It has parallels in cognitive offloading, a well-documented concept in psychology where reliance on external tools (e.g., GPS, cameras) reduces internal skill retention over time (Risko & Gilbert, 2016). In software development, AI-assisted coding shifts mental load away from understanding to completion. This shift can be beneficial in the short term—but dangerous when overused or unexamined.

Some additional discussions (many exist):

Teams must recognize that LLM-assisted development is not a substitute for expertise. Used wisely, these tools accelerate work. Used blindly, they degrade the very skill that makes software resilient.

5. Bias, Misalignment, and Harmful Outputs

LLMs inherit and amplify biases present in their training data—cultural, racial, gender-based, and technical. This can lead to:

These issues aren’t just ethical—they affect tooling adoption, internationalization, and inclusivity in developer ecosystems.

LLMs also suffer from a deeper alignment problem: they’re trained to generate plausible text, not to understand user goals or ensure factual correctness. As a result, they may:

  • Hallucinate false or misleading information
  • Prioritize pleasing the user over truth due to reward-tuned behaviors
  • Misinterpret instructions, becoming evasive or overly helpful in dangerous ways

These failures aren’t malicious—they’re side effects of statistical pattern-matching at scale. But as LLMs are integrated into critical workflows, small misalignments and hidden biases can scale into systemic risks.

6. Environmental Impact

LLMs have a significant environmental footprint, primarily due to their high energy consumption and hardware demands during training and inference.

  • Training LLMs requires massive compute and energy. GPT-3’s training consumed an estimated 1287 MWh, emitting over 500+ tons of CO₂ (Walther, 2024).
  • Data centers often rely on water-based cooling, consuming large amounts of water for temperature regulation
  • Inferencing: Individual queries can be inexpensive, but LLM providers process billions of queries daily—adding up to substantial energy usage and CO₂ emissions unless powered by renewables (Jegham et al., 2025).
  • Lifecycle: GPUs and TPUs carry environmental costs from rare earth mining and chip fabrication. Frequent fine-tuning and retraining increase the footprint.

If you’re deploying LLM-based tools across CI pipelines or developer workflows, you may be multiplying that carbon footprint daily.

7. Job Displacement and Role Changes

While LLMs augment productivity, they also reshape the labor market:

  • Low-level tasks (e.g., boilerplate writing, documentation) are increasingly automated.
  • The demand for high-level system architects may rise, while entry-level developer roles shrink.

This impacts not just hiring but mentorship and career growth. If juniors never write glue code, who becomes the next senior?

AI has negatively impacted other fields, such as Radiology. One study notes, “The worry that AI might displace radiologists in the future had a negative influence on medical students’ consideration of radiology as a career.” This fear has contributed to the current shortage of radiologists (Bin Dahmash et al., 2020). A similar fear in software may drive fewer students to enter the field.

8. Proliferation of “AI Slop”

“AI slop” is a critical term used to describe the low-quality, error-prone, or incoherent output generated by AI systems, particularly LLMs. It’s a growing concern in both technical and cultural discussions around AI’s impact. Ironically, this slop may make it difficult for future LLM model development.

Conclusion

What Can Developers Do?

  1. Audit LLM Output – Treat AI suggestions like Stack Overflow snippets: useful but untrusted.
  2. Invest in Fundamentals – Algorithms, architecture, and debugging still matter.
  3. Advocate for Transparency – Push vendors for training data provenance and licensing clarity.
  4. Measure Impact – Include carbon cost and security review in tool adoption discussions.
  5. Mentor Actively – Help juniors learn with LLMs, not through them.
  6. Use renewable resources – Push for data centers who strive for carbon neutral footprints.

LLMs are reshaping how we write code, learn new tools, and collaborate, but their influence is far from neutral. These systems encode risks alongside their capabilities: security vulnerabilities, skill degradation, data privacy violations, and a growing environmental footprint.

As developers, we must not treat LLMs as magic oracles. We must engage critically, question their outputs, understand their limitations, and resist the urge to automate judgment. These tools can accelerate our work, but only if we remain grounded in the fundamentals of software engineering.

The future of our profession shouldn’t be dictated by convenience, hype, or vendor promises. It should be shaped by thoughtful practitioners who take responsibility for the systems they build, and the tools they choose to use.


Disclaimer

This post was written with assistance from ChatGPT-4o. While useful, the model occasionally hallucinated citations, quotes, or research papers. It was oddly fun to ask about sources it confidently invented, only for it to concede they didn’t exist.

Read more...

Bcachefs, an introduction/exploration

Introduction & background information

NOTE: This content is from an internal talk I gave, thus the reason it may read like a presentation

So what is bcachefs?

bcachefs is a next-generation copy-on-write (COW) filesystem (FS) that aims to provide features similar to Btrfs and ZFS, written by Kent Overstreet

  • Copy-on-write (COW), with goal of performance being better than other COW filesystems
  • Full checksums on everything
  • Mult-device, replication , RAID, Caching, Compression, Encryption
  • Sub-volumes, snapshots
  • Scalable, 50+ TB tested

Why the need for another FS?

According to Kent[1], paraphrased here

Read more...

How-to: Writing a C shared library in rust

The ability to write a C shared library in rust has been around for some time and there is quite a bit of information about the subject available. Some examples:

All this information is great, but what I was looking for was a simple step-by-step example which also discussed memory handling and didn’t delve into the use of GObjects. I also included an opaque data type, but I’m not 100% sure if my approach is the most correct.

Read more...

sshd attack traffic

I firmly believe that security through obscurity is a fail. However, I do believe that all things being equal, making it a bit more obscure is better as long as you aren’t introducing more failure points, like a port knocker that has it’s own security bugs. Thus I’ve always run my sshd service on an alternative port. It’s simple, and keeps my logs clean and shouldn’t cause any additional security risks. Of course I use a secure configuration and keep my software up to date. However, I found out that in the past few weeks that my port of choice has been discovered.

Read more...

Perl6 Rename?

I saw this referenced today on lwn.net

IMHO if you make a language incompatible with previous versions it should be renamed. I’ve thought this many times with the python2 -> python3 change. I suppose some will find this irritating, but I think it would make things less confusing.

Read more...

Python, the perpetual time suck

I used to like Python. Like others I enjoy the productivity it offers and the vast and plentiful libraries that exist. However, over time that fondness has turned to loathing.

The thing that should have been apparent to me long ago is that the Python folks don’t appear to care about end users. They seem to have lost touch with the fact that Python is very popular! Each and every time they make core language behavior changes, API changes, and deprecate things, a lot of code has to accommodate. It’s a non-trivial amount of work to keep Python code working. Especially so if you’re trying to support code that has to run across multiple versions spanning many years. The test matrix just keeps on getting bigger. The code hacks to accommodate versions becoming more and more intrusive.

Read more...

DBUS Server side library wish list

Ideally it would be great if a DBUS server side library provided

  1. Fully implements the functionality needed for common interfaces (Properties, ObjectManager, Introspectable) in a sane and easy way and doesn’t require you to manually supply the interface XML.
  2. Allows you to register a number of objects simultaneously, so if you have circular references etc.  This avoids race conditions on client.
  3. Ability to auto generate signals when object state changes and represent the state of the object separately for each interface on each object.
  4. Freeze/Thaw on objects or the whole model to minimize number of signals, but without requiring the user to manually code stuff up for signals.
  5. Configurable process/thread model and thread safe.
  6. Incrementally and dynamically add/remove an interface to an object without destroying the object and re-creating and while incrementally adding/removing the state as well.
  7. Handle object path construction issues, like having an object that needs an object path to another object that doesn’t yet exist.  This is alleviated of you have #8.
  8. Ability to create one or more objects without actually registering them with the service, so you can handle issues like #7 easier, especially when coupled with #2, directly needed for #2.  Thus you create 1 or more objects and register them together.
  9. Doesn’t require the use of code generating tools.
  10. Allow you to have multiple paths/name spaces which refer to the same objects. This would be useful for services that implement functionality in other services without requiring the clients to change.
  11. Allows you to remove a dbus object from the model while you are processing a method on it.  eg. client is calling a remove method on the object it wants to remove.
Read more...

How-to Stratis storage

Introduction

Stratis (which includes stratisd as well as stratis-cli), provides ZFS/Btrfs-style features by integrating layers of existing technology: Linux’s devicemapper subsystem, and the XFS filesystem. The stratisd daemon manages collections of block devices, and exports a D-Bus API. The stratis-cli provides a command-line tool which itself uses the D-Bus API to communicate with stratisd.

1. Installation

# dnf install stratisd stratis-cli

2. Start the service

# systemctl start stratisd
# systemctl enable stratisd
Created symlink /etc/systemd/system/sysinit.target.wants/stratisd.service →
/usr/lib/systemd/system/stratisd.service.

3. Locate a block device that is empty. You can use something like lsblk and blkid to locate, eg.

 
# lsblk
    NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    sda      8:0    0   28G  0 disk 
    ├─sda1   8:1    0    1G  0 part /boot
    ├─sda2   8:2    0  2.8G  0 part \[SWAP\]
    └─sda3   8:3    0   15G  0 part /
    sdb      8:16   0    1T  0 disk  

# blkid -p /dev/sda
  /dev/sda: PTUUID="b7168b63" PTTYPE="dos"

Not empty

Read more...

Debugging gobject reference count problems

Debugging gobject reference leaks can be difficult, very difficult according to the official documentation. If you google this subject you will find numerous hits.  A tool called RefDbg was created to address this specific need. It however appears to have lost effectiveness because (taken from the docs):

Beginning with glib 2.6 the default build does not allow functions to
be overridden using LD_PRELOAD (for performance reasons).  This is the
method that refdbg uses to intercept calls to the GObject reference
count related functions and therefore refdbg will not work.  The
solution is to build glib 2.6.x with the '--disable-visibility'
configure flag or use an older version of glib (<= 2.4.x).  Its often
helpful to build a debug version of glib anyways to get more useful
stack back traces.

I actually tried the tool and got the error “LD_PRELOAD function override not working. Need to build glib with –disable-visibility?” , so I kept looking. Another post lead me to investigate using systemtap which seemed promising, so I looked into it further.  This approach eventually got me the information I needed to find and fix reference count problems.  I’m outlining what I did in hopes that it will be beneficial to others as well.  For the record, I used this approach on storaged, but it should work for any code base that uses gobjects. The overall approach is:

Read more...

Security considerations with github continuous integration

Continuous integration  (CI) support in github is a very useful addition.   Not only can you utilize existing services like Travis CI, you can utilize the github API and roll your own, which is exactly what we did for libStorageMgmt. LibStorageMgmt needs to run tests for hardware specific plugins, so we created our own tooling to hook up github and our hardware which is geographically located across the US.  However, shortly after getting all this in place and working it became pretty obvious that we provided a nice attack vector…

Read more...