Oct 31, 2025·6 min read

Open-source license audit for inherited repos: a practical plan

An open-source license audit helps you spot risky dependency licenses and missing notices early, so customers and partners do not raise compliance issues later.

Why inherited repos create license risk

When a repo changes hands, you inherit more than code. You inherit every license choice, every dependency, and every missing notice the previous team skipped. If the project started as a quick prototype (or was generated by an AI coding tool), licensing is often treated as “later” until someone outside the team asks hard questions.

The failure modes are usually simple, but expensive:

  • A dependency sits under a license a customer won’t accept.
  • Required license texts or copyright notices aren’t included in what you ship.
  • The build pulls in transitive packages nobody reviewed.
  • A copied snippet, font, icon, or asset has terms that don’t match your intended use.

Customers and partners ask for proof because they’re protecting themselves. If they ship your software, embed your SDK, or resell your product, they can become responsible for compliance too. Enterprise procurement teams commonly request a Software Bill of Materials (SBOM), a third-party notices file, and a clear statement of how you meet license obligations. If you can’t produce those, deals slow down, reviews get stricter, and legal teams may block rollout.

A common mistake is to treat repo ownership as a grant of license rights. Owning the GitHub org (or having a contractor hand you the code) doesn’t automatically grant permission to use, modify, or distribute everything inside it. Open-source licenses are permissions with conditions. You want to know which conditions apply before you ship, especially if you distribute binaries, sell a paid SaaS, or give partners a copy.

The practical goal of an audit is straightforward: find what’s actually included, spot gaps early, and produce clear compliance artifacts. It won’t replace legal advice, and it won’t resolve every edge case, but it can surface issues while they’re still cheap to fix.

A license is the rulebook for how you can use someone else’s code. When you inherit a repo, you also inherit its rules and paperwork. A useful first pass is to separate “low-friction licenses” from “share-alike licenses,” then confirm you’ve met the basic obligations.

Permissive licenses (like MIT and Apache-2.0) usually allow use in almost any product, including commercial ones. The tradeoff is administrative: keep copyright notices, include the license text when required, and don’t claim you wrote the original work.

Copyleft licenses (like GPL) also allow use, but they can require that if you distribute a product that includes the code, you also share the source of the combined work under the same terms. AGPL is similar, but it can extend the “share the source” trigger to software offered over a network, which is why SaaS teams often avoid it.

In practice, “license obligations” often means unglamorous work:

  • Include required license texts and copyright notices with your distribution
  • Provide attribution in a THIRD-PARTY-NOTICES (or similar) file
  • Track modifications when the license expects it
  • For copyleft, be prepared to share source when required
  • Avoid using trademarks or names in ways the license forbids

Dual licensing surprises teams because the same project can be offered under two different sets of rules. A library might be “GPL for open-source projects” and “commercial license for closed-source products,” or it might contain files under different licenses. If someone pulled code from the wrong place, you can end up with stricter terms than you expected.

You don’t need to memorize every license detail. You do need to know which licenses are easy to comply with, which ones change your distribution duties, and which ones can block a deal when a customer’s compliance team spots them.
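As a rough illustration of that first pass, the sketch below sorts SPDX identifiers into coarse buckets. The bucket contents and the function name are assumptions chosen for this example, not a legal classification; adjust them to whatever your counsel and customers accept.

```python
# Illustrative buckets only. They mirror the families described above and are
# not a legal determination; extend or change them to match your own policy.
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"}
COPYLEFT = {"GPL-2.0-only", "GPL-2.0-or-later", "GPL-3.0-only", "GPL-3.0-or-later"}
NETWORK_COPYLEFT = {"AGPL-3.0-only", "AGPL-3.0-or-later"}


def classify_license(spdx_id):
    """Return a coarse family name for a single SPDX identifier."""
    if not spdx_id or not spdx_id.strip():
        return "unknown"
    ident = spdx_id.strip()
    if ident in PERMISSIVE:
        return "permissive"
    if ident in COPYLEFT:
        return "copyleft"
    if ident in NETWORK_COPYLEFT:
        return "network-copyleft"
    # Anything not explicitly listed goes to a human instead of being guessed.
    return "needs-review"
```

Sending everything unrecognized to "needs-review" is deliberate: a short list you trust is more useful in a customer review than a long list of guesses.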

Where licenses hide in a modern repo

Most license issues in inherited code aren’t in the one dependency you remember adding. They show up in places people don’t review until a customer asks for proof.

Start with direct dependencies (packages listed in files like package.json, requirements.txt, go.mod, Gemfile). The bigger surprise is transitive dependencies: packages your packages pull in. One direct library can bring dozens of transitive licenses, and those can change after a minor version bump.

Frontend repos add another hiding place: bundled assets. Build tools often roll code into a single minified file, and that process can strip headers or notices that would normally travel with source code. Copied snippets are similar. A helper function pasted from a blog post, a gist, or a Stack Overflow answer can still carry license requirements.
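One quick way to see whether a bundle kept any notices is to look for the comment styles that minifiers conventionally preserve: block comments that start with `/*!` or that contain `@license` or `@preserve`. The sketch below is a heuristic under that assumption; it uses a regular expression rather than a JavaScript parser, so it can be fooled by comment-like text inside strings, and the function name is illustrative.

```python
import re

# Any /* ... */ block comment. DOTALL lets a comment span multiple lines.
_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def preserved_license_comments(bundle_text):
    """Return block comments that look like preserved license banners."""
    kept = []
    for match in _BLOCK_COMMENT.finditer(bundle_text):
        comment = match.group(0)
        if (
            comment.startswith("/*!")
            or "@license" in comment
            or "@preserve" in comment
        ):
            kept.append(comment)
    return kept
```

An empty result for a bundle that clearly contains third-party libraries suggests the notices were stripped and need to ship some other way, such as a separate notices file.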

Also check what ships around your app, not just your app. Containers and base images can include OS packages (and their licenses) that end up in production. If you inherited a Dockerfile, you inherited its licensing baggage too.
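For a Debian- or Ubuntu-based image, one way to see which OS packages ended up in the container is to read the package database. The sketch below assumes you have copied `/var/lib/dpkg/status` (the standard dpkg location) out of the image and passes its text to a hypothetical helper; Alpine and RPM-based images use different formats and are not covered.

```python
def parse_dpkg_status(text):
    """Return sorted (package, version) pairs from dpkg status file text."""
    packages = []
    # Stanzas are separated by blank lines; each holds "Field: value" lines.
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Lines starting with whitespace continue the previous field.
            if line.startswith((" ", "\t")) or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if "Package" in fields and "Version" in fields:
            packages.append((fields["Package"], fields["Version"]))
    return sorted(packages)
```

On Debian-based systems the license information for each package usually lives in `/usr/share/doc/<package>/copyright`, which is where a reviewer would look next.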

Common hiding spots:

  • Dependency manifests and lockfiles (direct and transitive)
  • Vendor folders, copied libraries, and “utils” directories
  • Built frontend bundles and embedded fonts/icons
  • Container base images and installed OS packages
  • “Mystery code” added quickly, including AI-generated code that may mirror licensed sources

Red flags you can spot in 10 minutes

You don’t need a full audit to spot likely trouble. A quick pass can tell you whether a repo might fail a customer or partner review.

Look at dependency manifests and lockfiles first. Hundreds of dependencies with no lockfile (or multiple lockfiles that don’t match) is a signal you won’t be able to prove what shipped. Watch for private registries or Git-based dependencies pulled directly from repos. Those often lack clear license metadata.

Next, check for build artifacts committed to the repo. Folders like dist, build, vendor, third_party, or copied minified bundles are common sources of missing notices. If code is copied in rather than installed through a package manager, you may have no automatic way to gather license texts.

If the repo is a monorepo (for example, packages/* or libs/*), open a few internal packages. Missing LICENSE files or unclear ownership notes can become a mess when packages are published separately.

Finally, check what’s actually delivered. If there’s no third-party notice file (often THIRD_PARTY_NOTICES or NOTICE) in releases, installers, or container images, that’s a common compliance gap.

A fast triage:

  • Lockfiles missing or inconsistent with the build
  • Git-based or “unknown license” dependencies
  • Committed vendor or bundled third-party code
  • No notices file anywhere in the release process

If more than one of these shows up, plan a deeper review before someone else finds it first.
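The triage above can be roughed out as a small script. This is a minimal sketch that assumes a Node project with `package.json` at the repo root; the directory names, notice file prefixes, and the `triage` function itself are illustrative choices, and it only inspects the top level of the repo.

```python
import json
from pathlib import Path

NODE_LOCKFILES = ("package-lock.json", "yarn.lock", "pnpm-lock.yaml")
COPIED_CODE_DIRS = ("dist", "build", "vendor", "third_party")
NOTICE_PREFIXES = ("THIRD_PARTY_NOTICES", "THIRD-PARTY-NOTICES", "NOTICE")


def triage(repo_root):
    """Return a list of human-readable red flags found at the repo root."""
    root = Path(repo_root)
    findings = []

    manifest = root / "package.json"
    if manifest.is_file():
        present = [n for n in NODE_LOCKFILES if (root / n).is_file()]
        if not present:
            findings.append("lockfile missing")
        elif len(present) > 1:
            findings.append("multiple lockfiles: " + ", ".join(present))

        data = json.loads(manifest.read_text(encoding="utf-8"))
        for section in ("dependencies", "devDependencies"):
            for name, spec in data.get(section, {}).items():
                # Catches git://, git+https://, git+ssh:// and github: specs.
                # Bare "user/repo" shorthand is not detected by this sketch.
                if isinstance(spec, str) and spec.startswith("git"):
                    findings.append("git-based dependency: " + name)

    for dirname in COPIED_CODE_DIRS:
        if (root / dirname).is_dir():
            findings.append("committed directory: " + dirname)

    has_notice = any(
        p.is_file() and p.name.upper().startswith(NOTICE_PREFIXES)
        for p in root.iterdir()
    )
    if not has_notice:
        findings.append("no notices file at repo root")
    return findings
```

A clean result here does not prove compliance. It only means the fastest checks found nothing, so the deeper review can start from a known build.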

Step-by-step: run an open-source license audit

License audits go smoother when you treat them like inventory first, paperwork second. Start by identifying what you actually ship, because obligations often depend on distribution.

1) Build a clean inventory of what’s delivered

List every deliverable that leaves your control: a web bundle, a mobile app, a desktop installer, a container image, an on-prem package, even a ZIP you send to a partner.

Then collect the files that define your dependency graph. Versions matter.

  • Build manifests like package.json, pyproject.toml, requirements.txt, Gemfile, go.mod
  • Lockfiles like package-lock.json, yarn.lock, pnpm-lock.yaml, poetry.lock, Gemfile.lock, Cargo.lock, composer.lock (for Go, go.mod records the versions and go.sum holds the matching checksums)
  • Container and deployment files (Dockerfiles, Helm charts, build pipelines)
  • Vendor folders or copied code (vendor/, third_party/, sometimes libs/)
  • App store metadata or “About” screens (common places to display notices)

2) Scan, confirm, and record obligations

A practical workflow most teams can follow:

  1. Generate a dependency list (direct and transitive) from lockfiles and build output.
  2. Identify the license for each dependency version. Flag anything “unknown” or “custom.”
  3. Cross-check license data across sources (registry metadata, repo LICENSE, file headers).
  4. Record the obligations you must satisfy (attribution, license text inclusion, notice placement, source-sharing triggers).
  5. Decide what’s acceptable for how you deliver software (SaaS-only vs shipping binaries or on-prem packages).

A desktop app usually needs third-party notices bundled with the installer. SaaS-only delivery may trigger fewer distribution obligations, but network copyleft licenses like AGPL can still apply, and customers and partners still expect clean records.

Inherited repos often come with missing lockfiles and copied code. If you can’t reproduce the dependency set, fix that first. Otherwise every license decision is shaky.
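As one example of steps 1 and 2, the sketch below builds an inventory from an npm lockfile and flags entries that need a human look. It assumes a `package-lock.json` with lockfileVersion 2 or 3, where each entry under `packages` may carry a `license` field copied from the package's own metadata. That field is self-reported, so it still needs the cross-check in step 3. The function names and row layout are illustrative.

```python
def inventory_from_npm_lock(lock):
    """Build inventory rows from a parsed package-lock.json (v2 or v3)."""
    rows = []
    for path, entry in lock.get("packages", {}).items():
        if not path:
            continue  # the "" key is the root project itself
        # "node_modules/a/node_modules/@s/b" -> "@s/b". Workspace paths
        # without "node_modules/" are kept as-is.
        name = path.rsplit("node_modules/", 1)[-1]
        rows.append({
            "name": name,
            "version": entry.get("version", "unknown"),
            "license": str(entry.get("license") or "unknown"),
            "dev_only": bool(entry.get("dev", False)),
        })
    return sorted(rows, key=lambda r: (r["name"], r["version"]))


def needs_review(rows):
    """Return rows whose license value cannot be accepted at face value."""
    flagged = []
    for row in rows:
        value = row["license"].strip()
        if (
            value.lower() in {"unknown", "custom"}
            or value.upper().startswith("SEE LICENSE")  # npm pointer to a file
            or value.upper() == "UNLICENSED"  # npm marker for no granted rights
        ):
            flagged.append(row)
    return flagged
```

To use it, load the lockfile with `json.load` and pass the resulting dictionary in. The `dev_only` flag helps when you later decide what is in scope for the shipped artifact.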

Common mistakes that trigger compliance problems

Most compliance issues aren’t caused by bad intent. They happen because teams inherit a repo, ship fast, and never verify what’s inside.

Problems that show up repeatedly:

  • Metadata doesn’t match reality. A README claims “MIT,” but package metadata is wrong or missing an SPDX identifier.
  • Copied snippets without attribution. “Just a helper function” can still carry license conditions.
  • Strong copyleft in the wrong place. GPL/AGPL dependencies can create obligations that don’t match your sales or hosting model.
  • Non-code assets get ignored. Fonts, icons, images, and UI kits often have separate licenses.
  • Someone relied on hearsay. A blog post or comment isn’t a license. The license text and the dependency’s metadata are what matter.

Even if your direct dependency is permissive, a transitive dependency pulled in quietly might not be. Spot-checking only top-level packages misses the issues customers tend to find.
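When a transitive package gets flagged, the next question is usually which direct dependency pulled it in. Here is a minimal sketch using a breadth-first search over a dependency graph; the graph shape (a dict mapping each package name to the names it depends on) and the function name are assumptions for illustration.

```python
from collections import deque


def path_to(graph, direct_deps, target):
    """Return the shortest chain from a direct dependency to target, or None."""
    queue = deque([name] for name in direct_deps)
    seen = set(direct_deps)
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for child in graph.get(node, ()):
            if child not in seen:  # guards against dependency cycles
                seen.add(child)
                queue.append(path + [child])
    return None
```

The returned chain tells you which top-level package to replace or upgrade, and it is also a useful line to include when you document the finding.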

A common scenario: you inherit a prototype, see a single LICENSE file, and assume the whole repo is covered. Later, an enterprise customer asks for third-party notices and you discover bundled assets with their own terms and no attribution anywhere.

How to fix findings: notices, replacements, and documentation

After an audit, the goal isn’t perfection. It’s clarity: what you use, what you must ship with it, and what you must not do.

Start by creating (or updating) a THIRD-PARTY-NOTICES file. Keep it boring and complete. For each dependency, include the name, version, license, and where it appears (web app, server, mobile, container image). This is often the first thing partners look for.

Next, add required license texts. If a license says the text or copyright notice must be included with distributions, collect those texts in a licenses/ folder and reference them from THIRD-PARTY-NOTICES.
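A notices file along those lines can be generated from the inventory rather than written by hand. The sketch below assumes each component record already carries a name, version, license, a list of places it ships, and the filename of its license text under `licenses/`; the field names and output layout are illustrative, not a required format.

```python
def render_notices(components):
    """Render a plain-text notices file from component records."""
    lines = ["THIRD-PARTY NOTICES", ""]
    ordered = sorted(components, key=lambda c: (c["name"].lower(), c["version"]))
    for c in ordered:
        lines.append(f'{c["name"]} {c["version"]}')
        lines.append(f'  License: {c["license"]}')
        lines.append(f'  Used in: {", ".join(c["used_in"])}')
        # Points at the full text collected in the licenses/ folder.
        lines.append(f'  License text: licenses/{c["license_file"]}')
        lines.append("")
    return "\n".join(lines)
```

Generating the file during the release build keeps it in step with the lockfile, which is harder to guarantee with a hand-edited document.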

If you find a high-risk license for your business model (for example, strong copyleft in a library linked into a closed-source product), you generally have three paths: replace it, isolate it behind a service boundary, or obtain a commercial license. The right choice depends on how the component is used, not just the license label.

To make this repeatable, add a short COMPLIANCE.md that explains how you generate the inventory and where the paperwork lives. Keep it practical: which lockfiles are in scope, what you exclude (dev-only tools, test frameworks) and why, and which commands reproduce the scan.

Finally, set a lightweight intake rule for new dependencies: don’t merge a new package until its license is known (with a real SPDX identifier where possible) and notice updates are handled when required.
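An intake rule like this can be enforced with a small check in CI. The sketch below assumes an inventory of rows with `name`, `version`, and `license` keys; the allowed list is a hypothetical policy that should be replaced with whatever fits your business model.

```python
# Hypothetical policy for illustration; replace with your own approved list.
ALLOWED_LICENSES = frozenset({"MIT", "Apache-2.0", "BSD-3-Clause", "ISC"})


def check_intake(rows, allowed=ALLOWED_LICENSES):
    """Return one message per dependency that fails the intake rule."""
    violations = []
    for row in rows:
        label = f'{row["name"]}@{row["version"]}'
        license_id = (row.get("license") or "").strip()
        if not license_id:
            violations.append(f"{label}: license unknown")
        elif license_id not in allowed:
            violations.append(f"{label}: {license_id} is not on the allowed list")
    return violations
```

In a pipeline, print the returned messages and exit with a non-zero status when the list is not empty, so the merge is blocked until someone resolves the license question.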

What partners usually want to see in a compliance packet

Most partners aren’t asking for a legal essay. They want proof that you know what third-party software is inside your product, what rules apply, and who keeps it current.

A good packet usually starts with a short summary: which license families appear in what you ship (permissive like MIT/Apache, weak copyleft like LGPL, strong copyleft like GPL/AGPL), plus scan scope and date.

Then comes evidence: a dependency inventory someone else can verify, with names, exact versions, detected licenses, and where each item was found (lockfile, vendor directory, container layer, build output).

You’ll almost always need actual texts and attributions, not just labels.

Practical minimum to include

  • A short license summary for the shipped product
  • A dependency list with names, versions, and licenses
  • Third-party notices (required attribution and full license texts where required)
  • Notes on exceptions (commercial licenses, written approvals)
  • A named owner and a simple review cadence

One detail that reduces back-and-forth: call out gray areas upfront. Examples: “Package X had no license file, so we replaced it,” or “A GPL dev tool was removed from the production image and is excluded from the SBOM.”
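The summary at the top of the packet can be derived from the same inventory. This is a sketch under assumed field names (`license`, plus an optional `note` for exceptions); the output structure is illustrative and not a standard format.

```python
from collections import Counter
from datetime import date


def packet_summary(rows, scope, scan_date):
    """Summarize an inventory for the first page of a compliance packet."""
    counts = Counter(row["license"] for row in rows)
    return {
        "scan_date": scan_date.isoformat(),
        "scope": scope,
        "component_count": len(rows),
        # License identifier -> number of components, sorted for stable diffs.
        "licenses": dict(sorted(counts.items())),
        # Gray areas called out upfront, taken from optional "note" fields.
        "exceptions": [f'{r["name"]}: {r["note"]}' for r in rows if r.get("note")],
    }
```

Passing the scan date in explicitly, instead of reading the clock inside the function, makes the summary reproducible for a given release.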

Example scenario: inherited prototype, enterprise customer review

A startup buys a working prototype from a contractor and pushes it into production fast. Demos look good, so the team focuses on features and sales. Three months later, an enterprise customer starts a security and procurement review and asks a simple question: “Can you provide a list of all third-party software and licenses?”

The team runs an audit and finds a copyleft dependency (GPL/AGPL) in the core workflow. It saved time early on, but the terms don’t match how the startup plans to sell and distribute the product.

They pause the deal and do three things in parallel: confirm the exact dependency/version and where it’s used, replace it with a permissive alternative and refactor the integration, and clean up the paperwork so the customer sees a complete, consistent story.

What they send back is short and specific:

  • A dependency list with names, versions, and licenses
  • A third-party notices file matching what ships
  • A note documenting the removed dependency and the replacement
  • A record of scan date and tool settings

After the deal closes, they add a license check to CI, require notice updates before releases, and maintain an “allowed licenses” list that matches their business model.

Quick checklist and next steps

Treat license work like a release requirement, not a one-time scramble. Before customer conversations get serious, you should be able to answer: “What exactly are we shipping, and under what licenses?”

A pre-release baseline:

  • Complete inventory of what you ship (including transitive dependencies and bundled assets)
  • Pinned versions and known licenses (no “unknown,” “custom,” or “see README” left unresolved)
  • Notices ready to ship (third-party notices, required copyright statements, required license texts)
  • Reproducible builds (lockfiles committed and actually used)
  • A clear owner for ongoing updates

If you have a release in the next 1 to 2 weeks, prioritize notices and repeatable builds first. That’s what partners can verify quickly. If you have more time, add a dependency intake rule so licensing isn’t rediscovered during procurement.

If the repo is inherited or AI-generated, expect gaps: missing lockfiles, unused packages, copied code without headers, and build artifacts that bundle third-party files. When the codebase is already unstable, it’s often faster to fix the build and inventory first, then do the paperwork.

Getting help when the repo is messy or AI-generated

When a repository is inherited and half-working, license work can feel like guesswork. It gets harder when code was produced quickly by AI tools, because dependencies may be added without much review, snippets may lack headers, and shipped artifacts can include third-party files nobody remembers.

It helps to split the work into two tracks:

  • Legal decisions: interpreting obligations (for example, whether your distribution triggers copyleft terms, or how to handle past releases that missed notices).
  • Engineering cleanup: inventory, automation, and fixing the repo so you can repeat the process every release.

A practical pattern is that engineers produce the facts (dependency list, licenses, notices), and counsel reviews edge cases and approves wording.

If you need hands-on help to turn a broken, inherited prototype into something you can ship and defend during reviews, FixMyMess (fixmymess.ai) focuses on diagnosing and repairing AI-generated or inherited codebases, including refactoring, security hardening, and deployment preparation. A clean, stable build is often the fastest path to a trustworthy dependency inventory and a notices packet partners will accept.

Before asking anyone for help, gather a few basics so the work stays precise: repo access (or a sanitized export), your current build and release steps, a sample shipped artifact (image/zip/installer/bundle), and any customer questionnaires you’ve already received.

FAQ

What’s the first thing I should do when I inherit a repo and worry about licenses?

Start by assuming you need evidence, not assumptions. Identify what you actually ship (web bundle, container image, installer), then generate an inventory from the exact lockfiles and build output that produced that shipment. If you can’t reproduce the build, fix that first so the rest of the audit isn’t guesswork.

Why do customers ask for an SBOM, and what is it really?

An SBOM is a machine-readable inventory of the components in your product, including transitive dependencies and versions. Procurement teams ask for it because it helps them verify what’s inside your deliverable and whether any licenses or vulnerabilities create obligations for them too. If your SBOM doesn’t match what you ship, it can stall security and legal review.

Do I really need a THIRD-PARTY-NOTICES file if we’re using mostly MIT/Apache packages?

Yes. MIT and Apache-2.0 both require that the copyright notice and license text travel with the copies you distribute. A THIRD-PARTY-NOTICES file is the human-facing place where you list third-party components and provide required attributions, and it often points to the full license texts bundled with your release. The goal is simple: someone can look at what you shipped and see you met the obligations.

How do lockfiles affect license compliance?

Lockfiles pin the exact versions you built with, and license obligations attach to those specific versions because a package’s license can change between releases. Without them, you can’t reliably prove what was in a given release, and transitive dependencies can drift between builds. A missing or unused lockfile is one of the fastest ways to fail an enterprise compliance questionnaire.

If we’re SaaS-only and don’t ship binaries, can we ignore licenses?

No. SaaS changes some obligations because you may not be distributing binaries, but customers and partners still expect a clear inventory and proof you’ve met attribution requirements. Network-focused copyleft licenses like AGPL can also create source-sharing triggers even for hosted software, so you still need to know what you’re running.

Where do license problems usually hide in a modern codebase?

They commonly hide in transitive dependencies, copied code in vendor/ or “utils” folders, bundled frontend output that strips headers, and non-code assets like fonts or icons. Containers add another layer: your base image and OS packages may introduce their own licenses into what you deploy. An audit should check the shipped artifact, not just the top-level manifest.

How does AI-generated or prototype code change the license risk?

Treat it as a real risk until you can verify sources. AI tools can introduce dependencies quickly and may produce code that resembles licensed examples or snippets, and prototypes often ship with missing notices and inconsistent build state. The practical fix is to stabilize the build, generate a clean inventory from what actually ships, and then backfill notices and replacements as needed.

What should I do if I discover GPL or AGPL in a critical dependency?

Confirm the exact component, version, and how it’s used in the shipped product before reacting. If it’s in production code, the usual options are to replace it with a permissive alternative, isolate the functionality so it’s not part of a distributed combined work, or obtain a commercial license where available. Don’t rely on “we didn’t mean to” arguments; make the dependency story match your business model.

What should a “compliance packet” include for an enterprise review?

A good packet is short, consistent, and matches the release artifact. It typically includes an SBOM or dependency inventory with exact versions, the third-party notices and required license texts, and a clear scope statement explaining what was scanned and when. The fastest way to reduce back-and-forth is to document any exceptions or removals and show that your build is reproducible.

When should I get outside help, and what can FixMyMess do in this situation?

If the repo is messy, half-working, or AI-generated, you’ll often need engineering cleanup before you can produce trustworthy compliance artifacts. FixMyMess helps teams stabilize inherited codebases, repair build and dependency issues, and get you to a repeatable release process so you can generate an accurate inventory and notices that match what ships. If you’re under a deadline, bringing in help can be faster than trying to untangle broken builds while procurement is waiting.