Old XKCD, still relevant

LainTrain@lemmy.dbzer0.com · edit-2 16 hours ago

It’s complicated.

I know Stable Diffusion best, so I’ll speak to that. They used the LAION-5B dataset, which is, in practice, freely available to download and use:

https://www.kaggle.com/code/vitaliykinakh/guie-laion-5b-collect-and-download

https://github.com/opendatalab/laion5b-downloader

It’s also on Hugging Face, but that copy is currently unavailable:

https://huggingface.co/datasets/danielz01/laion-5b

But you can use this smaller, newer version:

https://huggingface.co/datasets/laion/relaion2B-en-research

Whether it’s appropriately licensed is an unsolved question though.

The dataset itself, and the text portion of the text-image pairs needed for training, is CC-BY-SA; the newer versions linked above are CC-BY-4.0. https://creativecommons.org/licenses/by/4.0/deed.en

The images, however, are technically under their own copyright, which in practice means each of the billions of images may or may not have a licence that implicitly or explicitly forbids AI training use, or forbids it only for commercial use.

Whether such a licence is legally binding is at present unknown, though, since licences primarily deal with reproduction. Pro-AI folks argue that training isn’t reproduction: training NNs is more akin to viewing an image and memorising the patterns and relationships within it, like a person viewing it.

That would make it non-infringing and therefore the model itself libre. In that case Mistral and LLaMA are also libre, as long as the model itself is open source, which in this case really means “open weights”: so not GPT or anything else by “““OpenAI”””.

Weights are, essentially, the result of a model being trained. They’re the key bit that makes or breaks the model and determines how it works. Given the weights, and knowing the structure of the model and the framework used, you can refine, modify and distribute it.
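To make the “structure + weights” point concrete, here’s a toy sketch in plain Python, assuming a made-up one-neuron “architecture” y = w·x + b. The `train`/`predict` helpers and the JSON file format are hypothetical stand-ins; real open-weight models ship billions of numbers (often as safetensors files), but the principle is the same: whoever has the structure and the weights can keep training them.

```python
import json

# Toy "architecture" (hypothetical): y = w * x + b.
# Real models have billions of weights, but structure + learned numbers = model.
def predict(weights, x):
    return weights["w"] * x + weights["b"]

def train(weights, data, lr=0.01, epochs=2000):
    """Plain gradient descent on mean squared error; returns new weights."""
    w, b = weights["w"], weights["b"]
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return {"w": w, "b": b}

# 1. "Pretraining": learn y = 2x + 1 from scratch.
base = train({"w": 0.0, "b": 0.0}, [(x, 2 * x + 1) for x in range(5)])

# 2. "Open weights": distributing the model is just shipping these numbers.
blob = json.dumps(base)

# 3. Anyone who knows the structure can load and refine them ("fine-tuning"),
#    here nudging the model towards y = 2x + 3.
loaded = json.loads(blob)
tuned = train(loaded, [(x, 2 * x + 3) for x in range(5)])
```

Note that step 3 never needs the original training data, only the weights and the structure, which is exactly why open weights matter.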

Those against AI will say that it’s more akin to file compression and that, in one form or another, it’s misuse. That would make the model an infringing derivative work, and as such not libre even if the model weights are open source.

In a way, though, you could argue that my vaguely memorising the imagery of a dude dressed in white holding a laser sword is just a lossy compressed copy of the copyrighted work of Star Wars. It’d be absurd to call that a violation; infringement only occurs if I commercially reproduce a substantially similar work from that memory.

If I use Krita to draw a beautiful landscape that was informed and inspired, at least in part, by a movie I saw, is that copyright infringement or not? What if I use AI?

Well, current laws don’t say. We measure infringement by substantial similarity; the provenance of information only comes in later (e.g. to rule out accidental similarity).

That’s also my own personal stance on the legal side of things, so up to you how you see it.

LainTrain@lemmy.dbzer0.com · 2 days ago

Based db0 as always.

LainTrain@lemmy.dbzer0.com · edit-2 17 hours ago

He’s using Windows.

But while we’re on the subject, ~/.local/share is cancer and shouldn’t exist.

The appropriate path is /usr/share.

EDIT: Okay, to be clear, I mean that anything that could be global should go into /usr/share, which would massively save space and effort if another user needs the same stuff.

Anything that doesn’t need to be global doesn’t need to go into /usr/share, but somewhere else in ~/.
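For context on why both locations exist: under the freedesktop.org XDG Base Directory spec, apps look for data in the per-user directory first ($XDG_DATA_HOME, defaulting to ~/.local/share), then in the shared system directories ($XDG_DATA_DIRS, defaulting to /usr/local/share/:/usr/share/), ignoring relative paths. A minimal sketch of that lookup order, where the function names are hypothetical and not from any library:

```python
import os

def xdg_data_search_path(env=None):
    """Data dirs in lookup order: per-user dir first, then system-wide dirs."""
    env = os.environ if env is None else env
    home = env.get("HOME", os.path.expanduser("~"))
    # Unset, empty or relative XDG_DATA_HOME falls back to ~/.local/share.
    user_dir = env.get("XDG_DATA_HOME", "")
    if not os.path.isabs(user_dir):
        user_dir = os.path.join(home, ".local", "share")
    # Unset or empty XDG_DATA_DIRS falls back to the spec's default.
    system_dirs = env.get("XDG_DATA_DIRS") or "/usr/local/share/:/usr/share/"
    # The spec says relative entries are invalid and should be ignored.
    return [user_dir] + [d for d in system_dirs.split(":") if os.path.isabs(d)]

def find_data_file(relpath, env=None, exists=os.path.exists):
    """First match wins, so a per-user copy shadows the shared one."""
    for base in xdg_data_search_path(env):
        candidate = os.path.join(base, relpath)
        if exists(candidate):
            return candidate
    return None
```

Because the system dirs are always searched, anything installed once under /usr/share is already visible to every user; ~/.local/share only needs to hold what is genuinely per-user or an intentional override.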

The way it is now, my ~/.local is a massive dumping ground of crap: configs, static app resources that should go into /usr/share, entire applications installed with snap or flatpak (which is why I don’t use them), and random config files.

It’s just a nasty mess on my home partition when it in most cases really doesn’t need to be.

Users below rightly pointed out many exceptions, like venvs. I still believe there should be a more correct place for them to go (e.g. ~/.venv, ~/.flatpak), but obviously they shouldn’t go into /usr/share willy-nilly.

I have removed the sass below because I should’ve been more comprehensive in my criticism before resorting to ad hominem.

LainTrain@lemmy.dbzer0.com · edit-2 2 days ago

Mistral? Deepseek?

Not an LLM, but there’s also SD (Stable Diffusion), which uses a very popular free dataset.

LainTrain@lemmy.dbzer0.com · edit-2 2 days ago

Lolwut? Public good is self-entitlement? Go read a fucking book. Communists are not pro-copyright, especially not when it only benefits the giant corpos.

Another day, another entitled artoid larping as progressive blocked.

LainTrain@lemmy.dbzer0.com · edit-2 2 days ago

As a socialist I believe intellectual property is a falsehood and technological advancement should be for the public good. Open source LLMs are for the public good.

Given the choice between having open source LLMs and the US Govt banning non-corpo, non-proprietary LLMs and giving a free pass to people like Musk and Altman and Zucc to monopolize, I happily pick the former.

You’re delusional if you think they will pay anyone; the only way Zucc will pay is with a guillotine.

Corpos will make inter-platform deals that simply make all online data licensable for the right price, enriching each other so that you can’t avoid it while still actually being a career creative, while pricing out academic researchers and the public sector so that all the fruits of it stay behind closed R&D doors, free of ethics etc.

Continuing in your role as a useful idiot, you’ll most likely also foot the bill for it via subsidies from your taxes to “develop the AI sector” in some anti-China dick-measuring contest by the US.

You will then be sold this data back through proprietary chatbots on a monthly subscription, and you’d better pay up, because once it gets really good it will become mandatory to use for just about any job, leaving you with no choice.

Or you can support FOSS LLMs.

LainTrain@lemmy.dbzer0.com · 4 days ago

Jesus Christ this is Windows-tier insane computing behaviour from Ubuntu. Fuck Ubuntu.

LainTrain@lemmy.dbzer0.com · 4 days ago

Removed by mod

LainTrain@lemmy.dbzer0.com · 4 days ago

Unsubscribe from politics communities temporarily? Or do you browse by All or something?

LainTrain@lemmy.dbzer0.com · 4 days ago

But those are the “good 'ombres”

LainTrain@lemmy.dbzer0.com · edit-2 4 days ago

Which ones? What for?

Really, the only service so convenient that I can’t help but use it is Cloudflare Tunnels for quick self-hosting, plus their cheap-asf domain registrar.

Still, even as a noobie, I don’t host anything via Cloudflare per se.

Not sure what else you need.

LainTrain@lemmy.dbzer0.com · 4 days ago

No-IP’s DDNS is relatively wonderful.

It’s scammy in a sleazy, Eastern European used-car-dealership sort of way, though: I had to talk to their support once about a double charge, and it felt like Roman from GTA IV was on the other side.

LainTrain@lemmy.dbzer0.com · edit-2 5 days ago

Fuck cloudflare, fuck the corporations and fuck their shitty centralised web, past present and future.

Long live the free and open internet!

LainTrain@lemmy.dbzer0.com · 5 days ago

So you’ve got these people eager to support candidates, but no candidates for them to support.

This is delusional. The majority of voters don’t think about this nonsense at all.

https://www.npr.org/2024/11/21/nx-s1-5198616/2024-presidential-election-results-republican-shift

For example, in Maricopa County, Ariz., home to Phoenix, Harris got roughly 61,000 fewer votes than Biden in 2020. Trump, on the other hand, gained about 56,000, for a 117,000-vote shift in just one county.

https://www.nytimes.com/2025/02/02/us/democrats-ipsos-poll-abortion-lgbt.html

In a broad sense, the poll, which surveyed a representative sample of 2,128 adults nationwide, found that Americans think the Republican Party is more in sync with the mood of the country. The issues that people said mattered most to Republicans were also, for the most part, the issues that mattered to them: immigration, the economy, inflation and taxes.

LainTrain@lemmy.dbzer0.com · edit-2 5 days ago

2025 is deemed the year of layoffs

Wasn’t 2023 the year of layoffs? Didn’t something like half a million people get laid off since 2020, per layoffs.fyi?

LainTrain@lemmy.dbzer0.com · edit-2 5 days ago

The problem is literally no one cares about Palestine. They should, but they don’t.

You chose the worst fucking hill to die on. Some brown people getting exploded in a place most people can’t find on a map, on the other side of the world, is nothing to an average working person compared to their rent, utility or grocery bills going up.

LainTrain@lemmy.dbzer0.com · 5 days ago

Had no clue this was a thing.

Darndest Southern-accent impression: “Mos-caw Idaho, home of the sohv-yeaaaht yu-nyon and huckleberry pie”

LainTrain@lemmy.dbzer0.com · edit-2 5 days ago

Mfer, at least y’all have a proper winter with snow. In the UK it’s just rain and shit. At least we have no draft and a marginally better economy, though.

LainTrain@lemmy.dbzer0.com · 6 days ago

But that’s the thing. I struggled too, so did I have it easier, or try harder? I humbly assume the former most of the time, but the sheer scale in question in this article plus the logistics of actually having to gamble that much make me wonder.

LainTrain@lemmy.dbzer0.com · 6 days ago

Seeding.

Can we archive the articles too? Particularly those pertaining to LGBT topics? Right now, as I understand it, they’ve only put a hold on publishing those.

https://www.commondreams.org/news/trump-cdc

But in the future, retractions don’t seem impossible. I don’t want to duplicate the work if it’s already being done somewhere. Trans people are particularly easy to target; we should be archiving data from the American Academy of Pediatrics too.

LainTrain@lemmy.dbzer0.com · 9 months ago

Old XKCD, still relevant

LainTrain@lemmy.dbzer0.com · 9 months ago

Does anyone know why SteamOS is based on Arch rather than Debian?