@Barack_Embalmer - Lemmy

Barack_Embalmer@lemmy.world · 4 months ago

Everything about the exact timbre of your voice is captured in the waveform that represents it. To the extent that the sampling rate and bit depth are good enough to mimic your actual voice without introducing digital artefacts (something analogous to a pixelated image) that’s all it takes to reproduce any sound with arbitrary precision.

Timbre is the result of having a specific set of frequencies playing simultaneously, a set that is characteristic of the specific shape and material properties of the object vibrating (be it a guitar string, drum skin, or vocal cords).

As for how multiple frequencies can “exist” simultaneously at a single instant in time, you might want to read up on Fourier’s theorem and watch 3Blue1Brown’s brilliant series on differential equations that explores Fourier series https://www.youtube.com/watch?v=spUNpyF58BY
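To make the "several frequencies at once" idea concrete, here is a rough sketch in plain Python. It builds two tones with the same 200 Hz fundamental but different harmonic strengths, then measures how much of each frequency is present using a single-bin discrete Fourier transform. The sample rate, the frequencies, the harmonic amplitudes and the function names are all made up for illustration; real instruments have far messier spectra.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for illustration)
N = 800             # 0.1 s of audio, so 200/400/600 Hz fit a whole number of cycles

def tone(harmonic_amps, fundamental=200.0):
    """Sum sine waves at integer multiples of the fundamental frequency."""
    return [
        sum(a * math.sin(2 * math.pi * fundamental * (k + 1) * n / SAMPLE_RATE)
            for k, a in enumerate(harmonic_amps))
        for n in range(N)
    ]

def amplitude_at(samples, freq):
    """Estimate the amplitude of one frequency component (a single DFT bin)."""
    re = sum(s * math.cos(2 * math.pi * freq * n / SAMPLE_RATE)
             for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
             for n, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)

# Same pitch (200 Hz fundamental), different mix of harmonics = different timbre.
bright = tone([1.0, 0.5, 0.25])
mellow = tone([1.0, 0.1, 0.0])

bright_spectrum = [round(amplitude_at(bright, f), 3) for f in (200, 400, 600)]
mellow_spectrum = [round(amplitude_at(mellow, f), 3) for f in (200, 400, 600)]
print(bright_spectrum)
print(mellow_spectrum)
```

Each tone is still just one list of numbers, one value per instant. The separate frequencies only show up when you look at how the values change over a stretch of time, which is what the Fourier analysis does.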

Barack_Embalmer@lemmy.world · 4 months ago

Yes, digital media, and computers in general, are miracles of science and engineering. Is there some reason digital audio in particular inspires you in this way, as opposed to digital images?

Barack_Embalmer@lemmy.world · 4 months ago

Long list of numbers in sequence. Each represents how far away from equilibrium the speaker cone should be, at each point in time, as it vibrates back and forth.
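As a minimal sketch of what that list looks like, the snippet below generates one cycle of a 1000 Hz sine wave as 16-bit integers, roughly the way uncompressed PCM audio stores it. The sample rate, bit depth and function name are assumptions chosen to keep the numbers readable, not a description of any particular file format.

```python
import math

SAMPLE_RATE = 8000                   # samples per second (assumed)
BIT_DEPTH = 16                       # bits per sample (assumed)
MAX_INT = 2 ** (BIT_DEPTH - 1) - 1   # largest positive 16-bit value, 32767

def sine_samples(freq, n_samples):
    """Return integer samples: cone displacement from equilibrium at each time step."""
    return [
        round(MAX_INT * math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
        for n in range(n_samples)
    ]

# 1000 Hz at 8000 samples/s gives exactly 8 samples per cycle.
samples = sine_samples(1000.0, 8)
print(samples)
```

Zero means the cone is at rest, positive values push it one way and negative values pull it the other. A higher sample rate gives more numbers per second, and a higher bit depth gives finer steps between the possible values.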

Barack_Embalmer@lemmy.world · 5 months ago

I used to use FL Studio, but hated using Windows. I got almost all features (including VSTs) to work in Ubuntu under Wine, but had a problem with WineASIO, which I seemed to require to use the USB sound card properly.

Because of that, I have since switched to a DAW called REAPER, which is built natively for Linux, works flawlessly, and is very nice. There is a program called Yabridge to help run Windows VSTs. I even got more complicated plugins with authentication, like Addictive Drums 2, to work using Wine with no problem.

If you want a fully FOSS solution there is Ardour which is also great but a little less slick than Reaper IMO.

Barack_Embalmer@lemmy.world · 9 months ago

That’s pretty disingenuous - it’s one of the many reasons that comprise a pattern of behavior whereby Microsoft makes Windows worse at each iteration. More bloat, more spying, more locked-down for user “security”. And for what? The dubious benefit of being “compatible” with other shitheel software providers like Adobe who use their monopoly power to stranglehold the corporate and professional media sectors? Toeachizown but IDK how anyone can use Windows by choice. The small amount I have to use it at work is torture enough.

Barack_Embalmer@lemmy.world · 10 months ago

I think it does look pretty cool. I applaud automotive design that dares to be different. Everything nowadays is a giant snarling grille with angry anime eye headlights up front, then a bunch of superfluous sharp creases and fake air vents to add visual elements for the sake of it. Tesla took a boldly minimalist approach with this one.

Before you crucify me, note that I don’t particularly like the vehicle overall - it doesn’t seem to be a design that translates well to mass production, practicality of maintenance, or pedestrian safety. It’s no Alfa 33 Stradale, but I think visual flair isn’t an area you can fault it much.

Rivian has done a good job of embracing EV design features (e.g. lack of need for frontal air intakes) in a more conventional way.

Barack_Embalmer@lemmy.world · 10 months ago

We tend to think of these models as agents or persons with a right to information. They “learn like we do” after all.

This is again a similar philosophical tangent that’s not germane to the issue at hand (albeit an interesting one).

I think you’ll see that if you only feed an LLM art or text from only one artist you will find that most of the output of the LLM is clearly copyright infringement if you tried to use it commercially.

This is not a feasible proposition in any practical sense. LLMs are necessarily trained on VAST datasets that comprise all kinds of text. The only type of network that could be trained on only one artist’s corpus is a tiny pedagogical tool like Karpathy’s minGPT https://github.com/karpathy/minGPT, trained solely on the works of Shakespeare. But this is not a “Large” language model, it’s a teaching exercise for ML students. One artist’s work could never practically train a network that could be considered “Large” in the sense of LLMs. So it’s pointless to dwell on a contrived scenario like that.

In more practical terms, it’s not controversial to state that deep networks with lots of degrees of freedom are capable of overfitting and memorizing training data. However, if they have other additional capabilities besides memorization then this may be considered an acceptable price to pay for those additional capabilities. It’s trivial to demonstrate that chatbots can perform novel tasks, like writing a rap song about Spongebob going to the moon on a rocket powered by ice cream - which is surely not existent in any training data, yet any contemporary chatbot is able to produce.

As far as science and progress, I don’t think that’s hampered by the view that these companies are clearly infringing on copyright.

As an example, one open research question concerns the scaling relationships of network performance as dataset size increases. In this sense, any attempt to restrict the pool of available training data hampers our ability to probe this question. You may decide that this is worth it to prioritize the sanctity of copyright law, but you can’t pretend that it’s not impeding that particular research question.

As far as “it’s on the internet, it’s fair game”. I don’t agree. In Western countries your works are still protected by copyright. Most of us do give away those rights when we post on most platforms, but only to one entity, not anyone/any company who can read or has internet access.

I wasn’t making a claim about law, but about ethics. I believe it should be fair game, perhaps not for private profiteering, but for research. Also this says nothing of adversary nations that don’t respect our copyright principles, but that’s a whole can of worms.

We can’t just give up all our works and all our ideas to a handful of companies to copy for profit just because they can read and view them and feed them en masse into their expensive emulating machines.

As already stated, that’s where I was in agreement with you - It SHOULDN’T be given up to a handful of companies. But instead it SHOULD be given up to public research institutes for the furtherance of science. And whatever you don’t want to be included you should refrain from posting. (Or perhaps, if this research were undertaken according to transparent FOSS principles, the curated datasets would be public and open, and you could submit the relevant GDPR requests to get your personal information expunged if you wanted.)

Your whole response is framed in terms of LLMs being purely a product for commercial entities, who shadily exaggerate the learning capabilities of their systems, and couches the topic as a “people vs. corpos” battle. But web-scraped datasets (such as ImageNet) have been powering deep learning research for over a decade, long before AI captured the public imagination the way it has currently, and long before it became a big money spinner. This view neglects that language modelling, image recognition, speech transcription, etc. are also ongoing fields of academic research. Instead of vainly trying to cram the cat back into the bag, and throttling research, we should be embracing the use of publicly available data, with legislation that ensures it’s used for public benefit.

Barack_Embalmer@lemmy.world · 10 months ago

The advances in LLMs and Diffusion models over the past couple of years are remarkable technological achievements that should be celebrated. We shouldn’t be stifling scientific progress in the name of protecting intellectual property; we should be keen to develop the next generation of systems that mitigate hallucination and achieve new capabilities, such as is proposed in Yann LeCun’s Autonomous Machine Intelligence concept.

I can sorta sympathise with those whose work is “stolen” for use as training data, but really whatever you put online in any form is fair game to be consumed by any kind of crawler or surveillance system, so if you don’t want that then don’t put your shit in the street. This “right” to be omitted from training datasets directly conflicts with our ability to progress a new frontier of science.

The actual problem is that all this work is undertaken by a cartel of companies with a stranglehold on compute power and resources to crawl and clean all that data. As with all natural monopolies (transportation, utilities, etc.) it should be undertaken for the public good, in such a way that we can all benefit from the profits.

And the millionth argument quibbling about whether LLMs are “truly intelligent” is a totally orthogonal philosophical tangent.

Barack_Embalmer@lemmy.world · 11 months ago

That’s how the salesman guy got Homer to buy the Mister Plow truck lol

Barack_Embalmer@lemmy.world · 11 months ago

I have to disagree about that last sentence. Augmenting LLMs to have any remotely person-like attributes is far from trivial.

The current thought in the field about this centers around so-called “Objective Driven AI”:

in which strategies are proposed to decouple the AI’s internal “world model” from its language capabilities, to facilitate hierarchical planning and mitigate hallucination.

The latter half of this talk by Yann LeCun addresses this topic too: https://www.youtube.com/watch?v=pd0JmT6rYcI

It’s very much an emerging and open-ended field with more questions than answers.

Barack_Embalmer@lemmy.world · 11 months ago

https://zombo.com/

Barack_Embalmer@lemmy.world · 11 months ago

In a sense… yes! Although of course it’s thought to be across many modalities and time-scales, and not just text. Also a crucial piece of the picture is the Bayesian aspect - which also involves estimating one’s uncertainty over predictions. Further info: https://en.wikipedia.org/wiki/Predictive_coding

It’s also important to note the recent trends towards so-called “Embodied” and “4E cognition”, which emphasize the importance of being situated in a body, in an environment, with control over actions, as essential to explaining the nature of mental phenomena.

But yeah, it’s very exciting how in recent years we’ve begun to tap into the power of these kinds of self-supervised learning objectives for practical applications like Word2Vec and Large Language/Multimodal Models.

Barack_Embalmer@lemmy.world · edited 11 months ago

Some reading material:

https://en.wikipedia.org/wiki/Predictive_coding

https://en.wikipedia.org/wiki/Bayesian_approaches_to_brain_function

https://plato.stanford.edu/entries/embodied-cognition/

https://global.oup.com/academic/product/surfing-uncertainty-9780190933210?cc=us&lang=en&

Barack_Embalmer@lemmy.world · 11 months ago

Many modern theories in cognitive science posit that the brain’s objective is to be a kind of “prediction machine” to predict the incoming stream of sensory information from the top down, as well as processing it from the bottom up. This is sometimes referred to through the aphorism “perception is controlled hallucination”.

Barack_Embalmer@lemmy.world · 11 months ago

The Japanese SCMaglev only has the cooling stuff on the train, not along the entire length of the track.

And I think there is a “high-temperature SC Maglev” in development in China too.

Barack_Embalmer@lemmy.world · 11 months ago

Why no superconducting maglev tho?

Barack_Embalmer@lemmy.world · 1 year ago

ML has already had a huge impact on the world (for better or worse), to the extent that Yann LeCun proposes that the tech giants would crumble if it disappeared overnight. For several years it’s been the core of speech-to-text, language translation, optical character recognition, web search, content recommendation, social media hate speech detection, to name a few.

Barack_Embalmer@lemmy.world · 1 year ago

Some of the current thought on shortcomings of LLM capabilities actually takes influence from human cognitive science, and what can be learned from those with neurological impairments. It’s thought that human language abilities are strongly dissociated from other reasoning abilities because individuals with aphasia can lack the ability to speak or comprehend language, yet be able to solve mathematical problems, engage in logical reasoning, enjoy music, categorize objects and events, etc.

It’s shown that LLMs develop a crude world model for performing reasoning tasks, yet it’s inextricably tied up with their language functionalities (since they are ONLY language based). The hope for future research is to develop AIs with world models and planning faculties that are decoupled from the language analysis module, which would mitigate hallucination and aid in interpretability.

Barack_Embalmer@lemmy.world · 1 year ago

Free speech online doesn’t even seem to be a particularly well-defined concept. Those who extol it the loudest are often looking to have the millionth “good faith discussion” about The Bell Curve, or use slurs as “just a joke”, or promote a “dating and lifestyle coaching” business to teenage boys. If all they want is carte blanche to say absolutely anything without being censored, I guess they only need to spin up a web server of their own, or run a Lemmy instance. But what they actually want is to bypass the moderation rules on widely-used platforms and shit on the social contract. It’s the same reason they don’t show pornography, snuff footage, or other damaging content on television.

Barack_Embalmer@lemmy.world · 1 year ago

Tangential question - what is stopping YouTube from restricting access to their API for 3rd party apps like Reddit did?