AimwardDrift: Alas, Machine Learning

Alas, Machine Learning

Thoughts and ramblings for my own purposes.

Under the hood, it's both compute-bound and communications-bound, as ever. In practical terms, this means there's a point coming where mass computational bounds will kick in. What's economically viable to build for computers limits us all, but then that's in effect what ML was built to address, in some ways. Still kicks in, just at a new level.

I wonder what the effective "word" length here is, or will be? Think of letters, then words, then sentences as 1 point, 2-point, ... n-point basis functions. Are paragraphs then the n+1 limit? Essay length? Not in the sense of not being able to construct longer systems, more in the sense of repetitiveness, enforced periodicity by basis set limit rather than formal limit. "Perception of the machine" falls here.
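One way to poke at the "enforced periodicity" idea is with a toy: a greedy n-gram generator, which always picks the most common next word given the last n-1 words. This is only a sketch of my reading of the basis-set image above, not how current ML text systems actually work, and every name in it (`build_model`, `generate`, `cycle_length`, the little corpus) is made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, chosen only to make the cycles easy to see.
CORPUS = "the cat sat on the mat and the dog sat on the rug".split()

def build_model(words, n):
    """Map each (n-1)-word context to a Counter of the words that followed it."""
    model = defaultdict(Counter)
    for i in range(len(words) - n + 1):
        context = tuple(words[i:i + n - 1])
        model[context][words[i + n - 1]] += 1
    return model

def generate(model, seed, length):
    """Greedy generation: always append the most common next word."""
    out = list(seed)
    k = len(seed)  # context size, i.e. n - 1
    for _ in range(length):
        context = tuple(out[-k:])
        if context not in model:
            break  # context never seen in the corpus; nothing to say
        out.append(model[context].most_common(1)[0][0])
    return out

def cycle_length(seq, k):
    """Distance between the first two occurrences of a repeated k-word context."""
    seen = {}
    for i in range(len(seq) - k + 1):
        context = tuple(seq[i:i + k])
        if context in seen:
            return i - seen[context]
        seen[context] = i
    return None

bigram_text = generate(build_model(CORPUS, 2), ("the",), 12)
trigram_text = generate(build_model(CORPUS, 3), ("the", "cat"), 20)

print(" ".join(bigram_text))
print(cycle_length(bigram_text, 1))   # 4
print(" ".join(trigram_text))
print(cycle_length(trigram_text, 2))  # 7
```

With one word of context the output loops every 4 words; with two words of context it loops every 7. A longer "word" buys a longer period, but the repetition is still there, set by the basis rather than by any formal limit on length.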

Some years back, generic sports articles based simply on the line scores began to be generated this way, and similarly AP-style news reports. "So and so won, so and so lost, here's the breakdown" kind of thing. Certain kinds of traffic and weather reports could just as well be generated this way. Web pages, summaries.

Where does the error creep in, and how do you work with or around it? Garbage in, garbage out always applies. In a purely numerical context, new algorithms can always be measured. How do you ensure accuracy here? Replicability, too?

In a couple of the major fields involved, when asked long ago I made the comparison to the periodic table: meaning that what was missing was an empirical map. How does X relate to Y? Everything is foggy and dim; is it even possible to lay out a map in such flickering shadow and light? Here then is ML coming in with at least a possible construction.

Which is of course where the formal part began, or one way into it. Here's this arbitrary data set we know nought of. How does it relate to itself? What can we do with this arbitrarily large volume of presumptive knowledge that we don't yet understand?

Suppose you had a library accumulated by a sage since passed on. The sage was mysterious, crusty and cranky, and disinclined to tell anyone of their methods. Now you pore over old manuscripts in forgotten tongues, all organized, clearly, but in some fashion our old friend forgot to teach us. What do we do? We don't speak any of these languages, we don't know how our friend did it, or what they meant by putting this scroll next to this codex next to this little sheet of paper, much scratched and stained.

Let us consult the crystal; can it tell us where and how and why each text fits with another? Can it summarize for us what is contained there, and, better, which questions we can ask of which text? What if we could, then, summon forth both a librarian to organize, to systematize, and a scholar to help us understand what we have? And, perhaps, if we're dreaming, a new sage to add to the collection of knowledge?

This last is, formally, where we break down. "Artificial Intelligence" was, and is, marketing speak. Machine learning is what the experts preferred, though whether the distinction continues to be respected with mass adoption, I dunno. That aside, the difference is that the first two questions relate to transformations contained within a data set.

The third relates both to generalization between data sets, and to generalization beyond data sets. Crudely, interpolation versus extrapolation, though just as with diagonalization versus singular value decomposition, the equivalence is there. Still and all... asking for something new becomes the frontier.
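For anyone who wants the diagonalization/SVD aside spelled out, here is the standard linear algebra as I understand the comparison. The notation (Q, Λ, U, Σ, V) is my own choice for this note, and real matrices are assumed throughout.

```latex
% Symmetric positive semidefinite case: diagonalization already is an SVD.
\[
A = Q \Lambda Q^{\mathsf T}, \quad Q^{\mathsf T} Q = I, \quad
\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n), \ \lambda_i \ge 0
\;\Longrightarrow\;
A = U \Sigma V^{\mathsf T} \ \text{with} \ U = V = Q, \ \Sigma = \Lambda .
\]

% General real m-by-n case: the SVD of A diagonalizes A^T A.
\[
A = U \Sigma V^{\mathsf T}
\;\Longrightarrow\;
A^{\mathsf T} A = V \, \Sigma^{\mathsf T} \Sigma \, V^{\mathsf T}, \qquad
\sigma_i = \sqrt{\lambda_i\!\left(A^{\mathsf T} A\right)} .
\]
```

So the two coincide in the well-behaved case, and in the general case one is the other applied to a related matrix. That is the sense in which the equivalence is "there" even though the two look like different tools.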

Just like Wikipedia, scholarly communities will be obligated to query, refine, and strengthen a given instance, out of self-defense. You'll need to make sure that if such a thing is out there, it's giving correct answers. This took a long time to even begin happening with Wikipedia, and it's only done now in narrow instances. Professional obligations will expand; disciplines unused to programming should now understand that they'll need to require it.

Just like every other stage of computational development: does the computer do what I need done? Calculator, spreadsheet, web, can I get the answer I need? Can I trust the answer? It's a tool, how do I use it?

Listen: transformative work is transformative. That this allows automated transformation is irrelevant. The Copyright Office recognizes this. It also recognizes that the person using the computer to transform my work into something else, no matter what work they've put into the computer, isn't creating something in the same way as they would have if they had written it themselves. Thus, at present, ML-generated works are not copyrightable.

This has many implications.

First, cheap copies, where someone takes one of my works, changes just enough of it to fly under the radar at Amazon or wherever, and tries to cash in, become untenable in the long run. Why would you need to do that if you can just ask an instance to generate a new work? Even if it's incorporating my work into the melange, so what, that's what would happen anyway, just in bits on a computer rather than the memories of the next generation of artists.

Second, it means there's going to be an almighty fight when the media conglomerates realize what uncopyrightable means in this context. Right now, the media conglomerates appear to recognize that their catalogue has significant value in the brand new future.

And it does. Warner Bros. or Disney or whoever appears to sit on the gold mine for training the next generation of ML machines to spit out branded media.

Sounds great. Each house will be able to perpetuate their secret sauce, down to the actors and voices and music and images... too bad for them it's not creation in the artistic sense, and thus, for now, uncopyrightable. Neither is it something they can prevent others from doing. At least not if they actually want someone to view their product in the first place. If you can use today's actors in perpetuity, so can anyone else, sayeth the Copyright Office.

Which of course means that the media conglomerates are going to raise high holy hell when they figure it out. Gods preserve us. You thought they bowed when Mickey was threatened, look out.

For movies and music, assuming that no one manages to completely screw up all of copyright law by doing something "novel", I suspect the fine line that makes this work economically for the conglomerates is finding someone who can use the ML systems to generate as a part of something larger. In other words, ML systems as an element of a broader, complete artistic creation process. Like sampling, only with a broader reach than audio.

At the same time, there will also be video and written-story equivalents of Muzak, only generated for airplane seatbacks or waiting rooms or whatever.

So, video and audio; Dylan and Simon and all the rest selling their catalogues, Cameron and Avatar 2, the last great cash grabs available before the previous financial landscape changes irrevocably.

What then of text? I'll use Stephen King as an example here, not because I know anything of what he or his heirs are planning, but because he's one of the primary household names in the written word.

Suppose that someone involved recognizes that King's life's work represents not just a present value, but a future value: in an ML world, all of King's works become the basis for future works, long after the author has left us.

If the copyright office says, great, fine, but it's still not copyrightable, is this life's work valuable in the instance of generating ML work in the future?

It is if you've heirs then capable of their own transformations and creative contributions to the eventual new work. Or, failing that, well able to hire it done. If we accept that conglomerates will find producers and directors who can successfully generate "based on" work to be monetized, then so too can estates find a combination of writers to generate "based on" novels and stories.

Only, now, without even the need to go digging for half-finished trunk books, or outlines, or notes on, or all the other ways they've done it in the past. The computer can generate that outline to order. And the estate can commission, or ask a son or daughter or...

So, then, thus: if there are now multiple generations of writers who "grew up" as Star Wars or Star Trek or "insert media here" writers for hire, the future will then hold estate-trademarked Stephen King media writers, and Dean Koontz media writers. (Trademarked, because remember this: you can't stop someone else from using already published works to do their own transformations. But that is what trademarks help with if used properly...) Think what Brandon Sanderson did with the Wheel of Time, but now perpetually and at much larger scale. No longer half a dozen at best, but like Tom Clancy's estate, over and over again as needed. At least for the 70 years after the original author passes.

And this applies not just to someone of the stature of King or Clancy or Koontz. Imagine what will happen with A Song of Ice and Fire. Or The Name of the Wind. Or even your own works, you little writer you. Maybe there's room here, not just for your heirs to keep a little bit of money coming their way, but to even extend it a little. We can all make a little business for our kids to work in, even if how they do it doesn't quite resemble the way we did it.

So: there are creative ways that ML will be used to jump the uncopyrightable hurdle. Book it, it's already happening. And thus, the financial landscape will change, not burn down.

This provides opportunity. Protection, too, in that the silly cheap copy bullshit will likely fall away as unnecessary. And yeah, they'll be using your work, but transformative is transformative, you're never protected from that. It's actually better to have your work then be part of a much broader library that's built from. Then it's part of a stew, not a sushi bar. And there's the idea that the sort of Tom Clancy/Frank Herbert perpetual zombiehood now becomes a tool that any reasonably savvy artist can use for their heirs and assigns.

It's kind of a big deal, ain't it? And in a good way if you're ready for it.

The doom and gloomers here are missing the forest for the trees, especially on one big thing: there's always someone better than you at what you do. So what? That great orators exist doesn't stop me from speaking to those I must needs talk to. That Andrés Segovia played and was recorded stops me not at all from picking up my guitar. If I need to sketch, I don't let all the much better artists and draftsfolk out there prevent me from doing my little cartoons up.

Art is communication. If you are not simply to be a consumer of art, you will have your place to go to when you need it. You have a way to let your voice ring out. No one can prevent it.

And, perhaps, just maybe, and with care, the computer will show you some more new options for how to pass your voice on to others. Find that inner 15-year-old that doesn't give a shit what their parents say, doesn't look for a moment to whether it's worth anything, doesn't know or care who's done it before, dammit they're gonna make their own art come hell or high water.

That voice? That hand? That story, that song? It's always yours. It's always you. Embrace it no matter what. Let the worry warts go bother someone else.

AimwardDrift

Wednesday, January 4, 2023
