Sunday, January 15, 2023

Ok, Machine Learning

You are a scholar and a teacher. You're worried about these AI chat systems; you don't necessarily care that your students are using the thing. What you really care about is that if they do ask one of these systems a question, they get the right answer.

And, for your own research, you wonder if you can get a good answer to your own questions. How do you tell if they're any good?

First you go and feed one of these systems your own homework question, right?

Do not try this, at least not first thing out of the gate. You can be fooled by your own head if you try to "grade" the results without knowing what the system is doing.

Instead, try this. Ask it a question that looks and sounds like something Wikipedia can answer, then see if it does two things: do the answers correspond to the Wikipedia page relevant to the question? And, just as importantly, does it use only the answers found in that Wikipedia page?

The first test is of course for accuracy. Note, I don't mean that the answer is quote for quote from Wikipedia; in fact it's better here if it doesn't pull quotes directly. I just mean, do the facts and assertions match up to those of the Wikipedia page?

The second test is for completeness. Extra information here is not by default extra credit, and should be discounted unless you are dealing with a field you know well enough to find that information in a trustworthy, publicly available digital source. In other words: only trustworthy, credible information that's publicly available and verifiable independent of the chat system should be included.
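If you want to keep score while running these two tests, a few lines of Python can help with the bookkeeping. This is only a rough sketch under loud assumptions: you have already split the system's answer into individual claims by hand, you have pasted in the relevant sentences from the Wikipedia page, and "matches" is approximated by plain word overlap. The names `tokens`, `overlap`, `grade`, and the `threshold` value are all made up for illustration. Word overlap cannot tell a correct date from a wrong one, so treat the output as a list of things to read carefully, not as a verdict.

```python
import re

def tokens(text):
    """Lowercased word set; words of one or two characters are ignored."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 2}

def overlap(claim, reference_sentence):
    """Fraction of the claim's words that also appear in the reference sentence."""
    claim_words = tokens(claim)
    if not claim_words:
        return 0.0
    return len(claim_words & tokens(reference_sentence)) / len(claim_words)

def grade(answer_claims, reference_sentences, threshold=0.6):
    """Split claims into (supported, extra).

    supported: some reference sentence shares most of the claim's words
               (first test: does it line up with the page?)
    extra:     no reference sentence comes close
               (second test: did it add things the page doesn't say?)
    The threshold is an arbitrary assumption; tune it by eye.
    """
    supported, extra = [], []
    for claim in answer_claims:
        best = max((overlap(claim, s) for s in reference_sentences), default=0.0)
        if best >= threshold:
            supported.append(claim)
        else:
            extra.append(claim)
    return supported, extra

if __name__ == "__main__":
    page = [
        "The Eiffel Tower was completed in 1889.",
        "It stands in Paris, France.",
    ]
    answer = [
        "The Eiffel Tower was completed in 1889",
        "The tower is painted every seven years",
    ]
    supported, extra = grade(answer, page)
    print("Lines up with the page:", supported)
    print("Not on the page, go verify:", extra)
```

Everything that lands in the second list is exactly the "extra information" discussed above: discount it until you can find it in a source you trust.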

And yes, you should also try this with "known shitty" internet questions. If you start seeing lunatic fringe answers in the results you know the system in question has not been evaluated completely for Garbage In, Garbage Out. Not all data sets are valid for the purpose presented.

You should also try this with questions from fields where you aren't necessarily an expert, but where you can readily track down both the Wikipedia page and the top 10 or 20 standard references. This is a test for breadth of knowledge: has the system been built to fool you in particular?

And then, if you're ready for finding out if the system really knows its stuff, find out if it can do the same thing with a well-known review article in your field or one you're interested in learning...

You are an artist. Really, you're intrigued by whether these systems can work for you. And, deep down maybe you're worried that it's using your own art somehow. How do you know if the system is useful, first? How do you know that it's actually doing something artistically worthwhile, and not just copying in a hidden way?

First thing you do is feed it a prompt for one of your own artworks, right?

Don't do this first. Wait a bit on fishing for your stuff and try something else. Your eyes will play tricks on you.

Instead, try this: ask the system to reproduce your favorite Van Gogh. Or Rembrandt. Or whomever, just make it a public-domain piece that you know well. One that you've studied yourself.

How did it do? Now, find out if it can do Jackson Pollock, or Andy Warhol. And yes, I'm serious: if it has Jackson's or Andy's work in its dataset, it should be able to reliably get to a named artwork. If not?

It's restricted in some way from reproducing that newer work. This can be good or bad depending on your view on copyright, but know that this means that, artistically, there's a hole in its view of the world somewhere. Whether or not it's useful for your purpose I'll leave to your artistic mind.

Depending on how well it did with a newer, named artist, now is also the time to ask it if it's capable of producing one of your works. Then, if you're interested in how well it works under the hood, go on to find out how it combines two well-known works to produce something you haven't seen before. Here's where you get to judge whether or not it can do something useful for you. What would have happened had Annie Leibovitz been able to work with Ansel Adams? How would Picasso have done the Sistine Chapel? What would Van Gogh's Forty Views Of Fuji look like?

You're a pro musician: you're booked. Can you use one of these systems to compose, produce? How do you know they're doing something useful and not just sampling?

First, ask it to reproduce a piece you know, and not one of your own. Bach, the Beatles, listen widely and deeply.

Did it work for all of your tests? Get wild: pick one and ask it to change the key. After that, ask it for a different rhythm.

Note: depending on what the algorithm is doing, these two questions in particular can be either very easy, or very nearly impossible. If it does work, then the system is doing it properly (i.e. signal analysis is involved at the important levels). If not, it's sampling in an obscured way, in which case you can ask it for your own works with a completely different purpose in mind.
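To see why a key change can be "very easy" for a system that actually represents the music, here is a toy sketch in Python. It assumes a symbolic representation that I am making up for illustration: each note is a MIDI pitch number plus a start time and a length in beats, and `Note`, `transpose`, and `intervals` are hypothetical names. Real audio is far messier than this, and nothing here says how any particular AI system works internally.

```python
from typing import NamedTuple

class Note(NamedTuple):
    pitch: int     # MIDI note number, 60 = middle C
    start: float   # onset, in beats
    length: float  # duration, in beats

def transpose(notes, semitones):
    """Key change: shift every pitch by the same number of semitones."""
    return [n._replace(pitch=n.pitch + semitones) for n in notes]

def intervals(notes):
    """Semitone steps between consecutive notes: the 'shape' of the melody."""
    return [b.pitch - a.pitch for a, b in zip(notes, notes[1:])]

if __name__ == "__main__":
    # C, D, E, C: one note per beat
    melody = [Note(60, 0, 1), Note(62, 1, 1), Note(64, 2, 1), Note(60, 3, 1)]
    up_a_tone = transpose(melody, 2)

    print([n.pitch for n in up_a_tone])
    # The melodic shape survives the key change untouched.
    print(intervals(melody) == intervals(up_a_tone))
```

Once the notes exist as notes, changing key is one addition per note, and a rhythm change would only touch the `start` and `length` fields. A system that only has recordings to splice together has no such handle to turn, which is why these two requests separate the two kinds of system so sharply.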

The point being: an expert system that is only sampling (Type 1) has its uses. However, an expert system that can actually morph something properly (Type 2), like a key change or a samba to four-on-the-floor rhythm change, now that's a different tool entirely. And, fundamentally, there's a very real difference in what's going on under the hood between the two: a sampling machine that reproduces one of your own works is straight up copying.

A music-signal analysis expert system can get to your work through a different route entirely. It sounds weird, but this kind of system may indeed know you well enough to reproduce something you wrote without directly copying.

In fact, this applies to the artist, the musician, and the scholar as well: if you find a system that can quote you, or that can reproduce one of your works, whether it's a Copier (Type 1) or an Analyzer (Type 2) matters. Type 2 systems are the most useful, the most properly constructed, and the most likely to be capable of reproducing your work without directly copying it.

At least in the immediate gold rush mentality that always accompanies new tech, I would suspect that we'll see quite a few Type 1, Copier, systems. Copying is one of the easiest computational and data analysis shortcuts, and it lets those in a hurry produce something that can fool people into thinking they're dealing with a Type 2, Analyzer, system. But as with sampling as it already exists, Type 1 systems that can reliably re-word known information very much have uses, if in a quite different manner than do Type 2 systems.
