Focusing on legal aspects of AI

Transcript from October 18th Deep Dive: AI Legal panel

Stefano Maffulli:

All right. Well, thanks everyone, and welcome to Deep Dive AI. This is the third panel of an event series organized by the Open Source Initiative. We started with a podcast series exploring how artificial intelligence impacts open source software, from developers to businesses and the rest of us. And the objective of the series of panel interviews is to better understand the similarities and differences between AI and classic software, let’s call it that way, particularly open source software. Today’s panel is the third of four discussions. The next one is gonna be on Thursday and will be the final one. I’m Stefano Maffulli and I’m the Executive Director of the Open Source Initiative. And today I’m joined by Pamela Chestek, who’s the principal of Chestek Legal in Raleigh, North Carolina. She counsels creative communities on open source, brand, marketing and copyright matters, and prior to returning to private practice, she held in-house positions at footwear, apparel and high technology companies. She’s a frequent author of scholarly articles and you’ll find her on our blog. She’s an expert in intellectual property case law and certified by the North Carolina Board of Legal Specialization in trademark law. And she’s also a member of the OSI board.

Stefano Maffulli:

Thank you, Pam, for joining, and you’re muted. So, Danish Contractor is an AI researcher working on problems in multi-sentence question answering and dialogue systems. Very on point for the conversation today. Danish also chairs the Responsible AI Licensing Initiative and is also chair of the IEEE-SA working group on responsible AI licensing. He also served as co-chair of the model governance working group at BigScience, which is an initiative of Hugging Face, I guess, or hosted at Hugging Face. And he was also named one of the Top Innovators under 35 in India by the MIT Technology Review and Mint. Very, very impressive curriculum. Thanks for joining, Danish. I’m very pleased to have you.

Stefano Maffulli:

Then we have Jennifer Lee. Jennifer is the Technology and Liberty Project Manager at the American Civil Liberties Union of Washington State. She advocates for state and local legislation to regulate powerful surveillance and AI-based technologies. She leads the working group for implementing community-focused policies related to technology, privacy, and civil liberties. She’s working with researchers, activists, and technologists to develop for communities the capacity to build counter-surveillance, I would say, and AI policy toolkits. So, thank you, Jennifer.

Stefano Maffulli:

Finally, Adrin Jalali. We have misspelled his name on the panel here: Jalali, with an I. He has a PhD in machine learning for cancer diagnostics, and is a consultant for different companies, focusing on algorithmic privacy and fairness. He is currently working for Hugging Face, where he maintains libraries related to fairness in machine learning and MLOps. He is also a core contributor to the open source packages scikit-learn and Fairlearn. He’s also a member of the technical committee and a contributor at NumFOCUS, the nonprofit supporting open code for better science. He’s also an organizer of PyData Berlin. Thank you, Adrin. Thank you for being here. Very pleased.

Stefano Maffulli:

So today I’d like to focus mainly on three topics. One is the fact that AI introduces new artifacts, and I’d like to understand the role of the current legal frameworks for intellectual property for these artifacts. Another topic is that, with AI and machine learning requiring lots of data, copyright claims are appearing on every front, including unsuspected ones. What are the alternatives, if there are any, or are we going down the right path? And finally I’d like to talk about the governance and uses of AI systems and these individual artifacts: why it’s important, and what’s the role of developers, society and regulators in this?

Stefano Maffulli:

So let’s start from the beginning. AI doesn’t seem to have, you know, source code or executable code like classic software. And with these new artifacts, starting from datasets, models, the weights, the outputs generated by the models themselves, copyright seems to be applied now to all of them, despite the fact that when software, let’s call it classic software, was introduced, it was a conscious policy choice to go that way. So what’s happening here? Is this a good choice, to apply copyright to models and to all the pieces and components, or what are the options we have available?

Pamela Chestek:

I’m happy to jump in on that one. I wanted to first sort of convey this interesting exchange I had with a client yesterday who was attending a, I dunno, machine learning or AI seminar. And she said, Oh, the speaker for IBM said that the datasets used to train AI, that’s fair use. And I said, Well, not so quick, I don’t think. And she was like, No, he’s definitive. And it all comes out of basically this one, you know, Google’s use of data in order to provide its search engine results; that’s where this comes from. And since I had that conversation with her, I saw a Twitter thread that was sort of, you know, questioning that thesis. I saw there was a blog post saying that the RIAA, the Recording Industry Association of America, is claiming that the use of their music for training is a copyright infringement.

Pamela Chestek:

So first off, just even on that very fundamental, that first piece of this, the data that we use: is it fair use? Which, you know, I think that’s really a big question to start with. I personally would not advise my client who’s gonna build a business on it and say, You’re home free, it’s fair use. I think there’s a lot to go into on this, your second premise, that copyright covers all of this. I don’t think that’s been explored in any sense or way whatsoever. I think everybody’s just kind of speculating and assuming, but when the courts get to this, I think there will be a lot of unpacking because of this premise that ideas and functions are not copyrightable. I think that it’s gonna tease out and, you know, maybe it will be a matter of how good the advocacy is, for or against the position that this is covered by copyright. I just think we still don’t have a clue whether, where, and to what pieces of this copyright will apply.

Stefano Maffulli:

Danish, I see your mic up.

Danish Contractor:

Yeah. So I think that that’s fair. You know, it’s an area to be explored; I don’t think we’ve got definitive answers on what can be covered by copyright and what can’t just yet. But I think at the end of the day, as researchers, we are releasing code, data, models and applications every day. And it’s just by precedent that we’ve all been releasing them under open source licenses. So I guess if we unravel the debate on what’s really copyrightable or not, I think we’d have to go across everything that, you know, the software industry has been doing. And I think that could play a part in how that shapes up over time.

Stefano Maffulli:

There is definitely that. I saw that Twitter thread last night too. It was fascinating because there is, like Danish was saying, already a huge amount of conversation, and not only conversation, but also elements, let’s call them artifacts, released assuming that copyright applies. And what’s surprising to me is that the conversations we are hearing from people who have been promoting open access, open data, open science, and from the Motion Picture Association, start to sound very similar. On one hand there is the intention to progress and to collaborate. On the other hand, there are restrictions that are being raised all around with different explanations. Jennifer, one of the topics I’ve heard mentioned multiple times is this balance of power, this fact that there are so many pictures, for example, available to people that have been data mined, and these are being used for surveillance, for nasty uses, by either governments or bad actors in general.

Stefano Maffulli:

So what are your thoughts on this front, like this massive availability of data?

Jennifer Lee:

I mean, it’s really concerning, and I think, you know, whoever holds these data sets, whoever’s collecting the data, however it’s being used, when we think about developing new tools and collecting information, we really need to be thinking about the end impacts on who’s going to be the most harmed. And, you know, though technology has advanced tremendously over time, I think it’s really important to remember that surveillance isn’t a new concept. People have always been surveilled throughout history, and new ways of gathering enormous amounts of data just make it much easier to target historically marginalized communities. So, you know, one thing that we’re working on at the ACLU of Washington with our Tech Equity Coalition is to bring that history of surveillance to the forefront of any conversations we have about technology development, deployment and regulation.

Jennifer Lee:

I think also this topic of data collection in conjunction with automated decision systems and algorithms is intricately tied to the conversation on data privacy and how we can stem the flow of data being collected and shared by both government and corporate actors. So it’s really interconnected. I think the surveillance laws that we’re working on, the privacy laws that we’re working on and the AI regulation laws we’re working on, you know, they all impact each other. I hope that answers your question. Happy to elaborate.

Stefano Maffulli:

Indeed. And Adrin, speaking of... okay, go ahead. I see you raised your hand.

Adrin Jalali:

Another thing that I think about, going back to the kind of licenses: not all data that is used has no license or a vague one. For example, when you look at GitHub’s Copilot, most probably most of the code used, most of the data used to train that model, is licensed. And to me, when I license my software, I’m giving a sort of consent. It’s like I’m allowing people to use my product, my work, in certain ways. And I don’t think I’ve ever answered the question: do I think that another organization should be allowed to train a model and make a profit out of it from the code that I have? And those things are licensed. It’s just that in those licenses, those questions are not answered. So another question is, both with Creative Commons and the open source licenses, should we go and answer these questions in those licenses as well? Or are we just waiting for the court to come and say, according to that license, this is fair use, or you’re allowed to do that or not?

Stefano Maffulli:

Pam?

Pamela Chestek:

Yeah, I think relying on fair use, this discussion sort of illuminates why fair use is maybe not the right bucket, or it’s not a great bucket. So for example, what Jennifer described: in my mind there’s a huge difference between collecting data on people’s images and then the police, for example, using that to identify someone. They’re not doing any training with that. They’re just copying those images. There’s no transformative use, which is sort of a primary piece of fair use. There’s no transformation there. They’re just using it for comparison purposes. So I think the fair use claim for that kind of use is quite different from a machine learning fair use. On the flip side, kind of to respond to Adrin: yes, a lot of this stuff is licensed, but the problem with the license is, you know, that is someone who has agreed to the use of this data, and that isn’t necessarily going to be a great data set to use for training.

Pamela Chestek:

So for example, you know, if Copilot is trained on all open source licensed software, but no proprietary licensed software, is that really a great way to do it? Is that the best way to train the model? Is that a model that’s going to come out to be a well trained model? So, on one hand, I understand, and I believe this is happening in other countries, the value in not allowing the owner of that data or the owner of those copyrighted works to have a say in whether or not their works are used for training. Because the training requires, you know, a reliable data set, and if you’re only getting it voluntarily, maybe that’s not the best choice.

Pamela Chestek:

So I just wanted to kind of tease out the problems on both sides of, you know, relying on fair use. It would actually, I think, be great for Jennifer’s case, ’cause I don’t think that’s a fair use case, to say that these images can be used for comparison. But maybe it should be for machine learning. So maybe fair use is the best way to tease that out. I don’t know. But fair use is, again, notoriously difficult; only courts know when something is a fair use. You only know at the end of the lawsuit whether or not it was a fair use, and if you’re Google and Oracle, you just spent a hundred million dollars to find out the answer to that question. So anyway, just pointing out the legal problems we’re facing.

Stefano Maffulli:

Those are non-trivial legal problems. But I wanna go back a little bit to the original thought that I had, that in the end, copyright was assigned consciously; it was somewhat of a conscious decision that software had to be covered by copyright. And it took like 15 years before a court case basically settled the argument in the United States. Now, like Danish was saying, we are basically rolling with the idea that we can assemble data sets and release them with licenses that have been written for software and sometimes even specify the concept of source code or executable code. And for these artifacts, those don’t seem to matter that much. Are we really going in the right direction here, or shall we stop and think and maybe come up with policy suggestions? Danish, what are your thoughts on copyright? Is that, you know, acceptable? What do you hear from your communities?

Danish Contractor:

So, I’m not a lawyer, so I’m not sure, you know, what the principal stance is on how copyright has been interpreted in different judgments. But I think as a community we’ve just been accepting that these are copyrightable artifacts, right? Because if you’re applying Creative Commons licenses on data sets, if you’re applying Apache 2.0 licenses on models, if you’re applying RAIL licenses more recently, there is an implicit assumption that the community has already made that these are copyrightable artifacts. Now, if that were to change either by law or by a court judgment, then I guess this would turn a lot of the arguments that, you know, the whole community relies on upside down. So, I don’t know, I think the community as a whole has more or less accepted that a lot of these are copyrightable.

Danish Contractor:

But I think only now, as we are starting to see generative models use, for example, art or code, in some instances without consent, for particular applications, is the question of copyright coming to the fore even more. Because if I’m copying somebody’s artwork in a particular style, is it transformative work? Did I have permission to be able to do that? Was that permission explicitly granted? I think that’s what’s really leading to the discussion on copyright, because otherwise, you know, for all these years we’ve been releasing datasets, models, code with copyright as a given.

Stefano Maffulli:

Right, you’re bringing up a very interesting point, you know, widening the topic here and talking about the models and the output of those models. They also involve conversations around copyright and what’s protectable. What are your thoughts, Pam, on this front?

Pamela Chestek:

Yeah, it’s interesting. I think it’s very different, and again, I guess I’m still focused on the data used to train models; we haven’t even gotten yet to the discussion on the rest of it. So, you know, on the data to train models, I don’t think there’s a one-size-fits-all answer, because training models on artwork, artwork is very clearly copyrightable and no one would dispute that. What about training a model on weather data, for example? Nobody would say that the weather data is copyrightable. No data point is copyrightable; the compilation of the data, maybe. That’s true under US law, true under EU law. But it also may not be, and it would be a very tight copyright, a very thin copyright, we would call it, if there is one, and it would only be on the selection and arrangement and coordination of that data, how that data is assembled, which you then, you know, unpeel to do the training.

Pamela Chestek:

So even kind of looking at it that way, and I think, Danish, you know, what we see as a manifestation here is, I’m gonna put a license on it on the assumption that it’s beneficial, because then people are clear. They don’t have to ask, they don’t have to worry about this question, is it copyrightable? They don’t have to worry about, is it okay if I use this? They have the answer for it. Because the person who has control of that data has expressed an opinion on it, and we can rely on that opinion, and we can rely on that opinion in court, for that permission. So that’s one of the reasons that those licenses get applied. I think it’s net positive in this case. We can talk in other situations about the appropriateness of applying licenses to stuff that should be freely available to use for everyone. But it makes sense because then, you know, if you want people to use this data, they know they can. So that’s very beneficial. So I think applying a license, you know, is the safest thing to do in this world of we don’t know what’s going on.

Stefano Maffulli:

Adrin?

Adrin Jalali:

I think we can also look at different areas where these discussions have been had for a while. For example, when you look at healthcare, if I take DNA samples from a bunch of patients and I develop a drug, can I own the drug which is derived from data that clearly I don’t own? It’s somebody else’s DNA. Or if I go to the doctor, can that data then be used by researchers to do healthcare-related research? And different countries have very different approaches on that. I think if you go to Denmark, by default researchers can use that. If you go to other countries, they can’t; they have to have explicit consent. And to me, one of the reasons that we’ve had these discussions is the potential harms of using healthcare-related data and that data being leaked. Whereas we haven’t been having that discussion on voice and image and, I dunno, everything that people produce, because we haven’t necessarily thought about the potential harms. But these harms are now real. Deepfakes are very real. Producing art using somebody else’s art is really real. And these cause either financial or reputational harm to people. Somehow the part of the community working on this is not necessarily connected with these potential harms as much as the healthcare communities are, or at least that’s how I feel.

Stefano Maffulli:

Jennifer, what do you see in terms of potential harms also? Like what are your thoughts on this?

Jennifer Lee:

I mean, financial and reputational harms are definitely just a few of so many harms that arise from non-consensual data collection, whether that’s used by companies or just individuals trying to use that data for whatever purpose. Those harms can lead to stalking, domestic violence. It can lead to police violence. Data can be used in so many harmful ways that can seriously lead to life or death consequences. I think the healthcare example is a good one because we do have strong healthcare laws, but, you know, currently we don’t have healthcare laws that cover non-HIPAA-covered data: that’s data from healthcare apps or fitness apps, or location data collected by your phone, even via a weather app, that could be used for health purposes.

Jennifer Lee:

That’s concerning; that could be data used to track people who seek abortions, for example, in the US. So I think thinking about the harms is really, really important, you know, whether we’re talking about proprietary algorithms or open source algorithms or what types of datasets there are: what is the end result, the impact, of how that data’s going to be used? And I think that’s something that companies, governments, and individuals all really need to be thinking very carefully about. And, you know, to answer your earlier question about policy regulations, I don’t think that in the policy space we’re even at the point of thinking about regulating different types of data sets or different types of algorithms, like open source or not. So, you know, I think the conversation about AI regulation is just starting. People have been talking about it for years, but in terms of policy, like actual laws around regulation, at least in the United States, that’s something we’re only just getting to, broadly.

Stefano Maffulli:

Right. No, there is definitely a lot of action on the regulation front, and I think we’ll get back to that conversation later on, because I’m interested in diving a little bit deeper, and I see a question also from the chat here about another artifact that is being distributed and is having copyright applied to it. I don’t know if anybody wants to take it from the chat.

Pamela Chestek:

Yeah, I’m happy to, ’cause I’m really curious. I think that the datasets are sort of what’s most familiar to us, so maybe they’re the easiest to cope with. And so, for the benefit of people who aren’t reading the chat, Emily wrote: “I’ve seen copyright licenses applied to values for parameters of trained ML models. I was under the impression that this application of copyright would be similar to data, which is thin or none, as Pamela mentioned. I’m curious what the panelists think about whether copyright is a suitable legal protection for parameter values for ML models.” So this is where I’m sort of going back to what I originally said, which was, you know, I have no idea.

Pamela Chestek:

I mean, personally, my view would be no. I guess, exactly as Emily says, my inclination is there’s no copyright protection of parameters. Copyright is for creative works. So, you know, if we go back to that very fundamental principle of copyright, what’s it for? I sort of doubt that parameters would be covered by it. And my question back to Emily was, who was doing this? And to go back to what I said to Danish, are they doing it out of good faith, you know, to put a license on it so that everybody understands, or were they doing it to be exclusionary? Which I think is also going to be the case: people are going to try to claim some kind of exclusive right.

Pamela Chestek:

You know, they’re gonna be overexpansive. And that’s where the courts would theoretically come in and say, No, actually, you know, that’s not entitled to copyright. There’s no way to protect this, you know, just for example, parameters. And Emily replies: “The example I’m thinking of is AlphaFold. They had the parameters subject to a Creative Commons license with noncommercial use restrictions, but recently changed it to allow for commercial use.” I’m not familiar with AlphaFold, but that kind of illuminates what I’m saying, which is, you know, what was their motive? Was it benevolent or was it, you know, exclusionary?

Danish Contractor:

So I think, you know, we have to just think about this more broadly, right? Without going into the specifics of, say, AlphaFold or any particular machine learning model. So when you’re doing machine learning, what are you doing? You’re taking your data set and you are training a bunch of matrices, you know, in today’s terms, and getting values for those matrices that result in certain outputs, right? Effectively, that’s really what’s happening. You’re learning what those matrices should be, and some other functions, basically to transform your input data in a way that’s stored in some mathematical values and numbers, which then lets you do what you’re trying to do for your end task. Now, I don’t know whether courts would view this as a transformation of data or not. It’s hard to say. But the fact that I have got this model with certain values is something that I have got after figuring out what architecture, which is what code blocks, I want to use, what data I want to use, how long I wanna train this, what should be my learning rate, what should be my batch sizes.

Danish Contractor:

I’ve done a lot of thinking behind how to create this model and the parameter values that I’ve been satisfied with for a particular end task or for whatever I’m evaluating a model for. And that learned state of a model is basically what we call a model when we are releasing a model, as we call it. So now the question of copyright. This artifact is not something that, you know, I could have produced without really spending all of the time and effort that I just described. And to share that particular artifact, we could call that maybe software, maybe that could be one interpretation, or maybe it’s a form of transformative data, I don’t know. But it is an artifact nonetheless that can be shared and distributed. And if I’m doing so, I guess, you know, like you said, Pam, it’s not something that we know how courts would view, but it’s still an artifact that researchers tend to distribute as a whole.
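
[Editorial illustration, not spoken by the panel.] The process Danish describes can be sketched with a toy example. The Python below is only a minimal sketch under stated assumptions: the function name `train_linear_model`, the toy data, and the hyperparameter values are hypothetical and chosen for illustration, and a real model has far larger weight matrices. What it tries to show is that the released "model" is the set of learned numbers, and that those numbers depend on the data plus choices such as learning rate and batch size.

```python
import random


def train_linear_model(data, learning_rate=0.05, batch_size=4, epochs=200, seed=0):
    """Fit y ~ w*x + b by mini-batch gradient descent and return the learned values.

    All names and default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0          # the "parameters" start with no knowledge of the data
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the mean squared error over this batch
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    # This dictionary of numbers is the artifact that would be shared as "the model"
    return {"w": w, "b": b}


# Toy data set generated from y = 3x + 1 (purely synthetic)
toy_data = [(x / 10, 3.0 * (x / 10) + 1.0) for x in range(20)]
params = train_linear_model(toy_data)
print(params)
```

Changing the data, the learning rate, or the number of epochs changes the numbers that come out, which is the effort and set of choices the speaker refers to.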

Danish Contractor:

And just by norms of the community, they have had licenses attached that view them as copyrightable. Now, once they’re viewed as copyrightable by the community, there have been open licenses, and there have been licenses that restrict certain applications just because of the harm they could do. For instance, the BigScience BLOOM LLM anticipated certain harms from a particular model and attached some restrictions on use on the weights of the model. I believe Stability AI also has done some, and there are a whole bunch of other models that have also been released open source; some have restricted commercial use. It’s not just AlphaFold. OPT-175B from Meta has done this, and a whole bunch of other models have also applied the same paradigm or approach to copyright, as researchers view these as copyrightable artifacts. Yeah, I guess that’s what I would say.

Stefano Maffulli:

Yeah. Thanks. Adrin?

Adrin Jalali:

I think we also can’t completely separate the license of the model, the weights of the model, from the license of the dataset, especially for very large models. Those models are in effect a database. They are really good at memorizing the data. When you go into the privacy field, you can extract a lot of the dataset from just the weights, which is why a lot of privacy concerns have been raised from there. And therefore, if I don’t have a dataset that I can release, then the question is, can I release my model? If people can extract a lot of information from that model, and I know that that causes some harm, or I’m not allowed to release the data, how can I then release the model?

Adrin Jalali:

That’s one aspect that I think we don’t necessarily have answers to. The other one is, we have talked a little bit about use. For example, in the OpenRAIL license, we talk about what are the uses that we want to avoid and what are the harms we want to avoid, but what are the types of modifications that we would like to allow or avoid? For example, say I have a model that I put some safety mechanisms in. These days we talk a lot about certain biases that have crept into the model, and imagine that I could have mechanisms to avoid those. Can somebody take my weights and release my model? Can somebody take my model and remove those mechanisms and create a really harmful, really biased model, as people do? For example, we have this model that gets further fine-tuned on a terrible data set and starts producing really terrible content. Could I avoid that? Can I avoid that by only limiting the uses? Or can I start talking about what are the types of transformations that people are allowed to do on this model that I’m releasing?

Stefano Maffulli:

You’re introducing two wonderful topics that I wanted to talk about. One is the concept of harm from these AI models or systems, which I hear mentioned very often as making it necessary to create special cases for AI, different from any other dangerous tools that we have deployed in the past. So who of you wants to take it and, you know, introduce this concept? Why is AI so much more harmful than anything we’ve seen before?

Danish Contractor:

Without saying whether it’s harmful or not, it’s very different from software that we’ve seen previously, especially when it’s machine learning software. You know, a lot of the time when we think about restrictions on use of AI systems, a question that pops up is, what’s so special about AI, right? We could have done this with software; harm can be derived from software. I could have a simple sorting algorithm that could sort people by height, and I could just have a threshold to say, I’m not going to allow people below six feet to apply for a job. That’s harmful, right? And I have not used any AI, and it’s simple source code. Now, do you wanna license sorting algorithms with restrictions of use? That’s...
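
[Editorial illustration, not spoken by the panel.] The sorting example above needs no machine learning at all, which a few lines of ordinary Python make concrete. This is a minimal sketch; the function name `shortlist_by_height`, the applicant records, and the 183 cm cutoff (roughly six feet) are hypothetical and exist only to mirror the spoken example.

```python
SIX_FEET_CM = 183  # roughly six feet; assumed cutoff mirroring the spoken example


def shortlist_by_height(applicants, min_height_cm=SIX_FEET_CM):
    """Sort applicants tallest-first and keep only those at or above the cutoff."""
    ranked = sorted(applicants, key=lambda a: a["height_cm"], reverse=True)
    return [a["name"] for a in ranked if a["height_cm"] >= min_height_cm]


# Hypothetical applicant records
applicants = [
    {"name": "A", "height_cm": 170},
    {"name": "B", "height_cm": 190},
    {"name": "C", "height_cm": 183},
]
shortlisted = shortlist_by_height(applicants)
print(shortlisted)
```

The discriminatory rule lives entirely in the threshold, not in the sorting routine, which is why use restrictions on a generic sorting algorithm would be hard to frame.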

Stefano Maffulli:

Yeah, I heard more specifically comments around software that guesses passwords, like password crackers, or anything that is related to security research where, you know, these are dangerous tools and they’re freely available, without going into conversations about, you know, gene editing or other technologies that are potentially harmful but still regulated in different ways than software.

Danish Contractor:

Yeah, so AI is not regulated, right? It’s hard to define what is harm for regulation. So now in, in the interim period, right, you do – we do recognize that AI systems are different from traditional software in the sense that at least machine learning based systems, you don’t have the same amount of testing that you can have with regular software systems. So for instance, if I knew that my regular source code was having a particular error, I could probably reliably fix it once it’s identified. Now, if you tell me my generative model is producing a harmful piece of text, stop that. I don’t know whether as a machine learning creator, I know how to do that. I can maybe suppress that output, but if I try retraining, I don’t know what else I’m gonna be breaking.

Danish Contractor:

And there is nobody who can give you those guarantees about what’s happening. Even things like confidence. You can’t even be sure that if you want to have thresholds around confidence, that’s gonna be a reliable measure of restricting harm. So I think because of the fundamentally different ambiguity that you have with the operation of AI systems and the lack of guarantees, or even quantification of how good or bad an AI system might be for particular use cases, barring some evaluations, some test sets which already have some bias built into them and may not be reflective of real world harms and so on, it’s different. Which is why, as machine learning developers and creators, we need to anticipate possible harms, even if only based on the limitations of the work that we’re developing.

Stefano Maffulli:

Right. Adrin I was going to you because I wanted to hear about your fairness. How do you assess that? But go ahead.

Adrin Jalali:

Oh, before even assessing I think one fundamental difference between AI and software or AI and humans is we, humans are very causal creatures. We understand those causal relations, and if you tell me why you make a decision, then it’s much, much easier and much more intuitive for me and a society to decide whether it was fair or not. Whether that was a harm that was okay for me, like, was it okay for me not to hire you or not? Whereas when we talk about AI systems, in most cases, we can’t necessarily interpret them. We don’t necessarily have tools or we don’t use the tools that would give us the explanations on why a system made a decision the way they did. And I think that’s where regulation would be really, really useful. Cause as long as this is not regulated, I don’t see companies going and trying to figure out, okay, like when I have a model, I have to also have the explanation.

Adrin Jalali:

If a customer comes and asks, Why did you not give me the loan? I can just tell them, Well, the computer said no and nobody questions that. Whereas if I force everybody, no, you have to give an answer to that question then I think the field would move towards a very different direction and we would be much more comfortable regulating them. But I don’t think this is something that would necessarily go in the license. Like when it comes to harm, to me it’s a, it’s a lot more of the matter of regulation than it is licenses. We shouldn’t be doing certain things no matter whether the person releasing that software or that model was okay with us doing that or not.

Stefano Maffulli:

I saw Pam and Jennifer

Pamela Chestek:

Actually, I was – I was going to, Am I frozen?

Stefano Maffulli:

No, no.

Pamela Chestek:

Okay. I was going to, I was going to ask a question for Jennifer because this is, and Adrin sort of led, it was kind of leading exactly where my question was, is sort of what is the, what is the role of regulation and what is the role of, you know, an individual one-on-one relationship of a license and, you know, how do we decide where the appropriate control for harm should lie?

Jennifer Lee:

That’s a really good question. And when the question was asked about how do we define harm, I, you know, the first question I thought of was, who actually gets to determine what’s harmful? Is it, you know, is it regulators? Is it the developers? Is it people who are using these technologies? It’s typically not the people who are actually experiencing the harm. It’s usually a very top down approach that you know, I think leads to the exacerbation of existing societal biases and existing harms. And you know, as it was mentioned, I think it’s, it’s concerning because the use of these types of systems often hides and legitimizes biases that the people who are developing, regulating these systems may take as the status quo, may take as a norm when those are norms that are quite harmful to individuals.

Jennifer Lee:

You know, I think of harms. So I should just give you a little bit of context. So I mentioned the Tech Equity Coalition earlier, and these coalition members are not technologists, many of them are not lawyers or policymakers, they’re people who have lived experience of harms caused by surveillance and automated systems and artificial intelligence and just technologies in general. And, you know, I think that there’s a disconnect with people who are trying to regulate these systems, either by licenses or by laws or litigation. There’s a disconnect between them and people who are experiencing these harms and saying, Don’t build this technology. Not just like, how do you mitigate those harms? A lot of people are saying, don’t build it at all. You know, don’t use this, don’t use data. You know, for us, we don’t have control. And that’s really harmful.

Jennifer Lee:

A lot of volunteer developers who are using open source data sets are not going to be from these communities either. So I think there’s a larger question of power, and it’s, you know, it’s not an easy fix. It’s structural, it’s societal. But I think regulation can go a long way in mitigating some of those harms. Requiring transparency and accountability for these sorts of technologies is really important. But ultimately I think the question I ask when I look at proposals is how is power distributed? Like, who’s going to ultimately have a say in whether or not a system is deployed? It’s often not going to be solved completely by regulation, but, you know, I think it is a step in the right direction. We’ve seen a number of proposals, but I think one thing that might help, you know, is requiring that transparency, but also people who are developing these technologies, people who have understanding of artificial intelligence, really partnering with community and being led by community, impacted community, in decisions about, you know, not just how to deploy, but whether that technology should even be used. And the limitations mentioned by Adrin should be led by those who experience those harms.

Stefano Maffulli:

Right? Yeah. There is an interesting system of incentives here at play also that society needs to think about. I guess the issue – you raised your hand?

Danish Contractor:

Yeah. I just wanted to make a quick comment on something that Jennifer just said, right? So I think working with – so anticipating harms and limitations of technology is an important aspect that all I think developers should consider especially in AI, just because of how they can be repurposed for things and how they can be repackaged into larger software systems that they were not originally designed for. And I think I would, you know, I would argue that, you know, it’s not just regulation because I think not everything can be regulated. It’ll take forever to really piece out every small possible use case with different circumstances and then have regulation for it. I think even at the point of release, if developers are aware of certain limitations and restrictions, I think those should be made part of terms of use, because that only just gives enforceable mechanisms for preventing harm. Otherwise, if you don’t even put that in your terms of use as, as a creator, as a model creator, as a developer, I don’t, don’t even have rights to enforce anything. Regardless of whether that’s copyright, believe me, you could always rely on contractual law if it’s framed appropriately.

Stefano Maffulli:

Yeah it’s an interesting thought and one question that keeps dancing in my mind is whether we are ready, like basically Jennifer has, has mentioned this a couple of times, don’t release it. And between Adrin and Danish also, you, you somewhat said, we don’t know how to inspect this. We don’t know how to verify this. We don’t know how to fix this in case it’s creating harm. So I, I can understand the, the push of, of putting terms of services and, and sort of other limitations or being, being somewhat more careful than if you were, or as you were handing a loaded gun – you were about to say something?

Pamela Chestek:

I understand Danish’s sentiment, but I think the reality is perhaps somewhat different. So just because you have an enforceable mechanism, you have a mechanism, are you actually going to deploy that mechanism? Are you going to enforce that mechanism? Are you going to go after, say, if someone uses one of your models for harmful purposes, are you going to pursue them or not? And will they care? And you’re gonna spend a ton of money to maybe achieve nothing? And we actually know this happens because we know in the open source industry, there is very, very little enforcement. And one of the reasons is that the thought was, you know, it’s the licensor who has the right to enforce that license, and if they’re not motivated to enforce it, then, you know, so the GPL is observed more in the breach than the observance. You know, GPL violations happen. I mean, I can’t even tell you the, you know, sort of the order of magnitude that they’re happening on. So you know, I think that maybe government clout is a little more worrisome to people than the thought of a license enforcement.

Stefano Maffulli:

Danish.

Danish Contractor:

So I think enforcement of licenses, Pamela, is a universal problem, right? Whether or not you’re putting terms of use, I think that’s true for open source. I mean, software piracy by itself is a multi-billion dollar industry. And, you know, there are things that people can do. I mean, you can only do as much as you can, but I wouldn’t view that as an argument for not having terms of use. I think terms of use – because not everyone is a bad actor by intention, it could also be just inappropriate use because I’m not fully aware of the limitations of the model. So I think for instance, I’ll just make up an example. For instance, I’m sure you’ve all seen the story where, you know, like a hand washer or a hand dryer does not work for certain skin colors.

Danish Contractor:

It’s innocuous, perhaps less harmful. I mean, it’s exclusionary, but probably less harmful if it’s deployed in a bathroom to dry your hands. But it’s extremely exclusionary if it was used as a, you know, wave and a door will open, for an accessibility option. And so if, for instance, the model developers had even done some testing and released terms of use to say, you know, this has not been tested in the wild, this is just a sensor that we’ve developed with this particular dataset, do not use this outside unless you really test it for certain applications. It’s a terms of use question. And if it’s being repackaged and reused by somebody else for a certain thing, at least I, as a developer, have rights to enforce, which I wouldn’t have otherwise. I’m intentionally picking relatively less harmful examples. But it’s not hard to imagine how these harms can be exacerbated even for things like machine translation systems, or things that you would otherwise view as innocuous, now applied in real world, high risk situations that cause bodily harm.

Stefano Maffulli:

There’s definitely that sort of question. I wanna go back a little bit to the objective also of creating these datasets and sharing the knowledge, which the research community and users in general have been doing. In the open source world, the intention has been to create a commons – so create rules that are shared and understood and remove friction so that science can progress much more quickly, taking for granted the fact that, you know, the risk of misuse or the risk of harm would be handled in a different way. How much consciousness is there in choosing to put barriers here? Is this a conscious choice, or is it more like, oh, I wanna stand back and be safe, because there are no tools to, you know, fix this if this goes outta hand? Is there more fear than there was maybe when software and open source software was coded?

Stefano Maffulli:

Adrin, what do you think? I mean, the concept of fairness in AI still amuses me. I’d like to hear more about that. Like how do you measure all of that?

Adrin Jalali:

So, fairness is very, very closely tied to harm. I’m of the position that it doesn’t make much sense to measure any kind of fairness if we don’t know what the potential harms are. And that has a lot to do with the use case. If anything, it has more to do with the use case than it has to do with the model and the dataset itself. You can imagine the same model that can be useful in a certain scenario, and it can be extremely harmful in the other one. So as a model developer, if I don’t know where it’s going to be used, I would not know what kind of, like, what to measure in that sense. So that’s one aspect. The other aspect is different. We talk a lot about fairness, but I feel like the community talks about fairness as if it’s a well-defined measurable concept.

Adrin Jalali:

As a construct, as a social construct, fairness is contested, it’s not well defined, and different definitions are contradictory with the other ones. There are so many different versions that, for example, if you tell me that I have this model and I don’t want it to be sexist, I can go and come up with a fairness metric according to which your model is not sexist; there are enough to choose from. The other issue is ideologically those fairness metrics also don’t necessarily agree with one another. And one thing that I don’t think we talk enough about is what would society look like if I optimize for this specific fairness metric? What are the implicit assumptions that I’m making? What are the normative assumptions and judgements that I’m making when I take a metric and I try to optimize for that? Those are the conversations that I don’t think we’re having enough of.
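A minimal sketch of how two fairness metrics can disagree about the same predictions. The choice of metrics here (demographic parity, which compares selection rates, and equal opportunity, which compares true positive rates) is an assumption for illustration; the speaker does not name specific metrics. The group names and toy data are made up.

```python
def selection_rate(y_pred):
    """Fraction of people who received a positive prediction."""
    return sum(y_pred) / len(y_pred)


def true_positive_rate(y_true, y_pred):
    """Fraction of truly positive people who received a positive prediction."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(preds_for_positives) / len(preds_for_positives)


# Made-up labels and predictions for two hypothetical groups.
groups = {
    "group_a": {"y_true": [1, 1, 0, 0], "y_pred": [1, 1, 0, 0]},
    "group_b": {"y_true": [1, 1, 1, 0], "y_pred": [1, 1, 0, 0]},
}

rates = {g: selection_rate(d["y_pred"]) for g, d in groups.items()}
tprs = {g: true_positive_rate(d["y_true"], d["y_pred"]) for g, d in groups.items()}

# Demographic parity gap: difference in selection rates between groups.
parity_gap = abs(rates["group_a"] - rates["group_b"])
# Equal opportunity gap: difference in true positive rates between groups.
opportunity_gap = abs(tprs["group_a"] - tprs["group_b"])

print(f"demographic parity gap: {parity_gap:.2f}")
print(f"equal opportunity gap:  {opportunity_gap:.2f}")
```

On this toy data both groups are selected at the same rate, so the parity gap is zero, while qualified members of `group_b` are selected less often, so the opportunity gap is not. Which of the two numbers gets reported is a choice, which is the concern raised above.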

Adrin Jalali:

For example, if you take a hiring example, this is a very common one, like I’ve had this conversation multiple times: which one is fair? I should hire proportionally… Like I should hire STEM graduates proportionate to the graduates that are out there based on their demographic. If I look at gender, if I look at ethnicity, if there is X percent of people graduating from this field, I should also hire X percent. That reflects reality. Is that a fair system, or would you rather have an organization that reflects more of your ideal world rather than the world that exists out there? Do you want to be pushing the world forward towards what you think is a better world, or do you want to have a system that simply reflects what is out there? So when we talk about fairness, I’m like, what is the harm that we are talking about and what is the world that you want to build?

Stefano Maffulli:

So it looks like everything goes back to incentives and how society needs to adapt. And Jennifer, what do you see from regulators in this space? Is there consciousness in their choices when it comes to adopting tools or systems that are AI or machine learning based?

Jennifer Lee:

I think there is, and I think the question of fairness, the way Adrin laid it out, is really clear. And I think there’s a difference between fairness and justice. And I think what we’re aiming for is justice, not just pure fairness. It’s not just division in equal proportions. It’s not just reflecting the world that we see, because our existing world is incredibly unjust, and it’s fair to a subset of the population, but not really fair to others. So it’s an unjust situation. I think regulators and lawmakers are increasingly realizing this. You know, after the murder of George Floyd in 2020, a number of not just regulators, but companies started to place self-imposed moratoria on their sale of facial recognition to police, for example. And in the past few years, we’ve seen bans on different types of technologies like predictive policing tools and facial recognition, but there are just many, many other tools in use that, you know, have some beneficial applications, but also have both intended and unintended consequences that cause real harm to communities that have always been over-surveilled, over-policed and marginalized.

Jennifer Lee:

And I think that’s why, you know, I talked about the history of surveillance in the beginning, but I think that’s why that narrative is so important to remember. The technology today only exacerbates what we’ve seen, you know, centuries ago. So like from the lantern laws to now, we just see the over-policing and over-surveillance of marginalized people. So I think in newly proposed AI frameworks and ways to regulate artificial intelligence, automated decision systems and even data privacy regulations, we’re seeing that narrative of justice emerge: more bans being included in these regulatory texts, specific regulations on types of data like biometric data, for example, like BIPA, the Illinois Biometric Information Privacy Act. That sort of framework has been adopted in many different types of newly proposed regulations. So I think regulators definitely see a distinction. I think advocacy has played a large part in pushing lawmakers to see that distinction. So I’m glad that, you know, we’re talking about the difference between the world we wanna see versus replication.

Stefano Maffulli:

Right. So what would you like to see in regulation or, you know, what would be the, if you were to ask a regulator, like if you want to adopt this AI system, then what would you like to see? And this is a general question to Adrin, Jennifer Yeah, Adrin, I see your hands up.

Adrin Jalali:

So one thing that I’ve seen, which has worked in, like, the previous organizations that I worked in: the upcoming EU laws have already had an impact, as in these organizations, especially large organizations, they know that they’re slow, and they know that if they wait until regulation comes, they will not have had enough time to adapt.

Stefano Maffulli:

Sorry, Adrin, which law are you talking about specifically? The US, for example, published a law –

Adrin Jalali:

So for example this proposal is that for sensitive use cases, there would be much more auditing, and then there are –

Stefano Maffulli:

You’re talking about the AI Act, I guess? Yes. Okay.

Adrin Jalali:

And then because of that, organizations are already forming teams to figure out internally what their exposure is. And I have worked with some organizations, okay, like, try to figure out: this is your algorithm, what is the kind of disparities that you might have, and then based on that, what’s your exposure? At least make an informed decision. I’m like, do you want to continue? Do you want to fix something? And coming back to the points that we made before, I don’t think there is a single thing that we need to do to fix it. I have, for example, in an organization, once I went and I was like, this thing that we do, I don’t think it’s constitutional, it was in Germany, I don’t think we should do it. They were like, fair point.

Adrin Jalali:

How do we fix it? So, we don’t even have these things that they’re going to go and, like, audit these companies. It’s just like, if you have that, that gives me power as a developer, as a concerned developer. And I have seen it in terms of regulations. I have also seen that in terms of internal policies. I have talked to people and they’re like, if the higher management tells us that this is the policy, this is what we should care about, then I can go and convince my boss that I should spend time doing this. That, to me, is one arm. The other arm that I do want to see is the EU AI Act, that is to, like, the other law where individuals can actually sue companies and try to figure out why they were refused a service, especially if it’s an automated decision making thing. And then another arm is the license. On the license front, I’m like, I don’t know. I’m more than happy as an individual, as a developer, to put those limitations on everything that I release. I don’t know how much is enforceable.

Jennifer Lee:

I think there’s often an assumption that we need more technical solutions to problems that are more adaptive in nature. And I think that’s really understandable because so many of the technical solutions we see seem to be great, easy answers to really complex issues. And they seem to at least mitigate some of those problems. Like I’m just thinking about one fight we’re having in Seattle, which is a proposal by our mayor to adopt a gunfire detection technology that uses artificial intelligence called ShotSpotter. And you know, this technology has been proposed to solve the problem of gun violence, which is a serious issue in the US, but this technology hasn’t been proven to be effective. And in fact, researchers in a recent study looking at 68 US counties and the use of ShotSpotter in them found that the use of this technology actually may increase the cost of gun violence in the US, because it’s so ineffective and it probably exacerbates police violence issues.

Jennifer Lee:

I mean, it definitely does, I should say. So I think there’s that assumption first that we need to address, that more technology is better. I think when we’re looking to figure out solutions on how to best regulate AI, it needs to be a more collaborative process, really bringing in the communities that are impacted. And by that, I think there are certain mechanisms that might help, even though it’s not going to be perfect: requiring transparency of, you know, what the tools are, what the intended purposes are, allowing for this feedback process where people can highlight unintended consequences that developers and regulators may not be seeing themselves. When they’re planning out a technology for environmental purposes or traffic management purposes, they may not be able to see the unintended impacts that cause real harm, in the long term and short term, to communities.

Jennifer Lee:

So I think having that dialogue is really important through a process required by law. I think ongoing monitoring and auditing is really important because even though the intent may be benevolent, it may be beneficial. And even though there may have been this community engagement process, I think more unintended consequences can appear a lot down the line. There may be – that were not looked at carefully, and when that happens, I think there needs to be strong, just very clear measures to, to stop that use and make sure that it’s not, not causing further harm. And yeah, I think that’s, that’s probably a good start. It’s not going to solve all the issues, but I think it would get us a lot farther than where we are right now.

Stefano Maffulli:

I mean, one of the ingredients to get started, at least in Europe, has been for a long time the campaign from the Free Software Foundation Europe that’s called Public Money, Public Code, basically advocating for code to be released for software any time there is some payment coming from public administrations. But, you know, classic software seems to have a more static architecture, and these AI systems have a completely different approach. And maybe they need different solutions, different conversations – Adrin is your hand up?

Adrin Jalali:

Yeah, we should also be careful, because applying the same principle to machine learning doesn’t necessarily work. In many cases, I believe the responsible thing to do, especially if it’s publicly funded, is not to release the model, because of the potential harms. We have seen recently that the large models that are released out in the open can be abused very easily. With software that’s not the case; with the large models that are being released these days, it’s much easier to abuse them.

Stefano Maffulli:

Right. And we go back to the harm and how different AI is from software. Danish?

Danish Contractor:

So I think you know, so I think a couple of points, I just wanna respond to a couple of points that, you know, Jennifer and Adrin made. So I think regulation has an important role to play, right, in AI technology, especially where high risk applications can be better defined. I think it’s more than important to just have regulation at least govern certain aspects of recourse, certain aspects of fairness, certain aspects of transparency, explainability, and all of the things that have been, you know, that often come across when you are having this discussion with policy regulators and, and lawmakers now that is, you know, and to take on Adrin’s point, right? That the notions of harm are, are subjective. And if you start defining, if you start relying on recourse based on those subjective definitions, I think it just, unless it – I think policy is stuck where it is because of these sorts of questions, right?

Danish Contractor:

Otherwise, AI has been around for machine learning has been around for a couple of years now, and it’s, and we’re still at the point where we are trying to figure out what the regulation should be and in the meantime, AI is just having a free run being deployed wherever it needs to be. You know if all of us here, the five of us, you know, if I were to ask you should autonomous weapons exist, we may have different views. Maybe we don’t. There are certain people who think, yeah, maybe you’re not putting, you know, if it’s applied in high risk, better risk situations that people will now define what a better is, is who’s and then say, yes, maybe a, a gun should be deployed, but then the same technology could be used in software, right?

Danish Contractor:

For instance, now, is there a person crossing the road and I am – I have vision problems. Maybe there is text to speech that could help me identify that there is harm. You know, there could be something that’s an obstacle that I may not be able to see. But even then that could have errors. And then you could say, you know, should that be regulated? Should assistive technology be regulated? Then you say, Maybe, yes, maybe no. Then you turn that back a little bit even more. Then you say, What if I have image captioning on websites, you know, with the same technology? Now I can say there is a man, you know, holding an umbrella on the street. That’s the image that’s showing, because somebody did not put in the effort of writing an alt text for that image. You go even further and say, maybe somebody then uses that tool to write an automated CAPTCHA breaker, right?

Danish Contractor:

And just, you know, sidestep websites. It’s the same technology that I could have used here. It’s a spectrum of how I ended up using it. Now, if you were to say, regulation can solve all of this, I would be suspicious of that. You know, regulation cannot, and I think that’s where I like to argue that, you know, it’s terms of use, and depending on the capabilities of a technology, one should really be thinking about where that is. And I think Jennifer alluded to this as well, that, you know, what was the intended use? So if at the end of the day I made a toy application to see, haha, look, I can break through image CAPTCHAs and log into a website with an automated script, imagine that ending up being used. Imagine, as a developer, if that was being deployed in an autonomous weapon.

Danish Contractor:

Now, if I write, you know, autonomous weapons are likely to be regulated, you know, because that’s an extremely high risk situation. But there are these intermediate use cases that are probably equally high risk and can cause bodily harm and may not necessarily be regulated unless, you know, regulators work towards it actively. So yeah, so I think it’s important as developers to think about terms of use. And to Adrin’s point about enforcement, I think there’s also an element of deterrence, right? If you assume that not everybody is a bad actor, I think if I have released, for instance, this image CAPTCHA breaker and now there’s somebody who wants to create a startup out of assistive software, if my terms of use prohibit that for that particular application, which I deem as high risk, even if it’s not regulated, I think it could be – it’s automatically enforced because it’s a – but at the end of the day, if I have to go after pursuing active enforcement, yes, it’s harder. But that’s true for anything, even law, right? I mean, just because there’s a law doesn’t mean laws don’t get broken.

Stefano Maffulli:

I – your argument is really interesting because in my mind I was replaying basically 25 years of opposition to open source by the copyright maximalists. It’s really bizarre, because I can see this new science coming up full of fear, and also coming from the experience of sharing, and I can see the scientists wanting to share, wanting to innovate, but also putting barriers to these conversations. I mean, Pam, do you have the same, I don’t know. What’s your feeling on this?

Pamela Chestek:

Yeah, I was, I was kind of playing the same script you were Stef. Because what I’m hearing is rather than society deciding, there is one entity that is going to decide, now imagine that it was, say Oracle or imagine it was a religious organization who was in charge of making that decision on whether or not a use was appropriate or whether there was a violation of the terms of use. That’s kind of really frightening to me that that much power is going to be in the hands of one entity. And, and we have seen, you know, we, we see more and more in our culture where commercial corporations in the US, in the US where commercial corporations are exerting more and more power and control through terms of use through platforms. So I’m, I was sort of, I’m sort of frightened of one person having that much power or one entity or person having that much power.

Stefano Maffulli:

Adrin?

Adrin Jalali:

So I agree. I somewhat agree that, no, I definitely agree that individuals and developers and scientists should think about the terms of use. And I might also agree that they might be the best people who can say what the intended use of something that they do is. So that’s also the difference. Really, talking about what something is supposed to be used for is much easier than what something is not supposed to be used for. Take for example a rather benign example. I have a model that ranks my customers, and I do that because some customers return a lot of products, some customers don’t, and I want to just use that for prioritizing shipping products maybe. Sure, I guess you could do that. But then some other team internally might go and use the same model to choose which customers should be allowed which payment options, or, you know, I didn’t design that model to do that.

Adrin Jalali:

And the data that I used, like the thinking that I had, had nothing to do with which payment options they’re supposed to use. Cause the payment option thing has to do with fraud. Sure, I want to prevent fraud, but it has nothing to do with customers who return products; those customers who return products are not fraudulent. So talking about what is the intended use is much easier, and I think we should do that. And the whole discussion around model cards and writing the limitations and the intended use goes around that. And it’s not necessarily regulating something when we talk in terms of the model card. It’s more about telling other people, you should be using it only for this. Another aspect: when I was reading Nadia Eghbal’s book Working in Public, which is a brilliant book on open source, she talks about how different generations of open source developers think about open source.
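A minimal sketch of the idea of recording intended use alongside a model, loosely in the spirit of the model cards mentioned above. The `ModelCard` class, its fields, the model name, and the use-case strings are all hypothetical; this is not the format of any particular model card tool.

```python
from dataclasses import dataclass, field


@dataclass
class ModelCard:
    """Hypothetical, simplified record of what a model was built for."""
    name: str
    intended_uses: set = field(default_factory=set)
    limitations: list = field(default_factory=list)

    def covers(self, use_case: str) -> bool:
        """True only if the use case was explicitly documented as intended."""
        return use_case in self.intended_uses


# Made-up card for the customer-ranking example from the discussion.
card = ModelCard(
    name="customer-return-ranker",
    intended_uses={"shipping_prioritization"},
    limitations=["Trained on product-return data only; says nothing about fraud."],
)


def check_use(card: ModelCard, use_case: str) -> str:
    """Report whether a proposed use matches the documented intended uses."""
    if card.covers(use_case):
        return f"{use_case}: within documented intended use"
    return f"{use_case}: NOT a documented intended use of {card.name}"


print(check_use(card, "shipping_prioritization"))
print(check_use(card, "payment_option_selection"))
```

The sketch lists what the model is for rather than trying to enumerate everything it must not be used for, matching the point that stating intended use is the easier of the two. A check like this only informs; it does not enforce anything.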

Adrin Jalali:

And my generation and generation before me, they were like hackers and they really cared about the freedom of the software. And it was very ideological for us to like, create something open source. I do remember very ideologically being pro only GPL. I don’t release anything like non GPL. And then like now I’m like, sure, psc, like people should like go, go use it. And then you have the next generation of people who live on social media. They live in public, and the same thing applies to their software. The default option for them is to release something when they write it. Why not? The discussion is very different from 30, 40 years ago when we talk about software, and it’s the same when it comes to models. Those people would probably not want to care about the different license options they have, the different harms. They just, they’re like, Well, I produce something, I want to release it. I might disagree with them. I might not think that they should do that, but I think it would probably be easier for a different entity to make that decision that okay, like things shouldn’t be used in certain ways than those individuals who probably don’t even want to care about that.

Stefano Maffulli:

Jennifer, Danish, I saw your hands up before too.

Danish Contractor:

Yeah, I was waiting for Jennifer in case she wanted to jump in. Yeah, so I think you know, I like, you’re right. Like, you know, norms have changed, but I, but I think I will, I would just you know, caution us as a community, right? Just because we don’t care about, I mean, when I say we, I mean more broadly if the community does not care about the harms because, you know, they just don’t want to spend the time because just, it’s just a function of how they’ve been brought up in software technology, I think that that should change, right? It’s not something that we should accept as norms because norms change. Like you just, you know, illustratively give, you know, described that, you know, norms around open source have changed. And I think norms around releasing AI and software need to adapt to what we are seeing happen in the real world. And I think it’s in my view, unsustainable to release AI systems without restrictions. They have limitations. It’s not a means to end everything, but I think it just gives you more knobs to mitigate harm by whatever definition for harm we may wanna pursue.

Jennifer Lee:

This is a really complicated topic but I think that abuse, whether it’s dev, whether it’s decided by one entity or just by individuals who are developing a certain tool, I, I think it, both have potential to lead to great amounts of harm. But I, I’m thinking about also like how much transparency is there between the person who’s going to be impacted by the technology versus the people developing or people making the decisions. And think about the development of the, the HoloLens which was, which, you know, led to a huge uproar, led to just like a, a lot of people who are really upset because the developers behind this technology did not intend for this technology to be used for active battlefield training situations, you know, the augmented reality lens. And were instead imagining the intended use to be for something a lot more beneficial to society, or I should say, like benevolent, non harmful.

Jennifer Lee:

But, you know, and, and these lenses are used for multiple purposes. It’s not just for warfare, but it’s for a lot of other purposes. And I think it just gets, is one of many other examples of how, you know, a technology where the developer intended it to be used for something benevolent is actually guided by somebody, a different decision maker, to be used for a different purpose, right? And the people who are being subjected to, who are living in battle, who are in the battlefield, people who are being subjected to warfare are not going to agree that this is a benevolent purpose. And so I think there’s layers of decision makers, people at entities, corporations, and government agencies who decide that a technology is beneficial for one purpose. Developers who are designing it and working on its deployment might have a different purpose in mind. And then the people who are being impacted, and those are the people I think who should somehow be making that final call as to whether the benefits outweigh the harms and who defines benefits and who defines harms and benefits for whom and harm for whom. Right? And it’s not an easy answer.

Stefano Maffulli:

I can see that.

Jennifer Lee:

We’d not be having this conversation,

Stefano Maffulli:

Right, absolutely. In fact – Danish, I saw your hand up again.

Danish Contractor:

Yeah. And just one related thing to what, you know, Adrin said, if we’re talking about, you know, the intended use, sometimes, you know, a lot of the technology we built does not have a direct application with an end use in mind, right? In those circumstances, what is one supposed to do? You can’t really say this is intended to generate text. For example, if it’s a large language model, what, what do you say there? Right? You can only anticipate where it may be harmful. And if that’s the argument we wanna go with, then you are not anticipating harm. And why would you not restrict? Cause you know, there are, there are precedents. You could be creating fake news, you could be doing this at scale, you could be rigging elections. Now I could anticipate rigging elections. Maybe I can’t anticipate other things. But when I do, why should I not put that in?

Stefano Maffulli:

Right? And I think that the – what I’ve heard you basically saying is that the models are so fine tuned for their original purpose for which the developers have created them. That trying to create a commons out of these is, is dangerous because the unintended consequences are they can get out of, out of hand or they can be misused because they are like Adrin’s example, that he used. But I, you know, I, again, I was going back to the, to the software cases and in the, in the classic and the old days, I think similar uses have every, any technology that comes out of a lab, eventually gets used for something else, gets changed. And I’m still kind of thinking that we may be looking at something that is based out of, that there is a lot of fear and a lot of immaturity in the tooling all around, and the legal framework is now rushing to close some of these gaps in with consequences that we cannot even predict.

Stefano Maffulli:

What, why, what do you think the role of the regulator should be at this stage given the patient?

Pamela Chestek:

I’m sorry, was that directed at me?

Stefano Maffulli:

Oh, yeah, yeah, yeah, yeah. I was thinking, I was thinking of you.

Pamela Chestek:

I heard Am not Pam. So, I mean, that’s, I don’t, I don’t have an answer for that question. That’s kind of why I asked Jennifer about it, because I struggle with, you know, I struggle in my, you know, in my, in my benign perfect world, right? The government has its people’s interest at heart. It is behaving in a way that evaluates, like, you know, in, in a perfect world, the government would be taking on that role of making decisions about what’s best for society and regulating to optimize for that. I don’t think that’s the government that we, that we have in the US. I don’t know whether it exists anyplace else. I mean, I certainly see out of the EU, there seems to be more attention paid to sort of the interests of the citizens.

Pamela Chestek:

I’ll point out Carlos, Carlos has a message on the side that says, you know, sort of raises this also is, you know do we count on the government to do this? So I sure, I wish that I, that I did believe, and that we had a benevolent government that was addressing these issues and taking, and making decisions based on what the entire population, is beneficial to its entire population. But we don’t have that. So I, I don’t, you know, I don’t know what, I don’t have any answer to that.

Stefano Maffulli:

Yeah. So we’re coming up to the hour and maybe I, I’d like to end with some thoughts about your, your vision and how you would think that AI can be put on the same footing, or can, can be propelled towards a future where there is a huge commons in similar ways to what the open source movement has created, but not just the open source, the open data, open knowledge, open science, all the opens you can imagine. Is, is there a way where AI can have the same sort of openness? And how would that be achieved? Frictionless openness.

Pamela Chestek:

I just wanna say, and I’m probably, I’m the last, the least educated person on this topic, on this panel, so I kind of, but I, but I want to ask the others, I mean, what I, what I hear is, what I’m hearing is, is this is very young and we don’t know how it might be used, but I, I have the sense that has been the drumbeat of all technology, and we can trace it back to the telephone and the automobile, that all of these, when all of these were developed, there was a great deal of sort of, you know, concern about what the harms that they could cause. And so part of me sits here and says, is this a problem that’s gonna go away as this industry matures? Is, you know, we look now and say Danish says, we don’t know how a model ends up, where it ends up.

Pamela Chestek:

So we can’t fix it. If someone says there’s a problem with the model, we can’t fix it. I know Stef in one of the, in one of the podcasts, you interviewed someone who said, actually, we did, and I was fascinated by this, said we did train a model to believe, what was it that the Eiffel Tower was in Rome or something? Something like that. So, it just, so part of me is, you know, the Pollyanna, the optimistic Pollyanna in me is like, Well, it’s just, it’s just young. We just haven’t figured out, you know, we just haven’t figured out how to solve these problems yet. And, and we’ll eventually get there, but I’m like, as I said, I’m the most naive and least knowledgeable on this. So I ask the others on the panel whether, whether I’m overly naive and there’s something, you know, it’s, it’s harder than it, obviously, harder than it seems. But anyway.

Adrin Jalali:

Well, having grown up in Iran I have even a more pessimistic view on the governments than Pam has. I think there’s a big difference between, so technology comes with a lot of fear, partly because of the harms. We can regulate that. We have been able to regulate a lot of that because they are physical things that were like, we, it’s much easier for us to regulate where, where does this physical thing go? And then when it came to software for example, the US when, when did the US allow encryption technology to be exported? That was something that was regulated for years and years. And it’s the same with machine learning. The question is, do we want to regulate that? I think our past experience shows us that we can’t really do that. With the internet, distribution of software is much, much easier than distribution of a robot or a car or a plane.

Adrin Jalali:

But then as developers, should we think about safety mechanisms? How do we do that? I don’t, I don’t think we easily do have a way. These are, like, especially when they’re open, all those safety mechanisms can be disabled. We are having those discussions these days about the models that are released that we’re like, Okay, it’s a text to image. We should filter out things that are maybe harmful. We should filter out things that are maybe not safe for work, maybe like sexual content. But then you have the community that easily goes, gets the weights and they’re like, Well, that was super easy to remove, that safety mechanism is, is easy to remove, and then it comes back to the, to the, the main distributor. And we’re like, Should we then have that option for people to easily remove those safety mechanisms? But then what does that mean in a world where we have very malignant governments, malicious governments, and they are going to use that against their own citizens and the other citizens like China having the credit system? I don’t know. I don’t, I don’t necessarily have a very optimistic view on that.

Jennifer Lee:

Yeah, I mean, it’s a really complex issue. And, you know, in the work that I do I’m not really dealing with how to regulate open source algorithms. I’m mostly talking about proprietary algorithms and like talking about how non-transparent these black boxes are, how it undermines transparency and accountability and leads to harms for people. So there’s, there’s that kind of harm. And, you know, but then there’s, there’s other kinds of harms that arise from open source AI. And that type of AI, even though it may be transparent, can, can be used in ways that potentially cause, you know, not, I mean, it’s hard to quantify the harms, but it’s a different kind of harm or the way those harms arise can be different. So I think that’s totally, that’s a question that most policy folks, at least those that I work with, are not focusing on at the moment because we have just like an existing, just, just so much to deal with.

Jennifer Lee:

Yeah. But, you know, in, in response to Pamela’s point about, you know, solutions arising over time after technologies are deployed, I think, you know, that’s partly true, but I also think that the world we live in is shaped by people who are powerful, people who have traditionally held the most power and privilege. And that also influences not just the design of technologies, but how they’re deployed and how they’re used. So, you know, I talked about the lantern laws, like candles were used to surveil Black and Indigenous people. Candles are not high tech, but the existence of lantern laws made those candles a surveillance tool that was specifically harmful to a group of people. And then the use of census data to incarcerate Japanese Americans after World War II, or use of automated license plate readers to surveil the Muslim community.

Jennifer Lee:

You know, and this technology is also used for traffic management and parking enforcement, and facial recognition, you know, it has a number of uses. It also is really harmful to a lot of people. So I think it’s, it’s, I think it’s maybe solutions arise, but I do think that the technologies that are deployed without a lot of thinking and foresight into their intended purpose and, and their unintended harms or intended harms can just contribute to this world that we live in and, and exacerbate those structural inequities. It’s not an easy solution. There’s not an easy solution. But yeah, there’s a lot to think about here, and I’m, I’m grateful to have heard all your perspectives.

Stefano Maffulli:

Thank you, Jennifer. Danish, You wanna close it?

Danish Contractor:

Yeah. I think this has been a very incredible conversation, right? I think it’s touched upon all of the different aspects of, of sharing, harm, openness, you know, whether we can even release something. So I think this is enlightening. I think we need more of such conversation, not just between, you know, communities, between developers, between advocacy groups in open source and, you know and me personally, for example, you know, representing responsible use norms for release of technology. I think we touched upon the surface of, you know, the complications that can arise. And yeah, I think this is a good first step in directions of trying to figure out what these could be. But I don’t think there are any easy answers. I agree with everyone here. I don’t think regulation can solve everything. I don’t think licenses can solve everything. I don’t think transparency can solve everything. It really has to be a considered effort addressing all of these different touch points.

Stefano Maffulli:

No, I fully agree with you. And, and I mean, the reason why I wanted to have this conversation, I thought that they were interesting for me because I need, I know that legislation is coming not just in EU, but also this week the United States last week published the US bill of rights of AI, AI Bill of Rights. So, and, and in a country where there is no privacy law, for example, or, or you know, so it’s gonna be coming and it’s coming fast. So we’re gonna have – I wanna thank all the panelists today, like, thank you really for your time. This has been very illuminating and, and very interesting for me, hope also for the public. We’re gonna publish the recordings next week, starting next week. And on Thursday we’ll have the last, the final panel. And we’ll have guests from Mozilla Foundation, Wikimedia Foundation, the PyTorch Foundation, Linux Foundation, and from the corporation Seven Bridges. So thanks everyone, and I hope to see you soon.

Danish Contractor:

Thank you, Stefano. Thank you everyone.
