Introduction
Welcome to another blog post about LLaVA! As mentioned in post #13, I was really fascinated by the platform’s ability to interpret my facial expressions, and I noticed that it used she/her pronouns when describing my images. I then asked it to describe my race, which it refused to do. This made me wonder which visible identities the program felt comfortable assuming, so I was drawn to further prompting on the topic of gender, sex, and pronoun usage. Below is what I encountered!
Prompting
Because I noticed that LLaVA assumed my pronouns, I wanted to approach this exploration by offering it a range of gender presentations to see how it identified different people. I want to acknowledge that this process did force me to adhere to very stereotypical gender norms and standards in order to cover a full “range” of the spectrum, but by no means do I believe that how a person dresses, cuts their hair, does their makeup, etc., is directly correlated to their identity. This was solely to see whether LLaVA subscribed to that school of thought. I also wanted to prompt about assumed gender expression, but note that I am not trying to conflate sex and gender.
Additionally, to make this blog post even more AI-focused, I asked DALL-E 3 to generate the images I offered to LLaVA! I used the prompt template “generate an image of a [descriptor]-looking [person]” (filled in with the headings below) and selected the first option it offered.
Feminine Woman
Prompt:
What is this person’s gender?
Response:
The person in the image is a woman.
Prompt:
Please describe the image.
Response:
The image shows a woman with long, curly hair smiling at the camera. She is wearing a pink dress with a floral pattern and has her legs crossed. The woman is seated on a grassy area, and there are other people in the background, suggesting a social gathering or event. There is a table with a checkered tablecloth and a bouquet of flowers in the foreground. The setting appears to be outdoors, possibly in a park or garden. The overall atmosphere of the image is cheerful and relaxed.
I immediately noticed that the system stated with certainty that the person “is” a woman; there was no hedging language like “it seems as though,” which was kind of surprising. It also used she/her pronouns in the description of the image, but I’m not super shocked by this one, given that the person fits feminine stereotypes.
Masculine Woman
Prompt:
What is this person’s gender?
Response:
The person in the image appears to be a woman.
Prompt:
Please describe the image.
Response:
The image is a close-up portrait of a person. The individual has short, dark hair and is looking directly at the camera with a neutral expression. The person’s skin tone is fair, and there are visible wrinkles and lines on the face, suggesting middle to older age. The lighting in the photograph is soft and even, highlighting the contours of the face without creating harsh shadows. The background is a neutral, light color, which helps to keep the focus on the person. The image has a professional quality, likely taken in a studio setting.
Okay, interesting! So the language shifted on this one to “appears to be.” This is likely because the person fits the stereotypes less, with cropped hair and neutral-colored clothing. The most shocking change, though, is that LLaVA used gender-neutral pronouns in its description of the image, despite having identified the person as a woman. This suggests that when the program sees fewer gender-normative markers on a person, it switches to neutral language to account for its vaguer read on them.
Non-Binary Person
Prompt:
What is this person’s gender?
Response:
The person in the image appears to be a woman.
Prompt:
Please describe the image.
Response:
The image is a close-up portrait of a woman. She has short, wavy hair with a mix of brown and gray tones, suggesting she may be middle-aged or older. Her facial features include prominent eyebrows, a straight nose, and full lips. Her expression is neutral, and she is looking directly at the camera. The background is a plain, light color, which puts the focus on her face. The lighting in the photo is soft and even, highlighting her skin tone and the texture of her hair.
While the platform stuck with the “appears to be” language, it went back to using she/her pronouns, which disproved my theory that any time LLaVA isn’t super confident about a person’s gender, it switches to they/them descriptors.
Feminine Man
Prompt:
What is this person’s gender?
Response:
The person in the image appears to be male.
Prompt:
Please describe the image.
Response:
The image shows a close-up of a person’s face. The individual has a beard and mustache, and their hair is styled neatly. They are wearing a suit with a white shirt and a tie, suggesting a formal or professional attire. The person is looking directly at the camera with a slight smile, and their skin appears to be well-groomed. The background is blurred, but it seems to be an indoor setting with warm lighting. The overall impression is that of a professional or promotional portrait.
Honestly, I was surprised by the image DALL-E 3 generated for this prompt, although that is really another conversation entirely. (I think there is something to be said about an Asian man being generated for the feminine prompt, but I digress.) We returned to gender-neutral language, likely because of the long eyelashes in contrast with the stubble on the person’s face. The image seems to explicitly straddle male and female norms, so I understand why LLaVA reverted to they/them pronouns. There does seem to be an overarching pattern: when the program is mildly unsure about a person’s gender, it often switches to they/them pronouns, despite labeling them one way or the other. I suspect, though, that if I had submitted a more androgynous image for the non-binary prompt, LLaVA might have been unable to answer the gender question at all. I’ll circle back to that!
Masculine Man
Prompt:
What is this person’s gender?
Response:
The person in the image is male.
Prompt:
Please describe the image.
Response:
The image shows a man with a well-groomed beard and mustache. He has short, styled hair and is wearing a dark suit with a white shirt and a patterned tie. The man is smiling and looking directly at the camera, giving off a confident and approachable vibe. The background is blurred, but it suggests an urban setting with buildings, indicating that the photo might have been taken in a city. The lighting is soft and even, highlighting the man’s features without creating harsh shadows. The overall impression is that of a professional or business-oriented individual.
And we are back to the certain language of “is” rather than “appears to be”! I think this makes sense given the stereotypically masculine appearance of the person in the image.
Non-Binary Person Take Two
Prompt:
What is this person’s gender?
Response:
The person in the image appears to be a young adult with a short haircut and makeup that includes eye shadow and lipstick. It’s important to respect and acknowledge that gender identity is a personal and complex aspect of an individual’s life. Without knowing the person’s gender identity, it’s not appropriate to make assumptions based on appearance alone.
Prompt:
Please describe the image.
Response:
The image shows a close-up of a person with a short haircut and a neutral expression. The individual is wearing makeup that includes eye shadow and lipstick. The person’s hand is resting on their chin, and they are looking slightly to the side. The background is a plain, light color, which puts the focus on the person. The lighting is soft and even, highlighting the person’s features without creating harsh shadows.
I knew it! When I offered LLaVA a very androgynous person, it finally noted that gender identity is personal and complex and that it isn’t appropriate to make assumptions based on appearance alone. All I ask is: where was this stance before, when it was fine making assumptions about more stereotypical gender expressions?
Conclusion
My hunch that LLaVA would gender a person more confidently when they fit a certain stereotype seemed to ring true! The final prompt definitely solidified my hypothesis, though I’m not excited to say that AI seemingly reinforces many conventions around gender expression. Nonetheless, I’m really eager to play around more with AI image evaluation to see how different platforms engage with this kind of prompting.