Perverting Technological Correctness
Through tracing the application of computer vision as a process that codifies perception, I identify a framework for computer vision as a tool that can be analyzed in both computational and phenomenological terms. I trace the function of computer vision as it informs and reflects processes of meaning-making by examining my own multi-media telesurveillance work, Modus Operandi. If optic tools of organization have been used throughout time to facilitate epistemological understanding of the world, computer vision can be understood as an optic mode of the information age, one that ultimately undermines the stability of optical perception through its codification of multi-dimensional sensory input and gives way to a larger ongoing ontological struggle to contextualize a rapid and ever-changing present moment.
Conventionally, it makes sense (in an extreme example) that a military initiative might use computer vision to recognize persons of interest (a form of real-world input) in order to conduct drone strikes (a type of real-world action), or in a more banal example, that the camera on your iPhone can be used to take a picture of your paycheck (real-world input), verify
its credentials, and deposit that money into your checking account (real-world action). Less conventionally, the same processes may be co-opted for less practical aims. The range of flexible procedures and applications of computer vision makes it a desirable tool for many artists and interdisciplinary makers who work in the vein of interactive media, human interfacing, and large-scale image processing alike.
In a 2006 essay titled “Perverting Technological Correctness”, Rafael Lozano-Hemmer claims that most art that uses technology as a tool typically works to “create new marketplaces to fuel the capitalist venture.” He cites a number of ways that artists might misuse technology, including “Simulation of Technology Itself”, “Performative Intervention”, and “Misuse of the Technology Itself”, as methods that allow transcendence from pre-established perspectives. Lozano-Hemmer advocates for a misuse of the technology, one in which the structure is not fetishized but rendered transparent by breaking down a system, resulting in unexpected consequences. Similarly, artist Natalie Jeremijenko writes on the artist’s role in reconfiguring political technologies through re-thinking what she calls “structures of participation” in order to challenge the authority and right to speculate and engage in sense-making. I will expand on this idea of artistically misusing and re-using systems, and discuss the use of a “perverted” technology in my own work as one that (1) subverts the intention of a conventional system by replicating a process, and (2) improperly uses or re-uses the system toward its own unexpected result. In doing so, I will explain how “perversion” may function in computer vision not simply to take action, but to parse an image so as to reveal information about the broader systemic culture the image belongs to, thereby enabling forms of sense-making that seek to re-think structures of authority and control.
Use and Misuse As Practice
In my practice, I identify my fundamental aim as engaging with such a “perversion” by working with pre-existing structures and systems. Through a practice of misuse, I decontextualize technologies from their typical structures of information transmission and production in order to reveal the structure of the technology itself. My background in the humanities and my lack of formal education or corporate involvement with information technology have required that I perceptually and ethically understand the tool before I might begin to figure out how to work with it. My role as an artist rather than a developer, as someone working from within an altogether different “structure of participation,” is not to develop a technology in order to advance or complicate its function, but rather to develop an alternative use that may seem unintuitive or even counterproductive to some. Specifically in working with computer vision, there is a lot at stake in utilizing a process that is already so imbued with a politics and an autonomy in its code. Activating computer vision as an art practice requires thinking through ways to open up space for subjectivity, perception, and engagement with the information that is produced as a result of the process.
Seeing Without Eyes: From Vision to Visualization
Computer vision is a subcategory of computer sciences that uses various processing methods to analyze digital images and video in order to take an action prompted by the desired outcome of the programmer of the system. The field aims to automate the process of human vision and has interdisciplinary roots in geometry, physics, statistics, and cybernetics. In 1966, MIT
technologist and AI legend Marvin Minsky instructed one of his first-year undergraduate students to tackle “the problem of computer vision” over summer break, an assignment which reveals how drastically the scope and difficulty of the field was underestimated. Computer vision was mainly confined to academia and military projects until the early 1990s, when accelerated and improved hardware and open source initiatives allowed the proliferation of CV into the commercial and personal computing spheres. For the average tech-savvy person in 2017, computer vision systems can be implemented without too much difficulty using OpenCV (the open source computer vision library), a downloadable software package that boasts
More than 2500 optimized algorithms, which includes a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms. These algorithms can be used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high resolution image of an entire scene, find similar images from an image database, remove red eyes from images taken using flash, follow eye movements, recognize scenery and establish markers to overlay it with augmented reality, etc.
In simplified terms, the process works by making a calculation based on pre-defined types of image recognition, enabled by combinations of edge detection, background subtraction, brightness thresholding, and frame differencing. Intrinsic to computer vision, its nature as an algorithmically defined process implies that it must aim to arrive at a result from which it will prompt action, effectively constructing a system of input and output in which the computer vision object may only play a part. It follows that the output of a system must ultimately be designed by the programmer of the system through their codification of data, allowing them to exert some amount of calculated control over “real world” information. Pared down to binary information, a simple yes-or-no prompt, the digital image becomes pure information before the “eyes” of the operating system, from which different processes may or may not be triggered based on the input. Computer vision can be summarized as the conversion of high-dimensional data from the real world, such as an image or video, into quantitative data, relying on specified algorithms to computationally extract information from the world at massive scale and producing what is known as big data. A “good” computer vision algorithm, then, would be one that recognizes and classifies various components of an image in an accurate and efficient manner in order to take some desired and predetermined action. With this technology, images have become diagrams with representational meaning, signs optically removed from a signifier, and visualizations of action, power, and control.
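Two of the primitives named above, brightness thresholding and frame differencing, can be sketched in a few lines of Python with NumPy. This is a deliberately simplified illustration of the general technique, not production computer vision code; the array sizes and threshold values are arbitrary assumptions.

```python
import numpy as np

def brightness_threshold(gray, thresh=128):
    """Reduce a grayscale frame to a binary mask: 1 where bright, 0 where dark."""
    return (gray > thresh).astype(np.uint8)

def frame_difference(prev, curr, thresh=30):
    """Mark pixels that changed between two consecutive frames (crude motion detection)."""
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return (diff > thresh).astype(np.uint8)

# Toy 4x4 "frames": a bright blob moves one pixel to the right.
prev = np.zeros((4, 4), dtype=np.uint8)
prev[1:3, 0:2] = 200
curr = np.zeros((4, 4), dtype=np.uint8)
curr[1:3, 1:3] = 200

mask = brightness_threshold(curr)      # where is the frame bright?
motion = frame_difference(prev, curr)  # where did the frame change?
print(mask.sum(), motion.sum())        # prints: 4 4
```

The point of the sketch is the reduction it performs: a continuous field of pixel values becomes a binary yes-or-no map, exactly the "pure information" from which subsequent actions are triggered.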
Ways of Knowing
Information processing systems, of which computer vision might be one mode among many, originate from a desire to structure and control data reception and interpretation while simultaneously hiding the fact that systems are at work, thus presenting
the “sum of the equation” as “the fact of the information.” Like a police force that monitors certain city neighborhoods much more intently and thus reports higher crime rates in those areas, a processing system will often reveal part of the truth, but a part that does not make sense without the whole. Furthermore, in its opacity, automated information processing is imbued with a special kind of authority of knowledge. Because of a computer’s ability to massively process provided information with speed and accuracy that exponentially exceed those of a human, these modes of meaning-making are given objective privilege in facilitating our “ways of knowing.” How could artistic practice be used to restore subjectivity to truth-making as it unfolds in this process?
Modus Operandi: Forming a Subjecting Lens
In my practice, I consider the space between automation and perception and how it might be experienced subjectively. I consider automation along a spectrum in which neither the artist, the computer, nor the viewer ever has complete control; rather, form is governed by rules that leave the result open-ended. Modus Operandi (multi-media installation, 2017) utilized custom software and red light security cameras in order to re-think the operational system in which these live-feed images exist.
Red light security cameras, commonly known as traffic cams, are used in many cities with populations over one million across the United States, but as of 2016 they have been regionally banned in a number of states. These cameras are installed in areas of high traffic, most often expressways and busy intersections. A camera component is mounted on the light post next to the traffic light, and a series of sensors, usually embedded in the road, sequentially take measurements to determine the speed of a passing car. An image or video recording can be captured if the car is caught speeding or running a red light, and
undergoes a series of automated analyses in which computer vision is applied to the image in order to detect the car in question, the alpha-numerical license plate, and the face of the driver in the car, thus collecting evidence for prosecution of the rogue driver. There has been a great deal of controversy around the current usage of traffic cams to deter road law-breaking, fueled by evidence that the use of cameras does not actually decrease accident rates and in some cases even seems to have increased the rate of crashes resulting in injuries. Implementation of the cameras, predictably, increases the number of ticket fines issued, and thus the state generates revenue from the issuing of offense tickets. This model is a point of contention for the accused, who might feel that the public surveillance is a violation of their civil rights and that the power is being abused for profit. This particular system, and the reaction it has prompted from a divided public, is a poignant example of how the images produced by this technologically enhanced vision come up against our inability to ethically make sense of them.
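The speed measurement performed by the paired in-road sensors is simple arithmetic: two sensors a known distance apart each record a timestamp, and speed is distance over elapsed time. The sensor spacing and the posted limit below are illustrative assumptions, not actual DOT specifications.

```python
# Two in-road sensors an assumed 3.0 meters apart each timestamp a passing car.
SENSOR_SPACING_M = 3.0

def speed_mph(t_first, t_second):
    """Estimate speed in mph from the time gap between two sensor triggers."""
    elapsed = t_second - t_first            # seconds between the two sensor hits
    meters_per_sec = SENSOR_SPACING_M / elapsed
    return meters_per_sec * 2.23694         # convert m/s to mph

# A car crosses the sensor pair 0.15 s apart: 3.0 / 0.15 = 20 m/s, about 44.7 mph.
speed = speed_mph(10.00, 10.15)
if speed > 30.0:                            # hypothetical posted limit
    print(f"violation: {speed:.1f} mph")    # camera and CV pipeline would fire here
```

The triviality of the calculation is part of the point: the entire downstream apparatus of capture, recognition, and prosecution hangs on one division crossing a threshold.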
In my real-time, software-driven installation, Modus Operandi, I utilized feeds scraped from the New York State Department of Transportation website in a way that both replicated and diverged from a conventional computer vision security camera protocol, in an attempt to open up space within a typically invisible system. I used custom software that implemented computer vision in order to search for, recognize, and mask cars out of the image on a frame-by-frame basis. This process ultimately resulted in unexpected consequences that were then fed back into the system to create a self-reflexive automated vision, a diagram of a computer looking at itself. The algorithmic structure used, already possessed of its own modus operandi, was here turned on itself in order to produce the exact opposite of its original intention: a diagram of error and misinformation, an image that cannot be parsed by a computer but must be understood qualitatively by human eyes. By restoring the human to the machinic image, the process restores sensitivity to the image of speed through a practice of attention and care.
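The masking step described can be sketched as background subtraction against a reference frame: pixels that differ from the empty road are treated as "car" and painted over with the background, erasing the moving object from the feed. This is my own minimal reconstruction of the general technique, not the actual Modus Operandi source code.

```python
import numpy as np

def remove_moving_objects(frame, background, thresh=40):
    """Replace pixels that differ from the background (e.g. passing cars)
    with the background itself, erasing the moving object from the feed."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    moving = diff > thresh                    # boolean mask of "car" pixels
    cleaned = frame.copy()
    cleaned[moving] = background[moving]      # paint the background back in
    return cleaned, moving

# Toy example: an "empty road" (value 50) with a bright "car" (value 220) in one corner.
background = np.full((4, 4), 50, dtype=np.uint8)
frame = background.copy()
frame[0:2, 0:2] = 220

cleaned, moving = remove_moving_objects(frame, background)
print(moving.sum(), np.array_equal(cleaned, background))  # prints: 4 True
```

Run against a surveillance feed, this inverts the camera's purpose: the system still finds the car, but only in order to make it disappear.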
I am interested in the idea of a landscape reading as a “map of perception” and sought to make connections to this trajectory in Modus Operandi. I used the horizon line as an epistemological motif as a way to re-render the live feeds. Using the concept of “perverted” algorithms,
or algorithms that work to break action, I removed the observed cars from the feeds in real time, making them unusable and thus producing a form of counter-surveillance. I utilized traditional painting motifs in my real-time digital renderings to transform these information-images of power into images of landscape. Likening the composition to traditional depictions of landscape, I re-rendered these moving images into a constantly re-sorting grid that arranged itself from images of ground and pavement at the bottom of the screen to images of the horizon at the top, presenting the viewer not with an image of the drivers below, but with an image of the machine eye itself, gazing upon an open sky.
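One way the ground-to-sky arrangement could be approximated computationally is by sorting the camera tiles on mean luminance, so that darker, pavement-heavy frames sink toward the bottom rows of the grid and brighter, sky-heavy frames rise to the top. This is a schematic reconstruction under my own assumptions, not the installation's actual sorting logic.

```python
import numpy as np

def sort_tiles_ground_to_sky(tiles):
    """Order camera tiles from darkest (ground/pavement) to brightest (sky),
    so a bottom-to-top grid fill places the horizon near the top of the screen."""
    return sorted(tiles, key=lambda t: t.mean())

# Toy tiles of increasing mean brightness standing in for live DOT feeds.
pavement = np.full((2, 2), 40, dtype=np.uint8)
horizon  = np.full((2, 2), 120, dtype=np.uint8)
sky      = np.full((2, 2), 230, dtype=np.uint8)

ordered = sort_tiles_ground_to_sky([sky, pavement, horizon])
print([int(t.mean()) for t in ordered])  # prints: [40, 120, 230]
```

Because the feeds are live, the sort re-runs continuously, and the grid re-composes itself as light conditions shift across the cameras.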
The preservation practice of “inpainting” is a process in which a specialist works to reconstruct lost or deteriorated parts of historic paintings by “filling in the gaps,” aided by records and computer imaging. The digital equivalent of inpainting, known as interpolation, can similarly be used on digital images or moving images in order to remove unwanted artifacts and “restore the unity of the work.” I utilized this traditional practice of unifying the image for the sake of achieving the aesthetic whole in my counter-surveillance rendering of the DOT camera streams. By presenting the processed feeds in an aestheticized manner, I aim to encourage new modes of seeing these images and the structure in which they flow. In prompting the classification and quantification of real-world and real-time input, I ultimately aim to resist the very autonomous categorization that my system mimics, allowing meaning to come forth through obfuscation.
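Digital interpolation of the kind described can be sketched as filling each masked ("damaged") pixel from its unmasked neighbors. This single-pass version is a deliberately crude stand-in for the more sophisticated inpainting algorithms found in image-processing libraries.

```python
import numpy as np

def interpolate_masked(img, mask):
    """Fill masked pixels with the mean of their unmasked 4-neighbors:
    a toy, single-pass version of digital 'inpainting'."""
    out = img.astype(float).copy()
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            if mask[y, x]:
                vals = [img[ny, nx] for ny, nx in
                        ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                        if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]]
                if vals:  # only fill if at least one valid neighbor exists
                    out[y, x] = sum(vals) / len(vals)
    return out

# A flat 100-valued image with one "damaged" pixel zeroed out and masked.
img = np.full((3, 3), 100, dtype=np.uint8)
img[1, 1] = 0
mask = np.zeros((3, 3), dtype=bool)
mask[1, 1] = True

restored = interpolate_masked(img, mask)
print(restored[1, 1])  # prints: 100.0
```

Applied to the feeds, the same gesture that preserves a painting's unity smooths over the hole left where a car has been erased, restoring a seamless landscape.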
Clearly the idiom “you have to see it to believe it” no longer applies in a culture controlled through ubiquitous computation. Our own perceptive truths do not stack up against the statistical facts produced by a computer-vision-driven machine, which does not possess the ability to “see” as a qualitative and embodied sense. Through a collapse of space, time, and body, computational processes effectively parameterize the qualitative and perceptual into the quantitative and executable in accordance with an implicit control structure, and in doing so destabilize the world-image as it is experienced. Real-time information-making processes can thus render a subject immobile by flattening both space and time, a process made literal in the application of computer vision, where perceptual space is taxonomized into binary information that enables executable commands. In bringing transparency to the implementation of a data-collection and data-driven policy action structure, I use an artistic license to try to rethink the authority behind truth by requiring a different type of meaning-making. Perhaps, by interpreting the perceptual faculty of a machine using one’s own subjectivity, one might hope to reverse this immobility.