What do Google, Yahoo, AOL and Microsoft’s MSN know about you?


By Elise Ackerman
Mercury News

America’s top four Internet companies, Google, Yahoo, AOL and Microsoft’s MSN, promise they will protect the personal information of people who use their online services to search, shop and socialize.

But a close read of their privacy policies reveals as much exposure as protection.

The massive amounts of data these companies collect, which can include records of the searches you make, the health problems you research and the investments you monitor, can be requested by government investigators and subpoenaed by your legal adversaries.

But this same information is generally not available to you.

The risk is that personal information that can be traced to you will at some point be provided to someone else, like the 20 million AOL searches that were published on the Internet at the beginning of August and are now causing random AOL users to admit that they looked for “movies for dogs” or “welley shoes.”

Two months ago, the San Jose Mercury News began asking the Big Four Internet companies to clarify their privacy policies. The newspaper wanted to know precisely what information was recorded when someone made a date on Yahoo, sought help for addiction on MSN or plotted their daily peregrinations on Google Maps.

How long was the data kept? Could someone’s Internet searches be cross-referenced with their horoscope habit? Could a person find out exactly what was stored about him or her? Could a person ask Google, Yahoo, AOL or Microsoft to delete that data?

How often was personal data being requested by law enforcement? Could someone subpoena someone else’s searches in a civil suit? Was this happening?

Few answers were forthcoming.

Google and Yahoo both said they kept data “for as long as it is useful.” Microsoft said it kept data “based on needs to run and maintain our online services effectively while protecting user privacy.”

AOL said in an interview that data was retained for “roughly up to 30 days” — but that turned out to be not entirely true.

The companies declined to provide any details about how often user information was given to law enforcement or to others.

“If these companies can’t give definitive answers about how they are handling this incredibly sensitive and private information, Congress needs to demand answers from them,” said Kevin Bankston, an attorney for the Electronic Frontier Foundation, a civil liberties group that has asked the Federal Trade Commission to investigate AOL’s disclosure of search records.

A few weeks after the Mercury News made its request to the companies, AOL published the searches of approximately 658,000 AOL users on a public Web site as part of an effort to share data with researchers. The searches, which were done from March to May, provided an incredibly intimate glimpse into the life of the searchers.

On March 1, AOL User 310416 looked for “how to self induce your own labor.” A few days later she searched on “true contractions,” then she did an “inmate search,” which took her to the Illinois Department of Corrections. Later in the month, she searched for “bedbugs” and “matress sets in illinois.”

AOL User 792334 looked for “aol privacy guard,” before progressing to “tan ropey bowel movements” and “symptoms of parasites.”

Some users looked for child pornography and sex partners. Others sought the “best way to avoid jury duty” and “misdamenor extradition to alaska.”

According to an Electronic Frontier Foundation analysis of the AOL data, 106 users typed in what appeared to be Social Security numbers. More than 3,700 users typed in what appeared to be phone numbers, and more than 4,000 entered what appeared to be street addresses.

All of which showed how easy it would be to track a person down through their searches.

“Search logs are quite possibly the single most revealing record that we’ve ever had ability to create,” Bankston said. “They’re practically a printout of the goings on in your brain.”

While AOL is unique among the Big Four in that its users are easily identified by an AOL user name after they have logged in, people who frequent Google, Yahoo and MSN are also monitored by a combination of digital tracking systems.

First, there is the IP address that is assigned to every computer each time it connects to the Internet. Internet service providers, such as AT&T and Earthlink, retain records of all the IP addresses given to a particular subscriber for periods ranging from 30 days to seven years.

The Internet companies log these IP addresses every single time information is requested from their servers. In other words, there is an IP address associated with every Google search, every Yahoo video and every game played on MSN.

Once someone knows a person's IP address, they can seek a court order forcing Google to turn over the searches, or other user information, associated with that address.

The Big Four also use unique alphanumeric strings, called “cookies,” to track their users as their IP addresses change. For example, Google typically installs one cookie in the browser of a person who wants to use its search engine and upwards of 10 for someone who wants to use Gmail. Yahoo installs four cookies when someone visits its home page, plus one from the advertising network DoubleClick.

Users can delete the cookies, but the companies require that new cookies be accepted in order to use popular services, like Yahoo Mail or Google Notebook. These services require registration, which lets the companies keep matching IP addresses with a particular user.

“Once you register with Yahoo! and sign in to our services, you are not anonymous to us,” Yahoo’s privacy policy states.

Microsoft’s policy says: “The information we collect may be combined with information obtained from other Microsoft services and other companies.”

The record of an online life can thus be assembled almost minute-by-minute by combining logs from different services recording not only Internet searches, but all other activity.

“This is sometimes possible, but it depends on which services and can be extremely difficult to do because the services were not built to do this,” said Nicole Wong, a Google lawyer, in a written response to the Mercury News.

“The concern is that as more and more data is stored electronically, the more risky the situation becomes,” said Joe Kraus, chief executive of JotSpot, and a co-founder of Excite, an early search engine and Internet portal.

Concern about computerized record keeping dates back to at least 1973, when a federal task force recommended a “Code of Fair Information Practice,” to protect citizens against “arbitrary and abusive record-keeping practices.”

The code had five tenets: There must be no secret record-keeping systems. There must be ways for people to learn what information is kept about them and how it is being used. There must be ways for people to keep information that was obtained for one purpose from being used for another. There must be a way to correct inaccurate information. And organizations that collect information are responsible for preventing its misuse.

The code formed the basis of a number of federal and state laws, including the Electronic Communications Privacy Act and the Video Privacy Protection Act, which requires videotape rental records to be destroyed after one year.

“The problem today is that privacy policies that the private sector is relying on do not provide the same type of protection as traditional privacy laws or codes of fair information practices would, because the companies do not clearly take on responsibilities and the policies do not clearly give individuals rights,” said Marc Rotenberg, executive director of the Electronic Privacy Information Center.

“In fact many of these privacy policies actually operate as disclaimers or waivers.”

The Big Four defend their policies, saying they provide clearly written descriptions of how personal data is collected and used.

“Microsoft maintains a commitment to protecting the privacy of our customers and works very hard to develop notification approaches and processes that make it easier for our customers to understand how their information is used,” Peter Cullen, Microsoft’s chief privacy strategist, wrote in a letter to the Mercury News.

Indeed, the Big Four all collect personal information for the same reasons: to make their services better and to provide a targeted audience to advertisers.

Already worth billions of dollars, online advertising is projected to reach $29 billion, or one-tenth of total U.S. advertising spending, by 2010, according to research company eMarketer.

The more precisely Internet companies can match a user with an ad, the more money they stand to make. Hence the drive to amass personal information by the gigabyte.

Experts say the concentration of personal data kept by the biggest Internet companies is unprecedented — and potentially dangerous.

“Imagine that your life is recorded in such a way that never happened in the history of mankind and that information can be discovered in the course of litigation,” said John Palfrey, executive director of the Berkman Center for Internet & Society at Harvard University.

None of the Big Four would respond to questions about the nature or number of times they have provided a user’s information to a third party.

In sworn testimony to Congress in June, John Ryan, AOL’s chief counsel, said AOL was receiving over 14,000 subpoenas a year, not including search warrants or other orders related to suspected criminal behavior.

Local prosecutors say Internet companies are generally cooperative with criminal investigations, but they could not quantify the number of times any one company has been approached. They said investigators typically get search histories from log files on a suspect’s computer.

That is where North Carolina investigators looked for information about Robert Petrick after his wife’s decomposed body was found in Falls Lake. Prosecutors in Petrick’s murder trial told jurors that he had searched Google for the words “neck,” “snap” and “break.”

Four days before he reported his wife missing on Jan. 22, 2003, they said he also researched the level of the lake, water currents and boat ramps.

Petrick was found guilty largely on circumstantial evidence, including the Google searches.

In retrospect, Assistant District Attorney Mitchell Garrell said it might have been more efficient to ask Google directly for the information because investigators had spent months sifting through approximately nine gigabytes of data on Petrick’s computers.

Jack King, a spokesman for the National Association of Criminal Defense Lawyers, said Internet searches are likely to be increasingly used to develop leads in criminal cases and to investigate people without their knowing.

“I predict it will be even bigger in the civil litigation field,” he added, noting that use of electronic records was initially embraced most enthusiastically by civil litigators.

In order to get your search data or other Internet information, a legal opponent in a civil suit would only have to ask for it.

A court order for producing documents, known as a subpoena, can be written and signed by an attorney of record in a civil case in California.

“As a general principle there are relatively weak standards for protecting that data,” said Eric Goldman, director of the High Technology Law Institute, Santa Clara University School of Law.

After receiving a subpoena, the Big Four say they notify the person whose information is involved and give them time, usually about two weeks, to fight it in court. (E-mail is an exception: according to a recent California appeals court ruling, it must be subpoenaed from the people who created it.)

It may be difficult for many people to make a legal argument protecting their information, in part because few people can remember what they have searched for. The Big Four generally won’t let you review the data they have collected.

The exception is an option called “personalized search,” which is offered by Google and Yahoo. By logging in, users allow these companies to keep track of their searches regardless of changing IP addresses or cookies. In exchange, users get to see their search histories.

AOL also allows users to review searches, but only for 30 days. Afterwards, the searches are stripped of user names and IP addresses and saved indefinitely for research purposes. These are the searches that were published on the Web in early August.

While AOL’s mistake is unlikely to be repeated, attorneys say there is nothing to prevent search histories from becoming standard evidence in court.

At that point, the searches will no longer be in any way anonymous, and the intimate, awkward and curious stories they tell will become part of the public record.


3 thoughts on “What do Google, Yahoo, AOL and Microsoft’s MSN know about you?”

  1. Privacy is a hot topic of discussion, but people on the Internet are willing to share their personal information even though they know it is being captured.
    Anyway, I would like to introduce you to a Khmer blog website that I just created so that Khmer speakers can get their own blogs, like on wordpress.com, at http://khmerblog.com
    Please let me know if you want to move your blog to tinyworld.khmerblog.com 🙂
    Any comments and suggestions are appreciated.


  2. I frankly don’t care if these companies know what I search for. I search for torrents, books, anime and other random things, sometimes including porn. I have nothing to hide. Basically, the people who care are the ones with something to hide. Also, some of these companies, such as Google, use the information to give you more accurate results.
