Sunday, April 19, 2015

Apple #708: Behind the Daily Apple -- Constructing a Search Query

After my last entry (on movie trailers), Daily Apple reader Mahalia wanted to know how your Apple Lady does what she does.  As she put it, how does an entry get researched and written, and how do I find the answers to people's questions.

I told her I Google it.

Which made her laugh and say, OK, but it seems like my entries are more focused and thorough than a list of Google results, and there are complementary visuals, too.  So I said I was being flip, that there is more to it than just typing words into a Google search box.

And actually, when I've looked things up for other people, they'll say, "How did you find that?"  I'll show them the search I did, and they'll say, "I wouldn't have thought to do it that way."  I don't think what I do is anything particularly magical.  In fact, it seems pretty obvious to me, but then, I learned this skill in library science school years and years ago, so at this point, it is second nature to me.  Not everyone knows how to put together a good search query.  And since that is where every Daily Apple begins, let me start there.

The Google search box. The secret to your Apple Lady's success.


  • There are lots of search engine websites out there -- Google, Bing, Dogpile, Ask, even AOL -- but I search Google.  It's the biggest.  Meaning, Google's web crawlers hit more pages of the internet than anyone else's do.  So with one search, I get results from the highest number of pages possible.
  • Dogpile is a meta-search that searches several search engines at once, so you would think that would get more inclusive results.  But in my experience, Google beats them anyway.  And by "beats them," I mean gives you a wider variety of results which are of better quality.
  • By "better quality," I think that might be best demonstrated when I show you the results of some sample searches.
  • Another reason I like Google is it allows you to use some advanced searching tools, like quotation marks, a symbol that means "or", and it also does automatic truncation.  What I mean by that will become apparent in a bit.

Constructing Your Query

  • The first part of figuring out what to type in the search box is deciding what you're looking for.  This sounds like a "duh" moment, but as in all things, it really does help to define to yourself what you're doing before you do it.
  • Here's an example.  Once upon a time, a Daily Apple reader Dan asked me to do an entry on disposable lighters.  Specifically, he asked, 
What about doing a daily apple entry about the disposable lighter? i know zippos and other butane lighters have been around for awhile, but what about the plastic disposable Bic? So ubiquitous these days.
  • So I Googled "disposable lighters."  The first thing to note here is that I typed in my search in quotation marks.  As you see it here, I typed into the Google search box "disposable lighters" as opposed to disposable lighters.
  • This is important because the quotation marks mean I've told Google I want it to search for those two words next to each other, as a phrase (a.k.a. phrase searching).  If I had omitted the quotation marks, Google could have returned any results with disposable in one part of the document and lighters anywhere else in the document.  So I could have gotten results that might have had nothing to do with disposable lighters at all.
  • Google is smart enough, though, that even if I had omitted the quotation marks, it would automatically put the hits where disposable is next to lighters at the top of the list.  Not every search engine does that, and this is another reason to prefer Google -- it automatically ranks its results for you so that the results that most closely match what it thinks you want are put at the top of the list.
  • But for our purposes, the results I got from Googling "disposable lighters" weren't the best for putting together a Daily Apple entry.  Which is to say, the things that came up first were all shopping-related.  Links to pages where they're sold on Amazon, and other online stores.  It would be no fun to read a Daily Apple entry on "Where can I buy disposable lighters, and how much do they cost?"  That wouldn't be answering what Dan was asking, either.
  • So how should I narrow the "disposable lighters" field?  Since Dan's question referred to other lighters that had been around for a while, I chose to investigate the history of them -- when were they invented -- and how had they become so popular.
  • So I modified my search to "disposable lighters" history. 
[Editor's note about screenshots: Blogger does not allow screenshots to be copied & pasted into an entry. That would be way too easy.  They must be saved as images & uploaded. Doing that changes all sorts of things about the image -- mainly makes them too small. So if you want to read what the screenshot-images actually say, you have to click on them to see them in an enlarged photo viewer. When you're done, click the x in the top right corner of the photo viewer to return to the blog.  Total pain in the behind, I know. But you can thank Blogger for this.]
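
For those who like to see the logic spelled out, here is a minimal sketch in Python of the difference between a plain keyword match and a phrase match. It is a toy of my own making, not how Google actually works: the function names and the two sample documents are invented purely for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def matches_all_words(document, query):
    """Keyword match: every query word appears somewhere in the document."""
    doc_words = set(tokenize(document))
    return all(word in doc_words for word in tokenize(query))

def matches_phrase(document, phrase):
    """Phrase match: the query words appear next to each other, in order."""
    doc_words = tokenize(document)
    phrase_words = tokenize(phrase)
    n = len(phrase_words)
    return any(doc_words[i:i + n] == phrase_words
               for i in range(len(doc_words) - n + 1))

# Two made-up documents, for illustration only.
about_lighters = "Disposable lighters are sold near the register."
not_about_lighters = "Disposable razors sell well; lighters are in aisle 9."

for doc in (about_lighters, not_about_lighters):
    print(matches_all_words(doc, "disposable lighters"),
          matches_phrase(doc, "disposable lighters"))
```

Both documents pass the keyword match, but only the first passes the phrase match. That second document is exactly the kind of irrelevant hit the quotation marks are meant to keep out.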

Today's Google search results for "disposable lighters" history. One of the things I don't like about Google is that they pay their bills by putting paid-for links to product purchases at the top of their search results. This would be like walking up to your friendly local librarian and saying, "I need to know about the history of aspirin," and before telling you where to find information about that, she would whip out a bunch of samples and ask you rapid-fire, "Would you like to buy this aspirin? This other kind of aspirin is very popular, perhaps you would like to buy some of that instead." And you would answer her, "But I just want to know about aspirin."

Google has also taken to providing you with images that match your search results, which can come in very handy sometimes.  For example, if I've seen a type of bird and I type in what I think is its name or else my description of it, those images will show me if I've got the right name for the bird I've seen or will help me choose among slightly different images for the one that matches most closely with the bird I saw, and that will take me to a page that tells me all about that bird.  In other words, the image search sometimes helps me narrow my choices by kind or species.

Finally, please to note the hit that comes up at the bottom, just before the image results.

Natural Language Queries

  • Some people type in their searches as natural language requests, meaning they simply type in the sentence that is their question: when were disposable lighters invented? Or, what is the history of disposable lighters?
  • Back in the day, when online databases were all proprietary and expensive, they worked only literally.  If you typed in a search like that, you would be telling the database to find every instance of the word "what," every instance of the word "is," every instance of the word "the," and so on, and then combine the results to show you only those items that contained each one of the words in that query, regardless of where they happened to appear in the document.  Can you imagine, searching the entire internet for all pages that have the word "is"? 
  • So we online search librarians learned to ask the databases to search for only those words that are the most important.  We would leave out the little, omnipresent words like "is" and "the" and "where."  In fact, some databases got smarter and wouldn't even search for those words. Because they appear so often, it would take the database forever to retrieve them all, and it wouldn't even be that useful to list them.  These little omnipresent words are therefore often referred to as stop words.
  • Let's remove the stop words from our natural language request and see what we've got left: what is the history of disposable lighters?  What remains is history "disposable lighters."  This, by the way, is known as a keyword search.  This is the kind of search I do most often.
  • The clever folks at Google have over the past several years made their search engine smart enough to handle natural language requests.  So if you were to type what is the history of disposable lighters into the Google search box now, you would likely get pretty good results.   
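
Turning a question into keywords can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the stop-word list is a tiny hand-picked one, and the function name to_keywords is made up. Real databases use much longer lists and smarter rules.

```python
# A tiny, hand-picked stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "do", "is", "of", "the",
              "were", "what", "when", "where", "which"}

def to_keywords(question):
    """Drop the question mark and the stop words, keeping the rest in order."""
    words = question.lower().replace("?", "").split()
    return [w for w in words if w not in STOP_WORDS]

print(to_keywords("What is the history of disposable lighters?"))
# ['history', 'disposable', 'lighters']
```

What comes out is the same keyword search I arrived at by hand, minus the quotation marks around the phrase.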

Today's results for a natural language search in Google for what is the history of disposable lighters.

  • Well, that's interesting. There are no paid-for links at the top of these results.  There are no Image results, either.  News about Google reports that they've been putting a lot of effort into making their search algorithms even better at processing natural language queries. This is because a) most people tend to type in natural language queries and b) lots of people have smart phones, and they're relying on apps to give them search results in very targeted areas. So Google has to make their search engine competitive and give better, more targeted and focused results for natural language queries.
  • So maybe my technique of using keyword searching is becoming outmoded.  I do use natural language queries occasionally. If a topic is especially arcane and I'm having trouble finding something with just keywords, I will give a natural language query a shot, to see if I get different results, or maybe one hit that's relevant that will give me some more information that I can then use as the basis for another, better search.
  • But based on these results here, maybe I should try the natural language method more often.  But this is only one example, and in general, I feel like I have better control over the results I get with a keyword search, so probably for some time longer at least, I'll continue to use the keyword method.  

Why it's Important to Define Your Question

  • But now let me get to the ultimate point I was trying to make with this disposable lighters example.  After I posted that entry on disposable lighters and Daily Apple reader Dan had a chance to read it, I asked him what he thought, if it answered his question.  He said actually, no, it hadn't.  
  • He said what he was really thinking of was those great big huge rafts of plastic that are floating around the oceans, how there are often plastic disposable lighters among those piles of floating trash, and how birds or fish eat them and are killed.
  • Well, that is a far more depressing and specific topic than the question he gave me.  But he did not tell me anything about that.  He did not narrow his topic further to say what exactly it was about disposable lighters he wanted to know.  Based on the question he sent me, I assumed he wanted a general history.  
  • As every good reference librarian knows to do, I should have asked him to verify my assumptions, and perhaps narrow the topic more specifically.  I should have replied to him saying, "I think what you want to know is the history of disposable lighters, right? Who invented them and when and so on?"  And he could have answered back with, "Well, really what I want to know about is how all those plastic lighters get in the lakes of floating trash."  Then I would have researched that.  And we would have a very different Daily Apple about disposable lighters.  I probably would have called it Lakes of Floating Trash or something else, rather than Disposable Lighters.
  • But this is important for you to keep in mind, too, as you're searching the web.  Let's say you're out with your friends and you're talking about lions, and someone says, "You know, it's the females who do all the hunting." And someone else says that's not true, the males hunt too, and someone else says they do not, the males are useless, and it gets rather heated and gender-angry.  You want to put a stop to this by finding the answer on your smart phone about who does the hunting among lions.  Everybody's kind of drunk, and the argument is starting to spiral out of control, so you want to find the answer quickly.
  • If you were to type lions into your smartphone's Google search box, you would get a mishmash of everything in the world about lions. And I mean all kinds of lions.

Today's results of searching Google for the word lions.

  • You'd get the link to the Detroit Lions homepage, news about lions, news about Lions Clubs -- nothing even close to what you want to know.
  • So you have to go back and ask yourself what it is about lions that you want to know.  You want to know which lions do the hunting, the males or the females.  
  • Cross out all the little words in that sentence, and what do you have left?  Which lions do the hunting, the males or the females? becomes lions hunting males or females.  I haven't crossed out the or because that is a very useful word.  More on that in a minute.
  • Let's pretend you are in such a hurry, you decide you just want to know about lions and hunting. So you type lions hunt into the Google search box.

Today's results of searching Google for lions hunt.

  • Well, this is better. The very first hit is a link to a video showing male lions hunting and making a kill.  So that could be your answer right there.  You might be satisfied with saying, "Look, here's a video that shows male lions hunting.  Proof positive, male lions hunt!"
  • But someone could come back and say, "Oh yeah? Well, that's just one video. That probably hardly ever happens.  The females do most of the hunting.  Disney told me so."
  • That link toward the bottom of the page, How lions hunt, seems promising.  You click on it (as I did) and skim it but discover that it says nothing about males or females.  It has a lot of interesting information about how lions stalk their prey, how they aren't that fast so they have to hide and lie in wait for a long time, how they are very, very patient, waiting until the prey wanders close enough, and then they rush out and pounce.  It even says that lions don't even use wind direction in their favor all that much, they're just patient and they time their pouncing very carefully.
  • All that is very interesting, but it doesn't answer the question at hand.  And meanwhile, your group is getting more heated, starting to raise their voices, so you need to find an answer, and quickly.
    • Before I continue, I want to insert a note about automatic truncation.  You'll notice that Google automatically provided results that use all sorts of versions of the word hunt -- hunting, hunts, hunted.  
    • I don't know whether Google incorporated hunter and hunters or not, or if those results were less relevant and so appear farther down the list.  It could be that Google is smart enough to know that lion hunters is a different thing than lions hunting and sorted the search results accordingly.
    • But the fact that Google automatically searched for these various forms of the same word means that it automatically translated hunt into hunt*, where the * stands for any suffix that might follow.  This is called automatic truncation.  It can save you a lot of time and forethought, combining all sorts of relevant results that you might not have thought to gather on your own.
    • Another thing Google sometimes does is to automatically search for synonyms for your search terms.  In this case, it looks like it also searched for attack as a synonym for hunt.  More on that in a bit.
    • Let's pretend for a minute that you typed lions hunters, and you wanted Google to search for only those words exactly.  Let's pretend you didn't want it to find any synonyms or alternate versions of either of those words.  Then you would tell it to do a Verbatim search.
    • To do a Verbatim search, click on Search tools. In the mini-window that appears, choose Verbatim. That's it.  Google will then search only for the words you've entered as you've typed them.
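
Here is a minimal sketch in Python of the difference between a truncated search (hunt*) and a Verbatim one. It assumes the simplest possible version of truncation, plain prefix matching. I don't know how Google does it internally, and the function names and word list are invented for illustration.

```python
def matches_truncated(word, stem):
    """Truncation: hunt* matches any word that starts with 'hunt'."""
    return word.lower().startswith(stem.lower())

def matches_verbatim(word, term):
    """Verbatim: only the exact word counts."""
    return word.lower() == term.lower()

# A made-up list of words that might appear on a page.
words = ["hunt", "hunts", "hunting", "hunted", "hunters", "attack"]

print([w for w in words if matches_truncated(w, "hunt")])
# ['hunt', 'hunts', 'hunting', 'hunted', 'hunters']
print([w for w in words if matches_verbatim(w, "hunt")])
# ['hunt']
```

Notice that simple truncation sweeps in hunters along with hunting, while synonyms like attack are a separate matter that truncation alone never catches.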

Screenshot of how to do a Verbatim search. You'll have to click on the image to see it clearly (thanks, Blogger, for the obfuscation).

Synonym Searching -- The Magic of Or

    • But let's get back to that concept that just came up, which is searching for synonyms. Let's pretend Google didn't search automatically for synonyms, but that you wanted it to do so.  Let's pretend you were thinking expansively, as old-school online searchers know to do, and you knew that there might be pages that might not use the word hunt, but maybe they'd use a different verb that means something similar, so those pages would still be relevant.
    • In order to find those other pages, you would need to think of all sorts of verbs that mean roughly the same thing as hunt--attack, kill, stalk, chase. You would want to find all the pages that use any of those verbs: hunt or attack or kill or stalk or chase.  I'm emphasizing the word or here because it's important -- because of set theory.
    • When you say or, you mean you want the results from both sets.  Remember in elementary school when you learned about the intersections and unions of two sets?  Intersections are when you want to know about the place where the two sets overlap and only where both things are true -- in search language that's and.  Unions of two sets are when you want to know about everything in both of the sets together.  In search language, that's or.
    • If you just typed or into your search string, Google or any search engine might not know that you mean that word as a search function.  It might think you want it to search for the actual word or, or it might treat it as a stop word.  In order to eliminate that confusion, most databases substitute a special character that stands for the set-theory-meaning of or.  In Google's case, that special symbol is |.  (Google also accepts the word OR typed in capital letters.)
    • On a typing keyboard, you can find that symbol above the \ key.  You would type [shift +] \ to get |.  On a smartphone keypad, I have no idea where that symbol is.  Those keypads are annoying and impossible to use anyway and I hate them with an electronic passion.  Ahem.
    • So if you wanted to find lions hunt or attack or kill or stalk or chase, you would type that into Google as lions (hunt | attack | kill | stalk | chase).  You would group all the synonyms together into parentheses because search engines work like math.  
    • Remember how in basic algebra, you learned that you're supposed to do the math of the things in the parentheses first, and otherwise do the math from left to right?  Well, without the parentheses, Google would do the search math from left to right.
    • If you were to type in lions hunt | attack | kill | stalk | chase, instead of searching for lions and [all the rest of the verbs that mean to hunt], it would search for lions and hunt, or any pages that use the word attack, or any pages that use the word kill, or any pages that use the word stalk, or any pages that use the word chase.  So you would get a whole bunch of pages on attacking in general, killing in general, stalking in general, chasing in general, and there might be a few about lions hunting.  Which is not what you want at all.
    • Now, as we've learned, Google is smart enough to know not to do this -- most of the time.  But you can't always count on Google being smart enough to know what you mean every time.  So you'll get the best results the most often if you use the terminology and search strategy that is certain to get you the more accurate results.  Otherwise, it could be a case of garbage in, garbage out.
  • Now that you know about the magic of or and the importance of parentheses, you know to refine your search about lions hunting thusly:  lions hunt (males | females).
  • You enter your search this way because you want to find out about how lions hunt, specifically with regard to gender, whether they're female or male.  You don't want to know about how only male lions hunt, or only about how female lions hunt, you want to know about either one.  So you know to use the operator that means or.  You also know to put males and females in parentheses so that Google will know to or those two concepts together, and then and them with the concepts of lions and hunt.
  • So as I've said, you type lions hunt (males | females) into the Google search engine box. The results?  Bingo.
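
The set theory behind or, and, and parentheses can be sketched with Python's built-in sets, where | means union (or) and & means intersection (and). The little index below is entirely made up, with each word pointing to the page numbers that contain it. It is a toy under stated assumptions, not a picture of how any real search engine stores its pages.

```python
# Made-up index: each word maps to the set of page numbers that contain it.
index = {
    "lions":  {1, 2, 3},
    "hunt":   {1, 4},
    "attack": {2, 5},
    "kill":   {3, 6},
}

# lions (hunt | attack | kill): union the verbs first, then intersect with lions.
grouped = index["lions"] & (index["hunt"] | index["attack"] | index["kill"])

# lions hunt | attack | kill, read strictly left to right with no grouping.
ungrouped = (index["lions"] & index["hunt"]) | index["attack"] | index["kill"]

print(sorted(grouped))    # [1, 2, 3]
print(sorted(ungrouped))  # [1, 2, 3, 5, 6]
```

With the parentheses, every page in the results mentions lions. Without them, pages 5 and 6 sneak in even though they are about attacking and killing in general and never mention lions at all.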

Today's results of searching Google for lions hunt (males | females)

  • Here, in stunning, tiny display are the results.  As you can see when you bring the screen really close to your face and squint hard, our first two hits are from Wikipedia and Yahoo Answers. They both say that both male and female lions hunt.  But you know that Yahoo Answers isn't all that authoritative, and Wikipedia can be a good place to start for information, but you should always verify anything you find there with at least one other source.
  • You notice a link to a UPI article from 2013.  You know that UPI is generally a very reliable news source, and 2013 is pretty recent. The headline says that what we thought was true about male lions has been shown to be something different, which suggests there will be a nuanced answer here.  You know that the truth often is nuanced, so this looks like a promising result.  
  • You click the link to the UPI article (as I did) and you discover that researchers have learned that female lions hunt, and so do male lions.  The difference is that female lions hunt cooperatively, in groups, at night, and in areas of open vegetation.  Male lions hunt less cooperatively, often solo, during the daytime, but in areas with lots of vegetation where it is easier for them to hide and stalk their prey. 
  • You could make a lot of guesses or assumptions based on this information.  Maybe the reason we thought only female lions hunt and males don't is because it's easier for us to see females hunting in groups in the open, even if it is in the dark of night.  We didn't see the male lions hiding in the brush, so we didn't think they were hunting.  Or maybe, because of changes in habitat (probably human-induced) there is less vegetation in which the males can hide and stalk prey, so we don't see them doing this as often, and so we assumed the males don't hunt.
  • If I were doing a Daily Apple on this topic, I would investigate those further guesses and assumptions and report on those findings to give you some more explanation and context.  
  • But since you are sitting among your group of friends who are getting more heated over this argument about male and female lions hunting, and it's about to break into some kind of gender war, this is enough information for you.  You announce to the group, "Male and female lions both hunt.  They just do it differently."
  • Your friends, who have worked themselves into a state where they are ready for a full-blown fight, are a little disappointed not to have the fire to further their fight, that instead the water of truth has doused their ire.  But they know the right answer when they hear it, and they settle down.

Male lion hunting an eland. People tend to believe pictures more than words, so here's a picture.  We'll talk about how to find images in the next entry.  The eland got away, by the way.
(Photo from Africa Geographic)

Here's a female lion hunting a water buffalo, by herself, in the daytime.  Maybe there are other female lions nearby, I don't know.  But it's probably a good idea not to generalize too much about animal behavior.  As we who are animals ourselves know, we don't always conform to type. Especially where gender is concerned.
 (Photo from Facts Legend)

What You Know Now

  • Now you know how the Apple Lady begins working on every Daily Apple question.  
  • More importantly, now you know how you can construct a targeted search yourself.
  • Even if you didn't absorb all that stuff about putting your search terms in parentheses, or using the | character, or searching for synonyms, you probably absorbed the fact that Google is pretty smart.  Smart enough to parse your search for you so that, most of the time, even if the terms you type in are kind of off the mark or not that precise, Google's search algorithms will compensate for that and give you good results anyway. 
  • Maybe you even took in the fact that if you typed in your search as a question, the same way you might ask Siri a question (she's just a pretty voice on top of a search algorithm anyway), Google will give you pretty decent results.  And maybe even with fewer ads.
  • Hopefully you have also gleaned a much bigger-picture concept, which is that looking up information online is pretty easy.   In fact it's pretty dang easy, thanks to the internet and extremely well-engineered search engines like Google, to find out the answers to all sorts of questions.  But you probably already knew that from how often you use your smartphones to settle debates among your friends. 
  • But it's the internet and search engines that make all this possible.  Where once upon a time you had to go to the library -- and I dearly love libraries -- and look things up in dictionaries and encyclopedias and card catalogs -- and I dearly love all those things -- now you don't have to wait until the library is open and go there and search through those books -- oh, how I love books -- to slowly compile the answer to a question.  
  • Now all you have to do is type a few words into a box and push a button, and you get a raft of answers, links to a huge array of knowledge that has been built and constructed by scads of people over the course of decades.  I mean, when you think of the enormity of information at our disposal, the amount of people-hours and person-thought that's gone into all the knowledge that pops up almost immediately from one search, and you can tap into all of that and learn from it so quickly, it is beyond staggering.  It gives me the chills.
  • And we are only going to learn more.  We are only going to get better at this business of learning and teaching ourselves stuff.  If one day we learned things the way they do in the first Matrix movie, where they plug Neo into a machine, push a button, and he jerks a bit, and then he says, "I know kung fu," I think I would melt with delicious joy.  That sort of method of learning is probably not even that far in the future.  The more we know, the easier it is for us to learn more.  I'm repeating myself all over the place because I'm so excited thinking about it.
  • The more immediate point here is this: what I do here on this blog is not that difficult.  You can find the answers to your questions so easily.  Type a couple words into Google and push Search.  See what you find.  I guarantee you'll learn something.  It'll be good.
