

Lost and Found: Finding files or looking in files on a Unix / Linux system

Linux and UNIX systems come with an entire zoo of utilities. Unfortunately, unless you know about them they're not of much use. One command you may find particularly handy is find. It functions much like the DOS / Windows findstr and dir commands, but is far more powerful. Here are a few handy one-liners that search through a collection of files for every instance of a string and display which file each match comes from:

 

$ find ./ -type f -exec grep "Cigar" '{}' \; -print

Or search for the word within a certain subset of files in a folder:

$ find / -name "*.txt" -exec grep "Cigar" {} \;
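To try both find commands without scanning the whole filesystem, here is a minimal sketch that builds a throwaway directory with a few made-up files (all names are hypothetical). It assumes a POSIX shell and find. The last command uses the standard '{} +' terminator, which hands many paths to a single grep, together with grep -l to print only the names of matching files.

```shell
#!/bin/sh
# Scratch directory with made-up sample files (hypothetical names).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'Cigar box\n' > "$dir/a.txt"
printf 'no match here\n' > "$dir/b.txt"
printf 'Cigar label\n' > "$dir/c.log"

# Form 1: grep prints the matching line; -print then prints the path,
# but only for files where grep found a match (exit status 0).
find "$dir" -type f -exec grep "Cigar" '{}' \; -print

# Form 2: only *.txt files. grep gets one file per call,
# so it prints the matching line without a file name.
txt_hits=$(find "$dir" -name "*.txt" -exec grep "Cigar" {} \;)
echo "$txt_hits"

# Variant: '{} +' passes many paths to one grep; -l lists only the
# names of files that contain a match.
files_with_cigar=$(find "$dir" -type f -exec grep -l "Cigar" {} + | LC_ALL=C sort)
echo "$files_with_cigar"
```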

Alternatively, you could use grep or egrep:

$ grep -R -l IBM /tmp
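A small sketch of the grep options involved, run against a scratch directory with invented file names. -R (recursive search) is supported by GNU and BSD grep but is not required by POSIX. -l prints the names of files that contain a match, while -L prints the names of files that do not. The second half contrasts 'grep -E' (what egrep does) with 'grep -F' (what fgrep does) on the same pattern. Recent GNU grep releases warn that the egrep and fgrep wrappers are obsolescent, so the flag forms are used here.

```shell
#!/bin/sh
# Throwaway tree with hypothetical file names.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir "$dir/sub"
printf 'IBM mainframe\n' > "$dir/sub/yes.txt"
printf 'nothing relevant\n' > "$dir/no.txt"

# -R recurses; -l lists files WITH a match, -L lists files WITHOUT one.
with_match=$(grep -R -l IBM "$dir")
without_match=$(grep -R -L IBM "$dir")
echo "contains IBM:         $with_match"
echo "does not contain IBM: $without_match"

# egrep vs fgrep: -x requires the whole line to match the pattern.
printf 'Cigar\nCigararar\nCig(ar)+\n' > "$dir/words.txt"
ere=$(grep -E -x 'Cig(ar)+' "$dir/words.txt")    # (ar)+ = one or more "ar"
fixed=$(grep -F -x 'Cig(ar)+' "$dir/words.txt")  # pattern taken literally
echo "grep -E: $ere"
echo "grep -F: $fixed"
```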

What's the difference between grep and egrep? egrep is equivalent to 'grep -E': it allows the use of extended regular expression metacharacters such as +, ? and |. There is also fgrep, often read as "fast grep", which is equivalent to 'grep -F'. fgrep doesn't recognize any regular expressions at all and treats the pattern as a fixed string. Note that the very first find example above looks for lines containing the word "Cigar" and prints them, followed by the name of the file they appear in.

Find is also handy if you need to remove a file whose name consists of special characters you just can't seem to get a handle on in the shell. Simply use ls with the '-i' option to get its inode number, then pass that number to find:

$ ls -ali
$ find . -inum 15090092 -exec /bin/rm -i {} \;
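Here is a sketch of the inode trick end to end, on a scratch directory holding a file whose name (made up for the demo) starts with a dash and contains a tab. It drops rm's -i flag so it can run unattended, and adds find's -xdev option because inode numbers are only unique within a single filesystem.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Hypothetical awkward name: leading dash plus an embedded tab.
touch -- "$(printf '%s\t%s' '-bad' 'name')"
ls -ali

# The first field of 'ls -i' output is the inode number.
inode=$(ls -i | awk '{ print $1 }')

# -xdev keeps find on this filesystem, where the inode number is unique.
# {} expands to ./-bad..., so rm does not mistake the name for an option.
find . -xdev -inum "$inode" -exec rm -f {} \;
ls -A
```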

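Another handy variation is to use find to locate links to a file. With -L, find follows symbolic links, so both hard links and symbolic links that resolve to the file are matched: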

$ find -L / -samefile ~root/test/test.txt|awk '{ print "ls -ailFt "$1 }'|sh
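The same lookup can be tried on a scratch directory. This sketch (file names invented for the demo) creates a hard link and a symbolic link, whose name contains spaces, to a test file. It then uses find's own '-exec ... {} +' in place of the awk | sh pipeline, so each path reaches ls as a single argument instead of being re-parsed by the shell. -samefile is a GNU find extension, not part of POSIX find.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'test\n' > "$dir/test.txt"
ln "$dir/test.txt" "$dir/hardlink.txt"        # hard link: same inode
ln -s "$dir/test.txt" "$dir/symlink to test"   # symlink, name with spaces

# find runs ls directly; '{} +' batches the paths into one ls call.
find -L "$dir" -samefile "$dir/test.txt" -exec ls -ailFt {} +

# Collect just the paths, sorted for a stable order.
links=$(find -L "$dir" -samefile "$dir/test.txt" | LC_ALL=C sort)
echo "$links"
```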

The above combination pipes each entry to awk, which turns it into an 'ls -ailFt' command and hands that to sh (one of the simpler shells available on many Unix variants) to produce a more detailed listing of every link to the file specified above. Because sh re-parses each line, this breaks on file names that contain spaces or other shell metacharacters. The '-exec' option is a very powerful addition to find: find substitutes each path it locates for '{}' and, when the command ends with '\;', runs it once per entry. Very handy, but not without its problems. On a busy production server you may not want to run find at all, since it can chew up plenty of server resources, especially disk I/O and CPU; it's not uncommon to load a box quickly with find. A good alternative for finding a file is to run:

$ locate cigar.txt

Notice that this will also list files such as 'cubancigar.txt.tar.gz' or 'americancigar.txt', or any other cigar, foreign or not: basically any path that contains the string 'cigar.txt'. The locate command searches an index file previously generated by the updatedb command / service. This is good because the expensive scan of the filesystem only happens when updatedb runs periodically to recreate the index that locate uses. The only downside of using locate is that updatedb has to run fairly often (once a week, or even daily on a busy system with a lot of files moving in and out); otherwise the index locate uses will be out of date and produce stale results. When it does run, updatedb uses as many system resources as find, so it's helpful to run either command at a lower scheduling priority, i.e. a higher nice value (19 is the lowest priority):

$ nice -n 19 find …….

$ nice -n 19 updatedb &    # & runs the command in the background
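To see what the nice value does, here is a minimal sketch. It assumes GNU coreutils, whose nice prints the current niceness when given no command, and uses sleep as a harmless stand-in for updatedb (which normally needs root). Niceness is capped at 19, so the niced value is expected to read 19 even if the shell was already running at a raised nice value.

```shell
#!/bin/sh
# Higher nice value = lower CPU scheduling priority; 19 is the lowest.
echo "current niceness: $(nice)"
niced=$(nice -n 19 nice)    # niceness seen by a command started via nice -n 19
echo "under nice -n 19: $niced"

# '&' starts the job in the background and returns at once;
# 'wait' blocks until it finishes and returns its exit status.
nice -n 19 sleep 1 &
job=$!
echo "started background job $job"
wait "$job"
status=$?
echo "job exited with status $status"
```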

Here are a few more variants of find (NOTE: the leading $ is the shell prompt; don't type it as part of the command). Find files belonging to a certain numeric user ID (use -user to match by user name instead):

$ find /home/virtual/site1/fst/ -uid userid
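A quick sketch of the two ownership tests on a scratch file. It takes the current user's numeric ID and name from id. -uid (numeric IDs only) is a GNU/BSD extension, while -user is POSIX and accepts a user name. This assumes the current UID has an entry in the password database, since id -un needs one.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/mine.txt"

uid=$(id -u)     # numeric user ID of the current user
user=$(id -un)   # matching user name

# -uid takes a number; -user takes a name (or a numeric ID).
by_uid=$(find "$dir" -type f -uid "$uid")
by_name=$(find "$dir" -type f -user "$user")
echo "by uid:  $by_uid"
echo "by name: $by_name"
```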

Find only directories, without descending below the starting directory:

$ find . -maxdepth 1 -type d
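A sketch on a small scratch tree (directory names are hypothetical). GNU find expects global options such as -maxdepth to come before tests such as -type and prints a warning otherwise; -maxdepth and -mindepth are GNU/BSD extensions rather than POSIX. Adding -mindepth 1 leaves out the starting directory '.' itself.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/a/deep" "$dir/b"
touch "$dir/file.txt"
cd "$dir" || exit 1

# Directories at depth 0 and 1 only: '.', './a', './b' (not './a/deep').
top=$(find . -maxdepth 1 -type d | LC_ALL=C sort)
# Same, but skip the starting point itself.
subdirs=$(find . -mindepth 1 -maxdepth 1 -type d | LC_ALL=C sort)
echo "$top"
echo "$subdirs"
```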

Find files and apply a command to each one found (the same search as the second example on this page, but using xargs):

$ find / -name "*.txt"|xargs grep "Cigar"

The same as above, except that file names are separated by NUL characters, so names containing spaces, quotes or newlines don't break it:

$ find / -name "*.txt" -print0 | xargs -0 grep "Cigar"
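The difference is easy to see with a file name containing a space. In this sketch (the file name is invented), plain xargs splits the name into two words, so grep is pointed at two paths that don't exist and the match is lost; with -print0 and xargs -0 the name arrives intact. grep -l is used so only file names are printed.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'Cigar\n' > "$dir/my notes.txt"    # hypothetical name with a space

# Plain xargs splits on blanks: grep receives "$dir/my" and "notes.txt",
# neither of which exists, so nothing is found (errors are discarded).
broken=$(find "$dir" -name "*.txt" | xargs grep -l "Cigar" 2>/dev/null)

# -print0 / -0 separate names with NUL bytes, so the name stays whole.
fixed=$(find "$dir" -name "*.txt" -print0 | xargs -0 grep -l "Cigar")
echo "without -print0: '$broken'"
echo "with -print0:    '$fixed'"
```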

Ask for confirmation ('-ok') before running a command on each file found:

$ find path/ -name <file/pattern> -ok <command> {} \;

Combine two name patterns and pass the results to xargs to list the files containing a certain string:

$ find /home/myid \( -name \*.bak -o -name \*.dat \) -print0 | xargs -0 grep -l "SomeString"
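A sketch of that last command on invented sample files. The escaped parentheses group the two -name tests with -o (OR). Without them, -print0 would bind only to the second -name test, because find's implicit AND binds more tightly than -o. Here only the .bak and .dat files that actually contain the string are listed.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'SomeString\n' > "$dir/old.bak"
printf 'SomeString\n' > "$dir/data.dat"
printf 'SomeString\n' > "$dir/notes.txt"   # wrong suffix, so ignored
printf 'other\n'      > "$dir/empty.dat"   # right suffix, but no match

# \( ... -o ... \) groups the two -name tests with OR.
hits=$(find "$dir" \( -name \*.bak -o -name \*.dat \) -print0 \
  | xargs -0 grep -l "SomeString" | LC_ALL=C sort)
echo "$hits"
```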

Good Luck!

  Copyright © 2003 - 2013 Tom Kacperski (microdevsys.com). All rights reserved.

This work is licensed under a Creative Commons Attribution 3.0 Unported License