Creating associative or hash arrays in bash using sed and strings without the use of arrays, looping and conditionals.
Hashes are certainly a very important part of any language. If you're not used to hashes, you may not see their potential at first. However, having used them in several languages now, I have found that hashes always end up reducing my code significantly, especially where only a complex solution would otherwise do. Bash (and ksh, for that matter) doesn't really come with such a construct. Reading up on how to do such a thing didn't turn up any elegant solutions of the kind I wanted for most scenarios. What I really wanted was something with these features:
- I needed to define my hashes easily, similar to the Perl construct %hash = ( "key" => "value", "key1" => "value2" )
- I needed easy key / value retrieval without excessive code, with a minimal implementation that's easy to use in looping and conditional constructs.
- Finally, some flexibility, so I have the freedom to define various kinds of hashes depending on my needs.
- Avoid using arrays (declare -a array1), conditionals (if, case) and loops (for, while, do)
What I did was use a combination of strings and the Unix sed utility to accomplish all of the above in the script below (hashv retrieves values from keys; hashk retrieves keys from values):
#!/bin/bash
mhash="Jan:01 Feb:02 Mar:03 Apr:04 May:05 Jun:06 Jul:07 Aug:08 Sep:09 Oct:10 Nov:11 Dec:12";
function hashv {
hkey="";
mh="";
if [[ $2 != "" ]]; then hkey=$2; else echo ""; return 0; fi
if [[ $1 != "" ]]; then mh=$1; else echo ""; return 0; fi
echo $mh|sed -e "s/.*\([ \t]*\)\($hkey\):\([^ \t]*\?\)\([ \t]*\).*/\3/gi"
}
function hashk {
hvalue="";
mh="";
if [[ $2 != "" ]]; then hvalue=$2; else echo ""; return 0; fi
if [[ $1 != "" ]]; then mh=$1; else echo ""; return 0; fi
echo $mh|grep "$hvalue"|sed -e "s/\([^ \t]*\)[:]\($hvalue\)/\|\1|\2\|/i" -e "s/.*\?[|]\(.*\)[|].*[|].*\?/\1/i";
}
echo "____________________________________";
hashv "$mhash" "Dec";
hashk "$mhash" "12";
echo "____________________________________";
hashv "$mhash" "";
hashk "" "12";
echo "____________________________________";
hashv "$mhash" "Oct";
hashk "$mhash" "10";
echo "____________________________________";
hashv "$mhash" "Jan";
hashk "$mhash" "01";
echo "____________________________________";
Saved to a file named hash.bash, the above gives the output:
# ./hash.bash
____________________________________
12
Dec
____________________________________
____________________________________
10
Oct
____________________________________
01
Jan
____________________________________
which is what I was looking for. It turns out that the above code is more flexible in its implementation than I had really hoped for. If spaces and the colon (:) aren't satisfactory as delimiters, I can easily change them to something else I know I won't be using. The string definition makes defining hashes easy, even more so than the Perl counterpart I was used to (though, yes, probably not as powerful). Which brings up another interesting point: I also have the ability to change the behaviour of this type of hash definition, something I don't get in other languages.
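As a rough illustration of the delimiter flexibility mentioned above, here is a minimal sketch of a lookup that takes the delimiters as arguments. The name hashv2, the variable mhash2 and the choice of ";" and "=" are hypothetical and not part of the script above. It assumes that keys and delimiters contain no characters that are special to sed (such as ".", "*" or "/").

```bash
#!/bin/bash
# Hypothetical variant of the string hash: pairs are separated by ";"
# and keys from values by "=". Both delimiters are illustrative choices.
mhash2="Jan=01;Feb=02;Mar=03;Dec=12"

# hashv2 HASH KEY PAIRSEP KVSEP: print the value stored under KEY.
hashv2() {
    # Wrap the hash in pair separators so that every key is preceded by
    # one and every value is followed by one, then capture the value.
    # With -n and the p flag, nothing is printed when the key is absent.
    printf '%s\n' "$3$1$3" | sed -n "s/.*$3$2$4\([^$3]*\)$3.*/\1/p"
}

hashv2 "$mhash2" "Dec" ";" "="   # prints 12
hashv2 "$mhash2" "Feb" ";" "="   # prints 02
```

Because the key must sit between a pair separator and the key/value separator, a partial key such as "an" does not match "Jan" in this sketch.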
Hope you found this useful! Enjoy and don't hesitate to write below!
The above didn’t work for the following example:
[user@BS001 stage]$ cat gigahash.sh
#!/bin/bash
mhash="abcd:55 bcda:4422 cdab:321 dabc:2131 eabcd:11";
function hashv {
hkey="";
mh="";
if [[ $2 != "" ]]; then hkey=$2; else echo ""; return 0; fi
if [[ $1 != "" ]]; then mh=$1; else echo ""; return 0; fi
echo $mh|sed -e "s/.*\([ \t]*\)\($hkey\):\([^ \t]*\?\)\([ \t]*\).*/\3/gi"
}
function hashk {
hvalue="";
mh="";
if [[ $2 != "" ]]; then hvalue=$2; else echo ""; return 0; fi
if [[ $1 != "" ]]; then mh=$1; else echo ""; return 0; fi
echo $mh|grep "$hvalue"|sed -e "s/\([^ \t]*\)[:]\($hvalue\)/\|\1|\2\|/i" -e "s/.*\?[|]\(.*\)[|].*[|].*\?/\1/i";
}
echo "____________________________________";
hashv "$mhash" "eabcd";
hashk "$mhash" "11";
echo "____________________________________";
hashv "$mhash" "";
hashk "" "11";
echo "____________________________________";
hashv "$mhash" "bcda";
hashk "$mhash" "4422";
echo "____________________________________";
hashv "$mhash" "abcd";
hashk "$mhash" "2131";
echo "____________________________________";
hashv "$mhash" "cdab";
hashk "$mhash" "55";
echo "____________________________________";
[user@BS001 stage]$
Now when I run the above script, the "11" in the "11 / dabc" block of the output below is wrong, i.e. hashv "$mhash" "abcd"; failed to print the correct value of 55. It printed 11, which is the value for the eabcd hash key. It looks like a little tweak to the above functions would correct it.
[user@BS001 stage]$ ./gigahash.sh
____________________________________
11
eabcd
____________________________________
____________________________________
4422
bcda
____________________________________
11
dabc
____________________________________
321
abcd
____________________________________
[user@BS001 stage]$
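The wrong value appears to come from the greedy leading ".*" in the sed expression: it consumes as much of the string as it can, so the key pattern ends up matching the last place where "abcd" is followed by a colon, which is inside "eabcd". The following is a minimal sketch of that effect on a shortened two-pair string. The variable names are made up for illustration, and the second command shows only one possible way to force a whole-key match.

```bash
#!/bin/bash
# Two keys where one ("abcd") is a substring of the other ("eabcd").
pairs="abcd:55 eabcd:11"

# The leading ".*" is greedy, so it swallows "abcd:55 e" and the key
# pattern matches the "abcd" inside "eabcd".
greedy=$(printf '%s\n' "$pairs" | sed 's/.*\(abcd\):\([^ ]*\).*/\2/')
echo "$greedy"      # prints 11

# Requiring a space before the key forces a whole-key match. A leading
# space is prepended to the input so the first key also has one.
anchored=$(printf ' %s\n' "$pairs" | sed 's/.* \(abcd\):\([^ ]*\).*/\2/')
echo "$anchored"    # prints 55
```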
OK, I corrected the logic for the hash array, and the script below now runs smoothly. It shows "55" for the abcd index and is no longer confused by the "eabcd" index (which contains "abcd" as a substring). See the comments in the script about the placement of the "^" and "$" characters in grep.
[user@BS001 stage]$ cat ./gigahash.sh
#!/bin/bash
mhash="abcd:55 bcda:4422 cdab:321 dabc:2131 eabcd:11 giga:aks";
function hashv {
hkey="";
mh="";
if [[ $2 != "" ]]; then hkey=$2; else echo ""; return 0; fi
if [[ $1 != "" ]]; then mh=$1; else echo ""; return 0; fi
## Note the placement of "^" character in grep.
echo $mh | sed "s/ /\n/g"| grep "^${hkey}" | cut -d':' -f2
}
function hashk {
hvalue="";
mh="";
if [[ $2 != "" ]]; then hvalue=$2; else echo ""; return 0; fi
if [[ $1 != "" ]]; then mh=$1; else echo ""; return 0; fi
## Note the placement of "$" character in grep.
echo $mh | sed "s/ /\n/g"| grep ${hvalue}$| cut -d':' -f1
}
echo "____________________________________";
hashv "$mhash" "eabcd";
hashk "$mhash" "11";
echo "____________________________________";
hashv "$mhash" "";
hashk "" "11";
echo "____________________________________";
hashv "$mhash" "bcda";
hashk "$mhash" "4422";
echo "____________________________________";
hashv "$mhash" "abcd";
hashk "$mhash" "2131";
echo "____________________________________";
hashv "$mhash" "cdab";
hashk "$mhash" "55";
echo "____________________________________";
[user@BS001 stage]$ ./gigahash.sh
____________________________________
11
eabcd
____________________________________
____________________________________
4422
bcda
____________________________________
55
dabc
____________________________________
321
abcd
____________________________________
[user@BS001 stage]$
The logic of the hashv and hashk functions in the following lines of the original script is wrong, in the sense that it will not return the right index or value if one index string is contained in another index string, or if one value string is contained in another value string.
The following line
echo $mh|sed -e "s/.*\([ \t]*\)\($hkey\):\([^ \t]*\?\)\([ \t]*\).*/\3/gi"
in hashv() should be changed to:
echo $mh | sed "s/ /\n/g"| grep "^${hkey}:" | cut -d':' -f2
AND
The following line
echo $mh|grep "$hvalue"|sed -e "s/\([^ \t]*\)[:]\($hvalue\)/\|\1|\2\|/i" -e "s/.*\?[|]\(.*\)[|].*[|].*\?/\1/i";
in hashk() should be changed to:
echo $mh | sed "s/ /\n/g"| grep ":${hvalue}$" | cut -d':' -f1
Then any set of index:value pairs, e.g. a:1, aa:11, ab:2, abab:23 or cd:2323, will work correctly.
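Putting the two changed lines together, the lookups might look like the sketch below. This is one possible arrangement and not the exact script from the comments: it swaps sed "s/ /\n/g" for tr (an assumption made for portability, since "\n" in a sed replacement is not supported everywhere), and it leaves out the empty-argument guards, so an empty key or value prints nothing instead of a blank line. The key and value are still interpreted by grep as regular expressions.

```bash
#!/bin/bash
# Overlapping keys and values, taken from the examples in the comment above.
mhash="a:1 aa:11 ab:2 abab:23 cd:2323"

# hashv HASH KEY: print the value for KEY (whole-key match only).
hashv() {
    # One pair per line, then anchor the key between line start and ":".
    printf '%s\n' "$1" | tr ' ' '\n' | grep "^$2:" | cut -d: -f2
}

# hashk HASH VALUE: print the key(s) whose value is exactly VALUE.
hashk() {
    # One pair per line, then anchor the value between ":" and line end.
    printf '%s\n' "$1" | tr ' ' '\n' | grep ":$2\$" | cut -d: -f1
}

hashv "$mhash" "a"      # prints 1
hashv "$mhash" "ab"     # prints 2
hashk "$mhash" "23"     # prints abab
hashk "$mhash" "2323"   # prints cd
```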