NOTE : Implementing GLOBIGNORE
Issue :: Implement GLOBIGNORE #609
GLOBIGNORE is a bash-specific variable that lets you exclude files from glob expansion. It contains a colon-separated list of patterns - any file matching these patterns gets filtered out of glob results.
E.g.
GLOBIGNORE='*.o:*.tmp'
echo * # Shows all files except .o and .tmp files
This is useful for build directories, ignoring compiled files, or filtering out backup files from shell expansions.
The Problem: Glob Results Need Filtering
When you write echo * in a shell, it expands to all files in the current directory. But sometimes you want to exclude certain patterns - like object files in a build directory, or backup files ending in ~.
Bash solves this with GLOBIGNORE. When set:
- Glob patterns automatically include dotfiles (like the dotglob option)
- Files matching any GLOBIGNORE pattern are filtered out
- . and .. are always filtered (even if they don't match a pattern)
$ GLOBIGNORE='*.txt'
$ echo * # No .txt files in output
$ echo .* # Shows .hidden but not . or ..
The Implementation
Where the Code Lives
The implementation is in osh/glob_.py in the Globber class. The key changes:
- Parse GLOBIGNORE patterns - Split the colon-separated string into individual patterns
- Filter glob results - After libc's glob() returns matches, filter out anything matching GLOBIGNORE
- Enable dotglob implicitly - When GLOBIGNORE is set, dotfiles are included in glob expansion
Parsing GLOBIGNORE
The tricky part: colons inside bracket expressions like [[:alnum:]] shouldn't be treated as separators. I handle this with a boolean flag that tracks whether the scan is currently inside a bracket:
def _GetGlobIgnorePatterns(self):
    # type: () -> Optional[List[str]]
    """Get GLOBIGNORE patterns as a list, or None if not set."""
    val = self.mem.GetValue('GLOBIGNORE', scope_e.GlobalOnly)
    if val.tag() != value_e.Str:
        return None
    globignore = cast(value.Str, val).s  # type: str
    if len(globignore) == 0:
        return None
    # Check cache first
    if globignore in self._globignore_cache:
        return self._globignore_cache[globignore]
    # Split by colon, but not inside bracket expressions
    patterns = []  # type: List[str]
    current = []  # type: List[str]
    in_bracket = False
    for c in globignore:
        if c == '[':
            in_bracket = True
            current.append(c)
        elif c == ']':
            in_bracket = False
            current.append(c)
        elif c == ':' and not in_bracket:
            if len(current):
                patterns.append(''.join(current))
                del current[:]
        else:
            current.append(c)
    if len(current):
        patterns.append(''.join(current))
    self._globignore_cache[globignore] = patterns
    return patterns
Caching for Performance
Andy suggested caching the parsed patterns. Since GLOBIGNORE doesn't change often during script execution, we avoid re-parsing the same string:
self._globignore_cache = {} # type: Dict[str, List[str]]
The cache maps the raw GLOBIGNORE string to its parsed pattern list.
Filtering Results
After glob() returns matches, I filter them. This is a module-level helper function:
def _StringMatchesAnyPattern(s, patterns):
    # type: (str, List[str]) -> bool
    """Check if string matches any pattern in the list.

    Returns True if s matches any pattern, or if s is . or ..
    (which are always filtered when GLOBIGNORE is set).
    """
    basename = os_path.basename(s)
    if basename in ('.', '..'):
        return True
    flags = 0
    for pattern in patterns:
        if libc.fnmatch(pattern, s, flags):
            return True
    return False
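The same check can be tried outside of Oils with Python's standard fnmatch module. This is only an approximation: fnmatch.fnmatchcase takes its arguments as (name, pattern), the reverse of the libc.fnmatch(pattern, s, flags) call above, and it does not understand POSIX classes such as [[:alnum:]]. The function name matches_any_pattern is made up for this sketch.

```python
import fnmatch
import os.path
from typing import List


def matches_any_pattern(s: str, patterns: List[str]) -> bool:
    """Return True if s should be dropped from the glob results (sketch)."""
    # . and .. are always dropped when GLOBIGNORE is set.
    if os.path.basename(s) in ('.', '..'):
        return True
    # Note the argument order: fnmatchcase(name, pattern).
    return any(fnmatch.fnmatchcase(s, p) for p in patterns)


results = ['main.c', 'main.o', 'main.h', '.hidden', '.', '..']
kept = [s for s in results if not matches_any_pattern(s, ['*.o', '*.h'])]
print(kept)  # ['main.c', '.hidden']
```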
In _Glob(), the filtering has two branches - one for when GLOBIGNORE is set, and one for the default behavior:
if globignore_patterns is not None:
    filtered = []  # type: List[str]
    for s in results:
        if not _StringMatchesAnyPattern(s, globignore_patterns):
            filtered.append(s)
    results = filtered
    n = len(results)
else:
    # Default behavior: just filter . and ..
    dotfile_filtered = []  # type: List[str]
    for s in results:
        if s not in ('.', '..'):
            dotfile_filtered.append(s)
    results = dotfile_filtered
    n = len(results)
out.extend(results)
return n
Implicit dotglob
When GLOBIGNORE is set, bash enables dotglob automatically. I replicate this:
if self.exec_opts.dotglob() or globignore_patterns is not None:
    flags |= GLOB_PERIOD
Test Coverage
The spec test file spec/globignore.test.sh went from oils_failures_allowed: 14 to oils_failures_allowed: 1 - meaning 13 previously-failing test cases now pass.
I also added a new test case to verify that . and .. are always filtered when GLOBIGNORE is set:
#### . and .. always filtered when GLOBIGNORE is set
# When GLOBIGNORE is set to any non-null value, . and .. are always filtered
# even if they don't match the patterns
shopt -u globskipdots
touch .hidden
GLOBIGNORE=*.txt
echo .*
## STDOUT:
.hidden
## END
Running the Tests
To verify the implementation:
# Run GLOBIGNORE spec tests
test/spec.sh globignore
# Run type checking
test/typecheck.sh
# Run in C++ mode (after building)
test/spec.sh globignore --oils-cpp-bin-sh
Demo
$ bin/osh
osh$ touch foo.txt bar.txt hello.py
osh$ GLOBIGNORE='*.txt'
osh$ echo *
hello.py
osh$ touch .hidden
osh$ echo .*
.hidden
osh$ GLOBIGNORE='*.o:*.h'
osh$ touch main.c main.o main.h
osh$ echo main*
main.c
Pull request :: #2505