v1.1.0 Add ASCII-only option, to mimic default RE2 behavior (#1)

* add ASCII-only option, to mimic default RE2 behaviour

This is a workaround, motivated by the difference in how Oniguruma handles
invalid UTF-8 bytes, compared to Go's default RE2.

See https://github.com/src-d/enry/issues/225#issuecomment-490043281

Summary of changes:
- c: prevent `NewOnigRegex()` from hard-coding UTF8
- c: `NewOnigRegex()` now properly calls `onig_initialize()` [1]
- go: expose new `MustCompileASCII()`, whose default `\w` character class matches only ASCII
- go: `MustCompile()` refactored, `initRegexp()` extracted for common UTF8/ASCII logic

Encoding was intentionally not exposed at the Go API level, for simplicity,
in order to avoid introducing a complex struct type [2] to the API surface.
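
For reference, the RE2 behaviour being mimicked can be observed with Go's
standard `regexp` package alone. The sketch below does not use this package's
API: `asciiWordMatches` is a hypothetical helper, and the inputs are made-up
examples. It only shows that in Go's RE2 syntax `\w` covers ASCII word
characters, so non-ASCII letters and invalid UTF-8 bytes split a match.

```go
package main

import (
	"fmt"
	"regexp"
)

// asciiWordMatches is a hypothetical helper (not part of this package).
// It returns every `\w+` match found by Go's standard regexp (RE2 syntax),
// where `\w` is equivalent to [0-9A-Za-z_].
func asciiWordMatches(input []byte) []string {
	re := regexp.MustCompile(`\w+`)
	var out []string
	for _, m := range re.FindAll(input, -1) {
		out = append(out, string(m))
	}
	return out
}

func main() {
	// Non-ASCII letters are not word characters, so they split the matches.
	fmt.Println(asciiWordMatches([]byte("naïve café"))) // [na ve caf]

	// An invalid UTF-8 byte is not a word character either.
	fmt.Println(asciiWordMatches([]byte("ab\xffcd"))) // [ab cd]
}
```
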

1. https://github.com/kkos/oniguruma/blob/83572e983928243d741f61ac290fc057d69fefc3/doc/API#L6
2. https://github.com/kkos/oniguruma/blob/83572e983928243d741f61ac290fc057d69fefc3/src/oniguruma.h#L121

Signed-off-by: Alexander Bezzubov <bzz@apache.org>

* ci: test on 2 latest go versions

Signed-off-by: Alexander Bezzubov <bzz@apache.org>

* ci: bump version of Oniguruma to 6.9.1

Update deb to get the fix for https://bugs.launchpad.net/ubuntu/+source/dpkg/+bug/1730627

Signed-off-by: Alexander Bezzubov <bzz@apache.org>

* ci: refactor oniguruma installation

Signed-off-by: Alexander Bezzubov <bzz@apache.org>

* refactoring go part a bit, addressing review feedback

Signed-off-by: Alexander Bezzubov <bzz@apache.org>

* ci: fix typo in bash var substitution

Signed-off-by: Alexander Bezzubov <bzz@apache.org>

* cgo: simplify naive encoding init

Signed-off-by: Alexander Bezzubov <bzz@apache.org>

* go: doc syntax fix

Signed-off-by: Alexander Bezzubov <bzz@apache.org>

* tixing fypos

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
