Skip to content

fix(migrations): sort migrations as ints not strings #1186

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jeffreydwalter
Copy link
Contributor

Fixes #1185

@jeffreydwalter jeffreydwalter changed the title bug(migrations): Sort migrations as ints not strings bug(migrations): sort migrations as ints not strings Apr 17, 2025
@jeffreydwalter jeffreydwalter force-pushed the jeffreydwalter-fix-migration-sort branch from 754bfd1 to 22f9eb4 Compare April 17, 2025 22:13
@jeffreydwalter jeffreydwalter force-pushed the jeffreydwalter-fix-migration-sort branch from 22f9eb4 to 60f07c3 Compare April 17, 2025 22:16
@jeffreydwalter jeffreydwalter changed the title bug(migrations): sort migrations as ints not strings fix(migrations): sort migrations as ints not strings Apr 17, 2025
@Aoang
Copy link
Contributor

Aoang commented Apr 18, 2025

Thanks for raising this issue and submitting the pull request. You've identified a valid point: the current lexicographical sort on the migration prefix isn't ideal for purely numeric prefixes like 0_, 1_, 10_, where it can lead to incorrect ordering.

Your proposed change to perform a numeric sort when the prefix is identified as a number is a good improvement for handling such cases and standard timestamps correctly.

However, implementing this change directly as the default behavior would unfortunately introduce a breaking change. For example, users who might currently have files named 01_... and 001_... and rely on the specific lexicographical ordering (001, 01) would see the behavior change (they would be treated as numerically equal: 1, 1). A more impactful example is files like 10_ten.up.sql and 2_two.up.sql; lexicographical sort yields (10_ten, 2_two), while numeric sort would yield (2_two, 10_ten), changing the execution order.

To accommodate the improved sorting logic without disrupting existing users, we could introduce it as an optional behavior. We could add a new Option to the migrator configuration that enables numeric sorting. By default, this option would be disabled, preserving the current lexicographical sorting and ensuring backward compatibility.

Furthermore, thinking about long-term flexibility, we could potentially abstract the sorting logic further. Perhaps defining a sorter interface or allowing users to provide a custom comparison function would be beneficial. This would empower users to implement their own preferred sorting strategy, catering to diverse naming conventions like the 0001_user_add_index, or any other scheme they might devise.

What are your thoughts on introducing this numeric sort capability as an opt-in feature initially? This seems like a safe way to add the functionality while we consider if a more customizable sorting mechanism is needed down the line.

Thanks again for your contribution!

@jeffreydwalter
Copy link
Contributor Author

jeffreydwalter commented Apr 18, 2025

I don't really have an opinion about how you guys solve the problem. I think giving user the ability to implement their own sorting logic is fine. I personally think using timestamps is too cumbersome. Thinking about this a bit more though, what probably makes the most sense here is to do a human friendly "natural" sort... Doing so would handle all three of the use cases you identified as expected. Really, the only behavior change would be to filenames that start with ints.

@jeffreydwalter
Copy link
Contributor Author

Perhaps something like this:

import (
	"sort"
	"strconv"
)

func sortAsc(ms MigrationSlice) {
	sort.Slice(ms, func(i, j int) bool {
		return naturalLess(ms[i].Name, ms[j].Name)
	})
}

func sortDesc(ms MigrationSlice) {
	sort.Slice(ms, func(i, j int) bool {
		return naturalLess(ms[j].Name, ms[i].Name) // reverse
	})
}

// naturalLess:
//   1. If both names start with a valid integer, compare numerically.
//      • Same numeric value → shorter literal wins (01 < 001 < 0001).
//   2. Otherwise fall back to plain lexicographic comparison.
func naturalLess(a, b string) bool {
	ai, errA := strconv.ParseInt(a, 10, 64)
	bi, errB := strconv.ParseInt(b, 10, 64)

	if errA == nil && errB == nil {
		if ai != bi {
			return ai < bi
		}
		return len(a) < len(b) // same value, shorter numeric string first
	}

	return a < b // lexicographic when either prefix isn’t a proper integer
}

@jeffreydwalter
Copy link
Contributor Author

jeffreydwalter commented Apr 18, 2025

Or this if you're not adverse to using a library:

import (
	"sort"

	"golang.org/x/text/collate"
	"golang.org/x/text/language"
)

var coll = collate.New(language.Und, collate.Numeric)

func sortAsc(ms MigrationSlice) {
	sort.Slice(ms, func(i, j int) bool {
		return coll.CompareString(ms[i].Name, ms[j].Name) < 0
	})
}

func sortDesc(ms MigrationSlice) {
	sort.Slice(ms, func(i, j int) bool {
		return coll.CompareString(ms[i].Name, ms[j].Name) > 0
	})
}

@Aoang
Copy link
Contributor

Aoang commented Apr 18, 2025

You're right, it's challenging to definitively say which sorting method is universally "better," as different users have different needs and conventions (it's hard to please everyone!). That's precisely why empowering users with the choice seems like the most flexible path forward.

Our general philosophy is to avoid breaking changes whenever possible, unless they are necessary to fix a bug that prevents the library from functioning correctly. We try to be very careful about this because introducing breaking changes, even with good intentions, can cause significant disruption and unexpected work for existing users who rely on the current behavior.

Therefore, providing the sorting capability as an opt-in feature aligns best with this principle. It allows users who need it to benefit from the new logic without negatively impacting others.

@jeffreydwalter
Copy link
Contributor Author

Great, when can I expect a fix for this then? Personally, I think the natural sort is the better solution here. You guys are going to make a mess of bun by being overly flexible and accommodating.

@bevzzz
Copy link
Collaborator

bevzzz commented Apr 21, 2025

Personally, I think the natural sort is the better solution here. [...] Really, the only behavior change would be to filenames that start with ints.

As @Aoang pointed out above, b/c is one of the primary concerns that guides our decision-making. Other projects and productive systems rely on Bun and we won't be introducing a change that may negatively impact them.

That being said, sorting filename prefixes numerically is a sensible addition and we will consider providing that as a Migrator option.

You guys are going to make a mess of bun by being overly flexible and accommodating.

Bun's documentation is very clear on what the convention for naming migration files is: timestamp + comment.

We want the library to be flexible and useful to as many users as possible; still, this does not come at the cost of reliability or consistency. In contrast, this issue/PR demands that we introduce a potentially breaking change because I [you] personally think using timestamps is too cumbersome.

Great, when can I expect a fix for this then?

The new option may arrive in one of the next releases. We'll keep #1185 as a tracking issue for this, so keep an eye on it 👍

@Aoang
Copy link
Contributor

Aoang commented Apr 24, 2025

You're right, natural sort is indeed a very sensible solution, and arguably a better default in isolation. We agree it has strong merits.

However, introducing it directly as the default behavior in the current major version (v1) is something we're hesitant to do precisely because it constitutes a breaking change. We believe avoiding such changes is crucial for maintaining stability and predictability for our existing users, which is generally a good practice, wouldn't you agree? We anticipate revisiting default behaviors like this when we plan for a future major version (like v2), where breaking changes are more expected.

Regarding implementing the option for alternative sorting methods, we'd be very grateful if you were willing to contribute that! It would certainly help get the feature available sooner. Otherwise, it will be added when I, another team member, or a community contributor has the bandwidth to implement it.

@jeffreydwalter
Copy link
Contributor Author

You have my contribution already. I'm not building your ridiculous request.

@bevzzz
Copy link
Collaborator

bevzzz commented Apr 28, 2025

@Aoang I am planning to do some work on Bun this week, will take care of introducing this additional option too, np 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migration tool is doing lexicographical sort on timestamp part of the migration filenames
3 participants